# Parameters

The API has the following parameters. Only token and url are mandatory; the rest are optional.

# token

  • Required
  • Type string

This parameter is required for all calls

This is your authentication token. You have two tokens: one for normal requests and another for JavaScript requests.

Use the JavaScript token when the content you need to crawl is generated via JavaScript, either because the page is built with JavaScript (React, Angular, etc.) or because the content is dynamically generated in the browser.

Normal token

_USER_TOKEN_

Javascript token

_JS_TOKEN_

Note: If you don't see your tokens, please log in first and then refresh this page.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&url=https%3A%2F%2Fwww.amazon.com%2FJabra-Move-Wireless-Stereo-Headphones%2Fdp%2FB00MR8Z28S%2F"
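
The same request URL can be assembled programmatically. The following is a minimal sketch, assuming the placeholder token values shown above; `pick_token` and `build_request_url` are hypothetical helper names, not part of the API.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.proxycrawl.com/"

# Placeholders: substitute the tokens shown in your account.
NORMAL_TOKEN = "_USER_TOKEN_"
JS_TOKEN = "_JS_TOKEN_"


def pick_token(needs_javascript: bool) -> str:
    """Return the JavaScript token for browser-rendered pages, else the normal token."""
    return JS_TOKEN if needs_javascript else NORMAL_TOKEN


def build_request_url(target_url: str, needs_javascript: bool = False) -> str:
    """Build the API request URL; urlencode percent-encodes the target URL."""
    query = urlencode({"token": pick_token(needs_javascript), "url": target_url})
    return f"{API_ENDPOINT}?{query}"
```

Calling `build_request_url` with the Amazon product URL from the curl example yields the same request URL as that example.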

# url

  • Required
  • Type string

This parameter is required for all calls

You will need a url to crawl. Make sure it starts with http or https and that it is fully encoded.

For example, the url https://www.amazon.com/sale?catId=0&SearchText=games should be encoded as follows when calling the API: https%3A%2F%2Fwww.amazon.com%2Fsale%3FcatId%3D0%26SearchText%3Dgames

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&url=https%3A%2F%2Fwww.facebook.com%2Fbritneyspears"
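
The encoding step described above can be sketched in Python with the standard library. `encode_target_url` is a hypothetical helper name; the scheme check simply mirrors the requirement that the url starts with http or https.

```python
from urllib.parse import quote, urlsplit


def encode_target_url(target_url: str) -> str:
    """Percent-encode a URL so it can be passed as the value of the url parameter."""
    scheme = urlsplit(target_url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError("the url must start with http or https")
    # safe="" also encodes "/", which quote() would otherwise leave untouched.
    return quote(target_url, safe="")
```

Passing `https://www.amazon.com/sale?catId=0&SearchText=games` returns the encoded form shown in the example above.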

# format

  • Optional
  • Type string

Indicates the response format, either json or html. Defaults to html.

If format html is used, ProxyCrawl will send you back the response parameters in the headers (see HTML response below).

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&url=https%3A%2F%2Fwww.yelp.com&format=json"

# user_agent

  • Optional
  • Type string

If you want to make the request with a custom user agent, you can pass it here and our servers will forward it to the requested url.

We recommend NOT using this parameter and letting our artificial intelligence handle this.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&user_agent=Mozilla%2F5.0+%28Macintosh%3B+Intel+Mac+OS+X+10_12_5%29+AppleWebKit%2F603.2.4+%28KHTML%2C+like+Gecko%29+Version%2F10.1.1+Safari%2F603.2.4&url=https%3A%2F%2Fwww.twitter.com"
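
A user agent string contains spaces, slashes, and parentheses, so it needs encoding before it goes into the query string. As a small sketch, Python's `quote_plus` produces the form used in the curl example, with spaces encoded as `+`:

```python
from urllib.parse import quote_plus

# The user agent from the curl example, in its plain form.
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 "
    "(KHTML, like Gecko) Version/10.1.1 Safari/603.2.4"
)

# quote_plus encodes spaces as "+" and reserved characters as %XX escapes.
encoded_user_agent = quote_plus(user_agent)
```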

# page_wait

  • Optional
  • Type number

If you are using the javascript token, you can optionally pass the page_wait parameter to wait a number of milliseconds before the browser captures the resulting html code.

This is useful in cases where the page takes some seconds to render or some ajax needs to be loaded before the html is captured.

curl "https://api.proxycrawl.com/?token=_JS_TOKEN_&page_wait=1000&url=https%3A%2F%2Fwww.nfl.com"

# ajax_wait

  • Optional
  • Type boolean

If you are using the javascript token, you can optionally pass the ajax_wait parameter to wait for the ajax requests to finish before getting the html response.

curl "https://api.proxycrawl.com/?token=_JS_TOKEN_&ajax_wait=true&url=https%3A%2F%2Fwww.nfl.com"
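
The two wait options can be set from code as follows. This is a minimal sketch: `build_js_request_url` is a hypothetical helper, and leaving a parameter out of the query when it is unset is an assumption about client-side convenience, not documented API behavior.

```python
from typing import Optional
from urllib.parse import urlencode

API_ENDPOINT = "https://api.proxycrawl.com/"


def build_js_request_url(js_token: str, target_url: str,
                         page_wait_ms: Optional[int] = None,
                         ajax_wait: bool = False) -> str:
    """Build a JavaScript-token request URL with optional wait parameters."""
    params = {"token": js_token}
    if page_wait_ms is not None:
        if page_wait_ms < 0:
            raise ValueError("page_wait must be a non-negative number of milliseconds")
        params["page_wait"] = str(page_wait_ms)
    if ajax_wait:
        # Booleans are sent as the lowercase string "true", as in the curl example.
        params["ajax_wait"] = "true"
    params["url"] = target_url
    return f"{API_ENDPOINT}?{urlencode(params)}"
```

For instance, `build_js_request_url("_JS_TOKEN_", "https://www.nfl.com", ajax_wait=True)` reproduces the URL in the curl example above.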

# css_click_selector

  • Optional
  • Type string

If you are using the javascript token, you can optionally pass the css_click_selector parameter to click an element on the page before the browser captures the resulting html code.

It must be a full and valid CSS selector, for example #some-button or .some-other-button, and it must be properly encoded.

curl "https://api.proxycrawl.com/?token=_JS_TOKEN_&css_click_selector=%23some-nice-button&page_wait=1000&url=https%3A%2F%2Fwww.nfl.com"
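
Encoding matters especially for id selectors, because an unencoded `#` starts the URL fragment and everything after it is dropped from the query string. The sketch below illustrates this with Python's standard URL parser; the two URLs are illustrative strings only and no request is sent.

```python
from urllib.parse import parse_qs, quote, urlsplit

selector = "#some-nice-button"
encoded_selector = quote(selector, safe="")  # "#" becomes "%23"

base = "https://api.proxycrawl.com/?token=_JS_TOKEN_&css_click_selector="
unencoded_url = f"{base}{selector}&page_wait=1000"
encoded_url = f"{base}{encoded_selector}&page_wait=1000"

# With a raw "#", the selector and page_wait end up in the fragment, not the query.
lost = parse_qs(urlsplit(unencoded_url).query)
# With "%23", both parameters survive and the selector decodes back to its original form.
kept = parse_qs(urlsplit(encoded_url).query)
```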

# device

  • Optional
  • Type string

Optionally, if you don't want to specify a user_agent but want the requests to come from a specific device, you can use this parameter.

There are two options available: desktop and mobile.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&device=mobile&url=https%3A%2F%2Fwww.walmart.com%2Fcp%2Fhome%2F4044"

# get_cookies

  • Optional
  • Type boolean

Optionally, if you need to get the cookies that the original website sets on the response, you can use the &get_cookies=true parameter.

The cookies will come back in the header (or in the json response if you use &format=json) as original_set_cookie.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&get_cookies=true&url=https%3A%2F%2Fwww.walmart.com%2Fcp%2Fhome%2F4044"
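
Since original_set_cookie arrives either as a response header or as a field of the json response, client code may want a single lookup that handles both. The following is a sketch under stated assumptions: `read_response_field` is a hypothetical helper, and it takes already-parsed headers and json as plain mappings rather than assuming any particular HTTP library.

```python
from typing import Mapping, Optional


def read_response_field(name: str, headers: Mapping[str, str],
                        json_body: Optional[Mapping[str, object]] = None) -> Optional[str]:
    """Look a response parameter up in the json body first, then in the headers."""
    if json_body is not None and name in json_body:
        return str(json_body[name])
    # HTTP header names are case-insensitive, so compare in lowercase.
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None
```

With `&format=json` you would pass the parsed body as `json_body`; otherwise pass only the response headers.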

# get_headers

  • Optional
  • Type boolean

Optionally, if you need to get the headers that the original website sets on the response, you can use the &get_headers=true parameter.

The headers will come back in the header (or in the json response if you use &format=json) as original_header_name.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&get_headers=true&url=https%3A%2F%2Fwww.walmart.com%2Fcp%2Fhome%2F4044"

# request_headers

  • Optional
  • Type string

Optionally, if you need to send request headers to the original website, you can use the &request_headers=EncodedRequestHeaders parameter.

Example request headers: accept-language:en-GB|host:api.proxycrawl.com

Example encoded: &request_headers=accept-language%3Aen-GB%7Chost%3Aapi.proxycrawl.com

Please note that not all request headers are allowed by the API. We recommend that you test the headers sent using this testing url: https://httpbin.org/headers

If you need to send some additional headers which are not allowed by the API, please let us know the header names and we will authorize them for your token.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&request_headers=accept-language%3Aen-GB%7Chost%3Aapi.proxycrawl.com&url=https%3A%2F%2Fhttpbin.org%2Fheaders"
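
The header format above (name:value pairs joined with `|`, then encoded) can be produced from a dictionary. A minimal sketch, where `encode_request_headers` is a hypothetical helper name:

```python
from urllib.parse import quote


def encode_request_headers(headers: dict) -> str:
    """Join headers as name:value pairs separated by "|" and percent-encode the result."""
    joined = "|".join(f"{name}:{value}" for name, value in headers.items())
    return quote(joined, safe="")
```

For example, `encode_request_headers({"accept-language": "en-GB", "host": "api.proxycrawl.com"})` gives the encoded value shown in the example above.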

# cookies

  • Optional
  • Type string

Optionally, if you need to send cookies to the original website, you can use the &cookies=EncodedCookies parameter.

Example cookies: key1=value1; key2=value2; key3=value3

Example encoded: &cookies=key1%3Dvalue1%3B%20key2%3Dvalue2%3B%20key3%3Dvalue3

We recommend that you test the cookies sent using this testing url: https://httpbin.org/cookies

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&cookies=key1%3Dvalue1%3B%20key2%3Dvalue2%3B%20key3%3Dvalue3&url=https%3A%2F%2Fhttpbin.org%2Fcookies"
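
The cookie string can be built and encoded the same way from key/value pairs. A minimal sketch, where `encode_cookies` is a hypothetical helper name:

```python
from urllib.parse import quote


def encode_cookies(cookies: dict) -> str:
    """Join cookies as key=value pairs separated by "; " and percent-encode the result."""
    joined = "; ".join(f"{key}={value}" for key, value in cookies.items())
    # quote() encodes spaces as %20, matching the encoded example.
    return quote(joined, safe="")
```

For example, `encode_cookies({"key1": "value1", "key2": "value2", "key3": "value3"})` gives the encoded value shown in the example above.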

# proxy_session

  • Optional
  • Type string

If you need to use the same proxy for subsequent requests, you can use the &proxy_session= parameter.

The &proxy_session= parameter can be any value. Simply send a new value to create a new proxy session (this will allow you to continue using the same proxy for all subsequent requests with that proxy session value). Sessions expire 30 seconds after the last API call.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&proxy_session=1234abcd&url=http%3A%2F%2Fhttpbin.org%2Fip"
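
Because a proxy session expires 30 seconds after the last API call, a client may want to track when it last used a session value and generate a fresh one once that window has passed. The sketch below is one way to do this bookkeeping on the client side; `SessionId` is a hypothetical helper, the random hex value is an arbitrary choice (any value is accepted), and the server remains the authority on whether a session is still alive.

```python
import time
import uuid
from typing import Callable, Optional


class SessionId:
    """Client-side bookkeeping for a session value that expires after inactivity."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._last_used = 0.0

    def current(self) -> str:
        """Return the session value to send now, rotating it if the old one has expired."""
        now = self._clock()
        if self._value is None or now - self._last_used >= self._ttl:
            self._value = uuid.uuid4().hex  # a new value starts a new session
        self._last_used = now
        return self._value
```

Create it once with `SessionId(ttl_seconds=30)` and call `current()` immediately before each request to get the value for `&proxy_session=`.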

# cookies_session

  • Optional
  • Type string

If you need to send the cookies that come back on every request to all subsequent calls, you can use the &cookies_session= parameter.

The &cookies_session= parameter can be any value. Simply send a new value to create a new cookies session (this will allow the cookies returned by each call to be sent on the next API calls that use the same cookies session value). Sessions expire 300 seconds after the last API call.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&cookies_session=1234abcd&url=https%3A%2F%2Fwww.walmart.com%2Fcp%2Fhome%2F4044"

# screenshot

  • Optional
  • Type boolean

If you are using the javascript token, you can optionally pass the &screenshot=true parameter to get a screenshot of the whole crawled page in JPEG format.

ProxyCrawl will send you back the screenshot_url in the response headers (or in the json response if you use &format=json).
The screenshot_url expires in one hour.

curl "https://api.proxycrawl.com/?token=_JS_TOKEN_&screenshot=true&url=https%3A%2F%2Fwww.amazon.com"

# store

  • Optional
  • Type boolean

Optionally pass the &store=true parameter to store a copy of the API response in the ProxyCrawl Cloud Storage.

ProxyCrawl will send you back the storage_url in the response headers (or in the json response if you use &format=json).

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&store=true&url=https%3A%2F%2Fwww.amazon.com"

# scraper

  • Optional
  • Type string

Returns the information parsed according to the specified scraper. Check the list of all the available data scrapers to see which one to choose.

The response will come back as JSON.

Please note: Scraper is an optional parameter. If you don't use it, you will receive back the full HTML of the page so you can scrape it freely.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&scraper=twitter-tweet&url=https%3A%2F%2Ftwitter.com%2Famazon%2Fstatus%2F1196891901024243712"

# autoparse

  • Optional
  • Type boolean

Optionally, if you need to get the scraped data of the page that you requested, you can pass the &autoparse=true parameter.

The response will come back as JSON. The structure of the response varies depending on the URL that you sent.

Please note: &autoparse=true is an optional parameter. If you don't use it, you will receive back the full HTML of the page so you can scrape it freely.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&autoparse=true&url=https%3A%2F%2Fwww.amazon.com%2Fproduct-reviews%2FB07S5QWM6L"

# country

  • Optional
  • Type string

If you want your requests to be geolocated from a specific country, you can use the &country= parameter, like &country=US (two-character country code).

Please take into account that specifying a country can reduce the number of successful requests you get back, so use it wisely and only when geolocation crawls are required.

You have access to the following countries

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&country=US&url=https%3A%2F%2Fwww.amazon.com"

# tor_network

  • Optional
  • Type boolean

If you want to crawl onion websites over the Tor network, you can pass the &tor_network=true parameter.

curl "https://api.proxycrawl.com/?token=_USER_TOKEN_&tor_network=true&url=https%3A%2F%2Fwww.facebookcorewwwi.onion%2F"

# scroll

  • Optional
  • Type boolean

If you are using the javascript token, you can optionally pass &scroll=true to the API. By default, the browser will scroll for a scroll_interval of 10 seconds.

If you want to scroll for more than 10 seconds, send &scroll=true&scroll_interval=20. Those parameters instruct the browser to scroll for 20 seconds after loading the page. The maximum scroll interval is 60 seconds; after 60 seconds of scrolling, the system captures the data and brings it back to you.

Every 5 seconds of successful scrolling counts as an extra JS request on the Crawling API. For example, if you send a scroll_interval of 20, our system tries to scroll the page for a maximum of 20 seconds; if it was only able to scroll for 10 seconds, only 2 extra requests are consumed instead of 4.

Note: Please make sure to keep your connection open for up to 90 seconds if you intend to scroll for 60 seconds.

curl "https://api.proxycrawl.com/?token=_JS_TOKEN_&scroll=true&url=https%3A%2F%2Fwww.quora.com%2Fsearch%3Fq%3Dproxycrawl"
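
The request accounting for scrolling can be worked through in a few lines. This is a sketch only: the function names are hypothetical, and rounding partial 5-second blocks down is an assumption, since the text above only covers whole multiples of 5 seconds.

```python
MAX_SCROLL_INTERVAL = 60       # seconds, the maximum scroll interval
SECONDS_PER_EXTRA_REQUEST = 5  # every 5 seconds of successful scroll is one extra JS request


def cap_scroll_interval(requested_seconds: int) -> int:
    """Limit a requested scroll_interval to the 60-second maximum."""
    return min(requested_seconds, MAX_SCROLL_INTERVAL)


def extra_requests(scrolled_seconds: int) -> int:
    """Extra JS requests consumed for the seconds actually scrolled (assumes rounding down)."""
    return scrolled_seconds // SECONDS_PER_EXTRA_REQUEST


def max_extra_requests(requested_seconds: int) -> int:
    """Upper bound on extra JS requests for a given scroll_interval."""
    return extra_requests(cap_scroll_interval(requested_seconds))
```

With a scroll_interval of 20, `max_extra_requests(20)` is 4, while a page that could only be scrolled for 10 seconds costs `extra_requests(10)`, which is 2.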