# Proxy API Mode (Beta)

If your application is not designed to work with an API such as the Crawling API to crawl and scrape websites, we have built an intelligent rotating proxy that forwards your requests to the Crawling API. You simply use it as a regular proxy in your application.

All proxy calls should go to http://proxyapi.proxycrawl.com on port 8000, using your access token as the proxy username.

Making your first call is as easy as running the following line in your terminal. Go ahead and try it!

curl -x "http://[email protected]:8000" -k "http://httpbin.org/ip"
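If you want to make the same call from application code rather than the terminal, one way is to configure the Proxy API as an ordinary HTTP proxy in your HTTP client. A minimal sketch using Python's requests library, reusing the token placeholder and httpbin URL from the curl example above:

```python
import requests

# The Proxy API endpoint is used as a regular HTTP proxy;
# the proxy username is your access token (no password).
proxies = {
    "http": "http://_USER_TOKEN_@proxyapi.proxycrawl.com:8000",
    "https": "http://_USER_TOKEN_@proxyapi.proxycrawl.com:8000",
}

# verify=False skips certificate checks (see the Important Note below).
response = requests.get("http://httpbin.org/ip", proxies=proxies, verify=False)
print(response.status_code, response.text)
```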

# How it works

When you send a request to the Proxy API, the proxy authorizes the request using your proxy username, which is your normal or JavaScript access token (check your tokens below). It then forwards your request to the Crawling API and returns the response to your application. If you need the extra features of the Crawling API in this mode, send the HTTP header ProxyCrawlAPI-Parameters with the options you want to use, as shown in the sketch after the tokens and in the examples section below.

Normal token

_USER_TOKEN_

JavaScript token

_JS_TOKEN_
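As a sketch of how the ProxyCrawlAPI-Parameters header can be sent from application code, here is how it might look with Python's requests library, reusing the get_headers and store options from the examples below:

```python
import requests

proxies = {
    "http": "http://_USER_TOKEN_@proxyapi.proxycrawl.com:8000",
    "https": "http://_USER_TOKEN_@proxyapi.proxycrawl.com:8000",
}

# Crawling API options go into the ProxyCrawlAPI-Parameters header
# as a URL-encoded query string.
headers = {"ProxyCrawlAPI-Parameters": "get_headers=true&store=true"}

response = requests.get(
    "http://httpbin.org/headers",
    headers=headers,
    proxies=proxies,
    verify=False,  # see the Important Note below
)
print(response.text)
```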

# Important Note

It is important to disable SSL verification when using the Crawling API in proxy mode; otherwise we will not be able to redirect your requests to the Crawling API. In other words, skip certificate verification, as the -k flag does in the curl examples.
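In Python's requests library, for instance, this corresponds to passing verify=False (the equivalent of curl's -k); a minimal sketch, optionally silencing the resulting warning:

```python
import urllib3
import requests

# Proxy mode requires skipping certificate verification (curl's -k flag).
# verify=False does that in requests; the warning can be silenced if desired.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

proxies = {"https": "http://_USER_TOKEN_@proxyapi.proxycrawl.com:8000"}
response = requests.get("https://httpbin.org/ip", proxies=proxies, verify=False)
print(response.text)
```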

# Proxy API Mode Rate Limit

By default, the Crawling API in proxy mode is rate limited to 20 requests per second (1.728M requests/day). If your proxy management solution works with concurrent requests or threads instead of requests per second, it is important to note that 20 requests per second generally translates to a much larger number of concurrent requests. For example, if you are crawling Amazon with ProxyCrawl, the average request takes about 4 seconds, so 20 requests per second translates to 80 concurrent threads. If the website you are crawling responds quickly, you need fewer concurrent requests. If you hit the limit of concurrent requests, please contact support with your use case to increase your concurrency.
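The conversion is simply concurrency ≈ requests per second × average response time, which is where 20 × 4 seconds = 80 concurrent threads comes from. As a rough sketch of capping concurrency on your side with Python threads (the worker count and URL list are placeholders):

```python
import concurrent.futures
import requests

proxies = {
    "http": "http://_USER_TOKEN_@proxyapi.proxycrawl.com:8000",
    "https": "http://_USER_TOKEN_@proxyapi.proxycrawl.com:8000",
}

def fetch(url):
    # Each worker issues one request through the Proxy API.
    return requests.get(url, proxies=proxies, verify=False).status_code

urls = ["http://httpbin.org/ip"] * 10  # placeholder list of target URLs

# ~80 workers roughly matches 20 requests per second at ~4 seconds per request.
with concurrent.futures.ThreadPoolExecutor(max_workers=80) as pool:
    for status in pool.map(fetch, urls):
        print(status)
```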

# Proxy API Mode request examples

// GET request

curl -x "http://[email protected]:8000" -k "http://httpbin.org/ip"

// GET request with a custom user agent and returning the original headers

// Make use of the API parameters, such as store=true, device=mobile, get_cookies=true, etc.

// Example of using a custom user agent and storing a copy of the result in ProxyCrawl Cloud storage.

curl -H "ProxyCrawlAPI-Parameters: get_headers=true&store=true&user_agent=Mozilla%2F5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F91.0.4472.124%20Safari%2F537.36" \
-x "http://[email protected]:8000" \
-k "http://httpbin.org/headers"

// GET request with a headless browser (JavaScript enabled), using the JavaScript token

curl -x "http://[email protected]:8000" \
-k "http://httpbin.org/anything"

// POST form request
curl -H 'Content-Type: application/x-www-form-urlencoded' \
--data 'param=value' \
-X POST \
-x "http://[email protected]:8000" \
-k "http://httpbin.org/anything"

// POST JSON request
curl -H "accept: application/json" \
--data '{"key1":"value1","key2":"value2"}' \
-X POST \
-x "http://[email protected]:8000" \
-k "http://httpbin.org/anything"