# Pushing data to the Crawler

Before you start pushing URLs to the Crawler, you first need to create a new Crawler here.

To push URLs to be crawled by the Crawler, you must use the Crawling API with two additional parameters:

  • You must append &callback=true
  • You must append &crawler=YourCrawlerName, using the name of the crawler you created.
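As a sketch, the two parameters can be appended to a regular Crawling API request like this. The endpoint URL, token parameter, and crawler name below are placeholders, not values from this document; substitute your own:

```python
from urllib.parse import urlencode, quote

# Hypothetical values -- replace with your real Crawling API endpoint and token.
API_ENDPOINT = "https://api.your-crawling-service.com/"
params = {
    "token": "YOUR_TOKEN",          # your Crawling API token (placeholder)
    "url": "https://example.com/",  # the URL you want crawled
    "callback": "true",             # required for Crawler pushes
    "crawler": "YourCrawlerName",   # the name of the Crawler you created
}

# Build the full push URL; quote_via=quote percent-encodes reserved characters.
push_url = API_ENDPOINT + "?" + urlencode(params, quote_via=quote)
print(push_url)
```

You can then issue the request with any HTTP client; the response is the JSON push acknowledgement described below.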

In response to your crawler push, the API sends back a JSON object containing a unique request identifier, RID. The RID lets you identify the request at any point in the future.

Example of push response:

{ "rid": "1e92e8bff32c31c2728714d4" }
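Extracting the RID from the response body is a one-liner with the standard `json` module; the response string below is the example from above:

```python
import json

# Example push response body, taken verbatim from the documentation above.
response_body = '{ "rid": "1e92e8bff32c31c2728714d4" }'

# Decode the JSON and pull out the request identifier.
rid = json.loads(response_body)["rid"]
print(rid)  # -> 1e92e8bff32c31c2728714d4
```

Store the RID alongside your own records so you can correlate the eventual callback with the original push.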

By default, you can push up to 30 URLs per second to the Crawler.
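A simple way to stay under that limit client-side is to sleep between pushes. This is only a sketch; `push_fn` stands in for whatever function you use to issue a single push request:

```python
import time

RATE_LIMIT = 30                # default Crawler push limit, URLs per second
INTERVAL = 1.0 / RATE_LIMIT    # minimum spacing between pushes

def push_batch(urls, push_fn):
    """Push URLs one at a time, pacing calls to stay under the rate limit.

    push_fn is your own function that performs one push and returns the RID.
    """
    rids = []
    for url in urls:
        rids.append(push_fn(url))
        time.sleep(INTERVAL)   # simple pacing; a token bucket is smoother
    return rids
```

For higher throughput, a token-bucket limiter avoids sleeping after the final push, but the fixed-interval version above is the easiest to reason about.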

# Crawler waiting queue limit

The combined total for all Crawler waiting queues is capped at 1M pages. If any single queue, or all queues combined, exceeds 1M pages, your Crawler pushes will be temporarily paused, and we will notify you by email. Pushing resumes automatically once the waiting queue(s) drop below 1M pages.

# Sending additional data

Optionally, you can have custom headers sent to your callback by using the callback_headers parameter. This is useful for passing additional identification data to your side.

The format is the following: HEADER_NAME:VALUE|HEADER_NAME_2:VALUE_2|etc, and the whole value must be URL-encoded.

Example for the headers and values MY_ID: 1234 and some_other: 4321:

&callback_headers=MY_ID%3A1234%7Csome_other%3A4321
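Building and encoding that value by hand is error-prone; `urllib.parse.quote` handles the percent-encoding. The example below reproduces the value shown above:

```python
from urllib.parse import quote

# Headers to forward to the callback, per the example above.
headers = {"MY_ID": "1234", "some_other": "4321"}

# Join as HEADER_NAME:VALUE pairs separated by "|", then URL-encode everything.
raw = "|".join(f"{name}:{value}" for name, value in headers.items())
encoded = quote(raw, safe="")   # safe="" so ":" and "|" are encoded too

print(encoded)  # -> MY_ID%3A1234%7Csome_other%3A4321
```

Note `safe=""`: without it, `quote` leaves `/` (and with `quote_plus`, other characters) unescaped, which would break the `:`/`|` delimiters.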

Those headers will be sent back with the webhook POST request.
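On the receiving side, the custom headers appear as ordinary HTTP headers on the webhook request. A minimal sketch of a receiver using only the standard library (the header name MY_ID and port 8000 are just the example values from above, not anything this service mandates):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHandler(BaseHTTPRequestHandler):
    """Minimal webhook receiver sketch for Crawler callbacks."""

    def do_POST(self):
        # Custom headers passed via callback_headers arrive as request headers.
        my_id = self.headers.get("MY_ID")           # e.g. "1234" from the example
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)              # the crawled page payload
        # ... match my_id against your records and store the body here ...
        self.send_response(200)
        self.end_headers()

# To run the receiver (blocking):
# HTTPServer(("", 8000), CallbackHandler).serve_forever()
```

In production you would put this behind a real web framework and verify that incoming requests genuinely originate from the Crawler before trusting them.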