# Bulk

The /bulk endpoint retrieves stored data for a list of Request IDs (RIDs) in a single request, which makes it efficient for fetching large datasets. It can optionally delete the fetched items from storage once they have been retrieved.

# Parameters

Send a JSON object with the following properties:

  • rids (required): An array of RIDs for the data you want to retrieve.

  • auto_delete (optional): A boolean. When set to true, the fetched items are deleted from storage after they are retrieved. Defaults to false, so nothing is deleted unless you ask for it.

# Request

To retrieve and automatically delete data for three RIDs:

curl -X POST 'https://api.crawlbase.com/storage/bulk?token=_USER_TOKEN_' \
-H 'Content-Type: application/json' \
-d '{ "rids": ["RID1","RID2","RID3"], "auto_delete": true }'
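The same request can be assembled in Python. This is a minimal sketch using only the standard library, not an official client. The helper name `build_bulk_request` is hypothetical, and the token and RIDs are the placeholders from the curl example above. The request is only built here; sending it is left as a comment.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BULK_URL = "https://api.crawlbase.com/storage/bulk"


def build_bulk_request(token, rids, auto_delete=False):
    """Build (but do not send) a POST request for the /bulk endpoint.

    Hypothetical helper for illustration only.
    """
    query = urllib.parse.urlencode({"token": token})
    payload = json.dumps({"rids": list(rids), "auto_delete": auto_delete})
    return urllib.request.Request(
        f"{BULK_URL}?{query}",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_bulk_request(
    "_USER_TOKEN_", ["RID1", "RID2", "RID3"], auto_delete=True
)

# To send it with a real token:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

Keeping construction separate from sending makes it easy to inspect the payload before any data is deleted by auto_delete.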

# Response

The response is a JSON array of objects, one per RID. The body field is gzip compressed and then base64 encoded, so to recover the original content you must first base64 decode it and then gzip decompress the result.

[
  {
    "stored_at": "2021-03-01T14:22:58+02:00",
    "original_status": 200,
    "pc_status": 200,
    "rid": "RID1",
    "url": "URL1",
    "body": "BODY1"
  },
  {
    "stored_at": "2021-03-01T14:30:51+02:00",
    "original_status": 200,
    "pc_status": 200,
    "rid": "RID2",
    "url": "URL2",
    "body": "BODY2"
  }
]
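The two decoding steps for the body field can be sketched in Python as follows. The helper name `decode_body` is hypothetical, and the sample body is generated locally to stand in for a real response. Treating the decompressed bytes as UTF-8 text is an assumption; the original content may use another encoding or be binary.

```python
import base64
import gzip


def decode_body(encoded_body):
    """Return the original bytes of a /bulk item's body field.

    Step 1: base64 decode. Step 2: gzip decompress.
    """
    compressed = base64.b64decode(encoded_body)
    return gzip.decompress(compressed)


# Locally generated stand-in for a real "body" value:
# gzip compress first, then base64 encode.
sample_body = base64.b64encode(
    gzip.compress(b"<html>hello</html>")
).decode("ascii")
item = {"rid": "RID1", "body": sample_body}

# Assumption: the stored page is UTF-8 text.
html = decode_body(item["body"]).decode("utf-8")
```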

# Notes

For efficient use of the /bulk API, please take note of the following:

  • The maximum number of RIDs that can be processed per request is 100. If more than 100 RIDs are sent, only the first 100 will be processed.

  • The auto_delete option keeps storage lean and handles data lifecycle without separate deletion requests. Use it with care to avoid unintentional data loss.
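Because only the first 100 RIDs of a request are processed, a longer list has to be split on the client side. A minimal sketch, with the hypothetical helper `chunk_rids` and made-up RID values:

```python
def chunk_rids(rids, size=100):
    """Split a list of RIDs into batches of at most `size` items.

    100 is the per-request limit stated in the notes above.
    """
    rids = list(rids)
    return [rids[i:i + size] for i in range(0, len(rids), size)]


# 250 made-up RIDs become three batches: 100, 100 and 50.
all_rids = [f"RID{n}" for n in range(250)]
batches = chunk_rids(all_rids)
```

Each batch can then be sent as the rids array of its own /bulk request.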