How to scrape Amazon reviews in Node
Jul 5, 2018 · 5 mins read
If you need to collect the reviews of different products, you can do it quickly in Node, as Node's asynchronous features make it easy to fetch data from Amazon.
In this article, we will scrape Amazon reviews and comments together using only a couple of Node.js libraries.
The first thing we need is a list of Amazon URLs; for this example we will use amazon.com URLs. We have collected a sample of around 1,000 product ASINs, which you can download from here.
Loading Amazon URLs in Node
Let's create a file start.js which will contain our Node code.
Let's also install our two dependencies:
npm i cheerio
npm i proxycrawl
Our project should now contain at least the following files: start.js and amazon-products.txt (plus the package.json and node_modules that the installs above created).
Now it's time to start coding. Let's write our code in the start.js file, starting by loading the amazon-products.txt file into an array. We can do that with the following piece of code:
const fs = require('fs');
Now that we have the URLs in an array, we can start crawling them. We will use the ProxyCrawl node library that we installed before.
Crawling Amazon with ProxyCrawl
We need to initialize the library and create a worker with our token. For Amazon, we should use the normal token; make sure to replace it with the actual token from your account.
We have to add the following two lines to our project:
const { ProxyCrawlAPI } = require('proxycrawl');
const api = new ProxyCrawlAPI({ token: 'YOUR_TOKEN' }); // replace with your normal token
With the resulting code being the following:
const fs = require('fs');
const { ProxyCrawlAPI } = require('proxycrawl');

const urls = fs.readFileSync('amazon-products.txt', 'utf8').split('\n').filter((line) => line.trim());
const api = new ProxyCrawlAPI({ token: 'YOUR_TOKEN' });
Now it's time to crawl the URLs. We will do 10 requests each second, which should suffice for our test; if you need more, make sure to contact ProxyCrawl.
Let’s build our code to send 10 API requests each second…
const requestsPerSecond = 10;
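The batching can be sketched with setInterval, dispatching one batch of requests per second. Here doRequest is a stub that only records the URL, standing in for the real ProxyCrawl call, so the scheduling logic can be shown on its own; the example.com URLs are placeholders:

```javascript
const requestsPerSecond = 10;

// Fire `perBatch` requests every `intervalMs` milliseconds until the URL
// list is exhausted. Resolves with the total number of requests dispatched.
function crawlAll(urls, perBatch, intervalMs, doRequest) {
  return new Promise((resolve) => {
    let index = 0;
    const timer = setInterval(() => {
      // Dispatch one batch per tick.
      for (let i = 0; i < perBatch && index < urls.length; i++, index++) {
        doRequest(urls[index]);
      }
      if (index >= urls.length) {
        clearInterval(timer); // stop once every URL has been queued
        resolve(index);
      }
    }, intervalMs);
  });
}

// Demo with placeholder URLs and a stub request function:
const demoUrls = Array.from({ length: 25 }, (_, i) => `https://example.com/p/${i}`);
const dispatched = [];
const done = crawlAll(demoUrls, requestsPerSecond, 1000, (url) => dispatched.push(url));
```

With 25 URLs and 10 requests per batch, the loop finishes after three ticks.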
We are now loading the URLs, but we are not doing anything with the result. So it’s now time to start scraping 😄
Scraping Amazon reviews
We will use the Cheerio library for Node, which we installed before, to parse the resulting HTML and extract only the reviews.
Let’s first include cheerio:
const cheerio = require('cheerio');
And now let’s build a function which should receive the HTML and parse it accordingly.
function parseHtml(html) {
  // ...
}
Now that we have the text content of the reviews, we are close to finishing the scraping, but we are missing the most crucial part: connecting our function to the previous piece of code, where we made the call to the ProxyCrawl API.
The full code should look like the following:
const fs = require('fs');
The code is ready, and you can now scrape reviews from 10 Amazon pages each second. Obviously, for this post we are just logging the results to the console; you should replace that console.log with whatever you would like to do with the data: save it to a database, write it to a file, and so on. That is up to you.
We hope you enjoyed this tutorial and we hope to see you soon in ProxyCrawl. Happy crawling!