How to scrape Amazon data in Ruby
Sep 3, 2019 · 3 mins read
Saving a product name and price from Amazon is straightforward: simply select the name and the price of a product and store them wherever you want. But what if you have hundreds or even thousands of product names and prices to save? Will the same trick work? Not for us, at least!
In this article, we will show you how to quickly build a simple scraper with some Ruby libraries to crawl a product name and price from Amazon, which can be applied to hundreds of Amazon products.
Let’s create a file named amazon_scraper.rb, which will contain our Ruby code.
Let’s also install our two requirements by running the following commands at your command prompt:
gem install proxycrawl
gem install nokogiri
Now it’s time to start coding. Let’s write our code in the amazon_scraper.rb file, starting by loading the HTML page of one Amazon product URL using the ProxyCrawl Ruby library. We need to initialize the library and create a worker with our token. For Amazon, we should use the normal token; make sure to use the actual token from your account.
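A minimal sketch of this loading step could look like the following. It assumes the gem exposes `ProxyCrawl::API.new(token: ...)` and a `get` method whose response has `status_code` and `body`, as described in the gem's README; the `fetch_html` helper, the `PROXYCRAWL_TOKEN` environment variable, and the product URL are placeholders introduced here for illustration.

```ruby
# Fetch one page through a ProxyCrawl-style client.
# `client` is assumed to respond to `get(url)` and return an object
# exposing `status_code` and `body`.
def fetch_html(client, url)
  response = client.get(url)
  return response.body if response.status_code.to_i == 200

  warn "Request for #{url} failed with status #{response.status_code}"
  nil
end

# The real request runs only when a token is supplied, for example:
#   PROXYCRAWL_TOKEN=your_normal_token ruby amazon_scraper.rb
if ENV['PROXYCRAWL_TOKEN']
  require 'proxycrawl'

  api = ProxyCrawl::API.new(token: ENV['PROXYCRAWL_TOKEN'])
  # Placeholder URL; replace it with a real Amazon product URL.
  html = fetch_html(api, 'https://www.amazon.com/dp/EXAMPLE_ASIN')
  puts html ? "Fetched #{html.bytesize} bytes" : 'No HTML returned'
end
```

Reading the token from the environment keeps it out of the source file; hard-coding it in place of `ENV['PROXYCRAWL_TOKEN']` would work just as well for a quick test.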
We are now loading the URL, but we are not doing anything with the result. So it’s now time to start scraping the name and the price of the product.
We will use the Nokogiri library for Ruby that we installed earlier to parse the resulting HTML and extract only the name and price of the Amazon product.
Let’s write our code which should parse an HTML body and scrape the product name and price accordingly.
The full code should look like the following:
Running the script should now print the scraped Amazon product name and price in the command prompt.
The code is ready, and you can quickly scrape an Amazon product to get its name and price. The results are printed to the console, and from there you can save them to a database, a file, or anywhere else. That is up to you.
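As one example of saving to a file, the results could be appended to a CSV using Ruby's standard `csv` library. This is only a sketch: `save_products` and the `name`/`price` hash keys are assumptions made for illustration.

```ruby
require 'csv'

# Append products to a CSV file, writing a header row first
# when the file is missing or empty.
# Each product is expected to be a hash with :name and :price keys.
def save_products(path, products)
  write_header = !File.exist?(path) || File.zero?(path)
  CSV.open(path, 'a') do |csv|
    csv << %w[name price] if write_header
    products.each { |product| csv << [product[:name], product[:price]] }
  end
end
```

Calling `save_products('products.csv', [product])` after each scrape would build up one row per product, and the CSV library takes care of quoting names that contain commas.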
We hope you enjoyed this tutorial and we hope to see you soon in ProxyCrawl. Happy crawling!