Creating a Node.js Web Scraper for Amazon Product Data

Suppose you are building a website that sells air coolers this summer, and you want to track the product details of other sellers, such as cooler types, prices, and discounts, so that you can adjust the pricing on your own website accordingly.

How do you extract that data? Doing it manually is impractical: in the real world, thousands of websites may sell coolers, with millions of product listings added every day, so finding the ones you need by hand is not realistic. This is where web scrapers come in.

In this tutorial, we will learn about web scraping by building an Amazon scraper that collects data such as product information and pricing from the Amazon website.

If you are new to Node.js: Introduction to NodeJS – Installation, Setup, and Creating a Hello World Application in NodeJS

What is Web Scraping?

Web Scraping is a technique for collecting publicly available data from the Internet and putting it all together in a certain format so that we can use it later.

This data can be used in many ways: to predict trends and sell relevant products to consumers, to study past stock market data before investing, or to judge which framework is the most reliable for a certain application. A large amount of data can be used to our advantage, and since that data is scattered all over the internet, we use web scraping to gather it in one place.

What is an Amazon Data Scraper?

On Amazon, the price of a product changes very frequently. To get a good deal, you must buy the product when its price is at its lowest. A web scraper can help us do this. We can write code that tracks some products on Amazon and scrapes their pricing data on a regular basis to find the best deal. Such a tool is called an Amazon data scraper or Amazon product scraper.

Building this Amazon scraping application will give you a foundation that you can extend to scrape whatever you want.

Puppeteer for Data Scraping

Web scraping in Node.js can be done with several modules, such as Cheerio (an HTML parser, typically paired with an HTTP client like Axios), but Puppeteer is one of the most popular choices.

Puppeteer is a Node.js library that provides a high-level API to control Chromium-based browsers. It is well suited to browser automation tasks such as web scraping, testing, and performance monitoring. With Puppeteer, we can extract data from websites without much hassle: it opens the web page, waits for the content to load, and then extracts the specified data.

Installation:

Since Puppeteer is not a built-in module in Node.js, we have to install it manually using NPM or Yarn.

npm install puppeteer

// or

yarn add puppeteer

Creating an Amazon Product Scraper Using Puppeteer

Now, let’s get into the actual coding of our web scraping tool.

Step 1: Setting Up the Environment

Let’s first create the required files and folder structure and install the necessary modules.

Folder Structure:

Create a project folder named “Amazon Web Scraper”. Inside this folder, create a file named “app.js”, where we will write the scraping code.

Code Editor:

Open the project in a code editor. This article uses VS Code, but you can use any editor, such as Atom or Sublime Text.

Check Node.js:

Open the terminal and type the following command to make sure that Node.js is installed on your system.

node -v

If it prints a version number, Node.js is installed. If not, install it from the official Node.js website.

Initiate NPM:

NPM stands for Node Package Manager. We use it to install Node.js modules.

npm init -y

This command initializes NPM for our project by creating a package.json file with default values.

Install Puppeteer:

npm install puppeteer

This command will install the puppeteer module inside our project.

Step 2: Writing Code for Amazon Price Scraper

With the setup out of the way, let’s get right into the steps to build your own scraper that extracts information from an Amazon page.

Import Puppeteer:

Open the app.js file and import the puppeteer module using the following statement:

const puppeteer = require('puppeteer');

Creating a Function To Get Full Date:

Let’s create a function that returns the current year, month, date, and time.

function getData() {
	let date = new Date();
	// getMonth() is zero-based (January is 0), so add 1 to get the calendar month.
	let fullDate = date.getFullYear() + "-" + (date.getMonth() + 1) + "-" + date.getDate() + " " + date.getHours() + ":" + date.getMinutes() + ":" + date.getSeconds();
	return fullDate;
}
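One detail worth knowing here: `getMonth()` counts months from 0 (January is 0), so a timestamp helper has to add 1 to get the calendar month. The sketch below is an optional variant that also zero-pads each field so that the timestamps sort correctly as text. The name `formatTimestamp` is hypothetical and is not used elsewhere in this tutorial.

```javascript
// Hypothetical helper: formats a Date as "YYYY-MM-DD HH:MM:SS" in local time.
function formatTimestamp(date = new Date()) {
	// Pad single-digit values with a leading zero, e.g. 9 -> "09".
	const pad = (n) => String(n).padStart(2, '0');
	const day = [date.getFullYear(), pad(date.getMonth() + 1), pad(date.getDate())].join('-');
	const time = [date.getHours(), date.getMinutes(), date.getSeconds()].map(pad).join(':');
	return day + ' ' + time;
}

// Month index 8 is September.
console.log(formatTimestamp(new Date(2022, 8, 6, 20, 9, 37))); // 2022-09-06 20:09:37
```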

Creating a Web Scraper Function:

Let’s create a webScraper function with a url parameter, where url is the address of the page we want to scrape. The function must be declared with the async keyword because we will use await inside it.

async function webScraper(url) {
	
};

Invoking Objects:

Inside the function, create a browser object by calling the launch method of the puppeteer module, which starts a browser instance.

const browser = await puppeteer.launch({})

Create an object page and set it to open a new page using the newPage method of the browser object.

const page = await browser.newPage()

Open the given URL, using the goto method of the page object.

await page.goto(url)
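The goto method also accepts an options object. As a minimal sketch, the hypothetical helper `openPage` below passes a `waitUntil` condition and a `timeout` in milliseconds, both of which are Puppeteer navigation options. The specific values are assumptions to tune for your own connection. The helper takes the page object created in the previous step as a parameter.

```javascript
// Hypothetical helper: navigates an existing Puppeteer page to a URL.
// The option values are illustrative assumptions, not required settings.
async function openPage(page, url) {
	return page.goto(url, {
		waitUntil: 'domcontentloaded', // resolve once the HTML has been parsed
		timeout: 60000                 // give up after 60 seconds
	});
}

// Usage inside webScraper, in place of the plain goto call:
// await openPage(page, url);
```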

Pulling Data from Amazon:

On the product page, Amazon uses the ID productTitle for the product’s title and the class a-price-whole for the whole-number part of the price.

Let’s create a product variable using the waitForSelector method, which waits until an element matching the selector appears on the page and returns a handle to it. Here we select the element with the ID productTitle. Then we extract the text content of that element and store it in another variable so that we can print it:

var product = await page.waitForSelector("#productTitle")
var productText = await page.evaluate(product => product.textContent, product)

Do the same to fetch the price of the product:

var price = await page.waitForSelector(".a-price-whole")
var priceText = await page.evaluate(price => price.textContent, price)
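The title and price lookups follow the same two-step pattern, so they could be folded into one function. The sketch below assumes a hypothetical helper named `getText`. It waits for the selector, reads the element's text, and trims the surrounding whitespace that product pages often include.

```javascript
// Hypothetical helper: waits for `selector` on a Puppeteer page and
// returns the matched element's trimmed text content.
async function getText(page, selector) {
	// waitForSelector resolves with a handle to the first matching element
	// and throws if nothing appears before the timeout.
	const handle = await page.waitForSelector(selector);
	const text = await page.evaluate((el) => el.textContent, handle);
	return text.trim();
}

// Usage inside webScraper:
// const productText = await getText(page, '#productTitle');
// const priceText = await getText(page, '.a-price-whole');
```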

Then log the current date, using the getData function we created earlier, followed by productText and priceText.

console.log("Date: " + getData() + "Product: " + productText + "Price: " + priceText)

Then close the browser that we opened earlier. The close method returns a promise, so we await it:

await browser.close()
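If waitForSelector times out or the page fails to load, the function throws before it reaches the close call, and the browser process stays open. One way to guard against this, sketched here with a hypothetical wrapper named `withBrowser`, is to close the browser in a `finally` block. The `launch` parameter is any function that returns a browser, for example `() => puppeteer.launch()`.

```javascript
// Hypothetical wrapper: launches a browser, runs `task` with it,
// and always closes the browser, even when `task` throws.
async function withBrowser(launch, task) {
	const browser = await launch();
	try {
		return await task(browser);
	} finally {
		await browser.close();
	}
}

// Usage with Puppeteer (assumes `puppeteer` is already imported):
// const title = await withBrowser(() => puppeteer.launch(), async (browser) => {
// 	const page = await browser.newPage();
// 	await page.goto(url);
// 	const el = await page.waitForSelector('#productTitle');
// 	return page.evaluate((node) => node.textContent, el);
// });
```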

Finally, call the webScraper function and pass it the URL of the Amazon product you want to scrape.

webScraper('https://www.amazon.in/dp/B09W9MBS1G/ref=cm_sw_r_apa_i_NWPQ1TXATPCD3XBZ0P7W_0');
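The sample URL carries a `ref=` tracking suffix after the product ID. As an optional sketch, the hypothetical `canonicalProductUrl` helper below keeps only the `/dp/<ID>` part. It assumes the ID is the 10-character code that follows `/dp/` in the path, as it is in the URL above.

```javascript
// Hypothetical helper: reduces a product URL to "<origin>/dp/<ID>".
// Returns null when the path has no /dp/<10-character ID> segment.
// Note: new URL() throws if rawUrl is not a valid absolute URL.
function canonicalProductUrl(rawUrl) {
	const url = new URL(rawUrl);
	const match = url.pathname.match(/\/dp\/([A-Z0-9]{10})(?:\/|$)/);
	return match ? url.origin + '/dp/' + match[1] : null;
}

console.log(canonicalProductUrl('https://www.amazon.in/dp/B09W9MBS1G/ref=cm_sw_r_apa_i_NWPQ1TXATPCD3XBZ0P7W_0'));
// https://www.amazon.in/dp/B09W9MBS1G
```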

Step 3: Running the Scraper

The application code lives in a single file, “app.js”. Run it by executing the following command in the terminal:

node app.js

Output:

Date: 2022-9-6 20:9:37    Product: ASUS Vivobook 15, 15.6-inch (39.62 cms) FHD, AMD Ryzen 7 3700U, Thin and Light Laptop (16GB/512GB SSD/Integrated Graphics/Windows 11/Office 2021/Silver/1.8 kg), M515DA-BQ722WS       Price: 50,799.

See how cleverly our application is fetching data.
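Note that the price arrives as text with a thousands separator and a trailing period (`50,799.`). To compare prices over time, it helps to turn that text into a number. A minimal sketch, assuming a hypothetical helper named `parsePrice` and a price with no fractional part, which is what the a-price-whole element holds:

```javascript
// Hypothetical helper: converts text such as "50,799." to the number 50799.
// Returns null if the text contains no digits.
function parsePrice(priceText) {
	const digits = priceText.replace(/\D/g, ''); // drop everything except 0-9
	return digits === '' ? null : Number(digits);
}

console.log(parsePrice('50,799.')); // 50799
```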

The node app.js command does not pick up code changes automatically, so after editing the code you need to run the command again (press Ctrl + C first if the script is still running). We have an article on nodemon that solves this problem: Update Code Without Restarting Node Server

Step 4: Integrating the Database

You can now store the data in a regular array or even easily integrate databases like MongoDB and MySQL with your Node.js scraper to analyze the price changes of a product.
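As a starting point before adding a database, the sketch below keeps readings in a plain array and reports the lowest price seen so far. The names `recordPrice` and `lowestEntry`, the shape of each entry, and the sample readings are assumptions for illustration. Prices are expected to be numbers rather than the raw scraped text.

```javascript
// In-memory price history; each entry is { date, product, price }.
const history = [];

// Hypothetical helper: appends one reading to the history array.
function recordPrice(entry) {
	history.push(entry);
}

// Hypothetical helper: returns the entry with the lowest price, or null if empty.
function lowestEntry(entries) {
	if (entries.length === 0) return null;
	return entries.reduce((best, e) => (e.price < best.price ? e : best));
}

// Sample readings, made up for illustration only.
recordPrice({ date: '2022-09-06', product: 'Laptop', price: 50799 });
recordPrice({ date: '2022-09-07', product: 'Laptop', price: 49990 });
recordPrice({ date: '2022-09-08', product: 'Laptop', price: 51200 });
console.log(lowestEntry(history).date); // 2022-09-07
```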

We have separate tutorials for integrating both MongoDB and MySQL databases.

Step 5: Hosting Your Node.js Web Scraper

Now if you want to keep the scraper running continually then you can host it on a cloud server.
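Running continually means calling the scraper on a schedule. Here is a minimal sketch using only timers, with a hypothetical `runEvery` helper. It runs the task once right away, waits for it to finish, and only then schedules the next run, so two runs never overlap. The one-hour interval in the usage comment is an assumption.

```javascript
// Hypothetical helper: runs an async `task` repeatedly, waiting `intervalMs`
// between the end of one run and the start of the next.
// Returns a function that stops the schedule.
function runEvery(task, intervalMs) {
	let timer = null;
	let stopped = false;

	async function tick() {
		try {
			await task();
		} catch (err) {
			// Log the failure and keep the schedule alive.
			console.error('Scheduled task failed:', err.message);
		}
		if (!stopped) {
			timer = setTimeout(tick, intervalMs);
		}
	}

	tick();
	return function stop() {
		stopped = true;
		clearTimeout(timer);
	};
}

// Usage (assumes webScraper and a product url are defined):
// const stop = runEvery(() => webScraper(url), 60 * 60 * 1000);
```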

We have a separate tutorial on free Node.js hosting platforms if you want to give it a try.

Complete Node.js Code for Scraping Amazon Product Prices

const puppeteer = require('puppeteer');

// Returns the current local date and time as a string.
function getData() {
	let date = new Date();
	// getMonth() is zero-based (January is 0), so add 1 to get the calendar month.
	let fullDate = date.getFullYear() + "-" + (date.getMonth() + 1) + "-" + date.getDate() + " " + date.getHours() + ":" + date.getMinutes() + ":" + date.getSeconds();
	return fullDate;
}

// Opens the product page and logs its title and price.
async function webScraper(url) {
	const browser = await puppeteer.launch({})
	const page = await browser.newPage()

	await page.goto(url)
	// Wait for each element to appear, then read its text content.
	var product = await page.waitForSelector("#productTitle")
	var productText = await page.evaluate(product => product.textContent, product)
	var price = await page.waitForSelector(".a-price-whole")
	var priceText = await page.evaluate(price => price.textContent, price)
	console.log("Date: " + getData() + "Product: " + productText + "Price: " + priceText)
	// Wait for the browser to shut down before the function returns.
	await browser.close()
}

webScraper('https://www.amazon.in/dp/B09W9MBS1G/ref=cm_sw_r_apa_i_NWPQ1TXATPCD3XBZ0P7W_0');

Conclusion

In short, if you are building an e-commerce platform and want to stay competitive, you have to keep an eye on your competitors’ data, which you can collect using a web scraper. A web scraping tool enables us to gather publicly available data. In this tutorial, we have seen how web scrapers are made by creating a basic Amazon scraper that takes a URL as an argument and scrapes product data from the Amazon website.

Aditya Gupta