How to do Web Scraping using Node.js?


Businesses use data to forecast trends and recommend relevant products to consumers. Investors study past market data before deciding where to put their money. In programming, we compare data to judge which framework is the most reliable for a given application. Because this data is scattered across the internet, we use web scrapers to gather it in one place.

Web Scraping is a technique for collecting data from the Internet and putting it all together in a certain format so that we can use it later. Let’s learn more about web scraping by building a web scraper.


Amazon Price Scraper using NodeJS

On Amazon, product prices change frequently. To get a good deal, you need to buy when the price is low. This article shows how to build a web scraper that fetches a product’s title and current price from Amazon. Running it repeatedly lets you watch for the product’s lowest and highest price.

Web scraping in Node.js can be done with several modules, and Puppeteer is one of the most popular. We can install Puppeteer with NPM and then import it in our script.

Setting up the environment

Before we get started with the code, let’s first set up the required files and folder structure and install the modules.

  • Folder Structure

Create a project folder named “Amazon Web Scraper”. Inside this folder, create a file named app.js, where we will write the scraping code.

  • Code Editor

Open the project in a code editor. This article uses VS Code, but any editor, such as Atom or Sublime Text, will work.

  • Check Node.js

Open the terminal and type the following command to make sure that Node.js is installed on your system.

node -v

If it returns a version number, Node.js is installed. If not, install it from the official Node.js website.

  • Initialize NPM

NPM stands for Node Package Manager. It lets us install Node.js modules into our project.

Syntax:

npm init -y

This command initializes NPM for our project by creating a package.json file with default values.

  • Install Puppeteer 

Puppeteer is a popular Node.js module for web scraping. It drives a Chrome browser from code, and it is easy to get started with.

Syntax:

npm install puppeteer

This command will install the puppeteer module inside our project.
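Once the commands above have run, a quick check from Node.js itself can confirm the setup. The following is only a minimal sketch: the helper name isModuleInstalled is made up for this illustration and is not part of Node.js or Puppeteer.

```javascript
// Hypothetical helper (not part of Node.js or Puppeteer): reports whether
// a module name can be resolved from the current project folder.
function isModuleInstalled(name) {
	try {
		require.resolve(name);
		return true;
	} catch (err) {
		return false;
	}
}

console.log("Node.js version: " + process.version);
console.log("puppeteer installed: " + isModuleInstalled("puppeteer"));
```

Saving this in a scratch file inside the project folder and running it with node should report whether the puppeteer module can be found.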

Creating the Amazon Price Scraper Using NodeJS

With the setup out of the way, let’s get right into the steps to create the Amazon web scraper.

1. Import Puppeteer

Open the app.js file and import the puppeteer module with the following statement:

const puppeteer = require('puppeteer');

2. Create a Function To Return the Full Date

Let’s create a function that returns the current year, month, date, and time.

function getData() {
	let date = new Date();
	// getMonth() is zero-based (January is 0), so add 1 for the calendar month.
	let fullDate = date.getFullYear() + "-" + (date.getMonth() + 1) + "-" + date.getDate() + " " + date.getHours() + ":" + date.getMinutes() + ":" + date.getSeconds();
	return fullDate;
}
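Two details of JavaScript dates are easy to miss here: getMonth() counts months from 0 (January) to 11 (December), and none of the getters pad single digits, so a time can print as 20:9:37. The sketch below shows one possible way to handle both. The name formatTimestamp is illustrative and is not the getData function used in the rest of this article.

```javascript
// Sketch of a zero-padded timestamp formatter. The name formatTimestamp
// is illustrative; it is not the article's getData function.
function formatTimestamp(date) {
	// Pad single-digit values with a leading zero, e.g. 9 -> "09".
	const pad = (n) => String(n).padStart(2, "0");
	const day = date.getFullYear() + "-" + pad(date.getMonth() + 1) + "-" + pad(date.getDate());
	const time = pad(date.getHours()) + ":" + pad(date.getMinutes()) + ":" + pad(date.getSeconds());
	return day + " " + time;
}

// Month index 8 is September, because JavaScript months start at 0.
console.log(formatTimestamp(new Date(2022, 8, 6, 20, 9, 37)));
// prints "2022-09-06 20:09:37"
```

Fixed-width timestamps like this also sort correctly as plain text, which helps if the readings are later saved to a log file.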

3. The Webscraper Function

Let’s create a webScraper function that takes one argument, url, which is the address of the page we want to scrape data from. The function must be declared with async because we will use await inside it.

async function webScraper(url) {
	
};
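To see what async and await do, here is a small sketch that does not use Puppeteer at all. The names delay and fetchLabel are hypothetical stand-ins for slow operations such as loading a web page.

```javascript
// Hypothetical stand-in for a slow operation: resolves with `value`
// after `ms` milliseconds.
function delay(ms, value) {
	return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// An async function always returns a Promise. await pauses this function
// (not the whole program) until the awaited Promise settles.
async function fetchLabel() {
	const label = await delay(10, "done");
	return label.toUpperCase();
}

fetchLabel().then((result) => console.log(result)); // prints "DONE"
```

Every Puppeteer call used in the following steps returns a Promise in the same way, which is why each one is prefixed with await.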

4. Launching the Browser and Opening the Page

Create a browser object by launching Puppeteer with the launch method of the puppeteer module. Add this line, and the ones that follow, inside the webScraper function.

const browser = await puppeteer.launch({})

Create a page object by opening a new page (tab) with the newPage method of the browser object.

const page = await browser.newPage()

Open the given URL using the goto method of the page object.

await page.goto(url)
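If page.goto fails, for example on a network timeout, the browser process launched above would keep running. One way to guard against that is a try/finally block. The sketch below is an assumption about how you might structure this, not part of the scraper built in this article: the name withPage and its work callback are hypothetical, and the Puppeteer module is passed in as an argument so the helper can also be exercised with a stand-in object.

```javascript
// Hypothetical helper: opens `url`, hands the page to `work`, and always
// closes the browser afterwards, even when goto or `work` throws.
// `puppeteerModule` is whatever require('puppeteer') returned.
async function withPage(puppeteerModule, url, work) {
	const browser = await puppeteerModule.launch();
	try {
		const page = await browser.newPage();
		// waitUntil and timeout are standard page.goto options.
		await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30000 });
		return await work(page);
	} finally {
		await browser.close();
	}
}
```

With Puppeteer installed, a call could look like withPage(require('puppeteer'), url, (page) => page.title()), which resolves to the page title after the browser has been closed.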

5. Pulling Data from Amazon

At the time of writing, Amazon marks the product title with the ID productTitle and the price with the class a-price-whole.

Let’s create a product variable that holds the element with the ID productTitle. The waitForSelector method waits until a matching element appears on the page and then returns a handle to it. We then extract the element’s text content into a second variable so that we can print it:

var product = await page.waitForSelector("#productTitle")
var productText = await page.evaluate(product => product.textContent, product)

Do the same to fetch the product’s price:

var price = await page.waitForSelector(".a-price-whole")
var priceText = await page.evaluate(price => price.textContent, price)
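waitForSelector rejects with a timeout error if the element never appears, which can happen when Amazon changes its markup or serves a different page than expected. The following is a hedged sketch of a helper that returns null instead of throwing. The name readText is hypothetical, and the 10-second timeout is an arbitrary choice.

```javascript
// Hypothetical helper: returns the trimmed text of the first element that
// matches `selector`, or null if it does not appear within `timeoutMs`.
async function readText(page, selector, timeoutMs = 10000) {
	try {
		const handle = await page.waitForSelector(selector, { timeout: timeoutMs });
		const text = await page.evaluate((el) => el.textContent, handle);
		return text.trim();
	} catch (err) {
		// Note: this treats every failure, not only timeouts, as "not found".
		return null;
	}
}
```

Inside webScraper, a call such as await readText(page, ".a-price-whole") would then give either the price text or null, which the caller can check before printing.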

Then log the current date and time, using the getData function we created earlier, followed by productText and priceText:

console.log("Date: " + getData() + " Product: " + productText.trim() + " Price: " + priceText.trim())

Then close the browser we opened earlier with the following statement:

await browser.close()

Finally, outside the function, call webScraper and pass the URL of the Amazon product you want to scrape:

webScraper('https://www.amazon.in/dp/B09W9MBS1G/ref=cm_sw_r_apa_i_NWPQ1TXATPCD3XBZ0P7W_0');
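The priceText value logged in this step is display text (the sample output later in this article shows "50,799."), so it cannot be compared numerically as it is. A minimal sketch of a converter follows. The name parsePrice is hypothetical, and the sketch assumes that commas are grouping separators and the dot is the decimal point, which is not true for every locale.

```javascript
// Hypothetical helper: turns price text such as "50,799." into a number.
// Assumes "," is a grouping separator and "." is the decimal point.
function parsePrice(text) {
	const cleaned = text.replace(/,/g, "").replace(/[^0-9.]/g, "");
	const value = parseFloat(cleaned);
	return Number.isNaN(value) ? null : value;
}

console.log(parsePrice("50,799."));   // prints 50799
console.log(parsePrice("1,299.50"));  // prints 1299.5
console.log(parsePrice("N/A"));       // prints null
```

Returning null for unreadable text keeps a bad reading from silently turning into NaN in later comparisons.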

Complete Code for Scraping Amazon Product Prices using NodeJS

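const puppeteer = require('puppeteer');

// Returns the current local date and time as "year-month-day hours:minutes:seconds".
function getData() {
	let date = new Date();
	// getMonth() is zero-based (January is 0), so add 1 for the calendar month.
	let fullDate = date.getFullYear() + "-" + (date.getMonth() + 1) + "-" + date.getDate() + " " + date.getHours() + ":" + date.getMinutes() + ":" + date.getSeconds();
	return fullDate;
}

// Opens the given Amazon product page and logs its title and price.
async function webScraper(url) {
	const browser = await puppeteer.launch({})
	const page = await browser.newPage()

	await page.goto(url)
	// Wait for the title element, then read its text.
	var product = await page.waitForSelector("#productTitle")
	var productText = await page.evaluate(product => product.textContent, product)
	// Wait for the price element, then read its text.
	var price = await page.waitForSelector(".a-price-whole")
	var priceText = await page.evaluate(price => price.textContent, price)
	console.log("Date: " + getData() + " Product: " + productText.trim() + " Price: " + priceText.trim())
	// Wait for the browser to shut down before the function returns.
	await browser.close()
};

webScraper('https://www.amazon.in/dp/B09W9MBS1G/ref=cm_sw_r_apa_i_NWPQ1TXATPCD3XBZ0P7W_0');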

Output

Date: 2022-9-6 20:9:37    Product: ASUS Vivobook 15, 15.6-inch (39.62 cms) FHD, AMD Ryzen 7 3700U, Thin and Light Laptop (16GB/512GB SSD/Integrated Graphics/Windows 11/Office 2021/Silver/1.8 kg), M515DA-BQ722WS       Price: 50,799.
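A single reading like the one above says little about whether the price is a good deal. One way to work toward a product’s lowest and highest price is to fold each new reading into a running range. The sketch below is an assumption about how that could look, not part of the scraper itself: the name updatePriceRange and the sample numbers are made up for illustration, and the prices are assumed to have already been converted to numbers.

```javascript
// Hypothetical helper: returns a new { lowest, highest } range that
// includes `price`. Pass null as `range` for the first reading.
function updatePriceRange(range, price) {
	if (range === null) {
		return { lowest: price, highest: price };
	}
	return {
		lowest: Math.min(range.lowest, price),
		highest: Math.max(range.highest, price),
	};
}

// Made-up readings from three separate runs of the scraper.
const readings = [50799, 49990, 52490];
const range = readings.reduce(updatePriceRange, null);
console.log(range); // prints { lowest: 49990, highest: 52490 }
```

In practice the range would need to be saved between runs, for example in a file, since each run of the scraper starts with no memory of earlier prices.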

References

https://nodejs.org/en/

https://www.npmjs.com/package/puppeteer