In this tutorial we'll create a simple web scraper using Node.js, axios, and cheerio. By definition, web scraping means extracting useful information from web pages. Node.js is a great tool for the job: it lets you implement scraping routines that remove the hassle of browsing pages manually, run automatically, and gather and classify the information you're interested in programmatically. We'll scrape a page that contains a list of news items and use Node's built-in fs module to save the scraped data into a JSON file.

First, create a new folder for the project and name it as you wish. Inside the project folder, create a package.json file to record the project dependencies, then install the axios and cheerio libraries.

In the root directory of the project folder, create an empty JavaScript file for the scraper; I've called mine scraper.js. At the top of the file, import the dependencies. Next, let's check that we can reach the URL by making a simple GET request with axios, passing in the URL and logging the result. Notice that we log response.data rather than response: axios resolves with a response object carrying several properties, and data is the one that contains our HTML payload.

To test the script, open a terminal at the root directory of the project folder, type `node scraper.js`, and hit enter to execute it. If successful, you should see a bunch of HTML logged to the terminal.

Now that we've got HTML being returned, we can go through the page and extract the data we want. First, use the Chrome DevTools to inspect the page contents and target the desired elements (right-click the page and choose Inspect). Inspecting the page, we can see that the articles are nested under a container with a class of 'news-articles'. Here we can use the cheerio library: loop through each article, get its contents, and save the results to a JSON file.
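The setup step described above might produce a package.json like the one below. The name, description, and version ranges are placeholders and assumptions, not taken from the original article; any recent release of either library should work. You can generate the file with `npm init -y` and then run `npm install axios cheerio`.

```json
{
  "name": "web-scraper",
  "version": "1.0.0",
  "description": "Simple news scraper using axios and cheerio",
  "main": "scraper.js",
  "dependencies": {
    "axios": "^1.6.0",
    "cheerio": "^1.0.0"
  }
}
```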