

Introduction to programmatically controlling Chrome from Node.js

Puppeteer is a Node library that we can use to control a headless Chrome instance. We are basically using Chrome, but programmatically, from JavaScript.

Among other things, we can use it to create server-side rendered versions of single page apps. It does not unlock anything new, per se, but it abstracts away many of the nitty-gritty details we would otherwise have to deal with ourselves.

Since it spins up a new Chrome instance when it’s initialized, it might not be the most performant option. It’s the most precise way to automate testing with Chrome though, since it’s using the actual browser under the hood.

To be precise, it uses Chromium, the open source part of Chrome. This mostly means you don’t have the proprietary codecs that are licensed by Google and can’t be open sourced (MP3, AAC, H.264), and you don’t have the integration with Google services like crash reports, Google update and more. From a programmatic standpoint, it should all behave the same as Chrome (except for media playing, as noted).

Start by installing it using

    npm install puppeteer

This will download and bundle a recent version of Chromium. You can opt to make puppeteer run the local installation of Chrome you already have by installing puppeteer-core instead, which is useful in some special cases (see puppeteer vs puppeteer-core).

In a Node.js file, require it:

    const puppeteer = require('puppeteer')

Then we can use the launch() method to create a browser instance:

    (async () => {
      const browser = await puppeteer.launch()
    })()

Given a page object, get the page URL:

    await page.url()

PuppeteerCrawler provides a simple framework for parallel crawling of web pages. The URLs to crawl are fed either from a static list of URLs or from a dynamic queue of URLs, which enables recursive crawling of websites.

Since PuppeteerCrawler uses headless Chrome to download web pages and extract data, it is useful for crawling websites that require JavaScript to be executed. If the target website doesn’t need JavaScript, consider using CheerioCrawler, which downloads the pages using raw HTTP requests and is about 10x faster.

The source URLs are represented using Request objects that are fed from RequestList or RequestQueue instances provided by the PuppeteerCrawlerOptions.requestList or PuppeteerCrawlerOptions.requestQueue constructor options, respectively.
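The install, require, launch and page URL steps described in this article can be put together in one small function. This is a minimal sketch, not the article's own code: the helper name `getPageUrl` and the example URL are assumptions made for illustration, and the Puppeteer module is passed in as an argument so the function can be tried with or without a real browser.

```javascript
// Minimal sketch: launch a browser, open a page, navigate, and read the URL.
// `getPageUrl` is a hypothetical helper name, not part of Puppeteer itself.
async function getPageUrl(puppeteer, targetUrl) {
  // launch() starts a new headless browser instance
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(targetUrl);
    // page.url() returns the current URL of the page
    return page.url();
  } finally {
    // Always close the browser, even if navigation fails
    await browser.close();
  }
}

// Example usage (assumes `npm install puppeteer` has been run):
//   const puppeteer = require('puppeteer');
//   getPageUrl(puppeteer, 'https://example.com').then(console.log);
```

Closing the browser in a `finally` block matters here because each call spins up a separate Chrome process, which would otherwise be left running after an error.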
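The static-list variant of PuppeteerCrawler can be sketched in the same style. The text does not name the package that provides PuppeteerCrawler, so this example assumes the Apify SDK with its v1/v2-style API (`Apify.openRequestList`, `handlePageFunction`). Newer releases may use different option names. The helper name `crawlTitles` and the list name `'start-urls'` are illustrative only.

```javascript
// Sketch only: assumes the Apify SDK (`npm install apify`, v1/v2-style API).
// `crawlTitles` is a hypothetical helper; the SDK module is passed in as `Apify`.
async function crawlTitles(Apify, urls) {
  const titles = {};

  // Static list of URLs, handed to the crawler through the `requestList` option
  const requestList = await Apify.openRequestList('start-urls', urls);

  const crawler = new Apify.PuppeteerCrawler({
    requestList,
    // Called once per page with the Request object and the Puppeteer page
    handlePageFunction: async ({ request, page }) => {
      titles[request.url] = await page.title();
    },
  });

  // Resolves when every request in the list has been handled
  await crawler.run();
  return titles;
}

// Example usage (assumption: run inside Apify.main so the SDK sets up storage):
//   const Apify = require('apify');
//   Apify.main(async () => console.log(await crawlTitles(Apify, ['https://example.com'])));
```

For recursive crawling, a RequestQueue would be passed through the `requestQueue` option instead, so that newly discovered URLs can be added while the crawl is running.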
