How to Convert HTML to PDF in Node.js using Puppeteer

Published May 16, 2023

Converting web pages or HTML documents into PDF files is a common requirement, for example when generating reports, creating invoices, or sharing content in a fixed, printable format. In this blog post, we will explore how to convert HTML pages to PDF using Node.js and Puppeteer, an open-source library developed by Google.

Introduction to Puppeteer

Puppeteer is a Node.js library for controlling browsers, mainly Google Chrome or Chromium, which it runs in headless mode by default. Developers use it for web scraping, taking screenshots, and generating PDFs, and its extensive API for interacting with pages makes it a good fit for converting HTML to PDF.

Why Puppeteer?

  • Ease of use: Puppeteer offers a simple API that abstracts away the complexities of working with headless browsers.
  • Powerful: Puppeteer provides extensive capabilities for manipulating web pages and interacting with browser elements.
  • Scalable: With Puppeteer, you can easily scale your PDF generation process by running multiple browser instances in parallel.
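To make the scalability point concrete, here is a minimal sketch of capping how many conversions run at the same time. The runWithConcurrency helper is hypothetical and not part of Puppeteer; each task is assumed to be an async function that, in practice, would call something like the exportWebsiteAsPdf function defined later in this post.

```javascript
// Hypothetical helper (not a Puppeteer API): runs async task functions
// with at most `limit` of them in flight at any moment.
async function runWithConcurrency(tasks, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started

  // Each worker keeps claiming the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = [];
  for (let i = 0; i < Math.min(limit, tasks.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results; // same order as `tasks`, regardless of finish order
}
```

For example, runWithConcurrency(urls.map((url, i) => () => exportWebsiteAsPdf(url, `page-${i}.pdf`)), 3) would keep at most three browsers open at once, since each call launches its own browser. The right limit depends on the memory and CPU available.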

Setting Up Your Node.js Project

Before we begin, you'll need to set up a new Node.js project. Follow these steps to get started:

  1. Install Node.js if you haven't already (you can download it from nodejs.org).
  2. Create a new folder for your project and open it in Visual Studio Code or any other code editor.
  3. Run npm init to create a new package.json file for your project. Follow the prompts and fill in the required information.

    How to Convert HTML to PDF in Node.js: Figure 1

  4. Install Puppeteer by running npm install puppeteer.

    How to Convert HTML to PDF in Node.js: Figure 2

Now that we have our project set up, let's dive into the code.

Loading an HTML Template and Converting It to a PDF File

To convert an HTML template to a PDF file using Puppeteer, follow these steps:

Create a file named "HTML To PDF.js" in the project folder.

Importing Puppeteer and fs

    const puppeteer = require('puppeteer');
    const fs = require('fs');

The code starts by importing two modules: puppeteer, which controls headless browsers such as Chrome and Chromium, and fs, the built-in Node.js module for file system operations. Puppeteer lets you automate a wide range of browser tasks, including rendering HTML, capturing screenshots, and generating PDF files.

Defining the exportWebsiteAsPdf Function

    async function exportWebsiteAsPdf(html, outputPath) {
      // Create a browser instance
      // ('new' opts into Chrome's new headless mode in Puppeteer v19 to v21;
      // later versions enable it with headless: true)
      const browser = await puppeteer.launch({
        headless: 'new'
      });

      try {
        // Create a new page
        const page = await browser.newPage();

        // Load the HTML string as the page content
        await page.setContent(html, { waitUntil: 'domcontentloaded' });

        // Apply the CSS used for screens instead of print styles
        await page.emulateMediaType('screen');

        // Write the PDF to outputPath and keep the returned bytes
        const pdf = await page.pdf({
          path: outputPath,
          margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
          printBackground: true,
          format: 'A4',
        });

        return pdf;
      } finally {
        // Close the browser even if one of the steps above throws
        await browser.close();
      }
    }

The exportWebsiteAsPdf function is the core of the script. This asynchronous function takes an html string and an outputPath as input parameters, writes the PDF to outputPath, and returns the PDF data. The function performs the following steps:

  1. Launches a new headless browser instance using Puppeteer.
  2. Creates a new browser page.
  3. Sets the provided html string as the page content and waits for the DOMContentLoaded event. Because the template is passed in as a string, it does not need to be served from a URL. Note that this event does not wait for images or web fonts to finish loading.
  4. Emulates the 'screen' media type to apply the CSS used for screens instead of print-specific styles.
  5. Generates a PDF file from the loaded HTML content, specifying margins, background printing, and format (A4).
  6. Closes the browser instance.
  7. Returns the generated PDF data.
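Step 5 hard-codes the page.pdf() options. If different documents need different layouts, one option is to build the options object from defaults plus per-call overrides. The buildPdfOptions helper below is a hypothetical sketch, not part of Puppeteer; path, format, margin, printBackground, and landscape are standard page.pdf() options, and the defaults simply mirror the values used above.

```javascript
// Hypothetical defaults mirroring the margins used in exportWebsiteAsPdf.
const DEFAULT_MARGIN = { top: '100px', right: '50px', bottom: '100px', left: '50px' };

// Hypothetical helper: returns an options object for page.pdf(),
// applying A4 / printBackground defaults plus any overrides.
function buildPdfOptions(outputPath, overrides = {}) {
  const { margin = {}, ...rest } = overrides;
  return {
    path: outputPath,
    format: 'A4',
    printBackground: true,
    ...rest,
    // Merge margins side by side, so overriding one side keeps the other defaults.
    margin: { ...DEFAULT_MARGIN, ...margin },
  };
}
```

For example, page.pdf(buildPdfOptions(outputPath, { landscape: true, margin: { top: '40px' } })) would switch to landscape and shrink only the top margin.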

Using the exportWebsiteAsPdf Function


    // Usage example
    // Read the HTML template from a file
    const html = fs.readFileSync('test.html', 'utf-8');

    exportWebsiteAsPdf(html, 'result.pdf').then(() => {
      console.log('PDF created successfully.');
    }).catch((error) => {
      console.error('Error creating PDF:', error);
    });

The last section of the code illustrates how to use the exportWebsiteAsPdf function. We perform the following steps:

  1. Read the HTML template from test.html using the fs module's readFileSync method.
  2. Call the exportWebsiteAsPdf function with the loaded html string and the desired outputPath.
  3. Utilize a .then block to handle the successful PDF creation, logging a success message to the console.
  4. Employ a .catch block to manage any errors that occur during the HTML to PDF conversion process, logging an error message to the console.
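If test.html is a template rather than a finished page, you will usually fill in data before converting it. The sketch below assumes a simple {{name}} placeholder syntax, which is our own convention and not something Puppeteer understands; renderTemplate and escapeHtml are hypothetical helpers. Escaping matters because the values end up inside HTML that the browser renders.

```javascript
// Hypothetical helper: escapes characters that have special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: replaces {{ key }} placeholders with escaped values.
// Unknown placeholders are left untouched so missing data is easy to spot.
function renderTemplate(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : match
  );
}
```

You could then call exportWebsiteAsPdf(renderTemplate(fs.readFileSync('test.html', 'utf-8'), { number: 42 }), 'result.pdf').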

This example shows the complete flow for converting an HTML template to a PDF file with Node.js and Puppeteer, and you can adapt it to generate reports, invoices, and other documents.

How to Convert HTML to PDF in Node.js: Figure 3
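Before handing a generated file to users or another service, a cheap sanity check is to confirm that it starts with the %PDF- header that PDF files begin with. The looksLikePdf and fileLooksLikePdf helpers below are hypothetical; they accept the bytes returned by page.pdf() (a Buffer or a Uint8Array, depending on the Puppeteer version) or read the file written to outputPath.

```javascript
const fs = require('fs');

// Hypothetical check: does the byte sequence start with the ASCII header "%PDF-"?
function looksLikePdf(bytes) {
  const header = Buffer.from('%PDF-', 'ascii');
  if (!bytes || bytes.length < header.length) return false;
  // subarray works on both Buffer and Uint8Array; Buffer.from gives us .equals()
  return Buffer.from(bytes.subarray(0, header.length)).equals(header);
}

// Hypothetical convenience wrapper for a file already written to disk.
function fileLooksLikePdf(filePath) {
  return looksLikePdf(fs.readFileSync(filePath));
}
```

For example, after exportWebsiteAsPdf resolves you could throw an error if fileLooksLikePdf('result.pdf') returns false. This only checks the header, not whether the rest of the document is valid.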

Converting URLs to PDF Files

In addition to converting HTML templates, Puppeteer also allows you to convert URLs directly into PDF files.

Importing Puppeteer


    const puppeteer = require('puppeteer');

As before, the script starts by importing Puppeteer, which this time loads a live web page in a headless browser and saves it as a PDF. The fs module is not needed here, because Puppeteer writes the file itself through the path option.

Defining the exportWebsiteAsPdf Function


    async function exportWebsiteAsPdf(websiteUrl, outputPath) {
      // Create a browser instance
      // ('new' opts into Chrome's new headless mode in Puppeteer v19 to v21;
      // later versions enable it with headless: true)
      const browser = await puppeteer.launch({
        headless: 'new'
      });

      try {
        // Create a new page
        const page = await browser.newPage();

        // Open the URL and wait until the network has been idle for 500 ms
        await page.goto(websiteUrl, { waitUntil: 'networkidle0' });

        // Apply the CSS used for screens instead of print styles
        await page.emulateMediaType('screen');

        // Write the PDF to outputPath and keep the returned bytes
        const pdf = await page.pdf({
          path: outputPath,
          margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
          printBackground: true,
          format: 'A4',
        });

        return pdf;
      } finally {
        // Close the browser even if navigation or PDF generation fails
        await browser.close();
      }
    }

This version of exportWebsiteAsPdf takes a websiteUrl and an outputPath as input parameters, writes the PDF to outputPath, and returns the PDF data. The function performs the following steps:

  1. Launches a new headless browser instance using Puppeteer.
  2. Creates a new browser page.
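  3. Navigates to the provided websiteUrl. With the waitUntil option set to networkidle0, navigation is considered finished only once there have been no network connections for at least 500 ms, so resources loaded after the initial page load are included.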
  4. Emulates the 'screen' media type to ensure the CSS used for screens is applied instead of print-specific styles.
  5. Converts the loaded web page to a PDF file with the specified margins, background printing, and format (A4).
  6. Closes the browser instance.
  7. Returns the generated PDF data.
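When websiteUrl comes from user input, it may be worth validating it before calling page.goto(). The sketch below uses a hypothetical parseHttpUrl helper built on Node's standard URL class and accepts only absolute http and https URLs; rejecting other schemes such as file: stops the browser from being pointed at local files.

```javascript
// Hypothetical guard: returns the normalized URL, or throws if the input is
// not an absolute http(s) URL.
function parseHttpUrl(websiteUrl) {
  let url;
  try {
    url = new URL(websiteUrl); // throws for relative or malformed input
  } catch {
    throw new Error(`Not a valid absolute URL: ${websiteUrl}`);
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol "${url.protocol}" in ${websiteUrl}`);
  }
  return url.href;
}
```

Inside exportWebsiteAsPdf, the call would become page.goto(parseHttpUrl(websiteUrl), { waitUntil: 'networkidle0' }).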

Using the exportWebsiteAsPdf Function


    // Usage example
    exportWebsiteAsPdf('https://ironpdf.com/', 'result.pdf').then(() => {
      console.log('PDF created successfully.');
    }).catch((error) => {
      console.error('Error creating PDF:', error);
    });

The final section of the code demonstrates how to use the exportWebsiteAsPdf function. We execute the following steps:

  1. Call the exportWebsiteAsPdf function with the desired websiteUrl and outputPath.
  2. Use a .then block to handle successful PDF creation by logging a success message to the console.
  3. Use a .catch block to handle any errors that occur during the website-to-PDF conversion by logging an error message to the console.
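To convert several URLs in one run, each one needs its own output path. The hypothetical pdfFileNameForUrl helper below derives a file-system-safe name from the URL's host name and path; the example URLs are placeholders.

```javascript
// Hypothetical helper: turns a page URL into a file-system-safe PDF file name.
function pdfFileNameForUrl(websiteUrl) {
  const { hostname, pathname } = new URL(websiteUrl);
  const slug = `${hostname}${pathname}`
    .replace(/[^a-zA-Z0-9]+/g, '-') // collapse runs of unsafe characters into one dash
    .replace(/^-+|-+$/g, '');       // trim leading and trailing dashes
  return `${slug || 'page'}.pdf`;
}

// Placeholder URLs; each job pairs a URL with the file it should be written to.
const urls = ['https://ironpdf.com/', 'https://example.com/reports/2023'];
const jobs = urls.map((url) => ({ url, outputPath: pdfFileNameForUrl(url) }));
```

Each job can then be passed to exportWebsiteAsPdf(job.url, job.outputPath), for example one after another with await in a for...of loop. Query strings are ignored, so two URLs that differ only in their query would map to the same file name.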

By integrating this code snippet into your projects, you can convert URLs into PDF files using Node.js and Puppeteer.

How to Convert HTML to PDF in Node.js: Figure 4

Best HTML To PDF Library for C# Developers

IronPDF is a popular .NET library used for generating, editing, and extracting content from PDF files. It provides a simple and efficient solution for creating PDFs from HTML, text, images, and existing PDF documents. IronPDF supports .NET Core, .NET Framework, and .NET 5.0+ projects, making it a versatile choice for various applications.

Key Features of IronPDF

HTML to PDF Conversion: IronPDF converts HTML content, including CSS, to PDF files. This feature enables you to create pixel-perfect PDF documents from web pages or HTML templates.

URL Rendering: IronPDF can fetch web pages directly from a server using a URL and convert them to PDF files, making it easy to archive web content or generate reports from dynamic web pages.

Text, Image, and PDF Merging: IronPDF allows you to merge text, images, and existing PDF files into a single PDF document. This feature is particularly useful for creating complex documents with multiple sources of content.

PDF Manipulation: IronPDF provides tools for editing existing PDF files, such as adding or removing pages, modifying metadata, or even extracting text and images from PDF documents.

Conclusion

Generating and manipulating PDF files is a common requirement in many applications, and having the right tools at your disposal matters. The solutions discussed in this article, Puppeteer with Node.js and IronPDF with .NET, both offer efficient ways to convert HTML content and URLs into professional, high-quality PDF documents.

IronPDF, in particular, stands out with its extensive feature set, making it a top choice for .NET developers. IronPDF offers a free trial, allowing you to explore its capabilities.

Users can also benefit from Iron Suite, a suite of five professional .NET libraries including IronXL, IronPDF, IronOCR, and more.
