How to Extract Images From PDF in Node.js

Kannapat Udonpant

January 14, 2025

Extracting images from PDFs is a common task for many developers, whether for file processing, data extraction, or creating document previews. In this article, we'll explore how to extract and save images from a PDF using IronPDF, a powerful PDF library originally built for .NET, and how it can be used in a Node.js environment via its NPM package.

How to extract images from PDF files using IronPDF Node.js

Set up a Node.js application.
Install IronPDF NPM packages.
Prepare a PDF for the extraction.
Extract images from the PDF file and save.

Prerequisites

If you haven't installed Node.js yet, download and install it from https://nodejs.org/.

Introducing the IronPDF NPM package

The IronPDF NPM package is a Node.js wrapper for the IronPDF library, originally designed for .NET environments. It allows developers to harness the powerful PDF manipulation capabilities of IronPDF in Node.js applications. This package is particularly useful for working with PDF documents, offering a range of features that can be useful in many real-world applications such as file processing, report generation, and more.

Key Features of IronPDF in Node.js

PDF Creation:
IronPDF can create PDFs from various sources, including HTML content, images, or even raw text. This feature is highly useful for web applications that need to generate reports, invoices, or any other document in PDF format.
IronPDF supports styling and formatting HTML content, making it a great choice for converting web pages into well-structured PDF documents.
PDF Editing:
IronPDF allows you to manipulate existing PDFs by adding text, images, or annotations, and modifying the layout. You can also merge multiple PDFs into one, split a large document into smaller parts, or even reorder pages within a PDF.
These features make it ideal for applications that need to dynamically modify PDFs, such as document management systems or applications that require automated document generation.
PDF Conversion:
One of the standout features of IronPDF is its ability to convert PDFs into various other formats. For example, it can convert PDF documents to images (PNG, JPEG), HTML, and Word formats.
This feature is particularly useful when you need to present a PDF's content in different formats or create image previews of PDFs for user interfaces.
Extracting Text and Images:
The Node.js package exposes the extractText() and extractRawImages() methods, which are used later in this article to pull text and embedded images out of a PDF. It also provides a method for rendering PDF pages as images (such as PNG or JPEG), which can be used as an indirect way of extracting content.
You can render each page of the PDF into an image, effectively capturing the visual representation of the document, and saving it for further use or display.
Rendering Pages as Images:
IronPDF can convert PDF pages into high-quality images. For example, you can convert a multipage PDF into a series of PNGs, one for each page. This is particularly useful when you need to display the pages as thumbnails or in an image-based format. It supports various image format types.
Security and Encryption:
IronPDF supports working with encrypted PDFs. It allows you to open, decrypt, and manipulate secured documents, which is essential for working with documents that require passwords or other forms of protection.
Cross-Platform Compatibility:
IronPDF is compatible with both Windows and Linux environments, making it a versatile tool for server-side applications. The Node.js wrapper simplifies the process of integrating IronPDF into Node.js-based applications.

Step 1: Set up a Node.js application

To start with, set up the Node.js project folder by creating a folder on the local machine and opening Visual Studio Code.

mkdir PdfImageExtractor
cd PdfImageExtractor
code .

Step 2: Install the IronPDF NPM packages

Install the IronPDF Node.js package together with the engine package that matches your operating system. The commands below use the Windows 64-bit engine.

npm install @ironsoftware/ironpdf
npm install @ironsoftware/ironpdf-engine-windows-x64

The package @ironsoftware/ironpdf-engine-windows-x64 is a platform-specific version of the IronPDF library, specifically designed for Windows 64-bit systems.

1. Platform-Specific Binary for Windows (64-bit)

The IronPDF library has platform-specific dependencies. For Node.js to work efficiently with IronPDF, it requires native binaries that are tailored for specific operating systems and architectures. In this case, the @ironsoftware/ironpdf-engine-windows-x64 package provides the native engine for Windows 64-bit environments.

2. Optimized Performance

By using this Windows-specific package, you ensure that the IronPDF library works optimally on Windows-based systems. It ensures that all the native dependencies, such as those related to PDF rendering and manipulation, are compatible and function smoothly on your machine.

3. Simplifying Installation

Instead of manually managing and configuring the required binaries for Windows 64-bit systems, installing the @ironsoftware/ironpdf-engine-windows-x64 package automates this process. This saves time and eliminates potential compatibility issues.

4. Cross-Platform Compatibility

IronPDF also supports other platforms like macOS and Linux. Providing platform-specific packages allows developers to use the right binary for their operating system, improving the overall stability and reliability of the library.

5. Required for Certain Features

If you're using certain IronPDF features (like rendering PDFs to images or performing complex document manipulations), the native engine is required. The @ironsoftware/ironpdf-engine-windows-x64 package includes this engine specifically for Windows-based environments.
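The idea of one engine package per platform can be sketched as a small lookup. This is only an illustration: enginePackageFor is a hypothetical helper, and apart from @ironsoftware/ironpdf-engine-windows-x64 (the one named in this article) the package names are assumptions that follow the same naming pattern, so check them on NPM before installing.

```javascript
// Hypothetical helper: maps a platform/architecture pair to an engine package name.
// Only the windows-x64 name appears in this article; the others are assumed.
const ENGINE_PACKAGES = {
  'win32-x64': '@ironsoftware/ironpdf-engine-windows-x64',
  'linux-x64': '@ironsoftware/ironpdf-engine-linux-x64', // assumed name
  'darwin-x64': '@ironsoftware/ironpdf-engine-macos-x64', // assumed name
  'darwin-arm64': '@ironsoftware/ironpdf-engine-macos-arm64', // assumed name
};

function enginePackageFor(platform = process.platform, arch = process.arch) {
  // Returns null when no mapping is known for this platform/arch pair
  return ENGINE_PACKAGES[`${platform}-${arch}`] || null;
}

const pkg = enginePackageFor();
console.log(pkg ? `npm install ${pkg}` : 'No known engine package for this platform');
```

Running the script prints the install command for the current machine, or a notice when the platform is not in the table.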

Step 3: Prepare a PDF for the extraction

Now get the PDF file that needs extraction. Copy the path to be used in the application. This article uses the following file.

How to Extract Images From PDF in Node.js: Figure 1 - Sample File
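Before handing the file to the library, it can help to confirm that the path is correct and that the file really is a PDF. The sketch below uses only Node's built-in fs module and relies on the fact that PDF files begin with the bytes "%PDF-". The looksLikePdf helper is a hypothetical name, not part of IronPDF.

```javascript
const fs = require('fs');

// Hypothetical helper: returns true when the path is a regular file that begins with "%PDF-".
function looksLikePdf(filePath) {
  if (!fs.existsSync(filePath) || !fs.statSync(filePath).isFile()) return false;
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(5);
    // Read the first five bytes of the file
    const bytesRead = fs.readSync(fd, header, 0, 5, 0);
    return bytesRead === 5 && header.toString('latin1') === '%PDF-';
  } finally {
    fs.closeSync(fd);
  }
}

console.log(looksLikePdf('ironPDF.pdf') ? 'Ready for extraction' : 'Missing or not a PDF');
```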

Step 4: Extract images from PDF file and save

Now use the file from the previous step and add the following code to an app.js file in the Node.js project folder.

const fs = require('fs');
const { IronPdfGlobalConfig, PdfDocument } = require('@ironsoftware/ironpdf');

// Apply your IronPDF license key
IronPdfGlobalConfig.getConfig().licenseKey = "Your license key";

(async () => {
    // Import the existing PDF document
    const pdf = await PdfDocument.fromFile("ironPDF.pdf");

    // Extract all text (for example, to feed a search index) and log it
    const text = await pdf.extractText();
    console.log('All Text:' + text);

    // Extract the raw images embedded in the PDF as an array of buffers
    const imagesBuffer = await pdf.extractRawImages();
    console.log('images count:' + imagesBuffer.length);

    // Save the first image, if the PDF contains any
    if (imagesBuffer.length > 0) {
        fs.writeFileSync("./file1.jpg", imagesBuffer[0]);
    }

    // The same logic can also run inside a REST API handler
    // Log completion only after the asynchronous work has finished
    console.log('Complete!');
})().catch((err) => console.error(err));

Run the app:

node app.js

Code Explanation

This code snippet demonstrates how to use the IronPDF library in Node.js to extract text and images from a PDF document and save the first image as a JPG file.

License Setup: The IronPdfGlobalConfig is used to set the license key for IronPDF, which is required to use the library's features.
PDF Loading: The code loads a PDF document (ironPDF.pdf) using the PdfDocument.fromFile() method. This allows the program to work with the contents of the PDF.
Text Extraction: The extractText() method is used to extract all the text from the loaded PDF. This text can be used for tasks like indexing or searching through the document.
Image Extraction: The extractRawImages() method is used to extract raw images from the PDF. The images are returned as an array of buffers, one per image, which can be saved or processed further.
Saving Images: The first extracted image is saved to the local file system as file1.jpg using Node's fs.writeFileSync() method.
Final Output: The program prints the extracted text and the number of images extracted, then saves the first image.

The code demonstrates how to interact with PDF files using IronPDF to extract content and process it within a Node.js environment.
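The snippet above saves only the first image and always uses a .jpg extension. A minimal sketch for saving every extracted image is shown below, assuming extractRawImages() resolves to an array of Node.js Buffer objects (as the snippet's use of it suggests) and that the buffers hold complete image files. The detectImageExtension and saveImages helpers are hypothetical names, not part of IronPDF, and only JPEG and PNG signatures are recognized here.

```javascript
const fs = require('fs');
const path = require('path');

// Leading "magic" bytes that identify JPEG and PNG files
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical helper: guesses a file extension from the first bytes of a buffer.
function detectImageExtension(buffer) {
  const startsWith = (signature) => signature.every((byte, i) => buffer[i] === byte);
  if (startsWith(JPEG_SIGNATURE)) return 'jpg';
  if (startsWith(PNG_SIGNATURE)) return 'png';
  return 'bin'; // unknown format: keep the raw bytes without guessing
}

// Hypothetical helper: writes every buffer to outDir as image-1.<ext>, image-2.<ext>, ...
function saveImages(buffers, outDir) {
  fs.mkdirSync(outDir, { recursive: true });
  return buffers.map((buffer, index) => {
    const filePath = path.join(outDir, `image-${index + 1}.${detectImageExtension(buffer)}`);
    fs.writeFileSync(filePath, buffer);
    return filePath; // return the path so callers can log or post-process it
  });
}
```

Inside the async function of app.js, this could be called as saveImages(await pdf.extractRawImages(), './images'), which returns the list of file paths that were written.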

Output

How to Extract Images From PDF in Node.js: Figure 2 - Console Output

How to Extract Images From PDF in Node.js: Figure 3 - Image Output

License (Trial Available)

IronPDF Node.js requires a license key to work. Developers can get a trial license from the license page by providing an email address. The key is then delivered to that address and can be used in the application as shown below.

const { IronPdfGlobalConfig } = require('@ironsoftware/ironpdf');
// Apply your IronPDF license key
IronPdfGlobalConfig.getConfig().licenseKey = "Your license key";
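
Hardcoding the key works for a quick test, but keeping it out of source control is usually safer. The sketch below reads the key from an environment variable. The variable name IRONPDF_LICENSE_KEY and the resolveLicenseKey helper are assumptions made for this example, not something IronPDF defines.

```javascript
// Hypothetical helper: reads the license key from an environment variable.
// The variable name IRONPDF_LICENSE_KEY is an assumption for this example.
function resolveLicenseKey(env = process.env) {
  const key = (env.IRONPDF_LICENSE_KEY || '').trim();
  if (!key) {
    // Fail early with a clear message instead of running unlicensed
    throw new Error('IRONPDF_LICENSE_KEY is not set');
  }
  return key;
}

// Usage in app.js (requires the IronPDF package to be installed):
// IronPdfGlobalConfig.getConfig().licenseKey = resolveLicenseKey();
```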

Conclusion

Using IronPDF in Node.js for extracting images from PDFs provides a robust and efficient way to handle PDF content. Beyond pulling out embedded images with extractRawImages(), as shown above, it allows you to render PDF pages as images, which is useful for creating visual representations of the document.

The library’s ability to extract both text and images from PDFs in a straightforward manner makes it a valuable tool for applications that need to process and manipulate PDF content. Its integration with Node.js allows developers to easily incorporate PDF extraction into web or server-side applications.

Overall, IronPDF is a powerful solution for PDF manipulation, offering flexibility to convert, save, and extract images from PDFs, making it suitable for a wide range of use cases such as document indexing, preview generation, and content extraction. However, if your focus is solely on extracting embedded images from PDFs, exploring additional libraries might provide more specialized solutions.

Kannapat Udonpant

Software Engineer

Before becoming a Software Engineer, Kannapat completed an Environmental Resources PhD from Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he's not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.