Extracting images from PDFs is a common task for many developers, whether for file processing, data extraction, or creating document previews. In this article, we'll explore how to extract and save images from a PDF using IronPDF, a powerful PDF library originally built for .NET, and how to use it in a Node.js environment via its NPM package.
If you haven't installed Node.js yet, download and install it from https://nodejs.org/.
The IronPDF NPM package is a Node.js wrapper for the IronPDF library, originally designed for .NET environments. It brings IronPDF's PDF manipulation capabilities to Node.js applications and offers a range of features for real-world tasks such as file processing and report generation.
PDF Creation:
IronPDF can create PDFs from various sources, including HTML content, images, or even raw text. This feature is highly useful for web applications that need to generate reports, invoices, or any other document in PDF format.
IronPDF supports styling and formatting HTML content, making it a great choice for converting web pages into well-structured PDF documents.
PDF Editing:
IronPDF allows you to manipulate existing PDFs by adding text, images, or annotations, and modifying the layout. You can also merge multiple PDFs into one, split a large document into smaller parts, or even reorder pages within a PDF.
These features make it ideal for applications that need to dynamically modify PDFs, such as document management systems or applications that require automated document generation.
PDF Conversion:
One of the standout features of IronPDF is its ability to convert PDFs into various other formats. For example, it can convert PDF documents to images (PNG, JPEG), HTML, and Word formats.
This feature is particularly useful when you need to present a PDF's content in different formats or create image previews of PDFs for user interfaces.
Extracting Text and Images:
IronPDF can extract the text of a PDF and, as the code later in this article shows with extractText() and extractRawImages(), the images embedded in it. It also provides a method for rendering PDF pages as images (such as PNG or JPEG), which captures content that is not stored as a separate embedded image.
You can render each page of the PDF into an image, effectively capturing the visual representation of the document, and saving it for further use or display.
Rendering Pages as Images:
IronPDF can convert PDF pages into high-quality images. For example, you can convert a multipage PDF into a series of PNGs, one for each page. This is particularly useful when you need to display the pages as thumbnails or in an image-based format. Several image formats are supported.
Security and Encryption:
IronPDF supports working with encrypted PDFs. It allows you to open, decrypt, and manipulate secured documents, which is essential for working with documents that require passwords or other forms of protection.
Cross-Platform Compatibility:
IronPDF is compatible with both Windows and Linux environments, making it a versatile tool for server-side applications. The Node.js wrapper simplifies the process of integrating IronPDF into Node.js-based applications.
To start with, set up the Node.js project folder by creating a folder on the local machine and opening Visual Studio Code.
mkdir PdfImageExtractor
cd PdfImageExtractor
code .
Install the IronPDF Node.js package and the engine package that matches your platform (the Windows 64-bit engine is shown here):
npm install @ironsoftware/ironpdf
npm install @ironsoftware/ironpdf-engine-windows-x64
The package @ironsoftware/ironpdf-engine-windows-x64 is a platform-specific version of the IronPDF library, specifically designed for Windows 64-bit systems.
The IronPDF library has platform-specific dependencies. For Node.js to work efficiently with IronPDF, it requires native binaries that are tailored for specific operating systems and architectures. In this case, the @ironsoftware/ironpdf-engine-windows-x64 package provides the native engine for Windows 64-bit environments.
By using this Windows-specific package, you ensure that the IronPDF library works optimally on Windows-based systems. It ensures that all the native dependencies, such as those related to PDF rendering and manipulation, are compatible and function smoothly on your machine.
Instead of manually managing and configuring the required binaries for Windows 64-bit systems, installing the @ironsoftware/ironpdf-engine-windows-x64 package automates this process. This saves time and eliminates potential compatibility issues.
IronPDF also supports other platforms like macOS and Linux. Providing platform-specific packages allows developers to use the right binary for their operating system, improving the overall stability and reliability of the library.
If you're using certain IronPDF features (like rendering PDFs to images or performing complex document manipulations), the native engine is required. The @ironsoftware/ironpdf-engine-windows-x64 package includes this engine specifically for Windows-based environments.
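As a rough illustration of the platform-specific packaging described above, the following sketch maps the current operating system and CPU architecture to an engine package name. Only the windows-x64 name appears in this article; the Linux and macOS names are assumptions that follow the same naming pattern, and enginePackageFor is a hypothetical helper, so check the package registry before relying on them.

```javascript
// Map a platform/architecture pair to an IronPDF engine package name.
// Only the windows-x64 entry is confirmed by this article; the other
// entries are assumptions that follow the same naming pattern.
const ENGINE_PACKAGES = {
  'win32-x64': '@ironsoftware/ironpdf-engine-windows-x64',
  'linux-x64': '@ironsoftware/ironpdf-engine-linux-x64', // assumed name
  'darwin-x64': '@ironsoftware/ironpdf-engine-macos-x64', // assumed name
  'darwin-arm64': '@ironsoftware/ironpdf-engine-macos-arm64', // assumed name
};

// Hypothetical helper: defaults to the machine running the script.
function enginePackageFor(platform = process.platform, arch = process.arch) {
  const name = ENGINE_PACKAGES[`${platform}-${arch}`];
  if (!name) {
    throw new Error(`No engine package known for ${platform}-${arch}`);
  }
  return name;
}

// Build the install command for a Windows 64-bit machine.
console.log(`npm install ${enginePackageFor('win32', 'x64')}`);
```

Calling enginePackageFor() with no arguments uses the values Node.js reports for the current machine and throws for any combination missing from the table, which makes an unsupported platform visible at setup time.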
Now get the PDF file that needs extraction. Copy the path to be used in the application. This article uses the following file.
Now use the file from the previous step and write the following code snippet in an app.js file in the Node.js project folder.
const fs = require('fs');
const { IronPdfGlobalConfig, PdfDocument } = require('@ironsoftware/ironpdf');
// Apply your IronPDF license key
IronPdfGlobalConfig.getConfig().licenseKey = "Your license key";
(async () => {
  // Extract text and image content from a PDF document
  // Import the existing PDF document
  const pdf = await PdfDocument.fromFile("ironPDF.pdf");
  // Get all text (for example, to put in a search index) and log it
  const text = await pdf.extractText();
  console.log('All Text:' + text);
  // Get all embedded images as buffers
  const imagesBuffer = await pdf.extractRawImages();
  console.log('images count:' + imagesBuffer.length);
  // Save the first image, but only if the PDF contains at least one
  if (imagesBuffer.length > 0) {
    fs.writeFileSync("./file1.jpg", imagesBuffer[0]);
  }
  // This code could also run inside a REST API handler
  // Log completion only after the asynchronous work has finished
  console.log('Complete!');
})().catch((err) => console.error(err)); // report any failure instead of leaving an unhandled rejection
Run the app:
node app.js
This code snippet demonstrates how to use the IronPDF library in Node.js to extract text and images (saved here in JPG format) from a PDF document and process that content within a Node.js environment.
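The snippet above saves only the first extracted image and assumes it is a JPEG. A minimal sketch of saving every image follows. It assumes the extracted items are Node.js Buffers holding encoded image data, as the snippet's use of fs.writeFileSync suggests. The helpers detectImageExtension and saveAllImages are hypothetical names for this illustration and are not part of IronPDF.

```javascript
const fs = require('fs');
const path = require('path');

// Guess a file extension from the leading signature bytes of a buffer.
function detectImageExtension(buffer) {
  // JPEG files start with FF D8 FF
  if (buffer.length >= 3 && buffer[0] === 0xff && buffer[1] === 0xd8 && buffer[2] === 0xff) {
    return 'jpg';
  }
  // PNG files start with a fixed 8-byte signature
  const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (buffer.length >= 8 && buffer.subarray(0, 8).equals(pngSignature)) {
    return 'png';
  }
  // Unknown format: keep the bytes but do not claim an image type
  return 'bin';
}

// Hypothetical helper: write each buffer to outputDir as image-1.<ext>,
// image-2.<ext>, and so on, returning the paths that were written.
function saveAllImages(buffers, outputDir) {
  fs.mkdirSync(outputDir, { recursive: true });
  return buffers.map((buffer, index) => {
    const fileName = `image-${index + 1}.${detectImageExtension(buffer)}`;
    const filePath = path.join(outputDir, fileName);
    fs.writeFileSync(filePath, buffer);
    return filePath;
  });
}
```

Inside the async function of app.js, the single writeFileSync call could then be replaced with something like saveAllImages(imagesBuffer, './images'). An empty array simply produces no files, so no separate length check is needed.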
IronPDF for Node.js requires a license key to work. Developers can request a trial license from the license page using their email address. The key is then sent to that address and can be applied in the application as shown below.
const { IronPdfGlobalConfig } = require('@ironsoftware/ironpdf');
// Apply your IronPDF license key
IronPdfGlobalConfig.getConfig().licenseKey = "Your license key";
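Hard-coding the key works for a quick test, but in a shared repository it is usually safer to keep it out of source code. The sketch below reads the key from an environment variable. The variable name IRONPDF_LICENSE_KEY and the helper readLicenseKey are assumptions made for this illustration, not names defined by IronPDF.

```javascript
// Read the license key from an environment variable instead of source code.
// IRONPDF_LICENSE_KEY is a hypothetical variable name chosen for this sketch.
function readLicenseKey(env = process.env) {
  const key = (env.IRONPDF_LICENSE_KEY || '').trim();
  if (!key) {
    throw new Error('IRONPDF_LICENSE_KEY is not set');
  }
  return key;
}

// Usage, mirroring the assignment shown above:
// IronPdfGlobalConfig.getConfig().licenseKey = readLicenseKey();
```

Throwing when the variable is missing makes a configuration mistake show up at startup rather than later, when a PDF operation fails.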
Using IronPDF in Node.js to extract images from PDFs provides a robust and efficient way to handle PDF content. Besides extracting embedded images, as the example in this article does with extractRawImages(), it allows you to render PDF pages as images, which is useful for creating visual representations of the document.
The library’s ability to extract both text and images from PDFs in a straightforward manner makes it a valuable tool for applications that need to process and manipulate PDF content. Its integration with Node.js allows developers to easily incorporate PDF extraction into web or server-side applications.
Overall, IronPDF is a powerful solution for PDF manipulation, offering flexibility to convert, save, and extract images from PDFs, making it suitable for a wide range of use cases such as document indexing, preview generation, and content extraction. However, if your focus is solely on extracting embedded images from PDFs, exploring additional libraries might provide more specialized solutions.