Extract Text From PDF

Beyond its extensive collection of PDF creation and editing functions, IronPDF provides content extraction methods that give granular access to the content of existing PDF documents.

Every PdfDocument object provides the extractAllText method, which returns a single String containing the text from every page of the PDF.
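
As a minimal sketch of document-level extraction, the example below wraps extractAllText in a small helper. To avoid depending on an input file, it builds a throwaway PDF with PdfDocument.renderHtmlAsPdf, IronPDF's HTML-to-PDF method, which this section does not otherwise cover. It assumes the IronPDF for Java library is on the classpath; the class name, helper name, and HTML content are illustrative only.

```java
import com.ironsoftware.ironpdf.PdfDocument;

// Illustrative class name; not part of the IronPDF API.
public class ExtractAllTextExample {

    // Hypothetical helper: returns the text of every page as one String.
    static String allText(PdfDocument document) {
        return document.extractAllText();
    }

    public static void main(String[] args) {
        // Render a small in-memory PDF so the example needs no input file.
        // With a real document, load it with PdfDocument.fromFile(...) instead.
        PdfDocument document = PdfDocument.renderHtmlAsPdf(
                "<h1>Quarterly Report</h1><p>Revenue grew this quarter.</p>");

        // Print all text extracted from the document.
        System.out.println(allText(document));
    }
}
```

To work on a file on disk, replace the rendering step with PdfDocument.fromFile(Paths.get("sample.pdf")), as in the first-page snippet in this section.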

This method is a convenient way to extract text at the document level, particularly from PDFs with many pages. To extract text at the page level (that is, from a specific page or set of pages only), use the extractTextFromPage method instead.

The brief code snippet below pulls the text from the first page of a PDF document.

// Load the PDF document from disk
PdfDocument document = PdfDocument.fromFile(Paths.get("sample.pdf"));
// Extract the text from the first page only
String firstPageText = document.extractTextFromPage(PageSelection.firstPage());
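
To contrast page-level and document-level extraction, the sketch below renders a two-page PDF and compares the text of the first page with the text of the whole document. It assumes the IronPDF for Java library is on the classpath, that PageSelection lives in the com.ironsoftware.ironpdf.edit package (verify this against the API reference for your version), and that the renderer honors the CSS page-break-after rule. The class name, helper names, and HTML are illustrative only.

```java
import com.ironsoftware.ironpdf.PdfDocument;
import com.ironsoftware.ironpdf.edit.PageSelection; // package path assumed

// Illustrative class name; not part of the IronPDF API.
public class PageVersusDocumentText {

    // Hypothetical helper: page-level extraction (first page only).
    static String firstPageText(PdfDocument document) {
        return document.extractTextFromPage(PageSelection.firstPage());
    }

    // Hypothetical helper: document-level extraction (every page).
    static String allText(PdfDocument document) {
        return document.extractAllText();
    }

    public static void main(String[] args) {
        // Two paragraphs separated by a CSS page break, so the rendered
        // PDF is expected to contain two pages.
        String html = "<p style='page-break-after: always'>Alpha section</p>"
                + "<p>Omega section</p>";
        PdfDocument document = PdfDocument.renderHtmlAsPdf(html);

        // The first page should mention only "Alpha"; the full text, both.
        System.out.println("First page: " + firstPageText(document));
        System.out.println("All pages:  " + allText(document));
    }
}
```

For a long document where only a few pages matter, page-level extraction keeps the result small; when every page is needed, a single extractAllText call is simpler.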