This article shows how to read the text and images in a PDF document from a Java program. Both are read per page, by calling the page's extractText() and extractImages() methods.
Tool used: Free Spire.PDF for Java (free version)
Obtaining and importing the jar file:
Method 1: Download the jar package from the official website. After downloading, decompress the file and import the Spire.Pdf.jar file from the lib folder into the Java program. Once imported, the jar should appear among the project's referenced libraries.
Method 2: Install and import the library from the Maven repository; see the Maven import instructions for reference.
Java code examples
[Example 1] Read text from a PDF
import com.spire.pdf.*;
import java.io.FileWriter;
import java.io.IOException;

public class ExtractText {
    public static void main(String[] args) throws Exception {
        // Load the test document
        PdfDocument pdf = new PdfDocument("sample.pdf");
        // Collect the text of all pages in a StringBuilder
        StringBuilder sb = new StringBuilder();
        // Iterate over each page of the PDF document
        for (int i = 0; i < pdf.getPages().getCount(); i++) {
            PdfPageBase page = pdf.getPages().get(i);
            // Call the extractText() method to extract the page text
            sb.append(page.extractText(true));
        }
        pdf.close();
        // Write the collected text to a txt file once, after the loop;
        // try-with-resources closes the writer
        try (FileWriter writer = new FileWriter("ExtractText.txt")) {
            writer.write(sb.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Text reading result: the extracted text of all pages is written to ExtractText.txt.
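The file-writing step of Example 1 can also be tried on its own, without the PDF library. The following is a minimal sketch, assuming the page texts have already been collected into a list. The class PageTextWriter, the method writePages, and the "--- Page N ---" separator line are hypothetical names chosen for illustration and are not part of Spire.PDF. It writes in UTF-8 explicitly, since FileWriter may fall back to the platform default encoding.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spire.PDF: writes per-page text to one file.
class PageTextWriter {

    // Writes each page's text to 'target' in UTF-8, preceded by a "--- Page N ---" line.
    static void writePages(List<String> pageTexts, Path target) throws IOException {
        // try-with-resources flushes and closes the writer even if a write fails
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int i = 0; i < pageTexts.size(); i++) {
                writer.write("--- Page " + (i + 1) + " ---");
                writer.newLine();
                writer.write(pageTexts.get(i));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in strings; in Example 1 these would come from page.extractText(true)
        List<String> pages = Arrays.asList("Text of the first page", "Text of the second page");
        Path target = Files.createTempFile("ExtractText", ".txt");
        writePages(pages, target);
        System.out.println("Wrote " + pages.size() + " pages to " + target);
    }
}
```

To use it with Example 1, one option is to add each page.extractText(true) result to a List<String> inside the loop and call writePages once after the loop.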
[Example 2] Read images from a PDF
import com.spire.pdf.*;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;

public class ExtractImg {
    public static void main(String[] args) throws Exception {
        // Load the test document
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("test.pdf");
        // Running counter used to number the output images
        int index = 0;
        // Iterate over the PDF pages
        for (int i = 0; i < pdf.getPages().getCount(); i++) {
            // Get the PDF page
            PdfPageBase page = pdf.getPages().get(i);
            // Use the extractImages() method to get the images on the page
            for (BufferedImage image : page.extractImages()) {
                // Name the output image Image_0.png, Image_1.png, ...
                File output = new File(String.format("Image_%d.png", index++));
                // Save the image as a PNG file
                ImageIO.write(image, "PNG", output);
            }
        }
        // Release the document, as in Example 1
        pdf.close();
    }
}
Picture reading result: each extracted image is saved as a PNG file named Image_0.png, Image_1.png, and so on.
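The naming and saving step of Example 2 can be sketched separately from the PDF library. The following is a minimal sketch, assuming the images are already available as BufferedImage objects. The class ImageSaver and the method saveAll are hypothetical names for illustration and are not part of Spire.PDF. Unlike Example 2, it writes into a chosen output directory and checks the boolean returned by ImageIO.write, which is false when no suitable writer is found.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Spire.PDF: saves images under numbered names.
class ImageSaver {

    // Saves each image as Image_<n>.png inside 'dir' and returns the files written.
    static List<File> saveAll(List<BufferedImage> images, File dir) throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create output directory: " + dir);
        }
        List<File> written = new ArrayList<>();
        int index = 0;
        for (BufferedImage image : images) {
            File output = new File(dir, String.format("Image_%d.png", index++));
            // ImageIO.write returns false when no writer supports the image
            if (!ImageIO.write(image, "PNG", output)) {
                throw new IOException("No PNG writer available for " + output);
            }
            written.add(output);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in image; in Example 2 the images would come from page.extractImages()
        BufferedImage blank = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        File dir = Files.createTempDirectory("images").toFile();
        for (File f : saveAll(Collections.singletonList(blank), dir)) {
            System.out.println("Saved " + f.getPath());
        }
    }
}
```

Returning the list of written files makes it easy to log or verify the output afterwards; whether a separate output directory is wanted depends on the project.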