Hello, this is the last article in the series. We will export the recognized text to a well-organized Word document for easy reading and sharing. The source code is available in the thomas open source project.
Overall structure
This chapter covers the third step of the overall conversion process, as shown in the following figure:
Introduction to docx document format
First, a brief introduction to the docx document format. A docx file is actually a zip archive. After manually changing the file extension to .zip, you can extract it. The main content is usually stored in the word/document.xml file of the extracted archive.
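To make the "docx is a zip archive" point concrete, here is a minimal sketch that uses only the JDK (Java 9 or later, no docx4j). The class name DocxZipPeek and its helper methods are hypothetical names chosen for this illustration. So that the example runs without a real document, it builds a tiny stand-in archive in memory; a real .docx contains more parts than this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxZipPeek {

    /** Lists the entry names of a zip archive held in memory. */
    public static List<String> listEntries(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Returns the content of one entry as UTF-8 text, or null if it is missing. */
    public static String readEntry(byte[] zipBytes, String name) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    // readAllBytes stops at the end of the current entry
                    return new String(zin.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    /** Builds a tiny stand-in archive (assumption: not a complete, valid .docx). */
    public static byte[] buildSampleArchive() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bos)) {
            zout.putNextEntry(new ZipEntry("word/document.xml"));
            zout.write("<w:t>Hello</w:t>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // With a real file: byte[] bytes = Files.readAllBytes(Paths.get("hello.docx"));
        byte[] bytes = buildSampleArchive();
        System.out.println(listEntries(bytes));
        System.out.println(readEntry(bytes, "word/document.xml"));
    }
}
```

Reading a real document only requires replacing the sample archive with the bytes of the .docx file; no renaming to .zip is needed, because the zip reader looks at the content and not at the extension.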
For example, the following figure shows the simplest possible Word document, with only "Hello" in the body:
After changing the extension of the document to .zip and extracting it, you will find word/document.xml, whose main content is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
            xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
            mc:Ignorable="w14 w15 w16se w16cid w16 w16cex wp14">
  <w:body>
    <w:p w14:paraId="6D5AFF05" w14:textId="678C6FAC" w:rsidR="000933A6" w:rsidRDefault="008D746B">
      <w:r>
        <w:rPr>
          <w:rFonts w:hint="eastAsia"/>
        </w:rPr>
        <w:t>Hello</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
From the file above, we can see the basic structure of a Word document:
- <w:p> is a paragraph
- <w:r> is a run (a stretch of text with the same formatting) within the paragraph
- <w:rPr> holds the run's style properties
- <w:t> is the text content
The docx4j library mirrors this XML structure: it maps the XML elements above to corresponding Java objects and methods, and uses them to generate and edit documents.
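The paragraph, run, and text hierarchy can also be walked by hand. The following is a minimal sketch that uses the JDK's namespace-aware DOM parser rather than docx4j; the class name ParagraphTextReader and the method paragraphTexts are hypothetical. It collects the <w:t> text of every <w:p>, which is roughly the information docx4j exposes through its Java objects.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ParagraphTextReader {

    /** Namespace URI bound to the "w" prefix in document.xml. */
    public static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one string per <w:p>, joining the text of all <w:t> elements inside it. */
    public static List<String> paragraphTexts(String documentXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = dbf.newDocumentBuilder().parse(
                new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

        List<String> result = new ArrayList<>();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            StringBuilder text = new StringBuilder();
            // A paragraph may contain several runs, each with its own <w:t>
            NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            result.add(text.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:rPr/><w:t>Hello</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        System.out.println(paragraphTexts(xml)); // prints [Hello]
    }
}
```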
docx4j document operation
Next, we implement the Word document operations with the docx4j library.
First, add the docx4j dependency:
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.1.6</version>
</dependency>
First, we write the dialogue recognized from each video file into a table of the following form:
The code that builds this table is:
// Create the table
Tbl tbl = Context.getWmlObjectFactory().createTbl();

// Set the basic style of the table, including the borders
String strTblPr = "<w:tblPr " + Namespaces.W_NAMESPACE_DECLARATION + ">"
        + "<w:tblStyle w:val=\"TableGrid\"/>"
        + "<w:tblW w:w=\"0\" w:type=\"auto\"/>"
        + "<w:tblLook w:val=\"04A0\"/>"
        + "</w:tblPr>";
try {
    TblPr tblPr = (TblPr) XmlUtils.unmarshalString(strTblPr);
    tbl.setTblPr(tblPr);
} catch (JAXBException e) {
    log.error("Failed to generate TblPr from XML", e);
}

// Add the header row
Tr hearTr = Context.getWmlObjectFactory().createTr();
tbl.getContent().add(hearTr);
geneTblHearderCell(hearTr, "D9D9D9", 2629, docPart.createParagraphOfText("time"));
geneTblHearderCell(hearTr, "D9D9D9", 5667, docPart.createParagraphOfText("content"));

// Add one content row per recognition result, ordered by begin time
taskResultRepo.findByTaskIdEqualsOrderByBeginTimeAsc(taskId).forEach(result -> {
    Tr tr = Context.getWmlObjectFactory().createTr();
    tbl.getContent().add(tr);
    // First cell: begin time
    Tc tc1 = Context.getWmlObjectFactory().createTc();
    tc1.getContent().add(docPart.createParagraphOfText(formatSecond(result.getBeginTime())));
    // Second cell: recognized text
    Tc tc2 = Context.getWmlObjectFactory().createTc();
    tc2.getContent().add(docPart.createParagraphOfText(result.getWords()));
    // Add both cells to the row
    tr.getContent().addAll(Arrays.asList(tc1, tc2));
});

// Add the table to the document
docPart.getContent().add(tbl);
// Add a page break
docPart.getContent().add(createNextPage());
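The table code calls a formatSecond helper that is not shown in this article. The following is only a hypothetical sketch of what such a helper could look like, assuming that getBeginTime() returns an offset in whole seconds and that the time column uses the HH:mm:ss format; the project's actual implementation may differ. The class name TimeFormat is also an assumption.

```java
public class TimeFormat {

    /**
     * Formats a non-negative offset in seconds as HH:mm:ss.
     * Assumption: the input is in whole seconds, not milliseconds.
     */
    public static String formatSecond(long totalSeconds) {
        if (totalSeconds < 0) {
            throw new IllegalArgumentException("negative offset: " + totalSeconds);
        }
        long hours = totalSeconds / 3600;
        long minutes = (totalSeconds % 3600) / 60;
        long seconds = totalSeconds % 60;
        return String.format("%02d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        System.out.println(formatSecond(75));   // prints 00:01:15
        System.out.println(formatSecond(3661)); // prints 01:01:01
    }
}
```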
A special reminder: avoid using XmlUtils.unmarshalString to generate objects wherever possible. Apart from the TblPr above, which follows the official example, all other structures in this project are built with Java objects. The reason is that building objects by parsing XML strings easily leads to namespace errors.
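The kind of namespace error meant here can be reproduced without docx4j. The sketch below uses the JDK's namespace-aware XML parser, not docx4j's JAXB-based XmlUtils, so it only illustrates the cause: a fragment that uses the w: prefix without declaring it is rejected. This is why the TblPr string above includes Namespaces.W_NAMESPACE_DECLARATION. The class name NamespacePitfall and the method parses are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class NamespacePitfall {

    public static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns true if the fragment is accepted by a namespace-aware parser. */
    public static boolean parses(String fragment) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        // Rethrow parse errors instead of printing them to stderr
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(fragment)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. the prefix "w" is not bound to a namespace
        }
    }

    public static void main(String[] args) throws Exception {
        String body = "<w:tblStyle w:val=\"TableGrid\"/>";
        String bare = "<w:tblPr>" + body + "</w:tblPr>";
        String declared = "<w:tblPr xmlns:w=\"" + W_NS + "\">" + body + "</w:tblPr>";
        System.out.println(parses(bare));     // prints false
        System.out.println(parses(declared)); // prints true
    }
}
```

Building the same structure from Java objects avoids the problem entirely, because the namespace is attached by the object model when the document is written out.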
docx4j also supports inserting pictures into documents, for example:
// Write a picture into the Word document
Inline inline = null;
try {
    BinaryPartAbstractImage imagePart = BinaryPartAbstractImage.createImagePart(
            wordPackage, Files.readAllBytes(Paths.get("doc\\thomas-gitee.png")));
    inline = imagePart.createImageInline("Open source project address", "QR code picture", 1, 2, false);
} catch (Exception e) {
    log.error("Exception creating picture object", e);
}

// Wrap the picture as paragraph -> run -> drawing -> inline
ObjectFactory factory = Context.getWmlObjectFactory();
P p = factory.createP();
if (inline != null) { // leave the paragraph empty if the picture could not be created
    R r = factory.createR();
    Drawing drawing = factory.createDrawing();
    drawing.getAnchorOrInline().add(inline);
    r.getContent().add(drawing);
    p.getContent().add(r);
}
The following code adds the document title and the chapter heading, using the Title and Heading1 styles respectively:
// Set the document title
mainDocumentPart.addStyledParagraphOfText("Title", THOMAS_DOCX_NAME);
// Use the task name as the chapter heading
mainDocumentPart.addStyledParagraphOfText("Heading1", taskInfo.getTaskName());
Generating a table of contents is also simple:
// Generate the table of contents; do this last, after all content has been added
Toc.setTocHeadingText("catalog");
TocGenerator tocGenerator = new TocGenerator(wordPackage);
tocGenerator.generateToc(5, " TOC \\o \"1-3\" \\h \\z \\u ", true);
Note that the first parameter of the generateToc method is the position in the document content at which the table of contents is inserted. The code above inserts it at index 5.
After the document structure is assembled, call the save method of WordprocessingMLPackage to save the document.
Conclusion
At this point, we have completed the whole pipeline: the dialogue in an MP4 video is recognized, converted to text, and exported as a well-formatted Word document. If you find any mistakes or omissions in the implementation, feedback is welcome. Thank you.
This series uses the "Thomas and Friends" animated series as its material. The reason is that children love this show and especially love listening to Thomas's stories. These functions were implemented on a whim, in order to tell the children better Thomas bedtime stories. I hope they are helpful to you too.