Use Spire.Doc to export a standard-format Word document with editable LaTeX formulas

Keywords: JavaSE Latex

Background

Some time ago we had a requirement for teaching-assistant annotation. When exporting the question bank, users wanted a Word file so they could check the finished result offline. Because the file was only used for preview, and to stay consistent with the front-end styling, the approach was to generate HTML directly and add a Word file header so that the file could be opened and viewed in Word. The file header is as follows:

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
</html>

The content is simply filled into the HTML. Once the file extension is changed, you have a (pseudo) Word file. However, such a file has a fatal defect: the client machine must be connected to the Internet, otherwise the pictures in the Word file cannot be loaded. As the business evolved, clients also wanted to import the exported Word file back into the system, and this HTML format cannot be recognized properly. So the export has to produce Word in the standard format, and while at it, I wanted the LaTeX formulas in the annotations to be editable in Word.
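For context, the HTML-based workaround described above can be sketched roughly as follows. This is a minimal illustration, not the original project's code: the class name PseudoWordExport, the method wrap, the <body> wrapper, and the output file name are assumptions made for the example. Only the <html> namespace header comes from the text above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating the "HTML saved with a .doc extension" workaround.
class PseudoWordExport {

    // Namespace header quoted in the article.
    private static final String HEADER =
            "<html xmlns:v=\"urn:schemas-microsoft-com:vml\""
            + " xmlns:o=\"urn:schemas-microsoft-com:office:office\""
            + " xmlns:w=\"urn:schemas-microsoft-com:office:word\""
            + " xmlns:m=\"http://schemas.microsoft.com/office/2004/12/omml\""
            + " xmlns=\"http://www.w3.org/TR/REC-html40\">";

    // Wraps an HTML fragment in the Word-flavoured <html> element.
    // The <body> element is an assumption; the article only says the
    // content is filled into the html element.
    static String wrap(String bodyHtml) {
        return HEADER + "<body>" + bodyHtml + "</body></html>";
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("preview.doc"); // assumed output name
        Files.write(out, wrap("<p>Question 1</p>").getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

The result is still an HTML file under a Word extension, which is why images referenced by URL need a network connection and why the file cannot be re-imported as a real Word document.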

Common Word export options

  • Apache POI
  • FreeMarker template engine generating XML-format documents
  • Aspose.Words (commercial, paid)
  • Spire.Doc (commercial and free editions)

Aspose.Words does not support LaTeX formulas. With Apache POI, the usual approach is to convert LaTeX into MathML and then write it into Word. That requires two fMath jar packages; copies can be found on the Internet, but I could not find an official source, so I ruled this option out. The FreeMarker approach also relies on fMath: LaTeX is converted into MathML and written into the Word XML template. However, the engine used for our downstream processing does not support Word documents in XML format, so this option was ruled out as well.

Use Spire.Doc to export Word with editable LaTeX formulas

Dependency

<dependency>
    <groupId>e-iceblue</groupId>
    <artifactId>spire.doc.free</artifactId>
    <version>3.9.0</version>
</dependency>

Create Document object and Section

Document document = new Document();
Section section = document.addSection();

Create a paragraph and set its line spacing and the spacing before it

Paragraph paragraph = section.addParagraph();
paragraph.getFormat().setLineSpacing(15);
paragraph.getFormat().setBeforeSpacing(20);

Write text and set Chinese and English fonts

TextRange textRange = paragraph.appendText(text);
// SimSun is the English name of the 宋体 font
textRange.getCharacterFormat().setFontNameFarEast("SimSun");
textRange.getCharacterFormat().setFontNameNonFarEast("Times New Roman");

Write LaTeX formula

OfficeMath math = new OfficeMath(paragraph.getDocument());
paragraph.getItems().add(math);
math.fromLatexMathCode(latexFormat(innerPojo.latex));

/**
 * Works around some Spire.Doc limitations. Certain symbols are poorly supported, such as
 * \leqslant and \geqslant, so they are replaced with \leq and \geq. Runs of consecutive
 * Chinese characters are wrapped in \mbox{}. The wrapping follows common LaTeX practice but
 * helps little here: Spire.Doc does not render LaTeX formulas containing Chinese, possibly
 * because this version is too old. If a formula cannot be rendered properly, fall back to
 * displaying it as a picture.
 */
private String latexFormat(String latex) {
    // String.replace is a no-op when the target is absent, so no contains() check is needed
    latex = latex.replace("leqslant", "leq").replace("geqslant", "geq");
    StringBuilder latexBuilder = new StringBuilder();
    boolean inChinese = false;
    for (char c : latex.toCharArray()) {
        boolean chinese = c >= '\u4E00' && c <= '\u9FA5';
        if (chinese && !inChinese) {
            // Entering a run of Chinese characters
            latexBuilder.append("\\mbox{");
        } else if (!chinese && inChinese) {
            // Leaving a run of Chinese characters
            latexBuilder.append('}');
        }
        latexBuilder.append(c);
        inChinese = chinese;
    }
    // Close the group if the formula ends with Chinese text
    if (inChinese) {
        latexBuilder.append('}');
    }
    return latexBuilder.toString();
}
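As an alternative sketch, the wrapping of consecutive Chinese characters can also be written as a single regular-expression replacement, which closes every \mbox{} group by construction, including one at the very end of the formula. The class name LatexWrapSketch and the method name wrapChinese are hypothetical; the character range is the one used above. Whether Spire.Doc then renders the result is a separate question, as noted in the comment above.

```java
import java.util.regex.Pattern;

// Hypothetical alternative to a character loop: wrap each run of
// Chinese characters in \mbox{...} with one regex replacement.
class LatexWrapSketch {

    // One or more consecutive characters in the range used by the article.
    private static final Pattern CHINESE_RUN = Pattern.compile("[\\u4E00-\\u9FA5]+");

    static String wrapChinese(String latex) {
        // In the replacement string, the escaped backslash pair yields one
        // literal backslash and $0 stands for the matched run.
        return CHINESE_RUN.matcher(latex).replaceAll("\\\\mbox{$0}");
    }

    public static void main(String[] args) {
        // The escapes below spell a two-character Chinese word, so the
        // source file stays ASCII-only.
        System.out.println(wrapChinese("v_{\u901F\u5EA6} \\leq 5"));
    }
}
```

For the input in main, the subscript text is wrapped and everything else is left untouched, giving v_{\mbox{速度}} \leq 5.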

Draw table
The Spire.Doc API cannot add a table directly to a paragraph. You need to add a text box first, and then add the table inside the text box. The table content is prepared in advance as a two-dimensional list of cell strings.

TextBox textBox = paragraph.appendTextBox(500, 20 * innerPojo.rows);
textBox.getFormat().setHorizontalAlignment(ShapeHorizontalAlignment.Inside);
textBox.getFormat().setNoLine(true);
Table table = textBox.getBody().addTable(true);
table.resetCells(innerPojo.rows, innerPojo.lines);
for (int i = 0; i < innerPojo.rowLines.size(); i++) {
    List<String> rowLine = innerPojo.rowLines.get(i);
    for (int j = 0; j < rowLine.size(); j++) {
        appendWithFont(rowLine.get(j), table.get(i, j).addParagraph());
    }
}
// Lay the text box out inline with the text
textBox.setTextWrappingStyle(TextWrappingStyle.Inline);

After searching the API of this version, I did not find a way to make the text box height adapt automatically to its content. Perhaps the paid edition handles this better.
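The table code above assumes the content has already been arranged as rows of cell strings. A small sketch of how such data might be prepared: it derives the row and column counts to pass to resetCells, pads short rows so every cell has text, and estimates the text box height at 20 per row, the same rule of thumb as above. The class TableGridSketch and its members are hypothetical and are not part of Spire.Doc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for table content prepared before drawing.
class TableGridSketch {
    final int rows;
    final int columns;
    final List<List<String>> cells; // rectangular: rows x columns

    TableGridSketch(List<List<String>> rawRows) {
        int maxColumns = 0;
        for (List<String> row : rawRows) {
            maxColumns = Math.max(maxColumns, row.size());
        }
        this.rows = rawRows.size();
        this.columns = maxColumns;
        this.cells = new ArrayList<>();
        for (List<String> row : rawRows) {
            List<String> padded = new ArrayList<>(row);
            while (padded.size() < maxColumns) {
                padded.add(""); // fill missing cells with empty text
            }
            cells.add(padded);
        }
    }

    // Fixed height estimate: 20 per row, since the box does not resize itself.
    float estimatedHeight() {
        return 20f * rows;
    }

    public static void main(String[] args) {
        TableGridSketch grid = new TableGridSketch(Arrays.asList(
                Arrays.asList("A", "B", "C"),
                Arrays.asList("1", "2")));
        System.out.println(grid.rows + " x " + grid.columns
                + ", height " + grid.estimatedHeight());
    }
}
```

Because the height is only an estimate, rows whose text wraps onto several lines may still overflow the box.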

Insert picture
Limit the picture width to at most 500 and the height to at most 300, scaling proportionally.

DocPicture picture = paragraph.appendPicture(innerPojo.getImage());
log.info("pictureSize,Width:{},Height:{}", picture.getWidth(), picture.getHeight());
// Scale down proportionally if the picture is too wide
if (picture.getWidth() > 500) {
    float rate = 500f / picture.getWidth();
    picture.setHeight(picture.getHeight() * rate);
    picture.setWidth(500);
}
// Check the height separately: a picture narrowed above may still be too tall
if (picture.getHeight() > 300) {
    float rate = 300f / picture.getHeight();
    picture.setWidth(picture.getWidth() * rate);
    picture.setHeight(300);
}
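The size limits can also be expressed as a single scale factor: the smaller of the width ratio and the height ratio, capped at 1 so that small pictures are not enlarged. The sketch below is plain Java with no Spire.Doc dependency; PictureFitSketch and fit are hypothetical names.

```java
// Hypothetical helper: fit a picture inside a bounding box, keeping its aspect ratio.
class PictureFitSketch {

    // Returns {width, height} scaled down to fit within maxWidth x maxHeight.
    static float[] fit(float width, float height, float maxWidth, float maxHeight) {
        float scale = Math.min(1f, Math.min(maxWidth / width, maxHeight / height));
        return new float[] {width * scale, height * scale};
    }

    public static void main(String[] args) {
        // A tall picture: the height ratio (0.15) is tighter than the width ratio (0.5).
        float[] size = fit(1000f, 2000f, 500f, 300f);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

With Spire.Doc, the two values would then be passed to picture.setWidth and picture.setHeight.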

The exported result looks like this:

Later, when I have time, I will experiment with exporting formulas through POI.

Reference link

https://www.e-iceblue.cn/spiredocforjavatext/set-character-format-in-word-in-java.html

Posted by pup200 on Sat, 30 Oct 2021 07:58:39 -0700