Installing OpenOffice on Linux and Windows to convert Excel and Word files to PDF or HTML

Keywords: cmake RPM Windows Excel

I. Preparation (downloading the software, etc.)

1. Download OpenOffice from http://www.openoffice.org/zh-cn/download/ and pick the build you need (Windows, Linux, and so on).

2. Besides OpenOffice, we also need pdf2htmlEX. Download it from https://github.com/coolwanglu/pdf2htmlEX/ ; the detailed build steps are at https://github.com/coolwanglu/pdf2htmlEX/wiki/Building and use Fedora as the example (CentOS is similar). Most dependencies can be installed with yum instead of being built from source one by one.

pdf2htmlEX depends on fontforge 2.0 or later. The fontforge version in the yum repository is older than 2.0, so fontforge has to be installed manually: download it from https://github.com/coolwanglu/fontforge/tree/pdf2htmlEX and follow the instructions in its INSTALL-Git.md file step by step. (On Windows you can skip this step and download a prebuilt Windows version directly; the download link I share at the end also includes the Windows build of pdf2htmlEX.)

II. Installation

1. Install openoffice

Extract the downloaded tar package and go to the RPMS directory. It contains many RPM packages; install them all at once:


	rpm -ivh *.rpm
When the installation is complete, go into the desktop-integration folder:

[zzq@weekend110 RPMS]$ cd desktop-integration/
List the packages available for the different distributions:
[zzq@weekend110 desktop-integration]$ ll
total 2004
-rw-rw-r--. 1 zzq zzq 469674 Sep 26  2016 openoffice4.1.3-freedesktop-menus-4.1.3-9783.noarch.rpm
-rw-rw-r--. 1 zzq zzq 490143 Sep 26  2016 openoffice4.1.3-mandriva-menus-4.1.3-9783.noarch.rpm
-rw-rw-r--. 1 zzq zzq 541506 Sep 26  2016 openoffice4.1.3-redhat-menus-4.1.3-9783.noarch.rpm
-rw-rw-r--. 1 zzq zzq 544148 Sep 26  2016 openoffice4.1.3-suse-menus-4.1.3-9783.noarch.rpm

My CentOS is closely related to Red Hat, so I installed the redhat package.

[zzq@weekend110 desktop-integration]$ sudo rpm -ivh openoffice4.1.3-redhat-menus-4.1.3-9783.noarch.rpm 
[sudo] password for zzq: 
Preparing...                          ################################# [100%]
	file /usr/bin/soffice from install of openoffice4.1.3-redhat-menus-4.1.3-9783.noarch conflicts with file from package libreoffice-core-1:5.0.6.2-3.el7.x86_64
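The conflict reported above means that /usr/bin/soffice on this machine already belongs to the libreoffice-core package, so the menus package was not installed. Either way, start the office service in headless mode like this: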

[zzq@weekend110 desktop-integration]$ soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
[1] 23269
[zzq@weekend110 desktop-integration]$ Warning: -headless is deprecated.  Use --headless instead.
Warning: -accept=socket,host=127.0.0.1,port=8100;urp; is deprecated.  Use --accept=socket,host=127.0.0.1,port=8100;urp; instead.
Warning: -nofirststartwizard is deprecated.  Use --nofirststartwizard instead.
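
The warnings above ask for the double-dash spelling of each option. As a minimal sketch, the same start command could be assembled from Java with ProcessBuilder. The class SofficeLauncher and its methods are hypothetical and not part of the shared code; the sketch assumes soffice is on the PATH and accepts the double-dash options named in the warnings (a binary that does not print these warnings may expect the single-dash forms instead).

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original code.
class SofficeLauncher {
    // Builds the argument list using the option spellings suggested by the warnings.
    static List<String> command(String host, int port) {
        return Arrays.asList(
                "soffice",
                "--headless",
                "--accept=socket,host=" + host + ",port=" + port + ";urp;",
                "--nofirststartwizard");
    }

    // Starts the service as a child process; assumes soffice is on the PATH.
    static Process start(String host, int port) throws IOException {
        return new ProcessBuilder(command(host, port)).inheritIO().start();
    }

    public static void main(String[] args) {
        // Only prints the command; call start("127.0.0.1", 8100) to actually launch it.
        System.out.println(String.join(" ", command("127.0.0.1", 8100)));
    }
}
```

Because ProcessBuilder passes each list element as one argument, the semicolons in the accept string need no shell quoting.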

Check that the process is running:
[zzq@weekend110 desktop-integration]$ 
[zzq@weekend110 ~]$ ps -ef | grep 8100
zzq       23286      1  0 11:22 ?        00:00:00 /usr/lib64/libreoffice/program/soffice.bin -headless -accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard
zzq       27004  25312  0 11:31 pts/2    00:00:00 grep --color=auto 8100

If the soffice process is listed, the service has started successfully. (On Windows, check the port with netstat -ano | findstr "8100".)
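
The same check can be done from code before attempting a conversion. The following is a minimal sketch, assuming the service listens on 127.0.0.1:8100 as in the start command above; the class PortCheck is a hypothetical helper and not part of the shared code. It only verifies that something accepts TCP connections on the port, not that it is really the office service.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the original code.
class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unreachable host.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("127.0.0.1", 8100, 1000);
        System.out.println(up
                ? "port 8100 is accepting connections"
                : "nothing is listening on port 8100");
    }
}
```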

2. Install pdf2htmlEX

Download the zip archives from GitHub and extract them. fontforge-pdf2htmlEX must be installed first.

[zzq@weekend110 software]$ unzip -o -d ~/app/  pdf2htmlEX-master.zip
[zzq@weekend110 software]$ unzip -o -d ~/app/ fontforge-pdf2htmlEX.zip 
[zzq@weekend110 software]$ cd ~/app
[zzq@weekend110 app]$ ll
total 12
drwxr-xr-x. 11 zzq zzq  204 Apr 13 14:53 apache-activemq-5.11.1
drwxr-xr-x.  9 zzq zzq  160 Mar  9 13:50 apache-tomcat-7.0.76
drwxr-xr-x. 10 zzq zzq  258 Apr 13 16:57 FastDFS
drwxrwxr-x. 33 zzq zzq 4096 Mar 22  2014 fontforge-pdf2htmlEX
drwxr-xr-x.  8 zzq zzq  233 Apr 11  2015 jdk1.7.0_80
drwxr-xr-x. 10 zzq zzq  187 Apr 12 01:19 jprofiler10.0.1
drwxrwxr-x.  4 zzq zzq  124 Feb 27  2015 libfastcommon-master
drwxr-xr-x.  8 zzq zzq  113 Jul 10  2015 nexus-2.11.4-01
drwxrwxr-x.  8 zzq zzq 4096 Jul 18 11:54 pdf2htmlEX-master
-rwxrw-r--.  1 zzq zzq   90 Apr  9 18:24 run.sh
drwxr-xr-x.  3 zzq zzq   37 Jul 10  2015 sonatype-work
drwxrwxr-x.  5 zzq zzq   49 Sep 26  2016 zh-CN

Enter the fontforge-pdf2htmlEX directory and run the following commands in order:

./autogen.sh  
./configure
make
make install

Note: the fontforge build tools must be installed first. If autogen.sh stops with

 ERROR: libtoolize failed

install libtool:

sudo yum install libtool* -y

Now build fontforge-pdf2htmlEX:
[zzq@weekend110 fontforge-pdf2htmlEX]$ sudo ./autogen.sh 
Preparing the fontforge build system...please wait

Found GNU Autoconf version 2.69
Found GNU Automake version 1.13.4
Found GNU Libtool version 2.4.2

Automatically preparing build ... done

The fontforge build system is now prepared.  To build here, run:
  ./configure
  make


[zzq@weekend110 fontforge-pdf2htmlEX]$ ./configure
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking how to create a pax tar archive... gnutar
checking whether to enable maintainer-specific portions of Makefiles... yes
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E

configure stops with an error here because glib and gio are still missing. Install them and run the step again:
configure: error: Package requirements (glib-2.0 >= 2.6 gio-2.0) were not met:

No package 'glib-2.0' found
No package 'gio-2.0' found

Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables GLIB_CFLAGS
and GLIB_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.
[zzq@weekend110 fontforge-pdf2htmlEX]$ make 
make: *** No targets specified and no makefile found.  Stop.
[zzq@weekend110 fontforge-pdf2htmlEX]$ sudo yum install glib* gio* freetype*  pango* -y


After the packages are installed, run ./configure again so that a Makefile is generated, then build and install:

[zzq@weekend110 fontforge-pdf2htmlEX]$ make && make install

fontforge-pdf2htmlEX is now installed.
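
Most of the failures above come down to reading which packages pkg-config could not find. As an illustrative sketch only, the missing names could be pulled out of a saved configure log with a regular expression. The class MissingPackages is hypothetical, and the pattern is based solely on the "No package '...' found" lines shown in the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class MissingPackages {
    // Matches lines such as: No package 'glib-2.0' found
    private static final Pattern NO_PACKAGE =
            Pattern.compile("No package '([^']+)' found");

    // Returns each missing package name once, in order of first appearance.
    static List<String> find(String output) {
        List<String> missing = new ArrayList<>();
        Matcher matcher = NO_PACKAGE.matcher(output);
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!missing.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String log = "configure: error: Package requirements (glib-2.0 >= 2.6 gio-2.0) were not met:\n"
                + "\n"
                + "No package 'glib-2.0' found\n"
                + "No package 'gio-2.0' found\n";
        System.out.println(find(log)); // prints [glib-2.0, gio-2.0]
    }
}
```

The pkg-config module name is not always the yum package name, so the result is a hint for what to search for rather than a ready-made install list.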


With fontforge in place, build pdf2htmlEX itself:

[zzq@weekend110 pdf2htmlEX-master]$ cmake .
-- checking for module 'poppler>=0.25.0'
--   package 'poppler>=0.25.0' not found
CMake Error at /usr/share/cmake/Modules/FindPkgConfig.cmake:279 (message):
  A required package was not found
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPkgConfig.cmake:333 (_pkg_check_modules_internal)
  CMakeLists.txt:22 (pkg_check_modules)


-- checking for module 'cairo>=1.10.0'
--   package 'cairo>=1.10.0' not found
CMake Error at /usr/share/cmake/Modules/FindPkgConfig.cmake:279 (message):
  A required package was not found
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPkgConfig.cmake:333 (_pkg_check_modules_internal)
  CMakeLists.txt:28 (pkg_check_modules)


Trying to locate cairo-svg...
CMake Error at CMakeLists.txt:47 (message):
  Error: no SVG support found in Cairo


-- Configuring incomplete, errors occurred!
See also "/home/zzq/app/pdf2htmlEX-master/CMakeFiles/CMakeOutput.log".
[zzq@weekend110 pdf2htmlEX-master]$ 


cmake fails here. Install the poppler packages and export PKG_CONFIG_PATH before running it again:
[root@weekend110 pdf2htmlEX-master]# yum install poppler* -y 
[root@weekend110 pdf2htmlEX-master]#  export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig 

Run cmake . again. The last step is to build and install:

[root@weekend110 pdf2htmlEX-master]# make && make install

After installation, let's check the version with the command.

[root@weekend110 pdf2htmlEX-master]# pdf2htmlEX -v
pdf2htmlEX: error while loading shared libraries: libfontforge.so.2: cannot open shared object file: No such file or directory
The version check fails, and conversions fail with the same error, because the loader cannot find the shared library. The fix is simple: export the library path. To make it permanent, run vi /etc/profile and add the export line at the end.
[root@weekend110 pdf2htmlEX-master]# export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
[root@weekend110 pdf2htmlEX-master]# pdf2htmlEX -v
pdf2htmlEX version 0.14.6
Copyright 2012-2015 Lu Wang <coolwanglu@gmail.com> and other contributors
Libraries: 
  poppler 0.26.5
  libfontforge 20170718
  cairo 1.14.2
Default data-dir: /usr/local/share/pdf2htmlEX
Supported image format: png jpg svg
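
When pdf2htmlEX is started from Java rather than from a login shell, the exported LD_LIBRARY_PATH may not be inherited. The sketch below sets it on the child process explicitly. It is an assumption-laden illustration, not the Pdf2htmlEXUtil from the shared code: the class Pdf2HtmlLauncher is hypothetical, /usr/local/lib is taken from the export above, and the command uses pdf2htmlEX's --dest-dir option followed by the input PDF and the output HTML name.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper, not the author's Pdf2htmlEXUtil.
class Pdf2HtmlLauncher {
    // Prepends /usr/local/lib to an existing LD_LIBRARY_PATH value, if any.
    static String withLocalLib(String current) {
        return (current == null || current.isEmpty())
                ? "/usr/local/lib"
                : "/usr/local/lib:" + current;
    }

    // Prepares (but does not start) the pdf2htmlEX process.
    static ProcessBuilder build(String pdfPath, String destDir, String htmlName) {
        ProcessBuilder builder = new ProcessBuilder(Arrays.asList(
                "pdf2htmlEX", "--dest-dir", destDir, pdfPath, htmlName));
        Map<String, String> env = builder.environment();
        env.put("LD_LIBRARY_PATH", withLocalLib(env.get("LD_LIBRARY_PATH")));
        builder.redirectErrorStream(true); // merge stderr into stdout
        return builder;
    }

    public static void main(String[] args) {
        // Paths are placeholders; call build(...).start() to run the conversion.
        ProcessBuilder builder = build("/tmp/1.pdf", "/tmp/out", "1.html");
        System.out.println(String.join(" ", builder.command()));
        System.out.println("LD_LIBRARY_PATH=" + builder.environment().get("LD_LIBRARY_PATH"));
    }
}
```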

Result of running the jar after a successful installation:

[screenshot]

Result of running it on Windows:

[screenshot]

This is the main class, Test.java:

package util;

import java.io.File;

public class Test {
	public static void main(String[] args) {
		// Example arguments:
		// {"C:\\Users\\sevnce\\Desktop\\exceljs\\test.xlsx", "e:\\office\\1.pdf", "pdfhtm.html"}
		System.out.println("args length : " + args.length);
		// Three arguments are expected: the office file to convert, the path and
		// name of the PDF to produce, and the name of the HTML file to produce.
		if (args.length < 3) {
			System.out.println("usage: Test <office file> <pdf path> <html name>");
			return;
		}
		office2pdf2html(args);
		//office2html();
	}

	public static void office2pdf2html(String[] args) {
		String sourceFile = args[0]; // e.g. "C:\\Users\\sevnce\\Desktop\\exceljs\\test.xlsx"
		String destFile = args[1];   // e.g. "e:\\office\\1.pdf"
		String htmlFile = args[2];   // e.g. "pdfhtmlc.html"
		// Step 1: office file to PDF
		int result = Office2PDFUtil.office2PDF(sourceFile, destFile);
		if (result == 0) {
			System.out.println("office to PDF succeeded");
			// Step 2: PDF to HTML
			if (Pdf2htmlEXUtil.pdf2html(destFile, htmlFile)) {
				System.out.println("PDF to HTML succeeded");
			} else {
				System.out.println("PDF to HTML failed");
			}
		} else if (result == -1) {
			System.out.println("source file not found, or url.properties is misconfigured");
		} else {
			System.out.println("office to PDF failed");
		}
		//System.out.println(ClearHtml2Div.clearFormat(htmlStr, docImgPath));
	}

	// Alternative: convert an office file directly to an HTML string.
	public static void office2html() {
		System.out.println(Doc2HtmlUtil.office2HtmlString(new File("C:\\Users\\sevnce\\Desktop\\exceljs\\test.xlsx"), "e:/office/test"));
	}
}

Download link for the materials and code: http://download.csdn.net/detail/baidu_19473529/9903553



Contents of the download: fontforge-pdf2htmlEX.zip is the dependency needed to install pdf2htmlEX on Linux; jodconverter-2.2.2.zip contains the required lib packages; pdf2htmlEX-master.zip is pdf2htmlEX for Linux; pdf2htmlEX-win32-0.14.6-upx-with-poppler-data.zip is the pdf2htmlEX program for Windows; the remaining zip is the source code, which wraps the methods for converting Excel, Word and other documents to PDF and HTML.


In conclusion, installing pdf2htmlEX on Linux has many pitfalls, mostly missing dependencies, but reading the error messages carefully usually resolves them quickly. I hope this helps you with OpenOffice.


Posted by ffdave77 on Wed, 12 Dec 2018 21:54:06 -0800