Converting Word documents to PDF and PDF files to JPG on Ubuntu

Keywords: Python pip

Environment setup

The language used is Python 3.

Install imagemagick (the PDF to JPG conversion calls this tool internally)

    apt-get install imagemagick

Install libreoffice (this tool is used to convert Word documents into PDF files)

    apt-get install libreoffice

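Install the Python libraries wand and Pillow (Pillow is the maintained package that provides the PIL module; there is no installable package named PIL for Python 3)

    pip install wand

    pip install Pillow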
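
Before running the conversions, it can help to confirm that both command-line tools are on the PATH. The following is a minimal sketch and not part of the original setup. The helper name missing_tools is hypothetical, and it assumes that ImageMagick is reachable as convert and LibreOffice as libreoffice, which may differ on your system.

```python
import shutil


def missing_tools(names):
    """Return the command names from `names` that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == '__main__':
    # Assumed command names: 'convert' (ImageMagick) and 'libreoffice'.
    missing = missing_tools(['convert', 'libreoffice'])
    if missing:
        print('Not found on PATH: %s' % ', '.join(missing))
    else:
        print('All tools found.')
```

An empty result means every listed command was found.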

  

PDF to JPG

The PDF is converted to PNG first and then to JPG. This avoids a black or transparent background, which would make the converted pictures look different from the PDF file.
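
To see why the background matters, consider a single pixel. The sketch below is plain Python arithmetic, not the Pillow or ImageMagick implementation, and the function name flatten_pixel is made up for illustration. It applies the standard "over" compositing formula to one RGBA pixel and compares the result with simply dropping the alpha channel.

```python
def flatten_pixel(rgba, background=(255, 255, 255)):
    """Composite one RGBA pixel over an opaque background color.

    Uses out = src * alpha + background * (1 - alpha), with alpha scaled to 0..1.
    """
    r, g, b, a = rgba
    alpha = a / 255.0
    return tuple(
        int(round(src * alpha + bg * (1.0 - alpha)))
        for src, bg in zip((r, g, b), background)
    )


# A fully transparent pixel often stores black in its color channels.
transparent = (0, 0, 0, 0)

# Dropping the alpha channel keeps the stored color, which is black here.
dropped = transparent[:3]

# Compositing over white first gives a white pixel instead.
flattened = flatten_pixel(transparent)
```

JPG has no alpha channel, so the page has to be composited onto a white background before it is saved.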

    import os

    from PIL import Image as Image2
    from wand.image import Image
    from wand.color import Color

    # static_host is not defined in the original post. It is assumed here to be
    # the base URL under which the static/ directory is served (placeholder value).
    static_host = 'http://example.com'

    def convert_pdf_to_jpg(filename):
        # File name without directory or extension
        title = os.path.splitext(os.path.basename(filename))[0]

        # resolution is the rendering resolution, background is the background color
        with Image(filename=filename, resolution=150, background=Color('White')) as img:

            # The number of pages
            length = len(img.sequence)

            # If there is more than one page, the page index is appended to each
            # generated file name in turn. static/local_images must already exist.
            with img.convert('png') as converted:
                path = 'static/local_images/%s.png' % title
                converted.save(filename=path)

        # Collect the PNG files written above
        image_list = []
        if length == 1:
            image_list.append('static/local_images/%s.png' % title)
        else:
            for i in range(length):
                image_list.append('static/local_images/%s-%d.png' % (title, i))

        jpg_list = []
        for png_path in image_list:
            image = Image2.open(png_path)

            # JPG has no alpha channel, so paste the page onto a white canvas
            # first; the RGBA image serves as its own transparency mask
            image = image.convert('RGBA')
            background = Image2.new('RGBA', image.size, (255, 255, 255, 255))
            background.paste(image, (0, 0), image)
            image = background.convert('RGB')

            # splitext keeps dots inside the file name intact
            name = os.path.splitext(png_path)[0] + '.jpg'
            image.save(name)
            os.remove(png_path)
            jpg_list.append('%s/%s' % (static_host, name))

        return jpg_list
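
The file names the function above expects can be isolated into a small helper. This is a sketch under the same assumption the function makes: saving a multi-page image as PNG produces one file per page with a zero-based -N suffix, and a single unsuffixed file for a one-page PDF. The name expected_png_paths and the directory parameter are hypothetical.

```python
import os


def expected_png_paths(pdf_path, page_count, directory='static/local_images'):
    """List the PNG paths expected for a PDF with `page_count` pages."""
    # Strip the directory and the extension, keeping any dots inside the name.
    title = os.path.splitext(os.path.basename(pdf_path))[0]
    if page_count == 1:
        return ['%s/%s.png' % (directory, title)]
    return ['%s/%s-%d.png' % (directory, title, i) for i in range(page_count)]
```

Keeping the naming rule in one place makes it easier to adjust if the installed ImageMagick version names its output files differently.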

 

Word document to PDF

Python has no library that converts Word documents to PDF directly. The workaround is to install LibreOffice first and then call the libreoffice command through the system function of the os library.

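    import os
    import shlex

    def convert_doc_to_pdf(filename):
        # Quote the file name so that paths containing spaces reach the shell intact
        cmd = 'libreoffice --convert-to pdf %s' % shlex.quote(filename)
        os.system(cmd)

        # LibreOffice writes the PDF to the current working directory,
        # using the source file name with a .pdf extension
        name = os.path.splitext(os.path.basename(filename))[0] + '.pdf'
        return name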
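
A possible variant, offered as a sketch rather than the method used above: building the command as an argument list and running it with subprocess sidesteps shell quoting altogether and makes failures visible. --headless and --outdir are LibreOffice command-line options. The function names and the default outdir are hypothetical.

```python
import os
import subprocess


def build_convert_command(filename, outdir='.'):
    """Build the LibreOffice argument list for converting `filename` to PDF."""
    return ['libreoffice', '--headless', '--convert-to', 'pdf',
            '--outdir', outdir, filename]


def convert_doc_to_pdf_safe(filename, outdir='.'):
    """Convert a Word document to PDF and return the expected output path."""
    # check_call raises CalledProcessError if LibreOffice exits with a non-zero status.
    subprocess.check_call(build_convert_command(filename, outdir))
    base = os.path.splitext(os.path.basename(filename))[0]
    return os.path.join(outdir, base + '.pdf')
```

Passing outdir explicitly means the caller knows where the PDF lands instead of depending on the current working directory.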

Posted by TheLostGuru on Tue, 05 Feb 2019 23:27:16 -0800