Preface
One day, while working in the admin backend, our operations staff pointed out that every time they upload a ppt, pdf or word file, they first have to export each page as an image and then upload the images one by one (the png images are used for preview; the ppt, pdf and word source files cannot be downloaded directly). They said this was far too slow and asked whether they could simply upload the file itself. I agreed that converting by hand on every upload is inefficient, because a single file can produce dozens of images.
In the end, with help from GitHub and other people's blog posts, I solved the automatic image conversion problem. This is the first Python script I have written, so it will contain mistakes. Corrections are welcome~
This article uses Python 3.9.5.
The script requires Windows with Microsoft Office installed.
Script approach
Operators upload ppt, pdf and word files, and the file URLs are stored in the database. The script reads those records and then, for each file: connect to the remote file -> download it locally -> convert it to images -> upload the images to cloud storage -> obtain the remote image URLs -> save them to the database.
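The flow above can be sketched as one loop in which every stage is a plain function passed in by the caller. This is only an illustrative skeleton: `run_pipeline` and the stage names `download`, `convert`, `upload` and `save` are hypothetical, not the names used in the script, and the lambdas in the usage are stubs standing in for the real work.

```python
from typing import Callable, Iterable, List


def run_pipeline(
    rows: Iterable[dict],
    download: Callable[[str], str],
    convert: Callable[[str], List[str]],
    upload: Callable[[List[str]], List[str]],
    save: Callable[[int, List[str]], None],
) -> List[int]:
    """Run every row through the four stages; return the ids of rows that failed."""
    failed = []
    for row in rows:
        try:
            local_file = download(row["url"])  # remote file -> local copy
            images = convert(local_file)       # local copy -> page images
            image_urls = upload(images)        # page images -> remote URLs
            save(row["id"], image_urls)        # remote URLs -> database
        except Exception as exc:
            # One bad file should not stop the rest of the batch.
            print("failed:", row["url"], exc)
            failed.append(row["id"])
    return failed


if __name__ == "__main__":
    saved = {}
    failed = run_pipeline(
        [{"id": 1, "url": "http://example.com/a.pptx"}],
        download=lambda url: "a.pptx",
        convert=lambda path: ["slide1.png", "slide2.png"],
        upload=lambda imgs: ["http://cdn.example.com/" + i for i in imgs],
        save=lambda row_id, urls: saved.update({row_id: urls}),
    )
    print(saved, failed)
```

Passing the stages in as functions keeps the loop testable without Office or a database; whether that is worth the indirection for a small internal script is a judgment call.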
Connect to the database and query the files to convert
import pymysql


def connectDatabase():
    # Log in to the database; if 127.0.0.1 does not work, try host='localhost'
    conn = pymysql.connect(host='127.0.0.1', user='root', password='',
                           database='pic', port=3306)
    # DictCursor returns each row as a dict keyed by column name
    cur = conn.cursor(pymysql.cursors.DictCursor)
    return {"conn": conn, "cur": cur}


# Get the collection of files to be converted
def getUrlArr(cur):
    sql = 'select * from pic'  # Write your own SQL statement
    cur.execute(sql)           # Run the query once; errors propagate to the caller
    return cur.fetchall()      # Every row, as a list of dicts with DictCursor
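A query like `select * from pic` returns every row on every run, including files that were already converted. One possible way to pick up only new uploads is a status column. The sketch below uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs anywhere; the `converted` column and the names `fetch_pending` and `demo_connection` are assumptions for illustration, not part of the original schema or script. With pymysql the placeholder would be `%s` rather than `?`.

```python
import sqlite3


def fetch_pending(conn: sqlite3.Connection) -> list:
    """Return the rows that still need converting, as plain dicts."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    cur = conn.execute("SELECT id, url FROM pic WHERE converted = ?", (0,))
    return [dict(row) for row in cur.fetchall()]


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory table shaped like the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pic (id INTEGER PRIMARY KEY, url TEXT, converted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pic (id, url, converted) VALUES (?, ?, ?)",
        [(1, "http://example.com/a.pptx", 0), (2, "http://example.com/b.pdf", 1)],
    )
    return conn


if __name__ == "__main__":
    print(fetch_pending(demo_connection()))
```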
Download files locally
import os
import wget  # pip install wget


# Download a file to the local working directory
def downLoad(url):
    print('----url-----', url)
    filename = ''
    try:
        # splitext keeps the real extension even when the name contains several dots
        suffix = os.path.splitext(os.path.basename(url))[1]
        filename = "miaohui" + suffix
        if os.path.exists(filename):  # If the file exists, delete it first
            os.remove(filename)
        wget.download(url, filename)
    except IOError:
        print('Download failed', url)
    else:
        print('\n')
        print('Download succeeded', url)
    return filename
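Deriving the extension from a URL is easy to get wrong: taking the second dot-separated piece of `report.v2.pptx` yields `v2`, and a query string such as `?token=abc` ends up inside the suffix. A minimal sketch of a more defensive version follows, using only the standard library. `local_name_for` is a hypothetical helper, and the default stem `miaohui` is simply the fixed local name the script uses.

```python
import os
from urllib.parse import unquote, urlparse


def local_name_for(url: str, stem: str = "miaohui") -> str:
    """Build the local file name from the extension in the URL's path."""
    path = unquote(urlparse(url).path)  # drop ?query and #fragment, decode %xx
    suffix = os.path.splitext(path)[1]  # ".pptx", or "" when there is none
    if not suffix:
        raise ValueError("URL has no file extension: " + url)
    return stem + suffix.lower()


if __name__ == "__main__":
    print(local_name_for("https://example.com/files/report.v2.PPTX?token=abc"))
```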
Convert ppt to images
# pip install pywin32
import os
import win32com.client


# Initialize PowerPoint
def init_powerpoint():
    # Alternative: comtypes.client.CreateObject("Powerpoint.Application")
    powerpoint = win32com.client.Dispatch('PowerPoint.Application')
    powerpoint.Visible = 1
    return powerpoint


# PPT to png
# downLoad_path is a module-level variable (the output folder name) defined elsewhere in the script
def ppt2png(url, pptFileName, powerpoint):
    try:
        ppt_path = os.path.abspath(pptFileName)
        ppt = powerpoint.Presentations.Open(ppt_path)
        # Save as images
        img_path = os.path.abspath(downLoad_path + '.png')
        ppt.SaveAs(img_path, 18)  # 18 = PNG; use 17 to save as JPG
        # Close the open ppt file
        ppt.Close()
    except Exception:  # COM failures are raised as pywintypes.com_error, not IOError
        print('PPT to png failed', url)
    else:
        print('PPT to png succeeded', url)
Convert pdf to images
# pip install PyMuPDF
import os
import fitz


# pdf to images
def pdf2png(_url, pptFileName):
    imagePath = os.path.abspath(downLoad_path)
    try:
        pdfDoc = fitz.open(pptFileName)
        if not os.path.exists(imagePath):  # Create the image folder if it does not exist
            os.makedirs(imagePath)
        # Without a matrix, PyMuPDF renders at 72 dpi (792x612 for the sample page).
        # A zoom of 1.33333333 on each axis corresponds to 96 dpi:
        # (1.33333333 --> 1056x816) (2 --> 1584x1224)
        zoom_x = 1.33333333
        zoom_y = 1.33333333
        rotate = 0
        mat = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)
        for pg in range(pdfDoc.page_count):  # page_count replaces the older pageCount
            page = pdfDoc[pg]
            pix = page.get_pixmap(matrix=mat, alpha=False)
            pix.save(imagePath + '/' + 'slide%s.png' % pg)  # Write the image to the folder
        pdfDoc.close()
    except IOError:
        print('pdf to png failed', _url)
    else:
        print('pdf to png succeeded', _url)
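The zoom values in the comments follow from simple arithmetic: PDF page sizes are measured in points at 72 per inch, so a target dpi maps to a zoom of `dpi / 72`, and the pixel size is the point size times the zoom. The sketch below only reproduces that arithmetic and does not call PyMuPDF; `zoom_for_dpi` and `pixel_size` are hypothetical helper names, and the real pixmap may differ by a pixel because of rounding.

```python
def zoom_for_dpi(dpi: float) -> float:
    """PDF user space is 72 points per inch, so zoom = dpi / 72."""
    return dpi / 72.0


def pixel_size(width_pt: float, height_pt: float, zoom: float) -> tuple:
    """Approximate rendered size in pixels for a page measured in points."""
    return round(width_pt * zoom), round(height_pt * zoom)


if __name__ == "__main__":
    zoom = zoom_for_dpi(96)
    print(zoom, pixel_size(792, 612, zoom))
```

Choosing the zoom from a target dpi makes it easier to trade image sharpness against upload size than hard-coding `1.33333333`.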
Convert word to images
Converting Word to images takes two steps: first convert the Word file to pdf, then convert the pdf to images.
import os
from win32com.client import Dispatch


# word to pdf
def word2pdf(word_file):
    '''
    Convert a Word file to a pdf file
    :param word_file: absolute path of the Word file
    :return: file name of the generated pdf
    '''
    # Get the Word application object
    word = Dispatch('Word.Application')
    # Open the file as a Doc object
    doc_ = word.Documents.Open(word_file)
    # Swap only the extension; str.replace would also change "doc" elsewhere in the path
    pdf_file = os.path.splitext(word_file)[0] + '.pdf'
    doc_.SaveAs(pdf_file, FileFormat=17)  # 17 = wdFormatPDF
    print(word_file, '----converted to pdf')
    # Close the Doc object
    doc_.Close()
    # Quit Word
    word.Quit()
    return os.path.basename(pdf_file)
Then call pdf2png above on the generated pdf.
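If `SaveAs` fails partway through, `word.Quit()` is never reached and a hidden Word process can be left running. One possible guard is a small context manager that always quits the application. `office_app` is a hypothetical helper, and `FakeApp` exists only so the sketch runs without Office; on Windows the factory would be something like `lambda: Dispatch('Word.Application')`.

```python
from contextlib import contextmanager


@contextmanager
def office_app(factory):
    """Create an Office application object and always call Quit() on exit."""
    app = factory()
    try:
        yield app
    finally:
        app.Quit()


class FakeApp:
    """Stand-in for a COM application object, used only for this demo."""

    def __init__(self):
        self.quit_called = False

    def Quit(self):
        self.quit_called = True


if __name__ == "__main__":
    fake = FakeApp()
    try:
        with office_app(lambda: fake) as app:
            raise RuntimeError("simulated conversion failure")
    except RuntimeError:
        pass
    print(fake.quit_called)
```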
Upload to object storage
I won't post the upload code here. We use Huawei Cloud OBS. Alibaba Cloud, Tencent Cloud and other object storage services each provide their own Python SDK, so integrating them is also straightforward.
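Whichever SDK is used, the images should be uploaded in page order. A plain alphabetical sort puts `slide10.png` before `slide2.png`, so the sketch below sorts by the number embedded in each file name. `list_page_images` is a hypothetical helper, and it assumes every exported file name contains its page number.

```python
import os
import re
import tempfile


def list_page_images(folder: str) -> list:
    """Return the .png/.jpg files in folder, ordered by the number in their name."""

    def page_number(name: str) -> int:
        match = re.search(r"(\d+)", name)
        return int(match.group(1)) if match else 0

    names = [n for n in os.listdir(folder) if n.lower().endswith((".png", ".jpg"))]
    return sorted(names, key=page_number)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ("slide10.png", "slide2.png", "slide1.png", "notes.txt"):
            open(os.path.join(folder, name), "w").close()
        print(list_page_images(folder))
```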
Finally, put everything together
import os
import time
from urllib.parse import unquote

# removeFileInFirstDir, uploadImg and setData are helper functions of the script that are not shown in this article

if __name__ == '__main__':
    connect = connectDatabase()
    powerpoint = init_powerpoint()
    downArr = getUrlArr(connect['cur'])
    for i in downArr:
        if os.path.exists('./' + downLoad_path):
            removeFileInFirstDir('./' + downLoad_path)
        _url = unquote(i['url'])
        id = i['id']
        pptFileName = downLoad(_url)  # Download the file
        if '.pdf' in _url:
            pdf2png(_url, pptFileName)
        elif '.doc' in _url:
            _file = os.path.abspath(pptFileName)
            pdfName = word2pdf(_file)
            pdf2png(_url, pdfName)
        else:
            ppt2png(_url, pptFileName, powerpoint)  # Convert to png
        imgArr = uploadImg(_url)  # Upload images to cloud storage and get the remote links
        setData(_url, id, imgArr, connect)  # Save to the database
        time.sleep(2)
        print('\n')
        print('\n')
    connect['cur'].close()   # Close the cursor
    connect['conn'].close()  # Disconnect from the database and free resources
    powerpoint.Quit()
    input("Press Enter to exit")
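The checks `'.pdf' in _url` and `'.doc' in _url` match anywhere in the URL, so a path such as `/my.docs/slides.pptx` would be treated as a Word file. A hedged alternative is to classify by the actual file extension. `pick_kind` and `KIND_BY_SUFFIX` are hypothetical names, and the list of extensions is an assumption about which file types the operators upload.

```python
import os
from urllib.parse import urlparse

# Assumed mapping from file extension to the converter that should handle it.
KIND_BY_SUFFIX = {
    ".pdf": "pdf",
    ".doc": "word",
    ".docx": "word",
    ".ppt": "ppt",
    ".pptx": "ppt",
}


def pick_kind(url: str) -> str:
    """Classify a URL as 'pdf', 'word' or 'ppt' from its file extension."""
    suffix = os.path.splitext(urlparse(url).path)[1].lower()
    try:
        return KIND_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError("unsupported file type: " + (suffix or "(none)"))


if __name__ == "__main__":
    print(pick_kind("http://example.com/my.docs/slides.pptx"))
```

Raising on unknown extensions, instead of falling through to the ppt branch, makes an unexpected upload visible rather than handing it to PowerPoint.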
Because this is an internal tool, you can use pyinstaller to package it into an exe for the operations staff. After they have uploaded their files, running the exe converts them to images automatically in batches.
# py to exe
pyinstaller -c -F -i a.ico ppt_to_img.py
Conclusion
I hope this article helps you. If you spot any problems, please point them out~