A Python script for batch-converting PPT, PDF, and Word files to images

Keywords: Python


One day our content editors, who work in the admin back end, complained that every time they upload a PPT, PDF, or Word file they first have to export each page as an image and then upload the images one by one (the PNGs serve as previews, because the PPT, PDF, and Word source files cannot be downloaded directly). They said this was far too slow and asked whether they could simply upload the file itself. I agreed that converting by hand on every upload was inefficient, since a single file can produce dozens of images.

With help from GitHub and other people's blog posts, I eventually got automatic image conversion working. This is my first Python script, so there are bound to be mistakes. Corrections are welcome~

This article uses Python 3.9.5.
The script requires Windows with Microsoft Office installed.

How the script works

Editors upload PPT, PDF, and Word files, and the file URLs are recorded in the database. The script reads those records and, for each file: downloads it locally -> converts it to images -> uploads the images to cloud storage -> gets the remote image URLs -> stores them in the database.

Connect to the database and query the files to convert

import pymysql

def connectDatabase():
    # Log in to the database; host='localhost' also works for a local server
    conn = pymysql.connect(host='', user='root', password='', database='pic', port=3306)
    # DictCursor returns each row as a dict keyed by column name
    cur = conn.cursor(pymysql.cursors.DictCursor)
    return {'conn': conn, 'cur': cur}

# Get the collection of files to be converted
def getUrlArr(cur):
    sql = 'select * from pic'  # Write your own SQL statement
    cur.execute(sql)
    return cur.fetchall()

Download files locally

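# Download a file to the local working directory
import os
import urllib.request

def downLoad(url):
    try:
        suffix = os.path.splitext(os.path.basename(url))[1]  # e.g. '.pptx'
        filename = "miaohui" + suffix
        if os.path.exists(filename):  # If the file exists, delete it first
            os.remove(filename)
        # Assumption: the download call is missing from this listing;
        # urllib.request.urlretrieve is one standard-library way to fetch the file
        urllib.request.urlretrieve(url, filename)
    except IOError:
        print('Download failed', url)
    else:
        print('Download succeeded', url)
        return filename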

PPT to images

# pip install pywin32
import os
import win32com.client

# Initialize PowerPoint
def init_powerpoint():
    powerpoint = win32com.client.Dispatch('PowerPoint.Application')  # or comtypes.client.CreateObject("Powerpoint.Application")
    powerpoint.Visible = 1
    return powerpoint

# PPT to PNG
# downLoad_path is a module-level variable naming the output folder; its definition is not shown here
def ppt2png(url, pptFileName, powerpoint):
    try:
        ppt_path = os.path.abspath(pptFileName)
        ppt = powerpoint.Presentations.Open(ppt_path)
        # Save as pictures: PowerPoint exports one image per slide
        img_path = os.path.abspath(downLoad_path + '.png')
        ppt.SaveAs(img_path, 18)  # 18 = PNG; use 17 to save in JPG format
        # Close the open ppt file
        ppt.Close()
    except Exception:  # COM failures raise pywintypes.com_error, not IOError
        print('PPT to PNG failed', url)
    else:
        print('PPT to PNG succeeded', url)

PDF to images

# pip install PyMuPDF
import os
import fitz  # PyMuPDF

# PDF to pictures, one PNG per page
# downLoad_path is a module-level variable naming the output folder; its definition is not shown here
def pdf2png(_url, pptFileName):
    imagePath = os.path.abspath(downLoad_path)
    try:
        pdfDoc = fitz.open(pptFileName)
        if not os.path.exists(imagePath):  # Create the picture folder if it does not exist
            os.makedirs(imagePath)
        for pg in range(pdfDoc.page_count):
            page = pdfDoc[pg]
            rotate = 0
            # A zoom factor of 1.33333333 on each axis renders at 96 dpi instead of the default 72 dpi.
            # Without a matrix, a 792x612 pt page renders at 792x612 pixels (72 dpi).
            zoom_x = 1.33333333  # (1.33333333-->1056x816)   (2-->1584x1224)
            zoom_y = 1.33333333
            mat = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)
            pix = page.get_pixmap(matrix=mat, alpha=False)
            # Write the picture to the specified folder
            pix.save(imagePath + '/' + 'slide%s.png' % pg)
    except Exception:  # PyMuPDF reports unreadable files with its own error types, not IOError
        print('PDF to PNG failed', _url)
    else:
        print('PDF to PNG succeeded', _url)
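
The zoom factors used for the PDF step follow from simple arithmetic: PDF page sizes are measured in points, with 72 points per inch, so rendering at a given dpi means scaling by dpi / 72. The sketch below only illustrates that calculation; `zoom_for_dpi` and `pixel_size` are hypothetical helper names that are not part of the script, and the rounding PyMuPDF applies internally may differ slightly for odd page sizes.

```python
def zoom_for_dpi(dpi):
    """Return the zoom factor that renders a PDF page at the given dpi.

    PDF user space is 72 points per inch, so zoom = dpi / 72.
    """
    return dpi / 72.0


def pixel_size(width_pt, height_pt, dpi):
    """Approximate pixel dimensions of a page rendered at the given dpi."""
    zoom = zoom_for_dpi(dpi)
    return round(width_pt * zoom), round(height_pt * zoom)


# A 792 x 612 point page (landscape US Letter)
print(pixel_size(792, 612, 72))   # (792, 612)
print(pixel_size(792, 612, 96))   # (1056, 816)
print(pixel_size(792, 612, 144))  # (1584, 1224)
```

To get sharper previews, pick the dpi first and pass the resulting factor as both zoom_x and zoom_y.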

Word to images

Word files need an intermediate step: first convert the Word file to PDF, then convert the PDF to images.

# Word to PDF
import os
from win32com.client import Dispatch

def word2pdf(word_file):
    """
    Convert a Word file to a PDF file
    :param word_file: absolute path of the Word file
    :return: file name of the generated PDF
    """
    # Get the Word application object
    word = Dispatch('Word.Application')
    # Open the file as a Doc object
    doc_ = word.Documents.Open(word_file)
    # Save as a PDF file next to the source (17 = wdFormatPDF)
    pdf_file = os.path.splitext(word_file)[0] + '.pdf'
    doc_.SaveAs(pdf_file, FileFormat=17)
    print(word_file, '----converted to PDF')
    # Close the doc object
    doc_.Close()
    # Exit the Word application
    word.Quit()
    return os.path.basename(pdf_file)

Then pass the resulting PDF to pdf2png above.

Upload to object storage

The upload code is not shown here. We use Huawei Cloud OBS; Alibaba Cloud, Tencent Cloud, and other object storage services have their own Python SDKs, which are also easy to integrate.
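
Whichever SDK is used, the upload step first has to collect the generated images in page order, and a plain alphabetical sort would put slide10.png before slide2.png. The sketch below shows one way to handle that, assuming the images are PNG files whose names contain the page number. `sorted_page_images`, `upload_all`, and the `upload_one` callback are hypothetical names; `upload_one` stands in for the real SDK call, which is not shown.

```python
import os
import re
import tempfile


def sorted_page_images(folder):
    """Return the PNG paths in folder, ordered by the number in each file name."""
    def page_number(name):
        match = re.search(r"\d+", name)
        return int(match.group()) if match else -1

    names = [n for n in os.listdir(folder) if n.lower().endswith(".png")]
    return [os.path.join(folder, n) for n in sorted(names, key=page_number)]


def upload_all(folder, upload_one):
    """Call upload_one(path) for each image and collect the returned URLs.

    upload_one is a placeholder for the SDK call of whichever provider is used.
    """
    return [upload_one(path) for path in sorted_page_images(folder)]


if __name__ == "__main__":
    # Demo with empty stand-in files and a fake uploader (no network access)
    with tempfile.TemporaryDirectory() as folder:
        for name in ("slide10.png", "slide2.png", "slide1.png"):
            open(os.path.join(folder, name), "wb").close()
        fake_upload = lambda path: "https://example.com/" + os.path.basename(path)
        print(upload_all(folder, fake_upload))
```

Passing the uploader in as a function keeps the ordering logic independent of the storage provider.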

Finally, put it all together

from urllib.parse import unquote

# uploadImg and setData are not shown in this article
if __name__ == '__main__':
    connect = connectDatabase()
    powerpoint = init_powerpoint()
    downArr = getUrlArr(connect['cur'])
    for i in downArr:
        _url = unquote(i['url'])
        file_id = i['id']
        pptFileName = downLoad(_url)  # Download the file
        if pptFileName is None:  # Download failed, skip this record
            continue
        if '.pdf' in _url:
            pdf2png(_url, pptFileName)  # PDF to PNG
        elif '.doc' in _url:
            _file = os.path.abspath(pptFileName)
            pdfName = word2pdf(_file)  # Word to PDF first
            pdf2png(_url, pdfName)  # then PDF to PNG
        else:
            ppt2png(_url, pptFileName, powerpoint)  # PPT to PNG
        imgArr = uploadImg(_url)  # Upload pictures to cloud storage and get the remote links
        setData(_url, file_id, imgArr, connect)  # Save to the database
    connect['cur'].close()    # Close the cursor
    connect['conn'].close()   # Disconnect from the database and free resources
    input("Press Enter to exit")
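
The substring checks on `_url` in the loop above can pick the wrong branch when ".pdf" or ".doc" appears elsewhere in the URL. A minimal sketch of choosing the route from the real file extension is shown below, using only the standard library. `ROUTES` and `route_for` are hypothetical names, and the list of extensions is an assumption about which file types are uploaded.

```python
import os
from urllib.parse import unquote, urlsplit

# Hypothetical mapping from file extension to conversion route
ROUTES = {
    ".pdf": "pdf",
    ".doc": "word",
    ".docx": "word",
    ".ppt": "ppt",
    ".pptx": "ppt",
}


def route_for(url):
    """Return 'pdf', 'word' or 'ppt' for a file URL, or None if unsupported."""
    # Drop the query string and fragment, then decode percent-escapes
    path = unquote(urlsplit(url).path)
    ext = os.path.splitext(path)[1].lower()
    return ROUTES.get(ext)


print(route_for("https://example.com/files/summary.pdf.notes.pptx"))  # ppt
print(route_for("https://example.com/files/report.DOCX?token=abc"))   # word
print(route_for("https://example.com/files/readme.txt"))              # None
```

Returning None for unknown types lets the caller skip the record instead of handing an unsupported file to PowerPoint.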

Because the script is for internal use, you can package it into an exe with pyinstaller for the editors to run. Once the files have been uploaded, running the exe converts them to images in batch.

#py to exe
pyinstaller -c -F -i a.ico   


I hope this article helps. If you spot any mistakes, please point them out~


Posted by shadiadi on Mon, 29 Nov 2021 14:16:34 -0800