Sentiment analysis of sentences in Python - Baidu Intelligent Cloud API - natural language processing

Background

My sister needs sentiment analysis for her graduation thesis. She used crawler software to scrape a number of microblog accounts and stored the posts and comments on one topic in an Excel file.
The plan is to use Baidu Intelligent Cloud: Natural Language Processing -> Sentiment Analysis.

Implementation

Get access_token

API official documentation

The token request needs these two parameters:

client_id: required; the API Key of your application.
client_secret: required; the Secret Key of your application.

Where to find these two parameters:

Log in to the console with your Baidu account: https://login.bce.baidu.com/?account=
Find the product: Natural Language Processing
Under applications, create a new application and enter a name and description

The new application shows an API Key and a Secret Key. Copy both and paste them into the code below.

import requests

def get_token():
    API_KEY = 'Paste your API Key here'
    SECRET_KEY = 'Paste your Secret Key here'
    url = 'https://aip.baidubce.com/oauth/2.0/token'
    data = {
        "grant_type": "client_credentials",   # fixed value
        "client_id": API_KEY,                  # API Key of the application
        "client_secret": SECRET_KEY,           # Secret Key of the application
    }
    resp = requests.post(url, data=data)
    token = resp.json()['access_token']
    print("access_token:", token)
    return token
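The token does not have to be requested before every run. The token reply also reports how many seconds the token is valid for in an expires_in field, so the token can be cached locally and reused until shortly before it expires. A minimal sketch of that idea; the file name baidu_token.json and the helpers save_token and load_token are hypothetical names chosen for illustration, not part of the Baidu API.

```python
import json
import time
from pathlib import Path

CACHE_FILE = Path("baidu_token.json")  # hypothetical cache location


def save_token(token_response, now=None, path=CACHE_FILE):
    """Store the token together with the absolute time at which it expires."""
    now = time.time() if now is None else now
    record = {
        "access_token": token_response["access_token"],
        # expires_in is a lifetime in seconds, so convert it to a timestamp
        "expires_at": now + token_response["expires_in"],
    }
    path.write_text(json.dumps(record), encoding="utf-8")


def load_token(now=None, path=CACHE_FILE, margin=3600):
    """Return the cached token, or None if it is missing or about to expire."""
    now = time.time() if now is None else now
    if not path.exists():
        return None
    record = json.loads(path.read_text(encoding="utf-8"))
    # Treat the token as expired `margin` seconds early to be on the safe side
    if record["expires_at"] - margin <= now:
        return None
    return record["access_token"]
```

To use it, call load_token() first and only request a new token (passing the decoded JSON reply to save_token) when it returns None.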

Call the sentiment analysis interface

import requests
import json,time
import re,os

# Remove HTML tags and emoji from the text with the re module
def re_delete(content):
    dr = re.compile(r'<[^>]+>', re.S)
    emoji = re.compile("[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF"
                       "\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]+", flags=re.UNICODE)
    content = dr.sub('', content)
    content = emoji.sub('', content)
    return content

# Call the Baidu interface; returns [positive probability, confidence, negative probability, sentiment class], where the class is 0 = negative, 1 = neutral, 2 = positive
def fenxi(tex = "I love my country"):
    # headers = {'Content-Type': 'application/json'}    # the request also works without this header
    tex = re_delete(tex)    # skip this step if your text contains no HTML or emoji
    if not tex:
        return []
    access_token = '24.37f133deb9fefa877cf39583244079f8.2592000.1640158969.282335-25209811'
    url = f'https://aip.baidubce.com/rpc/2.0/nlp/v1/sentiment_classify?access_token={access_token}&charset=UTF-8'
    if len(tex.encode()) <= 2048:    # the documentation limits the text to 2048 bytes, roughly 680 Chinese characters in UTF-8
        body = {'text' : tex}
        # Convert python dictionary type to json type
        body = json.dumps(body)
        # print(body)
        resp1 = requests.post(url=url,data=body)
        try:
            items = resp1.json()['items'][0]
            # round(x,3) means to keep the floating point number x to three decimal places
            return [round(items['positive_prob'],3), round(items['confidence'],3),
                    round(items['negative_prob'],3), items['sentiment']]
        except (ValueError, KeyError, IndexError):    # reply is not JSON, or it has no 'items'
            print("The request failed\n",resp1.text)
            # Look up the printed error code in the official documentation
            return ["The request is incorrect"]
    else:
        print("Encoded length exceeds 2048 bytes\n",tex)
        # My texts are rarely this long, so I stop here. You could split the text at this point and call the interface once per piece
        return ["The length is too long"]
        
if __name__ == '__main__':
    # get_token()
    result = fenxi("I love my country")
    print(result)
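For texts over the byte limit, the code comment above suggests splitting the text and calling the interface once per piece. A minimal sketch of such a splitter, assuming UTF-8 encoding; split_by_bytes is a hypothetical helper, and the default of 2000 bytes is an arbitrary choice that stays a little under the documented 2048-byte limit.

```python
def split_by_bytes(text, max_bytes=2000):
    """Split text into pieces whose UTF-8 encoding is at most max_bytes each."""
    chunks, current, size = [], [], 0
    for ch in text:
        ch_size = len(ch.encode("utf-8"))  # 1 to 4 bytes per character
        # Start a new piece when this character would push us over the limit
        if current and size + ch_size > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += ch_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Each piece can then be passed to fenxi separately. Note that this cuts at an arbitrary character, so a sentence may be split in the middle; cutting at punctuation would give the classifier cleaner input, at the cost of more code.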

Official documents: https://cloud.baidu.com/apiexplorer/index.html?Product=GWSE -p64nCQphmTY&Api=GWAI-7WcMrFnWb8M
You can read the parameter descriptions and try out calls online at the address above; it is more detailed than the documentation linked earlier.
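The reply can also be unpacked into something more readable than a bare list. A minimal sketch, assuming that error replies carry error_code and error_msg fields (please check this against the official documentation); parse_sentiment and SENTIMENT_LABELS are hypothetical names, and the function takes the already decoded JSON, i.e. the result of resp1.json().

```python
# Class codes as described for the sentiment field: 0 negative, 1 neutral, 2 positive
SENTIMENT_LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def parse_sentiment(payload):
    """Turn a decoded JSON reply into a flat dict, or raise on an error reply."""
    if "error_code" in payload:  # assumed shape of an error reply
        raise RuntimeError(
            f"API error {payload['error_code']}: {payload.get('error_msg', '')}"
        )
    item = payload["items"][0]
    return {
        "label": SENTIMENT_LABELS.get(item["sentiment"], "unknown"),
        "positive_prob": round(item["positive_prob"], 3),
        "negative_prob": round(item["negative_prob"], 3),
        "confidence": round(item["confidence"], 3),
    }
```

Raising an exception on error replies, instead of returning a string inside a list, lets the calling code tell failures apart from results without inspecting the values.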

Reading the text

Once the interface request works, the rest of the code reads the texts in bulk and sends one request per text.
Everyone's data is in a different format (csv, Excel, txt and so on), so I won't go into detail here.
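As one possible shape for the bulk-reading step, here is a minimal sketch for the CSV case, for example an Excel sheet exported to CSV. The column name text and the helper analyze_csv are assumptions made for illustration; the analysis function is passed in as a parameter, so any function that takes a string can be used.

```python
import csv


def analyze_csv(lines, analyze, column="text"):
    """Run analyze() on one column of CSV input and collect (text, result) pairs."""
    results = []
    for row in csv.DictReader(lines):
        text = (row.get(column) or "").strip()
        if text:  # skip rows where the column is empty or missing
            results.append((text, analyze(text)))
    return results
```

With a real file this could be called as analyze_csv(open("comments.csv", newline="", encoding="utf-8"), fenxi), where comments.csv is a placeholder file name.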
One reminder: Baidu enforces a QPS limit, i.e. a maximum number of requests per second. If you send requests faster than that, the interface returns an error.
Add time.sleep(0.5) between requests; the 0.5 is 1/QPS, so 0.5 seconds corresponds to a QPS of 2.
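The pause between requests can be wrapped in a small helper so that the loop itself stays simple. A minimal sketch; throttled is a hypothetical helper name, and the default qps=2 is only an example value, so use the limit that applies to your own account.

```python
import time


def throttled(items, func, qps=2):
    """Call func on each item, spacing the calls at least 1/qps seconds apart."""
    interval = 1.0 / qps
    results = []
    last_start = None
    for item in items:
        if last_start is not None:
            # Sleep only for the part of the interval that has not passed yet
            wait = interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        results.append(func(item))
    return results
```

Compared with a fixed time.sleep(0.5) after every call, this subtracts the time the request itself took, so slow requests do not add an extra delay on top.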
Of course, you can also pay for a higher quota. If you have a lot of requests, you can also register more Baidu accounts and get more access_tokens.
PyCharm can run several .py scripts at the same time (access_tokens requested for different applications under the same account are still subject to the limit).

Thank you for reading. If you found this useful, please leave a like before you go.

Posted by Dollar on Mon, 22 Nov 2021 15:51:53 -0800
