Last time I used selenium to scrape Kwai live-stream comments. This time I send POST requests with Python to fetch the comments on a Kwai video.
1. First, open the web version of Kwai
Press F12 on the page to open the developer tools, click the Network tab and filter by Fetch/XHR. Look at the requests listed there.
You will find a request called graphql. This is the request that fetches the comments. Click it and open the Preview tab to see the data it returns.
The response is JSON, which is easy to work with. Now we just need to imitate the browser and send the same request to the Kwai server.
2. Send the request to the Kwai server
First, check what kind of request it is: a GET request or a POST request (usually POST). Click the Headers tab and look at the request.
We can see that this is a POST request and that the request URL is https://www.kuaishou.com/graphql. A POST request carries data in its body.
Next, see what data the body needs to carry.
You can see that the data under Request Payload is what we need to send. Request Payload is not the same as Form Data: Form Data is sent as form-encoded key-value pairs built from a dict, while Request Payload is JSON. If you pass the dict as form data here (data=... in requests), no comment data comes back; it must be sent as JSON (json=... in requests). Now I can send the request in Python. Everything is going well so far!
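To make the difference concrete, here is a small sketch that encodes the same payload both ways using only the standard library. It stands in for what an HTTP client does; the exact bytes that requests produces for a nested dict passed as data= may differ, but the point is the same: form encoding has no way to carry the nested "variables" object.

```python
import json
from urllib.parse import urlencode

payload = {
    "operationName": "commentListQuery",
    "variables": {"photoId": "3xhcrvum26yz3vg", "pcursor": ""},
}

# Form Data style: flat key=value pairs. The nested "variables" dict is
# squashed into the text of its Python repr, which a JSON API cannot parse.
form_body = urlencode(payload)

# Request Payload style: the whole structure is serialized as JSON.
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

Only the second form can be decoded back into the original nested structure on the server side.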
3. Send the request in Python
import requests

# Headers copied from the browser request. The Cookie belongs to a login
# session, so replace it with the one copied from your own browser.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36 Edg/94.0.992.31',
    'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_1e83306028e4691a0acb7f78ecfbf145; didv=1632459485000; client_key=65890b29; Hm_lvt_86a27b7db2c5c0ae37fee4a8a35033ee=1632459683; userId=153866369; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABSkrOjzdufZa4lBYXxddWpU-Da_CebI6GVHm2vvSBBxRUv9kJ_BKaJ3weX3FWf3acdLw2yy6uCM1MpHB8Pfi7EnJBlcRb0GqyXYMpPlaCdMqKlVo0PI-iY9zN0wdxA89wOXtml-fWR7CFTT54hsd3PZTsTwlWBA7vhJvwim07-A1RaTmwi66PbBkj7eCrkmJX0hdCln9MiFVyA_CL44TScBoSuDcrlwmr6APhXfdZrBO5uo0FIiA8xE-BzwPs3Wp_Q9mI4y5GcZxo1E-B0xr4CkR4zohqJigFMAE; kuaishou.server.web_ph=eeb2272fa742f8d8f79bd635c3e8044ac1f8'
}

# Request Payload copied from the browser. pcursor is empty for the first page.
data = {
    "operationName": "commentListQuery",
    "variables": {
        "photoId": "3xhcrvum26yz3vg",
        "pcursor": ""
    },
    "query": "query commentListQuery($photoId: String, $pcursor: String) {\n visionCommentList(photoId: $photoId, pcursor: $pcursor) {\n commentCount\n pcursor\n rootComments {\n commentId\n authorId\n authorName\n content\n headurl\n timestamp\n likedCount\n realLikedCount\n liked\n status\n subCommentCount\n subCommentsPcursor\n subComments {\n commentId\n authorId\n authorName\n content\n headurl\n timestamp\n likedCount\n realLikedCount\n liked\n status\n replyToUserName\n replyTo\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"
}

# json=data sends the dict as a JSON body, matching the Request Payload.
comment = requests.post('https://www.kuaishou.com/graphql', headers=headers, json=data)
comments = comment.json()
print(comments)
At this point, the json data is returned successfully.
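To read the result, you walk down from data to visionCommentList to rootComments. The sketch below uses a hand-written sample dict in place of the real response; its shape follows the fields named in the query, the values are made up, and comment_texts is a hypothetical helper name.

```python
# Hand-written sample shaped like the response; the values are made up.
sample = {
    "data": {
        "visionCommentList": {
            "rootComments": [
                {"commentId": "1001", "authorName": "user_a", "content": "first comment"},
                {"commentId": "1002", "authorName": "user_b", "content": "second comment"},
            ]
        }
    }
}

def comment_texts(response_json):
    """Return the text of every top-level comment in one response."""
    root_comments = response_json["data"]["visionCommentList"]["rootComments"]
    return [c["content"] for c in root_comments]

print(comment_texts(sample))
```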
However, there seems to be a problem: only 28 comments came back. The returned data is incomplete!
No matter how many times I ran it, the result was the same. So I went back to the Kwai page and opened the developer tools again to look for the problem. I noticed that there were many requests called graphql, which made me suspect that the comments are loaded dynamically: more are fetched only when you scroll down the comment list.
Comparing the request data of several graphql requests, only photoId and pcursor differ. photoId is the id of the video and appears in the page URL; every video has its own photoId. That settles photoId. What about pcursor? In the first graphql request pcursor is empty; in all the others it is a number. Then I printed the incomplete JSON that had been returned and found the answer: pcursor is the id of the last comment (its commentId) in the response to the previous graphql request. Solved! The comments are loaded page by page, and each request passes the id of the last comment already received as pcursor to ask for the next page.
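The paging idea can be sketched without touching the network. In the sketch below, collect_all and fake_fetch are hypothetical names, and fake_fetch imitates the server with a small in-memory list; only the cursor rule (empty for the first page, then the commentId of the last comment received) comes from the observation above.

```python
def collect_all(fetch_page):
    """Collect comments page by page.

    fetch_page(pcursor) is a hypothetical callable that returns the list of
    rootComments dicts that follow the comment whose id is pcursor.
    """
    collected = []
    pcursor = ""          # the first request uses an empty cursor
    seen = set()          # guards against a cursor that stops advancing
    while pcursor not in seen:
        seen.add(pcursor)
        page = fetch_page(pcursor)
        if not page:      # an empty page means there is nothing left
            break
        collected.extend(page)
        pcursor = page[-1]["commentId"]   # id of the last comment received
    return collected

# Fake data source standing in for the server: 7 comments, 3 per page.
FAKE = [{"commentId": str(i), "content": f"comment {i}"} for i in range(1, 8)]

def fake_fetch(pcursor, page_size=3):
    start = 0 if pcursor == "" else int(pcursor)
    return FAKE[start:start + page_size]

all_comments = collect_all(fake_fetch)
print(len(all_comments))
```

Keeping the set of cursors already used is a safety net: if the server ever returns the same page twice, the loop stops instead of running forever.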
4. Finally, request the complete data
First write the function that requests one page, then call it in a loop, feeding each returned cursor into the next request, until no new comments come back. The comments are finally in hand!
import requests

URL = 'https://www.kuaishou.com/graphql'

# Headers copied from the browser request. The Cookie belongs to a login
# session, so replace it with the one copied from your own browser.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36 Edg/94.0.992.31',
    'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_1e83306028e4691a0acb7f78ecfbf145; didv=1632459485000; client_key=65890b29; Hm_lvt_86a27b7db2c5c0ae37fee4a8a35033ee=1632459683; userId=153866369; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABSkrOjzdufZa4lBYXxddWpU-Da_CebI6GVHm2vvSBBxRUv9kJ_BKaJ3weX3FWf3acdLw2yy6uCM1MpHB8Pfi7EnJBlcRb0GqyXYMpPlaCdMqKlVo0PI-iY9zN0wdxA89wOXtml-fWR7CFTT54hsd3PZTsTwlWBA7vhJvwim07-A1RaTmwi66PbBkj7eCrkmJX0hdCln9MiFVyA_CL44TScBoSuDcrlwmr6APhXfdZrBO5uo0FIiA8xE-BzwPs3Wp_Q9mI4y5GcZxo1E-B0xr4CkR4zohqJigFMAE; kuaishou.server.web_ph=eeb2272fa742f8d8f79bd635c3e8044ac1f8'
}

QUERY = "query commentListQuery($photoId: String, $pcursor: String) {\n visionCommentList(photoId: $photoId, pcursor: $pcursor) {\n commentCount\n pcursor\n rootComments {\n commentId\n authorId\n authorName\n content\n headurl\n timestamp\n likedCount\n realLikedCount\n liked\n status\n subCommentCount\n subCommentsPcursor\n subComments {\n commentId\n authorId\n authorName\n content\n headurl\n timestamp\n likedCount\n realLikedCount\n liked\n status\n replyToUserName\n replyTo\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"

lists = []  # every comment text collected so far

def fetch_page(pcursor):
    """Request one page of comments, print them, and return the next cursor."""
    data = {
        "operationName": "commentListQuery",
        "variables": {
            "photoId": "3xhcrvum26yz3vg",
            "pcursor": str(pcursor)
        },
        "query": QUERY
    }
    response = requests.post(URL, headers=headers, json=data)
    root_comments = response.json()['data']['visionCommentList']['rootComments']
    if not root_comments:
        return None  # an empty page means there are no more comments
    # Iterate over every comment on the page, including the last one.
    for item in root_comments:
        print(item['content'])
        lists.append(item['content'])
    # The next cursor is the commentId of the last comment on this page.
    return root_comments[-1]['commentId']

pcursor = ''   # the first request uses an empty cursor
seen = set()   # cursors already requested, to stop if a page repeats
while pcursor is not None and pcursor not in seen:
    seen.add(pcursor)
    pcursor = fetch_page(pcursor)
print("The comments have all been fetched")
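Since only photoId and pcursor change between requests, the payload could be built by a small helper so that the same script works for other videos. This is just a sketch: build_payload is a hypothetical name, and the query is trimmed to a few of the fields from the original query, on the assumption that the server accepts a shorter selection. If it does not, use the full query string instead.

```python
import json

# Trimmed query: a subset of the fields requested by the original query.
QUERY = (
    "query commentListQuery($photoId: String, $pcursor: String) {\n"
    "  visionCommentList(photoId: $photoId, pcursor: $pcursor) {\n"
    "    pcursor\n"
    "    rootComments {\n"
    "      commentId\n"
    "      authorName\n"
    "      content\n"
    "    }\n"
    "  }\n"
    "}\n"
)

def build_payload(photo_id, pcursor=""):
    """Build the JSON body for one page of comments of one video."""
    return {
        "operationName": "commentListQuery",
        "variables": {"photoId": photo_id, "pcursor": str(pcursor)},
        "query": QUERY,
    }

payload = build_payload("3xhcrvum26yz3vg")
print(json.dumps(payload["variables"]))
```

The result can be passed straight to requests.post(..., json=payload), with the photoId taken from the URL of whichever video you want.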
That's all for today, brothers. The rest is up to you. (Between us, this doesn't only work for Kwai; many websites can be handled the same way.) But always abide by laws and regulations. Next time I'll show you how to send messages with Python, which is also a POST request!