This article is for reference only. It is prohibited to use it for any form of commercial use. Violators shall bear their own responsibilities.
Preparation:
(1) A mobile phone (Android or iOS) or an Android emulator. This article mainly uses an Android emulator; the procedure is the same on a phone.
(2) Packet capture tool: Fiddler. Download address: https://www.telerik.com/download/fiddler
(3) Programming tool: PyCharm
(4) An Android emulator installed on the computer (this article uses the Xiaoyao Android emulator)
1, Fiddler configuration
In Tools > Options, open the HTTPS tab, check the options shown in the figure, then click Actions and select Trust Root Certificate.
Configure remote connections:
On the Connections tab, check "Allow remote computers to connect". The port can be any number that is not already in use; the default is 8888.
Then restart Fiddler. The configuration only takes effect after a restart.
2, Android emulator / mobile phone configuration
First, find the computer's local IP address: run ipconfig in cmd and note the IPv4 address.
Make sure the mobile phone and the computer are on the same LAN.
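If you would rather not read through the ipconfig output, the address can also be looked up from Python. This is only a minimal sketch; the function name guess_lan_ip is made up for illustration, and on a machine with several network adapters it may not return the adapter your phone is connected to.

```python
import socket


def guess_lan_ip() -> str:
    """Return the IPv4 address this machine would use for outbound traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets. It only asks the OS to
        # pick a route, which reveals the local address used on that route.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    print(guess_lan_ip())
```

Compare the printed address with the one from ipconfig before entering it on the phone.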
Mobile phone configuration: open the settings of the connected WiFi network, set the proxy to Manual, then enter the computer's IP address from the step above as the host name and 8888 as the port.
Emulator configuration: in Settings, long-press the connected WiFi network, set the proxy to Manual, then enter the same IP address and port 8888.
After the proxy is set, open the IP and port in the device's browser, for example 10.10.16.194:8888, and the Fiddler page will open. Click the FiddlerRoot certificate link to install the certificate; otherwise the phone will treat the connection as insecure.
The certificate name can be anything. You may also be asked to set a lock screen password.
After that, Fiddler can capture the traffic of apps on the phone / emulator.
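If the Fiddler page does not open on the phone, it helps to first check from another machine on the LAN whether the port accepts connections at all. The following is a rough sketch of such a check; proxy_reachable is a hypothetical helper name, and a True result only means that something is listening on that port, not that it is Fiddler.

```python
import socket


def proxy_reachable(host: str, port: int = 8888, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all of these as "not reachable".
        return False
```

For example, proxy_reachable("10.10.16.194", 8888) uses the sample address from the text; replace it with your own IP. A False result usually points to the remote connection option, a missing Fiddler restart, or a firewall.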
3, Packet capture
Open the app and watch the packets that appear in Fiddler.
One of them is the video packet: its type is JSON (JSON is the data format returned by the server; look it up if you need details), its host is the one shown in the figure, and it is usually fairly large.
Click the JSON packet, then click decode on the right side of Fiddler to decode the JSON of the video packet.
After decoding, click aweme_list. Each pair of braces in it represents one video. The app loads videos a few at a time: once you have watched the preloaded ones, it loads some more.
The JSON is a dictionary. The video link sits under aweme_list: for each entry, look in video > play_addr > url_list. The list holds 6 URLs that all point to exactly the same video, probably to cope with different environments; the third or fourth link generally causes the fewest problems. Copy a link and paste it into a browser to watch the video.
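The nesting described above can be walked in a few lines of Python. This is a small sketch on made-up sample data: the example.com URLs and the function name extract_url_lists are placeholders, and only the key names aweme_list, video, play_addr and url_list come from the captured packet.

```python
import json


def extract_url_lists(feed: dict) -> list:
    """Return one list of candidate URLs per video in a decoded feed response."""
    url_lists = []
    for item in feed.get("aweme_list", []):
        # Entries without a video (or with an empty list) are skipped.
        urls = item.get("video", {}).get("play_addr", {}).get("url_list", [])
        if urls:
            url_lists.append(urls)
    return url_lists


# Made-up sample with the same nesting as the captured response.
sample = json.loads("""
{"aweme_list": [
  {"video": {"play_addr": {"url_list": ["https://example.com/a1", "https://example.com/a2"]}}},
  {"video": {"play_addr": {"url_list": ["https://example.com/b1"]}}},
  {"desc": "entry without a video"}
]}
""")

for urls in extract_url_lists(sample):
    print(len(urls), urls[0])
```

Using .get with defaults instead of plain indexing means an entry with an unexpected shape is skipped rather than stopping the whole run.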
Next, a few problems to solve.
1. Each packet contains only a few videos, so how do we capture more?
Use the emulator's simulated mouse to keep swiping to the next video, so that new JSON packets keep appearing.
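As an alternative to the emulator's simulated mouse, the swiping could also be driven from Python through adb. This is not the method used in this article, only a sketch under assumptions: adb must be installed and connected to the emulator or phone, the coordinates are guesses for a 720x1280 screen, and the function names are made up.

```python
import subprocess
import time


def swipe_command(x1, y1, x2, y2, duration_ms=300):
    """Build the adb command for one swipe gesture."""
    return ["adb", "shell", "input", "swipe",
            str(x1), str(y1), str(x2), str(y2), str(duration_ms)]


def swipe_up_repeatedly(times, pause_s=3.0, run=subprocess.run):
    """Swipe up `times` times, pausing between swipes so the feed can load."""
    for _ in range(times):
        # Assumed coordinates for a 720x1280 screen; adjust to your device.
        run(swipe_command(360, 1000, 360, 300), check=True)
        time.sleep(pause_s)
```

Calling swipe_up_repeatedly(50) would swipe fifty times with a three second pause between swipes. The run parameter exists only so the command can be replaced when trying the function without a device.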
2. How do we save the JSON locally?
One way is to copy and paste by hand, but that is tedious.
Instead, add a rule to Fiddler's own script so that the JSON packet is saved automatically whenever a video JSON packet arrives.
Custom rule package:
Link: https://pan.baidu.com/s/1wmtUUMChzuSDZFYGSyUhCg
Extraction code: 7z0l
if (oSession.uriContains("https://api-eagle.amemv.com/aweme/v1/feed/")) {
    var strBody = oSession.GetResponseBodyAsString();
    // Use the last 58 characters of the path and query as the file name
    var sps = oSession.PathAndQuery.slice(-58);
    //FiddlerObject.alert(sps)
    var filename = "C:/Users/HEXU/Desktop/Data crawling/Crawling data/raw_data" + "/" + sps + ".json";
    var sw : System.IO.StreamWriter;
    // Append if the file already exists, otherwise create it
    if (System.IO.File.Exists(filename)) {
        sw = System.IO.File.AppendText(filename);
    } else {
        sw = System.IO.File.CreateText(filename);
    }
    sw.Write(strBody);
    sw.Close();
    sw.Dispose();
}
In Fiddler, open the rules script (Rules > Customize Rules) and place the custom rule in the position shown in the figure. The rule reads the response body, so it needs to sit in a response handler such as OnBeforeResponse:
This script has two places to modify:
(1) The URL on the first line:
It is taken from the URL of the video packet. Douyin changes this URL from time to time, so if the rule stops working, update it:
For example, today's URL differs from yesterday's. Remember to modify it.
(2) The path, which is the folder where the JSON packets are saved. Change it to your own path and create the folder first. Remember to save the script after modifying it.
After the emulator and the script are set up and running, wait a while and the saved packets will appear in the folder:
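One thing to watch for: the rule appends to a file that already exists, so a saved file can end up holding several JSON documents back to back, which a plain json.load rejects. Below is a minimal sketch for reading such a file; iter_json_documents is a hypothetical helper name, and it assumes each appended response is itself complete, valid JSON.

```python
import json


def iter_json_documents(text: str):
    """Yield each JSON value from a string holding one or more concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two responses appended into the same file would look like this.
saved = '{"aweme_list": [1]}{"aweme_list": [2]}\n'
for doc in iter_json_documents(saved):
    print(doc["aweme_list"])
```

For a file that holds a single response, the loop simply yields one document, so the same helper works in both cases.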
4, Crawler script
Next, write a script in PyCharm to extract the video links from the saved JSON packets:
Import the packages:
import os,json,requests
Spoofed request header (makes the request look like it comes from a browser): headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'}
Source code:
import json
import os

import requests

# Spoofed browser request header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'}

# Folders for the saved JSON packets and the downloaded videos; change them to your own
base_dir = 'C:/Users/HEXU/Desktop/Data crawling/Crawling data'
raw_dir = os.path.join(base_dir, 'raw_data')
video_dir = os.path.join(base_dir, 'VIDEO')
os.makedirs(video_dir, exist_ok=True)  # Create the output folder if it does not exist

count = 1  # Counter, used as the video file name
for name in os.listdir(raw_dir):  # Loop over every saved JSON packet
    with open(os.path.join(raw_dir, name), encoding='utf-8') as a:  # Open the JSON packet
        content = json.load(a)['aweme_list']  # Take all videos from the packet
    for video in content:  # Loop over the videos in this packet
        # Each video has 6 URLs; the fifth one (index 4) is used here
        video_url = video['video']['play_addr']['url_list'][4]
        videoMp4 = requests.get(video_url, headers=headers).content  # Download the video bytes
        with open(os.path.join(video_dir, '{}.mp4'.format(count)), 'wb') as f:  # Write in binary mode
            f.write(videoMp4)
        print('video {} download complete'.format(count))  # Progress message
        count += 1
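Since the 6 URLs of a video point to the same file and individual links sometimes fail, one possible refinement is to try them in order instead of fixing on a single index. The following is only a sketch: it uses the standard library's urllib.request in place of requests so that it runs without extra packages, the User-Agent is a shortened placeholder, and the names fetch_bytes and download_first_working are made up.

```python
import urllib.request

HEADERS = {"User-Agent": "Mozilla/5.0"}  # shortened placeholder header


def fetch_bytes(url: str) -> bytes:
    """Download url and return the response body (raises on network or HTTP errors)."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def download_first_working(url_list, fetch=fetch_bytes):
    """Try each URL in order; return the first non-empty body, or None if all fail."""
    for url in url_list:
        try:
            data = fetch(url)
        except (OSError, ValueError) as exc:
            # URLError and HTTPError are OSError subclasses; ValueError covers malformed URLs.
            print("failed: {} ({})".format(url, exc))
            continue
        if data:
            return data
    return None
```

In the loop above, the fixed index lookup could then be replaced by a call such as download_first_working(video['video']['play_addr']['url_list']), writing the file only when the result is not None.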