What is an m3u8 file
An M3U8 file is an M3U file encoded in UTF-8.
An M3U file is a plain-text index file.
When it is opened, the player does not play the file itself; it looks up the network addresses of the audio and video files listed in the index and plays them online.
The original video data is split into many TS segments, and the address of each segment is recorded in the m3u8 playlist.
For example, I have an m3u8 file here, which reads as follows:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ALLOW-CACHE:YES
#EXT-X-TARGETDURATION:15
#EXTINF:6.916667,
out000.ts
#EXTINF:10.416667,
out001.ts
#EXTINF:10.416667,
out002.ts
#EXTINF:1.375000,
out003.ts
#EXTINF:1.541667,
out004.ts
#EXTINF:7.666667,
out005.ts
#EXTINF:10.416667,
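To make the structure concrete, here is a minimal sketch of how such a playlist could be read into (duration, segment) pairs. The helper name parse_segments and the shortened sample text are my own illustration, and the sketch only handles the simple layout shown above, where each #EXTINF line is followed by its segment line.

```python
def parse_segments(playlist_text):
    """Return a list of (duration_seconds, uri) pairs from an m3u8 playlist.

    Hypothetical helper: assumes each #EXTINF line is followed by its URI line.
    """
    segments = []
    duration = None
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:6.916667," -> 6.916667 (the part before the first comma)
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#") and duration is not None:
            # First non-tag line after #EXTINF is the segment address
            segments.append((duration, line))
            duration = None
    return segments


# Shortened sample in the same shape as the playlist above
sample = """#EXTM3U
#EXT-X-TARGETDURATION:15
#EXTINF:6.916667,
out000.ts
#EXTINF:10.416667,
out001.ts
"""
for duration, uri in parse_segments(sample):
    print(uri, duration)
```

An #EXTINF line with no segment after it (like the last line of the listing above) is simply ignored by this sketch.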
How ts files are generally handled
- Only the m3u8 file is available, so the ts files need to be downloaded
- The ts files are available, but they are encrypted and cannot be played, so they need to be decrypted
- The ts files play normally, but there are too many of them and each is too small, so they need to be merged
This article deals with cases 1 and 3; the encryption case is skipped.
The m3u8 file I provided above is not encrypted, that is, it contains no key tag (#EXT-X-KEY). After the ts files are downloaded, they can be merged directly.
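The "is there a key" check can be sketched in a few lines. This is only an illustration: the function name is_encrypted is mine, and it assumes the encryption method is given in the METHOD attribute of the #EXT-X-KEY tag, which is how the HLS playlist format names it.

```python
def is_encrypted(playlist_text):
    """Return True if any #EXT-X-KEY tag names a method other than NONE."""
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if not line.startswith("#EXT-X-KEY:"):
            continue
        attrs = line[len("#EXT-X-KEY:"):]
        # Naive split on commas: good enough to find METHOD, but it does not
        # handle commas inside quoted attribute values in general.
        for attr in attrs.split(","):
            name, _, value = attr.partition("=")
            if name.strip() == "METHOD" and value.strip() != "NONE":
                return True
    return False


# "key.key" is a made-up key address used only for this example
print(is_encrypted('#EXTM3U\n#EXT-X-KEY:METHOD=AES-128,URI="key.key"\n'))
print(is_encrypted("#EXTM3U\n#EXTINF:6.916667,\nout000.ts\n"))
```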
Getting the ts file paths
Since all ts entries in the m3u8 file above are relative addresses, the full addresses have to be built from the link obtained in the last blog post:
{'url': 'https://videos5.jsyunbf.com/2019/02/07/iQX7y3p1dleAhIv7/playlist.m3u8', 'ext': 'dplay', 'msg': 'ok', 'playertype': None}
The part of this url before playlist.m3u8 is the prefix of every ts playback address.
# Example of a full ts address:
# https://videos5.jsyunbf.com/2019/02/07/iQX7y3p1dleAhIv7/out005.ts
import datetime
import requests

# m3u8_path is the local path of the m3u8 file; base_url is the prefix address
def get_ts_urls(m3u8_path, base_url):
    urls = []
    with open(m3u8_path, "r") as file:
        for line in file:
            # strip() also handles a last line without a trailing newline
            line = line.strip()
            if line.endswith(".ts"):
                urls.append(base_url + line)
    return urls
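As an alternative to string concatenation, the standard library's urljoin can resolve a relative ts name against the playlist address itself, so the prefix does not have to be cut out by hand. A small sketch, where resolve_segment is a hypothetical helper name:

```python
from urllib.parse import urljoin

playlist_url = "https://videos5.jsyunbf.com/2019/02/07/iQX7y3p1dleAhIv7/playlist.m3u8"


def resolve_segment(playlist_url, segment):
    """Resolve a segment entry against the playlist URL (hypothetical helper)."""
    # urljoin replaces the last path component ("playlist.m3u8") with the
    # relative name, and returns absolute addresses unchanged.
    return urljoin(playlist_url, segment)


print(resolve_segment(playlist_url, "out005.ts"))
# https://videos5.jsyunbf.com/2019/02/07/iQX7y3p1dleAhIv7/out005.ts
```

This also keeps working if a playlist happens to mix relative and absolute segment addresses.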
Downloading the ts files
Once all the paths have been read, the ts files need to be downloaded. There are many ways to download files; this is one of them.
def download(ts_urls, download_path):
    for i, ts_url in enumerate(ts_urls):
        file_name = ts_url.split("/")[-1]
        print("Start downloading %s" % file_name)
        start = datetime.datetime.now().replace(microsecond=0)
        try:
            # verify=False skips TLS certificate verification
            response = requests.get(ts_url, stream=True, verify=False)
        except Exception as e:
            print("Exception request:%s" % e.args)
            return
        # Zero-padded names (00000.ts, 00001.ts, ...) sort in playlist order
        ts_path = download_path + "/{0:05d}.ts".format(i)
        with open(ts_path, "wb+") as file:
            # Write the response in 1 KB chunks instead of loading it at once
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:
                    file.write(chunk)
        end = datetime.datetime.now().replace(microsecond=0)
        print("Time consuming:%s" % (end - start))
The output shows each download as it succeeds; how long the rest takes depends on your network speed.
After downloading, you have a bunch of ts files. Remember, as long as any one of them can be played, they can be merged.
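Since there are many ways to download the files, here is one possible variation that fetches several segments at the same time with a thread pool. It is a sketch under assumptions, not the method used above: the names fetch_bytes and download_parallel, the worker count, and the zero-padded file names are all my own choices, and it uses urllib from the standard library rather than requests.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_bytes(url):
    """Default fetcher: read the whole response body with the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_parallel(ts_urls, download_path, fetch=fetch_bytes, workers=4):
    """Download segments concurrently; return the indexes that failed."""
    os.makedirs(download_path, exist_ok=True)

    def save(item):
        index, url = item
        try:
            data = fetch(url)
        except Exception as exc:
            print("Failed %s: %s" % (url, exc))
            return index
        # Zero-padded names keep the files in playlist order when sorted.
        with open(os.path.join(download_path, "%05d.ts" % index), "wb") as file:
            file.write(data)
        return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(save, enumerate(ts_urls)))
    return [index for index in results if index is not None]
```

The returned list makes it easy to retry only the segments that failed, instead of stopping at the first error.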
Merge ts files
On Windows, the files can be merged with the copy command. If you don't know the copy command, look it up on Baidu.
copy/b D:\newpython\doutu\sao\ts_files\*.ts d:\fnew.ts
Merging with code
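import os

def file_walker(ts_path):
    file_list = []
    for root, dirs, files in os.walk(ts_path):  # generator
        for fn in files:
            file_list.append(os.path.join(root, fn))
    # os.walk gives no ordering guarantee, so sort to merge in segment order;
    # numbered names such as 2.ts and 10.ts are compared by their number
    def sort_key(p):
        stem = os.path.splitext(os.path.basename(p))[0]
        return (0, int(stem), p) if stem.isdigit() else (1, 0, p)
    file_list.sort(key=sort_key)
    print(file_list)
    return file_list

def combine(ts_path, combine_path, file_name):
    file_list = file_walker(ts_path)
    # os.path.join adds the separator that plain concatenation would miss
    file_path = os.path.join(combine_path, file_name + '.ts')
    with open(file_path, 'wb+') as fw:
        for ts_file in file_list:
            with open(ts_file, 'rb') as fr:
                fw.write(fr.read())

if __name__ == '__main__':
    #urls = get_ts_urls("playlist.m3u8","https://videos5.jsyunbf.com/2019/02/07/iQX7y3p1dleAhIv7/")
    #download(urls,"./ts_files")
    combine("./ts_files","d:/ts","haha")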
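A variation on the merge code above, for when the segments are large: shutil.copyfileobj copies each file in chunks, so no segment has to be read into memory in full. This is a sketch with assumptions: merge_streaming is a hypothetical name, the segment files are expected to have purely numeric names (0.ts, 1.ts, ...), and the output file should live outside the segment folder.

```python
import os
import shutil


def merge_streaming(ts_dir, output_file):
    """Concatenate every .ts file in ts_dir into output_file, in numeric name order."""
    names = [n for n in os.listdir(ts_dir) if n.endswith(".ts")]
    # Sort "2.ts" before "10.ts"; assumes the stems are plain integers.
    names.sort(key=lambda n: int(os.path.splitext(n)[0]))
    with open(output_file, "wb") as merged:
        for name in names:
            with open(os.path.join(ts_dir, name), "rb") as part:
                # Chunked copy: only a small buffer is held in memory at a time
                shutil.copyfileobj(part, merged)
    return names
```

The function returns the names in the order they were merged, which is handy for checking that nothing was skipped or reordered.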
The final merge produces a single ts file. Of course, you can also use software to convert the video to mp4 format.
FFmpeg can also be used to convert an m3u8 to MP4 directly.
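For reference, a common way to do this is ffmpeg -i with the playlist as input and -c copy to copy the streams without re-encoding. The sketch below wraps that command in Python; the function names and the file names in the example are made up, and it assumes ffmpeg is installed and on PATH.

```python
import shutil
import subprocess


def build_ffmpeg_command(source, output):
    """Build an ffmpeg command that copies the streams into a new container."""
    # -i: input (a playlist address or a local file); -c copy: no re-encoding.
    return ["ffmpeg", "-i", source, "-c", "copy", output]


def convert(source, output):
    """Run ffmpeg if it is installed; raises CalledProcessError on failure."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(source, output), check=True)


# Example file names only; nothing is executed here
print(" ".join(build_ffmpeg_command("playlist.m3u8", "out.mp4")))
```

Passing the arguments as a list avoids shell quoting problems when an address contains special characters.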
Now you can happily download and watch VIP videos.
Remarks section
Description of the tags and attributes in an m3u8 file
#EXTM3U
The first line of every M3U file must be this tag. It marks the file as an M3U playlist.

#EXT-X-VERSION:3
This attribute is optional.

#EXT-X-MEDIA-SEQUENCE:140651513
Each media URI has a unique sequence number within the PlayList, and the sequence numbers of adjacent media URIs differ by 1. The tag does not have to be included; if it is missing, the sequence starts at 0.

#EXT-X-TARGETDURATION
Specifies the maximum media duration (seconds). The duration given in each #EXTINF must therefore be less than or equal to this maximum value. This tag can only appear once in the entire PlayList file (in nested cases, the tag usually appears only in the m3u8 that lists the real ts urls).

#EXT-X-PLAYLIST-TYPE
Provides information about the mutability of the PlayList. It applies to the entire PlayList file and is optional. The format is as follows: #EXT-X-PLAYLIST-TYPE:<EVENT|VOD>. If it is VOD, the server cannot change the PlayList file. If it is EVENT, the server cannot change or delete any part of the PlayList file, but it can append new lines to the file.

#EXTINF
The duration specifies the length (seconds) of a media segment (ts) and applies only to the URI that follows it. The optional title after the comma is informational; the url of the resource to download is on the line after the tag.

#EXT-X-KEY
Describes how to decrypt media segments. It applies to all media URIs up to the next #EXT-X-KEY tag. Its METHOD attribute is NONE or AES-128. NONE means the URI and IV (Initialization Vector) attributes must not be present. AES-128 (Advanced Encryption Standard) means the URI attribute must be present, while IV may be omitted.

#EXT-X-PROGRAM-DATE-TIME
Associates an absolute date and time with the first sample of a media segment. It applies only to the next media URI. For example: #EXT-X-PROGRAM-DATE-TIME:2010-02-19T14:54:23.031+08:00

#EXT-X-ALLOW-CACHE
Whether caching is allowed. It can appear anywhere in the PlayList file, at most once, and applies to all media segments. The format is as follows: #EXT-X-ALLOW-CACHE:<YES|NO>

#EXT-X-ENDLIST
Marks the end of the PlayList. It can appear anywhere in the PlayList, but only once. The format is as follows: #EXT-X-ENDLIST
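As a small worked example of the #EXT-X-TARGETDURATION rule described above, the sketch below lists any #EXTINF durations that exceed the target. The function name check_target_duration is hypothetical, and the comparison follows the rule as worded in these remarks; as far as I know the HLS specification states it in terms of durations rounded to the nearest integer, which this sketch does not do.

```python
def check_target_duration(playlist_text):
    """Return the #EXTINF durations that exceed #EXT-X-TARGETDURATION."""
    target = None
    durations = []
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-TARGETDURATION:"):
            target = float(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # Keep only the number before the first comma
            durations.append(float(line.split(":", 1)[1].split(",")[0]))
    if target is None:
        raise ValueError("playlist has no #EXT-X-TARGETDURATION tag")
    return [d for d in durations if d > target]


# Shortened sample using values from the playlist at the top of this article
sample_playlist = """#EXTM3U
#EXT-X-TARGETDURATION:15
#EXTINF:6.916667,
out000.ts
#EXTINF:10.416667,
out001.ts
"""
print(check_target_duration(sample_playlist))  # [] means every segment fits
```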