If you don't look after your money, your money won't look after you! Can Python help you with your finances?
Effect preview
Trend chart of cumulative yield
Basic information results
How to use:
Python 3 plus a few third-party libraries:

import requests
import pandas
import numpy
import matplotlib
import lxml
Configure config.json: the code field lists the fund codes to analyze, and useCache controls whether locally cached data is used.
{ "code":[ "002736", "003328", "003547", ], "useCache":true }
Run fundAnalysis.py.
How it works:
Data acquisition:
Open a fund page on the Eastmoney fund site (fund.eastmoney.com) and watch the resources it loads in Chrome's developer tools. Among them is a JS file that contains the fund's basic information.
To get the cumulative yield data you need to interact with the page: click "3 years" on the cumulative yield chart and watch the new request in the developer tools; the data source is easy to spot. It comes back as JSON.
The fund fee table lives on another page; after a bit of digging, its source address can be found in the same way. That one is plain HTML.
Then, by analyzing the request headers, we use requests to imitate the browser and fetch the data (if this step is unclear, please refer to the previous article). Finally, the response is cached locally. Taking the cumulative yield JSON as an example, the main code is as follows.
import json
import requests

filePath = f'./cache/{fundCode}.json'
requests_url = 'http://api.fund.eastmoney.com/pinzhong/LJSYLZS'
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36',
    'Accept': 'application/json',
    'Referer': f'http://fund.eastmoney.com/{fundCode}.html',
}
params = {
    'fundCode': f'{fundCode}',
    'indexcode': '000300',
    'type': 'try',
}
requests_page = requests.get(requests_url, headers=headers, params=params)
# Cache the JSON response locally
with open(filePath, 'w') as f:
    json.dump(requests_page.json(), f)
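The useCache switch from config.json can then decide whether to hit the network at all. A rough sketch of that logic, where fetch_yield_json is a hypothetical wrapper around the requests.get call above:

import json
import os

def get_yield_json(fundCode, use_cache=True):
    # Return the cumulative-yield JSON, preferring the local cache when allowed
    filePath = f'./cache/{fundCode}.json'
    if use_cache and os.path.exists(filePath):
        with open(filePath) as f:
            return json.load(f)
    data = fetch_yield_json(fundCode)  # hypothetical wrapper around the request shown above
    with open(filePath, 'w') as f:
        json.dump(data, f)
    return data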
Data analysis:
For the basic-information JS file, read it as a string and pull out the required fields with regular expressions.
For example, to get the one-year yield, you can use the following code.
syl_1n = re.search(r'syl_1n\s?=\s?"([^\s]*)"', data).group(1)
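The same pattern generalizes to the other variables in that JS file. A small helper sketch; apart from syl_1n, any field name you pass in should be a variable name you actually see in the downloaded file, not something confirmed by this article:

import re

def extract_js_var(data, name):
    # Pull the quoted value out of a `var <name> = "...";` statement in the JS source
    match = re.search(rf'{name}\s?=\s?"([^\s]*)"', data)
    return match.group(1) if match else None

syl_1n = extract_js_var(data, 'syl_1n')  # one-year yield, as in the snippet above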
For the cumulative yield JSON, parse it directly with the json module and filter out the fields that are needed.
The results are stored as all_data[fund code][date] = cumulative yield, and missing dates are then filled in with a pandas DataFrame.
df = DataFrame(all_data).sort_index().fillna(method='ffill')
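For context, a sketch of how that all_data dict might be assembled before the line above; parsed_points is an assumed intermediate holding (date, cumulative yield) pairs per fund, not a name from the original code:

# parsed_points = {fund_code: [(date, cumulative_yield), ...]}  -- assumed intermediate
all_data = {}
for fund_code, points in parsed_points.items():
    all_data[fund_code] = {date: value for date, value in points}
# DataFrame(all_data) then gives one column per fund indexed by date,
# and sort_index().fillna(method='ffill') fills the days a fund has no data point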
For the HTML of the fee table, XPath is used; the XPath expression can be copied straight from Chrome.
For the management fee rate, refer to the following code.
selector = lxml.html.fromstring(data)
# Management fee rate (XPath copied from Chrome)
mg_rate = selector.xpath('/html/body/div[1]/div[8]/div[3]/div[2]/div[3]/div/div[4]/div/table/tbody/tr/td[2]/text()')[0]
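For completeness, a sketch of how the data string might be obtained before that parsing step; fee_page_url stands in for whatever address the developer tools revealed and is not a URL confirmed by this article:

import requests

# fee_page_url is a placeholder for the fee-table page address found via the developer tools
resp = requests.get(fee_page_url, headers=headers)  # headers as defined in the earlier snippet
data = resp.text  # this is the string parsed by lxml.html.fromstring above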
Data storage:
DataFrame.plot can be used to draw charts quickly, and DataFrame.to_excel saves the results to an Excel spreadsheet. Refer to the following code.
import time
import matplotlib.pyplot as plt
from pandas import DataFrame

# Save data
fig, axes = plt.subplots(2, 1)

# Process basic information
df2 = DataFrame(all_data_base)
df2.stack().unstack(0).to_excel(f'result_{time.time()}.xlsx', sheet_name='out')
df2.iloc[1:5, :].plot.barh(ax=axes[0], grid=True, fontsize=25)

# Process cumulative yield
df = DataFrame(all_data).sort_index().fillna(method='ffill')
df.plot(ax=axes[1], grid=True, fontsize=25)

fig.savefig(f'result_{time.time()}.png')
Summary
Data acquisition mainly relies on basic web-scraping techniques with the requests library. Regular expressions, the lxml XPath parser, and the pandas data-processing library are then used for analysis and output.
Analyzing a fund involves far more data than this (position distribution, fund manager information, and so on); this post is only meant as a guide and to give you an idea. If you have suggestions or anything is unclear, feel free to leave a comment or message me!
This article is only for personal learning and communication. Please do not use it for other purposes!