Background
We are responsible for a business platform, and one day we discovered that its settings page loaded outrageously slowly.
No user can be expected to wait 36 s for a page to load, so we set off on an optimization journey.
Throwing a stone to find the way
Since the problem shows up in the site's responses, Chrome is a powerful tool for quickly finding the direction of optimization.
Chrome's Network panel shows not only how long each request takes, but also how that time is distributed. We picked a project with very little configuration and made a simple request:
Even for a project with only three records, loading the project settings took 17 s. The Timing tab shows the request took 17.67 s in total, of which 17.57 s was spent in the Waiting (TTFB) state.
TTFB is short for Time To First Byte: the time until the browser receives the first byte of the server's response (back-end processing time plus redirect time). It is an important indicator of how responsive the server is.
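If you want to sanity-check TTFB outside of DevTools, here is a minimal client-side sketch using the requests library (the URL is a placeholder for the slow endpoint):

```python
import time
import requests

start = time.perf_counter()
# With stream=True, get() returns as soon as the status line and headers
# have arrived, without downloading the body, which approximates TTFB.
resp = requests.get('https://example.com/api/project/settings', stream=True)
print(f'TTFB ~= {time.perf_counter() - start:.2f}s')
```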
Profiling: flame graphs + code tuning
Clearly, the main direction of optimization is the back-end handling of this interface. The back end is implemented in Python + Flask. Rather than guessing blindly, we went straight to profiling:
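The article does not show the profiling setup, but for a Flask app one common approach (a sketch using Werkzeug's ProfilerMiddleware, which recent Werkzeug versions ship; Flask itself depends on Werkzeug) is to wrap the WSGI app so that every request dumps a cProfile stats file:

```python
from flask import Flask
from werkzeug.middleware.profiler import ProfilerMiddleware

app = Flask(__name__)

# Every request now writes a .prof file under ./profiles; tools such as
# gprof2dot or snakeviz can render these dumps as call graphs or
# flame-style views.
app.wsgi_app = ProfilerMiddleware(app.wsgi_app, profile_dir='./profiles')
```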
The first wave of optimization: redesigning the interaction
To be honest, this flame graph was disheartening at first glance: no hotspot stands out at all, just a mass of gevent and threading frames. Were there simply too many coroutines and threads?
At this point it has to be analyzed together with the code (to save space, parameters are abbreviated to "..."):
```python
def get_max_cpus(project_code, gids):
    """ """
    ...
    # Inner function that fetches the max CPU for one gid
    def get_max_cpu(project_setting, gid, token, headers):
        group_with_machines = utils.get_groups(...)
        hostnames = get_info_from_machines_info(...)
        res = fetchers.MonitorAPIFetcher.get(...)
        vals = [
            round(100 - val, 4)
            for ts, val in res['series'][0]['data']
            if not utils.is_nan(val)
        ]
        max_val = max(vals) if vals else float('nan')
        max_cpus[gid] = max_val

    # Start one thread per gid for the bulk request
    for gid in gids:
        t = Thread(target=get_max_cpu, args=(...))
        threads.append(t)
        t.start()

    # Wait for all threads to finish
    for t in threads:
        t.join()

    return max_cpus
```
As the code shows, to fetch the max-CPU data for all gids more quickly, a thread is spawned for each gid to make the request, and each thread records the maximum value for its group.
There are two problems:
- Creating and destroying threads inside a web API is expensive. The endpoint is hit frequently, so thread operations happen constantly; a thread pool would cut this system overhead (see the sketch after this list);
- The request loads the maximum CPU value of the machines under a gid (group) over the past 7 days. Think about it for a second: this is a maximum, not a real-time or average value, and in many cases it is not as valuable as it seems;
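For reference, if the bulk request had to stay, problem 1 could be addressed with a shared thread pool instead of per-request threads. A minimal sketch using concurrent.futures (fetch_max_cpu is a hypothetical stand-in for the original per-gid logic):

```python
from concurrent.futures import ThreadPoolExecutor

# A module-level pool: its worker threads are reused across API calls
# instead of being created and destroyed on every request.
_POOL = ThreadPoolExecutor(max_workers=8)

def fetch_max_cpu(gid):
    ...  # stand-in for the original per-gid monitoring request

def get_max_cpus(gids):
    # map() preserves input order, so results line up with gids
    return dict(zip(gids, _POOL.map(fetch_max_cpu, gids)))
```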
Now that the problems are known, the solutions can be targeted:
- Adjust the function design: stop loading the maximum CPU value by default and load it only when the user clicks (this reduces the chance of concurrent requests without hurting the page as a whole);
- Thanks to change 1, the multithreading implementation can be removed entirely; a sketch of the redesigned interaction follows.
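As a sketch of what the redesigned interaction might look like (the route and helper names are hypothetical, not the project's actual API), the max-CPU value moves to its own endpoint that the frontend calls only when the user clicks "load":

```python
from flask import Flask, jsonify

app = Flask(__name__)

def get_max_cpu_for_group(project_code, gid):
    ...  # stand-in for the original monitoring fetch, now run on demand

# The settings page no longer computes max CPU up front; the frontend
# requests it for a single group only when the user asks for it.
@app.route('/api/projects/<project_code>/groups/<gid>/max_cpu')
def max_cpu(project_code, gid):
    return jsonify({'gid': gid,
                    'max_cpu': get_max_cpu_for_group(project_code, gid)})
```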
Here is the flame graph after the first optimization:
There is still plenty of room for improvement, but at least the flame graph now looks somewhat normal.
The second wave of optimization: MySQL operations
We then zoom in on the marked part of the flame graph (the interface logic) to take a closer look:
Most of the hotspots are database operations triggered by utils.py:get_group_profile_settings.
Again, this calls for code analysis:
```python
def get_group_profile_settings(project_code, gids):
    # Get the MySQL ORM operation object
    ProfileSetting = unpurview(sandman.endpoint_class('profile_settings'))
    session = get_postman_session()

    profile_settings = {}
    for gid in gids:
        compound_name = project_code + ':' + gid
        result = session.query(ProfileSetting).filter(
            ProfileSetting.name == compound_name
        ).first()

        if result:
            result = result.as_dict()
            tag_indexes = result.get('tag_indexes')
            profile_settings[gid] = {
                'tag_indexes': tag_indexes,
                'interval': result['interval'],
                'status': result['status'],
                'profile_machines': result['profile_machines'],
                'thread_settings': result['thread_settings']
            }
            ...  # ellipsis
    return profile_settings
```
When MySQL shows up, the first instinct is to suspect indexes, so check the database index first; if the column is indexed, it should not be the bottleneck:
Strangely, the index is there. Why is it still this slow?
Just as I was running out of ideas, I suddenly remembered the first wave of optimization: the more gids (groups) a project had, the worse the slowdown. Looking back at the code above, one line jumped out:
```python
for gid in gids:
    ...
```
I was beginning to understand.
Here every gid queries the database once, and a project often has 20 to 50+ groups; if each round trip costs even tens of milliseconds, dozens of sequential queries by themselves add up to seconds. This blows up directly.
In fact, MySQL supports matching multiple values on a single field, and each record holds little data, so we can use a single IN query (the set equivalent of chained ORs). Besides avoiding repeated network round trips, it also gets rid of that damned for loop.
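To convince yourself that this really collapses into one statement, you can ask SQLAlchemy to print the compiled SQL (a sketch reusing the article's ProfileSetting model and session):

```python
from sqlalchemy.dialects import mysql

names = [project_code + ':' + gid for gid in gids]
stmt = session.query(ProfileSetting).filter(
    ProfileSetting.name.in_(names)
).statement
# Prints a single SELECT ... WHERE profile_settings.name IN (%s, %s, ...)
print(stmt.compile(dialect=mysql.dialect()))
```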
Just as I was about to make that change, I noticed another spot in the code that could be optimized, namely:
Seeing this, readers familiar with Python can probably guess what is going on.
getattr is how Python looks up an object's methods and attributes at runtime. A single lookup is cheap, but performed over and over in a hot path it becomes a measurable performance loss.
Combined with the code:
```python
def get_group_profile_settings(project_code, gids):
    # Get the MySQL ORM operation object
    ProfileSetting = unpurview(sandman.endpoint_class('profile_settings'))
    session = get_postman_session()

    profile_settings = {}
    for gid in gids:
        compound_name = project_code + ':' + gid
        result = session.query(ProfileSetting).filter(
            ProfileSetting.name == compound_name
        ).first()
        ...
```
Inside this heavily-iterated for loop, session.query(ProfileSetting) is rebuilt on every pass, and then the filter attribute is looked up and called on it just as often, so there is room for optimization here too.
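How much does the repeated lookup matter? A quick micro-benchmark of just the attribute-lookup part (numbers vary by machine, and rebuilding session.query(...) on each pass costs more on top of this):

```python
import timeit

setup = """
class Query:
    def filter(self, *args):
        return self
q = Query()
"""

# Attribute looked up anew on every iteration
in_loop = timeit.timeit('q.filter()', setup=setup, number=5_000_000)
# Attribute hoisted out of the loop and reused
hoisted = timeit.timeit('f()', setup=setup + '\nf = q.filter',
                        number=5_000_000)
print(in_loop, hoisted)
```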
To summarize the problems:
1. Database queries are not batched;
2. ORM query objects are rebuilt over and over, wasting performance;
3. Attribute lookups are not reused, so getattr runs on every iteration of a large loop and its cost is amplified.
So, prescribing the right medicine:
```python
def get_group_profile_settings(project_code, gids):
    # Get the MySQL ORM operation object
    ProfileSetting = unpurview(sandman.endpoint_class('profile_settings'))
    session = get_postman_session()

    # Build the query once and batch all gids into a single IN query
    query_results = session.query(ProfileSetting).filter(
        ProfileSetting.name.in_(
            [project_code + ':' + gid for gid in gids]
        )
    ).all()

    # Process all query results in a single pass
    profile_settings = {}
    for result in query_results:
        if not result:
            continue
        result = result.as_dict()
        gid = result['name'].split(':')[1]
        tag_indexes = result.get('tag_indexes')
        profile_settings[gid] = {
            'tag_indexes': tag_indexes,
            'interval': result['interval'],
            'status': result['status'],
            'profile_machines': result['profile_machines'],
            'thread_settings': result['thread_settings']
        }
        ...  # ellipsis
    return profile_settings
```
The flame graph after this optimization:
Compare it with the same region of the flame graph before optimization:
The improvement is obvious: compared with before, the hotspots under utils.py:get_group_profile_settings and the database-related hotspots at the bottom are greatly reduced.
This part could be optimized further, but the result already met our expectations, so we stopped here.
Optimization effect
For the same project, the interface's response time was optimized from 37.6 s down to 1.47 s. A screenshot of the result:
Optimization summary
As a famous saying goes:
If a data structure is good enough, it won't need much of an algorithm.
When optimizing a feature, the fastest optimization is to remove the feature altogether!
The next fastest is to reduce how often, or how heavily, the feature is triggered!
Looking at a feature top-down, from the user's usage scenario, often leads to simpler and more effective optimizations!
Of course, most of the time we are not that lucky. If the feature can't be removed or dialed down, then it's time for us code monkeys to show our worth: Profile!
For Python, try cProfile + gprof2dot;
For Go, try pprof + go-torch.
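For instance, here is a minimal cProfile run that produces a stats file gprof2dot can consume (main is a placeholder for your entry point):

```python
import cProfile

def main():
    ...  # placeholder for the code under investigation

# Dump profiling stats for main() to out.pstats
cProfile.run('main()', 'out.pstats')
# Render a call graph (requires gprof2dot and graphviz):
#   gprof2dot -f pstats out.pstats | dot -Tpng -o profile.png
```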
Finally, beware of blind tuning: the code problems you can see are not necessarily the real performance bottleneck. Analyze objectively with tools, and you will hit the true pain points!
When reprinting, please credit the source: https://segmentfault.com/a/1190000020956724