Preface
This article uses Python automation to find the best-selling goods in a given category on Idle Fish, for reference.
Let's get started~
Development tools
Python version: 3.6.4
Related modules:
pyecharts module;
pocoui module;
and some Python built-in modules.
Environment setup
Install Python and add it to the PATH environment variable, then use pip to install the required modules.
Preparation
1. Configure the Android ADB development environment
2. Install the pocoui dependency library in a Python virtual environment
# pocoui
pip3 install pocoui
# Data visualization charts
pip3 install pyecharts -U
Steps
This function is implemented in seven steps: opening the target application client, searching for the keyword to reach the product list interface, calculating the best swipe distance, filtering products, obtaining the product link address, writing the products to a file and sorting and counting them, and configuring parameters.
Step 1, open the target application using pocoui automation.
def __pre(self):
    """
    Preparation
    :return:
    """
    home()
    stop_app(package_name)
    start_my_app(package_name, activity)

    # Wait until the main interface is reached
    self.poco(text='Idle fish').wait_for_appearance()
    self.poco(text='fish pond').wait_for_appearance()
    self.poco(text='news').wait_for_appearance()
    self.poco(text='my').wait_for_appearance()

    print('Enter the idle fish main interface')
After the Idle Fish home page opens, the app reads the clipboard. If the clipboard contains a Taobao password ('Amoy password' in the code), that is, a share code matching a specific pattern, a dialog box pops up immediately. The script therefore has to simulate closing that dialog box.
# If a Taobao password dialog appears within the specified time, close it
for i in range(10, -1, -1):
    close_element = self.poco('com.taobao.idlefish:id/ivClose')
    if close_element.exists():
        close_element.click()
        break
    time.sleep(1)
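The polling pattern above can be pulled out into a small reusable helper. The following is only a sketch: `dismiss_if_present` and its parameters are hypothetical names, and it assumes only that the element object offers `exists()` and `click()`, as the Poco proxy in the snippet above does.

```python
import time
from typing import Callable


def dismiss_if_present(find_close_button: Callable[[], object],
                       timeout_s: int = 10,
                       poll_interval_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll for a close button and click it once if it shows up.

    find_close_button is a hypothetical callable that returns an object
    with exists() and click(). Returns True if the dialog was dismissed.
    """
    # timeout_s + 1 checks, the same count as range(10, -1, -1) above
    for _ in range(timeout_s + 1):
        element = find_close_button()
        if element.exists():
            element.click()
            return True
        sleep(poll_interval_s)
    return False
```

Inside the class it could be called as `dismiss_if_present(lambda: self.poco('com.taobao.idlefish:id/ivClose'))`. Passing `sleep` as a parameter makes the helper testable without real waiting.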
Step 2, search for the keyword to reach the product list interface.
Simulate typing the keyword into the search box, then click the search button until the search result list appears.
In addition, to make the data easier to process, switch the product list to list mode, so that each row shows only one product.
def __input_key_word(self):
    """
    Enter keywords
    :return:
    """
    # Enter the search interface
    perform_click(self.poco('com.taobao.idlefish:id/bar_tx'))

    # Enter text in the search box
    self.poco('com.taobao.idlefish:id/search_term').set_text(self.good_msg)

    # Click the search button
    while True:
        # Wait for the search result list to appear
        if not self.poco('com.taobao.idlefish:id/list_recyclerview').exists():
            perform_click(self.poco('com.taobao.idlefish:id/search_button', text='search'))
        else:
            break

    # Wait for the product list to appear completely
    self.poco('com.taobao.idlefish:id/list_recyclerview').wait_for_appearance()

    # Switch to list mode
    perform_click(self.poco('com.taobao.idlefish:id/switch_search'))
Step 3, calculate the optimal swipe distance.
To crawl data efficiently, the script calculates the optimal distance for each swipe.
First, get the UI control tree of the current interface, then use the control's resource ID to get the coordinates of a product item, and from those coordinates the height of each item.
Finally, multiply that height by the number of products visible on the screen (found by observation) to get the optimal swipe distance.
def __get_good_swipe_distance(self):
    """
    Get the most appropriate distance for each swipe
    :return:
    """
    element = Element()

    # Save the current UI tree locally
    element.get_current_ui_tree()

    # Coordinates of the first Item
    position_item = element.find_elment_position_by_id_and_index("com.taobao.idlefish:id/card_root", "1")

    # Height of one product item
    item_height = position_item[1][1] - position_item[0][1]

    # Through observation, there are 3 items on the current screen
    return item_height * 3
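The arithmetic can be checked in isolation. This sketch assumes the helper returns bounds as `((left, top), (right, bottom))`, which is what the indexing `position_item[1][1] - position_item[0][1]` suggests. The function names and the sample coordinates are made up for illustration.

```python
from typing import Tuple

Point = Tuple[int, int]
Bounds = Tuple[Point, Point]  # assumed layout: ((left, top), (right, bottom))


def item_height(bounds: Bounds) -> int:
    """Height of one list item: bottom y minus top y."""
    (_, top), (_, bottom) = bounds
    return bottom - top


def swipe_distance(bounds: Bounds, items_per_screen: int = 3) -> int:
    """Distance that scrolls exactly one screenful of items."""
    return item_height(bounds) * items_per_screen


# Hypothetical bounds for one item on a 1080-pixel-wide screen
first_item = ((0, 420), (1080, 780))
print(swipe_distance(first_item))  # (780 - 420) * 3 = 1080
```

Keeping `items_per_screen` as a parameter avoids hard-coding the value 3, which depends on the device's screen size.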
Step 4, filter products.
With the best swipe distance from the previous step, swipe the page repeatedly and traverse the child items of the list element.
Note that to avoid errors caused by swipe inertia, each swipe should last more than 2 s.
For each product Item, keep only the products whose want count is at least the preset number.
# How many people want it
want_element_parent = item.offspring('com.taobao.idlefish:id/search_item_flowlayout')

if want_element_parent.exists():
    # Wanted / paid
    want_element = want_element_parent.children()[0]
    want_content = want_element.get_text()

    # Filter out [paid] and other products, and only keep products published by individuals
    if 'People want' not in want_content:
        continue

    # Get the number of people who want the product, which represents its popularity
    want_num = get_num(want_content)

    if int(want_num) < self.num_assign:
        # print('unqualified, filter out')
        pass
    else:
        # If the want count meets the threshold, add the product to the statistics
        ...
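The article does not show `get_num`. One plausible stand-in, offered only as a sketch, extracts the first run of digits from the label text. The `is_popular` helper and the sample labels are hypothetical, and the real label text in the app may differ from the translated `'People want'` marker used here.

```python
import re
from typing import Optional


def get_num(text: str) -> Optional[int]:
    """Return the first integer found in text, or None if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


def is_popular(want_content: str, threshold: int,
               marker: str = 'People want') -> bool:
    """True if the label is a want-count label at or above the threshold."""
    if marker not in want_content:
        return False  # e.g. a [paid] label, not a personal listing
    num = get_num(want_content)
    return num is not None and num >= threshold


print(is_popular('128 People want', 100))
```

Returning `None` for a label without digits makes the missing-number case explicit, so it is not treated as a count of zero.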
Step 5, get the product link address.
For each product that passed the filter in the previous step, click the product Item to enter the product details page.
Then click the share button in the upper right corner, which immediately pops up the share dialog box.
Then click the Taobao password control. A prompt confirms that the password has been copied to the system clipboard.
# Click More
while True:
    if self.poco('com.taobao.idlefish:id/ftShareName').exists():
        break
    print('Click More~')
    perform_click(self.poco(text='more'))

# Click Copy password
perform_click(self.poco('com.taobao.idlefish:id/ftShareName', text='Amoy password'))

# Get the password code
taobao_code_element = self.poco('com.taobao.idlefish:id/tvWarnDetail')
taobao_code = taobao_code_element.get_text()
Step 6, write the products to a file, then sort and count them.
Write the product title, want count and share address obtained above to a CSV file.
Then read the data file back and sort the products in descending order of want count, that is, sort by the second column in reverse.
import csv

def __sort_result(self):
    """
    Sort the crawling results
    :return:
    """
    # Read and sort everything first, so the file is closed before it is rewritten
    with open(self.file_path) as csv_file:
        reader = csv.reader(csv_file, delimiter=",")

        # Header row
        head_title = next(reader)

        # Sort in descending order by the second column
        sortedlist = sorted(reader, key=lambda x: int(x[1]), reverse=True)

    # Write header data
    write_to_csv(self.file_path, [(head_title[0], head_title[1], head_title[2])], False)

    for value in sortedlist:
        write_to_csv(self.file_path, [(value[0], value[1], value[2])], False)

    return sortedlist
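The same read, sort and rewrite cycle can be written with the standard library alone. This is a sketch and not the article's `write_to_csv` helper, whose implementation is not shown. The function name `sort_csv_by_count` is hypothetical.

```python
import csv
from typing import List


def sort_csv_by_count(path: str, count_column: int = 1) -> List[List[str]]:
    """Sort the data rows of a CSV file in place, highest count first.

    The first row is treated as the header and stays on top.
    """
    # Read and sort everything before reopening the file for writing
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = sorted(reader, key=lambda row: int(row[count_column]),
                      reverse=True)

    # Rewrite the file: header first, then the sorted rows
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return rows
```

Converting the count with `int()` in the sort key matters: sorted as strings, '9' would rank above '120'.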
Finally, take the top 10 entries and use pyecharts to generate a bar chart.
import csv

from pyecharts import options as opts
from pyecharts.charts import Bar

def draw_image(self, sortedlist):
    """
    Drawing
    :param sortedlist:
    :return:
    """
    # Title list
    titles = []

    # Sales volume
    sales_num = []

    # Get the title and sales volume of the crawling results
    with open(self.file_path, 'r') as csvfile:
        # Read file
        reader = csv.DictReader(csvfile)

        # Add to list
        for row in reader:
            titles.append(row['title'])
            sales_num.append(row['num'])

    # Number limit
    if len(titles) > self.num:
        titles = titles[:self.num]
        sales_num = sales_num[:self.num]

    # Drawing
    bar = (
        Bar()
        .add_xaxis(titles)
        .add_yaxis("What sell well", sales_num)
        .set_global_opts(title_opts=opts.TitleOpts(title="I want to sell"))
    )
    bar.render('%s.html' % self.good_msg)
Step 7, configure parameters.
Write a YAML file that specifies the search keyword, the crawl time, the want-count threshold, and the number of filtered products to keep.
goods:
  # Search item 1, including search keywords and crawling time
  good1:
    key_word: 'data'  # Search keywords
    key_num: 100      # Threshold for the [want count] filter
    num: 10           # Number of popular items to keep
    time: 600         # Crawl time (seconds)
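Once loaded, the configuration is easier to use as typed objects. The sketch below assumes the file has already been parsed into a plain dict, for example with a YAML library. The `GoodConfig` class and `parse_goods` function are hypothetical names, not part of the original project.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class GoodConfig:
    key_word: str  # search keyword
    key_num: int   # want-count threshold
    num: int       # number of popular items to keep
    time: int      # crawl time in seconds


def parse_goods(config: Dict[str, Any]) -> List[GoodConfig]:
    """Turn the parsed 'goods' mapping into a list of typed entries."""
    goods = config.get('goods') or {}
    return [
        GoodConfig(
            key_word=str(item['key_word']),
            key_num=int(item['key_num']),
            num=int(item['num']),
            time=int(item['time']),
        )
        for item in goods.values()
    ]


# The dict a YAML parser would be expected to produce for the file above
raw = {'goods': {'good1': {'key_word': 'data', 'key_num': 100,
                           'num': 10, 'time': 600}}}
print(parse_goods(raw))
```

A missing key raises `KeyError` as soon as the config is loaded, so a bad file fails before any crawling starts.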
Results
With the product keyword, crawl time and other parameters configured in advance, the script crawls the best-selling products that meet the requirements and finally displays them as a chart.