Python crawler actual combat, pyecharts module, python data analysis tells you which goods are popular on free fish~

Keywords: Python Back-end Programmer

preface

Python automation is used to obtain the best selling goods of a certain kind for reference. No more nonsense.

Let's start happily~

development tool

Python version: 3.6.4

Related modules:

Pyecarts module;

And some Python built-in modules.

Environment construction

Install Python and add it to the environment variable. pip can install the relevant modules required.

preparation

1. Configure Android ADB development environment

2. Install in Python virtual environment   pocoui dependency Library

# pocoui\
pip3 install pocoui

#  Data visualization chart
pip3 install pyecharts -U
 Copy code

step

We have seven points   There are three steps to realize this function: opening the target application client, retrieving keywords to the commodity list interface, calculating the best sliding distance, filtering commodities, obtaining commodity link address, writing file sorting and counting commodities, and configuring parameters.

Step 1, open the target application using pocoui automation.

def __pre(self):
    """
    preparation
    :return:
    """
    home()
    stop_app(package_name)
    start_my_app(package_name, activity)


    #  Waiting to reach the desktop
    self.poco(text='Idle fish').wait_for_appearance()
    self.poco(text='fish pond').wait_for_appearance()
    self.poco(text='news').wait_for_appearance()
    self.poco(text='my').wait_for_appearance()

    print('Enter the idle fish main interface')
Copy code

After entering the free fish home page, the application end will get the data of the clipboard. When there is a specific regular password, a dialog box will pop up immediately. Therefore, it is necessary to simulate the operation of closing the dialog box.

#  If there is a Amoy password within the specified time, it will be closed\
for i in range(10, -1, -1):\
      close_element = self.poco('com.taobao.idlefish:id/ivClose')\
      if close_element.exists():\
            close_element.click()\
            break\
      time.sleep(1)
Copy code

Step 2: retrieve keywords to the commodity list interface

Input the keywords to be retrieved into the input box through simulation, and then click the search button until the search list appears.

In addition, in order to process the data more conveniently, the commodity list is switched to the list mode, that is, only one commodity is displayed in a row.

def __input_key_word(self):
    """
    Enter keywords
    :return:
    """
    #  Enter the search interface
    perform_click(self.poco('com.taobao.idlefish:id/bar_tx'))

    #  Enter text in the search box
    self.poco('com.taobao.idlefish:id/search_term').set_text(self.good_msg)

    #  Click the search button
    while True:
         #  Wait for the search result list to appear
         if not self.poco('com.taobao.idlefish:id/list_recyclerview').exists():
              perform_click(self.poco('com.taobao.idlefish:id/search_button', text='search'))
         else:
              break

    #  Wait for the product list to appear completely
    self.poco('com.taobao.idlefish:id/list_recyclerview').wait_for_appearance()

    #  Switch to list
    perform_click(self.poco('com.taobao.idlefish:id/switch_search'))
Copy code

The first   Step 3, calculate the optimal sliding distance.

In order to ensure the efficiency of crawling data, the optimal distance of each sliding is obtained and calculated.

First, get the UI control tree of the current interface, then get the coordinates of goods through the property ID of the control, and then get the height of each commodity.

Finally, the optimal sliding distance is obtained by observing the number of products on the screen.

def __get_good_swipe_distance(self):
    """
    Get the most appropriate distance for each slide
    :return:
    """
    element = Element()
    #  Save the current UI tree locally
    element.get_current_ui_tree()

    #  Coordinates of the first Item
    position_item = element.find_elment_position_by_id_and_index("com.taobao.idlefish:id/card_root",
                                                                     "1")
    #  Height of goods
    item_height = position_item[1][1] - position_item[0][1]

    #  Through observation, there are 3 items on the current screen
    return item_height * 3
 Copy code

Step 4, filter products.

The above steps get the best sliding distance, and constantly slide the page to traverse the sub items of the list element.

It should be noted that in order to avoid the error caused by sliding inertia, the sliding duration of each time should be set to more than 2s.

Filter out the products with the desired number greater than the preset number through the product Item.

#  How many people want it
want_element_parent = item.offspring('com.taobao.idlefish:id/search_item_flowlayout')

if want_element_parent.exists():
     #  Wanted / paid
     want_element = want_element_parent.children()[0]

     want_content = want_element.get_text()

     #  Filter out [paid] and other commodities, and only keep the commodities published by individuals
     if 'People want' not in want_content:
            continue
            
      #  Get the specific number of goods you want, which represents the heat of the goods
      want_num = get_num(want_content)

      if int(want_num) < self.num_assign:
             #  print('unqualified, filter out ')
             pass
      else:
            #  If the number of goods meets the standard, add statistics
 Copy code

Step 5, get the product link address

For the commodities that meet the conditions in the previous step, click commodity Item to enter the commodity details page.

Then click the share button in the upper right corner to pop up the share dialog box immediately.

Then click the password control, and you will be prompted that the password has been copied to the system clipboard successfully.

#  Click More
while True:
     if self.poco('com.taobao.idlefish:id/ftShareName').exists():
          break
     print('Click More~')
     perform_click(self.poco(text='more'))

#  Click Copy password
perform_click(self.poco('com.taobao.idlefish:id/ftShareName', text='Amoy password'))

#  Get the password code
taobao_code_element = self.poco('com.taobao.idlefish:id/tvWarnDetail')

taobao_code = taobao_code_element.get_text()
Copy code

Step 6, write the goods, sort and statistics

Write the commodity title, desired number and sharing address obtained above into the CSV file.

Then read the data file and sort the goods in descending order according to the desired number by reverse sorting the second column in the table.

def __sort_result(self):
    """
    Sort the crawling results
    :return:
    """
    reader = csv.reader(open(self.file_path), delimiter=",")

    #  Header title
    head_title = next(reader)

    #  Arrange in reverse order according to the second column
    sortedlist = sorted(reader, key=lambda x: (int(x[1])), reverse=True)

    #  Write header data
    write_to_csv(self.file_path, [(head_title[0], head_title[1], head_title[2])], False)

    for value in sortedlist:
       write_to_csv(self.file_path, [(value[0], value[1], value[2])], False)

    return sortedlist
 Copy code

Finally, get the top 10 data and use it   pyecharts   Generate statistical charts.

def draw_image(self, sortedlist):
     """
     Drawing
     :param sortedlist:
     :return:
     """

     #  Title List
     titles = []

     #  sales volume
     sales_num = []

     #  Get the title and sales volume of the crawling results
     with open(self.file_path, 'r') as csvfile:
         #  read file
         reader = csv.DictReader(csvfile)

         #  Add to list
         for row in reader:
             titles.append(row['title'])
             sales_num.append(row['num'])

     #  Number limit
     if len(titles) > self.num:
         titles = titles[:self.num]
         sales_num = sales_num[:self.num]

     #  Drawing
     bar = (
            Bar()
                .add_xaxis(titles)
                .add_yaxis("What sell well", sales_num)
                .set_global_opts(title_opts=opts.TitleOpts(title="I want to sell"))
        )
     bar.render('%s.html' % self.good_msg)
Copy code

Article 7   Next, configure parameters

Write yaml file to specify the keyword, crawling time, number of appraisal indicators and number of filtered products to crawl.

goods:
  #  Search item 1, including search keywords and crawling time
  good1:
    key_word: 'data'   #  Search keywords
    key_num: 100  #  Filter the critical point of [desired number]
    num: 10      #  Filter only popular items
    time: 600   #  Crawl time (seconds)
Copy code

Effect display

Configure commodity keywords, crawling time and other parameters in advance, that is, you can crawl the data of the best-selling commodities that meet the requirements, and finally display them in the form of charts.

Posted by Aptana on Sat, 04 Dec 2021 17:52:28 -0800