Python crawler: scraping data from a mobile app

Keywords: JSON Linux Android encoding

1. Capture the app's network traffic

For the detailed packet-capture method, see this blog post: http://my.oschina.net/jhao104/blog/605963

The capture reveals the login URL of Super Curriculum: http://120.55.151.61/V2/StudentSkip/loginCheckV4.action

The login form contains the user name and password, both already encrypted by the app, plus some device information. It can be POSTed directly, exactly as captured.

The request headers must be sent as well. My first attempt left them out and the login failed, so the captured header information has to be included.
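Because the app posts an ordinary URL-encoded form, the body and headers can also be built from field/value pairs instead of a hand-written string, with Content-Length derived from the body so the two never disagree. A minimal Python 3 sketch, assuming the field values from the captured login request; `LOGIN_FIELDS` and `build_login_request` are hypothetical names, not part of the app or the original script:

```python
from urllib.parse import urlencode

# Field values copied from the captured login request. account and password
# are already encrypted by the app, so they are sent verbatim.
LOGIN_FIELDS = [
    ("phoneBrand", "Meizu"),
    ("platform", "1"),
    ("deviceCode", "868033014919494"),
    ("account", "FCF030E1F2F6341C1C93BE5BBC422A3D"),
    ("phoneVersion", "16"),
    ("password", "A55B48BB75C79200379D82A18C5F47D6"),
    ("channel", "MXMarket"),
    ("phoneModel", "M040"),
    ("versionNumber", "7.2.1"),
]


def build_login_request(fields):
    """Return (body_bytes, headers) mimicking the captured app request."""
    body = urlencode(fields).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "User-Agent": "Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)",
        "Host": "120.55.151.61",
        "Connection": "Keep-Alive",
        # Derived from the body instead of hard-coded
        "Content-Length": str(len(body)),
    }
    return body, headers
```

`urlencode` drops the trailing `&` that the captured body has, so this body is 206 bytes rather than 207. If the Content-Length header is omitted altogether, `urllib.request` fills it in from the data.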

 

2. Login

Login code:

import urllib2
from cookielib import CookieJar

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
# Headers copied from the captured request; without them the login fails
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',  # byte length of loginData below
    }
# Form body as captured; account and password are already encrypted by the app
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
# The cookie jar stores any cookies the server sets, so later requests reuse the session
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult
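The headers above advertise `Accept-Encoding: gzip`. Neither urllib2 nor Python 3's `urllib.request` decompresses responses automatically, so if the server does compress its reply, `read()` returns gzip bytes and `json.loads` fails. The post does not say whether this server compresses; a hedged Python 3 helper that handles both cases (`decode_body` is a hypothetical name):

```python
import gzip
import json


def decode_body(raw, content_encoding):
    """Decompress raw bytes if the server applied gzip, then parse the JSON."""
    if content_encoding and content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

With a real response it would be called as `resp = opener.open(req)` followed by `decode_body(resp.read(), resp.headers.get("Content-Encoding"))`.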

 

After a successful login, the server returns a JSON string with the account information.

It matches the data seen in the packet capture, which confirms that the login succeeded.
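The comparison can also be scripted. The final script in this post treats a non-empty `data` field in the response as success; the exact response schema is not shown here, so this Python 3 sketch (hypothetical `login_succeeded`) relies only on that assumption:

```python
import json


def login_succeeded(body):
    """Return True if the login JSON carries a non-empty 'data' field.

    Assumption: mirrors the script's own check; the real response schema
    is not documented in this post.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    if not isinstance(payload, dict):
        return False
    return bool(payload.get("data"))
```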

 

3. Fetch the data

Find the URL and POST parameters of the topic list the same way, by capturing the app's traffic.

From there, the process is the same as simulating a login to a website. See: http://my.oschina.net/jhao104/blog/547311
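Between pages, the topic request body changes only in its `timestamp` field. Instead of %-formatting the whole string, the captured body can be parsed and rebuilt with that one field replaced. A Python 3 sketch; `with_timestamp` is a hypothetical helper, and `CAPTURED_TOPIC_BODY` is the first-page body (`timestamp=0`) used in the script:

```python
from urllib.parse import parse_qsl, urlencode

# POST body captured for the first page of topics (timestamp=0)
CAPTURED_TOPIC_BODY = (
    "timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19"
    "&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040"
    "&versionNumber=7.2.1&"
)


def with_timestamp(captured_body, timestamp):
    """Return the captured form body with only the timestamp field replaced."""
    # parse_qsl keeps field order and skips the empty piece after the trailing '&'
    fields = parse_qsl(captured_body, keep_blank_values=True)
    updated = [(k, str(timestamp) if k == "timestamp" else v) for k, v in fields]
    return urlencode(updated)
```

The rebuilt body has no trailing `&`. Form parsers normally ignore it, but whether this particular server cares is an assumption.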

The final code is below. It covers both the first-page request and the pull-to-load-more updates, so topic content can be loaded indefinitely.
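Pull-to-load works as a cursor: the first request sends `timestamp=0`, and each successful response carries `data['timestampLong']`, which becomes the `timestamp` of the next request. A Python 3 sketch of that loop, assuming the response shape that `fetch_data` reads in the script; `paginate` and its `fetch_page` callback are hypothetical, and `max_pages` bounds what is otherwise an endless load:

```python
def paginate(fetch_page, max_pages=None):
    """Yield topic messages page by page using the timestamp cursor.

    fetch_page(timestamp) must return the decoded JSON dict of one request.
    """
    timestamp = 0  # the first page is requested with timestamp=0
    pages = 0
    while max_pages is None or pages < max_pages:
        payload = fetch_page(timestamp)
        if payload.get("status") != 1:
            break  # failure or no more data
        data = payload["data"]
        for message in data.get("messageBOs", []):
            yield message
        timestamp = data["timestampLong"]  # cursor for the next request
        pages += 1
```

Passing the network call in as `fetch_page` keeps the loop testable with canned responses.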

 

#!/usr/local/bin/python2.7
# -*- coding: utf8 -*-
"""
  Topic scraper for Super Curriculum
"""
import urllib2
from cookielib import CookieJar
import json


def fetch_data(json_data):
    """Read the topic JSON and return (timestampLong, topicList)."""
    data = json_data['data']
    timestampLong = data['timestampLong']  # cursor for the next page
    messageBO = data['messageBOs']
    topicList = []
    for each in messageBO:
        topicDict = {}
        if each.get('content', False):  # skip entries without text
            topicDict['content'] = each['content']
            topicDict['schoolName'] = each['schoolName']
            topicDict['messageId'] = each['messageId']
            topicDict['gender'] = each['studentBO']['gender']
            topicDict['time'] = each['issueTime']
            print each['schoolName'], each['content']
            topicList.append(topicDict)
    return timestampLong, topicList


def load(timestamp, headers, url):
    """Load more: keep requesting older topics, like the app's pull-to-load."""
    # A loop rather than recursion, so long sessions do not hit Python's recursion limit
    while True:
        loadData = 'timestamp=%s&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&' % timestamp
        # The body length varies with the number of digits in timestamp
        headers['Content-Length'] = str(len(loadData))
        req = urllib2.Request(url, loadData, headers)
        loadResult = opener.open(req).read()
        loadJson = json.loads(loadResult)
        if loadJson.get('status', False) != 1:
            print 'load fail'
            print loadResult
            return False
        print 'load successful!'
        timestamp, topicList = fetch_data(loadJson)

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
topicUrl = 'http://120.55.151.61/V2/Treehole/Message/getMessageByTopicIdV3.action'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',  # byte length of the login body
    }

# ---Login part---
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
# A successful login returns the account information under 'data'
loginStatus = json.loads(loginResult).get('data', False)
if loginStatus:
    print 'login successful!'
else:
    print 'login fail'
    print loginResult

# ---Getting topics---
topicData = 'timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
headers['Content-Length'] = str(len(topicData))  # 147 bytes for this body
topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
topicStatus = topicJson.get('status', False)
print topicJson
if topicStatus == 1:
    print 'fetch topic success!'
    timestamp, topicList = fetch_data(topicJson)
    load(timestamp, headers, topicUrl)
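On Python 3, the extraction step of `fetch_data` can be written without `print`, and the topics saved with their Chinese text intact: `json.dumps` escapes non-ASCII characters unless `ensure_ascii=False` is passed. A sketch only; `extract_topics` and `to_json_lines` are hypothetical names, and the field names are the ones the script above reads:

```python
import json


def extract_topics(payload):
    """Return (timestampLong, topics), keeping only messages with content."""
    data = payload["data"]
    topics = []
    for each in data["messageBOs"]:
        if not each.get("content"):
            continue  # skip entries without text, as fetch_data does
        student = each.get("studentBO") or {}  # assumption: may be missing or null
        topics.append({
            "content": each["content"],
            "schoolName": each["schoolName"],
            "messageId": each["messageId"],
            "gender": student.get("gender"),
            "time": each["issueTime"],
        })
    return data["timestampLong"], topics


def to_json_lines(topics):
    """Serialise topics one per line, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(t, ensure_ascii=False) for t in topics)
```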


If you reprint this article, please credit the source: http://my.oschina.net/jhao104/blog/606922

Posted by brown2005 on Mon, 30 Mar 2020 07:23:31 -0700