Python Crawler in Practice: Scraping Lianjia Guangzhou House Prices, Part 03: Storage

Keywords: Database SQLite Python

Introduction

Series index:

Python Crawler in Practice: Scraping Lianjia Guangzhou House Prices, Part 01: A Simple Single-Page Crawler

Python Crawler in Practice: Scraping Lianjia Guangzhou House Prices, Part 02: Scaling Up the Small Crawler

This part covers storage, which the earlier parts did not implement. Storage options fall into two broad categories: files and databases. Given the amount of data this crawler collects and the needs of the later analysis, this part focuses on SQLite.

Approach

The SQLite database is wrapped in a class that handles writes coming from multiple threads. Python's DB-API revolves around two kinds of objects: Connection objects and Cursor objects. An application first has to establish a connection to the database. Once a connection (or a connection pool) exists, you create a cursor from it, send requests to the database through the cursor, and receive the database's responses. For the full list of attributes and methods of these objects, see the official documentation. How best to combine multithreading with a database is something I am still exploring.
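The Connection/Cursor workflow described above looks roughly like this with Python's built-in `sqlite3` module. This is a minimal sketch using an in-memory database; the `demo` table, its columns, and the function name `demo_db_api` are made up for illustration.

```python
import sqlite3


def demo_db_api():
    # 1. Establish a connection (":memory:" keeps the database in RAM)
    conn = sqlite3.connect(":memory:")
    try:
        # 2. Create a cursor from the connection to send requests
        cur = conn.cursor()
        cur.execute("CREATE TABLE demo (name TEXT PRIMARY KEY, price INTEGER)")
        # 3. Pass values through "?" placeholders instead of string formatting
        cur.execute("INSERT INTO demo VALUES (?, ?)", ("a", 10000))
        cur.executemany("INSERT INTO demo VALUES (?, ?)",
                        [("b", 12000), ("c", 8000)])
        conn.commit()  # make the writes permanent
        # 4. Receive the response: fetch the rows produced by a query
        cur.execute("SELECT name, price FROM demo ORDER BY price")
        return cur.fetchall()
    finally:
        conn.close()  # always release the connection


print(demo_db_api())  # [('c', 8000), ('a', 10000), ('b', 12000)]
```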

At this point the crawler has collected information on every residential community (xiaoqu) in Lianjia's Guangzhou listings. It can serve as a template for crawling for-sale listings and transaction records, for extending the crawler to other cities and other real estate apps, and for analyzing the data once it has been collected.

Code example

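import sqlite3
import threading
from functools import wraps


class SQLiteWraper(object):
    """Wrap an SQLite database so that writes from multiple threads are serialized."""

    def __init__(self, path, command='', *args, **kwargs):
        # One re-entrant lock shared by every thread that uses this wrapper
        self.lock = threading.RLock()
        self.path = path
        if command != '':
            # Optionally run an initial statement, e.g. CREATE TABLE
            conn = self.get_conn()
            try:
                conn.cursor().execute(command)
                conn.commit()
            finally:
                conn.close()

    def get_conn(self):
        # A fresh connection per call, so no connection object is shared across threads
        conn = sqlite3.connect(self.path)
        conn.text_factory = str
        return conn

    def conn_close(self, conn=None):
        if conn is not None:
            conn.close()

    def conn_trans(func):
        """Decorator: take the lock, open a connection, pass it in as `conn`,
        and always close the connection and release the lock afterwards."""
        @wraps(func)
        def connection(self, *args, **kwargs):
            with self.lock:
                conn = self.get_conn()
                kwargs['conn'] = conn
                try:
                    return func(self, *args, **kwargs)
                finally:
                    self.conn_close(conn)

        return connection

    @conn_trans
    def execute(self, command, method_flag=0, conn=None):
        """Run one statement and commit.

        method_flag=0: `command` is a plain SQL string.
        Otherwise:     `command` is a (sql, params) pair for a parameterized query.
        Returns 0 on success, -1 on an integrity error, -2 on any other error.
        """
        cu = conn.cursor()
        try:
            if not method_flag:
                cu.execute(command)
            else:
                cu.execute(command[0], command[1])
            conn.commit()
        except sqlite3.IntegrityError as e:
            print(e)
            return -1
        except Exception as e:
            print(e)
            return -2
        return 0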

The results are as follows:

sqlite> select * from xiaoqu limit 10;
http://gz.lianjia.com/xiaoqu/2113328145985364/|Surplus Port International|0 deals in 90 days|0 units for rent|Nansha|Jin Zhou|Built in an unknown year|No average price|0 units
http://gz.lianjia.com/xiaoqu/2113328662147633/|Overlapping peaks|0 deals in 90 days|0 units for rent|Nansha|Nansha Prefecture|Built in an unknown year|No average price|0 units
http://gz.lianjia.com/xiaoqu/2113668346245868/|Fuli Tianhai Bay|0 deals in 90 days|0 units for rent|Nansha|Jin Zhou|Built in an unknown year|No average price|0 units
http://gz.lianjia.com/xiaoqu/2113826645830960/|Tongda Sunrise Garden|0 deals in 90 days|0 units for rent|Nansha|Jin Zhou|Built in an unknown year|No average price|0 units
http://gz.lianjia.com/xiaoqu/2113306909962092/|Nansha City|0 deals in 90 days|0 units for rent|Nansha|Jin Zhou|Built in an unknown year|13226 yuan/m2|0 units
http://gz.lianjia.com/xiaoqu/2114256349654567/|Jinzhu square|0 deals in 90 days|0 units for rent|Nansha|Jin Zhou|Built in 2015|No average price|0 units
http://gz.lianjia.com/xiaoqu/2112879567532818/|Nansha Olympic Garden|0 deals in 90 days|0 units for rent|Nansha|Jin Zhou|Built in an unknown year|7970 yuan/m2|0 units
http://gz.lianjia.com/xiaoqu/2114393419248043/|North Ring Road|0 deals in 90 days|0 units for rent|Nansha|Dagang|Built in an unknown year|No average price|0 units
http://gz.lianjia.com/xiaoqu/2113199275994557/|Nansha Pearl River Bay|0 deals in 90 days|0 units for rent|Nansha|Jin Zhou|Built in an unknown year|10376 yuan/m2|0 units
http://gz.lianjia.com/xiaoqu/2114416717295822/|Dongchung Street, Shinan Road|0 deals in 90 days|0 units for rent|Nansha|Tung Chung Town|Built in an unknown year|No average price|0 units
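For the later analysis, text columns such as the average price need converting to numbers. A hedged sketch: the hypothetical helper `parse_avg_price` pulls the leading integer out of a price string and returns `None` when there is none (as in the "No average price" rows); registering it with `Connection.create_function` lets SQL call it directly. The two-column table and the exact price strings are assumptions modeled on the sample output above; the real stored strings may carry a different unit suffix, which the regex ignores.

```python
import re
import sqlite3
from typing import Optional


def parse_avg_price(text: Optional[str]) -> Optional[int]:
    """Return the leading integer in a price string (e.g. '13226 yuan/m2'),
    or None when it does not start with a number (e.g. 'No average price')."""
    if text is None:
        return None
    match = re.match(r"\s*(\d+)", text)
    return int(match.group(1)) if match else None


conn = sqlite3.connect(":memory:")
# Register the Python helper so SQL queries can call parse_price(...)
conn.create_function("parse_price", 1, parse_avg_price)
# Hypothetical two-column table; the crawler's real table has more columns
conn.execute("CREATE TABLE xiaoqu (name TEXT, avg_price TEXT)")
conn.executemany(
    "INSERT INTO xiaoqu VALUES (?, ?)",
    [("Nansha City", "13226 yuan/m2"),
     ("Nansha Olympic Garden", "7970 yuan/m2"),
     ("Nansha Pearl River Bay", "10376 yuan/m2"),
     ("Fuli Tianhai Bay", "No average price")],
)
# None becomes NULL in SQL, and AVG skips NULLs, so unpriced rows are left out
avg = conn.execute("SELECT AVG(parse_price(avg_price)) FROM xiaoqu").fetchone()[0]
conn.close()
print(parse_avg_price("7970 yuan/m2"), avg)  # 7970 10524.0
```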


Posted by paulieo10 on Wed, 10 Apr 2019 03:33:31 -0700