Use Python to analyze your Internet habits!

Keywords: SQLite Database Firefox SQL

(Of course, this article is only for PC browsers)

Android users are not unavoidable, but there are too many categories of mobile browsers to study one by one. So this article only uses Chrome and Firefox browsers on the PC side as examples. At the same time, because this article is written in the Linux system environment, it may be different from the operation under Windows system.

 

As for mobile browsing history, it should be synchronized or exported.

 

A saved file for extracting historical records

 

Chrome browsers can enter: chrome://version in the address bar/

 

 

Firefox browsers can enter about:profiles in the address bar

 

 

Other types of browsers (360 security browser, Sogou high-speed browser, QQ browser, Baidu browser and so on) should not be very different, I have not verified, but also ask the officials to solve.

 

The Chrome browser saves historical data in a file called History without an extension. But it's actually sqlite's database file; Firefox browser's corresponding file for storing data is Places.sqlite, which actually stores information including history, bookmarks, settings and add-ons, as well as sqlite's database file.

 

On Sqlite

 

SQL (Structured Query Language: Structured Query Language) is a special purpose programming language for managing relational database management system (RDBMS) or flow processing in relational stream data management system (RDSMS). [1]

 

Simply put, it's a special language for managing databases. In addition to the database tools represented by SQL statements, there are also non-relational databases represented by NoSql, such as MongoDB. Its main feature is that it does not use SQL query statements.

 

Of course, this article only focuses on the use of Sqlite3, because it is helpful for what to do in this article.

 

Usually, the distribution of Linux system usually comes with sqlite3. It only needs to input sqlite3 at the terminal and add the database file name as a parameter to query the database. But there may be a little bit of preparation before that.

 

First, the SQL database is usually composed of several tables, each of which has a relationship. Each table has a header, which is the title on a regular Excel table, also known as a field. Usually this represents the main characteristics of a list of data. Other secondary features include: Type, Primary Key, Null, Default, and BOOLEAN.

 

After entering the database file, the. table command can view all the tables. (Note that there is a point before the command)

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
$ sqlite3 HistorySQLite version 3.23.1 2018-04-10 17:39:29Enter ".help" for usage hints.sqlite> .tabledownloads                meta                     urls                   downloads_slices         segment_usage            visit_source           downloads_url_chains     segments                 visits                 keyword_search_terms     typed_url_sync_metadata

pragma table_info([table_name]); commands can query all the headers of the corresponding table. (Note that there is no point before the command and a semicolon is needed at the end, or a semicolon is added after a line change to indicate the end of the command.)

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
sqlite> pragma table_info(visits);0|id|INTEGER|0||11|url|INTEGER|1||02|visit_time|INTEGER|1||03|from_visit|INTEGER|0||04|transition|INTEGER|1|0|05|segment_id|INTEGER|0||06|visit_duration|INTEGER|1|0|07|incremented_omnibox_typed_score|BOOLEAN|1|FALSE|0

The. dump command displays all the information in a table, but it's not easy to read. So it's easier to read the information of the table by using Select query statement.

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
sqlite> .dump visitsPRAGMA foreign_keys=OFF;BEGIN TRANSACTION;CREATE TABLE visits(id INTEGER PRIMARY KEY,url INTEGER NOT NULL,visit_time INTEGER NOT NULL,from_visit INTEGER,transition INTEGER DEFAULT 0 NOT NULL,segment_id INTEGER,visit_duration INTEGER DEFAULT 0 NOT NULL,incremented_omnibox_typed_score BOOLEAN DEFAULT FALSE NOT NULL);INSERT INTO visits VALUES(16093,9299,13202840367496968,0,805306368,0,0,0);INSERT INTO visits VALUES(16094,9299,13202840480577849,0,805306368,0,0,0);INSERT INTO visits VALUES(16095,9302,13202840482160798,0,268435456,0,0,0);INSERT INTO visits VALUES(16096,9303,13202840482160798,16095,2684354560,0,353266479,0);INSERT INTO visits VALUES(16097,9299,13202840816848619,0,805306368,0,0,0);...INSERT INTO visits VALUES(22432,12966,13210610317809993,22431,805306368,212,0,0);INSERT INTO visits VALUES(22433,12969,13210610329001790,0,268435460,0,0,0);INSERT INTO visits VALUES(22434,12968,13210610329001790,22433,2684354564,0,0,0);INSERT INTO visits VALUES(22435,13220,13210613943693189,0,805306368,0,0,0);INSERT INTO visits VALUES(22436,12966,13210616113132979,22431,805306376,212,0,0);INSERT INTO visits VALUES(22437,13220,13210616470103600,0,805306368,0,0,0);CREATE INDEX visits_url_index ON visits (url);CREATE INDEX visits_from_index ON visits (from_visit);CREATE INDEX visits_time_index ON visits (visit_time);COMMIT;

Select: (Note the semicolon at the end of the statement, because the Select statement can be written in multiple lines, so you need to use the semicolon as a rest.)

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
sqlite> select * from visits;16093|9299|13202840367496968|0|805306368|0|0|016094|9299|13202840480577849|0|805306368|0|0|016095|9302|13202840482160798|0|268435456|0|0|016096|9303|13202840482160798|16095|2684354560|0|353266479|016097|9299|13202840816848619|0|805306368|0|0|016098|9299|13202840831139916|0|805306368|0|882287|0...22436|12966|13210616113132979|22431|805306376|212|0|022437|13220|13210616470103600|0|805306368|0|0|0

Of course, our visual system still can't accept this cluttered stack of forms. So we can also use. mode column. This command changes the original list display mode to column (row), which is to achieve column alignment in the terminal.

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
sqlite> .mode columnsqlite> select * from visits;16093       9299        13202840367496968  0           805306368   0           0               0                              16094       9299        13202840480577849  0           805306368   0           0               0                              16095       9302        13202840482160798  0           268435456   0           0               0                              16096       9303        13202840482160798  16095       2684354560  0           353266479       0                              16097       9299        13202840816848619  0           805306368   0           0               0                              16098       9299        13202840831139916  0           805306368   0           882287          0                         16099       9304        13202878582356542  0           268435461   0           0               0...22437       13220       13210616470103600  0           805306368   0           0               0

However, if the display is not wide enough or the window of the terminal is not large enough, the table will be folded. The same will only challenge the limits of human visual endurance. In addition, too long content in the table is not allowed to display the excess part in column mode, but the width of each grid can be manually adjusted.

 

But there is still one common thing missing from the look of the table in this way, that is, the title of the table. So you also need to use the. header on command to display the header for the query results.

 

For the display status of the current table, you can view it with the. show command.

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
sqlite> .showecho: offeqp: offexplain: autoheaders: onmode: columnnullvalue: ""output: stdoutcolseparator: "|"rowseparator: "\n"stats: offwidth:     filename: History

The colseparator is also the line separator, which is "|". In this way, the exported file is also delimited by a vertical line. But in the standard format of csv files, "," is used as a line splitter. If you need to export to a csv file, you need to change the line separator. The specific orders are as follows:

  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
sqlite> .separator ","sqlite> .mode listsqlite> select * from visits;id,url,visit_time,from_visit,transition,segment_id,visit_duration,incremented_omnibox_typed_score16093,9299,13202840367496968,0,805306368,0,0,016094,9299,13202840480577849,0,805306368,0,0,016095,9302,13202840482160798,0,268435456,0,0,016096,9303,13202840482160798,16095,2684354560,0,353266479,016097,9299,13202840816848619,0,805306368,0,0,0...22437,13220,13210616470103600,0,805306368,0,0,0

Next is the operation of exporting the query results to the file. The output command is used here.

  •  
  •  
  •  
sqlite> .output file_name.csvsqlite> select * from visits;sqlite> .output stdout

The following is the display effect of opening the corresponding csv file using table software.

 

In fact, the output command writes the contents of the output file_name and output stdout that were originally displayed on the terminal directly into the file as text.

 

Select query

 

If you look carefully enough, you will find that the visits form does not have any websites. In fact, the url used the number of the address, and then recorded the visit time. The url number and url address are stored in the urls table.

 

So if we want to get each web site and its corresponding access time accurately, we must use visits and urls tables. The operation of this database is usually internal connection.

 

  •  
sqlite> select urls.url, title, visit_time from urls inner join visits on urls.id = visits.url;

Similarly, using the previous file export operation, a complete form containing the address, the title of the address, and the access time can be generated.

 

Firefox browsers operate roughly the same way. Among them, moz_historyvisits correspond to visits and moz_places correspond to urls. The select statement can be written as:

  •  
  •  
  •  
sqlite> select url, title, visit_date   ...> from moz_places inner join moz_historyvisits   ...> on moz_places.id = moz_historyvisits.place_id;

However, there is also a rev_host in the moz_places table in Firefox browser, which is the reverse domain name. Although you don't understand the usefulness of reverse ordering, Firefox has extracted all domain names.

 

In addition, Linux system or Mac OS can retrieve installation methods without Sqlite3; Windows system has command-line operating version and GUI version. The GUI version has not been thoroughly understood by the author, so there is no corresponding example operation; the command line version operation should not be much different from this article.

Posted by misc00500 on Mon, 19 Aug 2019 00:13:57 -0700