Crawler: File handling

1. JSON file processing

1.1 What is JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is based on a subset of ECMAScript, the JavaScript specification maintained by Ecma International (formerly the European Computer Manufacturers Association), and it stores and represents data in a text format that is completely independent of any programming language. Its concise, clear hierarchical structure makes JSON an ideal data-interchange language: it is easy for people to read and write, easy for machines to parse and generate, and compact enough to be transmitted efficiently over a network.

1.2 Data types supported by JSON

1. Objects (dictionaries), written with curly braces.

2. Arrays (lists), written with square brackets.

3. Numbers (integer and floating point), Booleans (true/false) and null.

4. Strings, which must be enclosed in double quotes, not single quotes.

Multiple values are separated by commas.

Note: JSON data is essentially a string (plain text).
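The list above can be checked with Python's standard json module. The sketch below parses a made-up JSON text that uses each supported type (with json.loads(), covered in 1.4) and shows that a single-quoted string is rejected:

```python
import json

# A made-up JSON text using every supported type: object, array,
# numbers, a double-quoted string, a Boolean and null.
text = '{"name": "book", "tags": ["a", "b"], "count": 3, "price": 9.8, "in_stock": true, "isbn": null}'

data = json.loads(text)
print(type(data))         # the top-level object becomes a dict
print(data["in_stock"])   # JSON true becomes Python True
print(data["isbn"])       # JSON null becomes Python None

# Single quotes are not valid JSON, so parsing this fails.
try:
    json.loads("{'name': 'book'}")
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc)
```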

1.3 Converting dictionaries and lists to JSON

import json

books = [
    {
        "title":"How Bad Eggs Are Refined into 1","price":9.8
    },
    {
        "title":"How bad eggs are made into 2","price":9.9
    }
]

# Serialize the list of dicts into a JSON-formatted string.
json_str = json.dumps(books,ensure_ascii=False)
print(json_str)
print(type(books))     # <class 'list'>
print(type(json_str))  # <class 'str'>

By default, json.dumps() escapes every non-ASCII character as a \uXXXX sequence (ensure_ascii=True), so Chinese text, for example, becomes unreadable in the output. Pass ensure_ascii=False to turn this off and keep the characters as they are.
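A small sketch of the difference, using a hypothetical record that contains Chinese text:

```python
import json

# Hypothetical record containing non-ASCII (Chinese) characters.
book = {"title": "中文", "price": 9.8}

escaped = json.dumps(book)                       # default: ensure_ascii=True
readable = json.dumps(book, ensure_ascii=False)  # keep the characters as-is

print(escaped)   # the Chinese characters appear as \u escape sequences
print(readable)  # the Chinese characters appear unchanged

# Both strings decode back to the same Python object.
print(json.loads(escaped) == json.loads(readable) == book)
```

Either form is valid JSON; ensure_ascii only changes how the text looks, not what it decodes to.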

By default, only Python's basic data types can be converted to JSON-formatted strings: dict, list, tuple, str, int, float, bool and None.
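The sketch below shows how those basic types map onto JSON, and that a type outside the list (a set is used here as an example) raises TypeError:

```python
import json

# Basic Python types and how json.dumps renders them.
data = {"ints": 1, "floats": 2.5, "flag": True, "nothing": None, "pair": (1, 2)}
text = json.dumps(data)
print(text)  # True -> true, None -> null, the tuple becomes an array

# JSON has no tuple type, so the tuple comes back as a list.
print(json.loads(text)["pair"])

# A set is not one of the supported types.
try:
    json.dumps({1, 2, 3})
except TypeError as exc:
    print("TypeError:", exc)
```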

Dumping JSON data directly into a file:

Besides dumps(), the json module provides a dump() function, which takes a file object and writes the JSON string directly into that file.

import json

books = [
    {
        "title":"How Bad Eggs Are Refined into 1","price":9.8
    },
    {
        "title":"How bad eggs are made into 2","price":9.9
    }
]

# Write the list straight into a.json as UTF-8 text.
with open('a.json','w',encoding='utf-8') as fp:
    json.dump(books,fp,ensure_ascii=False)

The difference between dump and dumps:

dump() does not need an explicit write() call: you just pass the object and the file to write to. dumps() only returns a string, which you then have to write yourself with write().

dump() is convenient when you want to write a dictionary straight into a file. If you are not writing to a file, for example when saving the content to a database or to Excel, use dumps() to convert the dictionary into a string first and then store that string.
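A minimal comparison of the two, where io.StringIO stands in for a real file opened with open(..., 'w') so the example does not create any files:

```python
import io
import json

books = [{"title": "How Bad Eggs Are Refined into 1", "price": 9.8}]

# io.StringIO behaves like a text file opened for writing.
fake_file = io.StringIO()

# dump(): hand over the object and the file; json does the writing.
json.dump(books, fake_file, ensure_ascii=False)

# dumps(): returns a plain string; storing it is up to you.
json_str = json.dumps(books, ensure_ascii=False)

print(fake_file.getvalue() == json_str)  # the same text either way
```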

1.4 Loading a JSON string into a Python object

#encoding: utf-8
import json

json_str = '[{"title": "How Bad Eggs Are Refined into 1", "price": 9.8}, {"title": "How bad eggs are made into 2", "price": 9.9}]'

# Parse the JSON string back into a list of dicts.
books = json.loads(json_str)
print(type(books))  # <class 'list'>
for book in books:
    print(book)

Reading JSON directly from a file:

#encoding: utf-8
import json

# json.load() reads and parses the whole file in one call.
with open('a.json','r',encoding='utf-8') as fp:
    books = json.load(fp)
    for book in books:
        print(book)

The difference between load and loads:

loads() takes a string, while load() takes a file object.

To parse a file with loads(), you must first read its contents into a string yourself; load() does that reading for you.
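The two paths side by side; again io.StringIO stands in for a file opened with open('a.json', 'r', ...), and the JSON text is a sample:

```python
import io
import json

text = '[{"title": "How bad eggs are made into 2", "price": 9.9}]'

# load(): pass the file object; json reads it for you.
books_a = json.load(io.StringIO(text))

# loads(): read the contents yourself first, then parse the string.
fp = io.StringIO(text)
books_b = json.loads(fp.read())

print(books_a == books_b)
```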

2. CSV File Processing

CSV is a general-purpose and relatively simple file format that is widely used in consumer, business and scientific applications. Its most common use is moving tabular data between programs that natively work with incompatible (often proprietary and/or undocumented) formats. This works because many programs support some variant of CSV, at least as an optional import/export format.

2.1 Reading CSV files


A CSV file is plain text with comma-separated values, so it can be created in any text editor, saved from Excel in CSV format, or written by a program (see 2.2).

import csv

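# newline='' lets the csv module handle line endings itself.
with open('Demo.csv','r',newline='') as fp:
    reader = csv.reader(fp)
    titles = next(reader)  # read (and skip) the header row
    for x in reader:
        print(x)           # each row is a list of strings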

Why is the first line not printed?

Because the first line usually contains the column headings, next(reader) reads it (into titles) before the loop starts.
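To see this without a file on disk, the sketch below feeds csv.reader some hypothetical CSV text through io.StringIO (the real Demo.csv contents are not shown here, so the columns are made up):

```python
import csv
import io

# Hypothetical CSV content; io.StringIO stands in for open('Demo.csv', newline='').
data = io.StringIO("name,age,height\nZhang San,18,170\nLi Si,20,175\n")

reader = csv.reader(data)
titles = next(reader)   # consumes the header row
rows = list(reader)     # the remaining rows, each a list of strings

print(titles)
for row in rows:
    print(row[0], row[2])  # fields are picked by position
```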

With csv.reader, each row comes back as a list, so you have to access fields by position (index). If you want to access them by column heading instead, use DictReader.

import csv

with open('Demo.csv','r',newline='') as fp:
    reader = csv.DictReader(fp)  # the first row supplies the keys
    for x in reader:
        print(x)                 # each row is a dict keyed by heading

We can select the returned data by heading:

import csv

with open('Demo.csv','r',newline='') as fp:
    reader = csv.DictReader(fp)
    for x in reader:
        print(x['Data 1'])  # only the column headed 'Data 1'
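A self-contained sketch of the same idea, again with hypothetical CSV text in io.StringIO. Note that DictReader takes the headings from the first row itself, so no next() call is needed:

```python
import csv
import io

# Hypothetical CSV content with a header row.
data = io.StringIO("name,age,height\nZhang San,18,170\nLi Si,20,175\n")

reader = csv.DictReader(data)
print(reader.fieldnames)                  # headings read from the first row
names = [row["name"] for row in reader]   # fields picked by heading
print(names)
```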

2.2 Write data to csv file

Writing data to a CSV file requires a writer object, created with csv.writer(). It has two main methods: writerow(), which writes a single row, and writerows(), which writes multiple rows.

import csv

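# Use lists, not sets: sets are unordered, so columns and rows would be shuffled.
headers = ['name','age','height']
values = [
    ('Zhang San',18,170),
    ('Li Si',20,175),
    ('Wang Wu',21,180)
]
# newline='' prevents blank lines between rows on Windows.
with open('test.csv','w',encoding='utf-8',newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(headers)   # write one row (the headings)
    writer.writerows(values)   # write several rows at once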
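To see exactly what the writer produces, the sketch below writes into io.StringIO (standing in for the file) instead of test.csv:

```python
import csv
import io

headers = ['name', 'age', 'height']
values = [('Zhang San', 18, 170), ('Li Si', 20, 175)]

# io.StringIO stands in for open('test.csv', 'w', newline='', encoding='utf-8').
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(headers)   # one row
writer.writerows(values)   # several rows at once

# By default the writer ends every row with '\r\n'.
print(repr(buf.getvalue()))
```

Because the writer already emits '\r\n', opening a real file without newline='' on Windows would translate the '\n' again and produce '\r\r\n', which shows up as empty lines between rows.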

You can also write each row as a dictionary; this requires DictWriter.

import csv

headers = ['name','age','height']
values = [
    {'name':'Zhang San','age':18,'height':170},
    {'name':'Li Si','age':20,'height':175},
    {'name':'Wang Wu','age':21,'height':180}
]
with open('test.csv','w',encoding='utf-8',newline='') as fp:
    writer = csv.DictWriter(fp,headers)  # headers fixes the column order
    writer.writeheader()                 # write the heading row
    writer.writerows(values)             # one row per dict
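One thing worth knowing when reading such a file back: CSV stores only text, so numbers written from Python return as strings. A round-trip sketch, using io.StringIO in place of test.csv:

```python
import csv
import io

headers = ['name', 'age', 'height']
values = [{'name': 'Zhang San', 'age': 18, 'height': 170}]

# Write the dicts as CSV text.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=headers)
writer.writeheader()
writer.writerows(values)

# Read the text back: the numbers come back as strings.
buf.seek(0)
rows = list(csv.DictReader(buf))
print(rows[0])
```

If you need the original types, convert them yourself after reading, for example with int(row['age']).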

Posted by lokesh_kumar_s on Wed, 18 Dec 2019 09:06:17 -0800

Programmer Group