Each section of python3-cookbook has three parts: problem, solution, and discussion. It explores the best Python 3 solution to a given problem and shows how Python 3's own data structures, functions, classes, and so on can be put to better use for that problem. The book is very helpful for understanding Python 3 and improving your Python programming skills, especially for improving the performance of Python programs. It is strongly recommended that you read it if you have time.
These are my study notes. They cover only the parts of the book relevant to my own work and everyday needs. Most of the sample code here is copied directly from the original text, and most of it has been verified in a Python 3.6 environment. Programming concerns vary from field to field, so read the full book if you are interested.
python3-cookbook: https://python3-cookbook.readthedocs.io/zh_CN/latest/index.html
6.1 Read and write CSV data
Unless special processing is required, always use the csv module to read and write CSV files. It handles quoting and delimiters correctly and so minimizes surprises. Here are a few simple examples of reading and writing CSV files:
The CSV file stocks.csv contains the following:
```
Symbol,Price,Date,Time,Change,Volume
"AA",39.48,"6/11/2007","9:36am",-0.18,181800
"AIG",71.38,"6/11/2007","9:36am",-0.15,195500
"AXP",62.58,"6/11/2007","9:36am",-0.46,935000
"BA",98.31,"6/11/2007","9:36am",+0.12,104800
"C",53.08,"6/11/2007","9:36am",-0.25,360900
"CAT",78.29,"6/11/2007","9:36am",-0.23,225400
```
```python
import csv

# Read data as lists
with open('stocks.csv', newline='') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    # headers and each row are lists of strings
    print(headers)
    for row in f_csv:
        print(row)
```
```python
import csv

# Read data as dictionaries
with open('stocks.csv', newline='') as f:
    f_csv = csv.DictReader(f)
    # Each row is an OrderedDict on Python 3.6/3.7 (a plain dict from 3.8 on)
    for row in f_csv:
        # On Python 3.6 the first row printed is:
        # OrderedDict([('Symbol', 'AA'), ('Price', '39.48'), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', '-0.18'), ('Volume', '181800')])
        print(row)
```
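The csv module returns every field as a string, in the DictReader version too. Numeric columns therefore need explicit conversion. Below is a minimal sketch that assumes the first two data lines of stocks.csv are held in an in-memory string; io.StringIO stands in for the real file.

```python
import csv
import io

# Two data lines of the stocks.csv sample, held in memory instead of on disk
SAMPLE = '''Symbol,Price,Date,Time,Change,Volume
"AA",39.48,"6/11/2007","9:36am",-0.18,181800
"AIG",71.38,"6/11/2007","9:36am",-0.15,195500
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Every value is a str, including the numeric columns
print(rows[0]['Price'], type(rows[0]['Price']))   # 39.48 <class 'str'>

# Convert the columns you need explicitly
prices = {row['Symbol']: float(row['Price']) for row in rows}
volumes = {row['Symbol']: int(row['Volume']) for row in rows}
print(prices)    # {'AA': 39.48, 'AIG': 71.38}
print(volumes)   # {'AA': 181800, 'AIG': 195500}
```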
```python
import csv

headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
rows = [('AA', 39.48, '6/11/2007', '9:36am', -0.18, 181800),
        ('AIG', 71.38, '6/11/2007', '9:36am', -0.15, 195500),
        ('AXP', 62.58, '6/11/2007', '9:36am', -0.46, 935000),
        ]

# Write data from lists/tuples; newline='' avoids blank lines between rows on Windows
with open('stocks.csv', 'w', newline='') as f:
    f_csv = csv.writer(f)
    # Write a single row
    f_csv.writerow(headers)
    # Write multiple rows
    f_csv.writerows(rows)
```
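The csv documentation recommends opening files with newline=''. csv.writer ends each row with '\r\n' itself, and without newline='' text mode on Windows adds an extra '\r'. The sketch below writes into an io.StringIO buffer, an in-memory stand-in for the file, so the exact text is visible. It also shows that a field containing a comma is quoted automatically. The company name 'Alcoa, Inc.' is a hypothetical value chosen for illustration.

```python
import csv
import io

buf = io.StringIO(newline='')   # in-memory stand-in for open('stocks.csv', 'w', newline='')
writer = csv.writer(buf)
writer.writerow(['Symbol', 'Name', 'Price'])
writer.writerow(['AA', 'Alcoa, Inc.', 39.48])   # hypothetical name containing a comma

text = buf.getvalue()
print(repr(text))
# 'Symbol,Name,Price\r\nAA,"Alcoa, Inc.",39.48\r\n'

# A naive split breaks the quoted field apart ...
naive = text.splitlines()[1].split(',')
print(naive)       # ['AA', '"Alcoa', ' Inc."', '39.48']

# ... while csv.reader restores the original fields (as strings)
parsed = list(csv.reader(io.StringIO(text, newline='')))
print(parsed[1])   # ['AA', 'Alcoa, Inc.', '39.48']
```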
```python
import csv

headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
rows = [{'Symbol': 'AA', 'Price': 39.48, 'Date': '6/11/2007',
         'Time': '9:36am', 'Change': -0.18, 'Volume': 181800},
        {'Symbol': 'AIG', 'Price': 71.38, 'Date': '6/11/2007',
         'Time': '9:36am', 'Change': -0.15, 'Volume': 195500},
        {'Symbol': 'AXP', 'Price': 62.58, 'Date': '6/11/2007',
         'Time': '9:36am', 'Change': -0.46, 'Volume': 935000},
        ]

# Write data from dictionaries
with open('stocks.csv', 'w', newline='') as f:
    f_csv = csv.DictWriter(f, headers)
    f_csv.writeheader()
    f_csv.writerows(rows)
```
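DictWriter looks up values by the field names passed to it, so rows do not have to be complete. The sketch below shows two of its keyword arguments, again writing into in-memory buffers:

- restval is the filler for missing keys and defaults to ''.
- extrasaction controls keys that are not in fieldnames and defaults to 'raise'.

The 'Note' key is a hypothetical extra field.

```python
import csv
import io

headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, headers)   # restval='' and extrasaction='raise' by default
writer.writeheader()

# Missing keys are filled with restval
writer.writerow({'Symbol': 'AA', 'Price': 39.48})
print(repr(buf.getvalue().splitlines()[1]))   # 'AA,39.48,,,,'

# Keys that are not in fieldnames raise ValueError ...
try:
    writer.writerow({'Symbol': 'AIG', 'Note': 'hypothetical extra field'})
    extra_rejected = False
except ValueError:
    extra_rejected = True

# ... unless the writer is told to ignore them
ignore_buf = io.StringIO(newline='')
lenient = csv.DictWriter(ignore_buf, headers, extrasaction='ignore')
lenient.writerow({'Symbol': 'AIG', 'Note': 'hypothetical extra field'})
print(repr(ignore_buf.getvalue()))   # 'AIG,,,,,\r\n'
```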
6.3 Parsing simple XML data
As the title of this section says, it covers only simple XML parsing. For small, not-too-complex XML files, the built-in xml.etree.ElementTree is enough. For complex XML documents, you can use the third-party library lxml, which is more powerful and faster. In the sample code below, you can simply replace the import with from lxml.etree import parse.
```python
from urllib.request import urlopen
from xml.etree.ElementTree import parse

# Download the RSS feed and parse it
u = urlopen('http://planet.python.org/rss20.xml')
doc = parse(u)

# Find the title node under channel
e = doc.find('channel/title')
# Print the node name: title
print(e.tag)
# Print the node text: Planet Python
print(e.text)
# Get an attribute value; the node has no attribute named xxx, so the result is None
print(e.get('xxx'))

# Iterate over the item nodes under channel
for item in doc.iterfind('channel/item'):
    # Get the text of the corresponding child nodes of item
    title = item.findtext('title')
    date = item.findtext('pubDate')
    link = item.findtext('link')
    print(title)
    print(date)
    print(link)
    print()
```
Output:

```
title
Planet Python
None
Codementor: Automating Everything With Python: Reading Time: 3 Mins
Sat, 22 Feb 2020 09:01:58 +0000
https://www.codementor.io/maxongzb/automating-everything-with-python-reading-time-3-mins-13v57qt7y6

Quansight Labs Blog: My Unexpected Dive into Open-Source Python
Fri, 21 Feb 2020 18:38:07 +0000
https://labs.quansight.org/blog/2020/02/my-unexpected-dive-into-open-source-python/

...
```
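The same find/findtext/iterfind/get calls can be tried offline. Below is a minimal sketch that assumes a small, made-up RSS document embedded as bytes in place of the planet.python.org feed. It also shows one way to swap in lxml as described above: import from lxml.etree when it is installed, and fall back to the standard library otherwise.

```python
import io

# Prefer lxml when it is installed; fall back to the standard library otherwise
try:
    from lxml.etree import parse
except ImportError:
    from xml.etree.ElementTree import parse

# A tiny, made-up RSS document standing in for the real feed
RSS = b"""<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <pubDate>Mon, 01 Jan 2024 09:00:00 +0000</pubDate>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <pubDate>Tue, 02 Jan 2024 09:00:00 +0000</pubDate>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""

doc = parse(io.BytesIO(RSS))
e = doc.find('channel/title')
print(e.tag, e.text, e.get('xxx'))   # title Example Feed None

items = [(item.findtext('title'), item.findtext('link'))
         for item in doc.iterfind('channel/item')]
print(items)
```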
6.4 Incremental parsing of large XML files
If the XML file you need to parse is very large, consider parsing it incrementally with iterparse (from xml.etree.ElementTree import iterparse). Of the two versions of the example below, the one that loads the entire document into memory runs faster than the incremental one, but it consumes far more memory.
An excerpt of the file potholes.xml is shown below. The task is to count how many times each value of the zip node appears across the inner row nodes:
```xml
<response>
  <row>
    <row ...>
      <creation_date>2012-11-18T00:00:00</creation_date>
      <status>Completed</status>
      <completion_date>2012-11-18T00:00:00</completion_date>
      <service_request_number>12-01906549</service_request_number>
      <type_of_service_request>Pot Hole in Street</type_of_service_request>
      <current_activity>Final Outcome</current_activity>
      <most_recent_action>CDOT Street Cut ... Outcome</most_recent_action>
      <street_address>4714 S TALMAN AVE</street_address>
      <zip>60632</zip>
      <x_coordinate>1159494.68618856</x_coordinate>
      <y_coordinate>1873313.83503384</y_coordinate>
      <ward>14</ward>
      <police_district>9</police_district>
      <community_area>58</community_area>
      <latitude>41.808090232127896</latitude>
      <longitude>-87.69053684711305</longitude>
      <location latitude="41.808090232127896" longitude="-87.69053684711305" />
    </row>
    <row ...>
      <creation_date>2012-11-18T00:00:00</creation_date>
      <status>Completed</status>
      <completion_date>2012-11-18T00:00:00</completion_date>
      <service_request_number>12-01906695</service_request_number>
      <type_of_service_request>Pot Hole in Street</type_of_service_request>
      <current_activity>Final Outcome</current_activity>
      <most_recent_action>CDOT Street Cut ... Outcome</most_recent_action>
      <street_address>3510 W NORTH AVE</street_address>
      <zip>60647</zip>
      <x_coordinate>1152732.14127696</x_coordinate>
      <y_coordinate>1910409.38979075</y_coordinate>
      <ward>26</ward>
      <police_district>14</police_district>
      <community_area>23</community_area>
      <latitude>41.91002084292946</latitude>
      <longitude>-87.71435952353961</longitude>
      <location latitude="41.91002084292946" longitude="-87.71435952353961" />
    </row>
  </row>
</response>
```
Parsing with the whole document loaded into memory:
```python
from xml.etree.ElementTree import parse
from collections import Counter

potholes_by_zip = Counter()

doc = parse('potholes.xml')
for pothole in doc.iterfind('row/row'):
    potholes_by_zip[pothole.findtext('zip')] += 1
for zipcode, num in potholes_by_zip.most_common():
    print(zipcode, num)
```
Incremental parsing:
```python
from xml.etree.ElementTree import iterparse
from collections import Counter

def parse_and_remove(filename, path):
    path_parts = path.split('/')
    # 'start' event: emitted when an element's opening tag has been read
    # 'end' event: emitted when an element, including all its children, is complete
    doc = iterparse(filename, ('start', 'end'))
    # Skip the root element
    next(doc)

    tag_stack = []
    elem_stack = []
    for event, elem in doc:
        if event == 'start':
            tag_stack.append(elem.tag)
            elem_stack.append(elem)
        elif event == 'end':
            if tag_stack == path_parts:
                yield elem
                # The key to saving memory: once yielded, the element is removed from its parent
                elem_stack[-2].remove(elem)
            try:
                tag_stack.pop()
                elem_stack.pop()
            except IndexError:
                # The root's 'end' event arrives after the stacks are already empty
                pass

potholes_by_zip = Counter()

data = parse_and_remove('potholes.xml', 'row/row')
for pothole in data:
    potholes_by_zip[pothole.findtext('zip')] += 1
for zipcode, num in potholes_by_zip.most_common():
    print(zipcode, num)
```
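A commonly used alternative to removing finished elements from their parent is to call elem.clear() on them. This is a different technique from the book's parse_and_remove() and is less thorough. Cleared elements stay in their parent as empty shells, so memory is not reclaimed as completely as with remove(). The sketch runs on a tiny in-memory sample. The first two zip values come from the excerpt above; the third row is made up. The path is tracked by depth rather than by a tag stack.

```python
import io
from collections import Counter
from xml.etree.ElementTree import iterparse

# Made-up miniature of potholes.xml: response/row/row elements with a zip child
SAMPLE = b"""<response>
  <row>
    <row><zip>60632</zip></row>
    <row><zip>60647</zip></row>
    <row><zip>60632</zip></row>
  </row>
</response>"""

def count_zips(source):
    """Count zip values of response/row/row elements, clearing each one after use."""
    counts = Counter()
    depth = 0
    for event, elem in iterparse(source, events=('start', 'end')):
        if event == 'start':
            depth += 1
        else:
            # The root is at depth 1, so response/row/row is depth 3
            if depth == 3 and elem.tag == 'row':
                counts[elem.findtext('zip')] += 1
                elem.clear()   # drop the children we no longer need
            depth -= 1
    return counts

print(count_zips(io.BytesIO(SAMPLE)).most_common())   # [('60632', 2), ('60647', 1)]
```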
6.5 Converting a dictionary to XML
The Element class (from xml.etree.ElementTree import Element) can be used to build an XML document. Note that element text and attribute values must be strings.
```python
from xml.etree.ElementTree import Element, tostring

def dict_to_xml(tag, d):
    """Turn a simple dict of key/value pairs into XML"""
    elem = Element(tag)
    for key, val in d.items():
        child = Element(key)
        # text must be a str, so convert the value explicitly
        child.text = str(val)
        elem.append(child)
    return elem

s = {'name': 'GOOG', 'shares': 100, 'price': 490.1}
e = dict_to_xml('stock', s)
# Set an attribute on the node
e.set('_id', '1234')
print(e)
print(tostring(e))
```
Output:

```
<Element 'stock' at 0x000001761DB01B88>
b'<stock _id="1234"><name>GOOG</name><shares>100</shares><price>490.1</price></stock>'
```
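The string requirement is enforced when the tree is serialized. Assigning an int to text succeeds, but tostring() then raises TypeError. The sketch below also shows that special characters in string values are escaped automatically. The element names are arbitrary examples.

```python
from xml.etree.ElementTree import Element, tostring

elem = Element('shares')
elem.text = 100                 # an int, not converted with str()
try:
    tostring(elem)
    int_rejected = False
except TypeError as exc:        # serialization refuses non-string text
    int_rejected = True
    print('TypeError:', exc)

elem.text = str(100)
print(tostring(elem))           # b'<shares>100</shares>'

# Special characters in string values are escaped automatically
name = Element('name')
name.text = 'AT&T <NYSE>'
print(tostring(name))           # b'<name>AT&amp;T &lt;NYSE&gt;</name>'
```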
6.6 Parsing and modifying XML
When modifying XML, note that all modifications are made through the parent element, which can be treated like a list of its children:
- Delete a node: use the parent node's remove() method.
- Add a node: use the parent node's insert() or append() method.
- Index and slice: access child nodes with element[i] or element[i:j].
- Create a new node: use the Element class.
The sample file pred.xml:
```xml
<?xml version="1.0"?>
<stop>
    <id>14791</id>
    <nm>Clark &amp; Balmoral</nm>
    <sri>
        <rt>22</rt>
        <d>North Bound</d>
        <dd>North Bound</dd>
    </sri>
    <cr>22</cr>
    <pre>
        <pt>5 MIN</pt>
        <fd>Howard</fd>
        <v>1378</v>
        <rn>22</rn>
    </pre>
    <pre>
        <pt>15 MIN</pt>
        <fd>Howard</fd>
        <v>1867</v>
        <rn>22</rn>
    </pre>
</stop>
```
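```python
>>> from xml.etree.ElementTree import parse, Element
>>> doc = parse('pred.xml')
>>> root = doc.getroot()
>>> root
<Element 'stop' at 0x100770cb0>
>>> # Remove a few elements via their parent
>>> root.remove(root.find('sri'))
>>> root.remove(root.find('cr'))
>>> # list(root) replaces getchildren(), which was removed in Python 3.9
>>> list(root).index(root.find('nm'))
1
>>> # Insert a new element after <nm>
>>> e = Element('spam')
>>> e.text = 'This is a test'
>>> root.insert(2, e)
>>> # Write the result to a new file
>>> doc.write('newpred.xml', xml_declaration=True)
>>>
```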
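The list-like operations listed above can be tried without a file. Below is a minimal sketch that assumes a trimmed copy of pred.xml held in a string; the extra note element is made up for illustration. It uses fromstring() instead of parse(), and iterates the element directly rather than calling the older getchildren().

```python
from xml.etree.ElementTree import Element, fromstring

# A trimmed, in-memory version of pred.xml
root = fromstring(
    '<stop><id>14791</id><nm>Clark &amp; Balmoral</nm><cr>22</cr>'
    '<pre><pt>5 MIN</pt></pre><pre><pt>15 MIN</pt></pre></stop>'
)

# Children behave like a list: len(), indexing, slicing
print(len(root), root[1].text)              # 5 Clark & Balmoral
print([child.tag for child in root[-2:]])   # ['pre', 'pre']

# Remove via the parent
root.remove(root.find('cr'))

# Insert at a position, or append at the end
spam = Element('spam')
spam.text = 'This is a test'
root.insert(2, spam)
root.append(Element('note'))   # hypothetical extra node

tags = [child.tag for child in root]
print(tags)   # ['id', 'nm', 'spam', 'pre', 'pre', 'note']
```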