Python basics 1-5: building a multidict with collections.defaultdict, the OrderedDict class, and the zip() function

Keywords: Python

Python Cookbook, Chapter 1, sections 6-8

1.6 Mapping keys to multiple values in a dictionary

Question: how do you implement a dictionary (also called a multidict) in which a key maps to multiple values?

Solution: a dictionary maps each key to a single value. If you want a key to map to multiple values, you need to store those values in another container, such as a list or a set. For example, you can construct a dictionary like this:

d = {
    'a' : [1, 2, 3],
    'b' : [4, 5]
}
e = {
    'a' : {1, 2, 3},
    'b' : {4, 5}
}

Whether to use lists or sets depends on your actual needs.

  1. If you want to preserve the insertion order of the elements, use a list.
  2. If you want to eliminate duplicate elements (and don't care about their order), use a set.
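
The difference between the two containers can be seen in a minimal sketch. The input values here are made up for illustration:

```python
# Hypothetical input values, including one duplicate (3 appears twice).
values_list = []
values_set = set()
for value in [3, 1, 3, 2]:
    values_list.append(value)  # a list keeps insertion order and duplicates
    values_set.add(value)      # a set silently drops the duplicate

d = {'a': values_list}
e = {'a': values_set}

print(d)                    # {'a': [3, 1, 3, 2]}
print(e['a'] == {1, 2, 3})  # True
```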

You can easily construct such a dictionary with defaultdict from the collections module.
A feature of defaultdict is that it automatically initializes the value for a key the first time the key is accessed, so you only need to focus on adding elements. For example:

from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)
d['a'].append(2)
d['b'].append(4)
d   # defaultdict(list, {'a': [1, 2], 'b': [4]})

d = defaultdict(set)
d['a'].add(1)
d['a'].add(2)
d['b'].add(4)
d   # defaultdict(set, {'a': {1, 2}, 'b': {4}})

Note that defaultdict automatically creates an entry for any key that is accessed, even if the key does not currently exist in the dictionary.
If you don't want this behavior, you can use the setdefault() method on an ordinary dictionary instead. For example:

d = {}  # An ordinary dictionary
d.setdefault('a', []).append(1)
d.setdefault('a', []).append(2)
d.setdefault('b', []).append(4)

d        # {'a': [1, 2], 'b': [4]}
type(d)  # dict

However, many programmers find setdefault() a little awkward to use, not least because every call creates a new instance of the initial value (the empty list [] in the example), even when the key already exists.
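
The automatic creation of entries is easy to overlook, so here is a small sketch of what happens when a missing key is merely read. The key names are arbitrary placeholders:

```python
from collections import defaultdict

dd = defaultdict(list)
dd['missing']           # a plain lookup is enough to create the key
print(dict(dd))         # {'missing': []}
print('other' in dd)    # False: membership tests do not create keys
print(dd.get('other'))  # None: get() does not create keys either

plain = {}
try:
    plain['missing']
except KeyError:
    print('KeyError')   # an ordinary dict raises instead of creating an entry
print(plain)            # {}
```

If stray empty entries would be a problem (for example, when the dictionary is later serialized), reading with `in` or get() avoids them.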

Discussion

In principle, creating a multi-valued dictionary is simple. However, if you implement it yourself, initializing the value for each new key can get messy. You might write something like this:

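# Sample (key, value) pairs, made up for illustration
pairs = [('a', 1), ('a', 2), ('b', 4)]

d = {}
for key, value in pairs:
    if key not in d:
        d[key] = []
    d[key].append(value)

# If you use defaultdict, the code is more concise:
from collections import defaultdict

d = defaultdict(list)
for key, value in pairs:
    d[key].append(value)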

The problem discussed in this section is closely related to grouping records in data processing. See the example in section 1.15.
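
As a rough illustration of that connection, the following sketch groups records by one field using defaultdict. The records and the field names ('address', 'date') are hypothetical and only serve the example:

```python
from collections import defaultdict

# Hypothetical records; the field names and values are made up.
rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/01/2012'},
]

# Each date maps to the list of records that share it.
rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)

for date, group in rows_by_date.items():
    print(date, [r['address'] for r in group])
# 07/01/2012 ['5412 N CLARK', '5800 E 58TH']
# 07/04/2012 ['5148 N CLARK']
```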

1.7 Keeping dictionaries in order

Problem: you want to create a dictionary and also control the order of its elements when iterating over it or serializing it.

Solution: to control the order of elements in a dictionary, you can use the OrderedDict class from the collections module. It preserves the insertion order of the elements when you iterate over it. (Since Python 3.7 the built-in dict also preserves insertion order, so OrderedDict matters mainly on older versions or when you want to make the ordering requirement explicit.)

For example:

from collections import OrderedDict

d = OrderedDict()
d['foo'] = 1
d['bar'] = 2
d['spam'] = 3
d['grok'] = 4

for key in d:
    print(key, d[key])
# Outputs "foo 1", "bar 2", "spam 3", "grok 4"

OrderedDict is very useful when you want to build a mapping that will later be serialized or encoded into another format. For example, if you want to precisely control the order of the fields in JSON output, you can build the data in an OrderedDict first:

import json
json.dumps(d)
# '{"foo": 1, "bar": 2, "spam": 3, "grok": 4}'
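
The same idea can be applied in the other direction. The sketch below assumes you also want to keep the field order when decoding, and uses the object_pairs_hook parameter of json.loads() for that; the variable names text and restored are illustrative:

```python
import json
from collections import OrderedDict

d = OrderedDict()
d['foo'] = 1
d['bar'] = 2
d['spam'] = 3
d['grok'] = 4

# Fields are written in the order they were inserted.
text = json.dumps(d)
print(text)  # {"foo": 1, "bar": 2, "spam": 3, "grok": 4}

# object_pairs_hook asks the decoder to build each JSON object as an OrderedDict.
restored = json.loads(text, object_pairs_hook=OrderedDict)
print(list(restored))  # ['foo', 'bar', 'spam', 'grok']
```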

Discussion

OrderedDict internally maintains a doubly linked list that orders the keys by insertion order.
When a new element is inserted, it is placed at the end of this list.
Reassigning an existing key does not change the order of the keys.
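
A short sketch of this behavior: assigning to an existing key leaves its position alone, while deleting and re-inserting the key moves it to the end.

```python
from collections import OrderedDict

d = OrderedDict()
d['foo'] = 1
d['bar'] = 2

d['foo'] = 10           # reassign an existing key
print(list(d.items()))  # [('foo', 10), ('bar', 2)]

del d['foo']            # remove the key, then insert it again
d['foo'] = 10
print(list(d.items()))  # [('bar', 2), ('foo', 10)]
```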

Note that an OrderedDict is roughly twice the size of an ordinary dictionary because of the extra linked list it maintains internally.
So if you are building a data structure that involves a large number of OrderedDict instances (for example, reading 100,000 rows of CSV data into a list of OrderedDict instances), you have to weigh carefully whether the benefits of OrderedDict outweigh the cost of the extra memory.
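
One rough way to check the overhead on your own interpreter is sys.getsizeof(). This is only a sketch: getsizeof() reports the size of the container itself, not of the keys and values it refers to, and the numbers vary between Python versions and implementations, so no expected output is shown.

```python
import sys
from collections import OrderedDict

keys = ['foo', 'bar', 'spam', 'grok']
plain = {k: i for i, k in enumerate(keys)}
ordered = OrderedDict(plain)

# Shallow sizes in bytes; compare them on the interpreter you actually use.
plain_size = sys.getsizeof(plain)
ordered_size = sys.getsizeof(ordered)
print(plain_size, ordered_size)
```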

1.8 Calculating with dictionaries

Question: how do you perform calculations (such as finding the minimum or maximum, or sorting) on the data in a dictionary?

Solution: consider the following dictionary that maps stock names to prices:

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

To perform useful calculations on the dictionary's values, it usually helps to invert the keys and values with the zip() function first. For example, the following code finds the minimum and maximum price together with the corresponding stock name:

min_price = min(zip(prices.values(), prices.keys()))
# min_price is (10.75, 'FB')
max_price = max(zip(prices.values(), prices.keys()))
# max_price is (612.78, 'AAPL')

Similarly, zip() and sorted() can be combined to rank the dictionary data by price:

prices_sorted = sorted(zip(prices.values(), prices.keys()))
# prices_sorted is [(10.75, 'FB'), (37.2, 'HPQ'),
#                   (45.23, 'ACME'), (205.55, 'IBM'),
#                   (612.78, 'AAPL')]

When doing these calculations, be aware that zip() creates an iterator that can only be consumed once. For example, the following code raises an error:

prices_and_names = zip(prices.values(), prices.keys())
print(min(prices_and_names)) # OK
print(max(prices_and_names)) # ValueError: max() arg is an empty sequence
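
If the pairs are needed more than once, one simple workaround (a sketch, not the only option) is to materialize the iterator into a list first:

```python
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

# list() consumes the zip iterator once and keeps the pairs for reuse.
prices_and_names = list(zip(prices.values(), prices.keys()))
print(min(prices_and_names))  # (10.75, 'FB')
print(max(prices_and_names))  # (612.78, 'AAPL')
```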

Discussion

If you apply ordinary data reduction operations to a dictionary, you will find that they operate only on the keys, not the values. For example:

min(prices) # Returns 'AAPL'
max(prices) # Returns 'IBM'

This is probably not what you want, because you actually want to perform the calculation on the dictionary's values.
You might try to fix this by using the values() method of the dictionary:

min(prices.values()) # Returns 10.75
max(prices.values()) # Returns 612.78

Unfortunately, this is often not exactly what you want either.
You may also want to know the corresponding key (for example, which stock has the lowest price?).

You can get the key corresponding to the minimum or maximum value by supplying a key function to min() and max(). For example:

min(prices, key=lambda k: prices[k]) # Returns 'FB'
max(prices, key=lambda k: prices[k]) # Returns 'AAPL'

However, to get the minimum value itself, you then need an extra lookup. For example:

min_value = prices[min(prices, key=lambda k: prices[k])]   # 10.75
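
Another option, offered here as an alternative sketch rather than the recipe's own method, is to run min() and max() over prices.items() with a key function that selects the price. This returns the name and the value together without a second lookup:

```python
from operator import itemgetter

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

# items() yields (name, price) pairs; itemgetter(1) compares them by price.
min_name, min_value = min(prices.items(), key=itemgetter(1))
max_name, max_value = max(prices.items(), key=itemgetter(1))
print(min_name, min_value)  # FB 10.75
print(max_name, max_value)  # AAPL 612.78
```

With a key function, ties are resolved by returning the first matching item encountered, so the result for duplicated prices depends on the iteration order of the dictionary rather than on the stock names.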

The zip() solution shown earlier solves the problem by "inverting" the dictionary into a sequence of (value, key) tuples. When such tuples are compared, the value element is compared first, followed by the key. This lets you find the minimum or maximum, or sort the dictionary contents, with a single statement.

Note that calculations on (value, key) pairs use the key to break ties when multiple entries have the same value. For example, in min() and max(), if the minimum or maximum value is duplicated, the entry with the smallest or largest key is returned:

prices = { 'AAA' : 45.23, 'ZZZ': 45.23 }

min(zip(prices.values(), prices.keys()))
#(45.23, 'AAA')

max(zip(prices.values(), prices.keys()))
#(45.23, 'ZZZ')

Posted by OM2 on Sat, 23 Oct 2021 18:33:04 -0700