Cookbook Learning: 1. Common Techniques for Python Data Structures and Algorithms

Data Structures and Algorithms

1. Unpacking a sequence into separate variables

Problem: You need to unpack a tuple or sequence of N elements into N separate variables

Solution: Use unpacking assignment, which works with any iterable object

>>> p = (4, 5)
>>> a, b = p
>>> a
4
>>> b
5
>>> data = ['xyb', 18, 180, ('mmm', 20, 165)]
>>> name, age, tall, another = data
>>> name
'xyb'
>>> another
('mmm', 20, 165)

Unpacking works not only with tuples and lists but with any iterable object, including strings, files, and iterators.

>>> s = '1,2,3'
>>> a, _, b, _, c = s
>>> a
'1'
>>> b
'2'
>>> c
'3'

Key point: Unneeded data is discarded by assigning it to a throwaway name such as _.

2. Unpacking elements from iterables of arbitrary length

Problem: You need to unpack N elements from an iterable, but the iterable may be longer than N, which raises a "too many values to unpack" error

Solution: Use a starred expression (*) to capture the extra elements

>>> data = ['xyb', 18, 13586971744, 19857100635]
>>> name, age, *number = data
>>> name
'xyb'
>>> number
[13586971744, 19857100635]

Key point: Capture the values you don't need with *_

>>> scores = [1, 5, 6, 7, 8, 10, 64, 21, 8]
>>> one, *_, last = scores
>>> one
1
>>> last
8
>>> _
[5, 6, 7, 8, 10, 64, 21]
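
Star unpacking is also handy when splitting delimited strings or looping over records of varying length. The sketch below illustrates both; the passwd-style line and the tagged records are made-up data, not part of the original notes.

```python
# A made-up passwd-style line (illustrative data only)
line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'
uname, *fields, homedir, sh = line.split(':')

# Hypothetical tagged records of varying length
records = [('foo', 1, 2), ('bar', 'hello'), ('foo', 3, 4)]
foo_sums = []
for tag, *args in records:
    if tag == 'foo':
        foo_sums.append(sum(args))

print(uname, homedir, sh)   # nobody /var/empty /usr/bin/false
print(fields)               # ['*', '-2', '-2', 'Unprivileged User']
print(foo_sums)             # [3, 7]
```

The starred name always receives a list, even when it captures zero elements.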

3. Use a double-ended queue (deque) to keep a history

Problem: We want to keep a record of the last few items seen during iteration or other processing, i.e. a limited history

Solution: Use collections.deque, a double-ended queue

>>> from collections import deque

# 1. Without maxlen, the queue is unbounded
>>> q = deque()
>>> q.append(5)
>>> q.append(6)
>>> q.append(7)
>>> q.appendleft(0)
>>> q
deque([0, 5, 6, 7])
>>> q.pop()
7

# 2. With maxlen, the oldest item is dropped once the queue is full
>>> q = deque(maxlen=3)
>>> q.append(1)
>>> q.append(2)
>>> q.append(3)
>>> q
deque([1, 2, 3], maxlen=3)
>>> q.append(4)
>>> q
deque([2, 3, 4], maxlen=3)

With a deque, appending and popping at either end of the queue is O(1)
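
To connect this back to the "history" problem, here is a minimal sketch of a search that remembers the lines seen just before each match. The function name `search`, its parameters, and the sample lines are assumptions made for illustration.

```python
from collections import deque

def search(lines, pattern, history=3):
    """Yield (matching_line, previous_lines) for each line containing pattern."""
    previous = deque(maxlen=history)   # keeps only the last `history` lines
    for line in lines:
        if pattern in line:
            yield line, list(previous)
        previous.append(line)

# Illustrative input, not from the original notes
lines = ['alpha', 'beta', 'gamma', 'python one', 'delta', 'python two']
results = list(search(lines, 'python', history=2))
for match, context in results:
    print(match, context)
# python one ['beta', 'gamma']
# python two ['python one', 'delta']
```

Because the deque has a fixed maxlen, the history never grows beyond the requested size, however long the input is.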

4. Find the largest or smallest N elements

Problem: Find the largest or smallest N elements in a collection

Solution: Use the two functions nlargest() and nsmallest() from the heapq module

>>> import heapq
>>> nums = [1, 2, 5, 6, -8, 15, -32, 24, 6, 42, 0]
>>> heapq.nlargest(3, nums)
[42, 24, 15]
>>> heapq.nsmallest(3, nums)
[-32, -8, 0]

# 1. Both functions also accept a key, so they work on more complex data structures.
>>> data = [
...     {'name': 'xyb', 'age': 18},
...     {'name': 'habby', 'age': 10},
...     {'name': 'yollw', 'age': 26}
... ]
>>> heapq.nlargest(2, data, key=lambda x: x['age'])
[{'name': 'yollw', 'age': 26}, {'name': 'xyb', 'age': 18}]
>>> heapq.nsmallest(2, data, key=lambda x: x['age'])
[{'name': 'habby', 'age': 10}, {'name': 'xyb', 'age': 18}]

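Extension: nlargest() and nsmallest() perform best when N is much smaller than the total number of elements. To pull out the smallest elements one at a time, first arrange the data in heap order with heapq.heapify(), then call heapq.heappop() repeatedly

>>> nums = [4, 5, 6, 8,  2, 5, -5, -48, 0, -31, 51]
>>> heap = list(nums)			# Work on a copy: heapify() reorders in place and returns None
>>> heapq.heapify(heap)			# The list is now arranged in heap order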
>>> heap
[-48, -31, -5, 0, 2, 5, 6, 8, 5, 4, 51]
>>> heapq.heappop(heap)
-48
>>> heapq.heappop(heap)
-31
>>> heapq.heappop(heap)
-5
>>> heapq.heappop(heap)
0
>>> heapq.heappop(heap)
2
>>> heapq.heappop(heap)
4
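
As a rough guide to choosing between the tools, the sketch below contrasts three cases: N equal to 1, N small, and N close to the size of the collection. It reuses the nums list from the start of this section; the variable names are illustrative.

```python
import heapq

nums = [1, 2, 5, 6, -8, 15, -32, 24, 6, 42, 0]

# N == 1: min() and max() are the simplest choice
smallest = min(nums)
largest = max(nums)

# Small N relative to len(nums): nlargest() / nsmallest()
top3 = heapq.nlargest(3, nums)

# N close to len(nums): sorting once and slicing is usually simpler
bottom8 = sorted(nums)[:8]

print(smallest, largest)  # -32 42
print(top3)               # [42, 24, 15]
print(bottom8)            # [-32, -8, 0, 1, 2, 5, 6, 6]
```

All three approaches return the same elements; they differ only in how much work is done to find them.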

5. Map a key to multiple values

Problem: You want a dictionary in which each key maps to multiple values

Solution: Create the dictionary with defaultdict(list)

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['a'].append(1)
>>> d
defaultdict(<class 'list'>, {'a': [1]})
>>> d['b'].extend([1,2,3])
>>> d
defaultdict(<class 'list'>, {'a': [1], 'b': [1, 2, 3]})
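
For comparison, the sketch below builds the same kind of multi-valued dictionary three ways: a plain dict with setdefault(), defaultdict(list), and defaultdict(set). The pairs list is illustrative data, not part of the original notes.

```python
from collections import defaultdict

# Illustrative (key, value) pairs
pairs = [('a', 1), ('b', 2), ('a', 3), ('b', 2)]

# Plain dict: setdefault() supplies the empty list the first time a key appears
plain = {}
for key, value in pairs:
    plain.setdefault(key, []).append(value)

as_lists = defaultdict(list)   # keeps duplicates and insertion order
as_sets = defaultdict(set)     # drops duplicates
for key, value in pairs:
    as_lists[key].append(value)
    as_sets[key].add(value)

print(plain)           # {'a': [1, 3], 'b': [2, 2]}
print(dict(as_lists))  # {'a': [1, 3], 'b': [2, 2]}
print(dict(as_sets))   # {'a': {1, 3}, 'b': {2}}
```

Use a list when order and duplicates matter, and a set when they do not.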

6. Compare dictionary values

Problem: We want to perform various calculations on the data in a dictionary (maximum, minimum, sorting, etc.)

Solution 1: Use zip() to invert the keys and values of the dictionary

prices = {
    'apple': 42.6,
    'binana': 42.6,
    'piple': 42.6,
    'aaa': 42.6,
}

# 1. Maximum and minimum (all values here are equal, so the tie is broken by comparing the keys)
min(zip(prices.values(), prices.keys()))  # (42.6, 'aaa')
max(zip(prices.values(), prices.keys()))  # (42.6, 'piple')

# 2. Sort by value
sorted(zip(prices.values(), prices.keys()))

Solution 2: Pass a key function to the min and max functions

prices = {
    'apple': 42.6,
    'binana': 42.6,
    'piple': 42.6,
    'aaa': 42.6,
}

# 1. Both calls return a key, not a value (with equal values, the first key wins)
min(prices, key=lambda k: prices[k])  # 'apple'
max(prices, key=lambda k: prices[k])  # 'apple'
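
Since every price in the data above is identical, the difference between the two solutions is easier to see with distinct values. The following sketch uses made-up prices and also shows that the iterator returned by zip() can be consumed only once.

```python
# Illustrative prices with distinct values (not the data used above)
prices = {'apple': 42.6, 'banana': 10.75, 'pear': 37.2}

# Solution 1: (value, key) pairs, so the comparison is done on the value first
cheapest = min(zip(prices.values(), prices.keys()))
dearest = max(zip(prices.values(), prices.keys()))
print(cheapest)  # (10.75, 'banana')
print(dearest)   # (42.6, 'apple')

# zip() returns an iterator that can be consumed only once
pairs = zip(prices.values(), prices.keys())
first = min(pairs)
leftover = list(pairs)   # the iterator is already exhausted
print(leftover)          # []

# Solution 2: key=... returns the key; the value needs a second lookup
name = min(prices, key=lambda k: prices[k])
print(name, prices[name])  # banana 10.75
```

The zip() form gives the key and the value in one step, while the key= form needs an extra lookup to get the value.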

7. Find the similarities between the two dictionaries

Problem: We have two dictionaries and want to find out what they have in common (the same keys, the same values)

Solution: The views returned by a dictionary's keys() and items() methods support set operations such as intersection and difference. The values() view does not support them

k1 = {
    'x': 10,
    'y': 15,
    'z': 20,
    'b': 80
}
k2 = {
    'w': 10,
    'y': 15,
    'z': 30,
    'b': 80
}

# 1. Keys common to both dictionaries
k1.keys() & k2.keys()  # {'y', 'z', 'b'}
# 2. Items with the same key and value in both dictionaries
k1.items() & k2.items()  # {('y', 15), ('b', 80)}
# 3. Build a new dictionary with certain keys removed
{key: k1[key] for key in k1.keys() - {'b', 'z'}}
# {'x': 10, 'y': 15} (the results are based on sets, so the order may vary)

8. Remove duplicates from a sequence while preserving the order of the elements

Problem: Remove duplicates from a sequence while keeping the remaining elements in their original order

Solution 1: Use a set plus a generator (works when the elements are hashable, i.e. they implement __hash__)

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

Solution 2: Use a set plus a generator with a key function (for elements that are not hashable, such as dictionaries; the key function converts each element to a hashable value)

def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)	# Key step: compare on the hashable value
        if val not in seen:
            yield item				# Yield the original element, not the key
            seen.add(val)

Advanced usage: Unhashable data types such as dictionaries can be de-duplicated as well

a = [{'a': 1, 'y': 2}, {'a': 1, 'y': 6}, {'a': 1, 'y': 2}, {'a': 1, 'y': 2}]
list(dedupe(a, key=lambda d: (d['a'], d['y'])))
# [{'a': 1, 'y': 2}, {'a': 1, 'y': 6}]
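
A short usage sketch may help here. It repeats a version of dedupe that yields the original element so the block runs on its own; the sample data is made up. It also shows de-duplicating on a single field instead of the whole record.

```python
def dedupe(items, key=None):
    """Yield items in order, skipping any whose key has been seen before."""
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

nums = [1, 5, 2, 1, 9, 1, 5, 10]   # illustrative data
ordered = list(dedupe(nums))
print(ordered)   # [1, 5, 2, 9, 10]

# Deduplicate dicts on a single field: the first record for each 'x' wins
rows = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
by_x = list(dedupe(rows, key=lambda d: d['x']))
print(by_x)      # [{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
```

Calling set(nums) would also remove the duplicates, but a set does not promise to keep the original order, which is the point of the generator.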

9. Naming slices

Problem: The code contains many hard-coded slices such as [0:5:2] or [-4:-3:-1], which makes it hard to read when we come back to it later

Solution: Name the slices with the built-in slice() function

s = [1, 2, 3, 4, 5, 6, 7]
head = slice(0, 4, 1) 
s[head]						# [1, 2, 3, 4]
	
head.start					# 0
head.stop					# 4
head.step					# 1
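
Named slices pay off most when pulling fields out of fixed-width records. The sketch below assumes a made-up record layout (10 columns of name, 5 of shares, 7 of price); the names NAME, SHARES and PRICE are hypothetical.

```python
# Made-up fixed-width record: 10 columns of name, 5 of shares, 7 of price
record = '{:<10}{:0>5}{:>7}'.format('ACME', 100, 51.25)

# Hypothetical names for the three column ranges
NAME = slice(0, 10)
SHARES = slice(10, 15)
PRICE = slice(15, 22)

name = record[NAME].strip()
cost = int(record[SHARES]) * float(record[PRICE])
print(name, cost)   # ACME 5125.0
```

Compared with record[10:15] and record[15:22], the named version states what each range means and keeps the column numbers in one place.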

Extension: Use indices(size) to fit a slice to a sequence of a given length; all returned values are clipped to lie within the bounds

s = [1, 2, 3, 4, 5, 6, 7]
head = slice(0, 5, 2)
print(head.indices(len(s)))			# (0, 5, 2)

for i in range(*head.indices(len(s))):
    print('index', i)
    print('value', s[i])

"""
index 0
value 1
index 2
value 3
index 4
value 5
"""

10. Find the most frequent element in the sequence

Problem: Given a sequence of elements, find out which elements occur most often

Solution: collections.Counter makes this easy

most_common(n) returns the n most frequent elements; the example below asks for the top two

from collections import Counter
word = ['a', 'a', 'b', 'c', 'c', 'd']
counter = Counter(word)
counter.most_common(2)			# [('a', 2), ('c', 2)]

counter['a']					# count: 2
counter['d']					# count: 1

Other uses of Counter: under the hood, a Counter is a dictionary (a dict subclass).

# 1. Because a Counter is a dictionary, counts can be updated with ordinary item assignment
s = ['a', 'b', 'c', 'd', 'a', 'b', 'c']
counter = Counter()
for i in s:
    counter[i] += 1
   
print(counter)	
# Counter({'a': 2, 'b': 2, 'c': 2, 'd': 1})

# 2. Arithmetic on two Counters
a = ['a', 'b', 'c', 'd']
b = ['b', 'c', 'e', 'h']
counter_a = Counter(a)
counter_b = Counter(b)
 
counter_a + counter_b
# Counter({'b': 2, 'c': 2, 'a': 1, 'd': 1, 'e': 1, 'h': 1})
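
Beyond addition, Counters support subtraction and in-place updates. The sketch below reuses the two lists from the example above; the extra items passed to update() are made up.

```python
from collections import Counter

counter_a = Counter(['a', 'b', 'c', 'd'])
counter_b = Counter(['b', 'c', 'e', 'h'])

# Subtraction keeps only the elements whose count stays positive
diff = counter_a - counter_b
print(diff)                  # Counter({'a': 1, 'd': 1})

# update() adds counts in place from another iterable
counter_a.update(['a', 'a', 'z'])
print(counter_a['a'])        # 3

# A missing key counts as zero and is not inserted by the lookup
print(counter_a['missing'])  # 0
```

The zero default for missing keys is what lets `counter[i] += 1` work without first checking whether the key exists.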

11. Sort dictionary lists by common keys

Problem: You have a list of dictionaries and want to sort it by the values of one or more keys

Solution 1: Use the itemgetter function from the operator module

rows = [
    {'fname': 'B', 'iname': 'bbb', 'uid': '1006'},
    {'fname': 'C', 'iname': 'aaa', 'uid': '1001'},
    {'fname': 'D', 'iname': 'ddd', 'uid': '1000'},
    {'fname': 'C', 'iname': 'ccc', 'uid': '1002'}
]

from operator import itemgetter
rows_by_fname = sorted(rows, key=itemgetter('fname'))
rows_by_uid = sorted(rows, key=itemgetter('uid'))
# Sort by multiple keys
rows_by_fname_uid = sorted(rows, key=itemgetter('fname', 'uid'))
......
[{'fname': 'B', 'iname': 'bbb', 'uid': '1006'},
 {'fname': 'C', 'iname': 'aaa', 'uid': '1001'}, 
 {'fname': 'C', 'iname': 'ccc', 'uid': '1002'}, 
 {'fname': 'D', 'iname': 'ddd', 'uid': '1000'}]

[{'fname': 'D', 'iname': 'ddd', 'uid': '1000'}, 
 {'fname': 'C', 'iname': 'aaa', 'uid': '1001'}, 
 {'fname': 'C', 'iname': 'ccc', 'uid': '1002'},
 {'fname': 'B', 'iname': 'bbb', 'uid': '1006'}]

Solution 2: Use lambda with sorted(); this is a little slower than itemgetter

rows = [
    {'fname': 'B', 'iname': 'bbb', 'uid': '1006'},
    {'fname': 'C', 'iname': 'aaa', 'uid': '1001'},
    {'fname': 'D', 'iname': 'ddd', 'uid': '1000'},
    {'fname': 'C', 'iname': 'ccc', 'uid': '1002'}
]

from operator import itemgetter
rows_by_fname = sorted(rows, key=lambda k: k['fname'])
rows_by_uid = sorted(rows, key=lambda k: k['uid'])
# Sort by multiple keys
rows_by_fname_uid = sorted(rows, key=lambda x:(x['fname'], x['uid']))
......
[{'fname': 'B', 'iname': 'bbb', 'uid': '1006'},
 {'fname': 'C', 'iname': 'aaa', 'uid': '1001'}, 
 {'fname': 'C', 'iname': 'ccc', 'uid': '1002'}, 
 {'fname': 'D', 'iname': 'ddd', 'uid': '1000'}]

[{'fname': 'D', 'iname': 'ddd', 'uid': '1000'}, 
 {'fname': 'C', 'iname': 'aaa', 'uid': '1001'}, 
 {'fname': 'C', 'iname': 'ccc', 'uid': '1002'},
 {'fname': 'B', 'iname': 'bbb', 'uid': '1006'}]

Extension: The itemgetter and lambda approaches also work with the max and min functions.

max(rows, key=lambda k: k['uid'])
max(rows, key=itemgetter('uid'))

{'fname': 'B', 'iname': 'bbb', 'uid': '1006'}

12. Sort instances by instance attributes

Problem: You want to sort multiple instances of a class by one of their attributes

Solution: Pass a lambda or attrgetter as the key argument of sorted()

from operator import attrgetter

class User:
    def __init__(self, user_id):
        self.id = user_id
        
    def __repr__(self):
        return 'User ({})'.format(self.id)
        
users = [User(5), User(10), User(8), User(1)]

sorted(users, key=lambda u: u.id)
sorted(users, key=attrgetter('id'))

# [User (1), User (5), User (8), User (10)]

Extension: The attrgetter and lambda approaches also work with the max and min functions.
```
max(users, key=lambda u: u.id)
max(users, key=attrgetter('id'))

User (10)
```

13. Grouping records by field

Problem: You have a series of dictionaries or instances and want to iterate over them in groups based on a particular field (such as a date)

Solution 1: Use itertools.groupby() to group the data

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5254 E CLARK', 'date': '07/01/2012'},
    {'address': '5312 N CLARK', 'date': '07/02/2012'},
    {'address': '5482 S CLARK', 'date': '07/03/2012'},
    {'address': '5484 N CLARK', 'date': '28/02/2012'}
]

from itertools import groupby
from operator import itemgetter

rows.sort(key=itemgetter('date'))		# Sort first: groupby() only groups consecutive items with the same key
for date, items in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in items:
        print(' '*4, i)

Solution 2: If you simply want to collect the data into one in-memory structure grouped by key, and memory is not a concern, use collections.defaultdict

from collections import defaultdict
rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)
rows_by_date
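
To see why the sort matters for groupby(), the sketch below groups the same made-up list of date strings with and without sorting first.

```python
from itertools import groupby

# Illustrative data, deliberately unsorted
dates = ['07/01', '07/02', '07/01', '07/02', '07/02']

# groupby() starts a new group every time the key changes
unsorted_groups = [(k, len(list(g))) for k, g in groupby(dates)]
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(dates))]

print(unsorted_groups)
# [('07/01', 1), ('07/02', 1), ('07/01', 1), ('07/02', 2)]
print(sorted_groups)
# [('07/01', 2), ('07/02', 3)]
```

The defaultdict approach does not need the sort, because it collects by key rather than by adjacency.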

14. Filtering the elements of a sequence

Problem: Only some of the data in a sequence is needed, so the sequence must be filtered according to specific conditions

Solution 1: Use a list comprehension (when the input is not very large)

>>> my_list = [1, 2, 3, -5, 2, -10, -9, -12, 15]
>>> [i for i in my_list if i > 0]
[1, 2, 3, 2, 15]
>>> [i for i in my_list if i < 0]
[-5, -10, -9, -12]

Solution 2: When the input is very large, use a generator expression to produce the values lazily

>>> res = (i for i in my_list if i < 0)
>>> res
<generator object <genexpr> at 0x0000023C252D0E08>
>>> for i in res:
...     print(i)
...
-5
-10
-9
-12

Solution 3: When the filtering condition is too complicated for a comprehension, use the filter() function

items = [', ', '**6', '-', '125', 'ss5', '12', '-10']

def parse(item):
    # Keep only strings that parse as an integer greater than 10
    try:
        return int(item) > 10
    except ValueError:
        return False
    
list(filter(parse, items))

# ['125', '12']

Extension 1: List comprehensions can also transform the data, for example with a numeric conversion or a conditional (ternary) expression

	# 1. Numeric Conversion
>>> s = [1, 2, 5, 4, 6, 1, 3, 7]
>>> [i**2 for i in s]
[1, 4, 25, 16, 36, 1, 9, 49]

# 2. Conditional (ternary) expression
>>> s = ['1', '2', 'd', 'aa', '6.6', '0.2', '**']
>>> [i if i.isdigit() else None for i in s]
['1', '2', None, None, None, None, None]

Extension 2: Filter with a Boolean sequence and itertools.compress; items of data beyond the end of the Boolean list are ignored

from itertools import compress
data = [
    'aa1',
    'aa2',
    'aa3',
    'aa4',
    'aa5',
    'aa6',
    'aa7',
    'aa8',
    'aa9',
    'aa10'
]
s = [1, 2, 5, 4, 6, 1, 3, 7]
bool_list = [n > 5 for n in s]
# [False, False, False, False, True, False, False, True]
list(compress(data, bool_list))

# ['aa5', 'aa8']

15. Extract values from a dictionary to create a dictionary

Problem: We need to create a dictionary that is a subset of another dictionary

Solution 1: Use a dictionary comprehension

d = {
    'a': 1,
    'b': 2,
    'c': 3,
    'd': 4,
    'e': 5,
    'f': 6
}
{key: value for key, value in d.items() if value > 3}
	# {'d': 4, 'e': 5, 'f': 6}
{key: value for key, value in d.items() if key in ['a', 'b', 'c']} 
	# {'a': 1, 'b': 2, 'c': 3}

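Solution 2: Build the dictionary by passing a sequence of (key, value) tuples to the dict function. The Python Cookbook reports the dictionary comprehension of Solution 1 as over twice as fast as this approach, so this version is the slower one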
```
dict((k, v) for k, v in d.items() if v > 2)
	# {'c': 3, 'd': 4, 'e': 5, 'f': 6}
```

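Solution 3: A more obscure way to select keys, by intersecting them with a set. The Python Cookbook reports this as roughly 1.6 times slower than the dictionary comprehension of Solution 1, not faster

keys = {'a', 'b', 'c'}
{key: d[key] for key in d.keys() & keys}	# Intersect the dictionary's keys with the wanted keys
# {'c': 3, 'b': 2, 'a': 1} (the order may vary)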

16. Naming the elements of a tuple

Problem: Our code accesses tuples or lists by index, which can be hard to read. We want to access the contents of a tuple like the attributes of an instance.

Solution: Use collections.namedtuple to name the fields; compared with an ordinary tuple it adds only a very small amount of overhead

>>> from collections import namedtuple
>>> User = namedtuple('User', ['name', 'gender', 'age'])
>>> a = User('a', 'M', 20)
>>> a.name
'a'
>>> a.age
20

Although this looks like creating a class and instantiating it, a namedtuple supports all the operations of an ordinary tuple, such as indexing and unpacking.

Using namedtuple: create named tuples by unpacking the data inside a for loop

data = [
    ('a', 1, 5),
    ('b', 6, 3),
    ('c', 2, 5),
    ('d', 3, 7),
    ('e', 4, 8),
]
# 1. Without named tuples, values are fetched by index and the meaning is unclear.
def compute_cost(records):
    total = 0.0
    for rec in records:
        total += rec[1] * rec[2]
    return total
compute_cost(data)
# 86.0

# 2. With named tuples, unpack each record into a named tuple
from collections import namedtuple
User = namedtuple('User', ['name', 'data1', 'data2'])
def compute_cost(records):
    total = 0.0
    for rec in records:
        user = User(*rec)
        total += user.data1 * user.data2
    return total
compute_cost(data)
# 86.0

Extension 1: A namedtuple can take the place of a dictionary and uses less memory than one, although it is slightly less efficient than a class that defines __slots__

# Note: a namedtuple is immutable, so when it stands in for a dictionary, use its built-in _replace() method to change a value
>>> User = namedtuple('User', ['name', 'age', 'gender'])
>>> a = User('xyb', 20, 'F')
>>> a.age = 21
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

>>> a._replace(age=21)
User(name='xyb', age=21, gender='F')	# A new tuple is returned; a itself is unchanged

Extension 2: Use _replace() together with a dictionary to fill in a tuple of default values

User = namedtuple('User', ['name', 'age', 'gender', 'number', 'date'])
default_user = User('', '', '', None, None)
def dict_to_tuple(d):
    return default_user._replace(**d)

d = {'name': 'xyb', 'age': 20}
dict_to_tuple(d)
# User(name='xyb', age=20, gender='', number=None, date=None)
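
On Python 3.7 and later, a similar effect can be had without a prototype instance by using the defaults argument of namedtuple. The sketch below is an alternative to the _replace() approach above, not the method described in these notes; it also shows _asdict() for converting back to a dictionary.

```python
from collections import namedtuple

# defaults= supplies values for omitted fields (requires Python 3.7+)
User = namedtuple('User', ['name', 'age', 'gender', 'number', 'date'],
                  defaults=['', '', '', None, None])

u = User(**{'name': 'xyb', 'age': 20})
print(u)
# User(name='xyb', age=20, gender='', number=None, date=None)

# _asdict() converts the named tuple back to a dictionary
as_dict = u._asdict()
print(as_dict['age'])   # 20
```

Either way, an unknown key in the input dictionary raises an error rather than being silently ignored.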

17. Generator expression as a single parameter of a function

Problem: We need to call a reduction function such as sum(), min(), or max(), but first we need to transform or filter the data.

Solution: Pass a generator expression as the function's argument

# 1. For example, the sum function
nums = [1, 2, 3, 4, 5]
sum(x * x for x in nums)
	# 2. any function (returns True as long as one of the elements of the iterable object passed in is True)
import os
files = os.listdir()
if any(name.endswith('.py') for name in files):
    print('python file')
else:
    print('sorry no python')
# 3. Find the minimum
data = [
    {'name': 'A', 'age': 64},
    {'name': 'B', 'age': 28},
    {'name': 'C', 'age': 36},
    {'name': 'D', 'age': 15},
]
min(item['age'] for item in data)
# 4. Output CSV
s = ['i', 'love', 'a', 'girl']
print(','.join(item for item in s))		# i,love,a,girl

Analysis: Why can a generator expression be passed directly as a function argument? When it is the only argument, the extra pair of parentheses around it is optional

>>> nums = [1, 2, 3, 4, 5, 6]
>>> s = sum(x * x for x in nums)
>>> s
91
>>> s = sum((x * x for x in nums))
>>> s
91

# The result is the same with or without the extra parentheses.

Key point: A generator expression can take the place of the key argument, but it returns the value rather than the key

data = {
    'a': 20,
    'b': 25,
    'c': 15,
    'd': 30
}
min(data[item] for item in data)		# 15 (the smallest value)
min(data, key=lambda k: data[k])		# 'c' (the key of the smallest value)

18. Combine multiple dictionaries into a single dictionary

Problem: You have multiple dictionaries or mappings and want to combine them logically into a single structure in order to perform certain operations, such as looking up keys or checking whether values exist
Solution: Use collections.ChainMap, which links the underlying dictionaries instead of copying them
```
a = {'a': 1, 'b': 1}
b = {'c': 1, 'b': 2}
from collections import ChainMap
c = ChainMap(a, b)
len(c)			# 3
list(c.keys())		# ['c', 'b', 'a'] on recent Python versions; older versions may give a different order
list(c.values())	# [1, 1, 1]
```
- If both dictionaries contain the same key, the value from the first dictionary is used.
- Operations that modify the mapping, such as deleting a key, always act on the first dictionary only.
```
del c['a']		# OK: 'a' is in the first mapping
del c['c']		# KeyError: 'c' is not in the first mapping
```
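
The difference between linking and copying can be seen in a small sketch. It reuses the two dictionaries from above; the variable names chain and merged are illustrative.

```python
from collections import ChainMap

a = {'a': 1, 'b': 1}
b = {'c': 1, 'b': 2}
chain = ChainMap(a, b)

# ChainMap holds references: changes to the underlying dicts are visible
b['c'] = 99
print(chain['c'])   # 99

# Writes always go to the first mapping, even for a key found further down
chain['c'] = 5
print(a)            # {'a': 1, 'b': 1, 'c': 5}
print(b)            # {'c': 99, 'b': 2}

# A merged copy built with update() does not track later changes
merged = dict(b)
merged.update(a)
a['a'] = 42
print(merged['a'], chain['a'])   # 1 42
```

A ChainMap suits layered lookups such as defaults overridden by user settings, while update() is the better fit when an independent snapshot is wanted.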

Posted by rross46 on Thu, 26 Sep 2019 05:42:43 -0700
