Merging two dicts elegantly in Python


Merging two dicts in one line of code

Suppose you want to merge two dicts, x and y, into a new dict without modifying x or y. For example:

 
 x = {'a': 1, 'b': 2}
 y = {'b': 3, 'c': 4}

The desired result is a new dict z in which, when a key appears in both dicts, the value from y overrides the value from x. Expected result:

>>> z
{'a': 1, 'b': 3, 'c': 4}

PEP 448 introduced a new unpacking syntax, available from Python 3.5 onward. With it, the merge is written as follows:

z = {**x, **y}

A genuine one-liner.
Because many people are still using Python 2, here is an elegant way that works on Python 2 and on Python 3.0 to 3.4, although it takes two lines of code.

z = x.copy()
z.update(y)

As with the one-liner, values from y override those from x, so the final result has b = 3.
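
As a quick check of the two approaches above, here is a minimal sketch (Python 3.5 or later, because of the {**x, **y} line) that builds the merged dict both ways and confirms that the results agree and that x and y are left untouched. The names z_unpack and z_update are only illustrative.

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# PEP 448 unpacking (Python 3.5+): later mappings win on duplicate keys.
z_unpack = {**x, **y}

# Copy-and-update: works on Python 2 and on every Python 3 release.
z_update = x.copy()
z_update.update(y)

print(z_unpack == {'a': 1, 'b': 3, 'c': 4})  # True
print(z_unpack == z_update)                  # True
print(x == {'a': 1, 'b': 2})                 # True, x was not modified
print(y == {'b': 3, 'c': 4})                 # True, y was not modified
```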

How to do it in a single expression without Python 3.5

If you are not on Python 3.5 yet, or need to write backward-compatible code, and you want the merge as a single expression, the most efficient approach is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dicts, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

The merge is then a single call:

 z = merge_two_dicts(x, y)
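
The docstring's phrase "shallow copy" matters when the values are mutable. The following is a small sketch, using made-up dicts named defaults and overrides, of what is and is not shared between the merged dict and its inputs:

```python
def merge_two_dicts(x, y):
    """Given two dicts, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

# Hypothetical example data; the list is a mutable value.
defaults = {'tags': ['base'], 'level': 1}
overrides = {'level': 2}

merged = merge_two_dicts(defaults, overrides)

# The top-level dict is new, so rebinding a key does not touch the inputs...
merged['level'] = 99
print(defaults['level'])   # 1

# ...but the list object itself is shared, so mutating it shows up in both.
merged['tags'].append('extra')
print(defaults['tags'])    # ['base', 'extra']
```

If the nested values need to be independent as well, copy.deepcopy on the inputs is one option, at extra cost.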

You can also define a function that merges any number of dicts:

def merge_dicts(*dict_args):
    """
    Given any number of dicts, shallow copy and merge into a new dict,
    precedence goes to key value pairs in latter dicts.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

It can then be used like this:

z = merge_dicts(a, b, c, d, e, f, g) 

In all of these, when the same key appears in more than one dict, the value from the later dict wins.
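
Since a through g are not defined anywhere in this post, here is a small runnable sketch with three made-up dicts that shows the precedence rule in action:

```python
def merge_dicts(*dict_args):
    """Shallow-merge any number of dicts; later dicts take precedence."""
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

# Hypothetical inputs that all share the key 'k'.
a = {'k': 'from a', 'only_a': 1}
b = {'k': 'from b'}
c = {'k': 'from c', 'only_c': 3}

z = merge_dicts(a, b, c)
print(z['k'])          # from c
print(sorted(z))       # ['k', 'only_a', 'only_c']
print(merge_dicts())   # {}
```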

Some less elegant approaches

items

Some people use this approach, which only works in Python 2:

 z = dict(x.items() + y.items())

In Python 2 this creates two lists in memory, then builds a third list by concatenating them, creates the new dict from it, and finally throws all three lists away. That is wasteful, and in Python 3 it does not run at all, because items() returns a dict_items view object rather than a list, and two views cannot be added with +:

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'

You would have to convert the views to lists explicitly, z = dict(list(x.items()) + list(y.items())), which wastes memory and computation.
Similarly, taking the union of the views returned by items() in Python 3 fails when the values are unhashable objects such as lists. Even when the values are hashable, sets are unordered, so it is undefined which value survives for a duplicate key. If you need a defined precedence when merging two dicts, this approach is entirely unsuitable.

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Here is an example where y should take precedence, but because set ordering is arbitrary, the value from x is kept:

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}
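
To make the difference concrete, here is a Python 3 sketch that contrasts the two items()-based variants. Concatenating explicit lists keeps y's precedence, at the cost of temporary lists. The set union holds both pairs for the duplicate key, so which one dict() ends up with depends on set iteration order. The variable names are only illustrative.

```python
x = {'a': 2}
y = {'a': 1}

# Explicit lists: order is preserved, so the pair from y comes last and wins.
via_lists = dict(list(x.items()) + list(y.items()))
print(via_lists)           # {'a': 1}

# Set union: both pairs survive, in no guaranteed order.
pairs = x.items() | y.items()
print(len(pairs))          # 2
via_union = dict(pairs)    # {'a': 2} or {'a': 1}; do not rely on either

# Unhashable values cannot be placed in a set at all.
try:
    {'a': []}.items() | {'b': []}.items()
    union_failed = False
except TypeError as exc:
    union_failed = True
    print('union failed:', exc)
```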

Constructor

Others use the dict constructor like this:

z = dict(x, **y)

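This is a neat use of the dict constructor, and it is far more efficient than the items()-based approaches above. However, it is hard to read unless you know that the second dict is being passed as keyword arguments, and that is not what the constructor is intended for, so it is not Pythonic. It also fails in Python 3 when a key is not a string: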

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

Guido van Rossum said he would be fine with declaring dict({}, **{1:3}) illegal, since it is, after all, an abuse of the ** mechanism. The trick may pass for a clever hack, but it is too opportunistic to rely on.
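
The following Python 3 sketch shows both sides of this trick: it works while every key is a string, and it raises TypeError as soon as one is not. The dict y_int_keys is a made-up example. Copy-and-update is included for comparison, since it places no restriction on key types.

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# With string keys the trick works, and y takes precedence.
print(dict(x, **y) == {'a': 1, 'b': 3, 'c': 4})   # True

# With a non-string key, Python 3 rejects the keyword expansion.
y_int_keys = {1: 'one'}   # hypothetical dict with an int key
try:
    dict(x, **y_int_keys)
    failed = False
except TypeError:
    failed = True
print(failed)   # True on Python 3

# Copy-and-update accepts any hashable key.
z = x.copy()
z.update(y_int_keys)
print(z[1])     # one
```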

Some slower but more elegant approaches

The following approaches are slower than copy-and-update, but they are much better than the items() approaches, and they respect precedence: values from later dicts override those from earlier ones.


{k: v for d in dicts for k, v in d.items()}

In Python 2.6, which has no dict comprehensions, the equivalent is:

 dict((k, v) for d in dicts for k, v in d.items())
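
In the two snippets above, dicts stands for any iterable of dicts; the post never defines it, so the sketch below assumes a tuple (x, y) and checks that both forms give the same merged result.

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
dicts = (x, y)   # assumed: any iterable of dicts, earliest first

# Dict comprehension (Python 2.7+ and Python 3).
z1 = {k: v for d in dicts for k, v in d.items()}

# Generator expression passed to dict() (also works on Python 2.6).
z2 = dict((k, v) for d in dicts for k, v in d.items())

print(z1 == z2 == {'a': 1, 'b': 3, 'c': 4})   # True
```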
 

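itertools.chain walks the key-value pairs of each dict in order, so later dicts still win (Python 2 version, using iteritems()):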

import itertools
z = dict(itertools.chain(x.iteritems(), y.iteritems()))
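
iteritems() does not exist in Python 3, where items() already returns a lazy view. Here is a sketch of the Python 3 equivalent, plus a variant using itertools.chain.from_iterable for any number of dicts. The list named many is a made-up example.

```python
import itertools

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# Python 3: items() replaces iteritems().
z = dict(itertools.chain(x.items(), y.items()))
print(z == {'a': 1, 'b': 3, 'c': 4})   # True

# chain.from_iterable handles any number of dicts, later ones winning.
many = [x, y, {'c': 5}]   # hypothetical list of dicts
z_many = dict(itertools.chain.from_iterable(d.items() for d in many))
print(z_many['c'])   # 5
```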

Performance testing

The following timings were measured on Ubuntu 14.04 with Python 2.7 (the system Python):

>>> import timeit
>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.5726828575134277
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.163769006729126
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.iteritems(),y.iteritems()))))
1.1614501476287842
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
2.2345519065856934

In Python 3.5:

>>> import itertools
>>> import timeit
>>> min(timeit.repeat(lambda: {**x, **y}))
0.4094954460160807
>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.7881555100320838
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.4525277839857154
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.items(), y.items()))))
2.3143140770262107
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
3.2069112799945287
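
To repeat the comparison on your own machine, something like the following self-contained Python 3 script could be used. It is only a sketch: the post does not say which x and y were timed, so the small dicts from the beginning of the post are assumed, and number is reduced from timeit's default of 1,000,000 to keep the run short. The absolute figures will therefore differ from those above.

```python
import itertools
import timeit

# Assumed inputs: the small example dicts from the start of the post.
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

candidates = {
    'unpacking': lambda: {**x, **y},
    'copy + update': lambda: merge_two_dicts(x, y),
    'dict comprehension': lambda: {k: v for d in (x, y) for k, v in d.items()},
    'itertools.chain': lambda: dict(itertools.chain(x.items(), y.items())),
    'generator into dict()': lambda: dict((k, v) for d in (x, y) for k, v in d.items()),
}

# Take the minimum of several repeats, as in the sessions above.
timings = {}
for name, func in candidates.items():
    timings[name] = min(timeit.repeat(func, repeat=3, number=20000))

# Print the fastest approach first.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print('{:<22} {:.4f} s'.format(name, seconds))
```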


Posted by Mr Mako on Thu, 06 Jun 2019 12:33:14 -0700