How do I sort a list of dictionaries by a value of the dictionary?

Keywords: Attribute Lambda Programming

I have a list of dictionaries and want to sort its items by the value of a specific key.

Consider the following list:

[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]

Sorted by name, it should become

[{'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}]

#1st floor

Using the Schwartzian transform from Perl,

py = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]

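do

sort_on = "name"
# Decorate as (key, index, dict); the index breaks ties so two dicts are never compared directly
decorated = [(dict_[sort_on], index, dict_) for index, dict_ in enumerate(py)]
decorated.sort()
result = [dict_ for (key, index, dict_) in decorated]

gives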

>>> result
[{'age': 10, 'name': 'Bart'}, {'age': 39, 'name': 'Homer'}]

More on the Perl Schwartzian transform:

In computer science, the Schwartzian transform is a Perl programming idiom used to improve the efficiency of sorting a list of items. The idiom suits comparison-based sorting when the ordering is actually based on a certain property (the key) of the elements, and computing that property is an expensive operation that should be performed as few times as possible. The Schwartzian transform is notable in that it does not use named temporary arrays.
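In current Python the same decorate-sort-undecorate idea is available through the `key` argument of `sorted()` and `list.sort()`, which calls the key function once per element. A minimal sketch, reusing the `py` list from this answer (`by_name` is only an illustrative name):

```python
from operator import itemgetter

py = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]

# key= is evaluated once per element, so no manual decoration is needed
by_name = sorted(py, key=itemgetter('name'))
print([d['name'] for d in by_name])  # ['Bart', 'Homer']
```

The original list is left untouched; use `py.sort(key=itemgetter('name'))` to sort in place instead.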

#2nd floor

Suppose I have a dictionary D with the elements below. To sort it, just pass a custom function via the key argument of sorted, as follows:

D = {'eggs': 3, 'ham': 1, 'spam': 2}
def get_count(pair):
    # pair is a (key, value) tuple from D.items(); sort on the value
    return pair[1]

sorted(D.items(), key=get_count, reverse=True)
# or
sorted(D.items(), key=lambda x: x[1], reverse=True)  # same result without defining get_count

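For reference, a short sketch of what that call returns and how the result might be turned back into a dictionary. It assumes Python 3.7 or later, where dicts keep insertion order; `ranked` and `by_count` are illustrative names, not part of the answer.

```python
D = {'eggs': 3, 'ham': 1, 'spam': 2}

# sorted() over D.items() yields (key, value) tuples, largest value first
ranked = sorted(D.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('eggs', 3), ('spam', 2), ('ham', 1)]

# dict() keeps that order because dicts preserve insertion order (3.7+)
by_count = dict(ranked)
print(list(by_count))  # ['eggs', 'spam', 'ham']
```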

#3rd floor

This is an alternative, general solution: it sorts the dicts by their keys and values. Its advantage is that there is no need to specify keys, and it still works if some keys are missing from some of the dictionaries.

def sort_key_func(item):
    """Helper used as the sort key for a list of dicts.

    :param item: dict
    :return: sorted list of tuples (k, v)
    """
    return sorted(item.items())


# A is the list of dicts to sort
sorted(A, key=sort_key_func)
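A small usage sketch with made-up sample data, in which two entries lack one of the keys. The helper is repeated in compact form so the snippet runs on its own; the contents of `A` and the name `result` are assumptions for illustration.

```python
def sort_key_func(item):
    # equivalent to the helper above: the dict's (key, value) pairs, sorted
    return sorted(item.items())

# hypothetical sample data; two entries are missing one of the keys
A = [{'name': 'Homer', 'age': 39}, {'name': 'Bart'}, {'age': 10}]

result = sorted(A, key=sort_key_func)
print(result)
# [{'age': 10}, {'name': 'Homer', 'age': 39}, {'name': 'Bart'}]
```

One caveat: in Python 3 the values stored under the same key have to be mutually comparable (for example, all ages are numbers), otherwise the comparison raises `TypeError`.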

#4th floor

If you want to sort the list by multiple keys, you can do the following:

my_list = [{'name':'Homer', 'age':39}, {'name':'Milhouse', 'age':10}, {'name':'Bart', 'age':10} ]
sortedlist = sorted(my_list , key=lambda elem: "%02d %s" % (elem['age'], elem['name']))

This is rather hackish, since it relies on converting the values into a single string representation for comparison. It works for non-negative integers as long as the format string zero-pads them to a common width (here "%02d" covers ages up to 99). Negative numbers are not ordered correctly by string comparison: for example, "-3" sorts before "-5".
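A less fragile variant, offered here as a sketch rather than as part of the original answer, is to return a tuple from the key function. Tuples compare element by element, so no string formatting or padding is involved and negative numbers sort correctly.

```python
from operator import itemgetter

my_list = [
    {'name': 'Homer', 'age': 39},
    {'name': 'Milhouse', 'age': 10},
    {'name': 'Bart', 'age': 10},
]

# tuples compare element by element: age first, then name on ties
sortedlist = sorted(my_list, key=lambda elem: (elem['age'], elem['name']))
print([d['name'] for d in sortedlist])  # ['Bart', 'Milhouse', 'Homer']

# itemgetter with several keys builds the same tuple
same = sorted(my_list, key=itemgetter('age', 'name'))
```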

#5th floor

Using the pandas package is another option, though its runtime at large scale is much slower than the more traditional methods proposed by others:

import pandas as pd

listOfDicts = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
df = pd.DataFrame(listOfDicts)
df = df.sort_values('name')
sorted_listOfDicts = df.T.to_dict().values()

Here are some benchmark values for a tiny list and a large (100k+) list of dicts:

setup_large = "listOfDicts = [];\
[listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10})) for _ in range(50000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"

setup_small = "listOfDicts = [];\
listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"

method1 = "newlist = sorted(listOfDicts, key=lambda k: k['name'])"
method2 = "newlist = sorted(listOfDicts, key=itemgetter('name')) "
method3 = "df = df.sort_values('name');\
sorted_listOfDicts = df.T.to_dict().values()"

import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))

t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_large)
print('Large Method Pandas: ' + str(t.timeit(1)))  # note: 1 run here, 100 runs for the other timings

#Small Method LC: 0.000163078308105
#Small Method LC2: 0.000134944915771
#Small Method Pandas: 0.0712950229645
#Large Method LC: 0.0321750640869
#Large Method LC2: 0.0206089019775
#Large Method Pandas: 5.81405615807

Posted by fatnjazzy on Fri, 06 Dec 2019 13:00:39 -0800