Python caching mechanism: functools.lru_cache

Keywords: Python Cache

The concept of caching should be familiar to everyone. Caching services such as Redis and Memcached have practically become standard components of a microservice architecture.

That said, you do not have to deploy Redis or a similar service just to use a cache. In a small, single-process Python application, for example, we can use functools.lru_cache to implement caching. You can also wrap it to meet your own needs, such as adding a cache expiration time.
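As a taste of such a wrapper, here is a rough, hypothetical sketch of one common pattern: salting the cache key with a time bucket so that cached results go stale after roughly ttl_seconds. The ttl_lru_cache name and its parameters are invented for illustration.

import time
from functools import lru_cache, wraps


def ttl_lru_cache(ttl_seconds=60, maxsize=128):
    """Hypothetical TTL wrapper around lru_cache (a sketch, not a library API)."""
    def decorator(func):
        @lru_cache(maxsize=maxsize)
        def cached(time_bucket, *args, **kwargs):
            # time_bucket is part of the cache key, so results from an
            # earlier bucket are never reused
            return func(*args, **kwargs)

        @wraps(func)
        def wrapper(*args, **kwargs):
            return cached(int(time.time() // ttl_seconds), *args, **kwargs)
        return wrapper
    return decorator

Entries written near the end of a bucket expire sooner than ttl_seconds; that coarseness is the price of this trick.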

First, let's use a simple example to understand the concept of caching, as shown in the code below. Note that this is only a rough illustration; you would rarely write it yourself, since functools.lru_cache offers a more elegant approach.

# -*- coding: utf-8 -*-
import random
import datetime


class MyCache:
    """Cache class"""

    def __init__(self):
        # Store cached data as key-value pairs in a dictionary
        self.cache = {}
        # Limit the cache size, since cache space is finite;
        # when the cache grows too large, the oldest entry is discarded
        self.max_cache_size = 10

    def __contains__(self, key):
        """Return True if the key exists in the cache, otherwise False"""
        return key in self.cache

    def get(self, key):
        """Get data from cache"""
        data = self.cache[key]
        data["date_accessed"] = datetime.datetime.now()
        return data["value"]

    def add(self, key, value):
        """Update the cache dictionary. If the cache is too large, delete the oldest entry first"""
        if key not in self.cache and len(self.cache) >= self.max_cache_size:
            self.remove_oldest()
        self.cache[key] = {
            'date_accessed': datetime.datetime.now(),
            'value': value
        }

    def remove_oldest(self):
        """Remove the entry with the earliest access date"""
        oldest_entry = None

        for key in self.cache:
            if oldest_entry is None:
                oldest_entry = key
                continue
            curr_entry_date = self.cache[key]['date_accessed']
            oldest_entry_date = self.cache[oldest_entry]['date_accessed']
            if curr_entry_date < oldest_entry_date:
                oldest_entry = key

        self.cache.pop(oldest_entry)

    @property
    def size(self):
        """Return the current number of cached entries"""
        return len(self.cache)


if __name__ == '__main__':
    # Test cache function
    cache = MyCache()
    cache.add("test", sum(range(100000)))
    assert cache.get("test") == cache.get("test")

    keys = [
        'red', 'fox', 'fence', 'junk', 'other', 'alpha', 'bravo', 'cal',
        'devo', 'ele'
    ]
    s = 'abcdefghijklmnop'
    for i, key in enumerate(keys):
        if key in cache:
            continue
        else:
            value = ''.join(random.choice(s) for _ in range(20))
            cache.add(key, value)

    assert "test" not in cache
    print(cache.cache)

Python 3.2+ introduced a very elegant caching mechanism: the lru_cache decorator in the functools module. It caches the results of a function or class method directly, so that subsequent calls with the same arguments return the cached result. The prototype of lru_cache is as follows:

@functools.lru_cache(maxsize=128, typed=False)

Using the lru_cache decorator from the functools module, up to maxsize distinct call results of the function are cached, which can greatly improve execution efficiency, especially for time-consuming functions. The maxsize parameter is the maximum number of cached results; if it is None, the cache is unbounded, and performance is best when maxsize is a power of two. If typed=True (note that the functools32 backport for Python 2 does not have this parameter), calls whose arguments differ in type are cached separately; for example, f(3) and f(3.0) are treated as distinct calls.

LRU (Least Recently Used) is a cache eviction strategy that discards entries based on their access history. The core idea is that data accessed recently is likely to be accessed again in the future. The algorithm originated as a page replacement algorithm in operating system memory management, where it finds memory pages that have not been used for a long time and swaps them out to make room for new data. The principle is the same as in the simple example above.
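To make the policy concrete, here is a minimal illustrative sketch of an LRU container built on collections.OrderedDict; note this is only a sketch of the idea, not how CPython implements lru_cache internally:

from collections import OrderedDict


class SimpleLRU:
    """Minimal sketch of the LRU policy (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        # A hit moves the key to the most recently used end
        value = self.data.pop(key)
        self.data[key] = value
        return value

    def put(self, key, value):
        if key in self.data:
            self.data.pop(key)
        elif len(self.data) >= self.capacity:
            # Evict the least recently used entry at the oldest end
            self.data.popitem(last=False)
        self.data[key] = value

The expiring caches later in this post follow the same OrderedDict pattern, with an expiry timestamp added to each entry.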

A function decorated with lru_cache gains two extra methods, cache_clear and cache_info, which clear the cache and report cache statistics respectively.
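As a quick sketch of both methods (the square function below is just a stand-in chosen for illustration):

from functools import lru_cache


@lru_cache(maxsize=None)
def square(x):
    return x * x


square(2)
print(square.cache_info())   # CacheInfo(hits=0, misses=1, maxsize=None, currsize=1)
square.cache_clear()
print(square.cache_info())   # CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)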

The following shows a simple use of lru_cache: when the function body actually runs, it prints a log line; when the cached result is used, it does not.

from functools import lru_cache

@lru_cache(None)
def add(x, y):
    print("calculating: %s + %s" % (x, y))
    return x + y

print(add(1, 2))
print(add(1, 2))
print(add(2, 3))

The output is as follows:

calculating: 1 + 2
3
3
calculating: 2 + 3
5

As the results show, when add(1, 2) is called the second time, the function body is not actually executed; the cached result is returned directly.
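One caveat worth knowing: because cached results are keyed on the function's arguments, every argument must be hashable. A quick check using the add function above:

# Passing an unhashable argument such as a list raises TypeError,
# because lru_cache cannot build a cache key from it
try:
    add([1], [2])
except TypeError as exc:
    print(exc)  # unhashable type: 'list'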

Next, let's look at the maxsize and typed parameters. In the following code, maxsize=1 means that only one result is cached, and typed=True means that argument types are strictly distinguished, so add(2, 3) and add(2, 3.0) are cached as two separate results.

from functools import lru_cache


@lru_cache(maxsize=1, typed=True)
def add(x, y):
    print("calculating: %s + %s" % (x, y))
    return x + y


print(add(1, 2))
print(add(1, 2))
print(add(2, 3))
print(add(2, 3.0))
print(add(1, 2))
print(add(2, 3))
print(add(2, 3.0))
print(add(2, 3.0))

The output is:

calculating: 1 + 2
3
3
calculating: 2 + 3
5
calculating: 2 + 3.0
5.0
calculating: 1 + 2
3
calculating: 2 + 3
5
calculating: 2 + 3.0
5.0
5.0

To view a function's current cache statistics, call its cache_info() method, as shown below for the add function:

# View function cache information
cache_info = add.cache_info()
print(cache_info)

The output looks like the following (the numbers below correspond to the maxsize=1 run above): hits is the number of cache hits, misses is the number of cache misses, maxsize is the maximum number of results that can be stored, and currsize is the number of results currently stored.

CacheInfo(hits=2, misses=6, maxsize=1, currsize=1)

If you also need an expiration time and thread safety, you can use an implementation like the following, where expired entries are treated as misses and cleaned up on read.

import collections
import threading
import time


class LRUCacheNotThreadSafe(object):
    """LRU cache with per-entry expiry (not thread safe)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = collections.OrderedDict()

    def get_and_clear_expired(self, key, current_timestamp):
        """Get a value, dropping it if it has expired."""
        try:
            (value, expire_time) = self.cache.pop(key)
            if expire_time > current_timestamp:
                # Only keep (and refresh) the key if it has not expired
                self.cache[key] = (value, expire_time)
                return (True, value)
            # The entry has expired, so treat it as a miss
            return (False, None)
        except KeyError:
            return (False, None)

    def set(self, key, value, expire_time):
        """Insert or update a value with its absolute expiry timestamp."""
        try:
            self.cache.pop(key)
        except KeyError:
            # At capacity: evict the least recently used entry
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)
        self.cache[key] = (value, expire_time)


class LRUCacheThreadSafe(object):
    """Thread-safe LRU cache with per-entry expiry.

    Base structure adapted from https://www.kunxi.org/2014/05/lru-cache-in-python/
    Expired values are cleaned up only on get.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = collections.OrderedDict()
        self._lock = threading.Lock()

    def get_and_clear_expired(self, key, current_timestamp):
        """Get a value, dropping it if it has expired.

        :param key: cache key
        :param current_timestamp: the current time, as a Unix timestamp
        :return: a (found, value) tuple
        """
        with self._lock:
            try:
                (value, expire_time) = self.cache.pop(key)
                if expire_time > current_timestamp:
                    # Only keep (and refresh) the key if it has not expired
                    self.cache[key] = (value, expire_time)
                    return (True, value)
                # The entry has expired, so treat it as a miss
                return (False, None)
            except KeyError:
                return (False, None)

    def set(self, key, value, expire_time):
        """Insert or update a value with its absolute expiry timestamp."""
        with self._lock:
            try:
                self.cache.pop(key)
            except KeyError:
                # At capacity: evict the least recently used entry
                if len(self.cache) >= self.capacity:
                    self.cache.popitem(last=False)
            self.cache[key] = (value, expire_time)


# Cache example
lru_ins = LRUCacheThreadSafe(capacity=50)
# Write 1: the entry expires 10 seconds from now
lru_ins.set("key1", "value1", int(time.time()) + 10)
# Query 1
s, v = lru_ins.get_and_clear_expired("key1", int(time.time()))
print(s, v)

# Write 2: overwrite key1 with a new value and a fresh expiry
lru_ins.set("key1", "value2", int(time.time()) + 10)
# Query 2
s, v = lru_ins.get_and_clear_expired("key1", int(time.time()))
print(s, v)

Output results:

True value1
True value2
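Since expired entries are treated as misses, an entry written with an expire_time already in the past is dropped on the next read. A quick check, reusing the lru_ins instance above:

# An already-expired entry is removed and reported as a miss
lru_ins.set("key2", "value3", int(time.time()) - 1)
s, v = lru_ins.get_and_clear_expired("key2", int(time.time()))
print(s, v)  # False None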

References:

Python caching mechanism and functools.lru_cache | Huoty's Blog (konghy.cn)

Python built-in cache lru_cache: usage and extension (detailed)
