Everyone is probably familiar with the concept of a cache by now: cache services such as Redis and Memcache have become more or less standard components of a microservice architecture.
That said, you do not have to deploy Redis or a similar service to use caching. In a small, single-process Python application, for example, you can implement a caching mechanism with functools.lru_cache, and you can wrap it further to meet your own needs, such as adding a cache expiration time.
First, let's use a simple example to understand the idea behind a caching mechanism, as shown in the following code. Note that this is only meant to illustrate the general approach; you would rarely write this by hand, since functools.lru_cache offers a more elegant solution.
# -*- coding: utf-8 -*-

import random
import datetime


class MyCache:
    """Cache class"""

    def __init__(self):
        # Cache data in key-value form using a dictionary
        self.cache = {}
        # Limit the size of the cache because cache space is limited;
        # when the cache grows too large, old entries must be discarded
        self.max_cache_size = 10

    def __contains__(self, key):
        """Return True or False depending on whether the key exists in the cache"""
        return key in self.cache

    def get(self, key):
        """Get data from the cache"""
        data = self.cache[key]
        data["date_accessed"] = datetime.datetime.now()
        return data["value"]

    def add(self, key, value):
        """Update the cache dictionary; if the cache is full, delete the oldest entry first"""
        if key not in self.cache and len(self.cache) >= self.max_cache_size:
            self.remove_oldest()
        self.cache[key] = {
            'date_accessed': datetime.datetime.now(),
            'value': value
        }

    def remove_oldest(self):
        """Delete the entry with the earliest access date"""
        oldest_entry = None
        for key in self.cache:
            if oldest_entry is None:
                oldest_entry = key
                continue
            curr_entry_date = self.cache[key]['date_accessed']
            oldest_entry_date = self.cache[oldest_entry]['date_accessed']
            if curr_entry_date < oldest_entry_date:
                oldest_entry = key
        self.cache.pop(oldest_entry)

    @property
    def size(self):
        """Return the current size of the cache"""
        return len(self.cache)


if __name__ == '__main__':
    # Test the cache
    cache = MyCache()
    cache.add("test", sum(range(100000)))
    assert cache.get("test") == cache.get("test")

    keys = [
        'red', 'fox', 'fence', 'junk', 'other', 'alpha', 'bravo', 'cal',
        'devo', 'ele'
    ]
    s = 'abcdefghijklmnop'
    for i, key in enumerate(keys):
        if key in cache:
            continue
        else:
            value = ''.join([random.choice(s) for _ in range(20)])
            cache.add(key, value)

    # Adding the 10 keys above fills the cache, so the oldest entry
    # ("test") has been evicted by this point
    assert "test" not in cache
    print(cache.cache)
Python 3.2+ introduces a very elegant caching mechanism: the lru_cache decorator in the functools module. It caches the results of a function or method directly, so that subsequent calls with the same arguments return the cached result. The prototype of lru_cache is as follows:
@functools.lru_cache(maxsize=128, typed=False)
A function decorated with functools.lru_cache caches up to maxsize recent call results, which can greatly improve execution efficiency, especially for time-consuming functions. The maxsize parameter is the maximum number of cached results; if it is None, there is no limit, and performance is best when maxsize is a power of two. If typed=True (note that this parameter does not exist in the functools32 backport), calls with arguments of different types are cached separately; for example, f(3) and f(3.0) are treated as distinct calls.
LRU (Least Recently Used) is a cache eviction strategy that discards entries based on their access history. Its core idea is that data accessed recently is likely to be accessed again in the near future. The algorithm originated as a page replacement algorithm in operating system memory management, where it finds memory pages that have not been used for a long time and swaps them out to make room for new data. The principle is the same as in the simple example above.
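To make the eviction order concrete, here is a minimal LRU sketch built on collections.OrderedDict (the class name SimpleLRU and the capacity of 2 are just for illustration): move_to_end marks a key as most recently used, and popitem(last=False) evicts from the least recently used end.

from collections import OrderedDict


class SimpleLRU:
    """Minimal LRU sketch: an OrderedDict keeps keys in access order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        # Accessing a key moves it to the "most recently used" end
        self.data.move_to_end(key)
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            # Evict from the "least recently used" end
            self.data.popitem(last=False)
        self.data[key] = value


lru = SimpleLRU(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")           # "a" is now the most recently used
lru.put("c", 3)        # evicts "b", not "a"
print(list(lru.data))  # ['a', 'c']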
Functions decorated with lru_cache gain two methods, cache_clear and cache_info, which clear the cache and report cache statistics respectively.
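For example, with a stand-in function square used purely for illustration:

from functools import lru_cache


@lru_cache(maxsize=32)
def square(x):
    return x * x


square(2)
square(2)
print(square.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=32, currsize=1)

square.cache_clear()        # empty the cache and reset the statistics
print(square.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=32, currsize=0)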
The following shows lru_cache in action. When the function body actually executes, it prints a log line; when the cached result is used, it does not.
from functools import lru_cache


@lru_cache(None)
def add(x, y):
    print("calculating: %s + %s" % (x, y))
    return x + y


print(add(1, 2))
print(add(1, 2))
print(add(2, 3))
The output is as follows:
calculating: 1 + 2
3
3
calculating: 2 + 3
5
As the results show, the second call to add(1, 2) does not actually execute the function body; it simply returns the cached result.
Next, let's look at the maxsize and typed parameters. In the following code, maxsize=1 means that only one result is cached, and typed=True means that argument types are strictly distinguished, so add(2, 3) and add(2, 3.0) are cached as two separate results.
from functools import lru_cache


@lru_cache(maxsize=1, typed=True)
def add(x, y):
    print("calculating: %s + %s" % (x, y))
    return x + y


print(add(1, 2))
print(add(1, 2))
print(add(2, 3))
print(add(2, 3.0))
print(add(1, 2))
print(add(2, 3))
print(add(2, 3.0))
print(add(2, 3.0))
Output:
calculating: 1 + 2
3
3
calculating: 2 + 3
5
calculating: 2 + 3.0
5.0
calculating: 1 + 2
3
calculating: 2 + 3
5
calculating: 2 + 3.0
5.0
5.0
To view a function's current cache statistics, call its cache_info method, for example on the add function:
# View the function's cache statistics
cache_info = add.cache_info()
print(cache_info)
The output looks like the following: hits is the number of cache hits, misses is the number of cache misses, maxsize is the maximum number of results that may be stored, and currsize is the number of results currently stored.
CacheInfo(hits=3, misses=2, maxsize=1, currsize=1)
lru_cache itself has no notion of expiration. If you need to take expiration time and thread safety into account, you can use something like the following.
import collections
import threading
import time


class LRUCacheNotThreadSafe(object):
    """LRU cache with per-entry expiration (not thread safe)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = collections.OrderedDict()

    def get_and_clear_expired(self, key, current_timestamp):
        """Get a value, dropping it if it has expired."""
        try:
            (value, expire_time) = self.cache.pop(key)
            if expire_time > current_timestamp:
                # Keep (re-insert) the key only if it has not expired;
                # re-inserting also marks it as most recently used
                self.cache[key] = (value, expire_time)
                return (True, value)
            return (False, None)
        except KeyError:
            return (False, None)

    def set(self, key, value, expire_time):
        """Set a value together with its expiration timestamp."""
        try:
            self.cache.pop(key)
        except KeyError:
            if len(self.cache) >= self.capacity:
                # Evict the least recently used entry
                self.cache.popitem(last=False)
        self.cache[key] = (value, expire_time)


class LRUCacheThreadSafe(object):
    """Thread-safe LRU cache with per-entry expiration.

    Based on https://www.kunxi.org/2014/05/lru-cache-in-python/ ;
    expired values are cleaned up only on get.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = collections.OrderedDict()
        self._lock = threading.Lock()

    def get_and_clear_expired(self, key, current_timestamp):
        """Get a value, dropping it if it has expired."""
        with self._lock:
            try:
                (value, expire_time) = self.cache.pop(key)
                if expire_time > current_timestamp:
                    # Keep the key only if it has not expired
                    self.cache[key] = (value, expire_time)
                    return (True, value)
                return (False, None)
            except KeyError:
                return (False, None)

    def set(self, key, value, expire_time):
        """Set a value together with its expiration timestamp."""
        with self._lock:
            try:
                self.cache.pop(key)
            except KeyError:
                if len(self.cache) >= self.capacity:
                    self.cache.popitem(last=False)
            self.cache[key] = (value, expire_time)


# Cache example: each entry is valid for 10 seconds
lru_ins = LRUCacheThreadSafe(capacity=50)

# Write 1
lru_ins.set("key1", "value1", int(time.time()) + 10)
# Query 1
s, v = lru_ins.get_and_clear_expired("key1", int(time.time()))
print(s, v)

# Write 2
lru_ins.set("key1", "value2", int(time.time()) + 10)
# Query 2
s, v = lru_ins.get_and_clear_expired("key1", int(time.time()))
print(s, v)
Output:
True value1
True value2
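Finally, coming back to the idea mentioned at the beginning of wrapping functools.lru_cache itself to add an expiration time: below is a minimal sketch of one common approach. The helper name lru_cache_with_ttl is made up for illustration; the trick is to mix a hidden time-bucket argument into the cache key, so cached entries are recomputed once the ttl window rolls over.

import time
from functools import lru_cache


def lru_cache_with_ttl(maxsize=128, ttl=60):
    """Illustrative sketch: entries are recomputed after roughly ttl seconds."""
    def decorator(func):
        @lru_cache(maxsize=maxsize)
        def cached(time_bucket, *args, **kwargs):
            # time_bucket is only part of the cache key; func never sees it
            return func(*args, **kwargs)

        def wrapper(*args, **kwargs):
            # Calls within the same ttl window share a bucket and can hit the
            # cache; when the window rolls over, the key changes and the
            # value is recomputed
            return cached(int(time.time() // ttl), *args, **kwargs)

        wrapper.cache_clear = cached.cache_clear
        wrapper.cache_info = cached.cache_info
        return wrapper
    return decorator


@lru_cache_with_ttl(maxsize=128, ttl=10)
def now():
    return time.time()


print(now())  # computed
print(now())  # cached until the 10-second window rolls over

Note that expiry here is approximate: an entry cached near the end of a window is recomputed as soon as the window changes, so actual lifetimes vary between zero and ttl seconds.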