The name of the database is WawaDB, which is implemented in python. It can be seen that python is very powerful!
brief introduction
The requirements for logging are generally as follows:
Only append, not modify, write in chronological order;
Write a lot, read a little, query, generally query the data of a time period;
The fixed collection of MongoDB well meets this requirement, but MongoDB occupies a large memory, which is a bit like a fire through a mosquito and making a mountain out of a molehill.
The idea of WawaDB is to record the current time and the offset of the log file in an index file for every 1000 logs written.
Then, when querying the log by time, first load the index into memory, find out the offset of the time point by dichotomy, and then open the log file seek to the specified location, so that the data required by the user can be located and read quickly without traversing the whole log file.
performance
Core 2 P8400,2.26GHZ,2G memory, 32 bit win7
Write test:
Simulate writing 10000 pieces of data in 1 minute, write data for 5 hours in total, insert 3 million pieces of data, each data has 54 characters, and it takes 2 minutes and 51 seconds
Read test: read the log containing a substring within the specified time period
Data range traversal data volume results time (seconds)
5 hours 3 million 604 6.6
2 hours 1.2 million 2.25 2.7
600000 96 1.3 per hour
30 minutes 300000 44 0.6
Indexes
Only the time of log records is indexed. The introduction roughly describes the implementation of the index. Binary search is certainly not as efficient as B Tree, but generally it is not an order of magnitude worse, and the implementation is particularly simple.
Because it is a sparse index, not every log has an index to record its offset, so when reading data, you should read more data forward to prevent missing reading, and then really return data to the user when you read the really needed data.
As shown in the following figure, for example, if the user wants to read logs from 25 to 43, find 25 by dichotomy, and find the point where 30 is located,
Index: 0 ten twenty thirty forty 50 log: |... |... |... |... > > A = [0, 10, 20, 30, 40, 50] > > > bisect.bisect_ left(a, 35)>>>3>>>a[3]>>>30>>>bisect.bisect_ left(a, 43)>>>5>>>a[5]>>>50
Therefore, we need to move forward and read the log from 20 (the previous scale of 30). After reading 21, 22, 23 and 24, they are discarded because they are smaller than 25. After reading 25, 26, 27,..., they are returned to the user
After reading 40 (the previous scale of 50), it is necessary to judge whether the current data is greater than 43. If it is greater than 43 (return to the data in the fully open interval), it is necessary to stop reading.
On the whole, we only operate a small part of the large file and get the data users want.
buffer
In order to reduce a large number of disk writes when writing logs, set the buffer to 10k when append ing logs. The system default should be 4k.
Similarly, in order to improve the efficiency of reading logs, the read buffer is also set to 10k, which also needs to be adjusted according to the size of your logs.
The read / write of the index is set to the row buffer, and every full row should be flush ed to the disk to prevent incomplete index rows from being read (in fact, practice has proved that if the row buffer is set, half pull rows can still be read).
query
What? To support SQL, stop it. How can 100 lines of code support SQL.
Now the query is to directly pass in a lambada expression. When the system traverses the data rows within the specified time range, it will be returned to the user only if the user's lambada conditions are met.
Of course, this will read more data that users don't need, and lambda expressions are calculated for each line, but there's no way. Simplicity is beauty.
In the past, I recorded the conditions to be queried, the log time and the log file offset in the index, so as to find the qualified offset from the index, and then each data is like seek and read in the log file. There is only one advantage, that is, the amount of data read is less, but there are two disadvantages:
The index file is too large to be loaded into memory
Every time you read, you need to seek first. It seems that the buffer can't be used. It's especially slow. It's four or five times slower than reading a segment of data continuously and filtering with lambda
write in
As mentioned earlier, it only append s without modifying the data, and the time stamp is at the front of each log line.
Multithreading
Query data can be queried by multiple threads at the same time. Each query will open a new log file descriptor, so multiple reads in parallel will not fight.
When writing, although it is only an append operation, it does not confirm whether it is safe for multiple threads to append files. Therefore, it is recommended to write with a queue and a dedicated thread.
lock
There are no locks.
sort
By default, the queried data is arranged in positive chronological order. If other sorting is required, it can be sorted by python's sorted function after getting to memory. You can arrange as you want.
More than 100 lines of database code
# -- coding:utf-8 -- import os import time import bisect import itertools from datetime import datetime import logging default_data_dir = './data/' default_write_buffer_size = 102410 default_read_buffer_size = 102410 default_index_interval = 1000 def ensure_data_dir(): if not os.path.exists(default_data_dir): os.makedirs(default_data_dir) def init(): ensure_data_dir() class WawaIndex: def init(self, index_name): self.fp_index = open(os.path.join(default_data_dir, index_name + '.index'), 'a+', 1) self.indexes, self.offsets, self.index_count = [], [], 0 self.__load_index() def __update_index(self, key, offset): self.indexes.append(key) self.offsets.append(offset) def __load_index(self): self.fp_index.seek(0) for line in self.fp_index: try: key, offset = line.split() self.__update_index(key, offset) except ValueError: # If the index does not have flush, it may read half a row of data pass def append_index(self, key, offset): self.index_count += 1 if self.index_count % default_index_interval == 0: self.__update_index(key, offset) self.fp_index.write('%s %s %s' % (key, offset, os.linesep)) def get_offsets(self, begin_key, end_key): left = bisect.bisect_left(self.indexes, str(begin_key)) right = bisect.bisect_left(self.indexes, str(end_key)) left, right = left - 1, right - 1 if left < 0: left = 0 if right < 0: right = 0 if right > len(self.indexes) - 1: right = len(self.indexes) - 1 logging.debug('get_index_range:%s %s %s %s %s %s', self.indexes[0], self.indexes[-1], begin_key, end_key, left, right) return self.offsets[left], self.offsets[right] class WawaDB: def init(self, db_name): self.db_name = db_name self.fp_data_for_append = open(os.path.join(default_data_dir, db_name + '.db'), 'a', default_write_buffer_size) self.index = WawaIndex(db_name) def __get_data_by_offsets(self, begin_key, end_key, begin_offset, end_offset): fp_data = open(os.path.join(default_data_dir, self.db_name + '.db'), 'r', default_read_buffer_size) fp_data.seek(int(begin_offset)) line = fp_data.readline() find_real_begin_offset = False will_read_len, read_len = int(end_offset) - int(begin_offset), 0 while line: read_len += len(line) if (not find_real_begin_offset) and (line < str(begin_key)): line = fp_data.readline() continue find_real_begin_offset = True if (read_len >= will_read_len) and (line > str(end_key)): break yield line.rstrip('\r\n') line = fp_data.readline() def append_data(self, data, record_time=datetime.now()): def check_args(): if not data: raise ValueError('data is null') if not isinstance(data, basestring): raise ValueError('data is not string') if data.find('\r') != -1 or data.find('\n') != -1: raise ValueError('data contains linesep') check_args() record_time = time.mktime(record_time.timetuple()) data = '%s %s %s' % (record_time, data, os.linesep) offset = self.fp_data_for_append.tell() self.fp_data_for_append.write(data) self.index.append_index(record_time, offset) def get_data(self, begin_time, end_time, data_filter=None): def check_args(): if not (isinstance(begin_time, datetime) and isinstance(end_time, datetime)): raise ValueError('begin_time or end_time is not datetime') check_args() begin_time, end_time = time.mktime(begin_time.timetuple()), time.mktime(end_time.timetuple()) begin_offset, end_offset = self.index.get_offsets(begin_time, end_time) for data in self.__get_data_by_offsets(begin_time, end_time, begin_offset, end_offset): if data_filter: if data_filter(data): yield data else: yield data def test(): from datetime import datetime, timedelta import uuid, random logging.getLogger().setLevel(logging.NOTSET) def time_test(test_name): def inner(f): def inner2(*args, **kargs): start_time = datetime.now() result = f(*args, **kargs) print '%s take time:%s' % (test_name, (datetime.now() - start_time)) return result return inner2 return inner @time_test('gen_test_data') def gen_test_data(db): now = datetime.now() begin_time = now - timedelta(hours=5) while begin_time < now: print begin_time for i in range(10000): db.append_data(str(random.randint(1,10000))+ ' ' +str(uuid.uuid1()), begin_time) begin_time += timedelta(minutes=1) @time_test('test_get_data') def test_get_data(db): begin_time = datetime.now() - timedelta(hours=3) end_time = begin_time + timedelta(minutes=120) results = list(db.get_data(begin_time, end_time, lambda x: x.find('1024') != -1)) print 'test_get_data get %s results' % len(results) @time_test('get_db') def get_db(): return WawaDB('test') if not os.path.exists('./data/test.db'): db = get_db() gen_test_data(db) #db.index.fp_index.flush() db = get_db() test_get_data(db) init() if name == 'main': test()</pre>