Author: Zhang Lianzhuang, PostgreSQL R & D Engineer
He has worked on PostgreSQL database kernel development for many years and has studied Citus in depth.
In the previous issue, we introduced pg_recovery, a PostgreSQL data retrieval tool.
This article explains how pg_recovery is implemented and the design ideas behind it, and walks through its source code.
|Implementation principle of data retrieval
Normally, a database system reads data like this: when a query such as select * from pg_recovery starts (that is, when a transaction runs), the query also takes a snapshot for that transaction. The snapshot returned by the GetActiveSnapshot() function determines which data is currently visible.
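The visibility rule a snapshot applies can be illustrated with a heavily simplified model. The sketch below is not PostgreSQL's HeapTupleSatisfiesMVCC(): the names SimpleSnapshot, SimpleTuple and tuple_visible() are hypothetical, and aborted transactions, command IDs, subtransactions and hint bits are ignored. A tuple counts as visible when the transaction that inserted it (xmin) finished before the snapshot was taken and the transaction that deleted it (xmax), if any, did not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;   /* hypothetical stand-in for TransactionId; 0 = invalid */

/* Simplified snapshot: transactions below xmax are treated as finished
 * unless they were still running when the snapshot was taken. */
typedef struct {
    Xid        xmax;          /* first transaction id not yet started */
    const Xid *in_progress;   /* transactions running at snapshot time */
    size_t     n_in_progress;
} SimpleSnapshot;

typedef struct {
    Xid xmin;   /* inserting transaction */
    Xid xmax;   /* deleting transaction, 0 if never deleted */
} SimpleTuple;

/* Did xid finish before the snapshot was taken? (aborts are ignored) */
static bool xid_visible(const SimpleSnapshot *snap, Xid xid)
{
    if (xid == 0 || xid >= snap->xmax)
        return false;
    for (size_t i = 0; i < snap->n_in_progress; i++)
        if (snap->in_progress[i] == xid)
            return false;
    return true;
}

/* Visible: the insert is visible to the snapshot and the delete (if any) is not. */
static bool tuple_visible(const SimpleSnapshot *snap, const SimpleTuple *tup)
{
    assert(snap != NULL && tup != NULL);
    return xid_visible(snap, tup->xmin) && !xid_visible(snap, tup->xmax);
}
```

A tuple deleted by a committed transaction fails this check but is still stored, which is exactly the data pg_recovery is after.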
|Design ideas
1. How are dead tuples read?
PostgreSQL decides which data is visible through snapshots. As a result, deleting a row does not remove it physically: the tuple stays in the table (until VACUUM cleans it up) and simply becomes invisible to later snapshots. Such invisible data is generally called a dead tuple (in PostgreSQL, a row of data is called a tuple).
PostgreSQL provides special snapshots such as SnapshotAny (along with several other types). SnapshotAny sees every tuple regardless of visibility, and pg_recovery reads all data through it. By default, only the recovered (dead) data is returned; currently visible data is not.
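To make the difference concrete, here is a toy model in which SnapshotAny and the active snapshot are two predicates over the same stored rows. Row, snapshot_any(), snapshot_active(), scan_table() and scan_dead() are hypothetical names, and a single deleted flag stands in for the full visibility check. The rows returned by default are the ones the first predicate accepts and the second rejects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row: "deleted" stands in for "not visible to the active snapshot". */
typedef struct {
    int  id;
    bool deleted;
} Row;

typedef bool (*SnapshotCheck)(const Row *row);

/* Like SnapshotAny: every stored row qualifies. */
static bool snapshot_any(const Row *row)
{
    (void) row;
    return true;
}

/* Like the active MVCC snapshot, reduced to one flag. */
static bool snapshot_active(const Row *row)
{
    return !row->deleted;
}

/* Scan all stored rows, keeping the ids of those the snapshot accepts. */
static size_t scan_table(const Row *rows, size_t n, SnapshotCheck check, int *out_ids)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++)
        if (check(&rows[i]))
            out_ids[found++] = rows[i].id;
    return found;
}

/* Recoverable rows: seen through snapshot_any but not through snapshot_active. */
static size_t scan_dead(const Row *rows, size_t n, int *out_ids)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++)
        if (snapshot_any(&rows[i]) && !snapshot_active(&rows[i]))
            out_ids[found++] = rows[i].id;
    return found;
}
```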
2. How much data does the function return at a time?
Data is returned row by row: each call of the function returns exactly one row.
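Returning one row per call means the function has to remember where it stopped. The hypothetical next_row() below sketches that value-per-call pattern with a plain struct; in PostgreSQL the state lives in FuncCallContext, and SRF_IS_FIRSTCALL(), SRF_RETURN_NEXT and SRF_RETURN_DONE play the roles marked in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-query state kept between calls (the role of FuncCallContext). */
typedef struct {
    bool       initialized;
    const int *rows;
    size_t     nrows;
    size_t     pos;
} CallState;

/* Each call yields at most one row: true plus *out, or false when exhausted. */
static bool next_row(CallState *st, const int *table, size_t n, int *out)
{
    assert(st != NULL && out != NULL);
    if (!st->initialized) {          /* first call: like SRF_IS_FIRSTCALL() */
        st->rows = table;
        st->nrows = n;
        st->pos = 0;
        st->initialized = true;
    }
    if (st->pos < st->nrows) {       /* like SRF_RETURN_NEXT */
        *out = st->rows[st->pos++];
        return true;
    }
    return false;                    /* like SRF_RETURN_DONE */
}
```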
3. How to control memory?
The function is called multiple times (once per returned row), and some state must persist across those calls. That state is therefore allocated in multi_call_memory_ctx, the memory context that lives across calls, which keeps memory usage under control.
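The idea behind multi_call_memory_ctx is that state allocated on the first call must outlive each individual call and is released in one step when the function finishes. The bump allocator below is a hypothetical stand-in (Arena, arena_alloc() and arena_destroy() are not PostgreSQL APIs) that shows the same lifetime: many small allocations, no per-object free, one release at the end.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *buf;
    size_t         cap;
    size_t         used;
} Arena;

static bool arena_init(Arena *a, size_t cap)
{
    a->buf = malloc(cap);
    a->cap = a->buf ? cap : 0;
    a->used = 0;
    return a->buf != NULL;
}

/* Bump allocation aligned for any type; nothing is freed individually. */
static void *arena_alloc(Arena *a, size_t size)
{
    const size_t align = alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;

    if (start > a->cap || size > a->cap - start)
        return NULL;                 /* out of space */
    a->used = start + size;
    return a->buf + start;
}

/* Release everything at once, like deleting the memory context. */
static void arena_destroy(Arena *a)
{
    free(a->buf);
    a->buf = NULL;
    a->cap = a->used = 0;
}
```

In the real function, the equivalent step is MemoryContextSwitchTo(funcctx->multi_call_memory_ctx) on the first call; PostgreSQL deletes that context when the set-returning function completes, so per-call code does not free it explicitly.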
About the function parameters
The function is created in SQL with the following statement. See the previous issue for how to use it.
CREATE FUNCTION pg_recovery(regclass, recoveryrow bool DEFAULT true) RETURNS SETOF record
regclass: a PostgreSQL type that identifies a table. The table name you pass is automatically converted to the table's OID (the object identifier PostgreSQL uses internally), so you only need to enter the table name.
recoveryrow bool DEFAULT true: when true (the default), only recovered data is returned; when false, all data is returned. To override the default, execute a statement like the following:
select * from pg_recovery('aa', recoveryrow => false)
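RETURNS SETOF record: the function returns a set of rows of the generic record type, so the caller supplies the output column names and types in a column definition list (AS t(...)) when querying it.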
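How the recoveryrow argument and the hidden recoveryrow column interact can be summarized as a small decision function. decide_row() and RowDecision are hypothetical, and the sketch assumes that the hidden column is true for recovered (dead) rows and false for visible rows. The description of the hidden column suggests this but does not state it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome for one scanned tuple. */
typedef struct {
    bool emit;         /* return this tuple to the caller? */
    bool recoveryrow;  /* hidden column value (assumed: true = recovered data) */
} RowDecision;

/* alive: tuple is visible to the active snapshot;
 * only_recovered: value of the recoveryrow argument. */
static RowDecision decide_row(bool alive, bool only_recovered)
{
    RowDecision d;

    d.emit = !(only_recovered && alive);  /* default (true): skip visible tuples */
    d.recoveryrow = !alive;
    return d;
}
```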
|Source code interpretation
Key data structure
typedef struct
{
    Relation            rel;            /* table being operated on */
    TupleDesc           reltupledesc;   /* table metadata (column descriptors) */
    TupleConversionMap *map;            /* maps table columns to the columns the function returns */
    TableScanDesc       scan;           /* scan over the table */
    HTAB               *active_ctid;    /* ctids of currently visible tuples */
    bool                droppedcolumn;  /* dropped-column flag, set while building the column map */
} pg_recovery_ctx;
Hidden columns
pg_recovery adds a hidden column, recoveryrow. When all data is returned (recoveryrow => false), this column tells you whether a row is recovered data or data visible to users.
static const struct system_columns_t
{
    char   *attname;
    Oid     atttypid;
    int32   atttypmod;
    int     attnum;
} system_columns[] = {
    {"ctid",        TIDOID,  -1, SelfItemPointerAttributeNumber},
    {"xmin",        XIDOID,  -1, MinTransactionIdAttributeNumber},
    {"cmin",        CIDOID,  -1, MinCommandIdAttributeNumber},
    {"xmax",        XIDOID,  -1, MaxTransactionIdAttributeNumber},
    {"cmax",        CIDOID,  -1, MaxCommandIdAttributeNumber},
    {"tableoid",    OIDOID,  -1, TableOidAttributeNumber},
    {"recoveryrow", BOOLOID, -1, DeadFakeAttributeNumber},  /* hidden column added by pg_recovery */
    {0},                                                   /* end of list */
};
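The system_columns array ends with a { 0 } entry, so code can walk it until attname is NULL without knowing its length. Presumably this is how a column named in the caller's column definition list (for example ctid or recoveryrow) is recognized as a system column rather than a table column. The sketch below shows only that lookup pattern. Type names stand in for the OIDs, and SystemColumn and find_system_column() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *attname;
    const char *typname;   /* stand-in for the type OID (TIDOID, XIDOID, ...) */
} SystemColumn;

static const SystemColumn system_columns[] = {
    {"ctid", "tid"},
    {"xmin", "xid"},
    {"cmin", "cid"},
    {"xmax", "xid"},
    {"cmax", "cid"},
    {"tableoid", "oid"},
    {"recoveryrow", "bool"},
    {0},                    /* sentinel: attname == NULL ends the list */
};

/* Return the matching entry, or NULL if name is not a system column. */
static const SystemColumn *find_system_column(const char *name)
{
    for (const SystemColumn *c = system_columns; c->attname != NULL; c++)
        if (strcmp(c->attname, name) == 0)
            return c;
    return NULL;
}
```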
Simplified pg_recovery code
Datum
pg_recovery(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;
    pg_recovery_ctx *usr_ctx;
    bool        recoveryrow = PG_GETARG_BOOL(1);   /* second argument, true by default */
    HeapTuple   tuplein;
    bool        alive;

    if (SRF_IS_FIRSTCALL())   /* called once per returned row, so initialize state on the first call */
    {
        Oid           relid = PG_GETARG_OID(0);   /* regclass argument */
        MemoryContext oldcontext;
        TupleDesc     tupdesc;
        TableScanDesc active_scan;

        funcctx = SRF_FIRSTCALL_INIT();   /* create the cross-call context */
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);   /* state that survives across calls */
        usr_ctx = (pg_recovery_ctx *) palloc0(sizeof(pg_recovery_ctx));

        usr_ctx->rel = heap_open(relid, AccessShareLock);          /* open the table with a read lock */
        usr_ctx->reltupledesc = RelationGetDescr(usr_ctx->rel);   /* table metadata */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)   /* caller's column list */
            elog(ERROR, "return type must be a row type");
        funcctx->tuple_desc = BlessTupleDesc(tupdesc);             /* metadata of the returned rows */
        usr_ctx->map = recovery_convert_tuples_by_name(usr_ctx->reltupledesc, funcctx->tuple_desc,
                                                       "Error converting tuple descriptors!",
                                                       &usr_ctx->droppedcolumn);   /* column mapping */

        usr_ctx->scan = heap_beginscan(usr_ctx->rel, SnapshotAny, 0, NULL, NULL, 0);   /* scan all table data */

        /* usr_ctx->active_ctid is created here with hash_create() (omitted) */
        active_scan = heap_beginscan(usr_ctx->rel, GetActiveSnapshot(), 0, NULL, NULL, 0);   /* visible data only */
        while ((tuplein = heap_getnext(active_scan, ForwardScanDirection)) != NULL)
            hash_search(usr_ctx->active_ctid, (void *) &tuplein->t_self, HASH_ENTER, NULL);   /* cache visible ctids */
        heap_endscan(active_scan);

        funcctx->user_fctx = usr_ctx;
        MemoryContextSwitchTo(oldcontext);
    }

    funcctx = SRF_PERCALL_SETUP();   /* restore the context saved by earlier calls */
    usr_ctx = (pg_recovery_ctx *) funcctx->user_fctx;

get_tuple:
    if ((tuplein = heap_getnext(usr_ctx->scan, ForwardScanDirection)) != NULL)
    {
        /* a tuple whose ctid is cached is visible (alive); otherwise it is dead */
        hash_search(usr_ctx->active_ctid, (void *) &tuplein->t_self, HASH_FIND, &alive);
        if (recoveryrow && alive)
            goto get_tuple;   /* only recovered rows requested: skip visible ones */
        tuplein = recovery_do_convert_tuple(tuplein, usr_ctx->map, alive);   /* table row -> output format */
        SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuplein));   /* convert to Datum and return one row */
    }
    else
    {
        /* finished reading data */
        heap_endscan(usr_ctx->scan);                 /* end the table scan */
        heap_close(usr_ctx->rel, AccessShareLock);   /* release the lock */
        SRF_RETURN_DONE(funcctx);                    /* free the function's cross-call resources */
    }
}
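The core of the function is a two-pass comparison: first cache the ctids visible through the active snapshot, then walk every tuple (as the SnapshotAny scan does) and treat a tuple as alive when its ctid is cached. The self-contained sketch below replaces HTAB with a tiny open-addressing hash set. Ctid, CtidSet and recover_rows() are hypothetical names, and skipping visible tuples when only recovered rows are requested is an assumption about how the recoveryrow argument is applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ItemPointerData: (block number, line pointer offset). */
typedef struct {
    uint32_t block;
    uint16_t offset;
} Ctid;

#define SET_CAP 64  /* power of two; holds at most SET_CAP - 1 ctids */

/* Tiny open-addressing hash set standing in for the active_ctid HTAB. */
typedef struct {
    Ctid   keys[SET_CAP];
    bool   used[SET_CAP];
    size_t count;
} CtidSet;

static size_t ctid_hash(Ctid c)
{
    return ((size_t) c.block * 31u + c.offset) & (SET_CAP - 1);
}

static bool ctid_equal(Ctid a, Ctid b)
{
    return a.block == b.block && a.offset == b.offset;
}

static void set_add(CtidSet *s, Ctid c)
{
    size_t i = ctid_hash(c);
    while (s->used[i]) {
        if (ctid_equal(s->keys[i], c))
            return;                      /* already present */
        i = (i + 1) & (SET_CAP - 1);     /* linear probing */
    }
    assert(s->count < SET_CAP - 1);      /* keep a free slot so lookups terminate */
    s->keys[i] = c;
    s->used[i] = true;
    s->count++;
}

static bool set_contains(const CtidSet *s, Ctid c)
{
    size_t i = ctid_hash(c);
    while (s->used[i]) {
        if (ctid_equal(s->keys[i], c))
            return true;
        i = (i + 1) & (SET_CAP - 1);
    }
    return false;
}

/* Pass 1 caches visible ctids; pass 2 walks every tuple and checks the cache.
 * Returns the number of tuples written to out. */
static size_t recover_rows(const Ctid *visible, size_t nvisible,
                           const Ctid *all, size_t nall,
                           bool only_recovered, Ctid *out)
{
    CtidSet set;
    size_t  n = 0;

    memset(&set, 0, sizeof set);
    for (size_t i = 0; i < nvisible; i++)
        set_add(&set, visible[i]);

    for (size_t i = 0; i < nall; i++) {
        bool alive = set_contains(&set, all[i]);
        if (only_recovered && alive)
            continue;                    /* skip visible tuples by default */
        out[n++] = all[i];
    }
    return n;
}
```

A hash set keeps each lookup in the second pass at roughly constant time, so the extra cost is one additional scan of the visible data plus memory for its ctids.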
Generate mapping table
TupleConversionMap *
recovery_convert_tuples_by_name(TupleDesc indesc, TupleDesc outdesc, const char *msg, bool *droppedcolumn)
{
    TupleConversionMap *map;
    AttrNumber *attrMap;
    int         n = outdesc->natts;

    /* map output columns: recoveryrow, hidden (system) columns and visible table columns */
    attrMap = recovery_convert_tuples_by_name_map(indesc, outdesc, msg, droppedcolumn);

    map = (TupleConversionMap *) palloc(sizeof(TupleConversionMap));
    map->indesc = indesc;
    map->outdesc = outdesc;
    map->attrMap = attrMap;
    /* preallocate work arrays so each tuple conversion does not allocate */
    map->outvalues = (Datum *) palloc(n * sizeof(Datum));
    map->outisnull = (bool *) palloc(n * sizeof(bool));
    n = indesc->natts + 1;   /* +1: slot 0 is the NULL entry */
    map->invalues = (Datum *) palloc(n * sizeof(Datum));
    map->inisnull = (bool *) palloc(n * sizeof(bool));
    map->invalues[0] = (Datum) 0;
    map->inisnull[0] = true;
    return map;
}
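recovery_convert_tuples_by_name() follows the same layout as PostgreSQL's own convert_tuples_by_name(): attrMap stores, for each output column, the 1-based position of the matching table column, and slot 0 of invalues/inisnull is a permanent NULL entry. The hypothetical build_attr_map() below sketches only the by-name matching for ordinary columns. It ignores types, dropped columns and system columns, and treating an unmatched output column as an error is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* For each output column, store the 1-based position of the same-named input
 * column (slot 0 of the input value array is reserved as the NULL entry).
 * Returns false and sets *bad_col if an output column has no match. */
static bool build_attr_map(const char *const *in_names, int in_natts,
                           const char *const *out_names, int out_natts,
                           int *attr_map, int *bad_col)
{
    for (int i = 0; i < out_natts; i++) {
        attr_map[i] = 0;
        for (int j = 0; j < in_natts; j++) {
            if (strcmp(out_names[i], in_names[j]) == 0) {
                attr_map[i] = j + 1;   /* 1-based, skipping the NULL slot */
                break;
            }
        }
        if (attr_map[i] == 0) {
            *bad_col = i;              /* no table column with this name */
            return false;
        }
    }
    return true;
}
```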
Tuple conversion function
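HeapTuple
recovery_do_convert_tuple(HeapTuple tuple, TupleConversionMap *map, bool alive)
{
    AttrNumber *attrMap = map->attrMap;
    Datum      *invalues = map->invalues;
    bool       *inisnull = map->inisnull;
    Datum      *outvalues = map->outvalues;
    bool       *outisnull = map->outisnull;
    int         outnatts = map->outdesc->natts;
    int         i;

    /* split the tuple into column values; slot 0 stays the NULL entry */
    heap_deform_tuple(tuple, map->indesc, invalues + 1, inisnull + 1);

    for (i = 0; i < outnatts; i++)
    {
        int     j = attrMap[i];   /* source column of output column i */

        outvalues[i] = invalues[j];   /* copy column data */
        outisnull[i] = inisnull[j];
    }
    /* the full version also fills system columns and recoveryrow (using alive); omitted here */
    return heap_form_tuple(map->outdesc, outvalues, outisnull);   /* build the output tuple */
}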
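Because heap_deform_tuple() writes into invalues + 1, input column k ends up in slot k, which lines up with the 1-based attrMap and leaves slot 0 as the NULL entry. The hypothetical convert_row() below replays that indexing with plain int values. It does not model the system columns or the recoveryrow column, which recovery_do_convert_tuple() presumably fills from the tuple header and alive.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ATTS 16

/* Map one input row to an output row through attr_map
 * (1-based positions; 0 selects the NULL slot). */
static void convert_row(const int *in_values, const bool *in_isnull, int in_natts,
                        const int *attr_map, int out_natts,
                        int *out_values, bool *out_isnull)
{
    int  invalues[MAX_ATTS + 1];
    bool inisnull[MAX_ATTS + 1];

    assert(in_natts <= MAX_ATTS);
    invalues[0] = 0;                  /* slot 0: the NULL entry */
    inisnull[0] = true;
    for (int k = 0; k < in_natts; k++) {   /* like deforming into invalues + 1 */
        invalues[k + 1] = in_values[k];
        inisnull[k + 1] = in_isnull[k];
    }
    for (int i = 0; i < out_natts; i++) {
        int j = attr_map[i];
        assert(j >= 0 && j <= in_natts);
        out_values[i] = invalues[j];
        out_isnull[i] = inisnull[j];
    }
}
```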