Tools | pg_recovery design principle and source code interpretation

Keywords: MySQL

Author: Zhang Lianzhuang, PostgreSQL R & D Engineer

He has worked on PostgreSQL kernel development for many years and has studied Citus in depth.

In the last issue, we introduced the PostgreSQL data retrieval tool pg_recovery.

This article explains the implementation principle and design ideas of the pg_recovery tool and then walks through its source code.

|Implementation principle of data retrieval

A database system normally reads data starting from a query such as select * from pg_recovery (that is, by executing a transaction). Executing the query also generates a snapshot for the transaction, and the snapshot returned by the GetActiveSnapshot() function determines which data is currently visible.

|Design ideas

1. How are dead tuples read?

PostgreSQL uses snapshots to decide which data in the database is currently visible. When a row is deleted, its data therefore still physically exists in the database instance. Such invisible data is generally called a dead tuple (a row in PostgreSQL is called a tuple).

PostgreSQL provides several special snapshot types, one of which is SnapshotAny. A scan that uses SnapshotAny returns every tuple regardless of visibility, and pg_recovery reads all of its data this way. By default, the function returns only the recovery (dead) data and leaves out the currently visible data.

2. How much data does the function return at a time?

The function returns data row by row, and each call returns exactly one row.
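The one-row-per-call behavior can be illustrated with a small stand-alone sketch in C. It only imitates the shape of the protocol (initialize on the first call, resume on later calls); ToyCallCtx and toy_next_row are hypothetical names and are not part of PostgreSQL or pg_recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State kept between calls, playing the role of the per-query function context. */
typedef struct {
    bool        initialized;  /* false until the first call has run */
    const int  *rows;         /* rows to hand out */
    size_t      nrows;        /* number of rows */
    size_t      next;         /* index of the next row to return */
} ToyCallCtx;

/*
 * Return one row per call. The first call initializes the scan state;
 * each later call resumes where the previous one stopped.
 * Returns true and stores the row in *out, or false once the set is exhausted.
 */
static bool toy_next_row(ToyCallCtx *ctx, const int *rows, size_t nrows, int *out)
{
    assert(ctx != NULL && out != NULL);
    if (!ctx->initialized) {
        ctx->rows = rows;
        ctx->nrows = nrows;
        ctx->next = 0;
        ctx->initialized = true;
    }
    if (ctx->next < ctx->nrows) {
        *out = ctx->rows[ctx->next++];
        return true;
    }
    return false;
}
```

A caller keeps invoking toy_next_row with the same context until it returns false, which is the role the executor plays for a set-returning function.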

3. How to control memory?

The function is executed many times, and some of its state is global, that is, shared across calls. That state is therefore allocated in multi_call_memory_ctx (the memory pool context), which keeps memory usage under control.
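The idea behind a memory context, namely that allocations are owned by a context that outlives individual calls and is released in one step, can be sketched as follows. This is a toy model under stated assumptions, not PostgreSQL's allocator: ToyMemCtx, toy_ctx_alloc, and toy_ctx_reset are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation; the union keeps user data aligned. */
typedef union ToyChunk {
    union ToyChunk *next;
    max_align_t     align;
} ToyChunk;

/* Toy memory context: every allocation is linked in so all can be freed together. */
typedef struct {
    ToyChunk *chunks;   /* singly linked list of live allocations */
    size_t    nchunks;  /* number of live allocations */
} ToyMemCtx;

/* Allocate size bytes owned by ctx; returns NULL if malloc fails. */
static void *toy_ctx_alloc(ToyMemCtx *ctx, size_t size)
{
    assert(ctx != NULL);
    ToyChunk *c = malloc(sizeof(ToyChunk) + size);
    if (c == NULL)
        return NULL;
    c->next = ctx->chunks;
    ctx->chunks = c;
    ctx->nchunks++;
    return c + 1;       /* user data starts right after the header */
}

/* Release everything allocated in ctx at once. */
static void toy_ctx_reset(ToyMemCtx *ctx)
{
    assert(ctx != NULL);
    while (ctx->chunks != NULL) {
        ToyChunk *next = ctx->chunks->next;
        free(ctx->chunks);
        ctx->chunks = next;
    }
    ctx->nchunks = 0;
}
```

In this model, state needed across calls would be allocated once from the long-lived context and freed by a single reset when the whole result set has been returned, so no per-row bookkeeping is needed.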

About the function parameters

When creating a function through SQL, execute the following statement. Please refer to the previous issue for function usage.

CREATE FUNCTION pg_recovery(regclass, recoveryrow bool DEFAULT true) RETURNS SETOF record

regclass: PostgreSQL's type for table references. It automatically converts a table name to its OID (the unique identifier of an object inside the database), so you only need to pass the table name.

recoveryrow bool DEFAULT true: the default value is true, which means that only recovery data is returned. Setting it to false returns all data.
To override the default, pass the parameter explicitly, as in the following statement.

select * from pg_recovery('aa', recoveryrow => false)

RETURNS SETOF record: the function returns a set of rows of the generic record type.

|Source code interpretation

Necessary data

typedef struct
{
    Relation            rel;            /* table being read */
    TupleDesc           reltupledesc;   /* tuple descriptor (metadata) of the table */
    TupleConversionMap  *map;           /* maps the table's columns to the columns the caller asked for */
    TableScanDesc       scan;           /* scan over the table */
    HTAB                *active_ctid;   /* ctids of the currently visible tuples */
    bool                droppedcolumn;  /* flag for dropped (deleted) columns */
} pg_recovery_ctx;

Hidden column

A hidden column named recoveryrow is added. When all data is returned, this column tells whether a row is recovery data or user-visible data.

static const struct system_columns_t {
    char       *attname;
    Oid         atttypid;
    int32       atttypmod;
    int         attnum;
} system_columns[] = { 
    { "ctid",     TIDOID,  -1, SelfItemPointerAttributeNumber },
    { "xmin",     XIDOID,  -1, MinTransactionIdAttributeNumber },
    { "cmin",     CIDOID,  -1, MinCommandIdAttributeNumber },
    { "xmax",     XIDOID,  -1, MaxTransactionIdAttributeNumber },
    { "cmax",     CIDOID,  -1, MaxCommandIdAttributeNumber },
    { "tableoid", OIDOID,  -1, TableOidAttributeNumber },
    { "recoveryrow",     BOOLOID, -1, DeadFakeAttributeNumber },
    { 0 },
};
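A table like this is typically searched by name until the zero sentinel is reached. The sketch below shows that lookup in a self-contained form. The attribute numbers are placeholder values chosen for this example (the real constants come from PostgreSQL headers), and ToySystemColumn, toy_system_columns, and toy_system_attnum are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder attribute numbers for this sketch only. */
enum {
    TOY_CTID = -1,
    TOY_XMIN = -2,
    TOY_RECOVERYROW = -100
};

typedef struct {
    const char *attname;
    int         attnum;
} ToySystemColumn;

static const ToySystemColumn toy_system_columns[] = {
    { "ctid",        TOY_CTID },
    { "xmin",        TOY_XMIN },
    { "recoveryrow", TOY_RECOVERYROW },
    { NULL,          0 },   /* sentinel that ends the table */
};

/* Return the attribute number for a hidden column name, or 0 if the name is not one. */
static int toy_system_attnum(const char *name)
{
    assert(name != NULL);
    for (const ToySystemColumn *c = toy_system_columns; c->attname != NULL; c++) {
        if (strcmp(c->attname, name) == 0)
            return c->attnum;
    }
    return 0;
}
```

Because hidden columns get negative numbers and ordinary columns are numbered from 1, a return value of 0 can safely mean "not a hidden column" in this sketch.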

Simplified pg_recovery code

    FuncCallContext     *funcctx;
    pg_recovery_ctx     *usr_ctx;
    recoveryrow = PG_GETARG_BOOL(1);    /* read the recoveryrow argument */

    /* The function is called once per returned row, so state is set up on the first call only */
    if (SRF_IS_FIRSTCALL())
    {
        funcctx = SRF_FIRSTCALL_INIT();     /* allocate the multi-call context */
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); /* allocate from the memory pool that survives across calls */

        usr_ctx = (pg_recovery_ctx *) palloc(sizeof(pg_recovery_ctx));  /* per-query state */
        usr_ctx->rel = heap_open(relid, AccessShareLock);               /* open the table with a read lock */
        usr_ctx->reltupledesc = RelationGetDescr(usr_ctx->rel);         /* get the table's tuple descriptor */
        funcctx->tuple_desc = BlessTupleDesc(tupdesc);                  /* tuple descriptor of the function result */
        usr_ctx->map = recovery_convert_tuples_by_name(usr_ctx->reltupledesc,
                funcctx->tuple_desc, "Error converting tuple descriptors!", &usr_ctx->droppedcolumn); /* column mapping */
        usr_ctx->scan = heap_beginscan(usr_ctx->rel, SnapshotAny, 0, NULL, NULL, 0);       /* scan all tuples in the table */
        active_scan = heap_beginscan(usr_ctx->rel, GetActiveSnapshot(), 0, NULL, NULL, 0); /* scan visible tuples only */
        /* (creation of the active_ctid hash table is omitted in this simplified listing) */
        while ((tuplein = heap_getnext(active_scan, ForwardScanDirection)) != NULL)
            hash_search(usr_ctx->active_ctid, (void *) &tuplein->t_self, HASH_ENTER, NULL); /* cache the ctids of visible tuples */

        funcctx->user_fctx = (void *) usr_ctx;  /* keep the state for later calls */
        MemoryContextSwitchTo(oldcontext);
    }

    funcctx = SRF_PERCALL_SETUP();      /* fetch the context saved by the first call */
    usr_ctx = (pg_recovery_ctx *) funcctx->user_fctx;

    if ((tuplein = heap_getnext(usr_ctx->scan, ForwardScanDirection)) != NULL)
    {
        /* A tuple whose ctid is not in the cache of visible ctids is dead */
        hash_search(usr_ctx->active_ctid, (void *) &tuplein->t_self, HASH_FIND, &alive);

        tuplein = recovery_do_convert_tuple(tuplein, usr_ctx->map, alive);  /* convert the table row to the output format */
        SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuplein));               /* convert to a Datum and return the row */
    }
    else
    {
        /* No more data to read */
        heap_endscan(usr_ctx->scan);                /* end the table scan */
        heap_close(usr_ctx->rel, AccessShareLock);  /* close the table and release the lock */
        SRF_RETURN_DONE(funcctx);                   /* free the function's resources */
    }
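The two-scan idea in the listing (cache the ctids of visible tuples, then treat every tuple whose ctid is missing from the cache as dead) can be tried out with a small stand-alone sketch. It uses a fixed-size hash set with linear probing in place of PostgreSQL's HTAB; ToyCtid, ToyCtidSet, and the toy_ functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy ctid: (block number, offset within the block). */
typedef struct {
    uint32_t block;
    uint16_t offset;
} ToyCtid;

#define TOY_SET_SLOTS 64    /* fixed capacity; a real hash table would grow */

typedef struct {
    ToyCtid slots[TOY_SET_SLOTS];
    bool    used[TOY_SET_SLOTS];
    int     count;
} ToyCtidSet;

static void toy_set_init(ToyCtidSet *set)
{
    assert(set != NULL);
    memset(set, 0, sizeof(*set));
}

static bool toy_same(ToyCtid a, ToyCtid b)
{
    return a.block == b.block && a.offset == b.offset;
}

static unsigned toy_hash(ToyCtid c)
{
    return (c.block * 31u + c.offset) % TOY_SET_SLOTS;
}

/* Insert a ctid. Returns false only when the set has no room left. */
static bool toy_set_add(ToyCtidSet *set, ToyCtid c)
{
    assert(set != NULL);
    unsigned i = toy_hash(c);
    while (set->used[i]) {
        if (toy_same(set->slots[i], c))
            return true;                    /* already present */
        i = (i + 1) % TOY_SET_SLOTS;
    }
    if (set->count >= TOY_SET_SLOTS - 1)
        return false;                       /* keep one slot empty so probing always stops */
    set->slots[i] = c;
    set->used[i] = true;
    set->count++;
    return true;
}

static bool toy_set_contains(const ToyCtidSet *set, ToyCtid c)
{
    assert(set != NULL);
    unsigned i = toy_hash(c);
    while (set->used[i]) {
        if (toy_same(set->slots[i], c))
            return true;
        i = (i + 1) % TOY_SET_SLOTS;
    }
    return false;
}

/* A tuple found by the "any" scan is recovery data when the visible scan did not cache it. */
static bool toy_is_recovery_row(const ToyCtidSet *visible, ToyCtid c)
{
    return !toy_set_contains(visible, c);
}
```

The first pass fills the set from the visible scan, and the second pass calls toy_is_recovery_row for each tuple of the full scan, which mirrors the HASH_ENTER and HASH_FIND calls in the listing.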

Generate mapping table

TupleConversionMap *
recovery_convert_tuples_by_name(TupleDesc indesc,
                       TupleDesc outdesc,
                       const char *msg, bool *droppedcolumn)
{
    /* (declarations and allocation of map are omitted in this simplified listing) */

    /* Build the column map; this handles recoveryrow, hidden columns, and visible columns */
    attrMap = recovery_convert_tuples_by_name_map(indesc, outdesc, msg, droppedcolumn);

    map->indesc = indesc;
    map->outdesc = outdesc;
    map->attrMap = attrMap;
    /* Work space for the output row */
    n = outdesc->natts;
    map->outvalues = (Datum *) palloc(n * sizeof(Datum));
    map->outisnull = (bool *) palloc(n * sizeof(bool));
    /* Work space for the input row; slot 0 is a permanent NULL entry, hence the +1 */
    n = indesc->natts + 1;
    map->invalues = (Datum *) palloc(n * sizeof(Datum));
    map->inisnull = (bool *) palloc(n * sizeof(bool));
    map->invalues[0] = (Datum) 0;
    map->inisnull[0] = true;

    return map;
}
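How a by-name column map can be built is sketched below in a self-contained form. This is an illustration under stated assumptions rather than the tool's actual mapping routine: toy_build_attr_map is a hypothetical function, columns are represented only by their names, and an output column with no match is simply mapped to slot 0 (the NULL placeholder), whereas real code may raise an error or treat such columns specially.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * For each output column name, store the 1-based position of the input
 * column with the same name, or 0 when no input column matches.
 * Position 0 is reserved so that unmatched columns read a NULL placeholder.
 */
static void toy_build_attr_map(const char *const *in_names, size_t n_in,
                               const char *const *out_names, size_t n_out,
                               int *attr_map)
{
    assert(in_names != NULL && out_names != NULL && attr_map != NULL);
    for (size_t i = 0; i < n_out; i++) {
        attr_map[i] = 0;
        for (size_t j = 0; j < n_in; j++) {
            if (strcmp(out_names[i], in_names[j]) == 0) {
                attr_map[i] = (int) j + 1;
                break;
            }
        }
    }
}
```

For input columns (id, name, age) and requested output columns (age, id, missing), the resulting map is {3, 1, 0}.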

Tuple conversion function

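HeapTuple
recovery_do_convert_tuple(HeapTuple tuple, TupleConversionMap *map, bool alive)
{
    /* Split the tuple into per-column values; slot 0 stays reserved for NULL */
    heap_deform_tuple(tuple, map->indesc, invalues + 1, inisnull + 1);

    /* (special handling of hidden columns and of alive is omitted in this simplified listing) */
    for (i = 0; i < outnatts; i++)
    {
        int j = attrMap[i];         /* input slot that feeds output column i */

        outvalues[i] = invalues[j]; /* copy the value */
        outisnull[i] = inisnull[j]; /* copy the NULL flag */
    }

    /* Assemble the output columns into a new tuple */
    return heap_form_tuple(map->outdesc, outvalues, outisnull);
}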
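The effect of the reserved slot 0 is easiest to see on plain arrays. In the following sketch, which assumes long values in place of Datum and uses the hypothetical name toy_convert_row, a map entry of 0 makes the corresponding output column NULL without any special case in the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * invalues/inisnull hold one more entry than there are input columns:
 * slot 0 is a permanent NULL and slots 1..n hold the input columns.
 * attr_map[i] selects the slot for output column i, so an entry of 0
 * produces a NULL output column.
 */
static void toy_convert_row(const long *invalues, const bool *inisnull,
                            const int *attr_map, size_t n_out,
                            long *outvalues, bool *outisnull)
{
    assert(invalues != NULL && inisnull != NULL && attr_map != NULL);
    assert(outvalues != NULL && outisnull != NULL);
    for (size_t i = 0; i < n_out; i++) {
        int j = attr_map[i];
        assert(j >= 0);             /* this sketch has no negative (system) entries */
        outvalues[i] = invalues[j];
        outisnull[i] = inisnull[j];
    }
}
```

Given input slots {NULL, 10, 20, 30} and the map {3, 1, 0}, the output row is (30, 10, NULL).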


Posted by rweston002 on Fri, 26 Nov 2021 02:28:59 -0800