[original] Linux memory management - zoned page frame allocator - 4

Keywords: Linux Fragment Mobile

Background

  • Read the source code! -- by Lu Xun
  • A picture is worth a thousand words. -- by Gorky

Notes:

  1. Kernel version: 4.14
  2. ARM64 processor, Cortex-A53, dual core
  3. Tools used: Source Insight 3.5, Visio

1. Overview

This article describes memory compaction, the kernel's technique for defragmenting memory.
Memory fragmentation comes in two forms, internal and external:

  • Internal fragmentation: wasted space inside a memory page;
  • External fragmentation: free space scattered between memory pages, which can cause allocations of physically contiguous pages to fail.

Memory compaction obtains contiguous free pages by migrating movable in-use pages somewhere else. To deal with fragmentation, the kernel defines migrate types that describe whether a page can be moved:

  • MIGRATE_UNMOVABLE: unmovable, corresponding to pages allocated by the kernel;
  • MIGRATE_MOVABLE: movable, corresponding to memory or file pages allocated from user space;
  • MIGRATE_RECLAIMABLE: cannot be moved, but can be reclaimed.

First, let's look at a diagram of memory compaction.

The figure above shows the operation in terms of struct page; the corresponding view of physical memory is as follows:

The previous article mentioned the pageblock. As the figure shows, the zone is scanned from both ends in units of pageblocks. The pageblock size is defined as follows (when huge pages are not used), which matches the largest block size managed by the buddy system:

/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
#define pageblock_order     (MAX_ORDER-1)

#define pageblock_nr_pages  (1UL << pageblock_order)
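
To make the numbers concrete, here is a small sketch of the same arithmetic in user space. The values MAX_ORDER = 11 and 4 KiB pages are assumptions for illustration (both depend on the kernel configuration), and the toy_ names are made up for this example.

```c
/* Assumed values for illustration; the real ones depend on the kernel config. */
#define TOY_MAX_ORDER   11   /* assumption: MAX_ORDER of 11 */
#define TOY_PAGE_SHIFT  12   /* assumption: 4 KiB pages */

/* Same formula as the pageblock_order macro above. */
unsigned long toy_pageblock_order(void)
{
    return TOY_MAX_ORDER - 1;
}

/* Same formula as the pageblock_nr_pages macro above. */
unsigned long toy_pageblock_nr_pages(void)
{
    return 1UL << toy_pageblock_order();
}

/* Size of one pageblock in bytes under the assumptions above. */
unsigned long toy_pageblock_bytes(void)
{
    return toy_pageblock_nr_pages() << TOY_PAGE_SHIFT;
}
```

Under these assumptions a pageblock covers 1024 pages, that is 4 MiB.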

OK, that gives a first impression. Let's analyze it further.

1. Data structure

1.1 compact_priority

/*
 * Determines how hard direct compaction should try to succeed.
 * Lower value means higher priority, analogically to reclaim priority.
 */
enum compact_priority {
    COMPACT_PRIO_SYNC_FULL,
    MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
    COMPACT_PRIO_SYNC_LIGHT,
    MIN_COMPACT_COSTLY_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
    DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
    COMPACT_PRIO_ASYNC,
    INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
};

This enum describes the priority levels of memory compaction:

  • COMPACT_PRIO_SYNC_FULL / MIN_COMPACT_PRIORITY: highest priority; both compaction and migration are performed synchronously;
  • COMPACT_PRIO_SYNC_LIGHT / MIN_COMPACT_COSTLY_PRIORITY / DEF_COMPACT_PRIORITY: medium priority; compaction is performed synchronously and migration asynchronously;
  • COMPACT_PRIO_ASYNC / INIT_COMPACT_PRIORITY: lowest priority; both compaction and migration are performed asynchronously.
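
Since a lower value means a higher priority, "trying harder" amounts to decrementing the priority until a floor is reached. The following is a minimal sketch of that idea only. The enum is a local copy with shortened names, and escalate_priority is a hypothetical helper; the kernel's actual retry logic in the page allocator checks more conditions than this.

```c
/* Local copy of the three levels; names shortened for the sketch. */
enum compact_priority_sketch {
    PRIO_SYNC_FULL,    /* highest priority, lowest value */
    PRIO_SYNC_LIGHT,
    PRIO_ASYNC         /* lowest priority, the initial value */
};

/*
 * Raise the priority by one step, but never beyond 'floor'.
 * Returns the priority to use for the next attempt.
 */
enum compact_priority_sketch
escalate_priority(enum compact_priority_sketch prio,
                  enum compact_priority_sketch floor)
{
    if (prio > floor)
        return (enum compact_priority_sketch)(prio - 1);
    return prio;    /* already at the floor, nothing left to raise */
}
```

The names MIN_COMPACT_PRIORITY and MIN_COMPACT_COSTLY_PRIORITY in the enum suggest two such floors: costly orders stop at SYNC_LIGHT, other orders may go all the way to SYNC_FULL.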

1.2 compact_result

This enum describes the return values of the compaction functions:

/* Return values for compact_zone() and try_to_compact_pages() */
/* When adding new states, please adjust include/trace/events/compaction.h */
enum compact_result {
    /* For more detailed tracepoint output - internal to compaction */
    COMPACT_NOT_SUITABLE_ZONE,
    /*
     * compaction didn't start as it was not possible or direct reclaim
     * was more suitable
     */
    COMPACT_SKIPPED,
    /* compaction didn't start as it was deferred due to past failures */
    COMPACT_DEFERRED,

    /* compaction not active last round */
    COMPACT_INACTIVE = COMPACT_DEFERRED,

    /* For more detailed tracepoint output - internal to compaction */
    COMPACT_NO_SUITABLE_PAGE,
    /* compaction should continue to another pageblock */
    COMPACT_CONTINUE,

    /*
     * The full zone was compacted scanned but wasn't successfull to compact
     * suitable pages.
     */
    COMPACT_COMPLETE,
    /*
     * direct compaction has scanned part of the zone but wasn't successfull
     * to compact suitable pages.
     */
    COMPACT_PARTIAL_SKIPPED,

    /* compaction terminated prematurely due to lock contentions */
    COMPACT_CONTENDED,

    /*
     * direct compaction terminated after concluding that the allocation
     * should now succeed
     */
    COMPACT_SUCCESS,
};
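
One way to read these values is to group them by what the caller can conclude. The sketch below does that with a local copy of the enum (renamed with a TOY_ prefix). The grouping is my reading of the comments above, so treat the two helper functions as illustrative rather than as kernel code.

```c
#include <stdbool.h>

/* Local copy of the enum above, renamed so it cannot clash with kernel headers. */
enum toy_compact_result {
    TOY_NOT_SUITABLE_ZONE,
    TOY_SKIPPED,
    TOY_DEFERRED,
    TOY_INACTIVE = TOY_DEFERRED,   /* alias, same numeric value */
    TOY_NO_SUITABLE_PAGE,
    TOY_CONTINUE,
    TOY_COMPLETE,
    TOY_PARTIAL_SKIPPED,
    TOY_CONTENDED,
    TOY_SUCCESS
};

/* The whole zone was scanned and nothing usable came out of it. */
bool toy_compaction_failed(enum toy_compact_result r)
{
    return r == TOY_COMPLETE;
}

/* Compaction did not run or backed off early, so trying harder may still help. */
bool toy_compaction_withdrawn(enum toy_compact_result r)
{
    return r == TOY_SKIPPED || r == TOY_DEFERRED ||
           r == TOY_CONTENDED || r == TOY_PARTIAL_SKIPPED;
}
```

Note that COMPACT_INACTIVE is only an alias of COMPACT_DEFERRED, so a check against one also matches the other.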

1.3 migrate_mode

This enum describes the modes available during migration, which mainly distinguish synchronous from asynchronous processing.

/*
 * MIGRATE_ASYNC means never block
 * MIGRATE_SYNC_LIGHT in the current implementation means to allow blocking
 *  on most operations but not ->writepage as the potential stall time
 *  is too significant
 * MIGRATE_SYNC will block when migrating pages
 * MIGRATE_SYNC_NO_COPY will block when migrating pages but will not copy pages
 *  with the CPU. Instead, page copy happens outside the migratepage()
 *  callback and is likely using a DMA engine. See migrate_vma() and HMM
 *  (mm/hmm.c) for users of this mode.
 */
enum migrate_mode {
    MIGRATE_ASYNC,
    MIGRATE_SYNC_LIGHT,
    MIGRATE_SYNC,
    MIGRATE_SYNC_NO_COPY,
};
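
The comment above boils down to three yes/no questions per mode. The sketch below encodes them as predicates over a local copy of the enum. The helper names are hypothetical, and treating MIGRATE_SYNC_NO_COPY like MIGRATE_SYNC for ->writepage is an assumption on my part, since the comment does not say so explicitly.

```c
#include <stdbool.h>

/* Local copy of the enum above with a TOY_ prefix. */
enum toy_migrate_mode {
    TOY_MIGRATE_ASYNC,
    TOY_MIGRATE_SYNC_LIGHT,
    TOY_MIGRATE_SYNC,
    TOY_MIGRATE_SYNC_NO_COPY
};

/* MIGRATE_ASYNC never blocks; every other mode may block. */
bool toy_may_block(enum toy_migrate_mode m)
{
    return m != TOY_MIGRATE_ASYNC;
}

/*
 * SYNC_LIGHT blocks on most operations but not on ->writepage.
 * Assumption: SYNC_NO_COPY behaves like SYNC in this respect.
 */
bool toy_may_writepage(enum toy_migrate_mode m)
{
    return m == TOY_MIGRATE_SYNC || m == TOY_MIGRATE_SYNC_NO_COPY;
}

/* Only SYNC_NO_COPY leaves the page copy to something other than the CPU. */
bool toy_cpu_copies_page(enum toy_migrate_mode m)
{
    return m != TOY_MIGRATE_SYNC_NO_COPY;
}
```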

1.4 compact_control

The compact_control structure holds the state of the two scanners used during compaction, together with the freepages and migratepages lists they fill. In the end, the pages on migratepages are copied to the pages on freepages. The field comments are detailed enough, so they are not repeated here.

/*
 * compact_control is used to track pages being migrated and the free pages
 * they are being migrated to during memory compaction. The free_pfn starts
 * at the end of a zone and migrate_pfn begins at the start. Movable pages
 * are moved to the end of a zone during a compaction run and the run
 * completes when free_pfn <= migrate_pfn
 */
struct compact_control {
    struct list_head freepages; /* List of free pages to migrate to */
    struct list_head migratepages;  /* List of pages being migrated */
    struct zone *zone;
    unsigned long nr_freepages; /* Number of isolated free pages */
    unsigned long nr_migratepages;  /* Number of pages to migrate */
    unsigned long total_migrate_scanned;
    unsigned long total_free_scanned;
    unsigned long free_pfn;     /* isolate_freepages search base */
    unsigned long migrate_pfn;  /* isolate_migratepages search base */
    unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
    const gfp_t gfp_mask;       /* gfp mask of a direct compactor */
    int order;          /* order a direct compactor needs */
    int migratetype;        /* migratetype of direct compactor */
    const unsigned int alloc_flags; /* alloc flags of a direct compactor */
    const int classzone_idx;    /* zone index of a direct compactor */
    enum migrate_mode mode;     /* Async or sync migration mode */
    bool ignore_skip_hint;      /* Scan blocks even if marked skip */
    bool ignore_block_suitable; /* Scan blocks considered unsuitable */
    bool direct_compaction;     /* False from kcompactd or /proc/... */
    bool whole_zone;        /* Whole zone should/has been scanned */
    bool contended;         /* Signal lock or sched contention */
    bool finishing_block;       /* Finishing current pageblock */
};
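
The two-scanner idea from the comment above can be tried out in user space. The following is a toy model only: the zone is a character array with one character per page frame, the scanners move one page at a time instead of one pageblock at a time, and toy_compact is a made-up name. It does show the essential behavior: movable pages migrate toward the end of the zone, and the run stops when the scanners meet.

```c
#include <stddef.h>

/*
 * Toy zone: one char per page frame.
 *   'M' = movable page in use, 'U' = unmovable page in use, '.' = free
 * Returns the number of pages migrated.
 */
size_t toy_compact(char *zone, size_t n)
{
    size_t migrate_pfn = 0;   /* migration scanner, starts at the zone start */
    size_t free_pfn = n;      /* free scanner, starts at the zone end (exclusive) */
    size_t moved = 0;

    while (migrate_pfn < free_pfn) {
        /* Advance the migration scanner to the next movable page. */
        if (zone[migrate_pfn] != 'M') {
            migrate_pfn++;
            continue;
        }
        /* Walk the free scanner down to the next free page. */
        while (free_pfn > migrate_pfn + 1 && zone[free_pfn - 1] != '.')
            free_pfn--;
        if (free_pfn <= migrate_pfn + 1)
            break;            /* scanners met: the run is complete */

        zone[free_pfn - 1] = 'M';   /* "copy" the page to its target frame */
        zone[migrate_pfn] = '.';    /* the source frame becomes free */
        free_pfn--;
        migrate_pfn++;
        moved++;
    }
    return moved;
}
```

For the zone "M.MU..M." this moves two pages and leaves "...U.MMM": the longest free run grows from two pages to three, while the unmovable page stays where it was.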

2. Call process

Looking at the data structures alone gives a scattered picture, so let's look at the overall flow.
In the kernel, there are three ways to trigger memory compaction:

  1. During memory allocation, when the request cannot be satisfied, compaction is triggered directly (direct compaction).
  2. When memory runs short, the kcompactd daemon is woken up to perform compaction in the background.
  3. Manually, by running echo 1 > /proc/sys/vm/compact_memory.
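
The manual trigger can also be issued from a program. Here is a minimal sketch, assuming a kernel built with CONFIG_COMPACTION and a process with root privileges. trigger_compaction is a hypothetical helper name, and the path is passed in so the function can be exercised without touching /proc.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write "1" to the given sysctl file.
 * Returns 0 on success or a negative errno value on failure.
 */
int trigger_compaction(const char *path)
{
    FILE *f = fopen(path, "w");
    int rc = 0;

    if (!f)
        return errno ? -errno : -EIO;
    if (fputs("1", f) == EOF)
        rc = -EIO;
    /* The write may only reach the kernel at fclose(), so check it too. */
    if (fclose(f) == EOF && rc == 0)
        rc = errno ? -errno : -EIO;
    return rc;
}
```

A call such as trigger_compaction("/proc/sys/vm/compact_memory") is then the equivalent of the echo command in item 3.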

Here's the picture:

In practice:
The output of cat /proc/pagetypeinfo looks like this:

3. Compaction processing

This process is fairly complicated. The following figure shows the general flow:

Next, we will analyze each submodule in more depth.

  • compaction_suitable

Memory compaction is performed only if the following three conditions are met:

  1. After subtracting the requested pages, the number of free pages would fall below the watermark; or it stays at or above the watermark, but no free block is large enough;
  2. The number of free pages minus twice the requested pages (twice, so that enough free pages remain as migration targets) is above the watermark;
  3. When the requested order is greater than PAGE_ALLOC_COSTLY_ORDER, the fragmentation index (fragindex) is calculated and the decision is based on its value.
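
The three conditions can be sketched as a single decision function. This is a simplified user-space model, not the kernel routine: struct sk_zone and all sk_ names are made up, the fragmentation index is passed in precomputed, and the threshold of 500 is assumed to be the default of the vm.extfrag_threshold sysctl. A low index is taken to mean that the failure is due to lack of memory rather than fragmentation, in which case compaction would not help.

```c
#include <stdbool.h>

#define SK_COSTLY_ORDER       3     /* value of PAGE_ALLOC_COSTLY_ORDER */
#define SK_EXTFRAG_THRESHOLD  500   /* assumption: default vm.extfrag_threshold */

enum sk_verdict { SK_ALLOC_OK, SK_SKIP, SK_NOT_SUITABLE, SK_COMPACT };

struct sk_zone {
    unsigned long free_pages;   /* free pages in the zone */
    unsigned long watermark;    /* watermark the allocation must respect */
    bool has_free_block;        /* a free block of the requested order exists */
    int fragindex;              /* hypothetical precomputed index, 0..1000 */
};

enum sk_verdict sk_compaction_suitable(const struct sk_zone *z, unsigned int order)
{
    unsigned long request = 1UL << order;
    unsigned long gap = 2UL << order;      /* twice the request */

    /* Condition 1 not met: the allocation would succeed without compaction. */
    if (z->has_free_block && z->free_pages >= z->watermark + request)
        return SK_ALLOC_OK;

    /* Condition 2: enough free pages must remain as migration targets. */
    if (z->free_pages <= z->watermark + gap)
        return SK_SKIP;

    /* Condition 3: for costly orders, a low index points to lack of memory. */
    if (order > SK_COSTLY_ORDER &&
        z->fragindex >= 0 && z->fragindex <= SK_EXTFRAG_THRESHOLD)
        return SK_NOT_SUITABLE;

    return SK_COMPACT;
}
```

For example, a zone with 1000 free pages, a watermark of 100, no free order-4 block and an index of 800 yields SK_COMPACT for an order-4 request.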

  • isolate_migratepages
    In isolate_migratepages, the migration scanner scans for movable pages one pageblock at a time and adds them to the migratepages list in struct compact_control. As shown in the figure below:

The logic of isolate_freepages is similar to that of isolate_migratepages. It also isolates pages, adding them to the cc->freepages list.

Once the free scanner and the migration scanner have done their scanning, the pages on the two lists are ready to be migrated.

  • migrate_pages
  1. Call compaction_alloc to take a free page from the cc->freepages list;
  2. Call unmap_and_move to move the movable page to the free page.
    unmap_and_move involves reverse mapping, the page cache, and more, which will be covered later. It does two key things: 1) it calls try_to_unmap to remove the old mappings from the process page tables, and the page is remapped to the new physical address when it next needs to be accessed; 2) it calls move_to_new_page to move the old page to the new physical page, where the copy_page function is implemented in the assembly file arch/arm64/lib/copy_page.S.
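
The unmap, copy, remap sequence can be illustrated with a toy address space. Everything in this sketch is made up for the example (struct toy_mm, the tiny page size, the function name), and it remaps immediately after the copy, which is a simplification of what the kernel does. The caller is expected to pass a frame that is actually free.

```c
#include <string.h>

#define TOY_PAGE_SIZE 16     /* tiny pages keep the example readable */
#define TOY_NR_FRAMES 4
#define TOY_NR_VPAGES 2
#define TOY_UNMAPPED  (-1)

struct toy_mm {
    char frames[TOY_NR_FRAMES][TOY_PAGE_SIZE];  /* "physical" memory */
    int  pte[TOY_NR_VPAGES];   /* virtual page -> frame, or TOY_UNMAPPED */
};

/* Move the frame backing virtual page vpn to new_pfn. Returns 0 or -1. */
int toy_unmap_and_move(struct toy_mm *mm, int vpn, int new_pfn)
{
    int old_pfn;

    if (vpn < 0 || vpn >= TOY_NR_VPAGES ||
        new_pfn < 0 || new_pfn >= TOY_NR_FRAMES)
        return -1;
    old_pfn = mm->pte[vpn];
    if (old_pfn == TOY_UNMAPPED || old_pfn == new_pfn)
        return -1;

    mm->pte[vpn] = TOY_UNMAPPED;            /* 1) drop the old mapping */
    memcpy(mm->frames[new_pfn],             /* 2) copy the page contents */
           mm->frames[old_pfn], TOY_PAGE_SIZE);
    mm->pte[vpn] = new_pfn;                 /* 3) map the new frame */
    return 0;
}
```

After a successful call, an access through the virtual page sees the same data as before, but it now comes from a different physical frame, which is exactly what makes the old frame available for a contiguous allocation.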
  • compact_finished
    compact_finished mainly checks whether compaction has completed.
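
    As a rough sketch of such a check, assuming only the termination rule from the compact_control comment (the run completes when free_pfn <= migrate_pfn) and the meaning of the compact_result values: the struct, the block_available flag and the function name below are hypothetical, and the kernel function looks at more state than this.

```c
#include <stdbool.h>

enum fin_state { FIN_CONTINUE, FIN_COMPLETE, FIN_PARTIAL_SKIPPED, FIN_SUCCESS };

struct fin_ctl {
    unsigned long migrate_pfn;   /* migration scanner position */
    unsigned long free_pfn;      /* free scanner position */
    bool whole_zone;             /* the run covered the zone from its boundaries */
    bool block_available;        /* hypothetical: a free block of the order exists */
};

enum fin_state toy_compact_finished(const struct fin_ctl *cc)
{
    /* Scanners met: nothing is left to scan in this run. */
    if (cc->free_pfn <= cc->migrate_pfn)
        return cc->whole_zone ? FIN_COMPLETE : FIN_PARTIAL_SKIPPED;

    /* A suitable free block appeared, so the allocation should now succeed. */
    if (cc->block_available)
        return FIN_SUCCESS;

    return FIN_CONTINUE;         /* keep going with the next pageblock */
}
```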

  • compaction_deferred/compaction_defer_reset/defer_compaction
    These three functions handle deferring compaction, and all three are called from try_to_compact_pages. Compaction is considered successful when the number of free pages, minus the requested pages, is above the watermark and at least one sufficiently large free block exists in the requested migrate type or a fallback migrate type. If it does not succeed, later compaction attempts may be deferred a number of times.
    The related fields in struct zone are as follows:
struct zone {
...
    /*
     * On compaction failure, 1<<compact_defer_shift compactions
     * are skipped before trying again. The number attempted since
     * last failure is tracked with compact_considered.
     */
    unsigned int        compact_considered; // Number of attempts counted since the last failure
    unsigned int        compact_defer_shift; // (1 << compact_defer_shift) = number of attempts to skip; the shift is capped at 6
    int                    compact_order_failed; // Request order recorded when compaction fails
...
};
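
How the three fields work together is easier to see in a small user-space model. The sketch below follows the comment in struct zone: each failure doubles the number of attempts that are skipped, up to 1 << 6. The struct and the toy_ functions are stand-ins written for this article; they mirror the three kernel functions by name but are not copied from the kernel.

```c
#include <stdbool.h>

#define TOY_MAX_DEFER_SHIFT 6   /* cap mentioned in the field comment above */

struct defer_zone {
    unsigned int compact_considered;
    unsigned int compact_defer_shift;
    int compact_order_failed;
};

/* After a failed compaction: double the number of attempts to skip. */
void toy_defer_compaction(struct defer_zone *z, int order)
{
    z->compact_considered = 0;
    if (z->compact_defer_shift < TOY_MAX_DEFER_SHIFT)
        z->compact_defer_shift++;
    if (order < z->compact_order_failed)
        z->compact_order_failed = order;
}

/* Returns true if this compaction attempt should be skipped. */
bool toy_compaction_deferred(struct defer_zone *z, int order)
{
    unsigned int limit = 1U << z->compact_defer_shift;

    if (order < z->compact_order_failed)
        return false;              /* smaller orders have not failed */
    if (z->compact_considered < limit)
        z->compact_considered++;   /* saturate instead of overflowing */
    return z->compact_considered < limit;
}

/* After a compaction that did not fail for this order. */
void toy_compaction_defer_reset(struct defer_zone *z, int order, bool alloc_success)
{
    if (alloc_success) {
        z->compact_considered = 0;
        z->compact_defer_shift = 0;
    }
    if (order >= z->compact_order_failed)
        z->compact_order_failed = order + 1;
}
```

Starting from a zeroed structure, one failure for order 3 makes the next attempt skip and the one after that run again; a second failure in a row would skip three attempts before the fourth runs.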

Posted by xwin on Sat, 26 Oct 2019 10:46:59 -0700