How Linux on ARM synchronizes the kernel-space page table across processes

Keywords: Linux memory management

  This article covers ARM32 processors.

Kernel page table: the master kernel page table described in the textbooks. In the kernel it is a region of memory referenced by the page global directory of init_mm, init_mm.pgd (swapper_pg_dir). The hardware does not use it directly.
Process page table: each process's own page table, referenced by the process's own page global directory, task_struct->mm->pgd.

Process creation

  When a process is created, the kernel-space part of the kernel page table is copied into the new process's page table.

Calling relationship:
The last function in the chain, pgd_alloc, is architecture-specific, and different architectures handle it differently. ARM32 uses only TTBR0 to hold the page table base address, so the kernel and the process share a single page table. The kernel-space entries of init_mm's page directory therefore have to be copied into the newly created process.

pgd_t *pgd_alloc(struct mm_struct *mm)
{
	pgd_t *new_pgd, *init_pgd;
	pud_t *new_pud, *init_pud;
	pmd_t *new_pmd, *init_pmd;
	pte_t *new_pte, *init_pte;

	new_pgd = __pgd_alloc();
	if (!new_pgd)
		goto no_pgd;

	memset(new_pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));		/* (1) */

	/*
	 * Copy over the kernel and IO PGD entries
	 */
	init_pgd = pgd_offset_k(0);
	memcpy(new_pgd + USER_PTRS_PER_PGD, init_pgd + USER_PTRS_PER_PGD,
		       (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));	/* (2) */

	clean_dcache_area(new_pgd, PTRS_PER_PGD * sizeof(pgd_t));

#ifdef CONFIG_ARM_LPAE
	/*
	 * Allocate PMD table for modules and pkmap mappings.
	 */
	new_pud = pud_alloc(mm, new_pgd + pgd_index(MODULES_VADDR),
			    MODULES_VADDR);
	if (!new_pud)
		goto no_pud;

	new_pmd = pmd_alloc(mm, new_pud, 0);
	if (!new_pmd)
		goto no_pmd;
#endif

	if (!vectors_high()) {
		/*
		 * On ARM, first page must always be allocated since it
		 * contains the machine vectors. The vectors are always high
		 * with LPAE.
		 */
		new_pud = pud_alloc(mm, new_pgd, 0);
		if (!new_pud)
			goto no_pud;

		new_pmd = pmd_alloc(mm, new_pud, 0);
		if (!new_pmd)
			goto no_pmd;

		new_pte = pte_alloc_map(mm, NULL, new_pmd, 0);
		if (!new_pte)
			goto no_pte;

		init_pud = pud_offset(init_pgd, 0);
		init_pmd = pmd_offset(init_pud, 0);
		init_pte = pte_offset_map(init_pmd, 0);
		set_pte_ext(new_pte + 0, init_pte[0], 0);
		set_pte_ext(new_pte + 1, init_pte[1], 0);
		pte_unmap(init_pte);
		pte_unmap(new_pte);
	}

	return new_pgd;

no_pte:
	pmd_free(mm, new_pmd);
no_pmd:
	pud_free(mm, new_pud);
no_pud:
	__pgd_free(new_pgd);
no_pgd:
	return NULL;
}

1) Zero the user-space part of the first-level page table.
2) Copy the kernel-space pgd entries of init_mm into the new first-level page table. Because the first-level entries are identical, they point to the same second-level page tables.
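Steps (1) and (2) can be tried out in user space. The following is only a minimal sketch, not kernel code: sim_pgd_alloc is a hypothetical stand-in for pgd_alloc, pgd_t is modelled as one 64-bit value (two 32-bit hardware entries), and the constants assume a non-LPAE 3G/1G split with TASK_SIZE = 0xBF000000 and 2 MB per pgd entry.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: non-LPAE ARM32, 3G/1G split, 2 MB per Linux pgd entry. */
#define PGDIR_SHIFT        21
#define PTRS_PER_PGD       2048u
#define TASK_SIZE          0xBF000000u              /* assumed value */
#define USER_PTRS_PER_PGD  (TASK_SIZE >> PGDIR_SHIFT)

/* Stand-in type: one Linux pgd entry holds two 32-bit hardware entries. */
typedef uint64_t pgd_t;

/* Hypothetical analogue of pgd_alloc(): zero the user part, copy the kernel part. */
static pgd_t *sim_pgd_alloc(const pgd_t *init_pgd)
{
	pgd_t *new_pgd;

	assert(init_pgd != NULL);

	new_pgd = malloc(PTRS_PER_PGD * sizeof(pgd_t));
	if (!new_pgd)
		return NULL;

	/* (1) clear the user-space entries */
	memset(new_pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));

	/* (2) copy the kernel-space entries from the master table */
	memcpy(new_pgd + USER_PTRS_PER_PGD, init_pgd + USER_PTRS_PER_PGD,
	       (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));

	return new_pgd;
}
```

Under these assumptions the table is 2048 * 8 bytes = 16 KB, and the first 1528 entries belong to user space. Only first-level entries are copied; the second-level tables they point to are shared, not duplicated.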

Kernel page table modification

  vmalloc, vmap and ioremap all modify the kernel-space page table.
    Take vmalloc and vmap as examples. While setting up the mapping they call map_vm_area, which eventually calls vmap_pte_range:

static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
		unsigned long end, pgprot_t prot, struct page **pages, int *nr)
{
	pte_t *pte;

	/*
	 * nr is a running index into the array which helps higher level
	 * callers keep track of where we're up to.
	 */

	pte = pte_alloc_kernel(pmd, addr);				/* (1) */
	if (!pte)
		return -ENOMEM;
	do {
		struct page *page = pages[*nr];

		if (WARN_ON(!pte_none(*pte)))
			return -EBUSY;
		if (WARN_ON(!page))
			return -ENOMEM;
		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));	/* Set the pte entry */
		(*nr)++;
	} while (pte++, addr += PAGE_SIZE, addr != end);
	return 0;
}

1) Check whether the pmd entry is empty (ARM32 uses a two-level mapping with only pgd and pte, so pmd is the same as pgd). In the first case the pmd is empty and a new pte table has to be allocated. Because each allocation is one page, a single allocation provides the second-level page table for 2MB of virtual address space, which one mapping may not use up. In the second case the second-level page table allocated earlier still has unmapped entries and can be used directly.

In the first case

#define pte_alloc_kernel(pmd, address)			\
	((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
		NULL: pte_offset_kernel(pmd, address))

int __pte_alloc_kernel(pmd_t *pmd, unsigned long address)
{
	pte_t *new = pte_alloc_one_kernel(&init_mm, address);
	if (!new)
		return -ENOMEM;

	smp_wmb(); /* See comment in __pte_alloc */

	spin_lock(&init_mm.page_table_lock);
	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
		pmd_populate_kernel(&init_mm, pmd, new);		/* (1) */
		new = NULL;
	} else
		VM_BUG_ON(pmd_trans_splitting(*pmd));
	spin_unlock(&init_mm.page_table_lock);
	if (new)
		pte_free_kernel(&init_mm, new);
	return 0;
}

1) Point the pmd entry at the newly allocated pte table. Note that only init_mm is updated here. The change is not synchronized to any other process, including the current one.
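The two cases can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: sim_init_pmd, sim_tables_allocated and sim_pte_alloc_kernel are hypothetical names, each first-level slot is assumed to cover 2 MB, and a second-level table is assumed to hold 512 entries of 4 KB pages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT    12
#define PMD_SHIFT     21        /* assumed: 2 MB per first-level entry */
#define PTRS_PER_PTE  512u      /* 2 MB / 4 KB */
#define NR_PMD        2048u     /* 4 GB / 2 MB */

typedef uint32_t pte_t;

/* Stand-in for the first level of init_mm: one pointer per 2 MB region. */
static pte_t *sim_init_pmd[NR_PMD];
/* Counts how often the "first case" (new table needed) was taken. */
static unsigned sim_tables_allocated;

/* Hypothetical analogue of pte_alloc_kernel(): returns the pte slot for addr. */
static pte_t *sim_pte_alloc_kernel(uint32_t addr)
{
	unsigned idx = addr >> PMD_SHIFT;

	assert(idx < NR_PMD);

	if (!sim_init_pmd[idx]) {
		/* First case: the pmd entry is empty, allocate a new table. */
		pte_t *table = calloc(PTRS_PER_PTE, sizeof(pte_t));

		if (!table)
			return NULL;
		sim_init_pmd[idx] = table;
		sim_tables_allocated++;
	}

	/* Second case (or after the first): index into the existing table. */
	return &sim_init_pmd[idx][(addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)];
}
```

Two addresses inside the same 2 MB region trigger only one allocation, while an address in the next region triggers another. As in the kernel, only the master table is touched.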

Page table entry synchronization

   As described above, when a new virtual address is mapped in kernel space, only the first-level page table of init_mm, its pgd (pmd), is updated. The other processes are not synchronized. When a process later accesses a virtual address in that range, the hardware has to translate it through the process's own page table, so won't the access hit an unmapped address?
  I puzzled over this question for a long time and searched the kernel code without finding the answer. Eventually I was surprised to discover that it is handled by the page fault exception. (Forgive my limited knowledge, but the books I had read really don't mention this.)

  Because the first-level page table entry is missing, the access raises a section translation fault, which is handled by:

do_translation_fault (this function is also architecture-specific)

static int __kprobes
do_translation_fault(unsigned long addr, unsigned int fsr,
		     struct pt_regs *regs)
{
	unsigned int index;
	pgd_t *pgd, *pgd_k;
	pud_t *pud, *pud_k;
	pmd_t *pmd, *pmd_k;

	if (addr < TASK_SIZE)
		return do_page_fault(addr, fsr, regs);			/* (1) */

	if (user_mode(regs))
		goto bad_area;

	index = pgd_index(addr);

	pgd = cpu_get_pgd() + index;					/* (2) */
	pgd_k = init_mm.pgd + index;					/* (3) */

	if (pgd_none(*pgd_k))						/* (4) */
		goto bad_area;
	if (!pgd_present(*pgd))
		set_pgd(pgd, *pgd_k);

	pud = pud_offset(pgd, addr);
	pud_k = pud_offset(pgd_k, addr);

	if (pud_none(*pud_k))
		goto bad_area;
	if (!pud_present(*pud))
		set_pud(pud, *pud_k);

	pmd = pmd_offset(pud, addr);
	pmd_k = pmd_offset(pud_k, addr);

#ifdef CONFIG_ARM_LPAE
	/*
	 * Only one hardware entry per PMD with LPAE.
	 */
	index = 0;
#else
	/*
	 * On ARM one Linux PGD entry contains two hardware entries (see page
	 * tables layout in pgtable.h). We normally guarantee that we always
	 * fill both L1 entries. But create_mapping() doesn't follow the rule.
	 * It can create inidividual L1 entries, so here we have to call
	 * pmd_none() check for the entry really corresponded to address, not
	 * for the first of pair.
	 */
	index = (addr >> SECTION_SHIFT) & 1;
#endif
	if (pmd_none(pmd_k[index]))
		goto bad_area;

	copy_pmd(pmd, pmd_k);						/* (5) */
	return 0;

bad_area:
	do_bad_area(addr, fsr, regs);
	return 0;
}

1) If the address is in user space, hand the fault over to do_page_fault.
2) Get the current process's page table entry for the faulting address.
3) Get init_mm's page table entry for the same address.
4) If init_mm's entry for this address is also empty, the address really is invalid.
5) Copy the corresponding kernel page table entry into the process's page table.

#define copy_pmd(pmdpd,pmdps)		\
	do {				\
		pmdpd[0] = pmdps[0];	\
		pmdpd[1] = pmdps[1];	\
		flush_pmd_entry(pmdpd);	\
	} while (0)

   On ARM32 (without LPAE), Linux does not use TTBR0 and TTBR1 to hold separate user-space and kernel-space page table bases, so sharing of the kernel space is implemented by copying the kernel-space page table entries when a process is created. However, the kernel-space mappings keep changing. Actively updating the kernel part of every process's page table each time an address is mapped would clearly not be worth the cost, because an address mapped on behalf of one process is unlikely to be used by other processes. The kernel therefore relies on the page fault exception and copies the page table entry only when it is actually needed. init_mm.pgd always holds the complete kernel-space page table.

Posted by ugriffin on Sat, 20 Nov 2021 06:27:39 -0800