Common macros in the Linux kernel

Keywords: C, Linux

0x00 Basic knowledge of macros

// object-like
#define identifier replacement-list new-line
// function-like
#define identifier ( [identifier-list] ) replacement-list new-line

Both the replacement list and the identifier list are token sequences produced by tokenizing the text. The difference is that in the identifier list, commas separate the macro's parameters, and each parameter is itself a token sequence. Inside a macro, whitespace serves only to separate tokens; the amount of whitespace is meaningless to the preprocessor.

Some clever macro magic tricks are collected at:

https://gaomf.cn/2017/10/06/C_Macro/

The following are some common macros in the Linux kernel. Because definitions vary across architectures and modules, only the easiest-to-understand versions are recorded here; their function is essentially the same.

What do { ... } while (0) is for in the Linux kernel:

  • It helps define complex multi-statement macros and avoids errors at the call site: without the braces, only the first statement would be governed by an if, and without the do/while wrapper the trailing ';' after expansion could make compilation fail.

  • It avoids goto: control flow stays unified, and break can be used to leave the block early.

  • It avoids warnings caused by empty macros.

  • It packages a complex operation into a single statement block.

0x01 Common Macro Collation

__CONCAT macro

'##' pastes two tokens together, and '#' turns a macro argument into a string literal:

#define __CONCAT(a, b) a ## b

BUG_ON(condition)

If the condition is true, the kernel crashes; the underlying mechanism is an undefined-instruction exception.

A simplified, assert-based version, together with the corresponding WARN_ON:

#define BUG() assert(0)
#define BUG_ON(x) assert(!(x))
 
/* Does it make sense to treat warnings as errors? */
#define WARN() BUG()
#define WARN_ON(x) (BUG_ON(x), false)

BUILD_BUG_ON Macro

#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
  1. When condition is true, sizeof(char[-1]) produces an error and compilation fails

  2. When condition is false, sizeof(char[1]) compiles fine

A related variant checks whether an expression e is 0: if e is 0, it compiles and the macro yields 0; if e is nonzero, compilation fails.

struct { int : -!!(0); } => struct { int : 0; }

If e is 0, the structure has an int-type data field and specifies that the number of bits it occupies is 0.

struct { int : -!!(1); } => struct { int : -1; }

If e is not 0, the width of the structure's int bit-field becomes negative, producing a compilation error.

typeof

typeof obtains the type of its operand x, allowing a macro to behave differently depending on the type of the argument passed in, achieving a kind of "compile-time polymorphism". typeof is actually resolved at compile time and is ultimately converted into an ordinary data type handled by the compiler.

So the expression inside it is never executed at run time: in typeof(fun()), the function fun() is not called; typeof only determines, at compile time, the type that fun() returns.

typeof also has limitations: its operand cannot carry storage-class specifiers such as static or extern.

typecheck macro

The macro typecheck checks whether x has the given type; if not, the compiler emits "warning: comparison of distinct pointer types lacks a cast". typecheck_fn checks whether a function has the given type and otherwise emits "warning: initialization from incompatible pointer type".

/*
 * Check at compile time that something is of a particular type.
 * Always evaluates to 1 so you may use it easily in comparisons.
 */
#define typecheck(type,x) \
({ type __dummy; \
    typeof(x) __dummy2; \
    (void)(&__dummy == &__dummy2); \
    1; \
})
/* ({...}) is a GCC extension (a statement expression): the whole block is
 * treated as one expression whose value is that of the last statement in
 * {...}, so the macro evaluates to 1.
 */
/*
 * Check at compile time that 'function' is a certain type, or is a pointer
 * to that type (needs to use typedef for the function type.)
 */
#define typecheck_fn(type,function) \
({ typeof(type) __tmp = function; \
    (void)__tmp; \
})

min macro

Careful type checking makes the comparison safe against implicit conversions and double evaluation; if the argument types differ, the slower single-evaluation path is chosen:

#define min(x, y) __careful_cmp(x, y, <)
#define __cmp(x, y, op) ((x) op (y) ? (x) : (y))
#define __safe_cmp(x, y) \
        (__typecheck(x, y) && __no_side_effects(x, y))
#define __no_side_effects(x, y) \
        (__is_constexpr(x) && __is_constexpr(y))
 
#define __cmp_once(x, y, unique_x, unique_y, op) ({ \
        typeof(x) unique_x = (x); \
        typeof(y) unique_y = (y); \
        __cmp(unique_x, unique_y, op); })
/* Assigning the arguments to temporaries first prevents an argument like x++ from being evaluated (and incremented) twice */
/* __builtin_choose_expr selects the plain __cmp when the types of x and y
 * match and both arguments are free of side effects; otherwise it falls
 * back to __cmp_once, which uses unique temporaries. */
#define __careful_cmp(x, y, op) \
    __builtin_choose_expr(__safe_cmp(x, y), \
        __cmp(x, y, op), \
        __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))

__UNIQUE_ID guarantees that the generated variable names are unique.

__is_constexpr macro
Determines whether x is an integer constant expression:

/*
 * This returns a constant expression while determining if an argument is
 * a constant expression, most importantly without evaluating the argument.
 * Glory to Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
 */
#define __is_constexpr(x) \
    (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

If x is a constant expression, then (void *)((long)(x) * 0l) is a null pointer constant, so the conditional takes its type from the third operand, (int *)8. If x is not a constant expression, the result has the second operand's type, void *.

So there are two scenarios:

sizeof(int) == sizeof(*((int *) (NULL))) // if `x` was an integer constant expression
sizeof(int) == sizeof(*((void *)(....))) // otherwise

Because sizeof(void) = 1, if x is an integer constant expression, the result of the macro is 1, otherwise 0.

https://stackoverflow.com/questions/49481217/linux-kernels-is-constexpr-macro

Description: __builtin_choose_expr is a GNU extension that yields e1 if the constant expression exp is nonzero, and e2 otherwise. (__builtin_types_compatible_p, used further below, is a related extension that returns 1 if two types are the same and 0 if they are not.)

int __builtin_choose_expr(exp, e1, e2);

max macro

Same as the min macro.

roundup macro

Rounds x up to the nearest multiple of y, i.e. returns the smallest multiple of y that is greater than or equal to x; it can be used to align memory addresses:

#define roundup(x, y) ( \
{ \
    const typeof(y) __y = y; \
    (((x) + (__y - 1)) / __y) * __y; \
} \
)

clamp macro

Clamps the value to the range [lo, hi]: returns lo if val is less than lo, hi if it is greater than hi, and val if it lies between lo and hi:

/**
 * clamp - return a value clamped to a given range with strict typechecking
 * @val: current value
 * @lo: lowest allowable value
 * @hi: highest allowable value
 *
 * This macro does strict typechecking of @lo/@hi to make sure they are of the
 * same type as @val. See the unnecessary pointer comparisons.
 */
#define clamp(val, lo, hi) min((typeof(val))max(val, lo), hi)

abs macro

Take absolute value:

/**
 * abs - return absolute value of an argument
 * @x: the value. If it is unsigned type, it is converted to signed type first.
 * char is treated as if it was signed (regardless of whether it really is)
 * but the macro's return type is preserved as char.
 *
 * Return: an absolute value of x.
 */
#define abs(x) __abs_choose_expr(x, long long, \
        __abs_choose_expr(x, long, \
        __abs_choose_expr(x, int, \
        __abs_choose_expr(x, short, \
        __abs_choose_expr(x, char, \
        __builtin_choose_expr( \
            __builtin_types_compatible_p(typeof(x), char), \
            (char)({ signed char __x = (x); __x<0?-__x:__x; }), \
            ((void)0)))))))
 
#define __abs_choose_expr(x, type, other) __builtin_choose_expr( \
    __builtin_types_compatible_p(typeof(x), signed type) || \
    __builtin_types_compatible_p(typeof(x), unsigned type), \
    ({ signed type __x = (x); __x < 0 ? -__x : __x; }), other)

swap macro

Use typeof to get the type of variable you want to exchange:

/*
 * swap - swap value of @a and @b
 */
#define swap(a, b) \
    do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)

container_of macro

Obtain a pointer to the entire structure variable based on the member variables in a structure variable.

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
/* With the structure's address taken as 0, the member's address cast to size_t is exactly its offset */
/**
 * container_of - cast a member of a structure out to the containing structure
 * @ptr: the pointer to the member.
 * @type: the type of the container struct this is embedded in.
 * @member: the name of the member within the struct.
 *
 */
#define container_of(ptr, type, member) ({ \
    const typeof( ((type *)0)->member ) *__mptr = (ptr); \
    (type *)( (char *)__mptr - offsetof(type,member) );})
/* __mptr saves a pointer to the member; subtracting the member's own offset
 * from it yields a pointer to the enclosing structure. */

likely and unlikely macros

They provide branch-prediction information to the compiler, reducing the performance loss caused by mispredicted jumps:

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

GCC's built-in __builtin_expect(exp, c) tells the compiler whether exp == c is expected to hold. If it is, the statements of the if branch are placed immediately after the conditional jump instruction; otherwise the statements of the else branch are.

This lets the CPU prefetch the expected branch's instructions into the cache, improving the cache hit rate.

ALIGN Alignment Macro

ALIGN aligns upward: for example, 0x123 aligned to 16 yields 0x130. Because alignment is often used when allocating memory, slightly more than needed gets allocated.

#define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
#define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define __ALIGN_MASK(x, mask) __ALIGN_KERNEL_MASK((x), (mask))

__get_unaligned_le(ptr) macro
Reads unaligned data; the macro's main job is dispatching on the size of the data:

#define __get_unaligned_le(ptr) ((__force typeof(*(ptr)))({ \
    __builtin_choose_expr(sizeof(*(ptr)) == 1, *(ptr), \
    __builtin_choose_expr(sizeof(*(ptr)) == 2, get_unaligned_le16((ptr)), \
    __builtin_choose_expr(sizeof(*(ptr)) == 4, get_unaligned_le32((ptr)), \
    __builtin_choose_expr(sizeof(*(ptr)) == 8, get_unaligned_le64((ptr)), \
    __bad_unaligned_access_size())))); \
 }))
 
 static inline u32 get_unaligned_be32(const void *p)
{
    return __get_unaligned_cpu32((const u8 *)p);
}
 
static inline u32 __get_unaligned_cpu32(const void *p)
{
    const struct __una_u32 *ptr = (const struct __una_u32 *)p;
    return ptr->x;
}
 
struct __una_u16 { u16 x; } __packed;
struct __una_u32 { u32 x; } __packed;
struct __una_u64 { u64 x; } __packed;

By default the compiler aligns struct members naturally; the __packed attribute removes this padding and forces 1-byte alignment.

__put_unaligned_le macro

Writes unaligned data.

#define __put_unaligned_le(val, ptr) ({ \
    void *__gu_p = (ptr); \
    switch (sizeof(*(ptr))) { \
    case 1: \
        *(u8 *)__gu_p = (__force u8)(val); \
        break; \
    case 2: \
        put_unaligned_le16((__force u16)(val), __gu_p); \
        break; \
    case 4: \
        put_unaligned_le32((__force u32)(val), __gu_p); \
        break; \
    case 8: \
        put_unaligned_le64((__force u64)(val), __gu_p); \
        break; \
    default: \
        __bad_unaligned_access_size(); \
        break; \
    } \
    (void)0; })
 
 static inline void put_unaligned_be32(u32 val, void *p)
{
    __put_unaligned_cpu32(val, p);
}
 
static inline void __put_unaligned_cpu32(u32 val, void *p)
{
    struct __una_u32 *ptr = (struct __una_u32 *)p;
    ptr->x = val;
}

ACCESS_ONCE Macro

Accesses the target exactly once: take the address of x, cast it to a pointer to a volatile-qualified version of x's type, then dereference that pointer. volatile disables optimization of the access, forcing exactly one real memory access.

In some concurrent scenarios, compiler optimization of a variable causes errors because the latest value of the variable is needed at all times, so volatile is used to force a fresh read.

The two conditions for using ACCESS_ONCE() are:

  • Accessing global variables without locks

  • The compiler might otherwise merge several accesses to the variable into one, or split one access into several

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

https://blog.csdn.net/ganggexiongqi/article/details/24603363

access_ok macro

Known from CVE-2017-5123 (the waitid system call): access_ok checks whether a pointer belongs to user space. Its implementation on the x86 architecture:

/**
 * access_ok: - Checks if a user space pointer is valid
 * @addr: User space pointer to start of block to check
 * @size: Size of block to check
 *
 * Context: User context only. This function may sleep if pagefaults are
 * enabled.
 *
 * Checks if a pointer to a block of memory in user space is valid.
 *
 * Returns true (nonzero) if the memory block may be valid, false (zero)
 * if it is definitely invalid.
 *
 * Note that, depending on architecture, this function probably just
 * checks that the pointer is in the user space range - after calling
 * this function, memory access functions may still return -EFAULT.
 */
#define access_ok(addr, size) \
({ \
    WARN_ON_IN_IRQ(); \
    likely(!__range_not_ok(addr, size, user_addr_max())); \
})
/* __range_not_ok returns 0 when the check passes */
 
#define __range_not_ok(addr, size, limit) \
({ \
    __chk_user_ptr(addr); \
    __chk_range_not_ok((unsigned long __force)(addr), size, limit); \
})
 
/*
 * Test whether a block of memory is a valid user space address.
 * Returns 0 if the range is valid, nonzero otherwise.
 */
static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, unsigned long limit)
{
    /*
     * If we have used "sizeof()" for the size,
     * we know it won't overflow the limit (but
     * it might overflow the 'addr', so it's
     * important to subtract the size from the
     * limit, not add it to the address).
     */
    if (__builtin_constant_p(size))
        return unlikely(addr > limit - size);
    /* __builtin_constant_p returns 1 if its argument is a compile-time constant */
    /* Arbitrary sizes? Be careful about overflow */
    addr += size;
    if (unlikely(addr < size))
        return true;
    return unlikely(addr > limit);
}

mdelay macro

A busy-wait function: no other task can run during the delay, CPU time is consumed, and the delay is accurate.

msleep, by contrast, is a sleep function and does not busy-wait. With msleep(200), the actual delay is longer than 200 ms, by an indeterminate amount.

#define MAX_UDELAY_MS 5
/* delay n milliseconds */
#define mdelay(n) (\
    (__builtin_constant_p(n) && (n)<=MAX_UDELAY_MS) ? udelay((n)*1000) : \
    ({unsigned long __ms=(n); while (__ms--) udelay(1000);}))
 
static void udelay(int loops) /*Delay microsecond level */
{
    while (loops--)
        io_delay(); /* Approximately 1 us */
}
 
static inline void io_delay(void)
{
    const u16 DELAY_PORT = 0x80;
    asm volatile("outb %%al,%0" : : "dN" (DELAY_PORT));
}
/*Writing any byte to I/O port 0x80 will result in a 1 us delay*/

System Call Macro

One of the most common macro uses in the Linux kernel is the definition of system calls:

#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE4(name, ...) SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
/* "..." is the ellipsis for the variable arguments; __VA_ARGS__ expands to them */
#define SYSCALL_DEFINE_MAXARGS 6 /* a system call takes at most 6 parameters */

Take the open system call as an example:

SYSCALL_DEFINE

The first parameter is the name of the system call, followed by 2*n parameters; each pair gives one argument's type and name.

SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
    if (force_o_largefile())
        flags |= O_LARGEFILE;
 
    return do_sys_open(AT_FDCWD, filename, flags, mode);
}

SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
// first expands to:
SYSCALL_DEFINEx(3, _open, __VA_ARGS__)

Expand again as:

__SYSCALL_DEFINEx(3, _open, __VA_ARGS__)
#define __SYSCALL_DEFINEx(x, name, ...) \
    asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \

Finally, expand to:

asmlinkage long sys_open(__MAP(3,__SC_DECL,__VA_ARGS__))
 
#define __MAP0(m,...)
#define __MAP1(m,t,a) m(t,a)
#define __MAP2(m,t,a,...) m(t,a), __MAP1(m,__VA_ARGS__)
#define __MAP3(m,t,a,...) m(t,a), __MAP2(m,__VA_ARGS__)
#define __MAP4(m,t,a,...) m(t,a), __MAP3(m,__VA_ARGS__)
#define __MAP5(m,t,a,...) m(t,a), __MAP4(m,__VA_ARGS__)
#define __MAP6(m,t,a,...) m(t,a), __MAP5(m,__VA_ARGS__)
#define __MAP(n,...) __MAP##n(__VA_ARGS__)
 
#define __SC_DECL(t, a) t a
 
__MAP(3,__SC_DECL,__VA_ARGS__)
-->__MAP3(__SC_DECL,const char __user *, filename, int, flags, umode_t, mode)
-->__SC_DECL(const char __user *, filename), __MAP2(__SC_DECL,__VA_ARGS__)
-->const char __user * filename,__SC_DECL(int, flags),__MAP1(__SC_DECL,__VA_ARGS__)
-->const char __user * filename, int flags, __SC_DECL(umode_t, mode)
-->const char __user * filename, int flags, umode_t mode

Finally, asmlinkage long sys_open(const char __user *filename, int flags, umode_t mode) is declared.

Why define system calls as macros? In CVE-2009-0029 and CVE-2010-3301 (Linux 2.6.28 and earlier), 32-bit system call parameters passed in 64-bit registers were not sign-extended, which could result in a system crash or a privilege-escalation vulnerability.

Kernel developers avoid this by first converting every input parameter of a system call to a long (64 bits) and then casting it back to the appropriate type.

asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
{ \
        long ret = __do_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__));\
        __MAP(x,__SC_TEST,__VA_ARGS__); \
        __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
        return ret; \
} \
 
 
#define __TYPE_AS(t, v) __same_type((__force t)0, v) /* are t and v the same type? */
#define __TYPE_IS_L(t) (__TYPE_AS(t, 0L)) /* returns 1 if t is long */
#define __TYPE_IS_UL(t) (__TYPE_AS(t, 0UL)) /* returns 1 if t is unsigned long */
#define __TYPE_IS_LL(t) (__TYPE_AS(t, 0LL) || __TYPE_AS(t, 0ULL)) /* returns 1 if t is a 64-bit type */
#define __SC_LONG(t, a) __typeof(__builtin_choose_expr(__TYPE_IS_LL(t), 0LL, 0L)) a
/* declare the parameter as a long type */
#define __SC_CAST(t, a) (__force t) a /* cast back to the original type */

# define __force __attribute__((force))

It indicates that the variable's type may be forcibly cast; the annotation is checked by the Sparse static analyzer, not by GCC.

barrier() macro

A memory barrier (optimization barrier): the statement generates no code, but after it the compiler must discard any variable values it has cached in registers.

/* Optimization barrier */
/* The "volatile" is due to gcc bugs */
#define barrier() __asm__ __volatile__("": : :"memory")

After this statement executes, values held in CPU registers and data cached from memory are treated as invalid, and data is read again from memory. This prevents the compiler from optimizing the instructions by reusing register or cache contents instead of accessing memory. For example:

int a = 5, b = 6;
barrier();
a = b;

On the third line, instead of assigning to a from the register that holds b, GCC re-reads the value of b from memory, because the barrier invalidated the cached copy of b.

Additional memory barrier macro definitions:

  • mfence: read and write operations issued before the mfence instruction must complete before any read or write issued after it.

  • lfence: reads before the lfence instruction must complete before reads after it; writes are unaffected.

  • sfence: writes before the sfence instruction must complete before writes after it; reads are unaffected.

  • The lock prefix (and instructions such as cpuid and xchg) causes this CPU's cache to be written back to memory, which also invalidates the corresponding cache lines of other CPUs; the memory operand of the locked instruction can be used only by the current CPU while it executes.

For cache update policies, Write-Through and Write-Back are distinguished. Write-Through writes straight through to memory and, rather than updating the cache at the same time, invalidates the cache line; Write-Back updates the cache first and writes memory back asynchronously. x86 CPUs usually use the Write-Back policy for memory updates.

__ASSEMBLY__ macro

Some constant macros are used in both assembly and C code. In assembly, however, a constant cannot carry a "UL" or similar suffix the way it can in C, so the following macros solve the problem.

For example, #define DEMO_MACRO _AC(1, UL) is interpreted in C as #define DEMO_MACRO (1UL), while in assembly the suffix is dropped, i.e. #define DEMO_MACRO 1.

#ifdef __ASSEMBLY__
#define _AC(X,Y) X
#define _AT(T,X) X
#else
#define __AC(X,Y) (X##Y)
#define _AC(X,Y) __AC(X,Y)
#define _AT(T,X) ((T)(X))
#endif
 
#define _UL(x) (_AC(x, UL))
#define _ULL(x) (_AC(x, ULL))

force_o_largefile macro

Determine if large files are supported.

#define force_o_largefile() (personality(current->personality) != PER_LINUX32)

PER_LINUX32 = 0x0008,
PER_MASK = 0x00ff,
 
/* Return the base personality without flags. */
#define personality(pers) (pers & PER_MASK)

Converting between virtual and physical addresses

#define __pa(x) __virt_to_phys((unsigned long)(x))
#define __va(x) ((void *)__phys_to_virt((unsigned long)(x)))

Error Code Related Macros

Some error codes in the Linux kernel are represented simply by the virtual addresses corresponding to their negated values (addresses greater than or equal to (unsigned long)-4095), used directly as function return values.

On a 32-bit system, -4095 converted to unsigned long is 0xFFFFF001, so the address range [0xFFFFF001, 0xFFFFFFFF] represents the error codes -4095 through -1, respectively.

Determines whether the pointer returned by a function is a valid address or an error code:

#define MAX_ERRNO 4095
 
#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)
 
static inline long __must_check IS_ERR(const void *ptr)
{
    return IS_ERR_VALUE((unsigned long)ptr);
}

Converting between error codes and the corresponding addresses:

static inline void * __must_check ERR_PTR(long error)
{
    return (void *) error;
}

Convert Long Integer to Pointer

static inline long __must_check PTR_ERR(const void *ptr)
{
    return (long) ptr;
}

Convert pointer to long integer

Extra interesting macros

A recursive macro that reverses byte order:

#define BSWAP_8(x) ((x) & 0xff)
#define BSWAP_16(x) ((BSWAP_8(x) << 8) | BSWAP_8((x) >> 8))
#define BSWAP_32(x) ((BSWAP_16(x) << 16) | BSWAP_16((x) >> 16))
#define BSWAP_64(x) ((BSWAP_32(x) << 32) | BSWAP_32((x) >> 32))

A swap macro built from XOR, requiring no extra variable:

#define swap(a, b) \
(((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))

Posted by darkerstar on Mon, 16 Dec 2019 18:33:46 -0800