"21 days of good habits" phase I-14

Keywords: word2vec

Unusual C usage in word2vec

1. malloc() and calloc()

Both malloc() and calloc() allocate memory dynamically, but they differ slightly.
malloc() takes one parameter, the size in bytes of the memory to allocate:

void *malloc(size_t size);

The calloc() function has two parameters: the number of elements and the size of each element. The product of these two parameters is the size of the memory space to be allocated.

void *calloc(size_t numElements, size_t sizeOfElement);

Same: if the call succeeds, both malloc() and calloc() return the starting address of the allocated memory; if it fails, both return NULL.
Difference: malloc() does not initialize the allocated memory, but calloc() does. Memory returned by malloc() that the process has never used before may happen to be all zeros; memory that was previously allocated and freed may still contain leftover data. In other words, a program that relies on malloc() returning zeroed memory may run normally at first (while the memory is fresh) and then fail after a while (once freed memory is reused).
calloc() sets every bit of the allocated memory to zero. For character and integer element types, this guarantees that each element is 0. For pointer elements, the result is a null pointer on most platforms, and for floating-point elements it is 0.0 on most platforms (including any that use IEEE 754), but the C standard does not guarantee either, because it does not require null pointers or floating-point zero to be represented as all-zero bits.


2. posix_memalign()

In the source code:

posix_memalign((void **)&syn0, 128, (long long)vocab_size * layer1_size * sizeof(real))
//Allocate aligned memory for syn0: vocab_size * layer1_size * sizeof(real) bytes, that is, one vector of layer1_size values per word
posix_memalign():

Function: allocates size bytes of dynamic memory whose starting address is aligned to the requested boundary. posix_memalign() is used much like malloc(), and memory allocated by posix_memalign() must be released with free().

Header file: #include <stdlib.h>

Function prototype: int posix_memalign(void **memptr, size_t alignment, size_t size);

Parameters:
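memptr ---- address of a pointer variable; on success it receives the starting address of the allocated memory
alignment ---- the required alignment boundary in bytes; must be a power of 2 and a multiple of sizeof(void *). (For comparison, plain malloc() on Linux typically aligns to 8 bytes on 32-bit systems and 16 bytes on 64-bit systems.)
size ---- the size of the memory to allocate, in bytes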

Return value: when posix_memalign() succeeds, it allocates size bytes of dynamic memory whose address is a multiple of alignment. The alignment parameter must be a power of 2 and a multiple of sizeof(void *). The address of the allocated block is stored in *memptr, and the function returns 0.
When the call fails, no memory is allocated, and the value stored in *memptr should not be relied on. One of the following error codes is returned:
EINVAL: the alignment parameter is not a power of 2 or is not a multiple of sizeof(void *).
ENOMEM: there is not enough memory to satisfy the request.
Note that this function does not set errno; the error code is available only as the return value.


3. Construct Huffman tree



pos1 = vocab_size - 1;
pos2 = vocab_size;
for (a = 0; a < vocab_size - 1; a++) {
    // First, find the two smallest nodes, min1 and min2. Note that the words in vocab are sorted in descending order of cn.
    // pos1 walks backward over the original words (leaves), starting from the least frequent; pos2 walks forward over the merged nodes created so far.
    // The same selection runs twice: once for min1 and once for min2.
    if (pos1 >= 0) {
      if (count[pos1] < count[pos2]) {
        min1i = pos1;
        pos1--;
      } else {
        min1i = pos2;
        pos2++;
      }
    } else {
      min1i = pos2;
      pos2++;
    }
    if (pos1 >= 0) {
      if (count[pos1] < count[pos2]) {
        min2i = pos1;
        pos1--;
      } else {
        min2i = pos2;
        pos2++;
      }
    } else {
      min2i = pos2;
      pos2++;
    }
    count[vocab_size + a] = count[min1i] + count[min2i];
    parent_node[min1i] = vocab_size + a;                   //Record the position of the merged parent node
    parent_node[min2i] = vocab_size + a;
    binary[min2i] = 1;                                     //0 on the left and 1 on the right
  }

Posted by obscurr on Fri, 05 Nov 2021 11:29:28 -0700