Data Structures and Algorithms - Understanding Arrays in Depth


When it comes to arrays, I believe everyone is familiar with them. After all, almost every programming language has them.

Arrays are the most basic data structure. Although they look very simple, truly mastering the essence of this basic data structure is not so easy.

Getting to the point

An array is a linear list data structure that uses a contiguous block of memory to store a group of elements of the same type.

This definition contains several key terms, and they capture the essence of the array. Let's look at arrays more closely through these terms.

The first is the linear list. As the name implies, the data in a linear list is arranged like a line, and each element has at most two neighbors: one in front of it and one behind it. Besides arrays, linked lists, queues and stacks are also linear list structures.

For example, a skewer of tanghulu (candied hawthorns) is very similar to a linear list: the hawthorns (data) are strung along a straight bamboo stick, and each hawthorn (data) has at most two neighbors.

The second is contiguous memory and data of the same type. These two constraints give arrays a very important property: random access, meaning any element can be accessed by its subscript in O(1) time. But every coin has two sides. The same two constraints mean that inserting or deleting an element usually requires moving other data so that the elements stay contiguous.

Random access

How does an array achieve random access to its elements by subscript?

Take an int array of length 5, int a[5], as an example. When we define this array, the computer allocates a contiguous block of memory for it.

Suppose the starting address of the memory block of int a[5] is base_address = 100. With a 4-byte int, the addresses are:

The address of a[0] is 100 (the starting address)
The address of a[1] is 104
The address of a[2] is 108
The address of a[3] is 112
The address of a[4] is 116

A computer accesses data in memory through memory addresses. When it needs to access an array element at random, it first computes the element's memory address with the following addressing formula and then reads the data at that address:

a[i]_address = base_address + i * data_type_size

Here a[i]_address is the memory address of the element with subscript i, base_address is the starting address of the array, and data_type_size is the size of the data type stored in the array. The array int a[5] stores five int values; with a 4-byte int (the usual size on mainstream platforms), its data_type_size is 4 bytes.
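As a quick check of the addressing formula, here is a minimal C++ sketch. It computes a[i]_address and compares the result with the address the compiler actually uses. The helper names elementAddress and formulaMatches are illustrative and do not come from the original. The sketch assumes a flat address space in which converting a pointer to std::uintptr_t gives its numeric address. That holds on mainstream platforms, but the standard leaves it implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Addressing formula from the text:
// a[i]_address = base_address + i * data_type_size
std::uintptr_t elementAddress(std::uintptr_t base_address, std::size_t i,
                              std::size_t data_type_size)
{
    return base_address + i * data_type_size;
}

// Compare the formula with the address the compiler actually uses for a[i].
// Assumes converting a pointer to std::uintptr_t yields its numeric address.
bool formulaMatches(const int *a, std::size_t i)
{
    assert(a != nullptr);
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&a[0]);
    std::uintptr_t actual = reinterpret_cast<std::uintptr_t>(&a[i]);
    return elementAddress(base, i, sizeof(int)) == actual;
}
```

For example, with base_address = 100 and a 4-byte int, elementAddress(100, 3, 4) gives 112, which is the address of a[3].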

For a two-dimensional array with dimensions m × n, stored row by row, the addressing formula is:

a[i][j]_address = base_address + ( i * n + j ) * data_type_size
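The same idea extends to two dimensions. The sketch below computes the m × n formula and checks it against a built-in int b[M][N], which C++ lays out row by row. The names are illustrative, and the sketch relies on the same flat-address-space assumption as before.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two-dimensional addressing formula for an m x n array stored row by row:
// a[i][j]_address = base_address + (i * n + j) * data_type_size
std::uintptr_t elementAddress2D(std::uintptr_t base_address, std::size_t i,
                                std::size_t j, std::size_t n,
                                std::size_t data_type_size)
{
    return base_address + (i * n + j) * data_type_size;
}

// Compare the formula with the real address of b[i][j] for a built-in int[M][N].
template <std::size_t M, std::size_t N>
bool formulaMatches2D(const int (&b)[M][N], std::size_t i, std::size_t j)
{
    assert(i < M && j < N); // indices must be in range
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&b[0][0]);
    std::uintptr_t actual = reinterpret_cast<std::uintptr_t>(&b[i][j]);
    return elementAddress2D(base, i, j, N, sizeof(int)) == actual;
}
```

Here n is the number of columns (elements per row). Moving down one row skips n elements, which is why i is multiplied by n.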

Why do array subscripts start at 0?

To answer this question, let's first assume that subscripts start from 1, so that a[1] is the element stored at the array's starting address. The addressing formula then becomes:

a[i]_address = base_address + (i - 1) * data_type_size

Comparing the two formulas, it is easy to see that with subscripts starting from 1, every random access to an array element needs one more subtraction. For the CPU, that means one more subtraction instruction.

Moreover, arrays are a very basic data structure and are used extremely often, so their efficiency should be optimized as far as possible. To save that one subtraction instruction, arrays are numbered from 0 instead of from 1.

This analysis is based on the addressing formula. Of course, there are historical reasons as well.
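To make the comparison of the two formulas concrete, here is a sketch of both side by side. The function names are hypothetical. Both functions locate the same element, and the 1-based version only needs the extra i - 1. An optimizing compiler can often fold a constant offset into the base address, so treat this as an illustration of the formulas rather than a measurement of instruction counts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Subscripts start at 0: base_address + i * data_type_size
std::uintptr_t addressZeroBased(std::uintptr_t base_address, std::size_t i,
                                std::size_t data_type_size)
{
    return base_address + i * data_type_size;
}

// Hypothetical subscripts starting at 1: the same element needs an extra (i - 1)
std::uintptr_t addressOneBased(std::uintptr_t base_address, std::size_t i,
                               std::size_t data_type_size)
{
    assert(i >= 1); // subscript 0 does not exist when numbering starts at 1
    return base_address + (i - 1) * data_type_size;
}
```

With base_address = 100 and a 4-byte int, addressZeroBased(100, 4, 4) and addressOneBased(100, 5, 4) both point to the fifth element at address 116.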

Insertion and deletion of arrays

As mentioned earlier, an array must keep its data contiguous in memory, and this makes insertion and deletion inefficient. Below we explain why they are inefficient and what can be done to improve them.

Insertion procedure

The time complexity of insertion varies slightly with the scenario and with where the new element is inserted. Below we analyze insertion in two scenarios: ordered data and unordered data.

In either scenario, inserting an element at the end of the array is simple. No data needs to be moved, the element is placed right after the last one, and the time complexity is O(1).

What if you insert data at the beginning or in the middle of the array? Then the approach depends on the scenario.

If the array data is ordered (ascending or descending), inserting a new element at position k requires moving every element from position k onward one position back. The worst-case time complexity is O(n).

If the array data has no particular order, you can insert a new element at position k by moving the element currently at position k to the end of the array and placing the new element directly at position k. In this special scenario, inserting an element at position k takes O(1) time.

A picture is worth a thousand words. The figure illustrates inserting an element into an array in the ordered and unordered scenarios.
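The unordered-insertion trick can be sketched as follows. The sketch assumes a raw int buffer with a separate element count. The function name and parameters (insertUnordered, used, capacity) are hypothetical. The trick gives up element order in exchange for O(1) insertion.

```cpp
#include <cassert>
#include <cstddef>

// Insert elem at position k of an unordered array in O(1):
// the element currently at k moves to the end, and elem takes its place.
// `used` is the number of elements in use; `capacity` is the allocated length.
// Returns false if the array is full or k is past the end of the data.
bool insertUnordered(int *arr, std::size_t &used, std::size_t capacity,
                     std::size_t k, int elem)
{
    assert(arr != nullptr);
    if (used >= capacity || k > used)
        return false;
    if (k < used)
        arr[used] = arr[k]; // move the old k-th element to the end
    arr[k] = elem;          // place the new element directly at position k
    ++used;
    return true;
}
```

For example, inserting 9 at position 2 of {1, 2, 3, 4, 5} gives {1, 2, 9, 4, 5, 3}. Only one element moves, no matter how long the array is.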

Deletion procedure

As with insertion, if we delete the data at position k, we must move data to keep memory contiguous. Otherwise a hole would be left in the middle, and the data would no longer be contiguous.

Deleting the element at the end of the array takes O(1) time. Deleting an element at the beginning or in the middle (position k) requires moving every element after position k forward one position, so the worst-case time complexity is O(n).

Again, a picture is worth a thousand words. The figure illustrates deleting an element from an array.
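To make the cost difference visible, the following sketch deletes the element at position k by shifting the later elements forward, and returns how many elements it moved. It uses a hypothetical deleteAt helper over a raw int buffer.

```cpp
#include <cassert>
#include <cstddef>

// Delete the element at position k and close the gap by shifting every
// later element one position forward. Returns the number of elements moved:
// deleting the last element moves 0, deleting the first moves used - 1.
std::size_t deleteAt(int *arr, std::size_t &used, std::size_t k)
{
    assert(arr != nullptr && k < used); // k must refer to an element in use
    std::size_t moved = 0;
    for (std::size_t i = k + 1; i < used; ++i)
    {
        arr[i - 1] = arr[i];
        ++moved;
    }
    --used;
    return moved;
}
```

On {1, 2, 3, 4, 5}, deleting the last element moves nothing. Deleting the first element of the remaining four moves three elements. This is the O(1) versus O(n) contrast described above.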

Code practice: array insertion, deletion and query

This example assumes that the array data is ordered (ascending), and implements insertion, deletion and query.

First, a structure defines the array's attributes: the array length, the number of elements in use, and a pointer to the array storage.

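#include <cstdlib>   // malloc, free
#include <cstring>   // memmove
#include <iostream>  // std::cout, std::endl

struct Array_t
{
    int length; // Array length (capacity)
    int used;   // Number of elements in use
    int *arr;   // Address of the array storage
};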

Create the array:

Allocate contiguous memory for as many int elements as the length field of the structure specifies.

void alloc(struct Array_t *array)
{
    array->arr = (int *)malloc(array->length * sizeof(int));
}

Insertion process:

Check whether the number of elements in use has reached the array length
Traverse the array to find the subscript idx at which the new element should be inserted
If idx is not at the end, shift the elements from idx onward one position back
Write the new element at subscript idx and increase the number of elements in use by 1

/*
 *  Insert a new element, keeping the array sorted in ascending order
 *  Parameter 1: pointer to the Array_t structure
 *  Parameter 2: value of the new element
 *  Return: the subscript where the element was inserted on success, -1 on failure
 */
int insertElem(struct Array_t *array, int elem)
{
    // If the number of elements in use has reached the array length, every slot
    // already holds data and nothing more can be inserted.
    if (array->used >= array->length)
    {
        std::cout << "ERROR: array size is full, can't insert " << elem << " elem." << std::endl;
        return -1;
    }

    int idx = 0;

    // Traverse the array to find the first subscript idx whose element is greater than elem
    for (idx = 0; idx < array->used; idx++)
    {
        // Stop at the first element greater than the new element elem
        if (array->arr[idx] > elem)
        {
            break;
        }
    }

    // If the insertion point is not at the end, shift the elements from idx
    // onward one position back to free up subscript idx.
    if (idx < array->used)
    {
        // memmove handles the overlapping source and destination ranges correctly
        memmove(&array->arr[idx + 1], &array->arr[idx], (array->used - idx) * sizeof(int));
    }

    // Insert the element
    array->arr[idx] = elem;
    // One more element in use
    array->used++;

    // Return the subscript of the inserted element
    return idx;
}

Deletion process:

Check whether the subscript to delete is valid
Shift the elements after subscript idx one position forward

/*
 *  Delete an element
 *  Parameter 1: pointer to the Array_t structure
 *  Parameter 2: subscript of the element to delete
 *  Return: 0 on success, -1 on failure
 */
int deleteElem(struct Array_t *array, int idx)
{
    // Check that the subscript refers to an element in use
    if (idx < 0 || idx >= array->used)
    {
        std::cout << "ERROR:idx[" << idx << "] not in the range of arrays." << std::endl;
        return -1;
    }

    // Shift the elements after idx one position forward
    memmove(&array->arr[idx], &array->arr[idx + 1], (array->used - idx - 1) * sizeof(int));

    // One fewer element in use
    array->used--;

    return 0;
}

Query subscript:

Traverse the array to find the element with the given value. Return its subscript if it is found, and report an error otherwise.

/*
 *  Query the subscript of an element
 *  Parameter 1: pointer to the Array_t structure
 *  Parameter 2: element value to look for
 *  Return: the element's subscript on success, -1 on failure
 */
int search(struct Array_t *array, int elem)
{
    int idx = 0;

    // Traverse the array
    for (idx = 0; idx < array->used; idx++)
    {
        // Found an element equal to the queried value: return its subscript
        if (array->arr[idx] == elem)
        {
            return idx;
        }

        // The array is sorted in ascending order, so once an element is larger
        // than the queried value, the value cannot appear later: stop early.
        if (array->arr[idx] > elem)
        {
            break;
        }
    }

    // The value was not found
    std::cout << "ERROR: elem " << elem << " not found." << std::endl;

    return -1;
}
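Because the array in this example is kept sorted, the linear scan in search() could be replaced by a binary search, which takes O(log n) time instead of O(n). This is a different technique from the one used above. Below is a minimal sketch built on the standard std::lower_bound. The function binarySearch is hypothetical and takes the arr and used fields directly rather than the Array_t structure.

```cpp
#include <algorithm>
#include <cassert>

// Alternative to the linear scan: the array is sorted in ascending order,
// so binary search can locate an element in O(log n).
// Returns the subscript of elem, or -1 if it is not present.
int binarySearch(const int *arr, int used, int elem)
{
    assert(used >= 0);
    const int *end = arr + used;
    const int *pos = std::lower_bound(arr, end, elem); // first element >= elem
    if (pos != end && *pos == elem)
        return static_cast<int>(pos - arr);
    return -1;
}
```

Called as binarySearch(array->arr, array->used, elem), it returns the same subscript as search() for values that are present. Unlike search(), it prints nothing when the value is missing.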

Print the array:

Output each element of the array.

void dump(struct Array_t *array)
{
    int idx = 0;

    for (idx = 0; idx < array->used; idx++)
    {
        std::cout << "INFO: array[" << idx << "] : " << array->arr[idx] << std::endl;
    }
}

main function:

Create an int array of length 3, then insert, query, delete and print its elements.

int main()
{
    // length = 3, used = 0, arr = NULL until alloc() runs
    struct Array_t array = {3, 0, NULL};

    int idx = 0;

    std::cout << "alloc array length: " << array.length << " size: " << array.length * sizeof(int) << std::endl;
    alloc(&array);
    if (!array.arr)
        return -1;

    std::cout << "insert 1 elem" << std::endl;
    insertElem(&array, 1);

    std::cout << "insert 0 elem" << std::endl;
    insertElem(&array, 0);

    std::cout << "insert 2 elem" << std::endl;
    insertElem(&array, 2);

    dump(&array);

    idx = search(&array, 1);
    std::cout << "1 elem is at position " << idx << std::endl;

    idx = search(&array, 2);
    std::cout << "2 elem is at position " << idx << std::endl;

    std::cout << "delete position [2] elem" << std::endl;
    deleteElem(&array, 2);

    dump(&array);

    // Release the memory obtained from malloc()
    free(array.arr);

    return 0;
}

Output (on a platform where int is 4 bytes):

[root@lincoding array]# ./array
alloc array length: 3 size: 12
insert 1 elem
insert 0 elem
insert 2 elem
INFO: array[0] : 0
INFO: array[1] : 1
INFO: array[2] : 2
1 elem is at position 1
2 elem is at position 2
delete position [2] elem
INFO: array[0] : 0
INFO: array[1] : 1

Summary

The array is the most basic and simplest data structure. It stores a group of elements of the same type in a contiguous block of memory. Its biggest feature is random access to elements in O(1) time. Insertion and deletion, however, are inefficient, with a worst-case time complexity of O(n).

Note: parts of this article are based on the Geek Time course Data Structure and Algorithms.

Posted by meritre on Sat, 21 Sep 2019 07:09:14 -0700
