Using linked lists to learn Rust: A Bad Stack

Keywords: data structure linked list Rust

A good friend recommended this book, and I am using the Thanksgiving holiday to work through it. The original is available online, and many Chinese translations can be found on the Internet. There are also explanation videos on a certain video site, but I have not watched many of them, so I cannot judge their quality. These notes record what I learned along the way; corrections are welcome! For some terms I do not know the standard translation, so I use the English directly: list, node, layout.

A Bad Stack

First, review the concept of a list in functional programming. A recursively defined list is a sum type, which Rust calls an enum. In functional notation: List a = Empty | Elem a (List a)
First attempt:

pub enum List {
    Empty,
    Elem(i32, List),
}

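Compilation error! The type contains itself directly, so the compiler cannot compute a finite size for it. Solution: introduce Box. Box is a pointer type that manages a heap allocation; because a pointer has a fixed, known size, the size of List can now be determined.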

#[derive(Debug)]
pub enum List {
    Empty,
    Elem(i32, Box<List>),
}
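As a minimal sketch of how this Box-based enum is used, the following builds a two-element list and walks it. The helpers `two_elements` and `sum` are hypothetical names introduced only for illustration; they are not part of the original notes.

```rust
// The Box-based enum from the text.
#[derive(Debug)]
pub enum List {
    Empty,
    Elem(i32, Box<List>),
}

// Hypothetical helper: builds the list [1, 2].
// Note that the trailing Empty also needs its own heap allocation.
pub fn two_elements() -> List {
    List::Elem(1, Box::new(List::Elem(2, Box::new(List::Empty))))
}

// Hypothetical helper: sums the elements by following the boxes recursively.
pub fn sum(list: &List) -> i32 {
    match list {
        List::Empty => 0,
        List::Elem(value, rest) => value + sum(rest),
    }
}
```

Calling `sum(&two_elements())` from `main` or a test adds the two stored values.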

Taking a list containing two elements A and B as an example, the actual memory layout is as follows.

[] = Stack
() = Heap

[Elem A, ptr] -> (Elem B, ptr) -> (Empty, *junk*)

Note: the first element A is allocated on the stack (in the form Elem(A, Box<List>)), while the second element B lives on the heap inside A's Box<List>. The final Empty must therefore also be allocated on the heap, padded with *junk*. This layout has two problems: (1) a node is allocated for Empty, which is not really a node at all; (2) some elements are on the heap and one is on the stack, so the layout is not uniform. An improved layout is as follows:

[ptr] -> (Elem A, ptr) -> (Elem B, *null*)

This layout solves both problems: Empty is no longer a separate node, and all elements are on the heap. In the first layout, the Empty node still occupies as much space as a full Elem (hence the *junk*), because an enum value must be ready to become any of its variants at any time, so enough space for the largest variant is always reserved.

Splitting and merging lists exposes a further problem:

layout 1:

[Elem A, ptr] -> (Elem B, ptr) -> (Elem C, ptr) -> (Empty *junk*)

split off C:

[Elem A, ptr] -> (Elem B, ptr) -> (Empty *junk*)
[Elem C, ptr] -> (Empty *junk*)

layout 2:

[ptr] -> (Elem A, ptr) -> (Elem B, ptr) -> (Elem C, *null*)

split off C:

[ptr] -> (Elem A, ptr) -> (Elem B, *null*)
[ptr] -> (Elem C, *null*)

Comparing the two layouts, layout 1 loses one of the main advantages a linked list should have: rearranging elements just by moving pointers. Splitting in layout 2 only changes pointers, whereas layout 1 additionally has to copy element C from the heap to the stack, because its node layout is non-uniform.

How can the layout be improved? An intuitive first attempt:

pub enum List {
    Empty,
    ElemThenEmpty(i32),
    ElemThenNotEmpty(i32, Box<List>),
}

Obviously, this is not very good. It permits an invalid state, ElemThenNotEmpty(0, Box::new(Empty)), and although it saves one heap allocation (Empty never needs to be allocated), the nodes are still allocated non-uniformly. In fact, this layout occupies more space than the previous one, because the previous layout benefited from the null pointer optimization.

Normally an enum stores a tag (an integer indicating which variant the value is) next to its data. When an enum has one variant without data and one variant containing a pointer that can never be null, Rust can omit the tag and represent the empty variant as a null pointer. This is an important optimization in Rust. A notable consequence is that wrapping &, &mut, Box, Rc, Arc, Vec, etc. in an Option has no additional overhead.

To get a uniform layout while keeping this optimization, separate the node from the list:
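The null pointer optimization can be observed with `std::mem::size_of`. The sketch below compares pointer types with their `Option`-wrapped forms; the function names are hypothetical and exist only for this illustration.

```rust
use std::mem::size_of;

// Returns (size of Box<i32>, size of Option<Box<i32>>).
// A Box is never null, so None can be encoded as the null pointer.
pub fn box_sizes() -> (usize, usize) {
    (size_of::<Box<i32>>(), size_of::<Option<Box<i32>>>())
}

// Returns (size of &i32, size of Option<&i32>); references are never null either.
pub fn ref_sizes() -> (usize, usize) {
    (size_of::<&i32>(), size_of::<Option<&i32>>())
}

// Returns (size of i32, size of Option<i32>).
// Every bit pattern of i32 is a valid value, so the Option needs extra room for a tag.
pub fn int_sizes() -> (usize, usize) {
    (size_of::<i32>(), size_of::<Option<i32>>())
}
```

The first two pairs should contain equal sizes, while `Option<i32>` should be larger than `i32`.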

struct Node {
    elem: i32,
    next: List,
}

pub enum List {
    Empty,
    More(Box<Node>),
}

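All variants of a public enum are public, so if List is a pub enum, Node is exposed through More(Box<Node>) and would have to be public as well. Leaking this implementation detail is not a good design. The improvement is to hide the enum behind a struct: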

pub struct List {
    head: Link,
}

enum Link {
    Empty,
    More(Box<Node>),
}

struct Node {
    elem: i32,
    next: Link,
}

A struct with a single field has the same size as that field, so wrapping Link in List is a zero-cost abstraction!
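A quick way to check the zero-cost claim is to compare sizes. This is only a sketch: the function `layout_sizes` is a hypothetical helper, and the exact layout of default-representation types is a compiler detail rather than a language guarantee, although current rustc behaves as described.

```rust
use std::mem::size_of;

#[allow(dead_code)]
pub struct List {
    head: Link,
}

#[allow(dead_code)]
enum Link {
    Empty,
    More(Box<Node>),
}

#[allow(dead_code)]
struct Node {
    elem: i32,
    next: Link,
}

// Hypothetical helper: returns the sizes of List, Link and Box<Node>, in that order.
pub fn layout_sizes() -> (usize, usize, usize) {
    (size_of::<List>(), size_of::<Link>(), size_of::<Box<Node>>())
}
```

On current compilers all three values are expected to be one pointer wide: the struct adds nothing to Link, and Link uses the null pointer optimization for its Empty variant.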

Ownership

A method can take self in three primary forms:

  • self - value (the method takes ownership)
  • &mut self - mutable reference
  • &self - shared reference
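The three forms can be sketched on a small type. `Counter` and its methods are hypothetical and are used here only to show how each receiver behaves.

```rust
// Hypothetical type used only to illustrate the three receiver forms.
pub struct Counter {
    count: i32,
}

impl Counter {
    // &self: shared reference, read-only access; the caller keeps the value.
    pub fn get(&self) -> i32 {
        self.count
    }

    // &mut self: mutable reference, may modify the value in place.
    pub fn increment(&mut self) {
        self.count += 1;
    }

    // self: takes ownership; the caller cannot use the value afterwards.
    pub fn into_inner(self) -> i32 {
        self.count
    }
}

// Returns the count observed after one increment and the final count after two.
pub fn demo() -> (i32, i32) {
    let mut counter = Counter { count: 0 };
    counter.increment();
    let seen = counter.get();
    counter.increment();
    let last = counter.into_inner();
    // `counter` has been moved; using it here would be a compile error.
    (seen, last)
}
```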

Push & Pop

use std::mem;

impl List {
    pub fn push(&mut self, elem: i32) {
        let new_node = Box::new(Node {
            elem: elem,
            next: mem::replace(&mut self.head, Link::Empty),
        });
        self.head = Link::More(new_node);
    }
    
    pub fn pop(&mut self) -> Option<i32> {
        match mem::replace(&mut self.head, Link::Empty) {
            Link::Empty => None,
            Link::More(node) => {
                self.head = node.next;
                Some(node.elem)
            }
        }
    }
}

The implementation of push and pop is very interesting. We cannot move self.head out through a &mut self and leave it uninitialized, so mem::replace is used to take the old value while putting Link::Empty in its place. In push, the old head is taken this way and stored in the next field of new_node, and then self.head is assigned the new node. In pop, the old head is taken by value so that we can match on it, and inside the match self.head is assigned node.next.
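To see the stack behavior end to end, here is a self-contained sketch that repeats the definitions above and exercises them. The constructor `List::new` and the function `push_pop_demo` are assumptions added for the example; they do not appear in the notes above.

```rust
use std::mem;

pub struct List {
    head: Link,
}

enum Link {
    Empty,
    More(Box<Node>),
}

struct Node {
    elem: i32,
    next: Link,
}

impl List {
    // Hypothetical constructor: an empty list.
    pub fn new() -> Self {
        List { head: Link::Empty }
    }

    pub fn push(&mut self, elem: i32) {
        let new_node = Box::new(Node {
            elem,
            // Take the old head, leaving Link::Empty behind temporarily.
            next: mem::replace(&mut self.head, Link::Empty),
        });
        self.head = Link::More(new_node);
    }

    pub fn pop(&mut self) -> Option<i32> {
        match mem::replace(&mut self.head, Link::Empty) {
            Link::Empty => None,
            Link::More(node) => {
                self.head = node.next;
                Some(node.elem)
            }
        }
    }
}

// Records the result of each pop so the last-in, first-out order is visible.
pub fn push_pop_demo() -> Vec<Option<i32>> {
    let mut list = List::new();
    let mut seen = Vec::new();
    seen.push(list.pop()); // popping an empty list
    list.push(1);
    list.push(2);
    list.push(3);
    seen.push(list.pop());
    seen.push(list.pop());
    list.push(4); // push again after popping
    seen.push(list.pop());
    seen.push(list.pop());
    seen.push(list.pop()); // the list is empty again
    seen
}
```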

Drop

impl Drop for List {
    fn drop(&mut self) {
        let mut cur_link = mem::replace(&mut self.head, Link::Empty);
        // `while let` == "do this thing until this pattern doesn't match"
        while let Link::More(mut boxed_node) = cur_link {
            cur_link = mem::replace(&mut boxed_node.next, Link::Empty);
            // boxed_node goes out of scope and gets dropped here;
            // but its Node's `next` field has been set to Link::Empty
            // so no unbounded recursion occurs.
        }
    }
}

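The reason why this scheme is better than while let Some(_) = self.pop() {} is that it only moves pointers (Box<Node>) and never moves the values stored in the nodes. pop returns Option<i32>, so it moves each element out by value, which could be expensive if the list were generalized to hold large elements. One could instead consider a helper fn pop_node(&mut self) -> Link, from which both pop and drop can be derived.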
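The pop_node idea is only mentioned in passing, so the following is one possible sketch of it rather than a definitive implementation. The body of `pop_node`, the constructor `List::new`, and the two demo functions are assumptions made for this example.

```rust
use std::mem;

pub struct List {
    head: Link,
}

enum Link {
    Empty,
    More(Box<Node>),
}

struct Node {
    elem: i32,
    next: Link,
}

impl List {
    // Hypothetical constructor: an empty list.
    pub fn new() -> Self {
        List { head: Link::Empty }
    }

    pub fn push(&mut self, elem: i32) {
        let new_node = Box::new(Node {
            elem,
            next: mem::replace(&mut self.head, Link::Empty),
        });
        self.head = Link::More(new_node);
    }

    // Assumed helper: detaches the head node and returns it still boxed.
    // The detached node's `next` is reset to Empty, so only pointers move.
    fn pop_node(&mut self) -> Link {
        match mem::replace(&mut self.head, Link::Empty) {
            Link::Empty => Link::Empty,
            Link::More(mut node) => {
                self.head = mem::replace(&mut node.next, Link::Empty);
                Link::More(node)
            }
        }
    }

    // pop derived from pop_node: the element is copied out of the detached node.
    pub fn pop(&mut self) -> Option<i32> {
        match self.pop_node() {
            Link::Empty => None,
            Link::More(node) => Some(node.elem),
        }
    }
}

impl Drop for List {
    // drop derived from pop_node: each detached node has an Empty `next`,
    // so dropping it cannot recurse down the rest of the list.
    fn drop(&mut self) {
        while let Link::More(_) = self.pop_node() {}
    }
}

// Pops from a two-element list until it is empty.
pub fn pop_node_demo() -> (Option<i32>, Option<i32>, Option<i32>) {
    let mut list = List::new();
    list.push(1);
    list.push(2);
    (list.pop(), list.pop(), list.pop())
}

// Builds a long list, pops once, and lets the rest be dropped iteratively.
pub fn drop_long_list(n: i32) -> Option<i32> {
    let mut list = List::new();
    for i in 0..n {
        list.push(i);
    }
    list.pop()
}
```

With this design, pop and drop share one place where nodes are unlinked, at the cost of one extra mem::replace per node.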

Posted by mbrown on Wed, 24 Nov 2021 15:27:30 -0800