Implementing a text buffer

Keywords: Android

The role of data structures and algorithms

  • Operations the buffer must provide: add, delete, set, and so on

Previous data structure

  • The editor's mental model is line based
  • Developers read and write source code line by line
  • Compilers provide line/column based diagnostics
  • Stack traces contain line numbers, tokenization engines run line by line, and so on

When entering and modifying content

  • Find the modified line in the array
  • Replace it

When inserting a new line

  • Splice the new line object into the line array
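
The line-array model described above can be sketched in a few lines. This is only an illustration, not the editor's actual code: the class name LineArrayBuffer and its method names are made up for this example.

```kotlin
// Hypothetical line-array buffer: one String object per line.
class LineArrayBuffer(content: String) {
    // Splitting allocates a new String for every line of the file
    private val lines: MutableList<String> = content.split("\n").toMutableList()

    val lineCount: Int get() = lines.size

    // Fast lookup: a line is just an index into the array (0-based)
    fun getLine(index: Int): String = lines[index]

    // Editing a line: find it in the array and replace it
    fun setLine(index: Int, text: String) {
        lines[index] = text
    }

    // Inserting a line: splice a new entry into the array
    fun insertLine(index: Int, text: String) {
        lines.add(index, text)
    }

    // Deleting a line: remove its entry from the array
    fun deleteLine(index: Int) {
        lines.removeAt(index)
    }

    // Rebuild the whole document
    fun text(): String = lines.joinToString("\n")
}
```

Line lookup is a single array access, which is the strength of this model. The per-line String allocation in the constructor is where the cost comes from.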

Shortcomings

  • A file with too many lines can exhaust memory
  • Opening a file is slow: the content is split on newline characters and a string object is created for each line, and the splitting itself hurts performance
  • Summary: the line array is slow to create and consumes a lot of memory, but it provides fast line lookup

Choosing a new data structure (piece table)

class PieceTable(
    // The starting content is read-only
    val original: String,
    // User added content
    var added: String,
    val nodes: MutableList<Node>
)

sealed class Node {
    data class Original(
        val offset: Int,
        val length: Int,
    ) : Node()
    data class Added(
        val offset: Int,
        val length: Int,
    ) : Node()
}

When opening a document for the first time

  • original holds the entire file content
  • added is empty
  • The node list contains a single node of type Node.Original that covers the whole file

When the user types at the end of the file

  • Append the new content to the added field
  • Append a new node of type Node.Added to the end of the node list

When the user edits in the middle of a node

  • Split the node and insert a new node as needed
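
The three cases above (first open, typing at the end, editing in the middle of a node) can be combined into one insert routine. The following is a minimal sketch under assumptions: the names Source, Piece, SimplePieceTable, insert, and text are hypothetical, deletion is left out, and pieces are kept in a plain list.

```kotlin
// Hypothetical names throughout; a sketch, not a complete piece table.
enum class Source { ORIGINAL, ADDED }

data class Piece(val source: Source, val offset: Int, val length: Int)

class SimplePieceTable(private val original: String) {
    private val added = StringBuilder()          // append-only buffer for user input
    private val pieces = mutableListOf<Piece>()

    var documentLength: Int = original.length
        private set

    val pieceCount: Int get() = pieces.size

    init {
        // On first open, a single piece covers the whole file
        if (original.isNotEmpty()) pieces.add(Piece(Source.ORIGINAL, 0, original.length))
    }

    // Insert `text` at document offset `pos` (0..documentLength)
    fun insert(pos: Int, text: String) {
        require(pos in 0..documentLength) { "offset $pos is outside the document" }
        if (text.isEmpty()) return
        val newPiece = Piece(Source.ADDED, added.length, text.length)
        added.append(text)
        documentLength += text.length

        var docOffset = 0 // document offset at which pieces[i] starts
        for (i in pieces.indices) {
            val p = pieces[i]
            if (pos < docOffset + p.length) {
                val head = pos - docOffset // characters of p that stay before the insert
                if (head == 0) {
                    pieces.add(i, newPiece) // on a piece boundary: no split needed
                } else {
                    // Split p into [head | newPiece | tail]
                    pieces[i] = Piece(p.source, p.offset, head)
                    pieces.add(i + 1, newPiece)
                    pieces.add(i + 2, Piece(p.source, p.offset + head, p.length - head))
                }
                return
            }
            docOffset += p.length
        }
        pieces.add(newPiece) // pos == documentLength: insert at the end of the file
    }

    // Rebuild the document by concatenating the slice each piece points to
    fun text(): String = buildString {
        for (p in pieces) {
            val buffer: CharSequence = if (p.source == Source.ORIGINAL) original else added
            append(buffer, p.offset, p.offset + p.length)
        }
    }
}
```

Neither buffer is ever rewritten: an edit only appends to the added buffer and adjusts the piece list, so an insert in the middle of a piece costs at most two extra pieces.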

Example

fun main() {
    val pieceTable = PieceTable(
        original = "This first line\n This second line\n This third line,This third line",
        added = "",
        nodes = mutableListOf()
    )
    // Simulate user input: new text goes into `added`, `original` is never modified
    pieceTable.added = "Add content"
    // "This first line"
    pieceTable.nodes.add(Node.Original(0, 15))
    // "Add " (the first 4 characters of `added`)
    pieceTable.nodes.add(Node.Added(0, 4))
    // "\n Th" (only 4 characters, so the rest of `original` is not rendered)
    pieceTable.nodes.add(Node.Original(15, 4))

    // Rebuild the document by concatenating the slice each node points to
    val strBuilder = StringBuilder()
    for (node in pieceTable.nodes) {
        when (node) {
            is Node.Original -> strBuilder.append(
                pieceTable.original, node.offset, node.offset + node.length
            )
            is Node.Added -> strBuilder.append(
                pieceTable.added, node.offset, node.offset + node.length
            )
        }
    }
    println(strBuilder)
}
  • Advantages: low memory usage
  • Disadvantages: there is no line index. To jump to line 1000, you have to scan from the beginning of the document until you have passed 999 line breaks
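
To make that cost concrete, here is a sketch of a line lookup without any line index. The helper name offsetOfLine is hypothetical, and for simplicity it scans already assembled text. A real piece table would have to walk its nodes and scan each slice the same way.

```kotlin
// Hypothetical helper: returns the offset at which line `lineNumber` (1-based) starts,
// or -1 if the text has fewer lines. The work grows with the offset that is returned.
fun offsetOfLine(text: CharSequence, lineNumber: Int): Int {
    require(lineNumber >= 1) { "line numbers are 1-based" }
    var offset = 0
    repeat(lineNumber - 1) {
        // Scan forward to the next line break
        val lineBreak = text.indexOf('\n', offset)
        if (lineBreak < 0) return -1
        offset = lineBreak + 1
    }
    return offset
}
```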

Use a cache to speed up line lookup

Each piece table node stores line-break information so that line content can be found faster

sealed class Node {
    data class Original(
        val offset: Int,
        val length: Int,
        val lineStarts: MutableList<Int>,
    ) : Node()
    data class Added(
        val offset: Int,
        val length: Int,
        val lineStarts: MutableList<Int>
    ) : Node()
}

For example, to access the second line in a given Node instance, read node.lineStarts[0] and node.lineStarts[1], which give the relative offsets at which that line begins and ends
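
A lookup that uses the cached line starts might look like the sketch below. It rests on two assumptions that the text does not spell out: lineStarts[i] is taken to be the offset, relative to the node, just after the node's (i+1)-th line break, and each node carries its text directly instead of pointing into a buffer. CachedNode and lineStartOffset are hypothetical names.

```kotlin
// Hypothetical, simplified node: `text` stands in for the buffer slice the node points to.
class CachedNode(val text: String) {
    // Assumption: lineStarts[i] is the node-relative offset just after the (i+1)-th line break
    val lineStarts: List<Int> = text.indices.filter { text[it] == '\n' }.map { it + 1 }
}

// Returns the document offset at which line `lineNumber` (1-based) starts,
// or -1 if the document has fewer lines.
fun lineStartOffset(nodes: List<CachedNode>, lineNumber: Int): Int {
    require(lineNumber >= 1) { "line numbers are 1-based" }
    if (lineNumber == 1) return 0
    var breaksNeeded = lineNumber - 1 // line breaks still to skip
    var docOffset = 0                 // document offset of the current node
    for (node in nodes) {
        if (breaksNeeded <= node.lineStarts.size) {
            // The target line break is inside this node: one list lookup, no character scan
            return docOffset + node.lineStarts[breaksNeeded - 1]
        }
        breaksNeeded -= node.lineStarts.size
        docOffset += node.text.length
    }
    return -1
}
```

The characters are no longer scanned, but the nodes are still visited one by one from the start of the list.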

Avoid the string concatenation trap

  • New data structure
  • Avoid any string concatenation: read one chunk from disk at a time, push it directly into buffers, and create a node that points to that buffer

data class PieceTable (
    // original and added are replaced by a list of buffers
    val buffers: MutableList<String>,
    val nodes: MutableList<Node>
)

data class Node(
    val bufferIndex: Int,
    val start: Int, // start offset in buffers[bufferIndex]
    val length: Int,
    val lineStarts: MutableList<Int>
)
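
Loading a file this way could look roughly like the following sketch. The names ChunkNode, ChunkedPieceTable, and loadInChunks are hypothetical, the default chunk size is an arbitrary choice, and lineStarts is left out to keep the example short.

```kotlin
import java.io.Reader

// Hypothetical names; lineStarts is omitted for brevity.
data class ChunkNode(val bufferIndex: Int, val start: Int, val length: Int)

class ChunkedPieceTable(
    val buffers: MutableList<String> = mutableListOf(),
    val nodes: MutableList<ChunkNode> = mutableListOf(),
)

// Reads `reader` in chunks of at most `chunkSize` characters.
// Each chunk becomes its own buffer plus one node pointing at it,
// so the chunks are never concatenated into one large string.
fun loadInChunks(reader: Reader, chunkSize: Int = 64 * 1024): ChunkedPieceTable {
    require(chunkSize > 0) { "chunkSize must be positive" }
    val table = ChunkedPieceTable()
    val chunk = CharArray(chunkSize)
    while (true) {
        val read = reader.read(chunk, 0, chunkSize)
        if (read < 0) break // end of input
        table.buffers.add(String(chunk, 0, read))
        table.nodes.add(ChunkNode(table.buffers.lastIndex, 0, read))
    }
    return table
}
```

Calling loadInChunks(File(path).bufferedReader()) would be one way to use it. User edits can then go into an additional buffer appended to the same list.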

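Accelerating line lookup using a balanced binary tree

With a node list, jumping to a line means walking the nodes from the beginning until the node containing that line is found. Storing the nodes in a balanced binary tree, where each node records the text length and line-break count of its left subtree, lets the search descend from the root instead.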

data class PieceTable (
    val buffers: MutableList<String>,
    val rootNode: Node
)

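// A plain class rather than a data class: the generated equals/hashCode/toString
// of a data class would recurse endlessly through the parent/child links
class Node(
    val bufferIndex: Int,
    val start: Int, // start offset in buffers[bufferIndex]
    val length: Int,
    val lineStarts: MutableList<Int>,

    // Metadata about the left subtree, updated whenever the tree changes
    var leftSubtreeLength: Int = 0,
    var leftSubtreeLfcnt: Int = 0, // number of line breaks in the left subtree
    // Links are nullable: a leaf has no children and the root has no parent
    var left: Node? = null,
    var right: Node? = null,
    var parent: Node? = null,
)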
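
The left-subtree metadata is what makes the descent possible. Below is a sketch of a lookup by document offset. TreeNode and findByOffset are hypothetical names, only the fields needed for the search are kept, and rebalancing is not shown. A lookup by line number would work the same way using the line-break count.

```kotlin
// Hypothetical, stripped-down tree node: only the fields needed for an offset lookup.
class TreeNode(
    val length: Int,                // length of this node's own piece
    var leftSubtreeLength: Int = 0, // total length of all pieces in the left subtree
    var left: TreeNode? = null,
    var right: TreeNode? = null,
)

// Returns the node containing document offset `offset` together with the offset
// inside that node, or null if `offset` is past the end of the document.
fun findByOffset(root: TreeNode?, offset: Int): Pair<TreeNode, Int>? {
    require(offset >= 0) { "offset must not be negative" }
    var node = root
    var remaining = offset
    while (node != null) {
        when {
            // Target lies in the left subtree
            remaining < node.leftSubtreeLength -> node = node.left
            // Target lies in this node's own piece
            remaining < node.leftSubtreeLength + node.length ->
                return node to (remaining - node.leftSubtreeLength)
            // Target lies to the right: skip the left subtree and this piece
            else -> {
                remaining -= node.leftSubtreeLength + node.length
                node = node.right
            }
        }
    }
    return null
}
```

Each step discards a whole subtree, so in a balanced tree the number of visited nodes grows with the height of the tree rather than with the number of pieces.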

Reduce object allocation

class Buffer(
    val value: String,
    val lineStarts: MutableList<Int>
)

class BufferPosition(
    val index: Int, // index in Buffer.lineStarts
    val remainder: Int,
)

data class PieceTable (
    val buffers: MutableList<Buffer>,
    val rootNode: Node
)

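// A plain class rather than a data class: the generated equals/hashCode/toString
// of a data class would recurse endlessly through the parent/child links
class Node(
    val bufferIndex: Int,
    // start and end are positions in buffers[bufferIndex]; line starts are kept
    // once per Buffer, so the node no longer needs its own lineStarts list
    val start: BufferPosition,
    val end: BufferPosition,

    var leftSubtreeLength: Int = 0,
    var leftSubtreeLfcnt: Int = 0, // number of line breaks in the left subtree
    // Links are nullable: a leaf has no children and the root has no parent
    var left: Node? = null,
    var right: Node? = null,
    var parent: Node? = null,
)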
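
How a position made of a line index and a remainder maps back to a plain offset can be sketched as follows. The names LineBuffer, Position, offsetOf, pieceText, and lineBreakCount are hypothetical. The sketch also assumes that lineStarts begins with 0, so that index 0 denotes the buffer's first line, and that a piece is delimited by two positions, the second one exclusive.

```kotlin
// Hypothetical names; a sketch of resolving (index, remainder) positions.
class LineBuffer(val value: String) {
    // Assumption: lineStarts[0] is 0, each later entry is the offset just after a line break
    val lineStarts: List<Int> =
        listOf(0) + value.indices.filter { value[it] == '\n' }.map { it + 1 }
}

data class Position(
    val index: Int,     // index into LineBuffer.lineStarts
    val remainder: Int, // characters past the start of that line
)

// Absolute offset in the buffer = start of the line + characters into that line
fun offsetOf(buffer: LineBuffer, position: Position): Int =
    buffer.lineStarts[position.index] + position.remainder

// Text of a piece that runs from `start` (inclusive) to `end` (exclusive)
fun pieceText(buffer: LineBuffer, start: Position, end: Position): String =
    buffer.value.substring(offsetOf(buffer, start), offsetOf(buffer, end))

// Number of line breaks inside the piece, computed without scanning any characters
fun lineBreakCount(start: Position, end: Position): Int = end.index - start.index
```

Because the line starts are computed once per buffer, a node only needs two small position objects. Splitting a node creates new positions that point into the same shared list.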

Posted by terry2 on Fri, 29 Oct 2021 01:27:50 -0700