The role of data structures and algorithms
- Operations provided: for example add, delete, and set
Previous data structure (array of lines)
- The editor's mental model is line based
- Developers read and write source code line by line
- The compiler provides line/column based diagnostics
- Stack traces contain line numbers, tokenization engines run line by line, and so on
When the user types or modifies content
- Find the modified line in the array
- Replace it
When inserting a new line
- Splice the new line object into the line array
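The two operations above can be sketched as follows. This is a minimal illustration only: `LineArrayBuffer`, `replaceLine`, and `insertLine` are hypothetical names, not taken from any real editor.

```kotlin
// Hypothetical line-array buffer: one String object per line.
class LineArrayBuffer(text: String) {
    // Splitting the content creates every line object up front.
    val lines: MutableList<String> = text.split("\n").toMutableList()

    // Modify: look the line up by index and replace it.
    fun replaceLine(lineNumber: Int, newText: String) {
        lines[lineNumber] = newText
    }

    // Insert: splice a new line into the array; later lines shift by one.
    fun insertLine(lineNumber: Int, newText: String) {
        lines.add(lineNumber, newText)
    }

    // Rebuild the full document text.
    fun text(): String = lines.joinToString("\n")
}
```

Looking a line up is a single index operation, which is what makes this model attractive despite its cost.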
Shortcomings
- Files with a very large number of lines can exhaust memory, because every line is a separate object
- Opening a file is slow: the content is split on newline characters and a string object is created for each line, and the splitting itself hurts performance
- Summary: the line array takes a long time to create and consumes a lot of memory, but it provides fast line lookup
Select the new data structure (Piece table)
    class PieceTable(
        // The starting content is read-only
        val original: String,
        // User added content
        var added: String,
        val nodes: MutableList<Node>
    )

    sealed class Node {
        data class Original(
            val offset: Int,
            val length: Int,
        ) : Node()

        data class Added(
            val offset: Int,
            val length: Int,
        ) : Node()
    }
When opening a document for the first time
- original holds the entire file content
- added is empty
- The node list contains a single node of type Node.Original
When the user types at the end of the file
- Append the new content to the added field
- Insert a new node of type Node.Added at the end of the node list
When the user edits in the middle of a node
- Split the node and insert a new node as needed
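The three cases above (first open, typing at the end, editing in the middle) can be sketched in one insert routine. This is a simplified illustration under stated assumptions: `SimplePieceTable`, `Piece`, and `Source` are hypothetical names, and deletion is left out.

```kotlin
// Hypothetical names throughout; a sketch, not an editor's actual implementation.
enum class Source { ORIGINAL, ADDED }

data class Piece(val source: Source, val offset: Int, val length: Int)

class SimplePieceTable(private val original: String) {
    private val added = StringBuilder()
    val pieces = mutableListOf<Piece>()

    init {
        // First open: a single piece covering the whole original text.
        if (original.isNotEmpty()) pieces.add(Piece(Source.ORIGINAL, 0, original.length))
    }

    fun insert(position: Int, text: String) {
        require(position in 0..pieces.sumOf { it.length }) { "position out of range" }
        if (text.isEmpty()) return
        // New text is only ever appended to the add buffer.
        val newPiece = Piece(Source.ADDED, added.length, text.length)
        added.append(text)

        var docOffset = 0 // document offset at which pieces[i] starts
        for (i in pieces.indices) {
            val piece = pieces[i]
            val within = position - docOffset
            if (within < piece.length) {
                if (within == 0) {
                    // Insert exactly at a piece boundary: no split needed.
                    pieces.add(i, newPiece)
                } else {
                    // Insert in the middle: split the piece around the new one.
                    pieces[i] = Piece(piece.source, piece.offset, within)
                    pieces.add(i + 1, newPiece)
                    pieces.add(i + 2, Piece(piece.source, piece.offset + within, piece.length - within))
                }
                return
            }
            docOffset += piece.length
        }
        // Position is the end of the document: append a new piece.
        pieces.add(newPiece)
    }

    // Rebuild the document by walking the pieces in order.
    fun text(): String = buildString {
        for (p in pieces) {
            val buffer: CharSequence = if (p.source == Source.ORIGINAL) original else added
            append(buffer, p.offset, p.offset + p.length)
        }
    }
}
```

For instance, starting from "Hello world", inserting "!" at position 11 and then "," at position 5 leaves four pieces that read back as "Hello, world!". Neither buffer is ever rewritten; only the piece list changes.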
Example
    fun main() {
        val pieceTable = PieceTable(
            original = "This first line\n This second line\n This third line,This third line",
            added = "",
            nodes = mutableListOf()
        )
        pieceTable.added = "Add content"
        // Pieces in document order: 15 chars of original, 4 chars of added, 4 more chars of original
        pieceTable.nodes.add(Node.Original(0, 15))
        pieceTable.nodes.add(Node.Added(0, 4))
        pieceTable.nodes.add(Node.Original(15, 4))

        val strBuilder = StringBuilder()
        for (node in pieceTable.nodes) {
            when (node) {
                // Copy each piece's range straight from its buffer, without an intermediate char array
                is Node.Original ->
                    strBuilder.append(pieceTable.original, node.offset, node.offset + node.length)
                is Node.Added ->
                    strBuilder.append(pieceTable.added, node.offset, node.offset + node.length)
            }
        }
        println(strBuilder)
    }
- Advantages: low memory usage
- Disadvantages: there is no line index. To jump to line 1000, you have to count line breaks character by character from the beginning of the document
Use a cache to speed up line lookup
Each piece table node stores the offsets of its line breaks, so line content can be found faster
    sealed class Node {
        data class Original(
            val offset: Int,
            val length: Int,
            val lineStarts: MutableList<Int>,
        ) : Node()

        data class Added(
            val offset: Int,
            val length: Int,
            val lineStarts: MutableList<Int>
        ) : Node()
    }
For example, to access the second line in a given Node instance, you can read node.lineStarts[0] and node.lineStarts[1], which give the relative offsets at which that line begins and ends.
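A small sketch of that lookup, assuming lineStarts holds the offset just after each line break, relative to the start of the node's text. `computeLineStarts` and `lineOfPiece` are hypothetical helpers introduced only for illustration.

```kotlin
// Offsets (relative to the start of the piece) at which a new line begins,
// that is, the position just after each '\n'. Hypothetical helper.
fun computeLineStarts(text: String): List<Int> {
    val starts = mutableListOf<Int>()
    for (i in text.indices) {
        if (text[i] == '\n') starts.add(i + 1)
    }
    return starts
}

// Returns line n (0-based) of the piece, including its trailing line break if present.
fun lineOfPiece(text: String, lineStarts: List<Int>, n: Int): String {
    require(n in 0..lineStarts.size) { "line out of range" }
    val begin = if (n == 0) 0 else lineStarts[n - 1]
    val end = if (n < lineStarts.size) lineStarts[n] else text.length
    return text.substring(begin, end)
}
```

With the text "first\nsecond\nthird", the computed line starts are 6 and 13, so the second line is the range from lineStarts[0] to lineStarts[1] and no scan over the characters is needed.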
Avoid the string concatenation trap
- New data structure: original and added are replaced by a list of buffers
- Avoid all string concatenation: read one chunk from disk at a time, push it directly into buffers, and create a node pointing to that buffer
    data class PieceTable(
        // Change original and added to buffers
        val buffers: MutableList<String>,
        val nodes: MutableList<Node>
    )

    data class Node(
        val bufferIndex: Int,
        val start: Int, // start offset in buffers[bufferIndex]
        val length: Int,
        val lineStarts: MutableList<Int>
    )
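One way to picture loading a file into this structure is the sketch below. It assumes the chunks have already been read from disk as strings; `ChunkedPieceTable`, `ChunkNode`, and `addChunk` are hypothetical names.

```kotlin
// Hypothetical node: a range inside one buffer plus its cached line starts.
data class ChunkNode(
    val bufferIndex: Int,
    val start: Int,
    val length: Int,
    val lineStarts: List<Int>,
)

class ChunkedPieceTable {
    val buffers = mutableListOf<String>()
    val nodes = mutableListOf<ChunkNode>()

    // Store the chunk as its own buffer; nothing is concatenated.
    fun addChunk(chunk: String) {
        if (chunk.isEmpty()) return
        // Offsets just after each line break inside this chunk.
        val lineStarts = chunk.indices.filter { chunk[it] == '\n' }.map { it + 1 }
        buffers.add(chunk)
        nodes.add(ChunkNode(buffers.lastIndex, 0, chunk.length, lineStarts))
    }

    // Only for inspection in this sketch: joins the pieces into one string.
    fun text(): String = buildString {
        for (n in nodes) append(buffers[n.bufferIndex], n.start, n.start + n.length)
    }
}
```

A line may span two chunks, as happens when a chunk boundary falls mid-line. The node list still describes the document correctly because the pieces are read back in order.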
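Accelerating line lookup using a balanced binary tree
With a flat node list, jumping to a line means walking the nodes from the beginning until you reach the node that contains it. Storing the nodes in a balanced binary tree, where each node records the text length and line-break count of its left subtree, lets the lookup descend from the root instead.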
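    data class PieceTable(
        val buffers: MutableList<String>,
        var rootNode: Node? // null while the document is empty
    )

    // A plain class rather than a data class: nodes point at each other,
    // so generated equals/hashCode/toString would recurse through the cycle.
    class Node(
        val bufferIndex: Int,
        val start: Int, // start offset in buffers[bufferIndex]
        val length: Int,
        val lineStarts: MutableList<Int>,
        var leftSubtreeLength: Int,
        var leftSubtreeLfcnt: Int, // Number of line breaks in the left subtree
        // Links are nullable: leaves have no children and the root has no parent
        var left: Node? = null,
        var right: Node? = null,
        var parent: Node? = null,
    )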
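The descent that leftSubtreeLfcnt makes possible can be sketched like this. It is a simplified illustration: `TreeNode`, `lfCount`, and `findLineBreak` are hypothetical, the tree is built by hand, and no balancing is shown.

```kotlin
// Hypothetical tree node: lfCount is the number of line breaks in this node's own piece.
class TreeNode(
    val name: String,
    val lfCount: Int,
    val left: TreeNode? = null,
    val right: TreeNode? = null,
) {
    // Line breaks in the whole subtree rooted here. Computed once for this sketch;
    // a real tree would have to maintain it during rotations and edits.
    val subtreeLfCount: Int = (left?.subtreeLfCount ?: 0) + lfCount + (right?.subtreeLfCount ?: 0)
    val leftSubtreeLfcnt: Int = left?.subtreeLfCount ?: 0
}

// Finds the node holding the k-th line break of the document (1-based) and the
// 0-based index of that break within the node. Returns null if k is out of range.
fun findLineBreak(root: TreeNode?, k: Int): Pair<TreeNode, Int>? {
    if (k < 1) return null
    var node = root
    var remaining = k
    while (node != null) {
        if (remaining <= node.leftSubtreeLfcnt) {
            // The break lies somewhere in the left subtree.
            node = node.left
        } else if (remaining <= node.leftSubtreeLfcnt + node.lfCount) {
            // The break lies in this node's own piece.
            return node to (remaining - node.leftSubtreeLfcnt - 1)
        } else {
            // Skip the left subtree and this node, then continue on the right.
            remaining -= node.leftSubtreeLfcnt + node.lfCount
            node = node.right
        }
    }
    return null
}
```

Each step discards a whole subtree, so on a balanced tree the number of nodes visited grows with the height of the tree, not with the number of pieces.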
Reduce object allocation
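    // Line starts are computed once per buffer and shared by every node that points into it
    class Buffer(
        val value: String,
        val lineStarts: MutableList<Int>
    )

    class BufferPosition(
        val index: Int, // index in Buffer.lineStarts
        val remainder: Int, // offset from the start of that line
    )

    data class PieceTable(
        val buffers: MutableList<Buffer>,
        var rootNode: Node? // null while the document is empty
    )

    // Nodes no longer carry their own lineStarts list; they refer to the buffer's
    class Node(
        val bufferIndex: Int,
        val start: BufferPosition, // start position in buffers[bufferIndex]
        val end: BufferPosition, // end position in buffers[bufferIndex]
        var leftSubtreeLength: Int,
        var leftSubtreeLfcnt: Int, // Number of line breaks in the left subtree
        var left: Node? = null,
        var right: Node? = null,
        var parent: Node? = null,
    )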
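To show how a BufferPosition resolves back to a plain offset, here is a minimal sketch. It assumes that lineStarts[0] is 0 (so every line, including the first, has an entry) and that a node is described by a start and an end position. `createBuffer`, `offsetInBuffer`, and `pieceText` are hypothetical helpers.

```kotlin
class Buffer(val value: String, val lineStarts: List<Int>)

data class BufferPosition(
    val index: Int,     // index in Buffer.lineStarts
    val remainder: Int, // offset from the start of that line
)

// Assumption for this sketch: lineStarts[0] == 0 and lineStarts[i] is the
// offset at which line i begins.
fun createBuffer(value: String): Buffer {
    val starts = mutableListOf(0)
    for (i in value.indices) {
        if (value[i] == '\n') starts.add(i + 1)
    }
    return Buffer(value, starts)
}

// A position is "line index + offset within the line", so the absolute
// offset is a single list lookup plus an addition.
fun offsetInBuffer(buffer: Buffer, pos: BufferPosition): Int =
    buffer.lineStarts[pos.index] + pos.remainder

// Text of a piece delimited by two positions in the same buffer.
fun pieceText(buffer: Buffer, start: BufferPosition, end: BufferPosition): String =
    buffer.value.substring(offsetInBuffer(buffer, start), offsetInBuffer(buffer, end))
```

Because the positions refer to the buffer's shared lineStarts list, a node needs only two small position objects and no line-start list of its own.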