B+Tree

B+Tree #

A B+Tree is a self balanced tree data structure that keeps data sorted and allows searches, sequential access, insertion, and deletions in O(log(n)). It is optimized for disk-oriented DBMSs that read/write large blocks of data.

Almost every modern DBMS that supports order-preserving indexes uses a B+Tree. There is a specific data structure called a B-Tree, but people also use the term to generally refer to a class of data structures. Modern B+Tree implementations combine features from other B-Tree variants, such as the sibling pointers used in the Blink-Tree.

Formally, a B+Tree is an M-way search tree with the following properties:

  1. Every inner node other than the root is at least half full, i.e.,
    • ceil of M/2 <= num of children <= M
    • (ceil of M/2) − 1 <= num of keys <= M − 1
  2. It is perfectly balanced (i.e., every leaf node is at the same depth).
  3. Every inner node with k keys has k+1 non-null children.
  4. Root node can have minimum 2 children(i.e, at least one key).

Every node in a B+Tree contains an array of key/value pairs:

  1. Arrays at every node are (almost) sorted by the keys.
  2. The value array for inner nodes will contain pointers to other nodes.
  3. Two approaches for leaf node values
    1. Record IDs: A pointer to the location of the tuple
    2. Tuple Data: The actual contents of the tuple is stored in the leaf node

Degree & Order #

Degree represents the Lower bound on the number of children a node in the B Tree can have (except for the root). i.e the minimum number of children possible. Whereas the Order represents the Upper bound on the number of children. ie. the maximum number possible.

Assume the Order of a B tree is U and Degree is L, then L = ceil of U/2.

More specifically, the ceil can be interpreted as, If U-1(max number of keys) is odd, then U is even and U=2L, otherwise U is odd and U=2L-1.

Insersion I #

All insertions start at a leaf node. To insert a new element, search the tree to find the leaf node where the new element should be added. Insert the new element into that node with the following steps:

  1. If the node contains fewer than the maximum allowed number of elements, then there is room for the new element. Insert the new element in the node, keeping the node’s elements ordered.
  2. Otherwise the node is full, evenly split it into two nodes so:
    1. A single median is chosen from among the leaf’s elements and the new element that is being inserted.
    2. Values less than the median are put in the new left node and values greater than the median are put in the new right node, with the median acting as a separation value.
    3. The separation value is inserted in the node’s parent, which may cause it to be split, and so on. If the node has no parent (i.e., the node was the root), create a new root above this node (increasing the height of the tree).

If the splitting goes all the way up to the root, it creates a new root with a single separator value and two children, which is why the lower bound on the size of internal nodes does not apply to the root. The maximum number of elements per node is U−1. When a node is split, one element moves to the parent, but one element is added. So, it must be possible to divide the maximum number U−1 of elements into two legal nodes. If this number(U-1) is odd, then U=2L and one of the new nodes contains (U−2)/2 = L−1 elements, and hence is a legal node, and the other contains one more element, and hence it is legal too. If U−1 is even, then U=2L−1, so there are 2L−2 elements in the node. Half of this number is L−1, which is the minimum number of elements allowed per node.

Insertion II #

An alternative algorithm supports a single pass down the tree from the root to the node where the insertion will take place, splitting any full nodes encountered on the way preemptively. This prevents the need to recall the parent nodes into memory, which may be expensive if the nodes are on secondary storage. However, to use this algorithm, we must be able to send one element to the parent and split the remaining U−2 elements into two legal nodes, without adding a new element. This requires U = 2L rather than U = 2L−1, which accounts for why some textbooks impose this requirement in defining B-trees.

Deletion #

There are two popular strategies for deletion from a B-tree.

  1. Locate and delete the item, then restructure the tree to retain its invariants, OR
  2. Do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring

Deletion I #

Locate and delete the item, then restructure the tree to retain its invariants, There are two special cases to consider when deleting an element:

  1. The element in an internal node is a separator for its child nodes
  2. Deleting an element may put its node under the minimum number of elements and children

The procedures for these cases are in order below.

Deletion from a leaf node #

  1. Search for the value to delete.
  2. If the value is in a leaf node, simply delete it from the node.
  3. If underflow happens, rebalance the tree as described in section “Rebalancing after deletion” below

Deletion from an internal node #

Each element in an internal node acts as a separation value for two subtrees, therefore we need to find a replacement for separation. Note that the largest element in the left subtree is still less than the separator. Likewise, the smallest element in the right subtree is still greater than the separator. Both of those elements are in leaf nodes, and either one can be the new separator for the two subtrees. Algorithmically described below:

  1. Choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator.
  2. The previous step deleted an element (the new separator) from a leaf node. If that leaf node is now deficient (has fewer than the required number of nodes), then rebalance the tree starting from the leaf node.

Rebalancing after deletion #

Rebalancing starts from a leaf and proceeds toward the root until the tree is balanced. If deleting an element from a node has brought it under the minimum size, then some elements must be redistributed to bring all nodes up to the minimum. Usually, the redistribution involves moving an element from a sibling node that has more than the minimum number of nodes. That redistribution operation is called a rotation. If no sibling can spare an element, then the deficient node must be merged with a sibling. The merge causes the parent to lose a separator element, so the parent may become deficient and need rebalancing. The merging and rebalancing may continue all the way to the root. Since the minimum element count doesn’t apply to the root, making the root be the only deficient node is not a problem. The algorithm to rebalance the tree is as follows:

  • If the deficient node’s right sibling exists and has more than the minimum number of elements, then rotate left

    1. Copy the separator from the parent to the end of the deficient node (the separator moves down; the deficient node now has the minimum number of elements)
    2. Replace the separator in the parent with the first element of the right sibling (right sibling loses one node but still has at least the minimum number of elements)
    3. The tree is now balanced
  • Otherwise, if the deficient node’s left sibling exists and has more than the minimum number of elements, then rotate right

    1. Copy the separator from the parent to the start of the deficient node (the separator moves down; deficient node now has the minimum number of elements)
    2. Replace the separator in the parent with the last element of the left sibling (left sibling loses one node but still has at least the minimum number of elements)
    3. The tree is now balanced
  • Otherwise, if both immediate siblings have only the minimum number of elements, then merge with a sibling sandwiching their separator taken off from their parent

    1. Copy the separator to the end of the left node (the left node may be the deficient node or it may be the sibling with the minimum number of elements)
    2. Move all elements from the right node to the left node (the left node now has the maximum number of elements, and the right node – empty)
    3. Remove the separator from the parent along with its empty right child (the parent loses an element)
      • If the parent is the root and now has no elements, then free it and make the merged node the new root (tree becomes shallower)
      • Otherwise, if the parent has fewer than the required number of elements, then rebalance the parent[21]

    Note: The rebalancing operations are different for B+ trees (e.g., rotation is different because parent has copy of the key) and B*-tree (e.g., three siblings are merged into two siblings).

B+TREE VISUALIZATION #

https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

B-Tree FAMILY #

There is a specific data structure called a B-Tree, people also use the term to generally refer to a class of balanced tree data structures:

  • B-Tree(1971)
  • B+Tree(1973)
  • B*Tree(1977?)
  • Blink-Tree (1981)

Reference #

https://en.wikipedia.org/wiki/B-tree

https://en.wikipedia.org/wiki/B%2B_tree