Home B-Trees
Post
Cancel

B-Trees

B-Trees

  • BSTs can either be bushy or spindly.
    • A spindly BST tree is slow to access elements. Similar to a linked list. $O(N)$ time for accessing elements.
    • A bushy BST is fast to access elements, $O(\log N)$ time to access elements.
  • This motivates us to construct a “Balancing-Tree”, a tree that always attempts to maintain a bushy structure.

    Note about Big-Theta versus Big-O. Big Theta is typically more informative than Big O as it restricts us to one family of functions. Something that is O(N) could also be O(N^2). The upper bound can keep going up without consequence.

B-Tree Operations

  • BST performance is reliant on Height and Depth
    • Height: The depth of the deepest leaf and is that tree-wide property.
    • Depth: The distance from the root of a node and is node-specific.
    • Average Depth: The mean of the depth of every node.
  • The core-concept behind a B-Tree is the ability to overstuff the leaf nodes.
    • If we can avoid it, we do not add a new node but rather stick the new value into an existing leaf node at some determined location. Overstuffed Node
  • The problem with overstuffing is that with sufficiently enough items in a node, searchign sequentially within a node requires linear search.

Moving items up.

  • Once a node becomes saturated with enough values, we may move up a value to the parent node.
    • For simplicity, we always move the middle element.
    • However, this may cause an issue where there exists values smaller than the moved value to the right of the moved element: Splitting Overstuffed Node
      • In this case, the value of 16 is greater than 17, but it is still to the right of 17. This breaks the BST property.
  • To address this, we create many children for each parent overstuffed parent node. Intervals of Node
  • In the above example, we have nodes representing intervals for $(-\infty, 15), [15,17], (17, +\infty)$
  • We can set some constant $L$ as the limit on our node size.
    • This would make our search time increase only by a constant factor, so we preserve our asymptotic behavior.
  • When we add a node, we may cause a chain reaction where parent nodes successively split. Chain Reaction Splitting
  • A B-Tree can have different names based on the number of children it may have in a node.
    • We work with 2-3 trees, which contains a limit of 2 items per node, where each node can have 2 or three children.

B-Tree Usage

  • B-Trees are used for two main contexts
    • We use a small L for self-balancing search trees.
    • We use a large L in the thousands for databses and filesystems.
  • The construction of a B-Tree grants it two invariants:
    • All leaves are the same distance from the root.
    • A non-leaf node with k items must have exactly k+1 children.
  • This guaratees us a B-Tree with $\log N$ height

    B-Tree Runtime Analysis

  • let $L$ be the maximum items per node and $N$ be the number of items. The maximum height should be $\log_{L+1} N$ in the best case when all nodes have $L$ items and $\log_{2}N$ worst case when each node has 1 item.
    • Overall height is on the order of $\Theta (\log N)$ Worst-case B-Tree height Best-case B-Tree height

      Runtime for contains

  • Worst case forces us to look at up to $L$ items per node, but since height is logarithmic, the runtime of contains is bounded by $O(L \log N)$. This tells use that the runtime of contains is $O(\log N)$

    Runtime for add

  • The logic for add is similar to contains except add might break up certain nodes. The act of breaking up nodes results in doing $\log N$ additional split operations. THus means that the runtime of add is still $O(\log N)$
This post is licensed under CC BY 4.0 by the author.

CS61B: Lecture 16

CS61B: Lecture 17