Interactive B+ Tree (C)

A B+ tree ("bee plus tree") is a data structure used as an index to facilitate fast access to the elements of a larger body of data, such as the entries in a database or the blocks of memory storage ("pages") in an operating system. Each target object (entry, page) is associated with an index key. The B+ tree is laid out like a family tree, where each node has some number of keys that is between some predetermined maximum limit and half that limit (inclusive). Each node also has one more pointer than the number of its keys. (A "pointer" is the address of a location in the computer's memory.) You can picture the node as having alternating pointers and keys, starting and ending with pointers.

At the bottom level of the B+ tree are the leaves. Each pointer on the leaf except the last (rightmost) one points to the data object whose key stands immediately to the right of that pointer. The rightmost pointer points to the next leaf over to the right. Then, each bunch of leaves has its own parent node. If there are enough of these parents, then they, in turn, are divided into bunches, each of which shares but one parent — and this one-parent-many-children family tree goes all the way up to a single ancestor at the top, the root. The internal nodes, which are the parents, grandparents, etc., of the leaf nodes, also have keys, which are (initially) duplicates of some of the keys on the leaves. A given internal node's keys are "representative" copies of a few of the keys to be found on the leaves that are the (ultimate) descendants of that node. The pointers on the internal nodes point to nodes at the next level down on the tree, which may be leaves or other internal nodes.

A search starts at the root node (top ancestor) of the tree. Each node has its keys in sorted order. Suppose you are searching for the data associated with some key. Starting at the root and working your way downward through the nodes, you find the smallest-value key in your current node whose value is greater than the value of the key you are seeking, and you follow the pointer to that key's left. If the sought key is greater than the rightmost key in the node, you follow the pointer at the node's extreme right. You repeat this procedure for each node you encounter as you move down the tree toward the leaves. When you reach a leaf, the key must be there if it exists in the tree at all. If it is, you follow the pointer to that key's left to reach the desired data. If the key is not in the leaf, then you know that no such data exist. Thus the whole search involves but a few hops through the ranks of the tree. To be more exact, the number of steps in the process is proportional to the logarithm of the whole number of keys. Thus, as data objects and keys are added, the length of the search path grows only very slowly.

The B+ tree preserves this efficient search capability even as you insert or remove data objects and their respective keys arbitrarily. It does so by allowing the nodes to grow and shrink as keys and data are inserted and deleted. When a node has too many keys, it splits in half. When it has too few, it coalesces with a neighboring node.

When you delete a datum with its key, it may happen that duplicates of that key in the internal nodes will remain. This does not harm the B+ tree's efficient operation: the key, though obsolete if it were in a leaf node, still works to direct the search path efficiently through the internal nodes.

The attached code is an implementation of a B+ tree, written in C, whose purpose is merely to show how it works. Once you've successfully compiled the program, you can insert or delete keys and see how the tree grows or shrinks. You can also trace a search path for a specific key through the tree — and various other interactive commands are available. (Enter ? at the prompt to see them all.)

I have also implemented the same algorithm in C++, making an effort to express the algorithm in the genuine idioms of C++. Accordingly, the source code appears rather different from that of the C version attached to this page. The C++ implementation uses features only available in C++'11 and later versions of the C++ programming language.

This interactive B+ tree demonstration is based on the description and pseudocode of the B+ tree presented in Abraham Silberschatz, Henry F. Korth, and S. Sudarshan, Database System Concepts, 5th ed. (New York: McGraw Hill, 2006), Section 12.3, 489-501. For more information on B+ trees in general, see B+ tree (Wikipedia).

You are allowed to download, use, redistribute, and change the source code according to the BSD 3-clause license, as set forth at the top of the source code file.

This code should be regarded as a "beta release." Please e-mail me (amittai dot aviram at gmail dot com) with comments, questions, bug reports, and corrections. Thank you!

Please download the source code file:

Then compile bpt.c using a C '99 compiler, such as GCC.

Recent Version History

Acknowledgements

Many thanks to the following kind users for sending me their bug reports and suggestions:

C++'11 implementation