Introduction to B+ Trees
B+ trees are a specialized tree data structure commonly used for indexing and managing large datasets, particularly in the context of databases and file systems. They are a variant of the more general B-tree data structure, with some key differences that make them better suited for certain applications.
The primary distinction between B-trees and B+ trees is that B+ trees store all data records (or "values") in the leaf nodes, while the internal nodes only contain "keys" that are used for navigating the tree. In contrast, B-trees can store both keys and values in their internal nodes. This design choice in B+ trees provides several advantages, such as faster searches, more efficient storage utilization, and easier maintenance of the tree structure.
Some of the key properties and characteristics of B+ trees include:
Order (M): The order of a B+ tree determines the maximum number of children a node can have. A B+ tree of order M can have up to M child nodes and store up to M-1 keys.
Leaf Nodes: All data records (or "values") are stored in the leaf nodes of the B+ tree. The leaf nodes are linked together in a doubly-linked list, which allows for efficient range queries and sequential access.
Internal Nodes: The internal nodes of a B+ tree only store keys, which are used to guide the search process. These nodes do not store any actual data records.
Balanced Tree: B+ trees are self-balancing data structures, meaning that the height of the tree is kept relatively low, ensuring efficient search, insertion, and deletion operations.
Minimum Occupancy: Each node in a B+ tree (except the root) must be at least half-full, meaning it must contain at least ⌈M/2⌉ – 1 keys.
Root Node: The root node of a B+ tree can have between 1 and M-1 keys, and between 2 and M child nodes.
These properties and characteristics of B+ trees make them well-suited for managing large datasets, particularly in scenarios where efficient indexing, range queries, and sequential access are important.
Insertion Process in B+ Trees
The insertion process in a B+ tree involves adding a new key-value pair to the appropriate leaf node of the tree. If the leaf node has sufficient space to accommodate the new key, the insertion is straightforward. However, if the leaf node is already full (i.e., it contains the maximum number of keys), the node must be split to maintain the B+ tree properties.
The general steps for inserting a new key-value pair into a B+ tree are as follows:
Locate the Appropriate Leaf Node: Start at the root of the B+ tree and traverse down the tree, following the appropriate child pointers based on the new key being inserted. This process will lead you to the leaf node where the new key-value pair should be inserted.
Insert the Key-Value Pair: If the leaf node has sufficient space to accommodate the new key-value pair, insert it in the correct sorted order.
Handle Node Overflow: If the leaf node is already full (i.e., it contains the maximum number of keys), the node must be split to maintain the B+ tree properties. The steps for handling node overflow are as follows:
- Split the leaf node into two nodes, with the first node containing ⌈M/2⌉ – 1 keys and the second node containing the remaining keys.
- Promote the smallest key from the second node to the parent node.
- If the parent node also becomes full, recursively split the parent node and promote the middle key to the next higher level.
Update Pointers: After splitting a node, update the pointers to maintain the linked list of leaf nodes and the correct parent-child relationships in the internal nodes.
Handle Root Node Splits: If the root node becomes full and needs to be split, create a new root node with the promoted key from the split, and the two child nodes resulting from the split.
By following these steps, you can successfully insert a new key-value pair into a B+ tree while maintaining the necessary properties and structure of the tree.
Example Walkthrough of Insertion
Let‘s consider an example to illustrate the insertion process in a B+ tree. Suppose we have a B+ tree of order 3 (i.e., each node can have up to 3 child nodes and store up to 2 keys), and we want to insert the following key values: 6, 16, 26, 36, 46.
Step 1: Insert 6 and 16
Initially, the B+ tree is empty, so we create a new root node and insert the keys 6 and 16 in the leaf node.
[6, 16]Step 2: Insert 26
Inserting 26 causes an overflow in the leaf node, so we split the node into two:
[6, 16]
[26]We also promote the smallest key from the second node (26) to the parent node.
Step 3: Insert 36
Inserting 36 causes another overflow in the leaf node, so we split the node again:
[6, 16]
[26, 36]We promote the smallest key from the second node (26) to the parent node.
Step 4: Insert 46
Inserting 46 causes an overflow in the leaf node, so we split the node once more:
[6, 16]
[26, 36]
[46]We promote the smallest key from the third node (36) to the parent node.
At this point, the parent node also becomes full, so we split it as well:
[16, 36]
[6]
[26]
[46]We promote the smallest key from the second node (16) to the new root node.
The final B+ tree after inserting all the keys looks like this:
[16]
[6] [26]
[6, 16] [26, 36] [46]This example demonstrates how the insertion process works in a B+ tree, including the steps of locating the appropriate leaf node, inserting the new key, handling node overflows, and updating the tree structure as necessary.
Algorithmic Implementation
Here are sample implementations of the B+ tree insertion process in Python, JavaScript, and Java:
C++
#include
#include
#include
#include
using namespace std;
// Node class
class Node {
public:
int t; // Order of the B+ tree
vector keys; // Keys in the node
vector<vector> values;
vector<Node> child_ptr; // Pointers to child nodes
bool leaf; // Boolean to check if the node is a leaf
int n; // Current number of keys
Node ptr2next; // Pointer to the next node
// Node constructor
Node(int _t, Node* _ptr2next = NULL) {
t = _t;
ptr2next = _ptr2next;
leaf = true;
keys.resize(2*t-1);
values.resize(2*t-1);
child_ptr.resize(2*t);
n = 0;
}
// Function to insert a key in a non-full node
void insertNonFull(string k, int v) {
int i = n-1;
if (leaf) {
keys.insert(keys.begin()+n, k);
values.insert(values.begin()+n, vector<int>(1, v));
n += 1;
while (i>=0 && keys[i]>k) {
swap(keys[i], keys[i+1]);
swap(values[i], values[i+1]);
i -= 1;
}
} else {
while (i>=0 && keys[i]>k) {
i -= 1;
}
i += 1;
if (child_ptr[i]->n == 2*t-1) {
splitChild(i);
if (keys[i] < k) {
i += 1;
}
}
child_ptr[i]->insertNonFull(k, v);
}
}
// Function to split the child
void splitChild(int i) {
Node* y = child_ptr[i];
Node* z = new Node(y->t, y->ptr2next);
child_ptr.insert(child_ptr.begin()+i+1, z);
keys.insert(keys.begin()+i, y->keys[t-1]);
values.insert(values.begin()+i, y->values[t-1]);
y->ptr2next = z;
z->leaf = y->leaf;
z->n = t-1;
y->n = t-1;
for (int j=0; j<t-1; j++) {
z->keys[j] = y->keys[j+t];
z->values[j] = y->values[j+t];
}
if (!y->leaf) {
for (int j=0; j<t; j++) {
z->child_ptr[j] = y->child_ptr[j+t];
}
}
n += 1;
}
// Function to print the tree
void print() {
for (int i=0; i<n; i++) {
if (!leaf) {
child_ptr[i]->print();
}
cout << "[‘" << keys[i] << "‘]" << endl;
}
if (!leaf) {
child_ptr[n]->print();
}
}
// Function to search a key in the tree
Node* search(string k, int v) {
int i = 0;
while (i<n && k>keys[i]) {
i += 1;
}
if (keys[i] == k) {
for (int j = 0; j < values[i].size(); j++) {
if (values[i][j] == v) {
return this;
}
}
}
if (leaf) {
return NULL;
} else {
return child_ptr[i]->search(k, v);
}
}};
class BTree {
public:
Node* root; // Root of the B+ tree
int t; // Order of the B+ tree
// BTree constructor
BTree(int _t) {
root = new Node(_t);
root->leaf = true;
}
// Function to insert a key in the tree
void insert(string k, int v) {
Node* r = root;
if (r->n == 2*t-1) {
Node* s = new Node(t);
root = s;
s->child_ptr[0] = r;
s->splitChild(0);
s->insertNonFull(k, v);
} else {
r->insertNonFull(k, v);
}
}
// Function to print the tree
void print() {
root->print();
}
// Function to search a key in the tree
Node* search(string k, int v) {
return (root == NULL)? NULL : root->search(k, v);
}};
// Function to print the tree
void printTree(BTree* tree) {
tree->print();
}
int main() {
int record_len = 3;
BTree* bplustree = new BTree(record_len);
bplustree->insert("5", 33);
bplustree->insert("15", 21);
bplustree->insert("25", 31);
bplustree->insert("35", 41);
bplustree->insert("45", 10);
printTree(bplustree);
if (bplustree->search("5", 34) != NULL) {
cout << "Found" << endl;
} else {
cout << "Not found" << endl;
}
return 0;
}
Java
import java.util.ArrayList;
import java.util.List;
// Node creation
class Node {
int order;
List values;
List<List> keys;
Node nextKey;
Node parent;
boolean isLeaf;
// Node constructor
public Node(int order) {
this.order = order;
this.values = new ArrayList<>();
this.keys = new ArrayList<>();
this.nextKey = null;
this.parent = null;
this.isLeaf = false;
}
// Insert at the leaf
public void insertAtLeaf(String value, Node key) {
if (!this.values.isEmpty()) {
for (int i = 0; i < this.values.size(); i++) {
if (value.equals(this.values.get(i))) {
this.keys.get(i).add(key);
break;
} else if (value.compareTo(this.values.get(i)) < 0) {
this.values.add(i, value);
this.keys.add(i, new ArrayList<>());
this.keys.get(i).add(key);
break;
} else if (i + 1 == this.values.size()) {
this.values.add(value);
this.keys.add(new ArrayList<>());
this.keys.get(i + 1).add(key);
break;
}
}
} else {
this.values.add(value);
this.keys.add(new ArrayList<>());
this.keys.get(0).add(key);
}
}}
// B plus tree
class BplusTree {
Node root;
// B plus tree constructor
public BplusTree(int order) {
this.root = new Node(order);
this.root.isLeaf = true;
}
// Insert operation
public void insert(String value, Node key) {
Node oldNode = this.search(value);
oldNode.insertAtLeaf(value, key);
if (oldNode.values.size() == oldNode.order) {
Node newNode = new Node(oldNode.order);
newNode.isLeaf = true;
newNode.parent = oldNode.parent;
int mid = (int) Math.ceil(oldNode.order / 2.0) - 1;
newNode.values = new ArrayList<>(oldNode.values.subList(mid + 1, oldNode.values.size()));