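C/C++ Data Structures: B-Trees Explained

B-trees: a linked structure for disk storage

  • The concept of a tree
  • The concept of a binary tree
  • The concept of a B-tree
  • Properties of a B-tree
  • Implementing a B-tree
    • B-tree definition
    • Creating a node
    • Destroying a node
    • Splitting
    • Insertion
    • Merging
    • Deletion
    • Search
    • Complete example code
  • Summary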

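The concept of a tree

A tree is one of the most important data structures in computing; organizing data as a tree can make storing and retrieving it more efficient.
There are many kinds of trees, including binary trees, full binary trees, complete binary trees, binary search trees, balanced binary trees (such as AVL trees), red-black trees, B-trees, B+ trees, and heaps.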

[Figure: an example tree with nodes numbered 1 to 9]

The concept of a binary tree

In a binary tree every node has at most two children, called the left child and the right child.

[Figure: a binary tree with nodes numbered 1 to 9]

If every level of a binary tree holds the maximum possible number of nodes, the tree is called a full binary tree. A full binary tree of depth n has $2^n - 1$ nodes.
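The count follows from adding up the levels: level i (counting from 0) of a full binary tree holds 2^i nodes, so a tree of depth n has

```latex
\sum_{i=0}^{n-1} 2^i = 1 + 2 + 4 + \dots + 2^{n-1} = 2^n - 1
```

nodes. A depth of 3, for instance, gives 1 + 2 + 4 = 7.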

[Figure: a full binary tree of depth 3 with 7 nodes]

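In a complete binary tree, leaves appear only on the last two levels: the leaves on the last level are packed to the left, and any leaves on the second-to-last level are contiguous on the right.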

[Figure: a complete binary tree with nodes numbered 1 to 10]

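The concept of a B-tree

Each node of a binary tree has at most two children, so a binary tree with 1023 nodes needs at least 10 levels, and the more levels there are, the more comparisons a lookup takes.
If a binary tree is used to organize data on disk, with every node stored on disk, then moving to the next node after each comparison costs another disk seek, which is extremely slow.

This motivates a data structure that reduces the height of the tree: the multiway tree.
The difference between a multiway tree and a B-tree: "multiway" only says that a node may have many branches, whereas a B-tree additionally requires all leaves to be on the same level.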
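To make the effect of the fan-out concrete, the sketch below counts how many completely filled levels a tree needs in order to hold a given number of nodes. The helper name min_levels and the fan-out of 32 mentioned afterwards are illustrative assumptions, not part of the implementation in this article.

```c
#include <assert.h>

/* Smallest number of levels needed to hold `nodes` nodes when every node
 * has `fanout` children and every level is filled completely.
 * min_levels is a hypothetical helper used only for this illustration. */
int min_levels(long nodes, int fanout)
{
    assert(nodes >= 0 && fanout >= 2);

    int levels = 0;
    long capacity = 0;   /* total nodes that fit in `levels` levels */
    long level_size = 1; /* number of nodes on the next level       */

    while (capacity < nodes) {
        capacity += level_size;
        level_size *= fanout;
        levels++;
    }
    return levels;
}
```

Under these assumptions min_levels(1023, 2) is 10, matching the binary-tree figure quoted above, while a fan-out of 32 holds the same 1023 nodes in 3 levels (1 + 32 + 1024 slots).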

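B+ tree: the internal nodes of a B+ tree store no data and serve only as an index; all data is stored in the leaf nodes.
In a B-tree, by contrast, every node stores data.

B+ trees are generally the better fit for on-disk indexes: because their internal nodes hold only keys, more keys fit in one node, so the tree is shallower and a lookup touches fewer disk blocks.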

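Properties of a B-tree

A B-tree T of order M satisfies the following conditions:
(1) Every node has at most M subtrees.
(2) The root has at least two subtrees, unless it is a leaf.
(3) Every internal node other than the root has at least ceil(M/2) subtrees.
(4) All leaves are on the same level.
(5) An internal node with K subtrees holds K-1 keys, stored in increasing order.
(6) The number of keys n in every node other than the root satisfies ceil(M/2)-1 <= n <= M-1.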
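Properties (5) and (6) can be checked mechanically for a single node. The following is a minimal sketch under two assumptions that go beyond the list above: keys are distinct (so "increasing" is taken as strictly increasing) and a non-empty root holds at least one key. The names min_keys, max_keys and node_keys_valid are hypothetical helpers, not functions from this article's implementation.

```c
#include <assert.h>

/* Minimum number of keys in a node of an order-M B-tree:
 * ceil(M/2) - 1 for ordinary nodes, 1 for a non-empty root (assumption). */
int min_keys(int M, int is_root)
{
    assert(M >= 3);
    return is_root ? 1 : (M + 1) / 2 - 1;
}

/* Maximum number of keys in any node: M - 1. */
int max_keys(int M)
{
    assert(M >= 3);
    return M - 1;
}

/* Returns 1 if a node holding keys[0..n-1] respects properties (5) and (6):
 * the key count is within bounds and the keys are strictly increasing. */
int node_keys_valid(const int *keys, int n, int M, int is_root)
{
    if (n < min_keys(M, is_root) || n > max_keys(M))
        return 0;
    for (int i = 1; i < n; i++)
        if (keys[i - 1] >= keys[i])
            return 0;
    return 1;
}
```

For M = 6, an ordinary node may therefore hold between 2 and 5 keys.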

[Figure: an example B-tree holding the keys A to Z]

Implementing a B-tree

B-tree definition

typedef int KEY_VALUE;

typedef struct _btree_node {
	//int keys[2 * SUB_M - 1];					// keys, at most M-1
	KEY_VALUE *keys;							// array of keys
	// void *value;								// associated data
	struct _btree_node **childrens;				// child pointers, at most M
	int num;									// number of keys currently stored
	int leaf;									// 1 if this node is a leaf
}btree_node;

typedef struct _btree {
	btree_node *root;
	int t;				// minimum degree; the order is M = 2t
}btree;

Creating a node

// Create a node with room for 2t-1 keys and 2t children
btree_node *btree_create_node(int t,int leaf)
{
	btree_node *node = (btree_node *)calloc(1, sizeof(btree_node));
	if (node == NULL)
		return NULL;

	
	node->keys = (KEY_VALUE *)calloc(1, (2 * t - 1)*sizeof(KEY_VALUE));
	if (node->keys == NULL)
	{
		free(node);
		return NULL;
	}
	
	node->childrens = (btree_node **)calloc(1, (2 * t)*sizeof(btree_node*));
	if (node->childrens == NULL)
	{
		free(node->keys);
		free(node);
		return NULL;
	}
	node->leaf = leaf;
	node->num = 0;

	return node;
}

// Create an empty tree whose root is a leaf
void btree_create(btree *T, int t) {
	T->t = t;

	btree_node *x = btree_create_node(t, 1);
	T->root = x;

}

Destroying a node

void btree_destroy_node(btree_node *node)
{
	if (node == NULL)
		return;
	if (node->childrens != NULL)
		free(node->childrens);
	if (node->keys != NULL)
		free(node->keys);
	free(node);
}
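
btree_destroy_node frees a single node only. To release a whole tree, the children have to be freed before their parent. A minimal sketch of one way to do that follows; btree_destroy_subtree is a hypothetical helper that is not part of the original code, and the node type and btree_destroy_node are repeated here only so that the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

typedef int KEY_VALUE;

typedef struct _btree_node {
    KEY_VALUE *keys;
    struct _btree_node **childrens;
    int num;   /* number of keys stored   */
    int leaf;  /* 1 if the node is a leaf */
} btree_node;

/* Frees one node and its two arrays, as in the article's version. */
void btree_destroy_node(btree_node *node)
{
    if (node == NULL)
        return;
    free(node->childrens);
    free(node->keys);
    free(node);
}

/* Hypothetical helper: frees node and everything below it in post-order.
 * Returns the number of nodes freed so that callers can sanity-check it. */
int btree_destroy_subtree(btree_node *node)
{
    if (node == NULL)
        return 0;

    int freed = 0;
    if (!node->leaf) {
        /* a node with num keys has num + 1 children */
        for (int i = 0; i <= node->num; i++)
            freed += btree_destroy_subtree(node->childrens[i]);
    }
    btree_destroy_node(node);
    return freed + 1;
}
```

After such a call the caller would still need to reset the tree's root pointer to NULL.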

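Splitting

Suppose T is a B-tree of order M. A node holds at most M-1 keys, and in this implementation a full node is split before an insertion descends into it. The middle key (index M/2-1) moves up into the parent, keys 0 to M/2-2 stay in the original node, and keys M/2 to M-2 move into a new right sibling. If the parent is itself full, it is split first in the same way.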
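
The index arithmetic of a split can be tried out on plain arrays. The sketch below assumes the minimum-degree convention used by the implementation in this article (t = M/2, a full node holds 2t-1 keys); split_keys is a hypothetical helper for illustration and works on key arrays only, not on real nodes.

```c
#include <assert.h>

/* Splits a full node's keys[0 .. 2t-2] into a left part, a median and a
 * right part. left and right must each have room for t-1 keys.
 * split_keys is a hypothetical helper for illustration only. */
void split_keys(const int *keys, int t, int *left, int *median, int *right)
{
    assert(t >= 2);
    for (int i = 0; i < t - 1; i++) {
        left[i]  = keys[i];      /* keys 0 .. t-2 stay in the old node   */
        right[i] = keys[t + i];  /* keys t .. 2t-2 go to the new sibling */
    }
    *median = keys[t - 1];       /* the middle key moves up to the parent */
}
```

For t = 3 (M = 6) and the keys A B C D E, this yields left = A B, median = C and right = D E.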
[Figure: node split. Inserting O splits the full leaf J K L M N: L moves up into the parent C F I, leaving the leaves J K and M N, and O then joins M N]

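Root split, M=6:

[Figure: root split. Inserting F into the full root A B C D E moves C up into a new root with the leaves A B and D E; F then joins D E]

// Split the full child x->childrens[idx] of x around its median key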
void btree_split_child(btree *T, btree_node *x, int idx)
{

	int t = T->t;

	btree_node *y = x->childrens[idx];
	btree_node *z = btree_create_node(t,y->leaf);

	z->num = t - 1;
	int i = 0;
	for (i = 0; i < t - 1; i++)
		z->keys[i] = y->keys[t + i];

	if (y->leaf == 0) // internal node: move the upper t children as well
	{
		for (i = 0; i < t; i++)
			z->childrens[i] = y->childrens[t + i];
	}

	y->num = t-1;

	// shift x's children right and link in the new node z
	for (i = x->num; i >= idx + 1; i--)
	{
		x->childrens[i + 1] = x->childrens[i];
	}
	x->childrens[idx + 1] = z;

	// shift x's keys right and move the median key of y up into x
	for (i = x->num-1; i >= idx; i--)
	{
		x->keys[i + 1] = x->keys[i];
	}
	x->keys[idx] = y->keys[t-1];
	x->num += 1;
}

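Insertion

Split first, then insert. A new key always ends up in a leaf. On the way down from the root, every full node that is encountered is split before the descent continues, so the leaf that is finally reached always has room. If the root itself is full, splitting it creates a new root, which is how the tree grows in height.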

[Figure: inserting G, H, and I. G and H fill the leaf D E F G H; inserting I splits it, F moves up next to C in the root, and I joins the new leaf G H]

void btree_insert_nonfull(btree *T, btree_node *x, KEY_VALUE k) {

	int i = x->num - 1;

	if (x->leaf == 1) {

		while (i >= 0 && x->keys[i] > k) {
			x->keys[i + 1] = x->keys[i];
			i--;
		}
		x->keys[i + 1] = k;
		x->num += 1;

	}
	else {
		while (i >= 0 && x->keys[i] > k) i--;

		if (x->childrens[i + 1]->num == (2 * (T->t)) - 1) {
			btree_split_child(T, x, i + 1);
			if (k > x->keys[i + 1]) i++;
		}

		btree_insert_nonfull(T, x->childrens[i + 1], k);
	}
}

void btree_insert(btree *T, KEY_VALUE key) {

	btree_node *r = T->root;

	if (r->num == 2 * T->t - 1) {

		btree_node *node = btree_create_node(T->t, 0);
		T->root = node;

		node->childrens[0] = r;

		btree_split_child(T, node, 0);

		int i = 0;
		if (node->keys[0] < key) i++;
		btree_insert_nonfull(T, node->childrens[i], key);

	}
	else {
		btree_insert_nonfull(T, r, key);
	}
}
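
One way to check the result of a series of insertions is an in-order traversal, which should visit the keys in increasing order. The sketch below is an assumption about what such a helper could look like; btree_collect is a hypothetical name, not a function from the original code, and the node type is repeated so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

typedef int KEY_VALUE;

typedef struct _btree_node {
    KEY_VALUE *keys;
    struct _btree_node **childrens;
    int num;   /* number of keys stored   */
    int leaf;  /* 1 if the node is a leaf */
} btree_node;

/* Hypothetical helper: writes the keys of the subtree rooted at node into
 * out[] in sorted order, starting at index pos, and never writes at or
 * beyond index cap. Returns pos plus the number of keys in the subtree. */
int btree_collect(const btree_node *node, KEY_VALUE *out, int pos, int cap)
{
    if (node == NULL)
        return pos;

    for (int i = 0; i < node->num; i++) {
        /* everything in child i is smaller than key i */
        if (!node->leaf)
            pos = btree_collect(node->childrens[i], out, pos, cap);
        if (pos < cap)
            out[pos] = node->keys[i];
        pos++;
    }
    /* the last child holds the keys larger than every key in this node */
    if (!node->leaf)
        pos = btree_collect(node->childrens[node->num], out, pos, cap);
    return pos;
}
```

Called as btree_collect(T.root, buffer, 0, capacity) after inserting A to Z, the buffer would be expected to hold the letters in alphabetical order if the insertion code is correct.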

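Merging

Suppose T is a B-tree of order M. When the child that a deletion is about to descend into holds only the minimum of ceil(M/2)-1 keys and no adjacent sibling has a key to spare, the child is merged with a sibling, and the separating key from the parent moves down between them.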
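
As with splitting, the index arithmetic of a merge can be tried out on plain arrays. The sketch assumes the minimum-degree convention of this article's implementation (t = M/2, minimal nodes hold t-1 keys); merge_keys is a hypothetical helper for illustration and works on key arrays only.

```c
#include <assert.h>

/* Joins two minimal nodes (t-1 keys each) and the separator key from the
 * parent into one full node of 2t-1 keys. merged must have room for 2t-1
 * keys. merge_keys is a hypothetical helper for illustration only. */
void merge_keys(const int *left, int separator, const int *right,
                int t, int *merged)
{
    assert(t >= 2);
    for (int i = 0; i < t - 1; i++) {
        merged[i]     = left[i];   /* left keys keep positions 0 .. t-2 */
        merged[t + i] = right[i];  /* right keys land at t .. 2t-2      */
    }
    merged[t - 1] = separator;     /* the parent's key sits in between  */
}
```

For t = 3, merging D E and G H around the separator F gives D E F G H.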

[Figure: merge. The leaves D E and G H are merged with their separator F from the parent F I, giving the leaf D E F G H under the parent I]

/************************* merge *****************************/
void btree_merge(btree *T, btree_node *x, int idx)
{
	btree_node *left = x->childrens[idx];
	btree_node *right = x->childrens[idx + 1];

	int i = 0;

	// append the separator key from x and all keys of right to left
	left->keys[T->t-1] = x->keys[idx];
	for (i = 0; i < T->t-1; i++)
	{
		left->keys[T->t + i] = right->keys[i];
	}

	// for internal nodes, move right's children over as well
	if (!left->leaf) {
		for (i = 0; i < T->t; i++) {
			left->childrens[T->t + i] = right->childrens[i];
		}
	}
	left->num += T->t;

	btree_destroy_node(right);

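	// shift x's keys and children left to close the gap
	for (i = idx + 1; i < x->num; i++)
	{
		x->keys[i - 1] = x->keys[i];
		x->childrens[i] = x->childrens[i + 1];
	}

	// here i == x->num: clear the vacated last child slot
	x->childrens[i] = NULL;
	x->num -= 1;

	// only the root can run out of keys: the merged node becomes the new root
	if (x->num == 0) {
		T->root = left;
		btree_destroy_node(x);
	}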
}

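Deletion

Search cost of a B-tree: on the order of $\log_n(m/n)$ node visits, where n is the number of branches per node and m is presumably the total number of keys.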

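Borrowing: if the child on the path to the key holds only M/2-1 keys (the minimum), it first borrows a key from an adjacent sibling through the parent, so that it cannot underflow when a key is removed. If neither sibling has a key to spare, the child is merged with a sibling instead.

Deletion procedure: if the tree is in a state where the key can be removed directly, remove it; otherwise first borrow or merge to bring the tree into such a state, then remove the key.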
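
The choice between borrowing and merging can be written as a small decision function. The sketch below follows the same preference order as the btree_delete_key code in this section (borrow from the right sibling only when it is strictly richer than the left one, and prefer merging with the left sibling); the enum and the name choose_fixup are hypothetical and exist only for this illustration.

```c
#include <assert.h>

typedef enum {
    FIX_NONE,          /* child already has at least t keys */
    FIX_BORROW_LEFT,
    FIX_BORROW_RIGHT,
    FIX_MERGE_LEFT,
    FIX_MERGE_RIGHT
} fixup;

/* Decides how to prepare a child before descending into it.
 * t is the minimum degree; left_num / right_num are the key counts of the
 * adjacent siblings, or -1 if that sibling does not exist. */
fixup choose_fixup(int t, int child_num, int left_num, int right_num)
{
    assert(t >= 2);
    if (child_num >= t)
        return FIX_NONE;

    /* a child of a node with at least one key always has a sibling */
    assert(left_num >= 0 || right_num >= 0);

    int left_rich  = left_num  >= t;
    int right_rich = right_num >= t;

    if (right_rich && (!left_rich || right_num > left_num))
        return FIX_BORROW_RIGHT;
    if (left_rich)
        return FIX_BORROW_LEFT;

    /* neither sibling can spare a key: merge, preferring the left one */
    return left_num >= 0 ? FIX_MERGE_LEFT : FIX_MERGE_RIGHT;
}
```

With t = 3, a child holding 2 keys next to siblings holding 2 and 4 keys borrows from the right, while siblings holding 2 and 2 keys lead to a merge with the left sibling.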

(1) The key is in a leaf: remove it directly.

[Figure: deleting Q and then R from the leaf Q R S T, which leaves S T]

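(2) The key is in an internal node and its left child has at least t keys.
Replace the key with its predecessor taken from the left subtree, then delete that predecessor from the left subtree.

[Figure: deleting R. Its predecessor Q moves up from the left leaf M N O P Q, giving the parent Q U over the leaves M N O P and S T]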

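(3) The key is in an internal node and its right child has at least t keys.
Replace the key with its successor taken from the right subtree, then delete that successor from the right subtree.

[Figure: deleting U. Its successor V moves up from the right leaf V W X Y Z, which becomes W X Y Z]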

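(4) Both the left and the right child hold only t-1 keys.
Merge first, then delete.

[Figure: merge, then delete O. The root L is first merged with its two children I and O R U into the single node I L O R U; O is then deleted by merging the leaves M N and P Q on either side of it into M N P Q]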

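(5) The child to descend into holds only t-1 keys, but an adjacent sibling holds at least t keys.
Deleting S directly would violate the B-tree property (ceil(M/2)-1 <= n <= M-1 keys per node), so first borrow a key from the sibling through the parent, then delete.

[Figure: borrow, then delete S. U moves down from the parent into the leaf S T and V moves up from the sibling V W X Y Z to take its place; S is then removed, leaving T U]
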
void btree_delete_key(btree *T, btree_node *node, KEY_VALUE key) {

	if (node == NULL) return ;

	int idx = 0, i;

	while (idx < node->num && key > node->keys[idx]) {
		idx ++;
	}

	if (idx < node->num && key == node->keys[idx]) {

		if (node->leaf) {
			
			for (i = idx;i < node->num-1;i ++) {
				node->keys[i] = node->keys[i+1];
			}

			node->keys[node->num - 1] = 0;
			node->num--;
			
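			if (node->num == 0 && node == T->root) { // the last key of the tree was removed
				btree_destroy_node(node);
				T->root = NULL;
			}

			return ;
		} else if (node->childrens[idx]->num >= T->t) {

			// replace key with its predecessor: the largest key in the left subtree
			btree_node *left = node->childrens[idx];
			btree_node *p = left;
			while (!p->leaf) p = p->childrens[p->num];
			KEY_VALUE pred = p->keys[p->num - 1];
			node->keys[idx] = pred;

			btree_delete_key(T, left, pred);
			
		} else if (node->childrens[idx+1]->num >= T->t) {

			// replace key with its successor: the smallest key in the right subtree
			btree_node *right = node->childrens[idx+1];
			btree_node *p = right;
			while (!p->leaf) p = p->childrens[0];
			KEY_VALUE succ = p->keys[0];
			node->keys[idx] = succ;

			btree_delete_key(T, right, succ);
			
		} else {

			// both children are minimal: merge them around key, then delete key
			// from the merged node; btree_merge may free node (an emptied root),
			// so the child pointer is saved first
			btree_node *merged = node->childrens[idx];
			btree_merge(T, node, idx);
			btree_delete_key(T, merged, key);
			
		}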
		
	} else {

		btree_node *child = node->childrens[idx];
		if (child == NULL) {
			printf("Cannot del key = %d\n", key);
			return ;
		}

		if (child->num == T->t - 1) {

			btree_node *left = NULL;
			btree_node *right = NULL;
			if (idx - 1 >= 0)
				left = node->childrens[idx-1];
			if (idx + 1 <= node->num) 
				right = node->childrens[idx+1];

			if ((left && left->num >= T->t) ||
				(right && right->num >= T->t)) {

				int richR = 0;
				if (right) richR = 1;
				if (left && right) richR = (right->num > left->num) ? 1 : 0;

				if (right && right->num >= T->t && richR) { //borrow from next
					child->keys[child->num] = node->keys[idx];
					child->childrens[child->num+1] = right->childrens[0];
					child->num ++;

					node->keys[idx] = right->keys[0];
					for (i = 0;i < right->num - 1;i ++) {
						right->keys[i] = right->keys[i+1];
						right->childrens[i] = right->childrens[i+1];
					}

					right->keys[right->num-1] = 0;
					right->childrens[right->num-1] = right->childrens[right->num];
					right->childrens[right->num] = NULL;
					right->num --;
					
				} else { //borrow from prev

					for (i = child->num;i > 0;i --) {
						child->keys[i] = child->keys[i-1];
						child->childrens[i+1] = child->childrens[i];
					}

					child->childrens[1] = child->childrens[0];
					child->childrens[0] = left->childrens[left->num];
					child->keys[0] = node->keys[idx-1];
					
					child->num ++;

					node->keys[idx-1] = left->keys[left->num-1];
					left->keys[left->num-1] = 0;
					left->childrens[left->num] = NULL;
					left->num --;
				}

			} else if ((!left || (left->num == T->t - 1))
				&& (!right || (right->num == T->t - 1))) {

				if (left && left->num == T->t - 1) {
					btree_merge(T, node, idx-1);					
					child = left;
				} else if (right && right->num == T->t - 1) {
					btree_merge(T, node, idx);
				}
			}
		}

		btree_delete_key(T, child, key);
	}
	
}


int btree_delete(btree *T, KEY_VALUE key) {
	if (!T->root) return -1;

	btree_delete_key(T, T->root, key);
	return 0;
}

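Search

// Binary search inside one node: returns the index of the first key in
// keys[low..high] that is >= key (high + 1 if there is none),
// or -1 if the range is invalid
int btree_bin_search(btree_node *node, int low, int high, KEY_VALUE key) {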
	int mid;
	if (low > high || low < 0 || high < 0) {
		return -1;
	}

	while (low <= high) {
		mid = (low + high) / 2;
		if (key > node->keys[mid]) {
			low = mid + 1;
		} else {
			high = mid - 1;
		}
	}

	return low;
}
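
The function above only searches within one node. A lookup over the whole tree repeats it level by level, descending into the child between the two keys that bracket the target. The following is a minimal sketch of that idea rather than the author's code: btree_search and node_lower_bound are hypothetical names, and the node type is repeated so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

typedef int KEY_VALUE;

typedef struct _btree_node {
    KEY_VALUE *keys;
    struct _btree_node **childrens;
    int num;   /* number of keys stored   */
    int leaf;  /* 1 if the node is a leaf */
} btree_node;

/* Index of the first key in node that is >= key, or node->num if none. */
int node_lower_bound(const btree_node *node, KEY_VALUE key)
{
    int low = 0, high = node->num - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (key > node->keys[mid])
            low = mid + 1;
        else
            high = mid - 1;
    }
    return low;
}

/* Hypothetical helper: returns the node that contains key and stores the
 * key's index in *pos (if pos is not NULL), or returns NULL if the key is
 * not in the tree. */
const btree_node *btree_search(const btree_node *node, KEY_VALUE key, int *pos)
{
    while (node != NULL) {
        int idx = node_lower_bound(node, key);
        if (idx < node->num && node->keys[idx] == key) {
            if (pos != NULL)
                *pos = idx;
            return node;
        }
        if (node->leaf)
            return NULL;             /* nowhere left to descend */
        node = node->childrens[idx]; /* subtree between keys idx-1 and idx */
    }
    return NULL;
}
```

Each level costs one binary search over at most 2t-1 keys, so the number of nodes visited is bounded by the height of the tree.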

Complete example code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SUB_M		3		// M=6, and SUB_M=M/2

typedef int KEY_VALUE;

typedef struct _btree_node {
	//int keys[2 * SUB_M - 1];					// keys, at most M-1
	KEY_VALUE *keys;							// array of keys
	// void *value;								// associated data
	struct _btree_node **childrens;				// child pointers, at most M
	int num;									// number of keys currently stored
	int leaf;									// 1 if this node is a leaf
}btree_node;

typedef struct _btree {
	btree_node *root;
	int t;				// minimum degree; the order is M = 2t
}btree;


btree_node *btree_create_node(int t,int leaf)
{
	btree_node *node = (btree_node *)calloc(1, sizeof(btree_node));
	if (node == NULL)
		return NULL;

	
	node->keys = (KEY_VALUE *)calloc(1, (2 * t - 1)*sizeof(KEY_VALUE));
	if (node->keys == NULL)
	{
		free(node);
		return NULL;
	}
	
	node->childrens = (btree_node **)calloc(1, (2 * t)*sizeof(btree_node*));
	if (node->childrens == NULL)
	{
		free(node->keys);
		free(node);
		return NULL;
	}
	node->leaf = leaf;
	node->num = 0;

	return node;
}

void btree_destroy_node(btree_node *node)
{
	if (node == NULL)
		return;
	if (node->childrens != NULL)
		free(node->childrens);
	if (node->keys != NULL)
		free(node->keys);
	free(node);
}

/********************** split ************************/
// Split the full child x->childrens[idx] of x around its median key
void btree_split_child(btree *T, btree_node *x, int idx)
{

	int t = T->t;

	btree_node *y = x->childrens[idx];
	btree_node *z = btree_create_node(t,y->leaf);

	z->num = t - 1;
	int i = 0;
	for (i = 0; i < t - 1; i++)
		z->keys[i] = y->keys[t + i];

	if (y->leaf == 0) // internal node: move the upper t children as well
	{
		for (i = 0; i < t; i++)
			z->childrens[i] = y->childrens[t + i];
	}

	y->num = t-1;

	// shift x's children right and link in the new node z
	for (i = x->num; i >= idx + 1; i--)
	{
		x->childrens[i + 1] = x->childrens[i];
	}
	x->childrens[idx + 1] = z;

	// shift x's keys right and move the median key of y up into x
	for (i = x->num-1; i >= idx; i--)
	{
		x->keys[i + 1] = x->keys[i];
	}
	x->keys[idx] = y->keys[t-1];
	x->num += 1;
}

/************************* split end *****************************/

// Create an empty tree whose root is a leaf
void btree_create(btree *T, int t) {
	T->t = t;

	btree_node *x = btree_create_node(t, 1);
	T->root = x;

}


void btree_insert_nonfull(btree *T, btree_node *x, KEY_VALUE k) {

	int i = x->num - 1;

	if (x->leaf == 1) {

		while (i >= 0 && x->keys[i] > k) {
			x->keys[i + 1] = x->keys[i];
			i--;
		}
		x->keys[i + 1] = k;
		x->num += 1;

	}
	else {
		while (i >= 0 && x->keys[i] > k) i--;

		if (x->childrens[i + 1]->num == (2 * (T->t)) - 1) {
			btree_split_child(T, x, i + 1);
			if (k > x->keys[i + 1]) i++;
		}

		btree_insert_nonfull(T, x->childrens[i + 1], k);
	}
}

void btree_insert(btree *T, KEY_VALUE key) {

	btree_node *r = T->root;

	if (r->num == 2 * T->t - 1) {

		btree_node *node = btree_create_node(T->t, 0);
		T->root = node;

		node->childrens[0] = r;

		btree_split_child(T, node, 0);

		int i = 0;
		if (node->keys[0] < key) i++;
		btree_insert_nonfull(T, node->childrens[i], key);

	}
	else {
		btree_insert_nonfull(T, r, key);
	}
}

/************************* merge *****************************/
void btree_merge(btree *T, btree_node *x, int idx)
{
	btree_node *left = x->childrens[idx];
	btree_node *right = x->childrens[idx + 1];

	int i = 0;

	// append the separator key from x and all keys of right to left
	left->keys[T->t-1] = x->keys[idx];
	for (i = 0; i < T->t-1; i++)
	{
		left->keys[T->t + i] = right->keys[i];
	}

	// for internal nodes, move right's children over as well
	if (!left->leaf) {
		for (i = 0; i < T->t; i++) {
			left->childrens[T->t + i] = right->childrens[i];
		}
	}
	left->num += T->t;

	btree_destroy_node(right);

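	// shift x's keys and children left to close the gap
	for (i = idx + 1; i < x->num; i++)
	{
		x->keys[i - 1] = x->keys[i];
		x->childrens[i] = x->childrens[i + 1];
	}

	// here i == x->num: clear the vacated last child slot
	x->childrens[i] = NULL;
	x->num -= 1;

	// only the root can run out of keys: the merged node becomes the new root
	if (x->num == 0) {
		T->root = left;
		btree_destroy_node(x);
	}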
}

void btree_delete_key(btree *T, btree_node *node, KEY_VALUE key) {

	if (node == NULL) return;

	int idx = 0, i;

	while (idx < node->num && key > node->keys[idx]) {
		idx++;
	}

	if (idx < node->num && key == node->keys[idx]) {

		if (node->leaf) {

			for (i = idx; i < node->num - 1; i++) {
				node->keys[i] = node->keys[i + 1];
			}

			node->keys[node->num - 1] = 0;
			node->num--;

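			if (node->num == 0 && node == T->root) { // the last key of the tree was removed
				btree_destroy_node(node);
				T->root = NULL;
			}

			return;
		}
		else if (node->childrens[idx]->num >= T->t) {

			// replace key with its predecessor: the largest key in the left subtree
			btree_node *left = node->childrens[idx];
			btree_node *p = left;
			while (!p->leaf) p = p->childrens[p->num];
			KEY_VALUE pred = p->keys[p->num - 1];
			node->keys[idx] = pred;

			btree_delete_key(T, left, pred);

		}
		else if (node->childrens[idx + 1]->num >= T->t) {

			// replace key with its successor: the smallest key in the right subtree
			btree_node *right = node->childrens[idx + 1];
			btree_node *p = right;
			while (!p->leaf) p = p->childrens[0];
			KEY_VALUE succ = p->keys[0];
			node->keys[idx] = succ;

			btree_delete_key(T, right, succ);

		}
		else {

			// both children are minimal: merge them around key, then delete key
			// from the merged node; btree_merge may free node (an emptied root),
			// so the child pointer is saved first
			btree_node *merged = node->childrens[idx];
			btree_merge(T, node, idx);
			btree_delete_key(T, merged, key);

		}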

	}
	else {

		btree_node *child = node->childrens[idx];
		if (child == NULL) {
			printf("Cannot del key = %d\n", key);
			return;
		}

		if (child->num == T->t - 1) {

			btree_node *left = NULL;
			btree_node *right = NULL;
			if (idx - 1 >= 0)
				left = node->childrens[idx - 1];
			if (idx + 1 <= node->num)
				right = node->childrens[idx + 1];

			if ((left && left->num >= T->t) ||
				(right && right->num >= T->t)) {

				int richR = 0;
				if (right) richR = 1;
				if (left && right) richR = (right->num > left->num) ? 1 : 0;

				if (right && right->num >= T->t && richR) { //borrow from next
					child->keys[child->num] = node->keys[idx];
					child->childrens[child->num + 1] = right->childrens[0];
					child->num++;

					node->keys[idx] = right->keys[0];
					for (i = 0; i < right->num - 1; i++) {
						right->keys[i] = right->keys[i + 1];
						right->childrens[i] = right->childrens[i + 1];
					}

					right->keys[right->num - 1] = 0;
					right->childrens[right->num - 1] = right->childrens[right->num];
					right->childrens[right->num] = NULL;
					right->num--;

				}
				else { //borrow from prev

					for (i = child->num; i > 0; i--) {
						child->keys[i] = child->keys[i - 1];
						child->childrens[i + 1] = child->childrens[i];
					}

					child->childrens[1] = child->childrens[0];
					child->childrens[0] = left->childrens[left->num];
					child->keys[0] = node->keys[idx - 1];

					child->num++;

					node->keys[idx - 1] = left->keys[left->num - 1];
					left->keys[left->num - 1] = 0;
					left->childrens[left->num] = NULL;
					left->num--;
				}

			}
			else if ((!left || (left->num == T->t - 1))
				&& (!right || (right->num == T->t - 1))) {

				if (left && left->num == T->t - 1) {
					btree_merge(T, node, idx - 1);
					child = left;
				}
				else if (right && right->num == T->t - 1) {
					btree_merge(T, node, idx);
				}
			}
		}

		btree_delete_key(T, child, key);
	}

}


int btree_delete(btree *T, KEY_VALUE key) {
	if (!T->root) return -1;

	btree_delete_key(T, T->root, key);
	return 0;
}


/****************** test ************************/

void btree_print(btree *T, btree_node *node, int layer)
{
	btree_node* p = node;
	int i;
	if (p) {
		printf("\nlayer = %d keynum = %d is_leaf = %d\n", layer, p->num, p->leaf);
		for (i = 0; i < node->num; i++)
			printf("%c ", p->keys[i]);
		printf("\n");
		layer++;
		for (i = 0; i <= p->num; i++)
			if (p->childrens[i])
				btree_print(T, p->childrens[i], layer);
	}
	else printf("the tree is empty\n");
}


int main() {
	btree T = { 0 };

	btree_create(&T, SUB_M);
	srand(48);

	int i = 0;
	char key[30] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
	for (i = 0; i < 26; i++) {
		//key[i] = rand() % 1000;
		printf("%c ", key[i]);
		btree_insert(&T, key[i]);
	}

	btree_print(&T, T.root, 0);

	for (i = 0; i < 26; i++) {
		printf("\n---------------------------------\n");
		btree_delete(&T, key[25 - i]);
		//btree_traverse(T.root);
		btree_print(&T, T.root, 0);
	}
	return 0;
}

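Summary

A B-tree is a kind of multiway tree, but not every multiway tree is a B-tree; the main purpose of a B-tree is to reduce the height of the tree. B-trees and B+ trees differ in where the data lives: every node of a B-tree stores data, whereas the internal nodes of a B+ tree store no data and act only as an index, with the data kept in the leaves. For that reason B+ trees are generally the better fit for on-disk indexes.
A B-tree of order M satisfies:
(1) every node has at most M subtrees;
(2) the root has at least two subtrees, unless it is a leaf;
(3) every internal node other than the root has at least ceil(M/2) subtrees;
(4) all leaves are on the same level;
(5) an internal node with K subtrees holds K-1 keys, stored in increasing order;
(6) the number of keys n in every node other than the root satisfies ceil(M/2)-1 <= n <= M-1.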