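Bucket sort assumes that the input is uniformly distributed, in which case its average-case running time is Θ(n). Like counting sort, bucket sort is fast because it makes an assumption about the input. The difference is that counting sort assumes the input consists of integers in a small range, whereas bucket sort assumes the input is generated at random and distributed uniformly over the interval [0,1).
Bucket sort divides the interval [0,1) into m equal-sized subintervals, called buckets, and then distributes the n elements into them. Because the input is uniformly distributed over [0,1), we do not expect many elements to land in any single bucket. Each bucket is then sorted, and reading the buckets out in order yields the sorted data.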
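The steps just described can be sketched in a few lines of C++. This is only a minimal illustration under some assumptions: the function name bucket_sort_sketch and its parameters are made up for this example, every value is assumed to lie in [0,1), and std::sort stands in for the per-bucket insertion sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts values assumed to lie in [0, 1) using m equal-width buckets.
// bucket_sort_sketch is a hypothetical name chosen for this illustration.
std::vector<double> bucket_sort_sketch(const std::vector<double>& input, std::size_t m) {
    assert(m > 0);
    std::vector<std::vector<double>> buckets(m);

    // Step 1: scatter each element into the bucket covering its subinterval.
    for (double x : input) {
        assert(x >= 0.0 && x < 1.0);
        std::size_t index = static_cast<std::size_t>(x * m);
        if (index >= m) index = m - 1;  // guard against rounding at the top end
        buckets[index].push_back(x);
    }

    // Step 2: sort each bucket, then append the buckets in order.
    std::vector<double> sorted;
    sorted.reserve(input.size());
    for (auto& bucket : buckets) {
        std::sort(bucket.begin(), bucket.end());  // assumption: any per-bucket sort works here
        sorted.insert(sorted.end(), bucket.begin(), bucket.end());
    }
    return sorted;
}
```

Calling bucket_sort_sketch(values, 10) on a vector of values in [0,1) returns a sorted copy and leaves the input untouched.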
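The pseudocode for bucket sort, as given in Introduction to Algorithms (it uses one bucket per element, i.e. m = n), is as follows:

    BUCKET-SORT(A)
    1  let B[0..n-1] be a new array
    2  n = A.length
    3  for i = 0 to n-1
    4      make B[i] an empty list
    5  for i = 1 to n
    6      insert A[i] into list B[⌊n·A[i]⌋]
    7  for i = 0 to n-1
    8      sort list B[i] with insertion sort
    9  concatenate the lists B[0], B[1], ..., B[n-1] together in order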
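The figure below shows the buckets after processing the input array {0.78,0.17,0.39,0.26,0.72,0.94,0.21,0.12,0.23,0.68} (somewhat like the way a hash table chains keys that share the same hash value):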
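The same picture can be reproduced in text form. The sketch below is an illustration only: distribute_into_buckets and print_buckets are hypothetical helpers, std::list stands in for the hand-written chains, and 10 buckets are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <list>
#include <vector>

// distribute_into_buckets is a hypothetical helper, not part of the original code.
// Values are assumed to lie in [0, 1); each chain is kept sorted on insertion.
std::vector<std::list<double>> distribute_into_buckets(const std::vector<double>& input,
                                                       std::size_t m) {
    assert(m > 0);
    std::vector<std::list<double>> buckets(m);
    for (double x : input) {
        assert(x >= 0.0 && x < 1.0);
        std::size_t index = static_cast<std::size_t>(x * m);
        if (index >= m) index = m - 1;  // guard against rounding at the top end
        std::list<double>& chain = buckets[index];
        // Walk the chain to the first element greater than x and insert before it.
        auto pos = chain.begin();
        while (pos != chain.end() && *pos <= x) ++pos;
        chain.insert(pos, x);
    }
    return buckets;
}

// Prints one line per bucket: the bucket index, a colon, then its values.
void print_buckets(const std::vector<std::list<double>>& buckets) {
    for (std::size_t i = 0; i < buckets.size(); ++i) {
        std::cout << i << ":";
        for (double x : buckets[i]) std::cout << " " << x;
        std::cout << "\n";
    }
}
```

For the input array above with 10 buckets, bucket 1 holds 0.12 and 0.17, bucket 2 holds 0.21, 0.23 and 0.26, bucket 3 holds 0.39, bucket 6 holds 0.68, bucket 7 holds 0.72 and 0.78, bucket 9 holds 0.94, and the remaining buckets are empty.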
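My code follows the version on Wikipedia. Unlike the pseudocode, which introduces an array that stores linked lists, it works with linked lists throughout (a vector only holds the head pointer of each bucket):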
    #include <iostream>
    #include <vector>
    using namespace std;

    const int BUCKET_NUM = 10;

    struct ListNode {
        explicit ListNode(double i = 0) : mNext(NULL), mData(i) {}
        ListNode* mNext;
        double mData;
    };

    /*
     * Insert val into the sorted list that starts at head and return the new head.
     * Three cases:
     * 1. The bucket is empty (head is NULL): the new node becomes the head.
     * 2. The bucket is not empty and every existing value is greater than val:
     *    the loop is never entered, so dummyNode.mNext points to the new node.
     * 3. Otherwise the loop advances pre to the node just before the insert
     *    position, and dummyNode.mNext is still the old head.
     */
    ListNode* insert(ListNode* head, double val) {
        ListNode dummyNode;
        ListNode* newNode = new ListNode(val);
        dummyNode.mNext = head;  // needed by case 3
        ListNode* pre = &dummyNode;
        ListNode* curr = head;
        // find the position at which to insert val's node
        while (NULL != curr && curr->mData <= val) {
            pre = curr;
            curr = curr->mNext;
        }
        newNode->mNext = curr;
        pre->mNext = newNode;
        return dummyNode.mNext;
    }

    /*
     * Merge two sorted lists and return the head of the merged list.
     * Three cases:
     * 1. head1 is NULL: the result is simply head2.
     * 2. head1 is not NULL but head2 is NULL: the result is simply head1.
     * 3. Both are valid: link the nodes one by one behind a dummy node,
     *    always taking the smaller front node, then attach the remainder.
     */
    ListNode* Merge(ListNode* head1, ListNode* head2) {
        ListNode dummyNode;
        ListNode* p_dummy = &dummyNode;
        while (NULL != head1 && NULL != head2) {
            if (head1->mData <= head2->mData) {
                p_dummy->mNext = head1;
                head1 = head1->mNext;
            } else {
                p_dummy->mNext = head2;
                head2 = head2->mNext;
            }
            p_dummy = p_dummy->mNext;
        }
        // at most one of the two lists still has nodes left; attach it to the tail
        if (NULL != head1) p_dummy->mNext = head1;
        if (NULL != head2) p_dummy->mNext = head2;
        return dummyNode.mNext;
    }

    /* Bucket sort core process, divided into three steps. Values must lie in [0,1). */
    void BucketSort(int n, double arr[]) {
        int i = 0;
        vector<ListNode*> buckets(BUCKET_NUM, (ListNode*)(0));
        // step 1: insert every element into its bucket, keeping each bucket sorted
        for (i = 0; i < n; ++i) {
            // the index formula may need to change for a different input range
            int index = static_cast<int>(arr[i] * BUCKET_NUM);
            ListNode* head = buckets.at(index);
            buckets.at(index) = insert(head, arr[i]);
        }
        // step 2: merge all sorted buckets into one list
        ListNode* head = buckets.at(0);
        for (i = 1; i < BUCKET_NUM; ++i) {
            head = Merge(head, buckets.at(i));
        }
        // step 3: copy the sorted data back and release the nodes
        for (i = 0; i < n; ++i) {
            ListNode* next = head->mNext;
            arr[i] = head->mData;
            delete head;  // allocated in insert()
            head = next;
        }
    }

    void print_array(double array[], int length) {
        for (int i = 0; i < length; i++) {
            cout << array[i] << " ";
        }
        cout << endl << endl;
    }

    int main(void) {
        // alternative input: {0.79, 0.13, 0.16, 0.64, 0.39, 0.20, 0.89, 0.53, 0.73, 0.42}
        double array_src[] = {0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68};
        int array_length = sizeof(array_src) / sizeof(array_src[0]);
        BucketSort(array_length, array_src);
        print_array(array_src, array_length);
        return 0;
    }
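A brief look at the running time: from the textbook pseudocode, everything except line 8 (the loop that runs insertion sort on each bucket) takes Θ(n) time in total. Insertion sort on a bucket holding n_i elements takes O(n_i^2) time in the worst case, so the total running time is T(n) = Θ(n) + Σ_i O(n_i^2), where the sum runs over all buckets. This gives the following property:
Even if the input is not uniformly distributed, bucket sort still runs in linear time as long as the sum of the squares of the bucket sizes is linear in the total number of elements n.
As for which ways of splitting n into m bucket sizes make the sum of their squares linear in n, that is a mathematical proof exercise; interested readers can look into it.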
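To check this condition on a concrete input, one can simply count the bucket sizes and add up their squares. The helper below is a sketch: sum_of_squared_bucket_sizes is a hypothetical name, and values are assumed to lie in [0,1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the sum over all m buckets of (bucket size)^2 for values in [0, 1).
// sum_of_squared_bucket_sizes is a hypothetical helper used only for illustration.
std::size_t sum_of_squared_bucket_sizes(const std::vector<double>& input, std::size_t m) {
    assert(m > 0);
    std::vector<std::size_t> counts(m, 0);
    for (double x : input) {
        assert(x >= 0.0 && x < 1.0);
        std::size_t index = static_cast<std::size_t>(x * m);
        if (index >= m) index = m - 1;  // guard against rounding at the top end
        ++counts[index];
    }
    std::size_t total = 0;
    for (std::size_t c : counts) total += c * c;
    return total;
}
```

For the ten-element sample array {0.78,0.17,0.39,0.26,0.72,0.94,0.21,0.12,0.23,0.68} with 10 buckets the result is 20, i.e. 2n. If all ten values fell into a single bucket it would be 100, i.e. n^2, which is the quadratic case.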
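As a starting point, three standard facts can be written down. This is a sketch rather than a full proof; the notation n_i for the size of bucket i and the constant c are assumptions introduced here.

```latex
% Necessary condition (Cauchy-Schwarz): the sum of squares is at least n^2/m,
% so it can only be O(n) when the number of buckets m is \Omega(n).
\sum_{i=0}^{m-1} n_i^2 \;\ge\; \frac{1}{m}\Bigl(\sum_{i=0}^{m-1} n_i\Bigr)^{2} = \frac{n^2}{m}

% Sufficient condition: if every bucket holds at most c elements (c a constant),
\sum_{i=0}^{m-1} n_i^2 \;\le\; c \sum_{i=0}^{m-1} n_i = c\,n

% Uniform input with m = n buckets: n_i follows Binomial(n, 1/n), so
\mathrm{E}[n_i^2] = \mathrm{Var}[n_i] + \mathrm{E}[n_i]^2
                  = \Bigl(1 - \frac{1}{n}\Bigr) + 1 = 2 - \frac{1}{n},
\qquad
\mathrm{E}\Bigl[\sum_{i=0}^{n-1} n_i^2\Bigr] = 2n - 1
```

The last line is the uniform case the algorithm is designed for: the expected sum of squares is linear in n.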
References:
Introduction to Algorithms, 3rd edition (English edition)
http://zh.wikipedia.org/wiki/%E6%A1%B6%E6%8E%92%E5%BA%8F