[Algorithm Notes] Sorting Algorithms: Shell Sort

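Introduction

Shell sort is an algorithm based on insertion sort, published by Donald Shell in 1959. The basic idea is to split the sequence into several subsequences of elements that lie a fixed gap apart, insertion-sort each subsequence, and, once the whole sequence is nearly sorted, finish with one ordinary insertion sort over all elements.

Basic Idea

Basic Concepts

  1. Gap sequence: in Shell sort the gap sequence is a decreasing sequence of integers that controls how the subsequences are formed. The first gap is large, later gaps shrink, and the last gap is 1, at which point the whole sequence is treated as a single subsequence.
  2. Subsequence: for a given gap, the original sequence is split into that many interleaved subsequences; neighbouring elements of one subsequence are exactly one gap apart in the original sequence.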
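
To make the two concepts concrete, here is a minimal sketch that lists the subsequences a given gap induces. The helper name `gap_subsequences` and the sample data are illustrative assumptions, not part of any library.

```python
def gap_subsequences(arr, gap):
    """Return the `gap` interleaved subsequences arr[k], arr[k + gap], arr[k + 2 * gap], ..."""
    return [arr[k::gap] for k in range(gap)]


data = [9, 8, 3, 7, 5, 6, 4, 1]
for gap in (4, 2, 1):
    # With gap 1 there is a single subsequence: the whole list.
    print(gap, gap_subsequences(data, gap))
```

Shell sort never materialises these lists; it sorts them in place by stepping through the array with stride `gap`.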

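Algorithm Steps

  1. Choose a decreasing gap sequence ( G_1, G_2, …, G_t ) with ( G_t = 1 ).
  2. For the current gap ( G_i ), split the sequence into subsequences and insertion-sort each of them.
  3. Move on to the next, smaller gap ( G_{i+1} ) and repeat step 2 until the gap is 1; the whole sequence is then a single subsequence and receives the final insertion sort.

Pseudocode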

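function shellSort(arr):
    n = length(arr)
    gap = floor(n / 2)                  // initial gap (integer division)
    while gap > 0:
        // gapped insertion sort: sorts every subsequence with stride gap
        for i = gap; i < n; i++:
            temp = arr[i]
            j = i
            // shift larger elements of the same subsequence one gap to the right
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap = floor(gap / 2)            // next, smaller gap
    return arr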
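
The effect of each pass is easier to see on a concrete input. The following sketch follows the pseudocode above but works on a copy and records the list after every pass; the function name `shell_sort_trace` is a hypothetical helper introduced only for this illustration.

```python
def shell_sort_trace(arr):
    """Sort a copy of arr with gaps n//2, n//4, ..., 1 and record the list after each pass."""
    a = list(arr)
    n = len(a)
    passes = []
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = a[i]
            j = i
            # Shift larger elements of the same subsequence one gap to the right.
            while j >= gap and a[j - gap] > temp:
                a[j] = a[j - gap]
                j -= gap
            a[j] = temp
        passes.append((gap, list(a)))
        gap //= 2
    return passes


for gap, state in shell_sort_trace([9, 8, 3, 7, 5, 6, 4, 1]):
    print(f"gap={gap}: {state}")
```

After the passes with gaps 4 and 2 the list is not yet sorted, but small values have already moved toward the front, so the final gap-1 pass has little work left.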

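Advantages

  1. Better efficiency: Shell sort is considerably faster than plain insertion sort, especially as the input grows.
  2. Fewer moves: because the early passes compare and move elements that are far apart, an element can cover a long distance in a single move, which reduces the total number of comparisons and moves.
  3. Flexibility: the gap sequence can be chosen freely, and different gap sequences give different performance.
  4. Easy to implement: the algorithm is short and relatively easy to understand and code.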
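
The claim about fewer moves can be checked with a small experiment. The sketch below counts element shifts for plain insertion sort (a single gap of 1) and for Shell sort with the halving gaps on a reverse-sorted list; `count_shifts` and `halving_gaps` are hypothetical helper names, and the input size of 1000 is an arbitrary choice.

```python
def count_shifts(arr, gaps):
    """Run gapped insertion-sort passes on a copy of arr; return (sorted copy, number of shifts)."""
    a = list(arr)
    n = len(a)
    shifts = 0
    for gap in gaps:
        for i in range(gap, n):
            temp = a[i]
            j = i
            while j >= gap and a[j - gap] > temp:
                a[j] = a[j - gap]
                j -= gap
                shifts += 1
            a[j] = temp
    return a, shifts


def halving_gaps(n):
    """Gaps n//2, n//4, ..., 1."""
    gaps = []
    gap = n // 2
    while gap > 0:
        gaps.append(gap)
        gap //= 2
    return gaps


data = list(range(1000, 0, -1))  # reverse-sorted input
_, insertion_shifts = count_shifts(data, [1])
_, shell_shifts = count_shifts(data, halving_gaps(len(data)))
print("insertion sort shifts:", insertion_shifts)
print("shell sort shifts:   ", shell_shifts)
```

On this input plain insertion sort needs n(n-1)/2 shifts, while the gapped passes need far fewer; other inputs and gap sequences will give different ratios.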

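Disadvantages

  1. Time complexity: with the original gap sequence (n/2, n/4, …, 1) the worst case is still (O(n^2)), so on large data sets Shell sort may lose to more efficient algorithms such as quicksort or merge sort.
  2. Not stable: Shell sort is not a stable sorting algorithm. Equal elements can fall into different subsequences and be moved past each other, so their relative order may change.
  3. Dependence on the gap sequence: performance depends heavily on the gap sequence, and a poor choice can make Shell sort no better than plain insertion sort.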
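
The lack of stability shows up even on a four-element input. The sketch below sorts (key, label) records by key only, using the halving gaps; `shell_sort_by_key` is a hypothetical variant written for this demonstration, and Python's built-in `sorted`, which is stable, serves as the reference.

```python
def shell_sort_by_key(items, key):
    """Return a sorted copy of items, comparing only key(item), with gaps n//2, ..., 1."""
    a = list(items)
    n = len(a)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = a[i]
            j = i
            while j >= gap and key(a[j - gap]) > key(temp):
                a[j] = a[j - gap]
                j -= gap
            a[j] = temp
        gap //= 2
    return a


records = [(2, "a"), (2, "b"), (1, "c"), (3, "d")]
print(shell_sort_by_key(records, key=lambda r: r[0]))  # (2, "b") ends up before (2, "a")
print(sorted(records, key=lambda r: r[0]))             # stable: (2, "a") stays before (2, "b")
```

In the gap-2 pass, (2, "a") is moved from index 0 to index 2 and thereby jumps over (2, "b"), which the later gap-1 pass cannot undo because the keys are equal.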

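Use Cases

  1. Small data sets: for small and medium-sized inputs Shell sort is often competitive, because it has very little overhead, needs no recursion or extra memory, and approaches (O(n \log n)) on nearly sorted input.
  2. Simple applications: when stability is not required and the data set is not large, Shell sort is a reasonable choice.
  3. Teaching and learning: because the implementation is short, Shell sort is often used in teaching to help beginners understand sorting algorithms.

Although Shell sort's theoretical time complexity is worse than that of some modern sorting algorithms, it remains a viable option in practice, especially when the data set is not very large, thanks to its low implementation cost and decent performance. For particular data sets, a carefully designed gap sequence lets Shell sort perform considerably better than plain insertion sort.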

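Time Complexity

Analysing the time complexity of Shell sort is relatively involved because it depends on the gap sequence. The cases below are discussed separately.

Worst Case

With the original gap sequence (n/2, n/4, …, 1) the worst-case time complexity is (O(n^2)). When n is a power of two, for example, every gap except the last one is even, so elements at even and odd positions are never compared until the final gap-1 pass; that pass may then have to do (O(n^2)) work, exactly like plain insertion sort. Likewise, a gap sequence that consists only of the value 1 turns Shell sort into plain insertion sort. Better sequences lower the bound: Hibbard's gaps give (O(n^{3/2})) and Sedgewick's give (O(n^{4/3})).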
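
A quadratic input for the halving gaps can be built by hand. The sketch below assumes n is a power of two, places the larger half of the values at even indices and the smaller half at odd indices, and counts the shifts made in each pass; `shifts_per_gap` and `interleaved_input` are hypothetical helper names.

```python
def shifts_per_gap(arr):
    """Shell-sort a copy of arr with gaps n//2, ..., 1; return {gap: shifts made in that pass}."""
    a = list(arr)
    n = len(a)
    result = {}
    gap = n // 2
    while gap > 0:
        shifts = 0
        for i in range(gap, n):
            temp = a[i]
            j = i
            while j >= gap and a[j - gap] > temp:
                a[j] = a[j - gap]
                j -= gap
                shifts += 1
            a[j] = temp
        result[gap] = shifts
        gap //= 2
    return result


def interleaved_input(n):
    """Larger half at even indices, smaller half at odd indices (n assumed to be a power of two)."""
    half = n // 2
    out = []
    for i in range(half):
        out.append(half + 1 + i)
        out.append(i + 1)
    return out


print(interleaved_input(8))
print(shifts_per_gap(interleaved_input(16)))
```

All passes with an even gap find their subsequences already sorted and do nothing, so the whole cost, m(m+1)/2 shifts for m = n/2, lands on the final gap-1 pass.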

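Average Case

The average-case time complexity depends on the gap sequence and is not known exactly for most sequences. For good sequences it is usually estimated empirically at around (O(n^{1.3})); for poor sequences it can degrade toward (O(n^2)). The improvement over insertion sort comes from the early passes, which leave the data nearly sorted so that later insertions have to move only a few elements.

Best Case

The best case occurs when the input is already sorted: every pass then performs about n comparisons and no moves, so the total cost is proportional to n times the number of gaps, which is (O(n \log n)) for sequences with (O(\log n)) gaps. The bound (O(n \log^2 n)) that is sometimes quoted in this context is a worst-case bound that holds for Pratt's gap sequence (numbers of the form (2^p 3^q)), not for Sedgewick's.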
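
As a small check of the behaviour on already-sorted input, the sketch below counts element comparisons made by Shell sort with the halving gaps. The function name `count_comparisons` is a hypothetical helper; on sorted input each pass with gap g makes exactly n - g comparisons and no shifts.

```python
def count_comparisons(arr):
    """Shell-sort a copy of arr with gaps n//2, ..., 1 and return the number of element comparisons."""
    a = list(arr)
    n = len(a)
    comparisons = 0
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = a[i]
            j = i
            while j >= gap:
                comparisons += 1
                if a[j - gap] <= temp:
                    break
                a[j] = a[j - gap]
                j -= gap
            a[j] = temp
        gap //= 2
    return comparisons


for n in (16, 1024):
    print(n, count_comparisons(list(range(n))))
```

For n = 1024 there are 10 gaps, so the count is 10 * 1024 minus the sum of the gaps (1023), which grows like n log n.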

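Space Complexity

The space complexity of Shell sort is (O(1)). It sorts in place: apart from the input array it needs only a constant amount of memory for the current gap and a few temporary variables. (An implementation that stores the whole gap list up front uses (O(\log n)) extra space for that list.)

Proof

Because the running time depends on the gap sequence, there is no single proof that fixes the time complexity of Shell sort in general. For specific sequences, such as Hibbard's and Sedgewick's, worst-case bounds have been proved mathematically; the average-case figures quoted above, however, come from experiments and observation rather than from exact proofs.
Overall, much of what is known about Shell sort's average running time is empirical rather than theoretical. In practice, choosing a suitable gap sequence improves performance noticeably and makes Shell sort much more efficient than plain insertion sort.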

Implementation

Python Implementation

def shell_sort(arr):
    n = len(arr)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 2

C++ Implementation

void shellSort(int arr[], int n) {
    for (int gap = n/2; gap > 0; gap /= 2) {
        for (int i = gap; i < n; i += 1) {
            int temp = arr[i];
            int j;
            for (j = i; j >= gap && arr[j - gap] > temp; j -= gap) {
                arr[j] = arr[j - gap];
            }
            arr[j] = temp;
        }
    }
}

C++ Template Implementation

template <typename T>
void shellSort(vector<T> &arr)
{
    // n is the number of elements
    int n = arr.size();
    // gap is the distance between neighbours of one subsequence; it is halved after every pass
    for (int gap = n / 2; gap > 0; gap /= 2)
    {
        // i is the index of the element being inserted into its subsequence
        for (int i = gap; i < n; ++i)
        {
            // temp holds the element being inserted
            T temp = arr[i];
            // j is the position where temp will finally be placed
            int j;
            // shift larger elements of the same subsequence one gap to the right
            for (j = i; j >= gap && arr[j - gap] > temp; j -= gap)
            {
                arr[j] = arr[j - gap];
            }
            // place temp into the hole left by the shifts
            arr[j] = temp;
        }
    }
}

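Further Reading

Ideas for Improving the Time Complexity

  1. Choose a suitable gap sequence: a good gap sequence is the key to a fast Shell sort. Carefully designed sequences such as Hibbard's, Knuth's and Sedgewick's perform much better than the original halving sequence.
  2. Custom gap sequences: a gap sequence can be tailored to the characteristics of a particular data set so that it fits the data distribution and sorts it more efficiently.
  3. Fewer comparisons and moves: tightening the inner insertion sort so that it avoids unnecessary comparisons and element moves also improves efficiency.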
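
To experiment with custom gap sequences, the sort can take the gaps as a parameter. The sketch below is one possible way to do that; the function name `shell_sort_with_gaps` and the sample gaps are assumptions made for illustration.

```python
def shell_sort_with_gaps(arr, gaps):
    """Sort arr in place using the given gaps, largest first; the gaps must include 1."""
    if 1 not in gaps:
        # Without a final gap-1 pass the result is not guaranteed to be sorted.
        raise ValueError("the gap sequence must contain 1")
    n = len(arr)
    for gap in sorted(set(gaps), reverse=True):
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
    return arr


data = [9, 8, 3, 7, 5, 6, 4, 1]
shell_sort_with_gaps(data, [1, 3, 7])
print(data)
```

Gaps that are not smaller than the list length are harmless here: the inner loop simply has nothing to do for them.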

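Historical Variants Aimed at Shell Sort's Time Complexity

  1. Sedgewick's Shell sort: Robert Sedgewick proposed specific gap sequences, the best known being 1, 5, 19, 41, 109, …, which bring the worst case down to (O(n^{4/3})) and perform well in practice.
  2. Poonen's lower bound: rather than a new gap sequence, Bjorn Poonen's contribution is a worst-case lower bound of (\Omega(n (\log n / \log \log n)^2)) that applies to Shell sort with any gap sequence.
  3. Knuth's Shell sort: Donald Knuth suggested gaps of the form ((3^k - 1)/2), that is 1, 4, 13, 40, …, generated by repeatedly computing (3h + 1). The worst case with these gaps is (O(n^{3/2})).
  4. Hibbard's Shell sort: Thomas Hibbard's gaps (2^k - 1), that is 1, 3, 7, 15, …, were one of the earliest improvements over the original sequence and give a worst case of (O(n^{3/2})).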
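
The gap sequences named in this list are easy to generate. The sketch below returns, largest first, the gaps smaller than n for Shell's original rule and for Hibbard's, Knuth's and Sedgewick's formulas; the function names and the cut-off "smaller than n" are assumptions chosen for this illustration.

```python
def shell_gaps(n):
    """Shell's original gaps: n//2, n//4, ..., 1."""
    gaps = []
    gap = n // 2
    while gap > 0:
        gaps.append(gap)
        gap //= 2
    return gaps


def hibbard_gaps(n):
    """Hibbard's gaps 2^k - 1 below n, largest first."""
    gaps = []
    k = 1
    while (1 << k) - 1 < n:
        gaps.append((1 << k) - 1)
        k += 1
    return gaps[::-1]


def knuth_gaps(n):
    """Knuth's gaps (3^k - 1) / 2 below n, largest first."""
    gaps = []
    gap = 1
    while gap < n:
        gaps.append(gap)
        gap = 3 * gap + 1
    return gaps[::-1]


def sedgewick_gaps(n):
    """Sedgewick's gaps 1, 5, 19, 41, 109, ... below n, largest first."""
    gaps = []
    k = 0
    while True:
        a = 9 * (4 ** k - 2 ** k) + 1              # 1, 19, 109, ...
        b = 4 ** (k + 2) - 3 * 2 ** (k + 2) + 1    # 5, 41, 209, ...
        if a >= n:
            break
        gaps.append(a)
        if b >= n:
            break
        gaps.append(b)
        k += 1
    return gaps[::-1]


for make in (shell_gaps, hibbard_gaps, knuth_gaps, sedgewick_gaps):
    print(make.__name__, make(100))
```

Each list ends with 1, which is what guarantees that the final pass leaves the data fully sorted.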

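Besides these gap-sequence-based approaches, other work has tried to improve Shell sort by reducing the number of comparisons and swaps. Such techniques can help for particular data sets or situations, but they are refinements of the original algorithm rather than new variants. Optimisation of Shell sort therefore concentrates on two things, the choice of the gap sequence and implementation details, and both can improve performance to some degree. Note, however, that the worst case is still (O(n^2)) with the original gap sequence, and that no known gap sequence achieves an (O(n \log n)) worst case, so on large data sets Shell sort may still lose to more efficient sorting algorithms.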

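Hibbard Shell Sort

Pseudocode
function hibbardShellSort(arr):
    n = length(arr)
    k = 1
    // smallest k with 2^k - 1 >= n, so 2^(k-1) - 1 is the largest Hibbard gap below n
    while (2^k - 1) < n:
        k += 1
    // gaps 2^(k-1) - 1, ..., 7, 3, 1; floor(gap / 2) maps 2^m - 1 to 2^(m-1) - 1
    for gap = 2^(k-1) - 1; gap > 0; gap = floor(gap / 2):
        for i = gap; i < n; i++:
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
    return arr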
Python Code
def hibbard_shell_sort(arr):
    n = len(arr)
    # Hibbard's gaps 2^k - 1 (1, 3, 7, 15, ...), keeping only those below n
    gaps = []
    k = 1
    while (1 << k) - 1 < n:
        gaps.append((1 << k) - 1)
        k += 1
    # Apply the gaps from largest to smallest; the last gap is always 1
    for gap in reversed(gaps):
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
C++ Template Code
template <typename T>
void hibbardShellSort(vector<T> &arr)
{
    // n is the number of elements
    int n = arr.size();
    // Find the smallest k with 2^k - 1 >= n
    int k = 1;
    while ((1 << k) - 1 < n)
    {
        k++;
    }
    // Walk Hibbard's gaps 2^(k-1) - 1, ..., 7, 3, 1; gap >> 1 maps 2^m - 1 to 2^(m-1) - 1
    for (int gap = (1 << (k - 1)) - 1; gap > 0; gap >>= 1)
    {
        // Gapped insertion sort for the current gap
        for (int i = gap; i < n; ++i)
        {
            // Store the current element in a temporary variable
            T temp = arr[i];
            // Find the correct position for the element
            int j;
            for (j = i; j >= gap && arr[j - gap] > temp; j -= gap)
            {
                // Move the element to the correct position
                arr[j] = arr[j - gap];
            }
            // Put the element in its correct position
            arr[j] = temp;
        }
    }
}

Complete Project Code

Python

def shell_sort(arr):
    n = len(arr)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 2

def hibbard_shell_sort(arr):
    n = len(arr)
    # Hibbard's gaps 2^k - 1 (1, 3, 7, 15, ...), keeping only those below n
    gaps = []
    k = 1
    while (1 << k) - 1 < n:
        gaps.append((1 << k) - 1)
        k += 1
    # Apply the gaps from largest to smallest; the last gap is always 1
    for gap in reversed(gaps):
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp

def knuth_shell_sort(arr):
    n = len(arr)
    # Knuth's gaps (3^k - 1) / 2: 1, 4, 13, 40, ...; grow the gap while it is below n // 3
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1
    # Integer division by 3 steps back through the sequence: 40 -> 13 -> 4 -> 1 -> 0
    while gap > 0:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 3

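def sedgewick_shell_sort(arr):
    n = len(arr)
    # Sedgewick's gaps in increasing order: 1, 5, 19, 41, 109, ...
    # They interleave 9 * 4^k - 9 * 2^k + 1 and 4^(k+2) - 3 * 2^(k+2) + 1.
    gaps = []
    k = 0
    while True:
        a = 9 * (4 ** k - 2 ** k) + 1
        if a >= n:
            break
        gaps.append(a)
        b = 4 ** (k + 2) - 3 * 2 ** (k + 2) + 1
        if b >= n:
            break
        gaps.append(b)
        k += 1
    # Apply the gaps from largest to smallest; the last gap is always 1
    for gap in reversed(gaps):
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp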

class Person:
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

    def __lt__(self, other):
        return self.score < other.score

    def __le__(self, other):
        return self.score <= other.score

    def __eq__(self, other):
        return self.score == other.score and self.age == other.age and self.name == other.name

    def __ne__(self, other):
        return not self.__eq__(other)

    def __gt__(self, other):
        return self.score > other.score

    def __ge__(self, other):
        return self.score >= other.score

    def get_name(self):
        return self.name

    def get_age(self):
        return self.age

    def get_score(self):
        return self.score

def test_shell_sort():
    data = [9, 8, 3, 7, 5, 6, 4, 1]
    shell_sort(data)
    print(data)

    d_data = [9.9, 9.1, 3.3, 7.7, 5.5, 6.6, 4.4, 1.1]
    shell_sort(d_data)
    print(d_data)

    c_data = ['a', 'c', 'b', 'd', 'e']
    shell_sort(c_data)
    print(c_data)

    p_data = [Person("Alice", 20, 90), Person("Bob", 18, 85), Person("Charlie", 22, 95)]
    shell_sort(p_data)
    for person in p_data:
        print(person.get_name(), person.get_age(), person.get_score())

def test_hibbard_shell_sort():
    data = [9, 8, 3, 7, 5, 6, 4, 1]
    hibbard_shell_sort(data)
    print(data)

    d_data = [9.9, 9.1, 3.3, 7.7, 5.5, 6.6, 4.4, 1.1]
    hibbard_shell_sort(d_data)
    print(d_data)

    c_data = ['a', 'c', 'b', 'd', 'e']
    hibbard_shell_sort(c_data)
    print(c_data)

    p_data = [Person("Alice", 20, 90), Person("Bob", 18, 85), Person("Charlie", 22, 95)]
    hibbard_shell_sort(p_data)
    for person in p_data:
        print(person.get_name(), person.get_age(), person.get_score())

def test_knuth_shell_sort():
    data = [9, 8, 3, 7, 5, 6, 4, 1]
    knuth_shell_sort(data)
    print(data)

    d_data = [9.9, 9.1, 3.3, 7.7, 5.5, 6.6, 4.4, 1.1]
    knuth_shell_sort(d_data)
    print(d_data)

    c_data = ['a', 'c', 'b', 'd', 'e']
    knuth_shell_sort(c_data)
    print(c_data)

    p_data = [Person("Alice", 20, 90), Person("Bob", 18, 85), Person("Charlie", 22, 95)]
    knuth_shell_sort(p_data)
    for person in p_data:
        print(person.get_name(), person.get_age(), person.get_score())

def test_sedgewick_shell_sort():
    data = [9, 8, 3, 7, 5, 6, 4, 1]
    sedgewick_shell_sort(data)
    print(data)

    d_data = [9.9, 9.1, 3.3, 7.7, 5.5, 6.6, 4.4, 1.1]
    sedgewick_shell_sort(d_data)
    print(d_data)

    c_data = ['a', 'c', 'b', 'd', 'e']
    sedgewick_shell_sort(c_data)
    print(c_data)

    p_data = [Person("Alice", 20, 90), Person("Bob", 18, 85), Person("Charlie", 22, 95)]
    sedgewick_shell_sort(p_data)
    for person in p_data:
        print(person.get_name(), person.get_age(), person.get_score())

if __name__ == "__main__":
    test_shell_sort()
    test_hibbard_shell_sort()
    test_knuth_shell_sort()
    test_sedgewick_shell_sort()

C++

#include <iostream>
#include <string>
#include <vector>

using namespace std;

class Person
{
public:
    Person(string name, int age, int score)
    {
        this->name = name;
        this->age = age;
        this->score = score;
    }

    // The ordering operators (>, <, <=, >=) compare by score only,
    // so that they are consistent with each other.
    bool operator>(const Person &other) const
    {
        return this->score > other.score;
    }

    bool operator<(const Person &other) const
    {
        return this->score < other.score;
    }

    bool operator<=(const Person &other) const
    {
        return this->score <= other.score;
    }

    bool operator>=(const Person &other) const
    {
        return this->score >= other.score;
    }

    // Equality compares the score, age and name of two Person objects.
    bool operator==(const Person &other) const
    {
        return this->score == other.score &&
               this->age == other.age &&
               this->name == other.name;
    }

    bool operator!=(const Person &other) const
    {
        return !(*this == other);
    }

    // Accessors for the private members of this class:
    const string &getName() const { return this->name; }
    int getAge() const { return this->age; }
    int getScore() const { return this->score; }

private:
    string name;
    int age;
    int score;
};

template <typename T>
void shellSort(vector<T> &arr)
{
    // n is the number of elements
    int n = arr.size();
    // gap is the distance between neighbours of one subsequence; it is halved after every pass
    for (int gap = n / 2; gap > 0; gap /= 2)
    {
        // i is the index of the element being inserted into its subsequence
        for (int i = gap; i < n; ++i)
        {
            // temp holds the element being inserted
            T temp = arr[i];
            // j is the position where temp will finally be placed
            int j;
            // shift larger elements of the same subsequence one gap to the right
            for (j = i; j >= gap && arr[j - gap] > temp; j -= gap)
            {
                arr[j] = arr[j - gap];
            }
            // place temp into the hole left by the shifts
            arr[j] = temp;
        }
    }
}

void shellSortTestCase()
{
    vector<int> data = {9, 8, 3, 7, 5, 6, 4, 1};
    shellSort<int>(data);
    for (int i : data)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<double> dData = {9.9, 9.1, 3.3, 7.7, 5.5, 6.6, 4.4, 1.1};
    shellSort<double>(dData);
    for (double i : dData)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<char> cData = {'a', 'c', 'b', 'd', 'e'};
    shellSort<char>(cData);
    for (char i : cData)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<Person> pData = {Person("Alice", 20, 90), Person("Bob", 18, 85), Person("Charlie", 22, 95)};
    shellSort<Person>(pData);
    for (Person i : pData)
    {
        cout << i.getName() << " " << i.getAge() << " " << i.getScore() << endl;
    }
    cout << endl;
}

template <typename T>
void hibbardShellSort(vector<T> &arr)
{
    // n is the number of elements
    int n = arr.size();
    // Find the smallest k with 2^k - 1 >= n
    int k = 1;
    while ((1 << k) - 1 < n)
    {
        k++;
    }
    // Walk Hibbard's gaps 2^(k-1) - 1, ..., 7, 3, 1; gap >> 1 maps 2^m - 1 to 2^(m-1) - 1
    for (int gap = (1 << (k - 1)) - 1; gap > 0; gap >>= 1)
    {
        // Gapped insertion sort for the current gap
        for (int i = gap; i < n; ++i)
        {
            // Store the current element in a temporary variable
            T temp = arr[i];
            // Find the correct position for the element
            int j;
            for (j = i; j >= gap && arr[j - gap] > temp; j -= gap)
            {
                // Move the element to the correct position
                arr[j] = arr[j - gap];
            }
            // Put the element in its correct position
            arr[j] = temp;
        }
    }
}

void hibbardShellSortTestCase()
{
    vector<int> data = {9, 8, 3, 7, 5, 6, 4, 1};
    hibbardShellSort<int>(data);
    for (int i : data)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<double> dData = {9.9, 9.1, 3.3, 7.7, 5.5, 6.6, 4.4, 1.1};
    hibbardShellSort<double>(dData);
    for (double i : dData)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<char> cData = {'a', 'c', 'b', 'd', 'e'};
    hibbardShellSort<char>(cData);
    for (char i : cData)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<Person> pData = {Person("Alice", 20, 90), Person("Bob", 18, 85), Person("Charlie", 22, 95)};
    hibbardShellSort<Person>(pData);
    for (Person i : pData)
    {
        cout << i.getName() << " " << i.getAge() << " " << i.getScore() << endl;
    }
    cout << endl;
}

template <typename T>
void knuthShellSort(vector<T> &arr)
{
    // n is the number of elements
    int n = arr.size();
    // Knuth's gaps (3^k - 1) / 2: 1, 4, 13, 40, ...; grow the gap while it is below n / 3
    int gap = 1;
    while (gap < n / 3)
    {
        gap = 3 * gap + 1;
    }

    // Integer division by 3 steps back through the sequence: 40 -> 13 -> 4 -> 1 -> 0
    for (; gap > 0; gap /= 3)
    {
        // gapped insertion sort for the current gap
        for (int i = gap; i < n; ++i)
        {
            T temp = arr[i];
            int j;
            // shift larger elements of the same subsequence one gap to the right
            for (j = i; j >= gap && arr[j - gap] > temp; j -= gap)
            {
                arr[j] = arr[j - gap];
            }
            arr[j] = temp;
        }
    }
}

void knuthShellSortTestCase()
{
    vector<int> data = {9, 8, 3, 7, 5, 6, 4, 1};
    knuthShellSort<int>(data);
    for (int i : data)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<double> dData = {9.9, 9.1, 3.3, 7.7, 5.5, 6.6, 4.4, 1.1};
    knuthShellSort<double>(dData);
    for (double i : dData)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<char> cData = {'a', 'c', 'b', 'd', 'e'};
    knuthShellSort<char>(cData);
    for (char i : cData)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<Person> pData = {Person("Alice", 20, 90), Person("Bob", 18, 85), Person("Charlie", 22, 95)};
    knuthShellSort<Person>(pData);
    for (Person i : pData)
    {
        cout << i.getName() << " " << i.getAge() << " " << i.getScore() << endl;
    }
    cout << endl;
}

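template <typename T>
void sedgewickShellSort(vector<T> &arr)
{
    int n = arr.size();
    // Build Sedgewick's gaps in increasing order: 1, 5, 19, 41, 109, ...
    // They interleave 9 * 4^k - 9 * 2^k + 1 and 4^(k+2) - 3 * 2^(k+2) + 1.
    vector<int> gaps;
    for (int k = 0;; ++k)
    {
        long long a = 9 * ((1LL << (2 * k)) - (1LL << k)) + 1;
        if (a >= n)
        {
            break;
        }
        gaps.push_back(static_cast<int>(a));
        long long b = (1LL << (2 * k + 4)) - 3 * (1LL << (k + 2)) + 1;
        if (b >= n)
        {
            break;
        }
        gaps.push_back(static_cast<int>(b));
    }
    // Apply the gaps from largest to smallest; the last gap is always 1
    for (auto gap = gaps.rbegin(); gap != gaps.rend(); ++gap)
    {
        for (int i = *gap; i < n; ++i)
        {
            T temp = arr[i];
            int j;
            for (j = i; j >= *gap && arr[j - *gap] > temp; j -= *gap)
            {
                arr[j] = arr[j - *gap];
            }
            arr[j] = temp;
        }
    }
}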

void sedgewickTestCase()
{
    vector<int> data = {9, 8, 3, 7, 5, 6, 4, 1};
    sedgewickShellSort<int>(data);
    for (int i : data)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<double> dData = {9.9, 9.1, 3.3, 7.7, 5.5, 6.6, 4.4, 1.1};
    sedgewickShellSort<double>(dData);
    for (double i : dData)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<char> cData = {'a', 'c', 'b', 'd', 'e'};
    sedgewickShellSort<char>(cData);
    for (char i : cData)
    {
        cout << i << " ";
    }
    cout << endl;

    vector<Person> pData = {Person("Alice", 20, 90), Person("Bob", 18, 85), Person("Charlie", 22, 95)};
    sedgewickShellSort<Person>(pData);
    for (Person i : pData)
    {
        cout << i.getName() << " " << i.getAge() << " " << i.getScore() << endl;
    }
    cout << endl;
}


int main()
{
    shellSortTestCase();
    hibbardShellSortTestCase();
    knuthShellSortTestCase();
    sedgewickTestCase();
    return 0;
}
