Algorithm Design and Analysis: Homework 3 (Greedy)




2 Greedy Algorithm

There are n distinct jobs, labeled J1, J2, ..., Jn, which can be performed completely independently of one another. Each job consists of two stages: first it needs to be preprocessed on the supercomputer, and then it needs to be finished on one of the PCs. Let's say that job Ji needs pi seconds of time on the supercomputer, followed by fi seconds of time on a PC. Since there are at least n PCs available on the premises, the finishing of the jobs can be performed on the PCs at the same time. However, the supercomputer can only work on a single job at a time, without interruption. For every job, as soon as the preprocessing is done on the supercomputer, it can be handed off to a PC for finishing.
Let's say that a schedule is an ordering of the jobs for the supercomputer, and the completion time of the schedule is the earliest time at which all jobs have finished processing on the PCs. Give a polynomial-time algorithm that finds a schedule with as small a completion time as possible.

Pseudo-code

# Define a structure (class)
# class Job
#   p  # instance variable p: time the job needs on the supercomputer
#   f  # instance variable f: time the job needs on a PC
# end
# @param {Job[]} Jobs
# @return {Job[]}
function find_minimum_time_schedule(Jobs)
    sort Jobs in descending order of f
    return Jobs
end
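
The pseudo-code above can be sketched in C++ as follows. This is a minimal sketch rather than the author's implementation: the names `Job`, `findMinimumTimeSchedule` and `completionTime` are illustrative, times are assumed to be non-negative integers, and `completionTime` is an added helper that evaluates an order by taking the maximum over j of (p_1 + ... + p_j) + f_j.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One job: p seconds on the supercomputer, then f seconds on a PC.
struct Job {
    std::int64_t p;
    std::int64_t f;
};

// Greedy rule from the pseudo-code: order the jobs by f, largest first.
std::vector<Job> findMinimumTimeSchedule(std::vector<Job> jobs) {
    std::stable_sort(jobs.begin(), jobs.end(),
                     [](const Job& a, const Job& b) { return a.f > b.f; });
    return jobs;
}

// Completion time of a given order: max over j of (p_1 + ... + p_j) + f_j.
std::int64_t completionTime(const std::vector<Job>& order) {
    std::int64_t elapsed = 0;  // time at which the supercomputer becomes free
    std::int64_t finish = 0;   // latest PC finishing time seen so far
    for (const Job& job : order) {
        elapsed += job.p;
        finish = std::max(finish, elapsed + job.f);
    }
    return finish;
}
```

For the jobs (p, f) = (2, 1), (1, 5), (3, 2), running them in input order completes at time 8, while the greedy order (1, 5), (3, 2), (2, 1) completes at time 7.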

Prove the correctness

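Sorting the jobs in descending order of their PC time f yields an order in which all jobs finish as early as possible. Number the jobs in the order in which they run on the supercomputer. The minimum completion time is then

$\max_{1 \le j \le n} \left( \sum_{i=1}^{j} p_i + f_j \right)$. Job j finishes at time $\sum_{i=1}^{j} p_i + f_j$, and the schedule completes when the last job finishes, i.e. at the maximum of these values.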

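First we show that the problem has the greedy-choice property. Since the jobs are sorted in descending order of f, J1 has the largest f and is scheduled first. Suppose an optimal schedule B does not run J1 first but at position k (k > 1). Move J1 to the front of B and leave the relative order of the other jobs unchanged. J1 now finishes earlier than before. Each job Jj that preceded J1 in B is delayed by p1, so it finishes no later than $\sum_{i=1}^{k} p_i + f_j$ (the sum is taken over the first k jobs of B), and because $f_j \le f_1$ this is at most $\sum_{i=1}^{k} p_i + f_1$, the time at which J1 finished in B. The jobs after position k are unaffected. The new schedule therefore completes no later than B, so it is also optimal, and it starts with J1. (For example, take only two jobs J1 and J2 with f1 > f2. If J1 runs first, as the greedy choice requires, J1 finishes at p1 + f1 and J2 at p1 + p2 + f2. If J2 runs first, J1 finishes at p1 + p2 + f1, which is larger than both completion times in the first case.) Hence the problem has the greedy-choice property: some optimal solution runs the job with the largest f first.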

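The problem also has optimal substructure. Once the first job (the one with the largest f) has been chosen, the same algorithm can be applied to order the remaining n-1 jobs. The problem can therefore be solved with a greedy algorithm.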
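
One way to sanity-check the argument on small inputs is to compare the greedy result with an exhaustive search over all n! orders. The sketch below does that; it is not part of the original solution, the names `finishTime`, `greedyTime` and `bruteForceTime` are hypothetical, and the exhaustive search is only practical for small n.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

struct Job {
    std::int64_t p;  // seconds on the supercomputer
    std::int64_t f;  // seconds on a PC
};

// Completion time when the jobs run in the order given by idx.
std::int64_t finishTime(const std::vector<Job>& jobs, const std::vector<int>& idx) {
    std::int64_t elapsed = 0, finish = 0;
    for (int i : idx) {
        elapsed += jobs[i].p;
        finish = std::max(finish, elapsed + jobs[i].f);
    }
    return finish;
}

// Greedy: run the jobs in descending order of f.
std::int64_t greedyTime(const std::vector<Job>& jobs) {
    std::vector<int> idx(jobs.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(),
              [&](int a, int b) { return jobs[a].f > jobs[b].f; });
    return finishTime(jobs, idx);
}

// Exhaustive search: evaluate every permutation and keep the best.
std::int64_t bruteForceTime(const std::vector<Job>& jobs) {
    std::vector<int> idx(jobs.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., n-1 is the first permutation
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    do {
        best = std::min(best, finishTime(jobs, idx));
    } while (std::next_permutation(idx.begin(), idx.end()));
    return best;
}
```

For the jobs (p, f) = (2, 1), (1, 5), (3, 2), (4, 4), both functions return 11: the supercomputer is busy for 10 seconds in any order, and the last job still needs at least 1 second on a PC.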

The complexity of your algorithm

The running time is dominated by the sort, which takes O(n log n).

3 Greedy Algorithm

Assume the coast is an infinite straight line. Land is on one side of the coast, sea on the other. Each small island is a point located on the sea side. Any radar installation, located on the coast, can only cover distance d, so an island in the sea can be covered by a radar installation if the distance between them is at most d.

We use a Cartesian coordinate system and define the coast to be the x-axis. The sea side is above the x-axis, and the land side below. Given the position of each island in the sea, and given the coverage distance of a radar installation, your task is to write a program to find the minimal number of radar installations needed to cover all the islands. Note that the position of an island is represented by its x-y coordinates.

Pseudo-code

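# Define structures (classes)
# class Pos  # position of an island
#   x  # x-coordinate of the island
#   y  # y-coordinate of the island
# end
# class Interval # interval on the x-axis in which a radar covers the island
#   s  # start of the interval
#   e  # end of the interval
# end
# @param {Pos[],Integer} poss, d
# @return {Integer}
function find_minimal_installations(poss,d)
    len=poss.length
    return 0 if len==0
    define an Interval array intervals of length len
    for i in 0...len
      return -1 if poss[i].y>d  # no radar on the coast can reach this island
      intervals[i].s=poss[i].x-sqrt(d*d-poss[i].y*poss[i].y)
      intervals[i].e=poss[i].x+sqrt(d*d-poss[i].y*poss[i].y)
    end
    intervals.sort # ascending order of intervals[i].e
    tmp=intervals[0].e
    count=1
    for i in 1...len
      if tmp<intervals[i].s
        tmp=intervals[i].e
        count+=1
      end
    end
    return count
end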
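
A C++ rendering of this pseudo-code might look like the sketch below. It assumes islands have y >= 0 and that d >= 0, uses `double` coordinates, and follows the pseudo-code in returning -1 when some island is out of reach; the names `Island` and `findMinimalInstallations` are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Island {
    double x;
    double y;  // assumed >= 0 (sea side)
};

// Returns the minimal number of radars, or -1 if some island cannot be covered.
int findMinimalInstallations(const std::vector<Island>& islands, double d) {
    struct Interval { double s, e; };  // radar positions on the x-axis that cover one island
    std::vector<Interval> intervals;
    intervals.reserve(islands.size());
    for (const Island& island : islands) {
        if (island.y > d) return -1;
        const double half = std::sqrt(d * d - island.y * island.y);
        intervals.push_back({island.x - half, island.x + half});
    }
    // Sort by right endpoint, ascending.
    std::sort(intervals.begin(), intervals.end(),
              [](const Interval& a, const Interval& b) { return a.e < b.e; });
    int count = 0;
    double last = 0.0;  // position of the most recent radar (meaningful once count > 0)
    for (const Interval& iv : intervals) {
        if (count == 0 || last < iv.s) {  // current radar does not reach this island
            last = iv.e;                  // place a new radar at the right endpoint
            ++count;
        }
    }
    return count;
}
```

With islands (1, 2), (-3, 1), (2, 1) and d = 2, the intervals are approximately [1, 1], [-4.73, -1.27] and [0.27, 3.73], and the function returns 2.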

Prove the correctness

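Each interval on the x-axis is the set of positions at which a radar covers the corresponding island, so what remains to be shown is that, once the intervals are computed, a greedy algorithm finds the minimum number of radars. We sort the intervals in ascending order of their right endpoints. The strategy is to install the first radar at the first right endpoint and discard every interval that this radar covers, then install the next radar at the right endpoint of the first interval that is still uncovered, and so on until no intervals remain.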

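First we prove the greedy-choice property: the problem has an optimal solution that begins with the greedy choice, i.e. one that places a radar at the first right endpoint e1. Let A be an optimal solution and let k be the position of a radar in A that covers the first interval (one must exist, otherwise the corresponding island is not covered). If k = e1, then A already begins with the greedy choice. Otherwise k lies in the first interval and k < e1. Every interval [s, e] that contains k satisfies s <= k < e1 <= e, because e1 is the smallest right endpoint, so that interval also contains e1. Let B = (A - {k}) ∪ {e1}. B covers every island that A covers and uses the same number of radars, so B is also optimal, and it begins with the greedy choice. Hence there is always an optimal solution that begins with the greedy choice.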

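Next we prove optimal substructure. After the first right endpoint has been chosen, the original problem E reduces to the subproblem E' of covering, with as few radars as possible, the islands that the first radar does not cover. If A is an optimal solution of E that contains the first right endpoint, then A' = A - {first right endpoint} is an optimal solution of E'. Suppose instead that E' had a solution B' with fewer radars than A'. Adding the first right endpoint to B' would give a solution B of E with fewer radars than A, contradicting the optimality of A. Each greedy choice therefore reduces the problem to a smaller subproblem of the same form, and induction on the number of greedy choices shows that the greedy algorithm produces an optimal solution of the original problem.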

The complexity of your algorithm

Sorting takes O(n log n) and the remaining operations take O(n), so the algorithm runs in O(n log n) time.

5 Programming

Write a program in your favorite language to compress a file using Huffman code and then decompress it. Code information may be contained in the compressed file if you can. Use your program to compress the two files (graph.txt and Aesop Fables.txt) and compare the results (Huffman code and compression ratio).

C++ Code

// HuffmanCode.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <map>
using namespace std;

typedef unsigned __int64 Dcode;
// Input files: Aesop_Fables.txt, graph.txt
const char *readFileName = "graph.txt";  // name of the file to read
const char *writeFileName = "graph_compress.dat";  // name of the file to write

typedef struct  HCBinary
{
    Dcode binaryCode;  // binary representation of the Huffman code
    int length;  // length of the code in bits
}HCBinary;

typedef struct Unit
{
    Dcode content;
    int remain;
}Unit;

Unit tempUnit;

typedef struct
{
    long long int weight;
    int parent, lchild, rchild;
    char ch;
}HTNode, *HuffmanTree;

map<char, long long int> frequencyMap;
map<char, HCBinary> codeMap;

void select(HuffmanTree &HT, int dest, int &s1, int &s2)// find the positions of the two smallest unused weights
{
    long long int min1 = 9999999999;   // start min1 and min2 at a sufficiently large value
    long long int min2 = 9999999999;
    int tempS1 = 0;    // records the positions
    int tempS2 = 0;
    for (int i = 1; i <= dest; i++)
    {
        if (HT[i].parent != 0)   // node already used, skip to the next one
        {
            continue;
        }
        else
        {
            if (HT[i].weight <= min2)
            {
                if (HT[i].weight <= min1)
                {
                    min2 = min1;
                    tempS2 = tempS1;
                    min1 = HT[i].weight;
                    tempS1 = i;
                }
                else
                {
                    min2 = HT[i].weight;
                    tempS2 = i;
                }
            }
        }
    }
    s1 = tempS1;
    s2 = tempS2;
}

int select2(HuffmanTree &HT, int dest)
{
    long long int min = 9999999999;
    int pos=0;
    for (int i = 1; i <= dest; i++)
    {
        if (HT[i].parent != 0)   // node already used, skip to the next one
        {
            continue;
        }
        else if(HT[i].weight < min)
        {
            min = HT[i].weight;
            pos = i;
        }
    }
    return pos;
}

void HuffmanCoding(HuffmanTree &HT, char** &HC)  // Huffman coding: essentially fills in the tree table
{
    int c, f, m, i;
    int start;
    int s1, s2;
    HuffmanTree p;
    char * cd;
    int n = frequencyMap.size();
    if (n <= 1)
    {
        exit(0);
    }
    m = 2 * n - 1;    // a Huffman tree with n leaves has m = 2n - 1 nodes in total
    HT = new HTNode[m + 1];   // element 0 is unused
    map<char, long long int>::iterator iter = frequencyMap.begin();
    for (p = HT + 1, i = 1; i <= n; i++, ++p) // initialize all leaf nodes first
    {
        p->ch = iter->first;
        p->weight = iter->second;   // indices start at 1 because element 0 is unused
        p->lchild = 0;
        p->rchild = 0;
        p->parent = 0;
        iter++;
    }
    for (; i <= m; ++i, ++p)   // initialize the remaining (internal) nodes
    {
        p->ch = '\0';
        p->weight = 0;
        p->lchild = 0;
        p->rchild = 0;
        p->parent = 0;
    }
    for (i = n + 1; i <= m; ++i)
    {
        select(HT, i - 1, s2, s1);  // find the positions of the two smallest unused weights
        /*s1 = select2(HT, i - 1);
        s2 = select2(HT, i - 1);*/
        HT[s1].parent = i;
        HT[s2].parent = i;
        HT[i].lchild = s1;
        HT[i].rchild = s2;
        HT[i].weight = HT[s1].weight + HT[s2].weight;
    }
    HC = new char*[n + 1];
    cd = new char[n];
    cd[n - 1] = '\0';
    for (i = 1; i <= n; ++i)
    {
        start = n - 1;
        for (c = i, f = HT[i].parent; f != 0; c = f, f = HT[f].parent)
        {
            if (HT[f].lchild == c)
                cd[--start] = '0';
            else
                cd[--start] = '1';
        }
        HC[i] = new char[n - start];
        strcpy_s(HC[i], n - start, &cd[start]);
    }
    delete[] cd;
}

void getBinaryCode(char** HC)   // convert the Huffman code strings to binary and store them in codeMap
{
    int i;
    int j;
    Dcode ZERO = 0;
    Dcode ONE = 1;
    map<char, long long int>::iterator iter = frequencyMap.begin();
    HCBinary tmpCode;
    for (i = 1; i <= frequencyMap.size(); i++)
    {
        tmpCode.binaryCode = 0;
        tmpCode.length = 0;
        for (j = 0; HC[i][j] != '\0'; j++)
        {
            if (HC[i][j] == '1')
            {
                tmpCode.binaryCode <<= 1;                      // shift first, then OR
                tmpCode.binaryCode = ONE | tmpCode.binaryCode;

            }
            if (HC[i][j] == '0')
            {
                tmpCode.binaryCode <<= 1;
            }
            tmpCode.length++;
        }
        codeMap[iter->first] = tmpCode;
        iter++;
    }
}

void init()  // initialization
{
    tempUnit.content = 0;
    tempUnit.remain = 64;
}

void computeFrequent()  // count how often each character occurs
{
    int i;
    fstream fp(readFileName, ios::in);
    if (!fp)
    {
        cout << "can't open the file." << endl;
        exit(0);
    }
    char ch;
    while (ch = fp.get(), !fp.eof())
    {
        frequencyMap[ch] += 1;
    }
    fp.close();
}

void writeFile()
{
    int i;
    fstream readFp(readFileName, ios::in);
    fstream writeFp(writeFileName, ios::out | ios::binary);
    if (!readFp || !writeFp)
    {
        cout << "can not open the file!" << endl;
        exit(0);
    }
    char ch;
    HCBinary tempCode;
    while (ch = readFp.get(), !readFp.eof())
    {
        tempCode = codeMap[ch];
        if (tempUnit.remain == tempCode.length)  // free space in the unit equals the code length
        {
            tempUnit.content <<= tempCode.length;
            tempUnit.content |= tempCode.binaryCode;
            writeFp.write((char*)(&tempUnit.content), sizeof(tempUnit.content));  // write out the whole unit

            tempUnit.remain = 64;         // reset the unit
            tempUnit.content = 0;

        }
        else if (tempUnit.remain>tempCode.length)  // free space exceeds the code length: put the code into the unit
        {
            tempUnit.content <<= tempCode.length;
            tempUnit.content |= tempCode.binaryCode;
            tempUnit.remain = tempUnit.remain - tempCode.length;
        }
        else    // free space is smaller than the code length: fill the unit with the leading bits, write the unit to the file, then put the remaining bits into a fresh unit
        {
            Dcode temp = 0;
            tempUnit.content <<= tempUnit.remain;
            temp = tempCode.binaryCode >> (tempCode.length - tempUnit.remain);
            tempUnit.content |= temp;
            writeFp.write((char*)(&tempUnit.content), sizeof(tempUnit.content));

            tempUnit.content = 0;
            temp = 0xFFFFFFFFFFFFFFFF;  // all 64 bits set (sixteen F digits)
            temp <<= (tempCode.length - tempUnit.remain);
            temp = ~temp;// invert to get a mask of the low bits
            temp &= tempCode.binaryCode;
            tempUnit.content |= temp;
            tempUnit.remain = 64 - (tempCode.length - tempUnit.remain);
        }
    }
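    if (tempUnit.remain != 64)   // a partially filled unit remains: write it out
    {
        //FIXME: this pads the output with extra 0 bits, so decoding goes wrong if some Huffman code consists only of zeros (00..)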
        tempUnit.content <<= tempUnit.remain;
        writeFp.write((char*)(&tempUnit.content), sizeof(tempUnit.content));
    }
    readFp.close();
    writeFp.close();
}

void Decode(HuffmanTree &HT)  // decoding
{
    int len = frequencyMap.size();
    int f = 2 * len - 1;
    int c = 0;
    int i;
    fstream readFp(writeFileName, ios::in | ios::binary);
    if (!readFp)
    {
        cout << "can not open the file." << endl;
        exit(0);
    }
    Dcode one = 0;
    Dcode temp = 0;
    Dcode temp2 = 0;
    one = 0x8000000000000000;
    while (!readFp.eof())
    {
        readFp.read((char*)(&temp2), sizeof(Dcode));
        if (readFp.fail())
        {
            break;
        }
        for (i = 0; i<64; i++)     // follow each 0/1 bit down from the root; on reaching a leaf, print its character
        {
            temp = temp2&one;
            if (temp == 0)
            {
                c = HT[f].lchild;
            }
            else
            {
                c = HT[f].rchild;
            }
            f = c;
            temp2 <<= 1;
            if (HT[c].lchild == 0 && HT[c].rchild == 0)
            {
                cout << HT[c].ch;
                f = 2 * len - 1;
                c = 0;
            }
        }
    }
    readFp.close();
}

int main()
{
    init();
    computeFrequent();
    HuffmanTree HT;
    char** HC;
    HuffmanCoding(HT, HC);
    cout << "Char" << setw(20) << "Frequency" << setw(25) << "Huffman code" << endl;
    map<char, long long int>::iterator iter = frequencyMap.begin();
    for (int i = 1; i <= frequencyMap.size(); i++)
    {
        cout << iter->first << setw(20) << iter->second << setw(25) << HC[i] << endl;
        iter++;
    }
    getBinaryCode(HC);
    writeFile();
    cout << "Compression finished" << endl;
    cout << "Decoding:" << endl;
    Decode(HT);
    return 0;
}
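
The FIXME in writeFile points out that the final unit is padded with zero bits, which the decoder may turn into spurious characters. One common remedy, sketched here as an assumption and not taken from the listing, is to record how many bits are valid (for example in a file header) and to stop decoding after that many bits. The names `BitWriter` and `readBits` are hypothetical, and the sketch works on an in-memory byte buffer to stay short.

```cpp
#include <cstdint>
#include <vector>

// Packs bits MSB-first into bytes and remembers how many bits are real data.
class BitWriter {
public:
    // Append the lowest `length` bits of `code`, most significant bit first.
    void put(std::uint64_t code, int length) {
        for (int i = length - 1; i >= 0; --i) {
            if (bitCount_ % 8 == 0) bytes_.push_back(0);  // start a new byte
            if ((code >> i) & 1u) {
                bytes_.back() |= static_cast<std::uint8_t>(0x80u >> (bitCount_ % 8));
            }
            ++bitCount_;
        }
    }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }
    std::uint64_t bitCount() const { return bitCount_; }  // value to store in the header

private:
    std::vector<std::uint8_t> bytes_;
    std::uint64_t bitCount_ = 0;
};

// Read back exactly bitCount bits, so the zero padding in the last byte is ignored.
std::vector<bool> readBits(const std::vector<std::uint8_t>& bytes, std::uint64_t bitCount) {
    std::vector<bool> bits;
    bits.reserve(bitCount);
    for (std::uint64_t i = 0; i < bitCount; ++i) {
        bits.push_back(((bytes[i / 8] >> (7 - i % 8)) & 1u) != 0);
    }
    return bits;
}
```

Writing the 3-bit code 101 followed by the 2-bit code 00 yields the single byte 0xA0 with a bit count of 5; a decoder that stops after 5 bits never sees the three padding zeros.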

compare the results

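For Aesop_Fables.txt, the file sizes before and after compression are shown below:

[Figure: Aesop_Fables.txt before and after compression]

Before compression the file is 186 KB and after compression 103 KB, a compression ratio (compressed size / original size) of 55.38%.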

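For graph.txt, the file sizes before and after compression are shown below:

[Figure: graph.txt before and after compression]

Before compression the file is 2046 KB and after compression 894 KB, a compression ratio of 43.70%.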

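The second file compresses better because it consists mostly of digits and spaces, each of which occurs frequently and therefore receives a short code. On average, a character takes up less space after compression.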
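
The link between character frequencies and code lengths can be made concrete by computing the average number of bits per character that a Huffman code achieves for a given frequency table. The sketch below is an illustration, not the listing's method: it uses `std::priority_queue` in place of the linear-scan `select`, gives a lone symbol a 1-bit code (the listing exits in that case), and the names `huffmanCodeLengths` and `averageBitsPerChar` are hypothetical.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Code length in bits that Huffman's algorithm assigns to each character.
std::map<char, int> huffmanCodeLengths(const std::map<char, long long>& freq) {
    struct Node { int left; int right; char ch; };
    using Entry = std::pair<long long, int>;  // (weight, node index)
    std::vector<Node> nodes;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : freq) {
        nodes.push_back({-1, -1, kv.first});  // leaf node
        heap.push({kv.second, static_cast<int>(nodes.size()) - 1});
    }
    std::map<char, int> lengths;
    if (nodes.empty()) return lengths;
    // Repeatedly merge the two lightest subtrees under a new internal node.
    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();
        Entry b = heap.top(); heap.pop();
        nodes.push_back({a.second, b.second, '\0'});
        heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }
    // Depth-first walk from the root; a leaf's depth is its code length.
    std::vector<std::pair<int, int>> stack;  // (node index, depth)
    stack.push_back({heap.top().second, 0});
    while (!stack.empty()) {
        const std::pair<int, int> cur = stack.back();
        stack.pop_back();
        const Node node = nodes[cur.first];
        if (node.left < 0) {
            lengths[node.ch] = (cur.second == 0) ? 1 : cur.second;  // lone symbol: 1 bit
        } else {
            stack.push_back({node.left, cur.second + 1});
            stack.push_back({node.right, cur.second + 1});
        }
    }
    return lengths;
}

// Average number of bits per character under those code lengths.
double averageBitsPerChar(const std::map<char, long long>& freq) {
    const std::map<char, int> lengths = huffmanCodeLengths(freq);
    long long total = 0, bits = 0;
    for (const auto& kv : freq) {
        total += kv.second;
        bits += kv.second * lengths.at(kv.first);
    }
    return total == 0 ? 0.0 : static_cast<double>(bits) / static_cast<double>(total);
}
```

For the frequencies a:5, b:2, c:1, d:1 the code lengths are 1, 2, 3 and 3 bits, so the text needs 15 bits for 9 characters, about 1.67 bits per character instead of 8.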
