《算法》链接：1.5 Case Study: Union-Find
学习工具：mac，java8，eclipse,coursera

环境搭建
在小伙伴的推荐下，这个学期开始上普林斯顿的算法课。
这门课有自己的Java library，刚开始的时候研究载入这个library花了好长时间，最终的解决方案是下载algs4.jar包，然后在eclipse软件中将其作为外部library，使用的时候import statement要写成类似这样的：

import edu.princeton.cs.algs4.StdRandom;

1 Dynamic connectivity

教授一开始就讲到了dynamic connectivity，其实这是Union Find算法的一种应用，这学期选修的另外一门network model和这个也有关。

Input: 一系列的整数对（p,q）代表p与q相互连接("is connected to")，比如（4,3）(3,8)等，每一个整数代表了一个object。这里的"is connected to"表示的是没有方向的对称连接,且一个p可以与自身p相连。
Goal:当读取的(p,q)整数对本来不相连时，输出这个整数对(p,q)且使其相连；如果相连的话，就不输出，忽略这个整数对

algs4.jar中已经有了UF这个class，可以直接使用。

public static void main(String[] args) {
        int n = StdIn.readInt();
        UF uf = new UF(n);
        while (!StdIn.isEmpty()) {
            int p = StdIn.readInt();
            int q = StdIn.readInt();
            if (uf.connected(p, q)) continue;
            uf.union(p, q);
            StdOut.println(p + " " + q);
        }
        StdOut.println(uf.count() + " components");
    }

2 Quick Find (eager approach)

数组形式进行存储array id[i]
如果两个objects相互链接，那么他们数组的值相同
All sites in a component must have the same value in id[].
例如，当union(p,q)时，p所存的值改为q所存的值。id[p] = id[q]

3 Quick Union (lazy approach)

同样是id[i]数组形式，不同的地方是id[i]中存储父节点
如果两个sites的根相同，那么他们在同一个component
根节点i == id[i]
例如，当union(p,q)时

    pID = id[p];
    qID = id[q];
    循环数组中所有值等于pID的，如果id[i]==pID，则id[i] = qID;
    也就是说p所在的树将作为q的子树

4 Improvement

4.1 weighted
增加sz[]数组来存储一颗树里面objects的个数
当要链接(p,q)时，需要比较sz[i]和sz[j]的大小（假设i,j分别是他们的root）

4.2 path compression
只需要增添一个语句

id[i] = id[id[i]]

两种运用

quick union with path compression
weighted quick-union with path compression

5 Applications

教授列举了相当多的应用，物理学上的，应用到其他算法或者语言的。所给的面试题也很常见。
最近刚接触seo相关的内容，老师谈到IR system需要将用户提供的单词和收集的items中相关的words比对，如果对上了(match)，那么这些items就可能是用户想要的。我觉得这个可能也是并查集相关应用。这学期继续学习深入理解了就能明白了。

6 作业: Percolation

Write a program to estimate the value of the percolation threshold via Monte Carlo simulation.
Percolation. Given a composite systems comprised of randomly distributed insulating and metallic materials: what fraction of the materials need to be metallic so that the composite system is an electrical conductor? Given a porous landscape with water on the surface (or oil below), under what conditions will the water be able to drain through to the bottom (or the oil to gush through to the surface)? Scientists have defined an abstract process known as percolation to model such situations.
The model. We model a percolation system using an n-by-n grid of sites. Each site is either open or blocked.

open
full

Percolation data type：

public class Percolation {
   public Percolation(int n)                // create n-by-n grid, with all sites blocked
   public    void open(int row, int col)    // open site (row, col) if it is not open already
   public boolean isOpen(int row, int col)  // is site (row, col) open?
   public boolean isFull(int row, int col)  // is site (row, col) full?
   public     int numberOfOpenSites()       // number of open sites
   public boolean percolates()              // does the system percolate?

   public static void main(String[] args)   // test client (optional)
}

PercolationStats data type：

public class PercolationStats {
   public PercolationStats(int n, int trials)    // perform trials independent experiments on an n-by-n grid
   public double mean()                          // sample mean of percolation threshold
   public double stddev()                        // sample standard deviation of percolation threshold
   public double confidenceLo()                  // low  endpoint of 95% confidence interval
   public double confidenceHi()                  // high endpoint of 95% confidence interval

   public static void main(String[] args)        // test client (described below)
}

要点

导入WeightedQuickUnionUF,在程序中主要使用connect(p,q),union(p,q)函数
题目默认(1,1)就是左上角的位置，(n,n)就是右下角的位置，所以我先创建了一个siten+1的二维数组, 并全部初始化为0，代表全部block
创建WeightedQuickUnionUF对象，由于UF算法中都是以一维数组存储，所以数组uf[1]-uf[NN]对应NN矩阵中的位置，并且根据老师的提示，创建了top virtual site uf[0]和bottom site uf[NN+1], 所以整个uf一维数组的长度是NN+2
最关键也是最容易出错的地方就在open一个点以后，检查上下左右邻近的点，并对open的邻近做union的过程，要考虑到边缘位置的特殊情况，比如中心点在第一行，那么该点上下左右最多只有三个临近点，而上点可以与top virtual site连通。

Debug

Percolation.java的问题

几次发现结果不对，问题都出在open函数里面
对eclipse还不熟，测试中In in = new In(args[0]) 语句要求从命令行键入文件名，回车运行。eclipse的操作方法是，在 run configurations-->arguments-->program arguments-->键入文件名，可以不打引号,也可以打引号。
如果不想写路径，只写filename.txt，文件要放在项目目录下(我的项目名是algs)，因为eclipse默认的根目录就是项目目录
等待解决的bug：一旦连通，再在第N行open的单元格会变成full状态，因为bottom virtual root 已经联通，那么与之相连的方块都认为是与top virtual root 相连的。目前想到的解决方案有
1. 去掉bottom root,但是这样会遍历n次
2. isFull方法加上if判断
3. 思考优化open 方法

Percolationstat.java 的问题

Percolationstat.java 在编写时，for循环里面，每一次判断一个网格是不是连通之前都要新创建一个n-by-n的网格，但是我把创建的语句写在了for循环外面，导致同一个网格会被实验T次
计算时除号/左右两边numberOfOpenSites和(nn)都是整数，所以得到的结果也是整数；需要强制转换成double类型，即numberOfOpenSites/(double)(nn)
每次读取一组整数对后，open site前先要判断这个点是否被open了，以免多计算在numberOfOpenSites里面

作业提交遇到的问题

所有的fields 和自己定义的methods都要写为private
最后提交的作业91分，看了一下report, 扣分主要就扣在bottom root与所有最后一行的sites连通的问题上。以后有时间再来修改code吧

Union-Find并查集算法学习笔记