yueyedeai

FPGrowth 实现

在关联规则挖掘领域最经典的算法法是Apriori，其致命的缺点是需要多次扫描事务数据库。于是人们提出了各种裁剪（prune）数据集的方法以减少I/O开支，韩嘉炜老师的FP-Tree算法就是其中非常高效的一种。

支持度和置信度

严格地说Apriori和FP-Tree都是寻找频繁项集的算法，频繁项集就是所谓的“支持度”比较高的项集，下面解释一下支持度和置信度的概念。

设事务数据库为：

A　　E　　F　　G

A　　F　　G

A　　B　　E　　F　　G

E　　F　　G

则{A,F,G}的支持度数为3，支持度为3/4。

{F,G}的支持度数为4，支持度为4/4。

{A}的支持度数为3，支持度为3/4。

{F,G}=>{A}的置信度为：{A,F,G}的支持度数除以 {F,G}的支持度数，即3/4

{A}=>{F,G}的置信度为：{A,F,G}的支持度数除以 {A}的支持度数，即3/3

强关联规则挖掘是在满足一定支持度的情况下寻找置信度达到阈值的所有模式。

FP-Tree算法

我们举个例子来详细讲解FP-Tree算法的完整实现。

事务数据库如下，一行表示一条购物记录：

牛奶，鸡蛋，面包，薯片

鸡蛋，爆米花，薯片，啤酒

鸡蛋，面包，薯片

牛奶，鸡蛋，面包，爆米花，薯片，啤酒

牛奶，面包，啤酒

鸡蛋，面包，啤酒

牛奶，面包，薯片

牛奶，鸡蛋，面包，黄油，薯片

牛奶，鸡蛋，黄油，薯片

我们的目的是要找出哪些商品总是相伴出现的，比如人们买薯片的时候通常也会买鸡蛋，则[薯片，鸡蛋]就是一条频繁模式（frequent pattern）。

FP-Tree算法第一步：扫描事务数据库，每项商品按频数递减排序，并删除频数小于最小支持度MinSup的商品。（第一次扫描数据库）

薯片:7鸡蛋:7面包:7牛奶:6啤酒:4 （这里我们令MinSup=3）

以上结果就是频繁1项集，记为F1。

第二步：对于每一条购买记录，按照F1中的顺序重新排序。（第二次也是最后一次扫描数据库）

薯片,鸡蛋,面包,牛奶

薯片,鸡蛋,啤酒

薯片,鸡蛋,面包

薯片,鸡蛋,面包,牛奶,啤酒

面包,牛奶,啤酒

鸡蛋,面包,啤酒

薯片,面包,牛奶

薯片,鸡蛋,面包,牛奶

薯片,鸡蛋,牛奶

第三步：把第二步得到的各条记录插入到FP-Tree中。刚开始时后缀模式为空。

插入每一条（薯片,鸡蛋,面包,牛奶）之后

插入第二条记录（薯片,鸡蛋,啤酒）

插入第三条记录（面包,牛奶,啤酒）

估计你也知道怎么插了，最终生成的FP-Tree是：

上图中左边的那一叫做表头项，树中相同名称的节点要链接起来，链表的第一个元素就是表头项里的元素。

如果FP-Tree为空（只含一个虚的root节点），则FP-Growth函数返回。

此时输出表头项的每一项+postModel，支持度为表头项中对应项的计数。

第四步：从FP-Tree中找出频繁项。

遍历表头项中的每一项（我们拿“牛奶：6”为例），对于各项都执行以下（1）到（5）的操作：

（1）从FP-Tree中找到所有的“牛奶”节点，向上遍历它的祖先节点，得到4条路径：

薯片：7，鸡蛋：6，牛奶：1

薯片：7，鸡蛋：6，面包：4，牛奶：3

薯片：7，面包：1，牛奶：1

面包：1，牛奶：1

对于每一条路径上的节点，其count都设置为牛奶的count

薯片：1，鸡蛋：1，牛奶：1

薯片：3，鸡蛋：3，面包：3，牛奶：3

薯片：1，面包：1，牛奶：1

面包：1，牛奶：1

因为每一项末尾都是牛奶，可以把牛奶去掉，得到条件模式基（Conditional Pattern Base,CPB），此时的后缀模式是：（牛奶）。

薯片：1，鸡蛋：1

薯片：3，鸡蛋：3，面包：3

薯片：1，面包：1

面包：1

（2）我们把上面的结果当作原始的事务数据库，返回到第3步，递归迭代运行。

没讲清楚，你可以参考这篇博客，直接看核心代码吧：

[java] view plain copy print ?

public void FPGrowth(List<List<String>> transRecords,
List<String> postPattern,Context context) throws IOException, InterruptedException {
// 构建项头表，同时也是频繁1项集
ArrayList<TreeNode> HeaderTable = buildHeaderTable(transRecords);
// 构建FP-Tree
TreeNode treeRoot = buildFPTree(transRecords, HeaderTable);
// 如果FP-Tree为空则返回
if (treeRoot.getChildren()==null || treeRoot.getChildren().size() == 0)
return;
//输出项头表的每一项+postPattern
if(postPattern!=null){
for (TreeNode header : HeaderTable) {
String outStr=header.getName();
int count=header.getCount();
for (String ele : postPattern)
outStr+="\t" + ele;
context.write(new IntWritable(count), new Text(outStr));
}
}
// 找到项头表的每一项的条件模式基，进入递归迭代
for (TreeNode header : HeaderTable) {
// 后缀模式增加一项
List<String> newPostPattern = new LinkedList<String>();
newPostPattern.add(header.getName());
if (postPattern != null)
newPostPattern.addAll(postPattern);
// 寻找header的条件模式基CPB，放入newTransRecords中
List<List<String>> newTransRecords = new LinkedList<List<String>>();
TreeNode backnode = header.getNextHomonym();
while (backnode != null) {
int counter = backnode.getCount();
List<String> prenodes = new ArrayList<String>();
TreeNode parent = backnode;
// 遍历backnode的祖先节点，放到prenodes中
while ((parent = parent.getParent()).getName() != null) {
prenodes.add(parent.getName());
}
while (counter-- > 0) {
newTransRecords.add(prenodes);
}
backnode = backnode.getNextHomonym();
}
// 递归迭代
FPGrowth(newTransRecords, newPostPattern,context);
}
}

public void FPGrowth(List<List<String>> transRecords,
        List<String> postPattern,Context context) throws IOException, InterruptedException {
    // 构建项头表，同时也是频繁1项集
    ArrayList<TreeNode> HeaderTable = buildHeaderTable(transRecords);
    // 构建FP-Tree
    TreeNode treeRoot = buildFPTree(transRecords, HeaderTable);
    // 如果FP-Tree为空则返回
    if (treeRoot.getChildren()==null || treeRoot.getChildren().size() == 0)
        return;
    //输出项头表的每一项+postPattern
    if(postPattern!=null){
        for (TreeNode header : HeaderTable) {
            String outStr=header.getName();
            int count=header.getCount();
            for (String ele : postPattern)
                outStr+="\t" + ele;
            context.write(new IntWritable(count), new Text(outStr));
        }
    }
    // 找到项头表的每一项的条件模式基，进入递归迭代
    for (TreeNode header : HeaderTable) {
        // 后缀模式增加一项
        List<String> newPostPattern = new LinkedList<String>();
        newPostPattern.add(header.getName());
        if (postPattern != null)
            newPostPattern.addAll(postPattern);
        // 寻找header的条件模式基CPB，放入newTransRecords中
        List<List<String>> newTransRecords = new LinkedList<List<String>>();
        TreeNode backnode = header.getNextHomonym();
        while (backnode != null) {
            int counter = backnode.getCount();
            List<String> prenodes = new ArrayList<String>();
            TreeNode parent = backnode;
            // 遍历backnode的祖先节点，放到prenodes中
            while ((parent = parent.getParent()).getName() != null) {
                prenodes.add(parent.getName());
            }
            while (counter-- > 0) {
                newTransRecords.add(prenodes);
            }
            backnode = backnode.getNextHomonym();
        }
        // 递归迭代
        FPGrowth(newTransRecords, newPostPattern,context);
    }
}

对于FP-Tree已经是单枝的情况，就没有必要再递归调用FPGrowth了，直接输出整条路径上所有节点的各种组合+postModel就可了。例如当FP-Tree为：

我们直接输出：

3　　A+postModel

3　　B+postModel

3　　A+B+postModel

就可以了。

如何按照上面代码里的做法，是先输出：

3　　A+postModel

3　　B+postModel

然后把B插入到postModel的头部，重新建立一个FP-Tree，这时Tree中只含A，于是输出

3　　A+(B+postModel)

两种方法结果是一样的，但毕竟重新建立FP-Tree计算量大些。

Java实现

FP树节点定义

[java] view plain copy print ?

package fptree;
import java.util.ArrayList;
import java.util.List;
public class TreeNode implements Comparable<TreeNode> {
private String name; // 节点名称
private int count; // 计数
private TreeNode parent; // 父节点
private List<TreeNode> children; // 子节点
private TreeNode nextHomonym; // 下一个同名节点
public TreeNode() {
}
public TreeNode(String name) {
this.name = name;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public int getCount() {
return count;
}
public void setCount(int count) {
this.count = count;
}
public TreeNode getParent() {
return parent;
}
public void setParent(TreeNode parent) {
this.parent = parent;
}
public List<TreeNode> getChildren() {
return children;
}
public void addChild(TreeNode child) {
if (this.getChildren() == null) {
List<TreeNode> list = new ArrayList<TreeNode>();
list.add(child);
this.setChildren(list);
} else {
this.getChildren().add(child);
}
}
public TreeNode findChild(String name) {
List<TreeNode> children = this.getChildren();
if (children != null) {
for (TreeNode child : children) {
if (child.getName().equals(name)) {
return child;
}
}
}
return null;
}
public void setChildren(List<TreeNode> children) {
this.children = children;
}
public void printChildrenName() {
List<TreeNode> children = this.getChildren();
if (children != null) {
for (TreeNode child : children) {
System.out.print(child.getName() + " ");
}
} else {
System.out.print("null");
}
}
public TreeNode getNextHomonym() {
return nextHomonym;
}
public void setNextHomonym(TreeNode nextHomonym) {
this.nextHomonym = nextHomonym;
}
public void countIncrement(int n) {
this.count += n;
}
@Override
public int compareTo(TreeNode arg0) {
// TODO Auto-generated method stub
int count0 = arg0.getCount();
// 跟默认的比较大小相反，导致调用Arrays.sort()时是按降序排列
return count0 - this.count;
}
}

package fptree;
  
import java.util.ArrayList;
import java.util.List;
  
public class TreeNode implements Comparable<TreeNode> {
  
    private String name; // 节点名称
    private int count; // 计数
    private TreeNode parent; // 父节点
    private List<TreeNode> children; // 子节点
    private TreeNode nextHomonym; // 下一个同名节点
  
    public TreeNode() {
  
    }
  
    public TreeNode(String name) {
        this.name = name;
    }
  
    public String getName() {
        return name;
    }
  
    public void setName(String name) {
        this.name = name;
    }
  
    public int getCount() {
        return count;
    }
  
    public void setCount(int count) {
        this.count = count;
    }
  
    public TreeNode getParent() {
        return parent;
    }
  
    public void setParent(TreeNode parent) {
        this.parent = parent;
    }
  
    public List<TreeNode> getChildren() {
        return children;
    }
  
    public void addChild(TreeNode child) {
        if (this.getChildren() == null) {
            List<TreeNode> list = new ArrayList<TreeNode>();
            list.add(child);
            this.setChildren(list);
        } else {
            this.getChildren().add(child);
        }
    }
  
    public TreeNode findChild(String name) {
        List<TreeNode> children = this.getChildren();
        if (children != null) {
            for (TreeNode child : children) {
                if (child.getName().equals(name)) {
                    return child;
                }
            }
        }
        return null;
    }
  
    public void setChildren(List<TreeNode> children) {
        this.children = children;
    }
  
    public void printChildrenName() {
        List<TreeNode> children = this.getChildren();
        if (children != null) {
            for (TreeNode child : children) {
                System.out.print(child.getName() + " ");
            }
        } else {
            System.out.print("null");
        }
    }
  
    public TreeNode getNextHomonym() {
        return nextHomonym;
    }
  
    public void setNextHomonym(TreeNode nextHomonym) {
        this.nextHomonym = nextHomonym;
    }
  
    public void countIncrement(int n) {
        this.count += n;
    }
  
    @Override
    public int compareTo(TreeNode arg0) {
        // TODO Auto-generated method stub
        int count0 = arg0.getCount();
        // 跟默认的比较大小相反，导致调用Arrays.sort()时是按降序排列
        return count0 - this.count;
    }
}

挖掘频繁模式

[java] view plain copy print ?

package fptree;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
public class FPTree {
private int minSuport;
public int getMinSuport() {
return minSuport;
}
public void setMinSuport(int minSuport) {
this.minSuport = minSuport;
}
// 从若干个文件中读入Transaction Record
public List<List<String>> readTransRocords(String... filenames) {
List<List<String>> transaction = null;
if (filenames.length > 0) {
transaction = new LinkedList<List<String>>();
for (String filename : filenames) {
try {
FileReader fr = new FileReader(filename);
BufferedReader br = new BufferedReader(fr);
try {
String line;
List<String> record;
while ((line = br.readLine()) != null) {
if(line.trim().length()>0){
String str[] = line.split("，");
record = new LinkedList<String>();
for (String w : str)
record.add(w);
transaction.add(record);
}
}
} finally {
br.close();
}
} catch (IOException ex) {
System.out.println("Read transaction records failed."
+ ex.getMessage());
System.exit(1);
}
}
}
return transaction;
}
// FP-Growth算法
public void FPGrowth(List<List<String>> transRecords,
List<String> postPattern) {
// 构建项头表，同时也是频繁1项集
ArrayList<TreeNode> HeaderTable = buildHeaderTable(transRecords);
// 构建FP-Tree
TreeNode treeRoot = buildFPTree(transRecords, HeaderTable);
// 如果FP-Tree为空则返回
if (treeRoot.getChildren()==null || treeRoot.getChildren().size() == 0)
return;
//输出项头表的每一项+postPattern
if(postPattern!=null){
for (TreeNode header : HeaderTable) {
System.out.print(header.getCount() + "\t" + header.getName());
for (String ele : postPattern)
System.out.print("\t" + ele);
System.out.println();
}
}
// 找到项头表的每一项的条件模式基，进入递归迭代
for (TreeNode header : HeaderTable) {
// 后缀模式增加一项
List<String> newPostPattern = new LinkedList<String>();
newPostPattern.add(header.getName());
if (postPattern != null)
newPostPattern.addAll(postPattern);
// 寻找header的条件模式基CPB，放入newTransRecords中
List<List<String>> newTransRecords = new LinkedList<List<String>>();
TreeNode backnode = header.getNextHomonym();
while (backnode != null) {
int counter = backnode.getCount();
List<String> prenodes = new ArrayList<String>();
TreeNode parent = backnode;
// 遍历backnode的祖先节点，放到prenodes中
while ((parent = parent.getParent()).getName() != null) {
prenodes.add(parent.getName());
}
while (counter-- > 0) {
newTransRecords.add(prenodes);
}
backnode = backnode.getNextHomonym();
}
// 递归迭代
FPGrowth(newTransRecords, newPostPattern);
}
}
// 构建项头表，同时也是频繁1项集
public ArrayList<TreeNode> buildHeaderTable(List<List<String>> transRecords) {
ArrayList<TreeNode> F1 = null;
if (transRecords.size() > 0) {
F1 = new ArrayList<TreeNode>();
Map<String, TreeNode> map = new HashMap<String, TreeNode>();
// 计算事务数据库中各项的支持度
for (List<String> record : transRecords) {
for (String item : record) {
if (!map.keySet().contains(item)) {
TreeNode node = new TreeNode(item);
node.setCount(1);
map.put(item, node);
} else {
map.get(item).countIncrement(1);
}
}
}
// 把支持度大于（或等于）minSup的项加入到F1中
Set<String> names = map.keySet();
for (String name : names) {
TreeNode tnode = map.get(name);
if (tnode.getCount() >= minSuport) {
F1.add(tnode);
}
}
Collections.sort(F1);
return F1;
} else {
return null;
}
}
// 构建FP-Tree
public TreeNode buildFPTree(List<List<String>> transRecords,
ArrayList<TreeNode> F1) {
TreeNode root = new TreeNode(); // 创建树的根节点
for (List<String> transRecord : transRecords) {
LinkedList<String> record = sortByF1(transRecord, F1);
TreeNode subTreeRoot = root;
TreeNode tmpRoot = null;
if (root.getChildren() != null) {
while (!record.isEmpty()
&& (tmpRoot = subTreeRoot.findChild(record.peek())) != null) {
tmpRoot.countIncrement(1);
subTreeRoot = tmpRoot;
record.poll();
}
}
addNodes(subTreeRoot, record, F1);
}
return root;
}
// 把交易记录按项的频繁程序降序排列
public LinkedList<String> sortByF1(List<String> transRecord,
ArrayList<TreeNode> F1) {
Map<String, Integer> map = new HashMap<String, Integer>();
for (String item : transRecord) {
// 由于F1已经是按降序排列的，
for (int i = 0; i < F1.size(); i++) {
TreeNode tnode = F1.get(i);
if (tnode.getName().equals(item)) {
map.put(item, i);
}
}
}
ArrayList<Entry<String, Integer>> al = new ArrayList<Entry<String, Integer>>(
map.entrySet());
Collections.sort(al, new Comparator<Map.Entry<String, Integer>>() {
@Override
public int compare(Entry<String, Integer> arg0,
Entry<String, Integer> arg1) {
// 降序排列
return arg0.getValue() - arg1.getValue();
}
});
LinkedList<String> rest = new LinkedList<String>();
for (Entry<String, Integer> entry : al) {
rest.add(entry.getKey());
}
return rest;
}
// 把record作为ancestor的后代插入树中
public void addNodes(TreeNode ancestor, LinkedList<String> record,
ArrayList<TreeNode> F1) {
if (record.size() > 0) {
while (record.size() > 0) {
String item = record.poll();
TreeNode leafnode = new TreeNode(item);
leafnode.setCount(1);
leafnode.setParent(ancestor);
ancestor.addChild(leafnode);
for (TreeNode f1 : F1) {
if (f1.getName().equals(item)) {
while (f1.getNextHomonym() != null) {
f1 = f1.getNextHomonym();
}
f1.setNextHomonym(leafnode);
break;
}
}
addNodes(leafnode, record, F1);
}
}
}
public static void main(String[] args) {
FPTree fptree = new FPTree();
fptree.setMinSuport(3);
List<List<String>> transRecords = fptree
.readTransRocords("/home/orisun/test/market");
fptree.FPGrowth(transRecords, null);
}
}

package fptree;
 
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
 
public class FPTree {
 
    private int minSuport;
 
    public int getMinSuport() {
        return minSuport;
    }
 
    public void setMinSuport(int minSuport) {
        this.minSuport = minSuport;
    }
 
    // 从若干个文件中读入Transaction Record
    public List<List<String>> readTransRocords(String... filenames) {
        List<List<String>> transaction = null;
        if (filenames.length > 0) {
            transaction = new LinkedList<List<String>>();
            for (String filename : filenames) {
                try {
                    FileReader fr = new FileReader(filename);
                    BufferedReader br = new BufferedReader(fr);
                    try {
                        String line;
                        List<String> record;
                        while ((line = br.readLine()) != null) {
                            if(line.trim().length()>0){
                                String str[] = line.split("，");
                                record = new LinkedList<String>();
                                for (String w : str)
                                    record.add(w);
                                transaction.add(record);
                            }
                        }
                    } finally {
                        br.close();
                    }
                } catch (IOException ex) {
                    System.out.println("Read transaction records failed."
                            + ex.getMessage());
                    System.exit(1);
                }
            }
        }
        return transaction;
    }
 
    // FP-Growth算法
    public void FPGrowth(List<List<String>> transRecords,
            List<String> postPattern) {
        // 构建项头表，同时也是频繁1项集
        ArrayList<TreeNode> HeaderTable = buildHeaderTable(transRecords);
        // 构建FP-Tree
        TreeNode treeRoot = buildFPTree(transRecords, HeaderTable);
        // 如果FP-Tree为空则返回
        if (treeRoot.getChildren()==null || treeRoot.getChildren().size() == 0)
            return;
        //输出项头表的每一项+postPattern
        if(postPattern!=null){
            for (TreeNode header : HeaderTable) {
                System.out.print(header.getCount() + "\t" + header.getName());
                for (String ele : postPattern)
                    System.out.print("\t" + ele);
                System.out.println();
            }
        }
        // 找到项头表的每一项的条件模式基，进入递归迭代
        for (TreeNode header : HeaderTable) {
            // 后缀模式增加一项
            List<String> newPostPattern = new LinkedList<String>();
            newPostPattern.add(header.getName());
            if (postPattern != null)
                newPostPattern.addAll(postPattern);
            // 寻找header的条件模式基CPB，放入newTransRecords中
            List<List<String>> newTransRecords = new LinkedList<List<String>>();
            TreeNode backnode = header.getNextHomonym();
            while (backnode != null) {
                int counter = backnode.getCount();
                List<String> prenodes = new ArrayList<String>();
                TreeNode parent = backnode;
                // 遍历backnode的祖先节点，放到prenodes中
                while ((parent = parent.getParent()).getName() != null) {
                    prenodes.add(parent.getName());
                }
                while (counter-- > 0) {
                    newTransRecords.add(prenodes);
                }
                backnode = backnode.getNextHomonym();
            }
            // 递归迭代
            FPGrowth(newTransRecords, newPostPattern);
        }
    }
 
    // 构建项头表，同时也是频繁1项集
    public ArrayList<TreeNode> buildHeaderTable(List<List<String>> transRecords) {
        ArrayList<TreeNode> F1 = null;
        if (transRecords.size() > 0) {
            F1 = new ArrayList<TreeNode>();
            Map<String, TreeNode> map = new HashMap<String, TreeNode>();
            // 计算事务数据库中各项的支持度
            for (List<String> record : transRecords) {
                for (String item : record) {
                    if (!map.keySet().contains(item)) {
                        TreeNode node = new TreeNode(item);
                        node.setCount(1);
                        map.put(item, node);
                    } else {
                        map.get(item).countIncrement(1);
                    }
                }
            }
            // 把支持度大于（或等于）minSup的项加入到F1中
            Set<String> names = map.keySet();
            for (String name : names) {
                TreeNode tnode = map.get(name);
                if (tnode.getCount() >= minSuport) {
                    F1.add(tnode);
                }
            }
            Collections.sort(F1);
            return F1;
        } else {
            return null;
        }
    }
 
    // 构建FP-Tree
    public TreeNode buildFPTree(List<List<String>> transRecords,
            ArrayList<TreeNode> F1) {
        TreeNode root = new TreeNode(); // 创建树的根节点
        for (List<String> transRecord : transRecords) {
            LinkedList<String> record = sortByF1(transRecord, F1);
            TreeNode subTreeRoot = root;
            TreeNode tmpRoot = null;
            if (root.getChildren() != null) {
                while (!record.isEmpty()
                        && (tmpRoot = subTreeRoot.findChild(record.peek())) != null) {
                    tmpRoot.countIncrement(1);
                    subTreeRoot = tmpRoot;
                    record.poll();
                }
            }
            addNodes(subTreeRoot, record, F1);
        }
        return root;
    }
 
    // 把交易记录按项的频繁程序降序排列
    public LinkedList<String> sortByF1(List<String> transRecord,
            ArrayList<TreeNode> F1) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (String item : transRecord) {
            // 由于F1已经是按降序排列的，
            for (int i = 0; i < F1.size(); i++) {
                TreeNode tnode = F1.get(i);
                if (tnode.getName().equals(item)) {
                    map.put(item, i);
                }
            }
        }
        ArrayList<Entry<String, Integer>> al = new ArrayList<Entry<String, Integer>>(
                map.entrySet());
        Collections.sort(al, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Entry<String, Integer> arg0,
                    Entry<String, Integer> arg1) {
                // 降序排列
                return arg0.getValue() - arg1.getValue();
            }
        });
        LinkedList<String> rest = new LinkedList<String>();
        for (Entry<String, Integer> entry : al) {
            rest.add(entry.getKey());
        }
        return rest;
    }
 
    // 把record作为ancestor的后代插入树中
    public void addNodes(TreeNode ancestor, LinkedList<String> record,
            ArrayList<TreeNode> F1) {
        if (record.size() > 0) {
            while (record.size() > 0) {
                String item = record.poll();
                TreeNode leafnode = new TreeNode(item);
                leafnode.setCount(1);
                leafnode.setParent(ancestor);
                ancestor.addChild(leafnode);
 
                for (TreeNode f1 : F1) {
                    if (f1.getName().equals(item)) {
                        while (f1.getNextHomonym() != null) {
                            f1 = f1.getNextHomonym();
                        }
                        f1.setNextHomonym(leafnode);
                        break;
                    }
                }
 
                addNodes(leafnode, record, F1);
            }
        }
    }
 
    public static void main(String[] args) {
        FPTree fptree = new FPTree();
        fptree.setMinSuport(3);
        List<List<String>> transRecords = fptree
                .readTransRocords("/home/orisun/test/market");
        fptree.FPGrowth(transRecords, null);
    }
}

输入文件

牛奶，鸡蛋，面包，薯片
鸡蛋，爆米花，薯片，啤酒
鸡蛋，面包，薯片
牛奶，鸡蛋，面包，爆米花，薯片，啤酒
牛奶，面包，啤酒
鸡蛋，面包，啤酒
牛奶，面包，薯片
牛奶，鸡蛋，面包，黄油，薯片
牛奶，鸡蛋，黄油，薯片

输出

6    薯片    鸡蛋
5    薯片    面包
5    鸡蛋    面包
4    薯片    鸡蛋    面包
5    薯片    牛奶
5    面包    牛奶
4    鸡蛋    牛奶
4    薯片    面包    牛奶
4    薯片    鸡蛋    牛奶
3    面包    鸡蛋    牛奶
3    薯片    面包    鸡蛋    牛奶
3    鸡蛋    啤酒
3    面包    啤酒

用Hadoop来实现

在上面的代码我们把整个事务数据库放在一个List<List<String>>里面传给FPGrowth，在实际中这是不可取的，因为内存不可能容下整个事务数据库，我们可能需要从关系关系数据库中一条一条地读入来建立FP-Tree。但无论如何 FP-Tree是肯定需要放在内存中的，但内存如果容不下怎么办？另外FPGrowth仍然是非常耗时的，你想提高速度怎么办？解决办法：分而治之，并行计算。

我们把原始事务数据库分成N部分，在N个节点上并行地进行FPGrowth挖掘，最后把关联规则汇总到一起就可以了。关键问题是怎么“划分”才会不遗露任何一条关联规则呢？参见这篇博客。这里为了达到并行计算的目的，采用了一种“冗余”的划分方法，即各部分的并集大于原来的集合。这种方法最终求出来的关联规则也是有冗余的，比如在节点1上得到一条规则（6:啤酒，尿布），在节点2上得到一条规则（3:尿布，啤酒），显然节点2上的这条规则是冗余的，需要采用后续步骤把冗余的规则去掉。

代码：

Record.java

[java] view plain copy print ?

package fptree;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Collections;
import java.util.LinkedList;
import org.apache.hadoop.io.WritableComparable;
public class Record implements WritableComparable<Record>{
LinkedList<String> list;
public Record(){
list=new LinkedList<String>();
}
public Record(String[] arr){
list=new LinkedList<String>();
for(int i=0;i<arr.length;i++)
list.add(arr[i]);
}
@Override
public String toString(){
String str=list.get(0);
for(int i=1;i<list.size();i++)
str+="\t"+list.get(i);
return str;
}
@Override
public void readFields(DataInput in) throws IOException {
list.clear();
String line=in.readUTF();
String []arr=line.split("\\s+");
for(int i=0;i<arr.length;i++)
list.add(arr[i]);
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(this.toString());
}
@Override
public int compareTo(Record obj) {
Collections.sort(list);
Collections.sort(obj.list);
return this.toString().compareTo(obj.toString());
}
}

package fptree;
 
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Collections;
import java.util.LinkedList;
 
import org.apache.hadoop.io.WritableComparable;
 
public class Record implements WritableComparable<Record>{
     
    LinkedList<String> list;
     
    public Record(){
        list=new LinkedList<String>();
    }
     
    public Record(String[] arr){
        list=new LinkedList<String>();
        for(int i=0;i<arr.length;i++)
            list.add(arr[i]);
    }
     
    @Override
    public String toString(){
        String str=list.get(0);
        for(int i=1;i<list.size();i++)
            str+="\t"+list.get(i);
        return str;
    }
 
    @Override
    public void readFields(DataInput in) throws IOException {
        list.clear();
        String line=in.readUTF();
        String []arr=line.split("\\s+");
        for(int i=0;i<arr.length;i++)
            list.add(arr[i]);
    }
 
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.toString());
    }
 
    @Override
    public int compareTo(Record obj) {
        Collections.sort(list);
        Collections.sort(obj.list);
        return this.toString().compareTo(obj.toString());
    }
 
}

DC_FPTree.java

[java] view plain copy print ?

package fptree;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class DC_FPTree extends Configured implements Tool {
private static final int GroupNum = 10;
private static final int minSuport=6;
public static class GroupMapper extends
Mapper<LongWritable, Text, IntWritable, Record> {
List<String> freq = new LinkedList<String>(); // 频繁1项集
List<List<String>> freq_group = new LinkedList<List<String>>(); // 分组后的频繁1项集
@Override
public void setup(Context context) throws IOException {
// 从文件读入频繁1项集
FileSystem fs = FileSystem.get(context.getConfiguration());
Path freqFile = new Path("/user/orisun/input/F1");
FSDataInputStream in = fs.open(freqFile);
InputStreamReader isr = new InputStreamReader(in);
BufferedReader br = new BufferedReader(isr);
try {
String line;
while ((line = br.readLine()) != null) {
String[] str = line.split("\\s+");
String word = str[0];
freq.add(word);
}
} finally {
br.close();
}
// 对频繁1项集进行分组
Collections.shuffle(freq); // 打乱顺序
int cap = freq.size() / GroupNum; // 每段分为一组
for (int i = 0; i < GroupNum; i++) {
List<String> list = new LinkedList<String>();
for (int j = 0; j < cap; j++) {
list.add(freq.get(i * cap + j));
}
freq_group.add(list);
}
int remainder = freq.size() % GroupNum;
int base = GroupNum * cap;
for (int i = 0; i < remainder; i++) {
freq_group.get(i).add(freq.get(base + i));
}
}
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] arr = value.toString().split("\\s+");
Record record = new Record(arr);
LinkedList<String> list = record.list;
BitSet bs=new BitSet(freq_group.size());
bs.clear();
while (record.list.size() > 0) {
String item = list.peekLast(); // 取出record的最后一项
int i=0;
for (; i < freq_group.size(); i++) {
if(bs.get(i))
continue;
if (freq_group.get(i).contains(item)) {
bs.set(i);
break;
}
}
if(i<freq_group.size()){ //找到了
context.write(new IntWritable(i), record);
}
record.list.pollLast();
}
}
}
public static class FPReducer extends Reducer<IntWritable,Record,IntWritable,Text>{
public void reduce(IntWritable key,Iterable<Record> values,Context context)throws IOException,InterruptedException{
List<List<String>> trans=new LinkedList<List<String>>();
while(values.iterator().hasNext()){
Record record=values.iterator().next();
LinkedList<String> list=new LinkedList<String>();
for(String ele:record.list)
list.add(ele);
trans.add(list);
}
FPGrowth(trans, null,context);
}
// FP-Growth算法
public void FPGrowth(List<List<String>> transRecords,
List<String> postPattern,Context context) throws IOException, InterruptedException {
// 构建项头表，同时也是频繁1项集
ArrayList<TreeNode> HeaderTable = buildHeaderTable(transRecords);
// 构建FP-Tree
TreeNode treeRoot = buildFPTree(transRecords, HeaderTable);
// 如果FP-Tree为空则返回
if (treeRoot.getChildren()==null || treeRoot.getChildren().size() == 0)
return;
//输出项头表的每一项+postPattern
if(postPattern!=null){
for (TreeNode header : HeaderTable) {
String outStr=header.getName();
int count=header.getCount();
for (String ele : postPattern)
outStr+="\t" + ele;
context.write(new IntWritable(count), new Text(outStr));
}
}
// 找到项头表的每一项的条件模式基，进入递归迭代
for (TreeNode header : HeaderTable) {
// 后缀模式增加一项
List<String> newPostPattern = new LinkedList<String>();
newPostPattern.add(header.getName());
if (postPattern != null)
newPostPattern.addAll(postPattern);
// 寻找header的条件模式基CPB，放入newTransRecords中
List<List<String>> newTransRecords = new LinkedList<List<String>>();
TreeNode backnode = header.getNextHomonym();
while (backnode != null) {
int counter = backnode.getCount();
List<String> prenodes = new ArrayList<String>();
TreeNode parent = backnode;
// 遍历backnode的祖先节点，放到prenodes中
while ((parent = parent.getParent()).getName() != null) {
prenodes.add(parent.getName());
}
while (counter-- > 0) {
newTransRecords.add(prenodes);
}
backnode = backnode.getNextHomonym();
}
// 递归迭代
FPGrowth(newTransRecords, newPostPattern,context);
}
}
// 构建项头表，同时也是频繁1项集
public ArrayList<TreeNode> buildHeaderTable(List<List<String>> transRecords) {
ArrayList<TreeNode> F1 = null;
if (transRecords.size() > 0) {
F1 = new ArrayList<TreeNode>();
Map<String, TreeNode> map = new HashMap<String, TreeNode>();
// 计算事务数据库中各项的支持度
for (List<String> record : transRecords) {
for (String item : record) {
if (!map.keySet().contains(item)) {
TreeNode node = new TreeNode(item);
node.setCount(1);
map.put(item, node);
} else {
map.get(item).countIncrement(1);
}
}
}
// 把支持度大于（或等于）minSup的项加入到F1中
Set<String> names = map.keySet();
for (String name : names) {
TreeNode tnode = map.get(name);
if (tnode.getCount() >= minSuport) {
F1.add(tnode);
}
}
Collections.sort(F1);
return F1;
} else {
return null;
}
}
// 构建FP-Tree
public TreeNode buildFPTree(List<List<String>> transRecords,
ArrayList<TreeNode> F1) {
TreeNode root = new TreeNode(); // 创建树的根节点
for (List<String> transRecord : transRecords) {
LinkedList<String> record = sortByF1(transRecord, F1);
TreeNode subTreeRoot = root;
TreeNode tmpRoot = null;
if (root.getChildren() != null) {
while (!record.isEmpty()
&& (tmpRoot = subTreeRoot.findChild(record.peek())) != null) {
tmpRoot.countIncrement(1);
subTreeRoot = tmpRoot;
record.poll();
}
}
addNodes(subTreeRoot, record, F1);
}
return root;
}
// 把交易记录按项的频繁程序降序排列
public LinkedList<String> sortByF1(List<String> transRecord,
ArrayList<TreeNode> F1) {
Map<String, Integer> map = new HashMap<String, Integer>();
for (String item : transRecord) {
// 由于F1已经是按降序排列的，
for (int i = 0; i < F1.size(); i++) {
TreeNode tnode = F1.get(i);
if (tnode.getName().equals(item)) {
map.put(item, i);
}
}
}
ArrayList<Entry<String, Integer>> al = new ArrayList<Entry<String, Integer>>(
map.entrySet());
Collections.sort(al, new Comparator<Map.Entry<String, Integer>>() {
@Override
public int compare(Entry<String, Integer> arg0,
Entry<String, Integer> arg1) {
// 降序排列
return arg0.getValue() - arg1.getValue();
}
});
LinkedList<String> rest = new LinkedList<String>();
for (Entry<String, Integer> entry : al) {
rest.add(entry.getKey());
}
return rest;
}
// 把record作为ancestor的后代插入树中
public void addNodes(TreeNode ancestor, LinkedList<String> record,
ArrayList<TreeNode> F1) {
if (record.size() > 0) {
while (record.size() > 0) {
String item = record.poll();
TreeNode leafnode = new TreeNode(item);
leafnode.setCount(1);
leafnode.setParent(ancestor);
ancestor.addChild(leafnode);
for (TreeNode f1 : F1) {
if (f1.getName().equals(item)) {
while (f1.getNextHomonym() != null) {
f1 = f1.getNextHomonym();
}
f1.setNextHomonym(leafnode);
break;
}
}
addNodes(leafnode, record, F1);
}
}
}
}
public static class InverseMapper extends
Mapper<LongWritable, Text, Record, IntWritable> {
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String []arr=value.toString().split("\\s+");
int count=Integer.parseInt(arr[0]);
Record record=new Record();
for(int i=1;i<arr.length;i++){
record.list.add(arr[i]);
}
context.write(record, new IntWritable(count));
}
}
public static class MaxReducer extends Reducer<Record,IntWritable,IntWritable,Record>{
public void reduce(Record key,Iterable<IntWritable> values,Context context)throws IOException,InterruptedException{
int max=-1;
for(IntWritable value:values){
int i=value.get();
if(i>max)
max=i;
}
context.write(new IntWritable(max), key);
}
}
@Override
public int run(String[] arg0) throws Exception {
Configuration conf=getConf();
conf.set("mapred.task.timeout", "6000000");
Job job=new Job(conf);
job.setJarByClass(DC_FPTree.class);
FileSystem fs=FileSystem.get(getConf());
FileInputFormat.setInputPaths(job, "/user/orisun/input/data");
Path outDir=new Path("/user/orisun/output");
fs.delete(outDir,true);
FileOutputFormat.setOutputPath(job, outDir);
job.setMapperClass(GroupMapper.class);
job.setReducerClass(FPReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(Record.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
boolean success=job.waitForCompletion(true);
job=new Job(conf);
job.setJarByClass(DC_FPTree.class);
FileInputFormat.setInputPaths(job, "/user/orisun/output/part-r-*");
Path outDir2=new Path("/user/orisun/output2");
fs.delete(outDir2,true);
FileOutputFormat.setOutputPath(job, outDir2);
job.setMapperClass(InverseMapper.class);
job.setReducerClass(MaxReducer.class);
//job.setNumReduceTasks(0);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setMapOutputKeyClass(Record.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputKeyClass(Record.class);
success |= job.waitForCompletion(true);
return success?0:1;
}
public static void main(String[] args) throws Exception{
int res=ToolRunner.run(new Configuration(), new DC_FPTree(), args);
System.exit(res);
}
}

package fptree;
 
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
 
public class DC_FPTree extends Configured implements Tool {
 
    private static final int GroupNum = 10;
    private static final int minSuport=6;
 
    public static class GroupMapper extends
            Mapper<LongWritable, Text, IntWritable, Record> {
        List<String> freq = new LinkedList<String>(); // 频繁1项集
        List<List<String>> freq_group = new LinkedList<List<String>>(); // 分组后的频繁1项集
 
        @Override
        public void setup(Context context) throws IOException {
            // 从文件读入频繁1项集
            FileSystem fs = FileSystem.get(context.getConfiguration());
            Path freqFile = new Path("/user/orisun/input/F1");
            FSDataInputStream in = fs.open(freqFile);
            InputStreamReader isr = new InputStreamReader(in);
            BufferedReader br = new BufferedReader(isr);
            try {
                String line;
                while ((line = br.readLine()) != null) {
                    String[] str = line.split("\\s+");
                    String word = str[0];
                    freq.add(word);
                }
            } finally {
                br.close();
            }
            // 对频繁1项集进行分组
            Collections.shuffle(freq); // 打乱顺序
            int cap = freq.size() / GroupNum; // 每段分为一组
            for (int i = 0; i < GroupNum; i++) {
                List<String> list = new LinkedList<String>();
                for (int j = 0; j < cap; j++) {
                    list.add(freq.get(i * cap + j));
                }
                freq_group.add(list);
            }
            int remainder = freq.size() % GroupNum;
            int base = GroupNum * cap;
            for (int i = 0; i < remainder; i++) {
                freq_group.get(i).add(freq.get(base + i));
            }
        }
 
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] arr = value.toString().split("\\s+");
            Record record = new Record(arr);
            LinkedList<String> list = record.list;
            BitSet bs=new BitSet(freq_group.size());
            bs.clear();
            while (record.list.size() > 0) {
                String item = list.peekLast(); // 取出record的最后一项
                int i=0;
                for (; i < freq_group.size(); i++) {
                    if(bs.get(i))
                        continue;
                    if (freq_group.get(i).contains(item)) {
                        bs.set(i);
                        break;
                    }
                }
                if(i<freq_group.size()){     //找到了
                    context.write(new IntWritable(i), record);  
                }
                record.list.pollLast();
            }
        }
    }
     
    public static class FPReducer extends Reducer<IntWritable,Record,IntWritable,Text>{
        public void reduce(IntWritable key,Iterable<Record> values,Context context)throws IOException,InterruptedException{
            List<List<String>> trans=new LinkedList<List<String>>();
            while(values.iterator().hasNext()){
                Record record=values.iterator().next();
                LinkedList<String> list=new LinkedList<String>();
                for(String ele:record.list)
                    list.add(ele);
                trans.add(list);
            }
            FPGrowth(trans, null,context);
        }
        // FP-Growth算法
    public void FPGrowth(List<List<String>> transRecords,
            List<String> postPattern,Context context) throws IOException, InterruptedException {
        // 构建项头表，同时也是频繁1项集
        ArrayList<TreeNode> HeaderTable = buildHeaderTable(transRecords);
        // 构建FP-Tree
        TreeNode treeRoot = buildFPTree(transRecords, HeaderTable);
        // 如果FP-Tree为空则返回
        if (treeRoot.getChildren()==null || treeRoot.getChildren().size() == 0)
            return;
        //输出项头表的每一项+postPattern
        if(postPattern!=null){
            for (TreeNode header : HeaderTable) {
                String outStr=header.getName();
                int count=header.getCount();
                for (String ele : postPattern)
                    outStr+="\t" + ele;
                context.write(new IntWritable(count), new Text(outStr));
            }
        }
        // 找到项头表的每一项的条件模式基，进入递归迭代
        for (TreeNode header : HeaderTable) {
            // 后缀模式增加一项
            List<String> newPostPattern = new LinkedList<String>();
            newPostPattern.add(header.getName());
            if (postPattern != null)
                newPostPattern.addAll(postPattern);
            // 寻找header的条件模式基CPB，放入newTransRecords中
            List<List<String>> newTransRecords = new LinkedList<List<String>>();
            TreeNode backnode = header.getNextHomonym();
            while (backnode != null) {
                int counter = backnode.getCount();
                List<String> prenodes = new ArrayList<String>();
                TreeNode parent = backnode;
                // 遍历backnode的祖先节点，放到prenodes中
                while ((parent = parent.getParent()).getName() != null) {
                    prenodes.add(parent.getName());
                }
                while (counter-- > 0) {
                    newTransRecords.add(prenodes);
                }
                backnode = backnode.getNextHomonym();
            }
            // 递归迭代
            FPGrowth(newTransRecords, newPostPattern,context);
        }
    }
 
        // 构建项头表，同时也是频繁1项集
        public ArrayList<TreeNode> buildHeaderTable(List<List<String>> transRecords) {
            ArrayList<TreeNode> F1 = null;
            if (transRecords.size() > 0) {
                F1 = new ArrayList<TreeNode>();
                Map<String, TreeNode> map = new HashMap<String, TreeNode>();
                // 计算事务数据库中各项的支持度
                for (List<String> record : transRecords) {
                    for (String item : record) {
                        if (!map.keySet().contains(item)) {
                            TreeNode node = new TreeNode(item);
                            node.setCount(1);
                            map.put(item, node);
                        } else {
                            map.get(item).countIncrement(1);
                        }
                    }
                }
                // 把支持度大于（或等于）minSup的项加入到F1中
                Set<String> names = map.keySet();
                for (String name : names) {
                    TreeNode tnode = map.get(name);
                    if (tnode.getCount() >= minSuport) {
                        F1.add(tnode);
                    }
                }
                Collections.sort(F1);
                return F1;
            } else {
                return null;
            }
        }
 
        // 构建FP-Tree
        public TreeNode buildFPTree(List<List<String>> transRecords,
                ArrayList<TreeNode> F1) {
            TreeNode root = new TreeNode(); // 创建树的根节点
            for (List<String> transRecord : transRecords) {
                LinkedList<String> record = sortByF1(transRecord, F1);
                TreeNode subTreeRoot = root;
                TreeNode tmpRoot = null;
                if (root.getChildren() != null) {
                    while (!record.isEmpty()
                            && (tmpRoot = subTreeRoot.findChild(record.peek())) != null) {
                        tmpRoot.countIncrement(1);
                        subTreeRoot = tmpRoot;
                        record.poll();
                    }
                }
                addNodes(subTreeRoot, record, F1);
            }
            return root;
        }
 
        // 把交易记录按项的频繁程序降序排列
        public LinkedList<String> sortByF1(List<String> transRecord,
                ArrayList<TreeNode> F1) {
            Map<String, Integer> map = new HashMap<String, Integer>();
            for (String item : transRecord) {
                // 由于F1已经是按降序排列的，
                for (int i = 0; i < F1.size(); i++) {
                    TreeNode tnode = F1.get(i);
                    if (tnode.getName().equals(item)) {
                        map.put(item, i);
                    }
                }
            }
            ArrayList<Entry<String, Integer>> al = new ArrayList<Entry<String, Integer>>(
                    map.entrySet());
            Collections.sort(al, new Comparator<Map.Entry<String, Integer>>() {
                @Override
                public int compare(Entry<String, Integer> arg0,
                        Entry<String, Integer> arg1) {
                    // 降序排列
                    return arg0.getValue() - arg1.getValue();
                }
            });
            LinkedList<String> rest = new LinkedList<String>();
            for (Entry<String, Integer> entry : al) {
                rest.add(entry.getKey());
            }
            return rest;
        }
 
        // 把record作为ancestor的后代插入树中
        public void addNodes(TreeNode ancestor, LinkedList<String> record,
                ArrayList<TreeNode> F1) {
            if (record.size() > 0) {
                while (record.size() > 0) {
                    String item = record.poll();
                    TreeNode leafnode = new TreeNode(item);
                    leafnode.setCount(1);
                    leafnode.setParent(ancestor);
                    ancestor.addChild(leafnode);
 
                    for (TreeNode f1 : F1) {
                        if (f1.getName().equals(item)) {
                            while (f1.getNextHomonym() != null) {
                                f1 = f1.getNextHomonym();
                            }
                            f1.setNextHomonym(leafnode);
                            break;
                        }
                    }
 
                    addNodes(leafnode, record, F1);
                }
            }
        }
    }
     
    public static class InverseMapper extends
            Mapper<LongWritable, Text, Record, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String []arr=value.toString().split("\\s+");
            int count=Integer.parseInt(arr[0]);
            Record record=new Record();
            for(int i=1;i<arr.length;i++){
                record.list.add(arr[i]);
            }
            context.write(record, new IntWritable(count));
        }
    }
     
    public static class MaxReducer extends Reducer<Record,IntWritable,IntWritable,Record>{
        public void reduce(Record key,Iterable<IntWritable> values,Context context)throws IOException,InterruptedException{
            int max=-1;
            for(IntWritable value:values){
                int i=value.get();
                if(i>max)
                    max=i;
            }
            context.write(new IntWritable(max), key);
        }
    }
 
 
    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf=getConf();
        conf.set("mapred.task.timeout", "6000000");
        Job job=new Job(conf);
        job.setJarByClass(DC_FPTree.class);
        FileSystem fs=FileSystem.get(getConf());
         
        FileInputFormat.setInputPaths(job, "/user/orisun/input/data");
        Path outDir=new Path("/user/orisun/output");
        fs.delete(outDir,true);
        FileOutputFormat.setOutputPath(job, outDir);
         
        job.setMapperClass(GroupMapper.class);
        job.setReducerClass(FPReducer.class);
         
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Record.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
         
        boolean success=job.waitForCompletion(true);
         
        job=new Job(conf);
        job.setJarByClass(DC_FPTree.class);
         
        FileInputFormat.setInputPaths(job, "/user/orisun/output/part-r-*");
        Path outDir2=new Path("/user/orisun/output2");
        fs.delete(outDir2,true);
        FileOutputFormat.setOutputPath(job, outDir2);
         
        job.setMapperClass(InverseMapper.class);
        job.setReducerClass(MaxReducer.class);
        //job.setNumReduceTasks(0);
         
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Record.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputKeyClass(Record.class);
         
        success |= job.waitForCompletion(true);
         
        return success?0:1;
    }
 
    public static void main(String[] args) throws Exception{
        int res=ToolRunner.run(new Configuration(), new DC_FPTree(), args);
        System.exit(res);
    }
}

结束语

在实践中，关联规则挖掘可能并不像人们期望的那么有用。一方面是因为支持度置信度框架会产生过多的规则，并不是每一个规则都是有用的。另一方面大部分的关联规则并不像“啤酒与尿布”这种经典故事这么普遍。关联规则分析是需要技巧的，有时需要用更严格的统计学知识来控制规则的增殖。　

原文来自:博客园（华夏35度）http://www.cnblogs.com/zhangchaoyang 作者:Orisun

你可能感兴趣的:(机器学习)

院士领衔、IEEE Fellow 坐镇，清华、上交大、复旦、同济等专家齐聚 2025 全球机器学习技术大会 CSDN资讯机器学习人工智能
随着Manus出圈，OpenManus、OWL迅速开源，OpenAI推出智能体开发工具，全球AI生态正经历新一轮智能体革命。大模型如何协同学习？大模型如何自我进化？新型强化学习技术如何赋能智能体？围绕这些关键问题，由CSDN&Boolan联合举办的「2025全球机器学习技术大会」将于4月18-19日在上海隆重举行。大会云集院士、10所高校科研工作者、近30家一线科技企业技术实战专家组成的超50位重
手写机器学习算法系列——K-Means聚类算法(一) 木有鱼丸223 手写机器学习算法系列机器学习算法聚类
代码仓库(数字空间项目，GN可上)不想看的话，我也将代码上传到本博客中。1.聚类算法简介在数据科学和机器学习领域，聚类(Clustering)算法是一种无监督学习方法，它将相似的对象分到同一个组，而不同的对象则被分到不同的组。这种算法的主要目标是根据数据的特征进行分组，以此找出数据的内在结构。聚类算法的一个核心特点就是它并不需要预先知道数据的类别，而是通过算法自动进行分组。在实际应用中，我们常见的
深入解析：大型机器学习模型的基本概念与特点 AI大模型-大飞机器学习人工智能 AI大模型 AI 神经网络大模型
大模型是指具有大规模参数和复杂计算结构的机器学习模型。本文从大模型的基本概念出发，对大模型领域容易混淆的相关概念进行区分，并就大模型的发展历程、特点和分类、泛化与微调进行了详细解读，供大家在了解大模型基本知识的过程中起到一定参考作用。本文目录如下：·大模型的定义·大模型相关概念区分·大模型的发展历程·大模型的特点·大模型的分类·大模型的泛化与微调1.大模型的定义大模型是指具有大规模参数和复杂计算结
深入浅出 K 近邻算法：原理、实践与应用烂蜻蜓机器学习近邻算法算法
引言在机器学习的众多算法中，K近邻算法（K-NearestNeighbors，简称KNN）以其简洁而强大的特性占据着重要地位。它既可以用于分类任务，也能在回归任务中发挥作用。无论是处理简单数据集，还是面对复杂的数据分布，KNN都展现出独特的魅力。本文将深入探讨KNN算法的原理、特点、优缺点、实现步骤以及在分类和回归任务中的具体应用。KNN算法的基本原理KNN算法属于监督学习范畴，其核心思想质朴而直
【漫话机器学习系列】137.随机搜索（Randomized Search） IT古董漫话机器学习系列专辑机器学习人工智能
随机搜索（RandomizedSearch）详解在机器学习和深度学习的模型训练过程中，超参数调优（HyperparameterTuning）是至关重要的一环。随机搜索（RandomizedSearch）是一种高效的超参数优化方法，它通过在候选超参数的数值分布（如正态分布、均匀分布等）中随机选择超参数组合，从而找到最优的超参数配置。1.超参数调优的必要性超参数是模型在训练之前需要人为设定的参数，例如
【大模型学习】第十九章什么是迁移学习好多渔鱼好多 AI大模型人工智能大模型 AI 机器学习迁移学习
目录1.迁移学习的起源背景1.1传统机器学习的问题1.2迁移学习的提出背景2.什么是迁移学习2.1迁移学习的定义2.2生活实例解释3.技术要点与原理3.1迁移学习方法分类3.1.1基于特征的迁移学习（Feature-basedTransfer）案例说明代码示例3.1.2基于模型的迁移（Model-basedTransfer）案例说明BERT用于情感分析的例子3.1.3基于实例的迁移（Instanc
Python实现机器学习项目教程：房价预测向着开发进攻 python python 机器学习开发语言
Python实现机器学习小项目教程：房价预测案例机器学习（MachineLearning）是数据科学中的一项重要技术，它通过从数据中学习规律，进行预测和决策。对于初学者来说，通过实际的项目来学习机器学习的原理和实现方法，是非常有效的。本篇教程将通过Python实现一个简单的机器学习小项目——房价预测。我们将使用scikit-learn库来构建并训练一个线性回归模型，预测房价。项目背景假设我们拥有一
AI Agent在企业预算管理与成本控制中的应用 SuperAGI2025 DeepSeek 人工智能大数据 ai
AIAgent在企业预算管理与成本控制中的应用关键词：AIAgent、企业预算管理、成本控制、机器学习、预测模型、优化算法摘要：本文深入探讨了AIAgent在企业预算管理与成本控制中的应用。通过详细的背景介绍、核心概念解析、算法原理讲解和实际案例剖析，本文展示了AIAgent如何通过智能预测和优化算法，为企业带来更高的效率和精确度，从而实现成本控制和预算优化的目标。背景介绍核心概念AIAgent:
常见的深度学习优化器青灯剑客算法 python 人工智能机器学习自然语言处理深度学习
一直用优化器解决问题，但是没有对它进行一个系统的总结。。不对，系统的总结进行过，只是时过境迁，早已忘却。一、照进我脑海的几个家伙一开始学习的当然是SGD，只是学着学着就忘记了。后来呢，接触到网上介绍的几种常用的优化器，看着原理挺给力，可是记了好几次都记不住。直到遇到《百面机器学习》，它从最基本的原理出发，给了我一点灵感。（1）几种常用的优化器，详情见这里链接34（2）二、以为自己遇见了大海老师说，
PyTorch 和 Python关系一只积极向上的小咸鱼 python pytorch 人工智能
1PyTorch和Python关系PyTorch和Python是两个不同但相互关联的工具，主要用于机器学习和深度学习领域。以下是它们之间的关系和各自的作用：Python编程语言:Python是一种高级编程语言，以其简洁易读的语法而闻名。广泛使用:Python在数据科学、人工智能、Web开发、自动化等多个领域有着广泛的应用。库和生态系统丰富:Python拥有丰富的第三方库和工具，如NumPy、pan
Python与人工智能：为何它们是天作之合？纪至训至 python 人工智能开发语言
引言在人工智能（AI）飞速发展的今天，Python已成为这一领域的“明星语言”。从机器学习到深度学习，从自然语言处理到计算机视觉，Python的身影无处不在。那么，Python究竟为何能成为AI开发的首选工具？本文将探讨Python与AI之间的深度关联，并解析其背后的原因。1.Python的简洁性与可读性AI开发的核心在于快速迭代和实验，而Python以其简洁的语法和直观的代码结构著称。开发者无需
Python深度学习033：Python、PyTorch、CUDA和显卡驱动之间的关系若北辰 Python深度学习 python 深度学习 pytorch
Python、PyTorch、CUDA和显卡驱动之间的关系相当紧密，它们共同构成了一个能够执行深度学习模型的高效计算环境。下面是它们之间关系的简要概述：PythonPython是一种编程语言，广泛用于科学计算、数据分析和机器学习。它是开发和运行PyTorch代码的基础环境。PyTorchPyTorch是一个开源的机器学习库，用于应用如自然语言处理和计算机视觉的深度学习模型。它提供了丰富的API，使
机器学习算法在司法预测中的应用【附保姆级代码】一键难忘机器学习算法人工智能
本文收录于专栏：精通AI实战千例专栏合集https://blog.csdn.net/weixin_52908342/category_11863492.html从基础到实践，深入学习。无论你是初学者还是经验丰富的老手，对于本专栏案例和项目实践都有参考学习意义。每一个案例都附带关键代码，详细讲解供大家学习，希望可以帮到大家。正在不断更新中~机器学习算法在司法预测中的应用司法预测作为法律领域的前沿研究
PyTorch深度学习框架60天进阶学习计划 - 第19天：时间序列预测凡人的AI工具箱深度学习 pytorch 学习人工智能 AI编程迁移学习 python
PyTorch深度学习框架60天进阶学习计划-第19天：时间序列预测目录时间序列预测概述滑动窗口数据构造方法归一化策略对比：MinMaxvsZ-ScoreLSTM基础原理Attention机制与LSTM结合LSTM-Attention模型实现TeacherForcing技术与应用Prophet基准模型对比多步预测的滚动验证方法综合实战：股票价格预测1.时间序列预测概述时间序列预测是机器学习中的一个
Python爬虫学习笔记_DAY_26_Python爬虫之requests库的安装与基本使用【Python爬虫】_requests库ip 苹果Android开发组程序员 python 爬虫学习
最后Python崛起并且风靡，因为优点多、应用领域广、被大牛们认可。学习Python门槛很低，但它的晋级路线很多，通过它你能进入机器学习、数据挖掘、大数据，CS等更加高级的领域。Python可以做网络应用，可以做科学计算，数据分析，可以做网络爬虫，可以做机器学习、自然语言处理、可以写游戏、可以做桌面应用…Python可以做的很多，你需要学好基础，再选择明确的方向。这里给大家分享一份全套的Pytho
大模型相关知识学习随记 m0_65156252 语言模型人工智能自然语言处理
2024/3/151，概念解释：通义千问，是阿里云推出的一个超大规模的语言模型，功能包括多轮对话、文案创作、逻辑推理、多模态理解、多语言支持。能够跟人类进行多轮的交互，也融入了多模态的知识理解，且有文案创作能力，能够续写小说，编写邮件等。2，多模态大模型：多模态大模型是一种基于深度学习的机器学习技术，其核心思想是将不同媒体数据（如文本、图像、音频和视频等）进行融合，通过学习不同模态之间的关联，实现
DeepSeek在供热行业中的应用杨航 AI 人工智能深度学习 python 机器学习算法
目录引言1.1DeepSeek技术概述1.2供暖行业业务挑战1.3DeepSeek在供暖行业的应用前景DeepSeek技术基础2.1深度学习与机器学习2.2自然语言处理（NLP）2.3图像识别与处理2.4数据挖掘与分析供暖行业应用场景3.1设备监控与维护3.1.1设备状态监控3.1.2故障预测与诊断3.1.3维护计划优化3.2能源管理与优化3.2.1能耗数据分析3.2.2热负荷预测3.2.3节能优
Anaconda与VS Code wei099
最近在学习机器学习和计算机视觉，使用GoogleColab来运行网上的示例代码。考虑到网页上写代码效率太低，没有代码补全功能，没有函数提示，不利于对代码的了解，于是还是决定折腾一下在自己的Windows本上安装工作环境。想要学习机器学习的技能，不可避免要具备熟练使用Python编程的能力。Anaconda是Python软件包管理器，可以大大减少使用者安装各种包的麻烦，提高工作效率。我先后安装了An
适合机器学习的Linux系统推荐及基本配置指南金外飞176 信息分享机器学习 linux 人工智能
适合机器学习的Linux系统推荐及基本配置指南在机器学习领域，选择一个合适的Linux发行版至关重要。它不仅影响开发效率，还可能影响模型训练的性能。经过广泛调研和用户反馈，Ubuntu脱颖而出，成为众多机器学习爱好者的首选。下面将详细介绍为何推荐Ubuntu以及其基本配置需求。一、推荐Ubuntu的理由1.用户友好的界面和强大的社区支持Ubuntu提供了直观的图形用户界面，对于初次接触Linux的
使用Python进行火焰检测与识别：从基础理论到高级实现的全面指南快撑死的鱼 python算法解析 python 开发语言
使用Python进行火焰检测与识别：从基础理论到高级实现的全面指南火灾是一种常见而危险的自然灾害，在工业、家庭和公共场所中，实时检测火焰并做出响应是保障安全的重要手段。随着计算机视觉技术的发展，使用图像处理和机器学习的方法进行火焰检测已经成为可能。Python作为一种功能强大且广泛使用的编程语言，提供了丰富的库和工具，能够有效地实现火焰检测和识别。在本文中，我们将深入探讨如何使用Python进行火
[每日一学]数据分析与可视化：anaconda与pythoncharm使用上的区别之处及优越点，使用哪款比较好用拼命绽放 python 开发语言
anaconda、.jupyter·jupyter的基本使用，开发环境与pythoncharm有什么区别？在数据分析和可视化使用中有什么区别？哪个在数据分析和可视化上更占优势？如果用pythoncharm如何去实现数据分析与可视化有影响吗？一、Anaconda是一个开源的Python发行版本，集成了多个常用的数据科学、机器学习、深度学习等相关工具，例如JupyterNotebook、Spyder、
差分革命：清华微软携手，用物理智慧重塑Transformer“慧眼” YINWA AI 人工智能科技 AI 人工智能科技 ai
当物理学遇上AI，一场精准捕捉的变革悄然上演想象一下，在信息的汪洋大海中，寻找一根至关重要的“针”，难度无异于“大海捞针”。然而，随着诺贝尔物理学奖的光芒照耀到“机器学习之父”GeoffreyHinton的肩头，另一场跨界融合也在悄然进行——微软与清华大学的科研团队携手，将物理学的智慧融入AI，推出DifferentialTransformer（DIFFTransformer），让Transfor
深度学习核心技术深度解析月落星还在深度学习深度学习人工智能
一、深度学习的本质与核心思想定义：通过多层非线性变换，自动学习数据层次化表征的机器学习方法核心突破：表征学习：自动发现数据的内在规律，无需人工设计特征端到端学习：直接从原始输入到最终输出，消除中间环节的信息损失分布式表示：通过神经元激活模式的组合，指数级提升表达能力数学本质：f(x)=WLσ(WL−1σ(...σ(W1x+b1)...)+bL−1)+bLf(x)=W_{L}σ(W_{L-1}σ(.
AI 界的包青天：GaussianNB 智断分类难题星际编程喵人工智能分类数据挖掘
前言在机器学习的江湖中，分类算法纷繁复杂，各具特色。有的深不可测，犹如隐世高人的内功心法，让人望而却步；有的则像街头小贩，简单直接却也能精准解决问题。江湖中高手云集，其中有一位侠客，宛如包青天，正气凛然，以公正无私和高效迅捷著称，擅长快速解决分类难题。此侠客正是GaussianNaïveBayes（高斯朴素贝叶斯，简称GaussianNB）。凭借朴素的假设与强大的数学支撑，GaussianNB在分
深度学习/机器学习入门基础数学知识整理（一）：线性代数基础，矩阵，范数等 chljerry_mouse 线性代数深度学习机器学习
前面大概有2年时间，利用业余时间断断续续写了一个机器学习方法系列，和深度学习方法系列，还有一个三十分钟理解系列（一些趣味知识）；新的一年开始了，今年给自己定的学习目标——以补齐基础理论为重点，研究一些基础课题；同时逐步继续写上述三个系列的文章。最近越来越多的研究工作聚焦研究多层神经网络的原理，本质，我相信深度学习并不是无法掌控的“炼金术”，而是真真实实有理论保证的理论体系；本篇打算摘录整理一些最最
图像识别技术与应用超帅的好吧笔记
第一节课这节课了解了这门专业的就业职位：工资是怎么样的岗位职责和任职要求看到了人类工业文明的演变了解了人工智能的研究、开发、模拟、延伸、理论、方法和技术看到了生活方式的转变比如智能语音闹钟控制系统、自动驾驶和人脸识别考勤智能购物、医疗日常生活的智能比如指纹、淘宝、抖音还能用软件看到天气的好坏了解了典型训练和机器学习中的关键组件机器学习中的关键组件包含：数据模型目标函数优化算法这节课学习了第一节剩下
AI概率学预测足球大小球让球数据分析 sanx18 人工智能数据分析数据挖掘
在足球数据分析中，AI概率学预测主要涉及大小球和让球盘口的分析。以下是关键点：1.大小球分析大小球指机构设定的进球数预期，投注者预测实际进球数是否超过或低于该值。AI应用：历史数据：AI通过分析球队的历史进球、失球等数据，预测未来比赛进球数。机器学习：使用回归模型、神经网络等预测进球数，考虑球队实力、比赛风格、天气等因素。实时数据：结合实时比赛数据动态调整预测。2.让球分析让球是机构为平衡双方实力
【梯度下降算法】蝉叫醒了夏天机器学习算法
梯度下降算法：第一章梯度下降的历史沿革1.1优化方法的演进脉络从17世纪牛顿时代的数值解法，到20世纪最优控制理论的发展，直至现代机器学习对优化算法的特殊需求，梯度下降算法在数学优化史上占据重要地位。1947年FrankRosenblatt在感知机研究中首次系统应用梯度下降思想1.2机器学习时代的复兴21世纪深度学习革命使梯度下降算法获得新生：2006年Hinton团队在深度信念网络中的突破应用2
sparkML入门，通俗解释机器学习的框架和算法 Tometor spark-ml 机器学习算法回归数据挖掘人工智能 scala
一、机器学习的整体框架（类比烹饪）假设你要做一道菜，机器学习的过程可以类比为：步骤-->烹饪类比-->机器学习对应1.确定目标|想做什么菜（红烧肉/沙拉）|明确任务(分类/回归/聚类)2.准备食材|买菜、洗菜、切菜|数据收集与预处理3.设计食谱|决定烹饪步骤和调料|选择算法和模型设计4.试做并尝味道|调整火候和调味|模型训练与调参5.最终成品|端上桌的菜|模型部署与应用二、机器学习的核心流程1.数
神经网络机器学习中说的过拟合是什么意思 yuanpan 机器学习神经网络人工智能
在神经网络和机器学习中，过拟合（Overfitting）是指模型在训练数据上表现非常好，但在未见过的测试数据上表现较差的现象。换句话说，模型过度学习了训练数据中的细节和噪声，导致其泛化能力（Generalization）下降，无法很好地适应新数据。过拟合的表现训练误差很低，但测试误差很高：模型在训练集上的准确率非常高，但在测试集上的准确率却显著下降。模型过于复杂：模型学习了训练数据中的噪声或不相关
java的(PO,VO,TO,BO,DAO,POJO) Cb123456 VO TO BO POJO DAO
转: http://www.cnblogs.com/yxnchinahlj/archive/2012/02/24/2366110.html ------------------------------------------------------------------- O/R Mapping 是 Object Relational Mapping（对象关系映
spring ioc原理（看完后大家可以自己写一个spring） aijuans spring
最近，买了本Spring入门书：spring In Action 。大致浏览了下感觉还不错。就是入门了点。Manning的书还是不错的，我虽然不像哪些只看Manning书的人那样专注于Manning,但怀着崇敬的心情和激情通览了一遍。又一次接受了IOC 、DI、AOP等Spring核心概念。先就IOC和DI谈一点我的看法。IO
MyEclipse 2014中Customize Persperctive设置无效的解决方法 Kai_Ge MyEclipse2014
高高兴兴下载个MyEclipse2014，发现工具条上多了个手机开发的按钮，心生不爽就想弄掉他！结果发现Customize Persperctive失效！！有说更新下就好了，可是国内Myeclipse访问不了，何谈更新... so~这里提供了更新后的一下jar包，给大家使用！ 1、将9个jar复制到myeclipse安装目录\plugins中 2、删除和这9个jar同包名但是版本号较
SpringMvc上传 120153216 springMVC
@RequestMapping(value = WebUrlConstant.UPLOADFILE) @ResponseBody public Map<String, Object> uploadFile(HttpServletRequest request,HttpServletResponse httpresponse) { try { //
Javascript----HTML DOM 事件何必如此 JavaScript html Web
HTML DOM 事件允许Javascript在HTML文档元素中注册不同事件处理程序。事件通常与函数结合使用，函数不会在事件发生前被执行！注：DOM：指明使用的 DOM 属性级别。 1.鼠标事件属性
动态绑定和删除onclick事件 357029540 JavaScript jquery
因为对JQUERY和JS的动态绑定事件的不熟悉，今天花了好久的时间才把动态绑定和删除onclick事件搞定!现在分享下我的过程。在我的查询页面，我将我的onclick事件绑定到了tr标签上同时传入当前行(this值)参数，这样可以在点击行上的任意地方时可以选中checkbox，但是在我的某一列上也有一个onclick事件是用于下载附件的，当
HttpClient|HttpClient请求详解 7454103 apache 应用服务器网络协议网络应用 Security
HttpClient 是 Apache Jakarta Common 下的子项目，可以用来提供高效的、最新的、功能丰富的支持 HTTP 协议的客户端编程工具包，并且它支持 HTTP 协议最新的版本和建议。本文首先介绍 HTTPClient，然后根据作者实际工作经验给出了一些常见问题的解决方法。HTTP 协议可能是现在 Internet 上使用得最多、最重要的协议了，越来越多的 Java 应用程序需
递归逐层统计树形结构数据 darkranger 数据结构
将集合递归获取树形结构: /** * * 递归获取数据 * @param alist:所有分类 * @param subjname:对应统计的项目名称 * @param pk:对应项目主键 * @param reportList: 最后统计的结果集 * @param count:项目级别 */ public void getReportVO(Arr
访问WEB-INF下使用frameset标签页面出错的原因 aijuans struts2
<frameset rows="61,*,24" cols="*" framespacing="0" frameborder="no" border="0">
MAVEN常用命令 avords
Maven库： http://repo2.maven.org/maven2/ Maven依赖查询： http://mvnrepository.com/ Maven常用命令： 1. 创建Maven的普通java项目： mvn archetype:create -DgroupId=packageName
PHP如果自带一个小型的web服务器就好了 houxinyou apache 应用服务器 Web PHP 脚本
最近单位用PHP做网站，感觉PHP挺好的，不过有一些地方不太习惯，比如，环境搭建。PHP本身就是一个网站后台脚本，但用PHP做程序时还要下载apache，配置起来也不太很方便，虽然有好多配置好的apache+php+mysq的环境，但用起来总是心里不太舒服，因为我要的只是一个开发环境，如果是真实的运行环境，下个apahe也无所谓，但只是一个开发环境，总有一种杀鸡用牛刀的感觉。如果php自己的程序中
NoSQL数据库之Redis数据库管理(list类型) bijian1013 redis 数据库 NoSQL
3.list类型及操作 List是一个链表结构，主要功能是push、pop、获取一个范围的所有值等等，操作key理解为链表的名字。Redis的list类型其实就是一个每个子元素都是string类型的双向链表。我们可以通过push、pop操作从链表的头部或者尾部添加删除元素，这样list既可以作为栈，又可以作为队列。 &nbs
谁在用Hadoop？ bingyingao hadoop 数据挖掘公司应用场景
Hadoop技术的应用已经十分广泛了，而我是最近才开始对它有所了解，它在大数据领域的出色表现也让我产生了兴趣。浏览了他的官网，其中有一个页面专门介绍目前世界上有哪些公司在用Hadoop，这些公司涵盖各行各业，不乏一些大公司如alibaba,ebay,amazon,google,facebook,adobe等，主要用于日志分析、数据挖掘、机器学习、构建索引、业务报表等场景,这更加激发了学习它的热情。
【Spark七十六】Spark计算结果存到MySQL bit1129 mysql
package spark.examples.db import java.sql.{PreparedStatement, Connection, DriverManager} import com.mysql.jdbc.Driver import org.apache.spark.{SparkContext, SparkConf} object SparkMySQLInteg
Scala: JVM上的函数编程 bookjovi scala erlang haskell
说Scala是JVM上的函数编程一点也不为过，Scala把面向对象和函数型编程这两种主流编程范式结合了起来，对于熟悉各种编程范式的人而言Scala并没有带来太多革新的编程思想，scala主要的有点在于Java庞大的package优势，这样也就弥补了JVM平台上函数型编程的缺失，MS家.net上已经有了F#，JVM怎么能不跟上呢？对本人而言
jar打成exe bro_feng java jar exe
今天要把jar包打成exe，jsmooth和exe4j都用了。遇见几个问题。记录一下。两个软件都很好使，网上都有图片教程，都挺不错。首先肯定是要用自己的jre的，不然不能通用，其次别忘了把需要的lib放到classPath中。困扰我很久的一个问题是，我自己打包成功后，在一个同事的没有装jdk的电脑上运行，就是不行，报错jvm.dll为无效的windows映像，如截图最后发现
读《研磨设计模式》-代码笔记-策略模式-Strategy bylijinnan java 设计模式
声明：本文只为方便我个人查阅和理解，详细的分析以及源代码请移步原作者的博客http://chjavach.iteye.com/ /* 策略模式定义了一系列的算法，并将每一个算法封装起来，而且使它们还可以相互替换。策略模式让算法独立于使用它的客户而独立变化简单理解： 1、将不同的策略提炼出一个共同接口。这是容易的，因为不同的策略，只是算法不同，需要传递的参数
cmd命令值cvfM命令 chenyu19891124 cmd
cmd命令还真是强大啊。今天发现jar -cvfM aa.rar @aaalist 就这行命令可以根据aaalist取出相应的文件例如：在d：\workspace\prpall\test.java 有这样一个文件，现在想要将这个文件打成一个包。运行如下命令即可比如在d：\wor
OpenJWeb(1.8) Java Web应用快速开发平台 comsci java 框架 Web 项目管理企业应用
OpenJWeb(1.8) Java Web应用快速开发平台的作者是我们技术联盟的成员，他最近推出了新版本的快速应用开发平台 OpenJWeb(1.8)，我帮他做做宣传 OpenJWeb快速开发平台以快速开发为核心，整合先进的java 开源框架，本着自主开发+应用集成相结合的原则，旨在为政府、企事业单位、软件公司等平台用户提供一个架构透
Python 报错：IndentationError: unexpected indent daizj python tab 空格缩进
IndentationError: unexpected indent 是缩进的问题，也有可能是tab和空格混用啦 Python开发者有意让违反了缩进规则的程序不能通过编译，以此来强制程序员养成良好的编程习惯。并且在Python语言里，缩进而非花括号或者某种关键字，被用于表示语句块的开始和退出。增加缩进表示语句块的开
HttpClient 超时设置 dongwei_6688 httpclient
HttpClient中的超时设置包含两个部分： 1. 建立连接超时，是指在httpclient客户端和服务器端建立连接过程中允许的最大等待时间 2. 读取数据超时，是指在建立连接后，等待读取服务器端的响应数据时允许的最大等待时间在HttpClient 4.x中如下设置： HttpClient httpclient = new DefaultHttpC
小鱼与波浪 dcj3sjt126com
一条小鱼游出水面看蓝天，偶然间遇到了波浪。　　小鱼便与波浪在海面上游戏，随着波浪上下起伏、汹涌前进。　　小鱼在波浪里兴奋得大叫：“你每天都过着这么刺激的生活吗？简直太棒了。”　　波浪说：“岂只每天过这样的生活，几乎每一刻都这么刺激！还有更刺激的，要有潮汐变化，或者狂风暴雨，那才是兴奋得心脏都会跳出来。”　　小鱼说：“真希望我也能变成一个波浪，每天随着风雨、潮汐流动，不知道有多么好！”　　很快，小鱼
Error Code: 1175 You are using safe update mode and you tried to update a table dcj3sjt126com mysql
快速高效用：SET SQL_SAFE_UPDATES = 0；下面的就不要看了！今日用MySQL Workbench进行数据库的管理更新时，执行一个更新的语句碰到以下错误提示： Error Code: 1175 You are using safe update mode and you tried to update a table without a WHERE that
枚举类型详细介绍及方法定义 gaomysion enum javaee
转发 http://developer.51cto.com/art/201107/275031.htm 枚举其实就是一种类型，跟int, char 这种差不多，就是定义变量时限制输入的，你只能够赋enum里面规定的值。建议大家可以看看，这两篇文章，《java枚举类型入门》和《C++的中的结构体和枚举》，供大家参考。枚举类型是JDK5.0的新特征。Sun引进了一个全新的关键字enum
Merge Sorted Array hcx2013 array
Given two sorted integer arrays nums1 and nums2, merge nums2 into nums1 as one sorted array. Note:You may assume that nums1 has enough space (size that is
Expression Language 3.0新特性 jinnianshilongnian el 3.0
Expression Language 3.0表达式语言规范最终版从2013-4-29发布到现在已经非常久的时间了；目前如Tomcat 8、Jetty 9、GlasshFish 4已经支持EL 3.0。新特性包括：如字符串拼接操作符、赋值、分号操作符、对象方法调用、Lambda表达式、静态字段/方法调用、构造器调用、Java8集合操作。目前Glassfish 4/Jetty实现最好，对大多数新特性
超越算法来看待个性化推荐 liyonghui160com 超越算法来看待个性化推荐
一提到个性化推荐，大家一般会想到协同过滤、文本相似等推荐算法，或是更高阶的模型推荐算法，百度的张栋说过，推荐40%取决于UI、30%取决于数据、20%取决于背景知识，虽然本人不是很认同这种比例，但推荐系统中，推荐算法起的作用起的作用是非常有限的。就像任何
写给Javascript初学者的小小建议 pda158 JavaScript
　　一般初学JavaScript的时候最头痛的就是浏览器兼容问题。在Firefox下面好好的代码放到IE就不能显示了，又或者是在IE能正常显示的代码在firefox又报错了。　　如果你正初学JavaScript并有着一样的处境的话建议你：初学JavaScript的时候无视DOM和BOM的兼容性，将更多的时间花在了解语言本身（ECMAScript）。只在特定浏览器编写代码（Chrome/Fi
Java 枚举 ShihLei java enum 枚举
注：文章内容大量借鉴使用网上的资料，可惜没有记录参考地址，只能再传对作者说声抱歉并表示感谢！一基础 1）语法枚举类型只能有私有构造器（这样做可以保证客户代码没有办法新建一个enum的实例）枚举实例必须最先定义 2）特性 &nb
Java SE 6 HotSpot虚拟机的垃圾回收机制 uuhorse java HotSpot GC 垃圾回收 VM
官方资料，关于Java SE 6 HotSpot虚拟机的garbage Collection，非常全，英文。 http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning &