奋羊羊

基于docker的Hadoop集群下实现最小生成树的mapreduce程序

01.技术背景

在本文中，将为读者详细介绍如下内容：

如何部署基于docker的hadoop开发环境
mapreduce的基本开发流程与基本知识
java开发的一些基本知识
最小生成树算法相关的知识。

文章中假定您已经具有如下知识背景：

了解并掌握docker的相关操作开发过程中
Linux相关的知识
java的基本知识
算法相关的基本知识，如图、树等基本概念

在环境搭建与开发过程中，需要用到的资源如下：

Hadoop 安装包
Java Linux安装包
Java 开发环境（本次使用的Idea）

02.重要说明：

后面有部分代码并没有放出来，主要是因为此学习过程是基于我当前学习的一们课程的大作业完成的。当前大作业还未上交，为避免造成协同现象，故暂时不放出代码，2022年9月1日后，一定会将剩余的代码补充完整。

1.基于Docker的Hadoop环境配置

1.1 拉取Ubuntu docker 映像

docker pull ubuntu

1.2 安装java

1.2.1 下载java的安装包jdk-8u333-linux-x64.tar.gz

1.2.2 安装并配置java

启动docker并挂载目录到root根目录下：

进入docker并解压jdk安装包

配置java：

环境变量：

文件位置：/etc/profile

环境变量内容：

export   JAVA_HOME=/usr/local/jdk1.8.0_333
export   CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export   PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export   JRE_HOME=$JAVA_HOME/jre

运行测试：

注意：

此处可能需要使用vim，具体安装参见本文 1.4 安装必要软件

1.3 安装hadoop

1.3.1 下载hadoop

hadoop下载地址（官方）

hadoop下载地址(清华源)

将hadoop解压并移到指定目录

tar -zxvf hadoop-3.1.3.tar.gz
mv hadoop-3.1.3 /usr/local/hadoop

1.4 安装必要软件

更换源并更新：

cp /etc/apt/sources.list  /etc/apt/sources.list.bak
sed -i "s/archive.ubuntu.com/mirrors.aliyun.com/g" /etc/apt/sources.list
apt udpate

1.4.1 vim

apt-get install vim

1.4.2 ssh

apt-get install ssh

配置ssh的免密登陆

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
echo "service ssh start" >> ~/.bashrc

1.5 配置hadoop

参考链接：

1.ubuntu基于docker搭建hadoop集群【史上最详细】_Looho_的博客-CSDN博客

2. 超详细的基于docker搭建hadoop集群_孤独之风。的博客-CSDN博客_docker搭建hadoop集群

1.5.1 配置hadoop运行环境变量

修改/etc/profile

配置 hadoop的环境变量


export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME 
export HADOOP_HDFS_HOME=$HADOOP_HOME 
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME 
export HADOOP_INSTALL=$HADOOP_HOME 
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
export HADOOP_CONF_DIR=$HADOOP_HOME 
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec 
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

1.5.2 配置hadoop配置文件

在修改完/etc/profile之后，运行如下命令，使刚才配置的环境变量生效，同时，切换到hadoop的配置文件夹下：

source /etc/profile
cd $HADOOP_CONF_DIR/

主要修改以下5个文件：

1.hadoop-env.sh

export JAVA_HOME=/usr/local/jdk1.8.0_333
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

2. core-site.xml，在后面添加如下配置：


        
                fs.default.name
                hdfs://h01:9000
        
        
                hadoop.tmp.dir
                /data/hadoop/tmp

3.配置 hdfs-site.xml，添加如下内容：


    
        dfs.replication
        2
    
    
        dfs.namenode.name.dir
        /data/hadoop/hdfs/name
    
    
        dfs.namenode.data.dir
        /data/hadoop/hdfs/data

此处需要建立几个文件夹：

mkdir /data/hadoop/tmp
mkdir /data/hadoop/hdfs
mkdir /data/hadoop/hdfs/name
mkdir /data/hadoop/hdfs/data

4.配置 mapred-site.xml，添加如下内容



    
        mapreduce.framework.name
        yarn
    
    
        mapreduce.application.classpath
        
            /opt/hadoop/etc/hadoop,
            /opt/hadoop/share/hadoop/common/*,
            /opt/hadoop/share/hadoop/common/lib/*,
            /opt/hadoop/share/hadoop/hdfs/*,
            /opt/hadoop/share/hadoop/hdfs/lib/*,
            /opt/hadoop/share/hadoop/mapreduce/*,
            /opt/hadoop/share/hadoop/mapreduce/lib/*,
            /opt/hadoop/share/hadoop/yarn/*,
            /opt/hadoop/share/hadoop/yarn/lib/*

5.配置yarn-site.xml，添加如下内容：





    
        yarn.resourcemanager.hostname
        h01
    
    
        yarn.nodemanager.aux-services
        mapreduce_shuffle

1.5.3 保存docker为新映像

命令：

 docker commit -m "hadoop" eddd4e14c0b5 hadoop:docker

需要根据自己实际进行修改:

1.6 配置hadoop 集群

1.6.1 配置docker

启动三个docker：

docker run -it -h h01 --name h01 -p 9870:9870 -p 8088:8088 hadoop /bin/bash
docker run -it -h h02 --name h02 hadoop /bin/bash
docker run -it -h h03 --name h03 hadoop /bin/bash

分别根据docker实际IP地址，修改/etc/hosts文件：

在三个docker中分别对另外两个docker进行ssh登陆，将ip写入known_hosts

1.6.2 配置集群

修改master节点上的配置目录下的workers文件，删除原先的localhsot，添加另外两个主机名。

1.6.3 启动集群

# 第一次第一次启动，务必要format一下namenode，后面再启动就不需要format了
hdfs namenode -format
# 启动集群
start-all.sh

1.6.4 查看各节点的状态

2.MapReduce 开发相关知识

2.1 开发流程

以mapreduce的经典案例wordcount为例，基本流程如下：

简易代码说明如下：

map代码：

reduce代码：

2.2 数据类型

2.2.1 常用的数据

//以下是常用的数据类型：
BooleanWritable //布尔型
ByteWritable
DoubleWritable
FloatWritable
IntWritable
LongWritable
Text                //:使用UTF8格式存储我们的文本
NullWritable  //当中key或者value为空时使用

2.2.2 常用数据类的基本操作

以IntWritable为例：

// 创建实例
IntWritable iKey = new IntWritable();

// 设置值
iKey.set(1);

//获取值
int key = iKey.get();

2.3 java开发相关的基础

此处只介绍命令行下编译与运行相关的基础：

2.3.1. 编译 javac

linux下，javac在编译过程中，如果要使用classpath,是以" : " 为分割符的。

windows下，则是以 “;”为分割符的。

MENIFEST.MF文件的内容如下

如果在进行jar包生成的时候，如果不加MENIFEST.MF文件，在hadoop中运行的时候，需要指明主类的名称。

 hadoop jar ./mst1.jar MyKruskalMST /input /outputt1

2.3.2 hadoop的常用命令

# 程序运行,如示例程序，路径一般在xx/hadoop/share/hadoop/mapreduce/
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /wd_op

# 程序的命令行生成
javac *.java -cp /path1/xxx.jar:/path2/xxy.jar:$(hadoop classpath)
# 程序打包
jar -cvf Package_name.jar *.class 

# hdfs 操作:
# 文件列表：
hadoop fs -ls path 
# 文件删除
hadoop fs -rm path #在删除文件夹的过程中，需要添加 -r 选项
# 文件创建
hadoop fs -touch /path/filename
# 文件上传
hadoop fs -put local_file_path hdfs_file_path
# 文件下载 
hadoop fs -get hdfs_file_path local_file_path

#集群停止
cd $HADOOP_CONF_DIR/
stop-all.sh

# 集群启动
cd $HADOOP_CONF_DIR
start-all.sh

2.4 学习建议

水平有限，建议就不加了 ........................................

3. 最小生成树算法

3.1 最小生成树

一个连通图的生成树是一个极小的连通子图，它包含图中全部的n个顶点，但只有构成一棵树的n-1条边。

最小生成树，就是一个 带权图中边的权值和最小的生成树 ，最小是指边的权值之和小于或者等于其它生成树的边的权值之和。

3.2 常见的最小生成树算法

此处对两种算法进行简要的介绍，为后面的编程奠定一个基础，详细还需要进行专门的学习。

3.2.1Kruskal算法

克鲁斯卡尔算法（Kruskal）是一种使用贪婪方法的最小生成树算法。该算法初始将图视为森林，图中的每一个顶点视为一棵单独的树。一棵树只与它的邻接顶点中权值最小且不违反最小生成树属性（不构成环）的树之间建立连边。

java实现：

程序摘自算法

/******************************************************************************
 *  Compilation:  javac KruskalMST.java
 *  Execution:    java  KruskalMST filename.txt
 *  Dependencies: EdgeWeightedGraph.java Edge.java Queue.java MinPQ.java
 *                UF.java In.java StdOut.java
 *  Data files:   https://algs4.cs.princeton.edu/43mst/tinyEWG.txt
 *                https://algs4.cs.princeton.edu/43mst/mediumEWG.txt
 *                https://algs4.cs.princeton.edu/43mst/largeEWG.txt
 *
 *  Compute a minimum spanning forest using Kruskal's algorithm.
 *
 *  %  java KruskalMST tinyEWG.txt 
 *  0-7 0.16000
 *  2-3 0.17000
 *  1-7 0.19000
 *  0-2 0.26000
 *  5-7 0.28000
 *  4-5 0.35000
 *  6-2 0.40000
 *  1.81000
 *
 *  % java KruskalMST mediumEWG.txt
 *  168-231 0.00268
 *  151-208 0.00391
 *  7-157   0.00516
 *  122-205 0.00647
 *  8-152   0.00702
 *  156-219 0.00745
 *  28-198  0.00775
 *  38-126  0.00845
 *  10-123  0.00886
 *  ...
 *  10.46351
 *
 ******************************************************************************/

package edu.princeton.cs.algs4;

import java.util.Arrays;

/**
 *  The {@code KruskalMST} class represents a data type for computing a
 *  minimum spanning tree in an edge-weighted graph.
 *  The edge weights can be positive, zero, or negative and need not
 *  be distinct. If the graph is not connected, it computes a minimum
 *  spanning forest, which is the union of minimum spanning trees
 *  in each connected component. The {@code weight()} method returns the 
 *  weight of a minimum spanning tree and the {@code edges()} method
 *  returns its edges.
 *  
 *  This implementation uses Krusal's algorithm and the
 *  union-find data type.
 *  The constructor takes Θ(E log E) time in
 *  the worst case.
 *  Each instance method takes Θ(1) time.
 *  It uses Θ(E) extra space (not including the graph).
 *  

 *  This {@code weight()} method correctly computes the weight of the MST
 *  if all arithmetic performed is without floating-point rounding error
 *  or arithmetic overflow.
 *  This is the case if all edge weights are non-negative integers
 *  and the weight of the MST does not exceed 2⁵².
 *  
 *  For additional documentation,
 *  see Section 4.3 of
 *  Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne.
 *  For alternate implementations, see {@link LazyPrimMST}, {@link PrimMST},
 *  and {@link BoruvkaMST}.
 *
 *  @author Robert Sedgewick
 *  @author Kevin Wayne
 */
public class KruskalMST {
    private static final double FLOATING_POINT_EPSILON = 1E-12;

    private double weight;                        // weight of MST
    private Queue mst = new Queue();  // edges in MST

    /**
     * Compute a minimum spanning tree (or forest) of an edge-weighted graph.
     * @param G the edge-weighted graph
     */
    public KruskalMST(EdgeWeightedGraph G) {

        // create array of edges, sorted by weight
        Edge[] edges = new Edge[G.E()];
        int t = 0;
        for (Edge e: G.edges()) {
            edges[t++] = e;
        }
        Arrays.sort(edges);

        // run greedy algorithm
        UF uf = new UF(G.V());
        for (int i = 0; i < G.E() && mst.size() < G.V() - 1; i++) {
            Edge e = edges[i];
            int v = e.either();
            int w = e.other(v);

            // v-w does not create a cycle
            if (uf.find(v) != uf.find(w)) {
                uf.union(v, w);     // merge v and w components
                mst.enqueue(e);     // add edge e to mst
                weight += e.weight();
            }
        }

        // check optimality conditions
        assert check(G);
    }

    /**
     * Returns the edges in a minimum spanning tree (or forest).
     * @return the edges in a minimum spanning tree (or forest) as
     *    an iterable of edges
     */
    public Iterable edges() {
        return mst;
    }

    /**
     * Returns the sum of the edge weights in a minimum spanning tree (or forest).
     * @return the sum of the edge weights in a minimum spanning tree (or forest)
     */
    public double weight() {
        return weight;
    }
    
    // check optimality conditions (takes time proportional to E V lg* V)
    private boolean check(EdgeWeightedGraph G) {

        // check total weight
        double total = 0.0;
        for (Edge e : edges()) {
            total += e.weight();
        }
        if (Math.abs(total - weight()) > FLOATING_POINT_EPSILON) {
            System.err.printf("Weight of edges does not equal weight(): %f vs. %f\n", total, weight());
            return false;
        }

        // check that it is acyclic
        UF uf = new UF(G.V());
        for (Edge e : edges()) {
            int v = e.either(), w = e.other(v);
            if (uf.find(v) == uf.find(w)) {
                System.err.println("Not a forest");
                return false;
            }
            uf.union(v, w);
        }

        // check that it is a spanning forest
        for (Edge e : G.edges()) {
            int v = e.either(), w = e.other(v);
            if (uf.find(v) != uf.find(w)) {
                System.err.println("Not a spanning forest");
                return false;
            }
        }

        // check that it is a minimal spanning forest (cut optimality conditions)
        for (Edge e : edges()) {

            // all edges in MST except e
            uf = new UF(G.V());
            for (Edge f : mst) {
                int x = f.either(), y = f.other(x);
                if (f != e) uf.union(x, y);
            }
            
            // check that e is min weight edge in crossing cut
            for (Edge f : G.edges()) {
                int x = f.either(), y = f.other(x);
                if (uf.find(x) != uf.find(y)) {
                    if (f.weight() < e.weight()) {
                        System.err.println("Edge " + f + " violates cut optimality conditions");
                        return false;
                    }
                }
            }

        }

        return true;
    }


    /**
     * Unit tests the {@code KruskalMST} data type.
     *
     * @param args the command-line arguments
     */
    public static void main(String[] args) {
        In in = new In(args[0]);
        EdgeWeightedGraph G = new EdgeWeightedGraph(in);
        KruskalMST mst = new KruskalMST(G);
        for (Edge e : mst.edges()) {
            StdOut.println(e);
        }
        StdOut.printf("%.5f\n", mst.weight());
    }

}


/******************************************************************************
 *  Copyright 2002-2020, Robert Sedgewick and Kevin Wayne.
 *
 *  This file is part of algs4.jar, which accompanies the textbook
 *
 *      Algorithms, 4th edition by Robert Sedgewick and Kevin Wayne,
 *      Addison-Wesley Professional, 2011, ISBN 0-321-57351-X.
 *      http://algs4.cs.princeton.edu
 *
 *
 *  algs4.jar is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  algs4.jar is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with algs4.jar.  If not, see http://www.gnu.org/licenses.
 ******************************************************************************/

测试程序：

import edu.princeton.cs.algs4.EdgeWeightedGraph;
import edu.princeton.cs.algs4.Edge;
import edu.princeton.cs.algs4.KruskalMST;
import edu.princeton.cs.algs4.In;
import edu.princeton.cs.algs4.StdOut;
import java.io.File;
import java.io.FileWriter;
import java.io.BufferedWriter;
import java.io.IOException;

class TestKruskalMST {
	public static void main(String[] args) {

        try{
            long begintime = System.currentTimeMillis();
            File file =new File("mst_result.txt");

            if(!file.exists()){ 

                file.createNewFile(); 

            }
             FileWriter fileWriter = new FileWriter(file.getAbsoluteFile());
            In in = new In(args[0]);
            EdgeWeightedGraph G = new EdgeWeightedGraph(in);
            KruskalMST mst = new KruskalMST(G);
            BufferedWriter bw = new BufferedWriter(fileWriter);
            for (Edge e : mst.edges()) {
                bw.write(e.toString()+"\n");
            }
            bw.write(String.format("%.5f\n", mst.weight()));
            bw.close();
            long endtime=System.currentTimeMillis();
            long costTime = (endtime - begintime);
            StdOut.printf("constTime:"+costTime);

        }catch(IOException e){ 

            e.printStackTrace(); 

        }
        
	}

}

生成与运行：

3.2.2 Prim算法

普里姆算法在找最小生成树时，将顶点分为两类，一类是在查找的过程中已经包含在生成树中的顶点（假设为 A 类），剩下的为另一类（假设为 B 类）。

对于给定的连通网，起始状态全部顶点都归为 B 类。在找最小生成树时，选定任意一个顶点作为起始点，并将之从 B 类移至 A 类；然后找出 B 类中到 A 类中的顶点之间权值最小的顶点，将之从 B 类移至 A 类，如此重复，直到 B 类中没有顶点为止。所走过的顶点和边就是该连通图的最小生成树。

编程实现：

/******************************************************************************
 *  Compilation:  javac PrimMST.java
 *  Execution:    java PrimMST filename.txt
 *  Dependencies: EdgeWeightedGraph.java Edge.java Queue.java
 *                IndexMinPQ.java UF.java In.java StdOut.java
 *  Data files:   https://algs4.cs.princeton.edu/43mst/tinyEWG.txt
 *                https://algs4.cs.princeton.edu/43mst/mediumEWG.txt
 *                https://algs4.cs.princeton.edu/43mst/largeEWG.txt
 *
 *  Compute a minimum spanning forest using Prim's algorithm.
 *
 *  %  java PrimMST tinyEWG.txt 
 *  1-7 0.19000
 *  0-2 0.26000
 *  2-3 0.17000
 *  4-5 0.35000
 *  5-7 0.28000
 *  6-2 0.40000
 *  0-7 0.16000
 *  1.81000
 *
 *  % java PrimMST mediumEWG.txt
 *  1-72   0.06506
 *  2-86   0.05980
 *  3-67   0.09725
 *  4-55   0.06425
 *  5-102  0.03834
 *  6-129  0.05363
 *  7-157  0.00516
 *  ...
 *  10.46351
 *
 *  % java PrimMST largeEWG.txt
 *  ...
 *  647.66307
 *
 ******************************************************************************/

package edu.princeton.cs.algs4;

/**
 *  The {@code PrimMST} class represents a data type for computing a
 *  minimum spanning tree in an edge-weighted graph.
 *  The edge weights can be positive, zero, or negative and need not
 *  be distinct. If the graph is not connected, it computes a minimum
 *  spanning forest, which is the union of minimum spanning trees
 *  in each connected component. The {@code weight()} method returns the 
 *  weight of a minimum spanning tree and the {@code edges()} method
 *  returns its edges.
 *  
 *  This implementation uses Prim's algorithm with an indexed
 *  binary heap.
 *  The constructor takes Θ(E log V) time in
 *  the worst case, where V is the number of
 *  vertices and E is the number of edges.
 *  Each instance method takes Θ(1) time.
 *  It uses Θ(V) extra space (not including the 
 *  edge-weighted graph).
 *  

 *  This {@code weight()} method correctly computes the weight of the MST
 *  if all arithmetic performed is without floating-point rounding error
 *  or arithmetic overflow.
 *  This is the case if all edge weights are non-negative integers
 *  and the weight of the MST does not exceed 2⁵².
 *  
 *  For additional documentation,
 *  see Section 4.3 of
 *  Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne.
 *  For alternate implementations, see {@link LazyPrimMST}, {@link KruskalMST},
 *  and {@link BoruvkaMST}.
 *
 *  @author Robert Sedgewick
 *  @author Kevin Wayne
 */
public class PrimMST {
    private static final double FLOATING_POINT_EPSILON = 1E-12;

    private Edge[] edgeTo;        // edgeTo[v] = shortest edge from tree vertex to non-tree vertex
    private double[] distTo;      // distTo[v] = weight of shortest such edge
    private boolean[] marked;     // marked[v] = true if v on tree, false otherwise
    private IndexMinPQ pq;

    /**
     * Compute a minimum spanning tree (or forest) of an edge-weighted graph.
     * @param G the edge-weighted graph
     */
    public PrimMST(EdgeWeightedGraph G) {
        edgeTo = new Edge[G.V()];
        distTo = new double[G.V()];
        marked = new boolean[G.V()];
        pq = new IndexMinPQ(G.V());
        for (int v = 0; v < G.V(); v++)
            distTo[v] = Double.POSITIVE_INFINITY;

        for (int v = 0; v < G.V(); v++)      // run from each vertex to find
            if (!marked[v]) prim(G, v);      // minimum spanning forest

        // check optimality conditions
        assert check(G);
    }

    // run Prim's algorithm in graph G, starting from vertex s
    private void prim(EdgeWeightedGraph G, int s) {
        distTo[s] = 0.0;
        pq.insert(s, distTo[s]);
        while (!pq.isEmpty()) {
            int v = pq.delMin();
            scan(G, v);
        }
    }

    // scan vertex v
    private void scan(EdgeWeightedGraph G, int v) {
        marked[v] = true;
        for (Edge e : G.adj(v)) {
            int w = e.other(v);
            if (marked[w]) continue;         // v-w is obsolete edge
            if (e.weight() < distTo[w]) {
                distTo[w] = e.weight();
                edgeTo[w] = e;
                if (pq.contains(w)) pq.decreaseKey(w, distTo[w]);
                else                pq.insert(w, distTo[w]);
            }
        }
    }

    /**
     * Returns the edges in a minimum spanning tree (or forest).
     * @return the edges in a minimum spanning tree (or forest) as
     *    an iterable of edges
     */
    public Iterable edges() {
        Queue mst = new Queue();
        for (int v = 0; v < edgeTo.length; v++) {
            Edge e = edgeTo[v];
            if (e != null) {
                mst.enqueue(e);
            }
        }
        return mst;
    }

    /**
     * Returns the sum of the edge weights in a minimum spanning tree (or forest).
     * @return the sum of the edge weights in a minimum spanning tree (or forest)
     */
    public double weight() {
        double weight = 0.0;
        for (Edge e : edges())
            weight += e.weight();
        return weight;
    }


    // check optimality conditions (takes time proportional to E V lg* V)
    private boolean check(EdgeWeightedGraph G) {

        // check weight
        double totalWeight = 0.0;
        for (Edge e : edges()) {
            totalWeight += e.weight();
        }
        if (Math.abs(totalWeight - weight()) > FLOATING_POINT_EPSILON) {
            System.err.printf("Weight of edges does not equal weight(): %f vs. %f\n", totalWeight, weight());
            return false;
        }

        // check that it is acyclic
        UF uf = new UF(G.V());
        for (Edge e : edges()) {
            int v = e.either(), w = e.other(v);
            if (uf.find(v) == uf.find(w)) {
                System.err.println("Not a forest");
                return false;
            }
            uf.union(v, w);
        }

        // check that it is a spanning forest
        for (Edge e : G.edges()) {
            int v = e.either(), w = e.other(v);
            if (uf.find(v) != uf.find(w)) {
                System.err.println("Not a spanning forest");
                return false;
            }
        }

        // check that it is a minimal spanning forest (cut optimality conditions)
        for (Edge e : edges()) {

            // all edges in MST except e
            uf = new UF(G.V());
            for (Edge f : edges()) {
                int x = f.either(), y = f.other(x);
                if (f != e) uf.union(x, y);
            }

            // check that e is min weight edge in crossing cut
            for (Edge f : G.edges()) {
                int x = f.either(), y = f.other(x);
                if (uf.find(x) != uf.find(y)) {
                    if (f.weight() < e.weight()) {
                        System.err.println("Edge " + f + " violates cut optimality conditions");
                        return false;
                    }
                }
            }

        }

        return true;
    }

    /**
     * Unit tests the {@code PrimMST} data type.
     *
     * @param args the command-line arguments
     */
    public static void main(String[] args) {
        In in = new In(args[0]);
        EdgeWeightedGraph G = new EdgeWeightedGraph(in);
        PrimMST mst = new PrimMST(G);
        for (Edge e : mst.edges()) {
            StdOut.println(e);
        }
        StdOut.printf("%.5f\n", mst.weight());
    }


}

/******************************************************************************
 *  Copyright 2002-2020, Robert Sedgewick and Kevin Wayne.
 *
 *  This file is part of algs4.jar, which accompanies the textbook
 *
 *      Algorithms, 4th edition by Robert Sedgewick and Kevin Wayne,
 *      Addison-Wesley Professional, 2011, ISBN 0-321-57351-X.
 *      http://algs4.cs.princeton.edu
 *
 *
 *  algs4.jar is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  algs4.jar is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with algs4.jar.  If not, see http://www.gnu.org/licenses.
 ******************************************************************************/

测试程序：

import edu.princeton.cs.algs4.EdgeWeightedGraph;
import edu.princeton.cs.algs4.Edge;
import edu.princeton.cs.algs4.PrimMST;
import edu.princeton.cs.algs4.In;
import edu.princeton.cs.algs4.StdOut;
import java.io.File;
import java.io.FileWriter;
import java.io.BufferedWriter;
import java.io.IOException;

class TestPrimMST {
	public static void main(String[] args) {

        try{
            long begintime = System.currentTimeMillis();
            File file =new File("mst_result.txt");

            if(!file.exists()){ 

                file.createNewFile(); 

            }
             FileWriter fileWriter = new FileWriter(file.getAbsoluteFile());
            In in = new In(args[0]);
            EdgeWeightedGraph G = new EdgeWeightedGraph(in);
            PrimMST mst = new PrimMST(G);
            BufferedWriter bw = new BufferedWriter(fileWriter);
            for (Edge e : mst.edges()) {
                bw.write(e.toString()+"\n");
            }
            bw.write(String.format("%.5f\n", mst.weight()));
            bw.close();
            long endtime=System.currentTimeMillis();
            long costTime = (endtime - begintime);
            StdOut.printf("constTime:"+costTime);

        }catch(IOException e){ 

            e.printStackTrace(); 

        }
        
	}

}

编译与运行：

3.3 并查集

3.3.1 基本的概念

摘自：【算法与数据结构】—— 并查集_酱懵静的博客-CSDN博客_并查集

定义：
并查集是一种树型的数据结构，用于处理一些不相交集合的合并及查询问题（即所谓的并、查）。比如说，我们可以用并查集来判断一个森林中有几棵树、某个节点是否属于某棵树等。

主要构成：
并查集主要由一个整型数组pre[ ]和两个函数find( )、join( )构成。
数组 pre[ ] 记录了每个点的前驱节点是谁，函数 find(x) 用于查找指定节点 x 属于哪个集合，函数 join(x,y) 用于合并两个节点 x 和 y 。

作用：
并查集的主要作用是求连通分支数（如果一个图中所有点都存在可达关系（直接或间接相连），则此图的连通分支数为1；如果此图有两大子图各自全部可达，则此图的连通分支数为2……）

3.3.2 java实现

/******************************************************************************
 *  Compilation:  javac UF.java
 *  Execution:    java UF < input.txt
 *  Dependencies: StdIn.java StdOut.java
 *  Data files:   https://algs4.cs.princeton.edu/15uf/tinyUF.txt
 *                https://algs4.cs.princeton.edu/15uf/mediumUF.txt
 *                https://algs4.cs.princeton.edu/15uf/largeUF.txt
 *
 *  Weighted quick-union by rank with path compression by halving.
 *
 *  % java UF < tinyUF.txt
 *  4 3
 *  3 8
 *  6 5
 *  9 4
 *  2 1
 *  5 0
 *  7 2
 *  6 1
 *  2 components
 *
 ******************************************************************************/

package edu.princeton.cs.algs4;


/**
 *  The {@code UF} class represents a union–find data type
 *  (also known as the disjoint-sets data type).
 *  It supports the classic union and find operations,
 *  along with a count operation that returns the total number
 *  of sets.
 *  
 *  The union–find data type models a collection of sets containing
 *  n elements, with each element in exactly one set.
 *  The elements are named 0 through n–1.
 *  Initially, there are n sets, with each element in its
 *  own set. The canonical element of a set
 *  (also known as the root, identifier,
 *  leader, or set representative)
 *  is one distinguished element in the set. Here is a summary of
 *  the operations:
 *  

 *  find(p) returns the canonical element
 *      of the set containing p. The find operation
 *      returns the same value for two elements if and only if
 *      they are in the same set.
 *  
union(p, q) merges the set
 *      containing element p with the set containing
 *      element q. That is, if p and q
 *      are in different sets, replace these two sets
 *      with a new set that is the union of the two.
 *  
count() returns the number of sets.
 *  
 *  
 *  The canonical element of a set can change only when the set
 *  itself changes during a call to union—it cannot
 *  change during a call to either find or count.
 *  

 *  This implementation uses weighted quick union by rank
 *  with path compression by halving.
 *  The constructor takes Θ(n) time, where
 *  n is the number of elements.
 *  The union and find operations take
 *  Θ(log n) time in the worst case.
 *  The count operation takes Θ(1) time.
 *  Moreover, starting from an empty data structure with n sites,
 *  any intermixed sequence of m union and find
 *  operations takes O(m α(n)) time,
 *  where α(n) is the inverse of
 *  Ackermann's function.
 *  
 *  For alternative implementations of the same API, see
 *  {@link QuickUnionUF}, {@link QuickFindUF}, and {@link WeightedQuickUnionUF}.
 *  For additional documentation, see
 *  Section 1.5 of
 *  Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne.
 *
 *  @author Robert Sedgewick
 *  @author Kevin Wayne
 */

public class UF {

    private int[] parent;  // parent[i] = parent of i
    private byte[] rank;   // rank[i] = rank of subtree rooted at i (never more than 31)
    private int count;     // number of components

    /**
     * Initializes an empty union-find data structure with
     * {@code n} elements {@code 0} through {@code n-1}.
     * Initially, each element is in its own set.
     *
     * @param  n the number of elements
     * @throws IllegalArgumentException if {@code n < 0}
     */
    public UF(int n) {
        if (n < 0) throw new IllegalArgumentException();
        count = n;
        parent = new int[n];
        rank = new byte[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            rank[i] = 0;
        }
    }

    /**
     * Returns the canonical element of the set containing element {@code p}.
     *
     * @param  p an element
     * @return the canonical element of the set containing {@code p}
     * @throws IllegalArgumentException unless {@code 0 <= p < n}
     */
    public int find(int p) {
        validate(p);
        while (p != parent[p]) {
            parent[p] = parent[parent[p]];    // path compression by halving
            p = parent[p];
        }
        return p;
    }

    /**
     * Returns the number of sets.
     *
     * @return the number of sets (between {@code 1} and {@code n})
     */
    public int count() {
        return count;
    }
  
    /**
     * Returns true if the two elements are in the same set.
     *
     * @param  p one element
     * @param  q the other element
     * @return {@code true} if {@code p} and {@code q} are in the same set;
     *         {@code false} otherwise
     * @throws IllegalArgumentException unless
     *         both {@code 0 <= p < n} and {@code 0 <= q < n}
     * @deprecated Replace with two calls to {@link #find(int)}.
     */
    @Deprecated
    public boolean connected(int p, int q) {
        return find(p) == find(q);
    }
  
    /**
     * Merges the set containing element {@code p} with the 
     * the set containing element {@code q}.
     *
     * @param  p one element
     * @param  q the other element
     * @throws IllegalArgumentException unless
     *         both {@code 0 <= p < n} and {@code 0 <= q < n}
     */
    public void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;

        // make root of smaller rank point to root of larger rank
        if      (rank[rootP] < rank[rootQ]) parent[rootP] = rootQ;
        else if (rank[rootP] > rank[rootQ]) parent[rootQ] = rootP;
        else {
            parent[rootQ] = rootP;
            rank[rootP]++;
        }
        count--;
    }

    // validate that p is a valid index
    private void validate(int p) {
        int n = parent.length;
        if (p < 0 || p >= n) {
            throw new IllegalArgumentException("index " + p + " is not between 0 and " + (n-1));  
        }
    }

    /**
     * Reads an integer {@code n} and a sequence of pairs of integers
     * (between {@code 0} and {@code n-1}) from standard input, where each integer
     * in the pair represents some element;
     * if the elements are in different sets, merge the two sets
     * and print the pair to standard output.
     *
     * @param args the command-line arguments
     */
    public static void main(String[] args) {
        int n = StdIn.readInt();
        UF uf = new UF(n);
        while (!StdIn.isEmpty()) {
            int p = StdIn.readInt();
            int q = StdIn.readInt();
            if (uf.find(p) == uf.find(q)) continue;
            uf.union(p, q);
            StdOut.println(p + " " + q);
        }
        StdOut.println(uf.count() + " components");
    }
}


/******************************************************************************
 *  Copyright 2002-2020, Robert Sedgewick and Kevin Wayne.
 *
 *  This file is part of algs4.jar, which accompanies the textbook
 *
 *      Algorithms, 4th edition by Robert Sedgewick and Kevin Wayne,
 *      Addison-Wesley Professional, 2011, ISBN 0-321-57351-X.
 *      http://algs4.cs.princeton.edu
 *
 *
 *  algs4.jar is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  algs4.jar is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with algs4.jar.  If not, see http://www.gnu.org/licenses.
 ******************************************************************************/

测试程序：

import edu.princeton.cs.algs4.*;
public class TestUF {
	public static void main(String[] args) {
       int n = StdIn.readInt();
        UF uf = new UF(n);
        while (!StdIn.isEmpty()) {
            int p = StdIn.readInt();
            int q = StdIn.readInt();
            if (uf.find(p) == uf.find(q)) continue;
            uf.union(p, q);
            StdOut.println(p + " " + q);
        }
        StdOut.println(uf.count() + " components");
	}
}

编译与运行：

4.hadoop 下实现最小生成树算法

以下是个人认识，可能对mapreduce认识不到位，同时，对java编程也不是非常熟悉，所以会存在一定的错误，仅做参考。

在hadoop 下，如果要使用基于并查集的生成树算法，需要对并查集进行一定的修改。首先，并查集使用固定的数组，其在实例化之前，需要知道数组的大小。在mapreduce中，如果map阶段的输入过程的文件开始阶段，设置图的边数与顶点数，如果对文件进行分割，则会导致后面的部分文件缺少这个数据而无法进行处理。我的解决方法：

将基于数组的并查集利用map与hashmap进行改造成一个动态并查集。

4.1 基于动态数组的并查集实现

代码如下：

import java.util.HashMap;
import java.util.Map;

// A class to represent a disjoint set
class UnionFind
{
	private Map parent = new HashMap<>();

	// stores the depth of trees
	private Map rank = new HashMap<>();

	int count = 0;

}

测试代码如下：

import edu.princeton.cs.algs4.*;
public class TestUF {
	public static void main(String[] args) {
       int n = StdIn.readInt();
        UF uf = new UF(n);
        UnionFind huf = new UnionFind();
        while (!StdIn.isEmpty()) {
            int p = StdIn.readInt();
            int q = StdIn.readInt();
            if (uf.find(p) != uf.find(q)){
                uf.union(p, q);
                StdOut.println("uf:"+p + " " + q);
            }
            if (huf.Find(p) != huf.Find(q)) {
                huf.Union(p, q);
                StdOut.println("huf" +p + " " + q);
            }
            
        }
        StdOut.println(uf.count() + " components");
	}
}

代码编译及运行结果如下：

4.2 mapreduce实现最小生成树求解

4.2.1 map

4.2.2 reduce

4.2.3 驱动程序

4.3 程序执行结果

4.3.1 程序编译及运行过程

4.3.2 程序执行结果

你可能感兴趣的:(学习笔记,技术分享,hadoop,mapreduce,大数据,java,算法)

机器学习与深度学习间关系与区别 ℒℴѵℯ心·动ꦿ໊ོ꫞ 人工智能学习深度学习 python
一、机器学习概述定义机器学习（MachineLearning,ML）是一种通过数据驱动的方法，利用统计学和计算算法来训练模型，使计算机能够从数据中学习并自动进行预测或决策。机器学习通过分析大量数据样本，识别其中的模式和规律，从而对新的数据进行判断。其核心在于通过训练过程，让模型不断优化和提升其预测准确性。主要类型1.监督学习（SupervisedLearning）监督学习是指在训练数据集中包含输入
Long类型前后端数据不一致 igotyback 前端
响应给前端的数据浏览器控制台中response中看到的Long类型的数据是正常的到前端数据不一致前后端数据类型不匹配是一个常见问题，尤其是当后端使用Java的Long类型（64位）与前端JavaScript的Number类型（最大安全整数为2^53-1，即16位）进行数据交互时，很容易出现精度丢失的问题。这是因为JavaScript中的Number类型无法安全地表示超过16位的整数。为了解决这个问
LocalDateTime 转 String igotyback java 开发语言
importjava.time.LocalDateTime;importjava.time.format.DateTimeFormatter;publicclassMain{publicstaticvoidmain(String[]args){//获取当前时间LocalDateTimenow=LocalDateTime.now();//定义日期格式化器DateTimeFormatterformat
Linux下QT开发的动态库界面弹出操作（SDL2） 13jjyao QT类 qt 开发语言 sdl2 linux
需求：操作系统为linux，开发框架为qt，做成需带界面的qt动态库，调用方为java等非qt程序难点：调用方为java等非qt程序，也就是说调用方肯定不带QApplication::exec()，缺少了这个，QTimer等事件和QT创建的窗口将不能弹出(包括opencv也是不能弹出)；这与qt调用本身qt库是有本质的区别的思路：1.调用方缺QApplication::exec()，那么我们在接口
【一起学Rust | 设计模式】习惯语法——使用借用类型作为参数、格式化拼接字符串、构造函数广龙宇一起学Rust #Rust设计模式 rust 设计模式开发语言
提示：文章写完后，目录可以自动生成，如何生成可参考右边的帮助文档文章目录前言一、使用借用类型作为参数二、格式化拼接字符串三、使用构造函数总结前言Rust不是传统的面向对象编程语言，它的所有特性，使其独一无二。因此，学习特定于Rust的设计模式是必要的。本系列文章为作者学习《Rust设计模式》的学习笔记以及自己的见解。因此，本系列文章的结构也与此书的结构相同（后续可能会调成结构），基本上分为三个部分
Goolge earth studio 进阶4——路径修改与平滑陟彼高冈yu Google earth studio 进阶教程旅游
如果我们希望在大约中途时获得更多的城市鸟瞰视角。可以将相机拖动到这里并创建一个新的关键帧。camera_target_clip_7EarthStudio会自动平滑我们的路径，所以当我们通过这个关键帧时，不是一个生硬的角度，而是一个平滑的曲线。camera_target_clip_8路径上有贝塞尔控制手柄，允许我们调整路径的形状。右键单击，我们可以选择“平滑路径”，这是默认的自动平滑算法，或者我们可
基于社交网络算法优化的二维最大熵图像分割智能算法研学社（Jack旭）智能优化算法应用图像分割算法 php 开发语言
智能优化算法应用：基于社交网络优化的二维最大熵图像阈值分割-附代码文章目录智能优化算法应用：基于社交网络优化的二维最大熵图像阈值分割-附代码1.前言2.二维最大熵阈值分割原理3.基于社交网络优化的多阈值分割4.算法结果：5.参考文献：6.Matlab代码摘要：本文介绍基于最大熵的图像分割，并且应用社交网络算法进行阈值寻优。1.前言阅读此文章前，请阅读《图像分割：直方图区域划分及信息统计介绍》htt
四章-32-点要素的聚合彩云飘过
本文基于腾讯课堂老胡的课《跟我学Openlayers--基础实例详解》做的学习笔记，使用的openlayers5.3.xapi。源码见1032.html，对应的官网示例https://openlayers.org/en/latest/examples/cluster.htmlhttps://openlayers.org/en/latest/examples/earthquake-clusters.
DIV+CSS+JavaScript技术制作网页（旅游主题网页设计与制作）云南大理 STU学生网页设计网页设计期末网页作业 html静态网页 html5期末大作业网页设计 web大作业
️精彩专栏推荐作者主页:【进入主页—获取更多源码】web前端期末大作业：【HTML5网页期末作业(1000套)】程序员有趣的告白方式：【HTML七夕情人节表白网页制作(110套)】文章目录二、网站介绍三、网站效果▶️1.视频演示2.图片演示四、网站代码HTML结构代码CSS样式代码五、更多源码二、网站介绍网站布局方面：计划采用目前主流的、能兼容各大主流浏览器、显示效果稳定的浮动网页布局结构。网站程
【华为OD机试真题2023B卷 JAVA&JS】We Are A Team 若博豆 java 算法华为 javascript
华为OD2023（B卷）机试题库全覆盖，刷题指南点这里WeAreATeam时间限制：1秒|内存限制：32768K|语言限制：不限题目描述：总共有n个人在机房，每个人有一个标号（1<=标号<=n），他们分成了多个团队，需要你根据收到的m条消息判定指定的两个人是否在一个团队中，具体的：1、消息构成为：abc，整数a、b分别代
关于城市旅游的HTML网页设计——(旅游风景云南 5页)HTML+CSS+JavaScript 二挡起步 web前端期末大作业 javascript html css 旅游风景
⛵源码获取文末联系✈Web前端开发技术描述网页设计题材，DIV+CSS布局制作,HTML+CSS网页设计期末课程大作业|游景点介绍|旅游风景区|家乡介绍|等网站的设计与制作|HTML期末大学生网页设计作业，Web大学生网页HTML：结构CSS：样式在操作方面上运用了html5和css3，采用了div+css结构、表单、超链接、浮动、绝对定位、相对定位、字体样式、引用视频等基础知识JavaScrip
HTML网页设计制作大作业（div+css）云南我的家乡旅游景点带文字滚动二挡起步 web前端期末大作业 web设计网页规划与设计 html css javascript dreamweaver 前端
Web前端开发技术描述网页设计题材，DIV+CSS布局制作,HTML+CSS网页设计期末课程大作业游景点介绍|旅游风景区|家乡介绍|等网站的设计与制作HTML期末大学生网页设计作业HTML：结构CSS：样式在操作方面上运用了html5和css3，采用了div+css结构、表单、超链接、浮动、绝对定位、相对定位、字体样式、引用视频等基础知识JavaScript：做与用户的交互行为文章目录前端学习路线
121. 买卖股票的最佳时机薄荷糖的味道_fb40
给定一个数组，它的第i个元素是一支给定股票第i天的价格。如果你最多只允许完成一笔交易（即买入和卖出一支股票），设计一个算法来计算你所能获取的最大利润。注意你不能在买入股票前卖出股票。示例1:输入:[7,1,5,3,6,4]输出:5解释:在第2天（股票价格=1）的时候买入，在第5天（股票价格=6）的时候卖出，最大利润=6-1=5。注意利润不能是7-1=6,因为卖出价格需要大于买入价格。示例2:输入:
每日算法&面试题，大厂特训二十八天——第二十天（树）肥学 ⚡算法题⚡面试题每日精进 java 算法数据结构
目录标题导读算法特训二十八天面试题点击直接资料领取导读肥友们为了更好的去帮助新同学适应算法和面试题，最近我们开始进行专项突击一步一步来。上一期我们完成了动态规划二十一天现在我们进行下一项对各类算法进行二十八天的一个小总结。还在等什么快来一起肥学进行二十八天挑战吧！！特别介绍小白练手专栏，适合刚入手的新人欢迎订阅编程小白进阶python有趣练手项目里面包括了像《机器人尬聊》《恶搞程序》这样的有趣文章
node.js学习小猿L node.js node.js 学习 vim
node.js学习实操及笔记温故node.js，node.js学习实操过程及笔记~node.js学习视频node.js官网node.js中文网实操笔记githubcsdn笔记为什么学node.js可以让别人访问我们编写的网页为后续的框架学习打下基础，三大框架vuereactangular离不开node.jsnode.js是什么官网：node.js是一个开源的、跨平台的运行JavaScript的运行
回溯算法-重新安排行程 chirou_ 算法数据结构图论 c++图搜索
leetcode332.重新安排行程这题我还没自己ac过，只能现在凭着刚学完的热乎劲把我对题解的理解记下来。本题我认为对数据结构的考察比较多，用什么数据结构去存数据，去读取数据，都是很重要的。classSolution{private:unordered_map>targets;boolbacktracking(intticketNum,vector&result){//1.确定参数和返回值//2
Faiss：高效相似性搜索与聚类的利器网络·魚大数据 faiss
Faiss是一个针对大规模向量集合的相似性搜索库，由FacebookAIResearch开发。它提供了一系列高效的算法和数据结构，用于加速向量之间的相似性搜索，特别是在大规模数据集上。本文将介绍Faiss的原理、核心功能以及如何在实际项目中使用它。Faiss原理：近似最近邻搜索：Faiss的核心功能之一是近似最近邻搜索，它能够高效地在大规模数据集中找到与给定查询向量最相似的向量。这种搜索是近似的，
nosql数据库技术与应用知识点皆过客，揽星河 NoSQL nosql 数据库大数据数据分析数据结构非关系型数据库
Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)
insert into select 主键自增_mybatis拦截器实现主键自动生成 weixin_39521651 insert into select 主键自增 mybatis delete返回值 mybatis insert返回主键 mybatis insert返回对象 mybatis plus insert返回主键 mybatis plus 插入生成id
前言前阵子和朋友聊天，他说他们项目有个需求，要实现主键自动生成，不想每次新增的时候，都手动设置主键。于是我就问他，那你们数据库表设置主键自动递增不就得了。他的回答是他们项目目前的id都是采用雪花算法来生成，因此为了项目稳定性，不会切换id的生成方式。朋友问我有没有什么实现思路，他们公司的orm框架是mybatis，我就建议他说，不然让你老大把mybatis切换成mybatis-plus。mybat
k均值聚类算法考试例题_k均值算法(k均值聚类算法计算题) 寻找你83497 k均值聚类算法考试例题
?算法：第一步：选K个初始聚类中心，z1(1),z2(1)，…，zK(1)，其中括号内的序号为寻找聚类中心的迭代运算的次序号。聚类中心的向量值可任意设定，例如可选开始的K个.k均值聚类：---------一种硬聚类算法，隶属度只有两个取值0或1，提出的基本根据是“类内误差平方和最小化”准则；模糊的c均值聚类算法：--------一种模糊聚类算法，是.K均值聚类算法是先随机选取K个对象作为初始的聚类
ES聚合分析原理与代码实例讲解光剑书架上的书大厂Offer收割机面试题简历程序员读书硅基计算碳基计算认知计算生物计算深度学习神经网络大数据 AIGC AGI LLM Java Python 架构设计 Agent 程序员实现财富自由
ES聚合分析原理与代码实例讲解1.背景介绍1.1问题的由来在大规模数据分析场景中，特别是在使用Elasticsearch（ES）进行数据存储和检索时，聚合分析成为了一个至关重要的功能。聚合分析允许用户对数据集进行细分和分组，以便深入探索数据的结构和模式。这在诸如实时监控、日志分析、业务洞察等领域具有广泛的应用。1.2研究现状目前，ES聚合分析已经成为现代大数据平台的核心组件之一。它支持多种类型的聚
Java 重写(Override)与重载(Overload) 叨唧唧的
Java重写(Override)与重载(Overload)重写(Override)重写是子类对父类的允许访问的方法的实现过程进行重新编写,返回值和形参都不能改变。即外壳不变，核心重写！重写的好处在于子类可以根据需要，定义特定于自己的行为。也就是说子类能够根据需要实现父类的方法。重写方法不能抛出新的检查异常或者比被重写方法申明更加宽泛的异常。例如：父类的一个方法申明了一个检查异常IOExceptio
简单了解 JVM 记得开心一点啊 jvm
目录♫什么是JVM♫JVM的运行流程♫JVM运行时数据区♪虚拟机栈♪本地方法栈♪堆♪程序计数器♪方法区/元数据区♫类加载的过程♫双亲委派模型♫垃圾回收机制♫什么是JVMJVM是JavaVirtualMachine的简称，意为Java虚拟机。虚拟机是指通过软件模拟的具有完整硬件功能的、运行在一个完全隔离的环境中的完整计算机系统（如：JVM、VMwave、VirtualBox）。JVM和其他两个虚拟机
1分钟解决 -bash: mvn: command not found，在Centos 7中安装Maven Energet!c 开发语言
1分钟解决-bash:mvn:commandnotfound，在Centos7中安装Maven检查Java环境1下载Maven2解压Maven3配置环境变量4验证安装5常见问题与注意事项6总结检查Java环境Maven依赖Java环境，请确保系统已经安装了Java并配置了环境变量。可以通过以下命令检查：java-version如果未安装，请先安装Java。1下载Maven从官网下载：前往Apach
Python实现简单的机器学习算法 master_chenchengg python python 办公效率 python开发 IT
Python实现简单的机器学习算法开篇：初探机器学习的奇妙之旅搭建环境：一切从安装开始必备工具箱第一步：安装Anaconda和JupyterNotebook小贴士：如何配置Python环境变量算法初体验：从零开始的Python机器学习线性回归：让数据说话数据准备：从哪里找数据编码实战：Python实现线性回归模型评估：如何判断模型好坏逻辑回归：从分类开始理论入门：什么是逻辑回归代码实现：使用skl
Java企业面试题3 马龙强_ java
1.break和continue的作用(智*图)break：用于完全退出一个循环（如for,while）或一个switch语句。当在循环体内遇到break语句时，程序会立即跳出当前循环体，继续执行循环之后的代码。continue：用于跳过当前循环体中剩余的部分，并开始下一次循环。如果是在for循环中使用continue，则会直接进行条件判断以决定是否执行下一轮循环。2.if分支语句和switch分
JVM、JRE和 JDK：理解Java开发的三大核心组件 Y雨何时停T Java java
Java是一门跨平台的编程语言，它的成功离不开背后强大的运行环境与开发工具的支持。在Java的生态中，JVM（Java虚拟机）、JRE（Java运行时环境）和JDK（Java开发工具包）是三个至关重要的核心组件。本文将探讨JVM、JDK和JRE的区别，帮助你更好地理解Java的运行机制。1.JVM：Java虚拟机（JavaVirtualMachine）什么是JVM？JVM，即Java虚拟机，是Ja
推荐算法_隐语义-梯度下降 _feivirus_ 算法机器学习和数学推荐算法机器学习隐语义
importnumpyasnp1.模型实现"""inputrate_matrix:M行N列的评分矩阵，值为P*Q.P:初始化用户特征矩阵M*K.Q:初始化物品特征矩阵K*N.latent_feature_cnt:隐特征的向量个数max_iteration:最大迭代次数alpha:步长lamda:正则化系数output分解之后的P和Q"""defLFM_grad_desc(rate_matrix,l
K近邻算法_分类鸢尾花数据集 _feivirus_ 算法机器学习和数学分类机器学习 K近邻
importnumpyasnpimportpandasaspdfromsklearn.datasetsimportload_irisfromsklearn.model_selectionimporttrain_test_splitfromsklearn.metricsimportaccuracy_score1.数据预处理iris=load_iris()df=pd.DataFrame(data=ir
Java面试题精选：消息队列(二) 芒果不是芒 Java面试题精选 java kafka
一、Kafka的特性1.消息持久化：消息存储在磁盘，所以消息不会丢失2.高吞吐量：可以轻松实现单机百万级别的并发3.扩展性：扩展性强，还是动态扩展4.多客户端支持：支持多种语言（Java、C、C++、GO、）5.KafkaStreams（一个天生的流处理）:在双十一或者销售大屏就会用到这种流处理。使用KafkaStreams可以快速的把销售额统计出来6.安全机制：Kafka进行生产或者消费的时候会
LeetCode[位运算] - #137 Single Number II Cwind java Algorithm LeetCode 题解位运算
原题链接：#137 Single Number II 要求：给定一个整型数组，其中除了一个元素之外，每个元素都出现三次。找出这个元素注意：算法的时间复杂度应为O(n)，最好不使用额外的内存空间难度：中等分析：与#136类似，都是考察位运算。不过出现两次的可以使用异或运算的特性 n XOR n = 0, n XOR 0 = n，即某一
《JavaScript语言精粹》笔记 aijuans JavaScript
0、JavaScript的简单数据类型包括数字、字符创、布尔值（true/false）、null和undefined值，其它值都是对象。 1、JavaScript只有一个数字类型，它在内部被表示为64位的浮点数。没有分离出整数，所以1和1.0的值相同。 2、NaN是一个数值，表示一个不能产生正常结果的运算结果。NaN不等于任何值，包括它本身。可以用函数isNaN(number)检测NaN,但是
你应该更新的Java知识之常用程序库 Kai_Ge java
在很多人眼中，Java 已经是一门垂垂老矣的语言，但并不妨碍 Java 世界依然在前进。如果你曾离开 Java，云游于其它世界，或是每日只在遗留代码中挣扎，或许是时候抬起头，看看老 Java 中的新东西。 Guava Guava[gwɑ:və]，一句话，只要你做Java项目，就应该用Guava（Github）。 guava 是 Google 出品的一套 Java 核心库，在我看来，它甚至应该
HttpClient 120153216 httpclient
/** * 可以传对象的请求转发，对象已流形式放入HTTP中 */ public static Object doPost(Map<String,Object> parmMap,String url) { Object object = null; HttpClient hc = new HttpClient(); String fullURL
Django model字段类型清单 2002wmj django
Django 通过 models 实现数据库的创建、修改、删除等操作，本文为模型中一般常用的类型的清单，便于查询和使用： AutoField：一个自动递增的整型字段，添加记录时它会自动增长。你通常不需要直接使用这个字段；如果你不指定主键的话，系统会自动添加一个主键字段到你的model。(参阅自动主键字段) BooleanField：布尔字段,管理工具里会自动将其描述为checkbox。 Cha
在SQLSERVER中查找消耗CPU最多的SQL 357029540 SQL Server
返回消耗CPU数目最多的10条语句 SELECT TOP 10 total_worker_time/execution_count AS avg_cpu_cost, plan_handle, execution_count, (SELECT SUBSTRING(text, statement_start_of
Myeclipse项目无法部署，Undefined exploded archive location 7454103 eclipse MyEclipse
做个备忘！错误信息为： Undefined exploded archive location 原因：在工程转移过程中，导致工程的配置文件出错；解决方法：
GMT时间格式转换 adminjun GMT 时间转换
普通的时间转换问题我这里就不再罗嗦了，我想大家应该都会那种低级的转换问题吧，现在我向大家总结一下如何转换GMT时间格式，这种格式的转换方法网上还不是很多，所以有必要总结一下，也算给有需要的朋友一个小小的帮助啦。 1、可以使用 SimpleDateFormat SimpleDateFormat EEE-三位星期 d-天 MMM-月 yyyy-四位年
Oracle数据库新装连接串问题 aijuans oracle数据库
割接新装了数据库，客户端登陆无问题，apache/cgi-bin程序有问题，sqlnet.log日志如下： Fatal NI connect error 12170. VERSION INFORMATION: TNS for Linux: Version 10.2.0.4.0 - Product
回顾java数组复制 ayaoxinchao java 数组
在写这篇文章之前，也看了一些别人写的，基本上都是大同小异。文章是对java数组复制基础知识的回顾，算是作为学习笔记，供以后自己翻阅。首先，简单想一下这个问题：为什么要复制数组？我的个人理解：在我们在利用一个数组时，在每一次使用，我们都希望它的值是初始值。这时我们就要对数组进行复制，以达到原始数组值的安全性。java数组复制大致分为3种方式：①for循环方式 ②clone方式 ③arrayCopy方
java web会话监听并使用spring注入 bewithme Java Web
在java web应用中，当你想在建立会话或移除会话时，让系统做某些事情，比如说，统计在线用户，每当有用户登录时，或退出时，那么可以用下面这个监听器来监听。 import java.util.ArrayList; import java.ut
NoSQL数据库之Redis数据库管理(Redis的常用命令及高级应用) bijian1013 redis 数据库 NoSQL
一 .Redis常用命令 Redis提供了丰富的命令对数据库和各种数据库类型进行操作，这些命令可以在Linux终端使用。 a.键值相关命令 b.服务器相关命令 1.键值相关命令 &
java枚举序列化问题 bingyingao java 枚举序列化
对象在网络中传输离不开序列化和反序列化。而如果序列化的对象中有枚举值就要特别注意一些发布兼容问题: 1.加一个枚举值新机器代码读分布式缓存中老对象，没有问题，不会抛异常。老机器代码读分布式缓存中新对像，反序列化会中断，所以在所有机器发布完成之前要避免出现新对象，或者提前让老机器拥有新增枚举的jar。 2.删一个枚举值新机器代码读分布式缓存中老对象，反序列
【Spark七十八】Spark Kyro序列化 bit1129 spark
当使用SparkContext的saveAsObjectFile方法将对象序列化到文件，以及通过objectFile方法将对象从文件反序列出来的时候，Spark默认使用Java的序列化以及反序列化机制，通常情况下，这种序列化机制是很低效的，Spark支持使用Kyro作为对象的序列化和反序列化机制，序列化的速度比java更快，但是使用Kyro时要注意，Kyro目前还是有些bug。 Spark
Hybridizing OO and Functional Design bookjovi erlang haskell
推荐博文： Tell Above, and Ask Below - Hybridizing OO and Functional Design 文章中把OO和FP讲的深入透彻，里面把smalltalk和haskell作为典型的两种编程范式代表语言，此点本人极为同意，smalltalk可以说是最能体现OO设计的面向对象语言，smalltalk的作者Alan kay也是OO的最早先驱，
Java-Collections Framework学习与总结-HashMap BrokenDreams Collections
开发中常常会用到这样一种数据结构，根据一个关键字，找到所需的信息。这个过程有点像查字典，拿到一个key，去字典表中查找对应的value。Java1.0版本提供了这样的类java.util.Dictionary(抽象类)，基本上支持字典表的操作。后来引入了Map接口，更好的描述的这种数据结构。 &nb
读《研磨设计模式》-代码笔记-职责链模式-Chain Of Responsibility bylijinnan java 设计模式
声明：本文只为方便我个人查阅和理解，详细的分析以及源代码请移步原作者的博客http://chjavach.iteye.com/ /** * 业务逻辑：项目经理只能处理500以下的费用申请，部门经理是1000，总经理不设限。简单起见，只同意“Tom”的申请 * bylijinnan */ abstract class Handler { /*
Android中启动外部程序 cherishLC android
1、启动外部程序引用自： http://blog.csdn.net/linxcool/article/details/7692374 //方法一 Intent intent=new Intent(); //包名包名+类名（全路径） intent.setClassName("com.linxcool", "com.linxcool.PlaneActi
summary_keep_rate coollyj SUM
BEGIN /*DECLARE minDate varchar(20) ; DECLARE maxDate varchar(20) ;*/ DECLARE stkDate varchar(20) ; DECLARE done int default -1; /* 游标中注册服务器地址 */ DE
hadoop hdfs 添加数据目录出错 daizj hadoop hdfs 扩容
由于原来配置的hadoop data目录快要用满了，故准备修改配置文件增加数据目录，以便扩容，但由于疏忽，把core-site.xml, hdfs-site.xml配置文件dfs.datanode.data.dir 配置项增加了配置目录，但未创建实际目录，重启datanode服务时，报如下错误： 2014-11-18 08:51:39,128 WARN org.apache.hadoop.h
grep 目录级联查找 dongwei_6688 grep
在Mac或者Linux下使用grep进行文件内容查找时，如果给定的目标搜索路径是当前目录，那么它默认只搜索当前目录下的文件，而不会搜索其下面子目录中的文件内容，如果想级联搜索下级目录，需要使用一个“-r”参数： grep -n -r "GET" . 上面的命令将会找出当前目录“.”及当前目录中所有下级目录
yii 修改模块使用的布局文件 dcj3sjt126com yii layouts
方法一：yii模块默认使用系统当前的主题布局文件，如果在主配置文件中配置了主题比如: 'theme'=>'mythm', 那么yii的模块就使用 protected/themes/mythm/views/layouts 下的布局文件；如果未配置主题，那么 yii的模块就使用 protected/views/layouts 下的布局文件，总之默认不是使用自身目录 pr
设计模式之单例模式 come_for_dream 设计模式单例模式懒汉式饿汉式双重检验锁失败无序写入
今天该来的面试还没来，这个店估计不会来电话了，安静下来写写博客也不错，没事翻了翻小易哥的博客甚至与大牛们之间的差距，基础知识不扎实建起来的楼再高也只能是危楼罢了，陈下心回归基础把以前学过的东西总结一下。 *********************************
8、数组豆豆咖啡二维数组数组一维数组
一、概念数组是同一种类型数据的集合。其实数组就是一个容器。二、好处可以自动给数组中的元素从0开始编号，方便操作这些元素三、格式 //一维数组 1,元素类型[] 变量名 = new 元素类型[元素的个数] int[] arr =
Decode Ways hcx2013 decode
A message containing letters from A-Z is being encoded to numbers using the following mapping: 'A' -> 1 'B' -> 2 ... 'Z' -> 26 Given an encoded message containing digits, det
Spring4.1新特性——异步调度和事件机制的异常处理 jinnianshilongnian spring 4.1
目录 Spring4.1新特性——综述 Spring4.1新特性——Spring核心部分及其他 Spring4.1新特性——Spring缓存框架增强 Spring4.1新特性——异步调用和事件机制的异常处理 Spring4.1新特性——数据库集成测试脚本初始化 Spring4.1新特性——Spring MVC增强 Spring4.1新特性——页面自动化测试框架Spring MVC T
squid3(高命中率)缓存服务器配置 liyonghui160com
系统:centos 5.x 需要的软件:squid-3.0.STABLE25.tar.gz 1.下载squid wget http://www.squid-cache.org/Versions/v3/3.0/squid-3.0.STABLE25.tar.gz tar zxf squid-3.0.STABLE25.tar.gz &&
避免Java应用中NullPointerException的技巧和最佳实践 pda158 java
1) 从已知的String对象中调用equals()和equalsIgnoreCase()方法，而非未知对象。　　总是从已知的非空String对象中调用equals()方法。因为equals()方法是对称的，调用a.equals(b)和调用b.equals(a)是完全相同的，这也是为什么程序员对于对象a和b这么不上心。如果调用者是空指针，这种调用可能导致一个空指针异常 Object unk
如何在Swift语言中创建http请求 shoothao http swift
概述：本文通过实例从同步和异步两种方式上回答了”如何在Swift语言中创建http请求“的问题。如果你对Objective-C比较了解的话，对于如何创建http请求你一定驾轻就熟了，而新语言Swift与其相比只有语法上的区别。但是，对才接触到这个崭新平台的初学者来说，他们仍然想知道“如何在Swift语言中创建http请求？”。在这里,我将作出一些建议来回答上述问题。常见的
Spring事务的传播方式 uule spring事务
传播方式：新建事务 required required_new - 挂起当前非事务方式运行 supports &nbs