weixin_30828379

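Hadoop Series 4: Using MapReduce (Part 2)

When reposting, please credit the author and the source prominently at the top of the page.

1: Introduction

This post is part of a big-data series that I update as time permits, covering Hadoop, Spark, Storm, machine learning and related topics.

The Hadoop version used here is 2.6.4.

This is the second chapter on MapReduce.

This chapter covers computing mutual friends and recommending people you may know.

Previous post: Hadoop Series 3: Using MapReduce (Part 1)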

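1: Introduction
2: Running MapReduce from the IDE
2.1: Running MapReduce in local mode
2.2: Running on YARN from the IDE
3: Implementing a join with MapReduce
3.1: An example in an SQL database
3.2: How to implement it with MapReduce
3.3: Creating the JavaBean
3.4: Creating the mapper
3.5: Creating the reducer
3.6: Complete code
3.7: The data skew problem
4: Finding mutual friends and people you may know
4.1: Preparing the data
4.2: Computing whose friend a given user is
4.3: Computing mutual friends
5: Using a GroupingComparator to compute the maximum per group
5.1: Defining a JavaBean
5.2: Defining a GroupingComparator
5.3: The map code
5.4: The reduce code
5.5: The driver class
6: Custom output location
6.1: A custom FileOutputFormat
7: Custom input data
8: Global counters
9: Chaining multiple jobs and defining their execution order
10: MapReduce parameter tuning
10.1: Resource-related parameters
10.2: Fault-tolerance parameters
10.3: Running MapReduce jobs locally
10.4: Efficiency and stability parameters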

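2: Running MapReduce from the IDE

So far we have written the code in the IDE, packaged it as a jar, and run it on the server with hadoop jar. This is tedious. Uploading and deploying take effort, and every code change means repackaging the jar and copying it to the server again. The biggest drawback is that you cannot set breakpoints to debug the business logic, which makes tracking down problems very hard. There are two ways around this.

2.1: Running MapReduce in local mode

Local mode means the job does not run on YARN at all. It runs entirely in the local JVM.

Because the job runs locally, all the MapReduce classes must be available locally. First download the Hadoop source from the Hadoop website and compile it for your operating system. I develop on Windows, so I compiled the Windows build. Then set the environment variable: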

HADOOP_HOME=E:\software\hadoop-2.6.2

Then add the following to PATH:

%HADOOP_HOME%\bin

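Now look at the main method. None of the code needs to change. All the conf settings can be omitted, and running the program directly uses local mode. When the same program is started on the server with hadoop jar, it runs on the cluster instead, because the hadoop jar command loads the cluster configuration files.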

        Configuration conf = new Configuration();
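        //The default value is already local, so this line is optional
        conf.set("mapreduce.framework.name", "local");
        //In local mode, input and output can be on the local disk or in HDFS, depending on the two lines below
        //The default is the local file system, so this can be omitted
        //conf.set("fs.defaultFS","file:///");
        //conf.set("fs.defaultFS","hdfs://server1:9000/");
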
        Job job = Job.getInstance(conf);

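In practice, none of this configuration is needed for local mode, because the defaults already select it. You simply run the program. Later in the code we read two arguments, the input location and the output location, so when running we only have to pass them as program arguments. IntelliJ IDEA is shown as an example:

Here two arguments are passed, both pointing to locations on drive D.

Reading from and writing to HDFS is also supported, with the following configuration: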

        Configuration conf = new Configuration();
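        //The default value is already local, so this line is optional
        conf.set("mapreduce.framework.name", "local");
        //In local mode, input and output can be on the local disk or in HDFS, depending on the two lines below
        //Local mode can also read and write data in HDFS
        //conf.set("fs.defaultFS","file:///");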
        conf.set("fs.defaultFS","hdfs://server1:9000/");

In other words, even in local mode the data can live either on the local disk or in HDFS.
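Which file system an absolute path such as /wordcount/input lands on is decided by fs.defaultFS. A minimal sketch of that idea uses plain java.net.URI to resolve the path against the default file system URI. It is only an illustration and not Hadoop's own path-qualification code. The class name DefaultFsDemo and the sample paths are hypothetical.

```java
import java.net.URI;

// Hypothetical demo: qualifying an absolute path against fs.defaultFS.
// java.net.URI stands in for Hadoop's own path qualification logic.
final class DefaultFsDemo {

    /** Resolves an absolute path such as "/wordcount/input" against a default file system URI. */
    static String qualify(String defaultFs, String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            // Hadoop resolves relative paths against a working directory
            // (on HDFS typically /user/<name>), which this sketch does not model.
            throw new IllegalArgumentException("expected an absolute path: " + absolutePath);
        }
        return URI.create(defaultFs).resolve(absolutePath).toString();
    }

    public static void main(String[] args) {
        // The same input path under two different values of fs.defaultFS
        System.out.println(qualify("hdfs://server1:9000/", "/wordcount/input"));
        System.out.println(qualify("file:///", "/wordcount/input"));
    }
}
```

With fs.defaultFS set to hdfs://server1:9000/, the first call yields hdfs://server1:9000/wordcount/input. That is why the same program arguments can point at HDFS or at the local disk depending only on the configuration.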

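We also need a logging configuration file. Without it, no error message is printed when something goes wrong, and you are left staring at an empty console.

Add a log4j.properties file under src/main/resources with the following content: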

log4j.rootLogger=info, stdout, R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=example.log
log4j.appender.R.MaxFileSize=100KB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n

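This prints all messages at INFO level and above.

2.2: Running on YARN from the IDE

In the previous part we ran in local mode, which already makes debugging from the IDE easier. This time we still run from the IDE, but submit the job to YARN on the server.

To run on YARN, use the following configuration: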

        Configuration conf = new Configuration();
        //Run in YARN cluster mode
        conf.set("mapreduce.framework.name","yarn");
        conf.set("yarn.resourcemanager.hostname","server1");//Points the client at the ResourceManager on server1
        conf.set("fs.defaultFS","hdfs://server1:9000/");

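From the earlier code we know that one setting lets the MR framework find the program's jar and copy it to every machine that runs the job. Here we have to do it differently, because the IDE runs the compiled class files directly rather than a jar.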

        Job job = Job.getInstance(conf);
        //setJarByClass lets Hadoop locate the jar from a class; when running in the IDE there is no jar, so nothing is found
        //job.setJarByClass(WordCountDriver.class);
        job.setJar("c:/xx.jar");

So to run with this code, we still have to package the program into a jar first and point setJar at it.
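A few lines of plain Java show why setJarByClass finds nothing in the IDE. As far as I know, Hadoop's own lookup also asks the class loader for the .class resource and accepts only a jar: URL. The sketch below is a simplified illustration, though, not Hadoop's code. JarLocator and its methods are hypothetical names, and URL-decoding of paths containing spaces is omitted.

```java
import java.net.URL;

// Hypothetical helper illustrating why job.setJarByClass(...) has nothing to ship
// when the driver runs from an IDE: the class comes from a directory, not a jar.
final class JarLocator {

    /** Returns the jar containing clazz, or null if it was not loaded from a jar. */
    static String findContainingJar(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            return null; // bootstrap classes such as java.lang.String have no visible loader
        }
        String resource = clazz.getName().replace('.', '/') + ".class";
        URL url = loader.getResource(resource);
        if (url == null || !"jar".equals(url.getProtocol())) {
            return null; // e.g. a file: URL under target/classes when run from the IDE
        }
        return jarPathFromUrlPath(url.getPath());
    }

    /** Strips the "!/entry" suffix: "file:/opt/app.jar!/a/B.class" becomes "file:/opt/app.jar". */
    static String jarPathFromUrlPath(String path) {
        int bang = path.indexOf("!/");
        return bang >= 0 ? path.substring(0, bang) : path;
    }

    public static void main(String[] args) {
        String jar = findContainingJar(JarLocator.class);
        System.out.println(jar == null
                ? "not loaded from a jar: build one and pass it to job.setJar(...)"
                : "containing jar: " + jar);
    }
}
```

Run from the IDE, the class is typically loaded from a file: URL in the build output directory, so the lookup returns null. This is why an explicit job.setJar(...) pointing at a packaged jar is needed.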

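With only these small changes, let's run it.

As before, configure the program arguments first. We did not change any other code, so the MR input and output are still read from the program arguments.

Then run the main method. If server1 is configured in your hosts file, prepare to witness a miracle... or rather, an error.

Here we get an error: permission denied. The message also mentions the user Administrator, which is my Windows user. So when MapReduce submits a job, it uses the currently logged-in OS user. On the server, as set up in the earlier posts, the directories belong to the hadoop user. We therefore need to run as the hadoop user.

How do we do that, and how do we set the user to hadoop? Let's look at a piece of Hadoop's core code: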

if (!isSecurityEnabled() && (user == null)) {
  String envUser = System.getenv(HADOOP_USER_NAME);
  if (envUser == null) {
    envUser = System.getProperty(HADOOP_USER_NAME);
  }
  user = envUser == null ? null : new User(envUser);
}

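This is the code that determines the user, and it tells us how to set the user name. The constant is HADOOP_USER_NAME. It is read first from the environment variable and then from the JVM system property of the same name.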
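The lookup order in the snippet above can be reproduced in plain Java to see what it implies. Because the environment variable is checked first, a HADOOP_USER_NAME already set as a Windows environment variable wins over a System.setProperty call. The class below is a hypothetical sketch, not Hadoop code, and falling back to the OS user is a simplification of what Hadoop does when neither value is set. As far as I know, Hadoop also caches the login user after the first lookup, so the property should be set before Hadoop first resolves the user (for example, before Job.getInstance).

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

// Hypothetical re-creation of the lookup order shown in the Hadoop snippet above.
final class HadoopUserLookup {

    static final String HADOOP_USER_NAME = "HADOOP_USER_NAME";

    /**
     * Environment variable first, then JVM system property; if neither is set,
     * fall back to the OS login user (a simplification of Hadoop's behaviour).
     */
    static String resolveUser(Map<String, String> env, Properties props, String osUser) {
        String user = env.get(HADOOP_USER_NAME);
        if (user == null) {
            user = props.getProperty(HADOOP_USER_NAME);
        }
        return user != null ? user : osUser;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(HADOOP_USER_NAME, "hadoop"); // what System.setProperty(...) achieves
        System.out.println(resolveUser(Collections.<String, String>emptyMap(), props, "Administrator"));
    }
}
```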

        System.setProperty("HADOOP_USER_NAME","hadoop");
        Configuration conf = new Configuration();
        //Run in YARN cluster mode
        conf.set("mapreduce.framework.name","yarn");
        conf.set("yarn.resourcemanager.hostname","server1");//Points the client at the ResourceManager on server1
        conf.set("fs.defaultFS","hdfs://server1:9000/");

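The first line now sets the Hadoop user. Run the code again and witness the next error. PS: be sure to configure the log file, otherwise you will not see any error message.

The full log shows that the job did run on YARN this time, but it failed. The figure shows the error message.

Somewhat to my surprise, the log was in Chinese. In an English environment, it reads like this:

The meaning is the same. So how do we fix this error?

This is a Hadoop bug that has been fixed in newer versions, and it only shows up on Windows. In other words, if you use a Linux desktop and run the IDE there, you will not run into it.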
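Before patching the source, it may be worth trying a configuration switch. As far as I know, Hadoop 2.4 and later (including 2.6.4) provide a client-side property, mapreduce.app-submission.cross-platform, added for this Windows-to-Linux submission problem. It defaults to false. The fragment below sets it alongside the YARN settings from earlier. I have not verified it against the setup in this post, so treat it as something to try rather than a guaranteed fix.

```java
        System.setProperty("HADOOP_USER_NAME","hadoop");
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name","yarn");
        conf.set("yarn.resourcemanager.hostname","server1");
        conf.set("fs.defaultFS","hdfs://server1:9000/");
        // Assumption: available since Hadoop 2.4; makes the client emit platform-neutral
        // environment variables and path separators for the cluster to expand.
        conf.set("mapreduce.app-submission.cross-platform","true");
```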

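First let's see how the problem arises. Start by attaching the Hadoop source code in the IDE.

Find the class org.apache.hadoop.mapred.YARNRunner and set a breakpoint around line 492. The exact line may differ in your version. All you need is the place where the environment variable is filled in, so that you can inspect it.

Run in debug mode. When the breakpoint is hit, inspect the environment variable and copy its longest value into a text editor.

Clearly, the command will eventually run on Linux, yet this environment is written for Windows.

First, environment variables appear in the Windows form %HADOOP_CONF_DIR%, whereas Linux shells reference them as $JAVA_HOME. That is one problem.

Second, the path separators differ: Windows uses backslashes, Linux uses forward slashes.

Finally, the entries of a path list are separated by colons on Linux but by semicolons on Windows.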
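A small standalone converter makes these three differences concrete. It is a hypothetical regex-based variant of the idea behind the getLinux method in the modified class, not the author's code. It writes ${VAR} instead of $VAR, so a variable directly followed by letters (for example %JAVA_HOME%bin) stays unambiguous. It also treats only %NAME% pairs with valid identifier names as variables.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter for Windows-style environment values (not part of Hadoop).
final class WinEnvToLinux {

    // %NAME% where NAME looks like an environment variable identifier
    private static final Pattern WIN_VAR = Pattern.compile("%([A-Za-z_][A-Za-z0-9_]*)%");

    static String convert(String value) {
        // 1. %VAR% -> ${VAR}; quoteReplacement stops '$' being read as a group reference
        Matcher m = WIN_VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement("${" + m.group(1) + "}"));
        }
        m.appendTail(sb);
        // 2. list separator ';' -> ':'   3. path separator '\' -> '/'
        return sb.toString().replace(';', ':').replace('\\', '/');
    }

    public static void main(String[] args) {
        System.out.println(convert("%HADOOP_CONF_DIR%;%PWD%\\*"));
    }
}
```

For the sample input it prints ${HADOOP_CONF_DIR}:${PWD}/*. Like getLinux, it rewrites every ';' and '\', so it only suits path-like values.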

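Now the problem is clear, but how do we fix it? Here we fix it by modifying the source code. Don't be afraid of that. Think about it for a moment and you will see that anyone who has just learned basic Java could make this kind of change. The complete modified code follows, so if you don't want to make the changes yourself, just copy it.

Create a class with exactly the same package and class name in our own project and copy the code into it. The project's own classes come before the dependency jars on the classpath, so Java loads this copy instead of the one inside the Hadoop jar.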

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.mapred;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Vector;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.UnsupportedFileSystemException;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.ipc.ProtocolSignature;
import org.apache.hadoop.mapreduce.Cluster.JobTrackerStatus;
import org.apache.hadoop.mapreduce.ClusterMetrics;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.hadoop.mapreduce.QueueAclsInfo;
import org.apache.hadoop.mapreduce.QueueInfo;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskCompletionEvent;
import org.apache.hadoop.mapreduce.TaskReport;
import org.apache.hadoop.mapreduce.TaskTrackerInfo;
import org.apache.hadoop.mapreduce.TaskType;
import org.apache.hadoop.mapreduce.TypeConverter;
import org.apache.hadoop.mapreduce.protocol.ClientProtocol;
import org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.mapreduce.v2.LogParams;
import org.apache.hadoop.mapreduce.v2.api.MRClientProtocol;
import org.apache.hadoop.mapreduce.v2.api.protocolrecords.GetDelegationTokenRequest;
import org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils;
import org.apache.hadoop.mapreduce.v2.util.MRApps;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.api.records.ReservationId;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.URL;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.factories.RecordFactory;
import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
import org.apache.hadoop.yarn.security.client.RMDelegationTokenSelector;
import org.apache.hadoop.yarn.util.ConverterUtils;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.CaseFormat;

/**
 * This class enables the current JobClient (0.22 hadoop) to run on YARN.
 */
@SuppressWarnings("unchecked")
public class YARNRunner implements ClientProtocol {

    private static final Log LOG = LogFactory.getLog(YARNRunner.class);

    private final RecordFactory recordFactory = RecordFactoryProvider.getRecordFactory(null);
    private ResourceMgrDelegate resMgrDelegate;
    private ClientCache clientCache;
    private Configuration conf;
    private final FileContext defaultFileContext;

    /**
     * Yarn runner incapsulates the client interface of yarn
     * 
     * @param conf
     *            the configuration object for the client
     */
    public YARNRunner(Configuration conf) {
        this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
    }

    /**
     * Similar to {@link #YARNRunner(Configuration)} but allowing injecting
     * {@link ResourceMgrDelegate}. Enables mocking and testing.
     * 
     * @param conf
     *            the configuration object for the client
     * @param resMgrDelegate
     *            the resourcemanager client handle.
     */
    public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate) {
        this(conf, resMgrDelegate, new ClientCache(conf, resMgrDelegate));
    }

    /**
     * Similar to
     * {@link YARNRunner#YARNRunner(Configuration, ResourceMgrDelegate)} but
     * allowing injecting {@link ClientCache}. Enable mocking and testing.
     * 
     * @param conf
     *            the configuration object
     * @param resMgrDelegate
     *            the resource manager delegate
     * @param clientCache
     *            the client cache object.
     */
    public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate, ClientCache clientCache) {
        this.conf = conf;
        try {
            this.resMgrDelegate = resMgrDelegate;
            this.clientCache = clientCache;
            this.defaultFileContext = FileContext.getFileContext(this.conf);
        } catch (UnsupportedFileSystemException ufe) {
            throw new RuntimeException("Error in instantiating YarnClient", ufe);
        }
    }

    @Private
    /**
     * Used for testing mostly.
     * @param resMgrDelegate the resource manager delegate to set to.
     */
    public void setResourceMgrDelegate(ResourceMgrDelegate resMgrDelegate) {
        this.resMgrDelegate = resMgrDelegate;
    }

    @Override
    public void cancelDelegationToken(Token<DelegationTokenIdentifier> arg0) throws IOException, InterruptedException {
        throw new UnsupportedOperationException("Use Token.renew instead");
    }

    @Override
    public TaskTrackerInfo[] getActiveTrackers() throws IOException, InterruptedException {
        return resMgrDelegate.getActiveTrackers();
    }

    @Override
    public JobStatus[] getAllJobs() throws IOException, InterruptedException {
        return resMgrDelegate.getAllJobs();
    }

    @Override
    public TaskTrackerInfo[] getBlacklistedTrackers() throws IOException, InterruptedException {
        return resMgrDelegate.getBlacklistedTrackers();
    }

    @Override
    public ClusterMetrics getClusterMetrics() throws IOException, InterruptedException {
        return resMgrDelegate.getClusterMetrics();
    }

    @VisibleForTesting
    void addHistoryToken(Credentials ts) throws IOException, InterruptedException {
        /* check if we have a hsproxy, if not, no need */
        MRClientProtocol hsProxy = clientCache.getInitializedHSProxy();
        if (UserGroupInformation.isSecurityEnabled() && (hsProxy != null)) {
            /*
             * note that get delegation token was called. Again this is hack for
             * oozie to make sure we add history server delegation tokens to the
             * credentials
             */
            RMDelegationTokenSelector tokenSelector = new RMDelegationTokenSelector();
            Text service = resMgrDelegate.getRMDelegationTokenService();
            if (tokenSelector.selectToken(service, ts.getAllTokens()) != null) {
                Text hsService = SecurityUtil.buildTokenService(hsProxy.getConnectAddress());
                if (ts.getToken(hsService) == null) {
                    ts.addToken(hsService, getDelegationTokenFromHS(hsProxy));
                }
            }
        }
    }

    @VisibleForTesting
    Token<?> getDelegationTokenFromHS(MRClientProtocol hsProxy) throws IOException, InterruptedException {
        GetDelegationTokenRequest request = recordFactory.newRecordInstance(GetDelegationTokenRequest.class);
        request.setRenewer(Master.getMasterPrincipal(conf));
        org.apache.hadoop.yarn.api.records.Token mrDelegationToken;
        mrDelegationToken = hsProxy.getDelegationToken(request).getDelegationToken();
        return ConverterUtils.convertFromYarn(mrDelegationToken, hsProxy.getConnectAddress());
    }

    @Override
    public Token<DelegationTokenIdentifier> getDelegationToken(Text renewer) throws IOException, InterruptedException {
        // The token is only used for serialization. So the type information
        // mismatch should be fine.
        return resMgrDelegate.getDelegationToken(renewer);
    }

    @Override
    public String getFilesystemName() throws IOException, InterruptedException {
        return resMgrDelegate.getFilesystemName();
    }

    @Override
    public JobID getNewJobID() throws IOException, InterruptedException {
        return resMgrDelegate.getNewJobID();
    }

    @Override
    public QueueInfo getQueue(String queueName) throws IOException, InterruptedException {
        return resMgrDelegate.getQueue(queueName);
    }

    @Override
    public QueueAclsInfo[] getQueueAclsForCurrentUser() throws IOException, InterruptedException {
        return resMgrDelegate.getQueueAclsForCurrentUser();
    }

    @Override
    public QueueInfo[] getQueues() throws IOException, InterruptedException {
        return resMgrDelegate.getQueues();
    }

    @Override
    public QueueInfo[] getRootQueues() throws IOException, InterruptedException {
        return resMgrDelegate.getRootQueues();
    }

    @Override
    public QueueInfo[] getChildQueues(String parent) throws IOException, InterruptedException {
        return resMgrDelegate.getChildQueues(parent);
    }

    @Override
    public String getStagingAreaDir() throws IOException, InterruptedException {
        return resMgrDelegate.getStagingAreaDir();
    }

    @Override
    public String getSystemDir() throws IOException, InterruptedException {
        return resMgrDelegate.getSystemDir();
    }

    @Override
    public long getTaskTrackerExpiryInterval() throws IOException, InterruptedException {
        return resMgrDelegate.getTaskTrackerExpiryInterval();
    }

    @Override
    public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts) throws IOException, InterruptedException {

        addHistoryToken(ts);

        // Construct necessary information to start the MR AM
        ApplicationSubmissionContext appContext = createApplicationSubmissionContext(conf, jobSubmitDir, ts);

        // Submit to ResourceManager
        try {
            ApplicationId applicationId = resMgrDelegate.submitApplication(appContext);

            ApplicationReport appMaster = resMgrDelegate.getApplicationReport(applicationId);
            String diagnostics = (appMaster == null ? "application report is null" : appMaster.getDiagnostics());
            if (appMaster == null || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
                throw new IOException("Failed to run job : " + diagnostics);
            }
            return clientCache.getClient(jobId).getJobStatus(jobId);
        } catch (YarnException e) {
            throw new IOException(e);
        }
    }

    private LocalResource createApplicationResource(FileContext fs, Path p, LocalResourceType type) throws IOException {
        LocalResource rsrc = recordFactory.newRecordInstance(LocalResource.class);
        FileStatus rsrcStat = fs.getFileStatus(p);
        rsrc.setResource(ConverterUtils.getYarnUrlFromPath(fs.getDefaultFileSystem().resolvePath(rsrcStat.getPath())));
        rsrc.setSize(rsrcStat.getLen());
        rsrc.setTimestamp(rsrcStat.getModificationTime());
        rsrc.setType(type);
        rsrc.setVisibility(LocalResourceVisibility.APPLICATION);
        return rsrc;
    }

    public ApplicationSubmissionContext createApplicationSubmissionContext(Configuration jobConf, String jobSubmitDir, Credentials ts) throws IOException {
        ApplicationId applicationId = resMgrDelegate.getApplicationId();

        // Setup resource requirements
        Resource capability = recordFactory.newRecordInstance(Resource.class);
        capability.setMemory(conf.getInt(MRJobConfig.MR_AM_VMEM_MB, MRJobConfig.DEFAULT_MR_AM_VMEM_MB));
        capability.setVirtualCores(conf.getInt(MRJobConfig.MR_AM_CPU_VCORES, MRJobConfig.DEFAULT_MR_AM_CPU_VCORES));
        LOG.debug("AppMaster capability = " + capability);

        // Setup LocalResources
        Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();

        Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE);

        URL yarnUrlForJobSubmitDir = ConverterUtils.getYarnUrlFromPath(defaultFileContext.getDefaultFileSystem().resolvePath(defaultFileContext.makeQualified(new Path(jobSubmitDir))));
        LOG.debug("Creating setup context, jobSubmitDir url is " + yarnUrlForJobSubmitDir);

        localResources.put(MRJobConfig.JOB_CONF_FILE, createApplicationResource(defaultFileContext, jobConfPath, LocalResourceType.FILE));
        if (jobConf.get(MRJobConfig.JAR) != null) {
            Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR));
            LocalResource rc = createApplicationResource(FileContext.getFileContext(jobJarPath.toUri(), jobConf), jobJarPath, LocalResourceType.PATTERN);
            String pattern = conf.getPattern(JobContext.JAR_UNPACK_PATTERN, JobConf.UNPACK_JAR_PATTERN_DEFAULT).pattern();
            rc.setPattern(pattern);
            localResources.put(MRJobConfig.JOB_JAR, rc);
        } else {
            // Job jar may be null. For e.g, for pipes, the job jar is the
            // hadoop
            // mapreduce jar itself which is already on the classpath.
            LOG.info("Job jar is not present. " + "Not adding any jar to the list of resources.");
        }

        // TODO gross hack
        for (String s : new String[] { MRJobConfig.JOB_SPLIT, MRJobConfig.JOB_SPLIT_METAINFO }) {
            localResources.put(MRJobConfig.JOB_SUBMIT_DIR + "/" + s, createApplicationResource(defaultFileContext, new Path(jobSubmitDir, s), LocalResourceType.FILE));
        }

        // Setup security tokens
        DataOutputBuffer dob = new DataOutputBuffer();
        ts.writeTokenStorageToStream(dob);
        ByteBuffer securityTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());

        // Setup the command to run the AM
        List<String> vargs = new ArrayList<String>(8);
        // vargs.add(MRApps.crossPlatformifyMREnv(jobConf,
        // Environment.JAVA_HOME)
        // + "/bin/java");
        // TODO   Modified here: print the original value for comparison,
        // then use the Linux form of the java path instead
        System.out.println(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME) + "/bin/java");
        vargs.add("$JAVA_HOME/bin/java");

        // TODO: why do we use 'conf' some places and 'jobConf' others?
        long logSize = jobConf.getLong(MRJobConfig.MR_AM_LOG_KB, MRJobConfig.DEFAULT_MR_AM_LOG_KB) << 10;
        String logLevel = jobConf.get(MRJobConfig.MR_AM_LOG_LEVEL, MRJobConfig.DEFAULT_MR_AM_LOG_LEVEL);
        int numBackups = jobConf.getInt(MRJobConfig.MR_AM_LOG_BACKUPS, MRJobConfig.DEFAULT_MR_AM_LOG_BACKUPS);
        MRApps.addLog4jSystemProperties(logLevel, logSize, numBackups, vargs, conf);

        // Check for Java Lib Path usage in MAP and REDUCE configs
        warnForJavaLibPath(conf.get(MRJobConfig.MAP_JAVA_OPTS, ""), "map", MRJobConfig.MAP_JAVA_OPTS, MRJobConfig.MAP_ENV);
        warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, ""), "map", MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, MRJobConfig.MAPRED_ADMIN_USER_ENV);
        warnForJavaLibPath(conf.get(MRJobConfig.REDUCE_JAVA_OPTS, ""), "reduce", MRJobConfig.REDUCE_JAVA_OPTS, MRJobConfig.REDUCE_ENV);
        warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, ""), "reduce", MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, MRJobConfig.MAPRED_ADMIN_USER_ENV);

        // Add AM admin command opts before user command opts
        // so that it can be overridden by user
        String mrAppMasterAdminOptions = conf.get(MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS, MRJobConfig.DEFAULT_MR_AM_ADMIN_COMMAND_OPTS);
        warnForJavaLibPath(mrAppMasterAdminOptions, "app master", MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS, MRJobConfig.MR_AM_ADMIN_USER_ENV);
        vargs.add(mrAppMasterAdminOptions);

        // Add AM user command opts
        String mrAppMasterUserOptions = conf.get(MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.DEFAULT_MR_AM_COMMAND_OPTS);
        warnForJavaLibPath(mrAppMasterUserOptions, "app master", MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.MR_AM_ENV);
        vargs.add(mrAppMasterUserOptions);

        if (jobConf.getBoolean(MRJobConfig.MR_AM_PROFILE, MRJobConfig.DEFAULT_MR_AM_PROFILE)) {
            final String profileParams = jobConf.get(MRJobConfig.MR_AM_PROFILE_PARAMS, MRJobConfig.DEFAULT_TASK_PROFILE_PARAMS);
            if (profileParams != null) {
                vargs.add(String.format(profileParams, ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + TaskLog.LogName.PROFILE));
            }
        }

        vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);
        vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDOUT);
        vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDERR);

        Vector<String> vargsFinal = new Vector<String>(8);
        // Final command
        StringBuilder mergedCommand = new StringBuilder();
        for (CharSequence str : vargs) {
            mergedCommand.append(str).append(" ");
        }
        vargsFinal.add(mergedCommand.toString());

        LOG.debug("Command to launch container for ApplicationMaster is : " + mergedCommand);

        // Setup the CLASSPATH in environment
        // i.e. add { Hadoop jars, job jar, CWD } to classpath.
        Map<String, String> environment = new HashMap<String, String>();
        MRApps.setClasspath(environment, conf);

        // Shell
        environment.put(Environment.SHELL.name(), conf.get(MRJobConfig.MAPRED_ADMIN_USER_SHELL, MRJobConfig.DEFAULT_SHELL));

        // Add the container working directory at the front of LD_LIBRARY_PATH
        MRApps.addToEnvironment(environment, Environment.LD_LIBRARY_PATH.name(), MRApps.crossPlatformifyMREnv(conf, Environment.PWD), conf);

        // Setup the environment variables for Admin first
        MRApps.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV), conf);
        // Setup the environment variables (LD_LIBRARY_PATH, etc)
        MRApps.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ENV), conf);

        // Parse distributed cache
        MRApps.setupDistributedCache(jobConf, localResources);

        Map<ApplicationAccessType, String> acls = new HashMap<ApplicationAccessType, String>(2);
        acls.put(ApplicationAccessType.VIEW_APP, jobConf.get(MRJobConfig.JOB_ACL_VIEW_JOB, MRJobConfig.DEFAULT_JOB_ACL_VIEW_JOB));
        acls.put(ApplicationAccessType.MODIFY_APP, jobConf.get(MRJobConfig.JOB_ACL_MODIFY_JOB, MRJobConfig.DEFAULT_JOB_ACL_MODIFY_JOB));

        // TODO BY DHT (modified here): rewrite every environment value from Windows syntax to Linux syntax
        for (String key : environment.keySet()) {
            String org = environment.get(key);
            String linux = getLinux(org);
            environment.put(key, linux);
        }
        // Setup ContainerLaunchContext for AM container
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(localResources, environment, vargsFinal, null, securityTokens, acls);

        Collection<String> tagsFromConf = jobConf.getTrimmedStringCollection(MRJobConfig.JOB_TAGS);

        // Set up the ApplicationSubmissionContext
        ApplicationSubmissionContext appContext = recordFactory.newRecordInstance(ApplicationSubmissionContext.class);
        appContext.setApplicationId(applicationId); // ApplicationId
        appContext.setQueue( // Queue name
                jobConf.get(JobContext.QUEUE_NAME, YarnConfiguration.DEFAULT_QUEUE_NAME));
        // add reservationID if present
        ReservationId reservationID = null;
        try {
            reservationID = ReservationId.parseReservationId(jobConf.get(JobContext.RESERVATION_ID));
        } catch (NumberFormatException e) {
            // throw exception as reservationid as is invalid
            String errMsg = "Invalid reservationId: " + jobConf.get(JobContext.RESERVATION_ID) + " specified for the app: " + applicationId;
            LOG.warn(errMsg);
            throw new IOException(errMsg);
        }
        if (reservationID != null) {
            appContext.setReservationID(reservationID);
            LOG.info("SUBMITTING ApplicationSubmissionContext app:" + applicationId + " to queue:" + appContext.getQueue() + " with reservationId:" + appContext.getReservationID());
        }
        appContext.setApplicationName( // Job name
                jobConf.get(JobContext.JOB_NAME, YarnConfiguration.DEFAULT_APPLICATION_NAME));
        appContext.setCancelTokensWhenComplete(conf.getBoolean(MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN, true));
        appContext.setAMContainerSpec(amContainer); // AM Container
        appContext.setMaxAppAttempts(conf.getInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, MRJobConfig.DEFAULT_MR_AM_MAX_ATTEMPTS));
        appContext.setResource(capability);
        appContext.setApplicationType(MRJobConfig.MR_APPLICATION_TYPE);
        if (tagsFromConf != null && !tagsFromConf.isEmpty()) {
            appContext.setApplicationTags(new HashSet<String>(tagsFromConf));
        }

        return appContext;
    }

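    /**
     * Modified here: converts a Windows-style environment value to Linux style
     * (%VAR% becomes $VAR, ';' becomes ':', '\' becomes '/').
     * @param org the original (Windows-style) value
     * @return the Linux-style value
     */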
    private String getLinux(String org) {
        StringBuilder sb = new StringBuilder();
        int c = 0;
        for (int i = 0; i < org.length(); i++) {
            if (org.charAt(i) == '%') {
                c++;
                if (c % 2 == 1) {
                    sb.append("$");
                }
            } else {
                switch (org.charAt(i)) {
                case ';':
                    sb.append(":");
                    break;

                case '\\':
                    sb.append("/");
                    break;
                default:
                    sb.append(org.charAt(i));
                    break;
                }
            }
        }
        return (sb.toString());
    }

    @Override
    public void setJobPriority(JobID arg0, String arg1) throws IOException, InterruptedException {
        resMgrDelegate.setJobPriority(arg0, arg1);
    }

    @Override
    public long getProtocolVersion(String arg0, long arg1) throws IOException {
        return resMgrDelegate.getProtocolVersion(arg0, arg1);
    }

    @Override
    public long renewDelegationToken(Token<DelegationTokenIdentifier> arg0) throws IOException, InterruptedException {
        throw new UnsupportedOperationException("Use Token.renew instead");
    }

    @Override
    public Counters getJobCounters(JobID arg0) throws IOException, InterruptedException {
        return clientCache.getClient(arg0).getJobCounters(arg0);
    }

    @Override
    public String getJobHistoryDir() throws IOException, InterruptedException {
        return JobHistoryUtils.getConfiguredHistoryServerDoneDirPrefix(conf);
    }

    @Override
    public JobStatus getJobStatus(JobID jobID) throws IOException, InterruptedException {
        JobStatus status = clientCache.getClient(jobID).getJobStatus(jobID);
        return status;
    }

    @Override
    public TaskCompletionEvent[] getTaskCompletionEvents(JobID arg0, int arg1, int arg2) throws IOException, InterruptedException {
        return clientCache.getClient(arg0).getTaskCompletionEvents(arg0, arg1, arg2);
    }

    @Override
    public String[] getTaskDiagnostics(TaskAttemptID arg0) throws IOException, InterruptedException {
        return clientCache.getClient(arg0.getJobID()).getTaskDiagnostics(arg0);
    }

    @Override
    public TaskReport[] getTaskReports(JobID jobID, TaskType taskType) throws IOException, InterruptedException {
        return clientCache.getClient(jobID).getTaskReports(jobID, taskType);
    }

    private void killUnFinishedApplication(ApplicationId appId) throws IOException {
        ApplicationReport application = null;
        try {
            application = resMgrDelegate.getApplicationReport(appId);
        } catch (YarnException e) {
            throw new IOException(e);
        }
        if (application.getYarnApplicationState() == YarnApplicationState.FINISHED || application.getYarnApplicationState() == YarnApplicationState.FAILED || application.getYarnApplicationState() == YarnApplicationState.KILLED) {
            return;
        }
        killApplication(appId);
    }

    private void killApplication(ApplicationId appId) throws IOException {
        try {
            resMgrDelegate.killApplication(appId);
        } catch (YarnException e) {
            throw new IOException(e);
        }
    }

    private boolean isJobInTerminalState(JobStatus status) {
        return status.getState() == JobStatus.State.KILLED || status.getState() == JobStatus.State.FAILED || status.getState() == JobStatus.State.SUCCEEDED;
    }

    @Override
    public void killJob(JobID arg0) throws IOException, InterruptedException {
        /* check if the status is not running, if not send kill to RM */
        JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
        ApplicationId appId = TypeConverter.toYarn(arg0).getAppId();

        // get status from RM and return
        if (status == null) {
            killUnFinishedApplication(appId);
            return;
        }

        if (status.getState() != JobStatus.State.RUNNING) {
            killApplication(appId);
            return;
        }

        try {
            /* send a kill to the AM */
            clientCache.getClient(arg0).killJob(arg0);
            long currentTimeMillis = System.currentTimeMillis();
            long timeKillIssued = currentTimeMillis;
            while ((currentTimeMillis < timeKillIssued + 10000L) && !isJobInTerminalState(status)) {
                try {
                    Thread.sleep(1000L);
                } catch (InterruptedException ie) {
                    /** interrupted, just break */
                    break;
                }
                currentTimeMillis = System.currentTimeMillis();
                status = clientCache.getClient(arg0).getJobStatus(arg0);
                if (status == null) {
                    killUnFinishedApplication(appId);
                    return;
                }
            }
        } catch (IOException io) {
            LOG.debug("Error when checking for application status", io);
        }
        if (status != null && !isJobInTerminalState(status)) {
            killApplication(appId);
        }
    }

    @Override
    public boolean killTask(TaskAttemptID arg0, boolean arg1) throws IOException, InterruptedException {
        return clientCache.getClient(arg0.getJobID()).killTask(arg0, arg1);
    }

    @Override
    public AccessControlList getQueueAdmins(String arg0) throws IOException {
        return new AccessControlList("*");
    }

    @Override
    public JobTrackerStatus getJobTrackerStatus() throws IOException, InterruptedException {
        return JobTrackerStatus.RUNNING;
    }

    @Override
    public ProtocolSignature getProtocolSignature(String protocol, long clientVersion, int clientMethodsHash) throws IOException {
        return ProtocolSignature.getProtocolSignature(this, protocol, clientVersion, clientMethodsHash);
    }

    @Override
    public LogParams getLogFileParams(JobID jobID, TaskAttemptID taskAttemptID) throws IOException {
        return clientCache.getClient(jobID).getLogFilePath(jobID, taskAttemptID);
    }

    private static void warnForJavaLibPath(String opts, String component, String javaConf, String envConf) {
        if (opts != null && opts.contains("-Djava.library.path")) {
            LOG.warn("Usage of -Djava.library.path in " + javaConf + " can cause " + "programs to no longer function if hadoop native libraries " + "are used. These values should be set as part of the " + "LD_LIBRARY_PATH in the " + component + " JVM env using " + envConf
                    + " config settings.");
        }
    }
}

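That is the complete class. Run the main method again and the job should now succeed. The first run is somewhat slow (though not dramatically so); later runs start at normal speed.

One more point: the following conf lines are also optional.

        conf.set("mapreduce.framework.name","yarn");
        conf.set("yarn.resourcemanager.hostname","server1");//points the client at the ResourceManager (the MR environment) on server1
        conf.set("fs.defaultFS","hdfs://server1:9000/");

These lines can be commented out, in which case the settings are read from the configuration files instead. Download the configuration files shown below from the server.

They contain the settings already configured on the server. Put them in src/main/resources; when the project is built they are copied onto the classpath.

As the figure shows, the configuration files already contain these settings, so if you leave out the conf.set calls and put the files on the classpath instead, it works just as well.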

3: Implementing a join with MapReduce


3.1: The example in a SQL database

Let's start with a relational database example. Suppose we have two tables: an order table and a product table.

Order table

#orderId,date,pid,amount
1001,20170822,p1,3
1002,20170823,p2,9
1003,20170824,p3,11

Product table

#pid,pname,categoryId,price
p1,anti-aircraft rocket,1,20.2
p2,mortar,1,50
p3,mage tower,2,100

In a relational database, the join would be written as the following SQL:

select * from orders a left join products b on a.pid = b.pid

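3.2: How to implement it in MapReduce

First identify the join key, the product ID. Both the order table and the product table contain the product ID; the SQL joins on it, and the MapReduce job has to join on it too.

The approach is to emit the product ID as the map output key, so that it becomes the key of the reduce input.

Each reduce call then receives only the data of one product: any number of order records (all orders for that product) plus a single product record. The reducer just has to combine them.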
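To make the data flow concrete, here is a minimal plain-Java simulation of the reduce-side join described above, with no Hadoop dependency: a TreeMap stands in for the shuffle, and the "0"/"1" prefixes play the role of the flag field. The class and method names are illustrative only, and the left-join behavior for an order with no matching product (empty product columns) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java sketch (no Hadoop) of a reduce-side join: "map" tags each record with its
 * source table and keys it by product ID, the TreeMap plays the shuffle, and "reduce"
 * copies the single product record into every order record of the same group.
 */
class ReduceSideJoinSketch {

    static List<String> join(List<String> orderLines, List<String> productLines) {
        // "shuffle": key = product ID, value = tagged records ("0|" order, "1|" product)
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : orderLines) {            // orderId,date,pid,amount
            String pid = line.split(",")[2];
            groups.computeIfAbsent(pid, k -> new ArrayList<>()).add("0|" + line);
        }
        for (String line : productLines) {          // pid,pname,categoryId,price
            String pid = line.split(",")[0];
            groups.computeIfAbsent(pid, k -> new ArrayList<>()).add("1|" + line);
        }

        List<String> out = new ArrayList<>();
        for (List<String> values : groups.values()) {   // one "reduce" call per product ID
            String product = null;
            List<String> orders = new ArrayList<>();
            for (String v : values) {
                if (v.startsWith("1|")) {
                    product = v.substring(2);
                } else {
                    orders.add(v.substring(2));
                }
            }
            for (String order : orders) {
                // left join: an order without a matching product keeps empty product columns
                String[] p = product == null ? new String[]{"", "", "", ""} : product.split(",");
                out.add(order + "," + p[1] + "," + p[2] + "," + p[3]);
            }
        }
        return out;
    }
}
```

Separating the records into "one product, many orders" inside each group is exactly what JoinReduce does with its flag field.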

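3.3: Creating the JavaBean

That is how it's written in a SQL database. Now suppose the data sits in several files on HDFS, in the format shown above, and we need to join them. How do we handle that? This is the MapReduce way of writing a join. This time we run it in local mode.

First create the directory D:\mr\join\input containing two files, order_01.txt and product_01.txt, holding the order data and the product data above respectively.

Then define a JavaBean to hold this information and make it implement Hadoop's serialization interface (Writable).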

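    /**
     * Holds the fields of both tables (order and product)
     */
    static class Info implements Writable,Cloneable{
        /**
         * Order ID
         */
        private int orderId;
        /**
         * Date
         */
        private String dateString;
        /**
         * Product ID
         */
        private String pid;
        /**
         * Quantity sold
         */
        private int amount;
        /**
         * Product name
         */
        private String pname;
        /**
         * Category
         */
        private int categoryId;
        /**
         * Unit price
         */
        private float price;
        /**
         * This field needs some explanation.
         *
         * One object of this class holds the columns of both files (orders and products). When a
         * single line is read, only one part of the fields can be filled; the other part is filled
         * in during the join. This field records which kind of data the object currently holds:
         * "0" means it holds order data,
         * "1" means it holds product data.
         */
        private String flag;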

        @Override
        protected Object clone() throws CloneNotSupportedException {
            return super.clone();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(orderId);
            out.writeUTF(dateString);
            out.writeUTF(pid);
            out.writeInt(amount);
            out.writeUTF(pname);
            out.writeInt(categoryId);
            out.writeFloat(price);
            out.writeUTF(flag);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            orderId = in.readInt();
            dateString = in.readUTF();
            pid = in.readUTF();
            amount = in.readInt();
            pname = in.readUTF();
            categoryId = in.readInt();
            price = in.readFloat();
            flag = in.readUTF();
        }

        public Info() {
        }

        public void set(int orderId, String dateString, String pid, int amount, String pname, int categoryId, float price,String flag) {
            this.orderId = orderId;
            this.dateString = dateString;
            this.pid = pid;
            this.amount = amount;
            this.pname = pname;
            this.categoryId = categoryId;
            this.price = price;
            this.flag = flag;
        }

        public int getOrderId() {
            return orderId;
        }

        public void setOrderId(int orderId) {
            this.orderId = orderId;
        }

        public String getDateString() {
            return dateString;
        }

        public String getFlag() {
            return flag;
        }

        public void setFlag(String flag) {
            this.flag = flag;
        }

        public void setDateString(String dateString) {
            this.dateString = dateString;
        }

        public String getPid() {
            return pid;
        }

        public void setPid(String pid) {
            this.pid = pid;
        }

        public int getAmount() {
            return amount;
        }

        public void setAmount(int amount) {
            this.amount = amount;
        }

        public String getPname() {
            return pname;
        }

        public void setPname(String pname) {
            this.pname = pname;
        }

        public int getCategoryId() {
            return categoryId;
        }

        public void setCategoryId(int categoryId) {
            this.categoryId = categoryId;
        }

        public float getPrice() {
            return price;
        }

        public void setPrice(float price) {
            this.price = price;
        }

        @Override
        public String toString() {
            final StringBuilder sb = new StringBuilder("{");
            sb.append("\"orderId\":")
                    .append(orderId);
            sb.append(",\"dateString\":\"")
                    .append(dateString).append('\"');
            sb.append(",\"pid\":\"")
                    .append(pid).append('\"');
            sb.append(",\"amount\":")
                    .append(amount);
            sb.append(",\"pname\":\"")
                    .append(pname).append('\"');
            sb.append(",\"categoryId\":")
                    .append(categoryId);
            sb.append(",\"price\":")
                    .append(price);
            sb.append(",\"flag\":\"")
                    .append(flag).append('\"');
            sb.append('}');
            return sb.toString();
        }
    }
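The one rule that write and readFields must obey is that fields are read back in exactly the order they were written. Below is a minimal stdlib-only sketch of that round trip; MiniInfo is a hypothetical, trimmed-down stand-in for Info (three fields only), and java.io's DataOutputStream/DataInputStream play the role of the streams Hadoop hands to a Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical trimmed-down Info used only to illustrate the Writable contract. */
class MiniInfo {
    int orderId;
    String pid;
    float price;

    void write(DataOutput out) throws IOException {
        out.writeInt(orderId);
        out.writeUTF(pid);      // writeUTF throws NullPointerException on null,
        out.writeFloat(price);  // which is why the mapper fills unused String fields with ""
    }

    void readFields(DataInput in) throws IOException {
        orderId = in.readInt(); // must mirror the order used in write()
        pid = in.readUTF();
        price = in.readFloat();
    }

    /** Serializes a record to bytes and reads it back into a fresh object, like the shuffle does. */
    static MiniInfo roundTrip(MiniInfo src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                src.write(out);
            }
            MiniInfo copy = new MiniInfo();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Swapping two reads in readFields would still compile but silently scramble the fields, so it pays to keep write and readFields side by side, as Info does.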

3.4: Creating the mapper

The mapper is explained by its comments.

    static class JoinMapper extends Mapper<LongWritable, Text, Text, Info>{
        private Info info = new Info();
        private Text text = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            if(line.startsWith("#")){//skip header/comment lines starting with #
                return;
            }
            //Get the input split of the current task. InputSplit is the top-level abstract class; for file input it can be cast to FileSplit
            InputSplit inputSplit = context.getInputSplit();
            FileSplit fileSplit = (FileSplit) inputSplit;
            String name = fileSplit.getPath().getName();//the name of the file this split comes from
            //the file name tells us which kind of data this line is
            String pid = "";
            String[] split = line.split(",");
            if(name.startsWith("order")){//order line: order ID, date, product ID, quantity
                //orderId,date,pid,amount
                pid = split[2];
                info.set(Integer.parseInt(split[0]),split[1],pid,Integer.parseInt(split[3]),"",0,0,"0");

            }else{//product line: product ID, product name, category, price
                //pid,pname,categoryId,price
                pid = split[0];
                info.set(0,"",pid,0,split[1],Integer.parseInt(split[2]),Float.parseFloat(split[3]),"1");
            }
            text.set(pid);
            context.write(text,info);
        }
    }
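Because the mapper's per-line decision depends only on the file name and the line text, it can be pulled into a pure function and tested without a Hadoop runtime. The sketch below is a hypothetical helper (JoinLineParser is not part of the original code) that returns the key, the flag and the raw line, or null for lines the mapper should skip.

```java
/**
 * Hypothetical helper isolating JoinMapper's per-line logic so it can be unit-tested
 * without Hadoop. Returns {key (product ID), flag, line}, or null for lines to skip.
 */
class JoinLineParser {
    static String[] parse(String fileName, String line) {
        if (line.isEmpty() || line.startsWith("#")) {
            return null;                            // blank and header/comment lines are skipped
        }
        String[] f = line.split(",");
        if (fileName.startsWith("order")) {         // orderId,date,pid,amount
            return new String[]{f[2], "0", line};
        }
        return new String[]{f[0], "1", line};       // pid,pname,categoryId,price
    }
}
```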

3.5: Creating the reducer

Again, the comments explain it.

    static class JoinReduce extends Reducer<Text, Info, Info, NullWritable>{

        @Override
        protected void reduce(Text key, Iterable<Info> values, Context context) throws IOException, InterruptedException {
            Info product = new Info();//holds the product record; there is only one per product ID
            List<Info> infos = new ArrayList<>();//holds all order records; there can be many
            for(Info info : values){
                //Hadoop reuses the same Info instance while iterating, so each record must be copied (cloned) before it is kept
                if("1".equals(info.getFlag())){
                    //product table record
                    try {
                        product = (Info) info.clone();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }else{//order table record
                    Info order = new Info();
                    try {
                        order = (Info) info.clone();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    infos.add(order);
                }
            }
            //Orders and product are now separated: the orders are in the list, the product is in its own object
            //Copy the product fields into every order
            for(Info tmp : infos){
                tmp.setPname(product.getPname());
                tmp.setCategoryId(product.getCategoryId());
                tmp.setPrice(product.getPrice());
                //write the joined record; this produces the result file
                context.write(tmp,NullWritable.get());
            }
        }
    }
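The clone() calls in the reducer matter. While iterating over values, Hadoop typically deserializes every record into the same reused object, so storing the reference itself would leave the list filled with the last record. The stdlib-only simulation below makes that visible; reusingIterable is a hypothetical stand-in for Hadoop's value iterator, not its real implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Simulates a value iterator that reuses one object, and shows why copies are needed. */
class ValueReuseDemo {
    static final class Rec implements Cloneable {
        String orderId;

        @Override
        protected Rec clone() {
            try {
                return (Rec) super.clone();   // shallow copy is enough: String is immutable
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Hands out the same Rec every time, overwriting its field (like readFields would). */
    static Iterable<Rec> reusingIterable(List<String> ids) {
        Rec shared = new Rec();
        return () -> new Iterator<Rec>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < ids.size();
            }

            @Override
            public Rec next() {
                shared.orderId = ids.get(i++);
                return shared;
            }
        };
    }

    /** Keeps either the references (copy == false) or clones (copy == true), then reads them back. */
    static List<String> collect(Iterable<Rec> values, boolean copy) {
        List<Rec> kept = new ArrayList<>();
        for (Rec r : values) {
            kept.add(copy ? r.clone() : r);
        }
        List<String> ids = new ArrayList<>();
        for (Rec r : kept) {
            ids.add(r.orderId);
        }
        return ids;
    }
}
```

Without copies, all three kept entries point at one object and report the last ID; with clones, each entry keeps its own value.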

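3.6: Complete code

The map and reduce parts are shown above; only the main method is missing. It is an ordinary main method, the same as the launcher in the previous article. Here is all of the join code, main method included, in a single file: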

package com.zxj.hadoop.demo.mapreduce.join;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

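/**
 * @Author 朱小杰
 * Date 2017-08-22 .22:10
 * Description ...
 */
public class MRJoin {
    /**
     * Holds the fields of both tables (order and product)
     */
    static class Info implements Writable,Cloneable{
        /**
         * Order ID
         */
        private int orderId;
        /**
         * Date
         */
        private String dateString;
        /**
         * Product ID
         */
        private String pid;
        /**
         * Quantity sold
         */
        private int amount;
        /**
         * Product name
         */
        private String pname;
        /**
         * Category
         */
        private int categoryId;
        /**
         * Unit price
         */
        private float price;
        /**
         * This field needs some explanation.
         *
         * One object of this class holds the columns of both files (orders and products). When a
         * single line is read, only one part of the fields can be filled; the other part is filled
         * in during the join. This field records which kind of data the object currently holds:
         * "0" means it holds order data,
         * "1" means it holds product data.
         */
        private String flag;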

        @Override
        protected Object clone() throws CloneNotSupportedException {
            return super.clone();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(orderId);
            out.writeUTF(dateString);
            out.writeUTF(pid);
            out.writeInt(amount);
            out.writeUTF(pname);
            out.writeInt(categoryId);
            out.writeFloat(price);
            out.writeUTF(flag);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            orderId = in.readInt();
            dateString = in.readUTF();
            pid = in.readUTF();
            amount = in.readInt();
            pname = in.readUTF();
            categoryId = in.readInt();
            price = in.readFloat();
            flag = in.readUTF();
        }

        public Info() {
        }

        public void set(int orderId, String dateString, String pid, int amount, String pname, int categoryId, float price,String flag) {
            this.orderId = orderId;
            this.dateString = dateString;
            this.pid = pid;
            this.amount = amount;
            this.pname = pname;
            this.categoryId = categoryId;
            this.price = price;
            this.flag = flag;
        }

        public int getOrderId() {
            return orderId;
        }

        public void setOrderId(int orderId) {
            this.orderId = orderId;
        }

        public String getDateString() {
            return dateString;
        }

        public String getFlag() {
            return flag;
        }

        public void setFlag(String flag) {
            this.flag = flag;
        }

        public void setDateString(String dateString) {
            this.dateString = dateString;
        }

        public String getPid() {
            return pid;
        }

        public void setPid(String pid) {
            this.pid = pid;
        }

        public int getAmount() {
            return amount;
        }

        public void setAmount(int amount) {
            this.amount = amount;
        }

        public String getPname() {
            return pname;
        }

        public void setPname(String pname) {
            this.pname = pname;
        }

        public int getCategoryId() {
            return categoryId;
        }

        public void setCategoryId(int categoryId) {
            this.categoryId = categoryId;
        }

        public float getPrice() {
            return price;
        }

        public void setPrice(float price) {
            this.price = price;
        }

        @Override
        public String toString() {
            final StringBuilder sb = new StringBuilder("{");
            sb.append("\"orderId\":")
                    .append(orderId);
            sb.append(",\"dateString\":\"")
                    .append(dateString).append('\"');
            sb.append(",\"pid\":\"")
                    .append(pid).append('\"');
            sb.append(",\"amount\":")
                    .append(amount);
            sb.append(",\"pname\":\"")
                    .append(pname).append('\"');
            sb.append(",\"categoryId\":")
                    .append(categoryId);
            sb.append(",\"price\":")
                    .append(price);
            sb.append(",\"flag\":\"")
                    .append(flag).append('\"');
            sb.append('}');
            return sb.toString();
        }
    }

    static class JoinMapper extends Mapper<LongWritable, Text, Text, Info>{
        private Info info = new Info();
        private Text text = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            if(line.startsWith("#")){//skip header/comment lines starting with #
                return;
            }
            //Get the input split of the current task. InputSplit is the top-level abstract class; for file input it can be cast to FileSplit
            InputSplit inputSplit = context.getInputSplit();
            FileSplit fileSplit = (FileSplit) inputSplit;
            String name = fileSplit.getPath().getName();//the name of the file this split comes from
            //the file name tells us which kind of data this line is
            String pid = "";
            String[] split = line.split(",");
            if(name.startsWith("order")){//order line: order ID, date, product ID, quantity
                //orderId,date,pid,amount
                pid = split[2];
                info.set(Integer.parseInt(split[0]),split[1],pid,Integer.parseInt(split[3]),"",0,0,"0");

            }else{//product line: product ID, product name, category, price
                //pid,pname,categoryId,price
                pid = split[0];
                info.set(0,"",pid,0,split[1],Integer.parseInt(split[2]),Float.parseFloat(split[3]),"1");
            }
            text.set(pid);
            context.write(text,info);
        }
    }

    static class JoinReduce extends Reducer<Text, Info, Info, NullWritable>{

        @Override
        protected void reduce(Text key, Iterable<Info> values, Context context) throws IOException, InterruptedException {
            Info product = new Info();//holds the product record; there is only one per product ID
            List<Info> infos = new ArrayList<>();//holds all order records; there can be many
            for(Info info : values){
                //Hadoop reuses the same Info instance while iterating, so each record must be copied (cloned) before it is kept
                if("1".equals(info.getFlag())){
                    //product table record
                    try {
                        product = (Info) info.clone();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }else{//order table record
                    Info order = new Info();
                    try {
                        order = (Info) info.clone();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    infos.add(order);
                }
            }
            //Orders and product are now separated: the orders are in the list, the product is in its own object
            //Copy the product fields into every order
            for(Info tmp : infos){
                tmp.setPname(product.getPname());
                tmp.setCategoryId(product.getCategoryId());
                tmp.setPrice(product.getPrice());
                //write the joined record; this produces the result file
                context.write(tmp,NullWritable.get());
            }
        }
    }


    static class JoinMain{
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf);

            job.setJarByClass(JoinMain.class);

            job.setMapperClass(JoinMapper.class);
            job.setReducerClass(JoinReduce.class);

            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Info.class);

            job.setOutputKeyClass(Info.class);
            job.setOutputValueClass(NullWritable.class);

            FileInputFormat.setInputPaths(job,new Path(args[0]));
            FileOutputFormat.setOutputPath(job,new Path(args[1]));

            boolean b = job.waitForCompletion(true);
            if(b){
                System.out.println("OK");
            }

        }
    }



}

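Finally, set the program arguments (input and output paths) and run it in local development mode.

After a successful run, the result looks like this:

And that's it.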

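3.7: The data skew problem

The job above solves the join, but it runs into another problem: data skew.

Suppose product a has 100,000 orders while product b has only 10. The reducers then receive very different amounts of data: some finish quickly while others take a long time, so the job as a whole is slow.

The fix is to do the join on the map side: each mapper loads the whole product table on its own (the product table is normally much smaller than the order table) and joins the records inside map().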
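Here is a minimal plain-Java sketch of the map-side join idea, assuming the product table fits in memory. In a real job the loading would typically happen once per task in Mapper.setup(), reading a file shipped to every node (for example with job.addCacheFile), and the job could run without reducers via job.setNumReduceTasks(0); in the sketch the constructor plays the role of setup() and map() handles one order line. MapSideJoinSketch is an illustrative name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch of a map-side join: the small product table is loaded into a HashMap
 * once, then each order line is joined locally, so no shuffle by product ID is needed
 * and a "hot" product cannot overload a single reducer.
 */
class MapSideJoinSketch {
    private final Map<String, String[]> products = new HashMap<>();

    /** Plays the role of setup(): builds pid -> {pname, categoryId, price}. */
    MapSideJoinSketch(List<String> productLines) {
        for (String line : productLines) {
            if (line.startsWith("#")) {
                continue;                       // skip the header line
            }
            String[] f = line.split(",");
            products.put(f[0], new String[]{f[1], f[2], f[3]});
        }
    }

    /** Plays the role of map() for one order line: orderId,date,pid,amount. */
    String map(String orderLine) {
        String[] f = orderLine.split(",");
        String[] p = products.getOrDefault(f[2], new String[]{"", "", ""});
        return orderLine + "," + String.join(",", p);
    }
}
```

The trade-off is memory: every mapper holds the full product table, which is why this approach is reserved for the smaller side of the join.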

4: Finding common friends and people you may know


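Suppose we run a social app in which friendship is one-way. We want to compute the common friends of every pair of users and, from them, recommend people a user may know.

This takes two MapReduce jobs.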

4.1: Preparing the data

A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
E:B,C,D,M,L
F:A,B,C,D,E,O,M
G:A,C,D,E,F
H:A,C,D,E,O
I:A,O
J:B,O
K:A,C,D
L:D,E,F
M:E,F,G
O:A,H,I,J

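In each line, the part before the colon is a user and the part after it is that user's friend list.

Save this as a file; it is the input of the first MapReduce job.

4.2: Computing whose friend each user is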

package com.zxj.hadoop.demo.mapreduce.findfriend;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * @Author 朱小杰
 * Date 2017-08-24 .22:59
 * Description first work out, for each user, whose friend that user is
 */
public class Friend1 {


    static class FriendMapper1 extends Mapper<LongWritable, Text, Text, Text> {
        private Text k = new Text();
        private Text v = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] personFriends = line.split(":");
            String person = personFriends[0];//the user
            String friends = personFriends[1];//the user's friends
            for (String friend : friends.split(",")) {
                //emit <friend, user>
                k.set(friend);
                v.set(person);
                context.write(k,v);
            }
        }
    }

    /**
     * Input: <friend, users who have this friend>
     */
    static class FriendReduce1 extends Reducer<Text, Text, Text, Text>{
        private Text k = new Text();
        private Text v = new Text();
        @Override
        protected void reduce(Text friend, Iterable<Text> persons, Context context) throws IOException, InterruptedException {
            StringBuffer sb = new StringBuffer();
            for(Text person : persons){
                sb.append(person).append(",");
            }
            k.set(friend);
            v.set(sb.toString());
            context.write(k,v);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        String input = "D:\\mr\\qq\\input";
        String output = "D:\\mr\\qq\\out1";
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(Friend1.class);

        job.setMapperClass(FriendMapper1.class);
        job.setReducerClass(FriendReduce1.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job,new Path(input));
        FileOutputFormat.setOutputPath(job,new Path(output));

        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);

    }
}

The result of this job lists, for each user, the people who have that user as a friend:
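The first job is an inversion: the map emits <friend, user> for every entry in a user's list, and the shuffle groups by friend. The stdlib-only sketch below reproduces that grouping in memory (a TreeMap mimics the sorted keys a reducer sees); InvertFriends is a hypothetical name used for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of Friend1: "user:f1,f2,..." lines -> friend -> users who list that friend. */
class InvertFriends {
    static Map<String, List<String>> invert(List<String> lines) {
        Map<String, List<String>> fans = new TreeMap<>();
        for (String line : lines) {
            String[] pf = line.split(":");
            for (String friend : pf[1].split(",")) {
                // equivalent of context.write(friend, user) followed by grouping on friend
                fans.computeIfAbsent(friend, k -> new ArrayList<>()).add(pf[0]);
            }
        }
        return fans;
    }
}
```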

4.3: Computing common friends

package com.zxj.hadoop.demo.mapreduce.findfriend;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.Arrays;

/**
 * @Author 朱小杰
 * Date 2017-08-24 .22:59
 * Description second step, continuing from Friend1
 */
public class Friend2 {


    static class FriendMapper2 extends Mapper<LongWritable, Text, Text, Text> {
        /**
         * Input is the output of the previous job, e.g.  A    I,K,C,B,G,F,H,O,D,
         * meaning A is a friend of each of these users
         * @param key
         * @param value
         * @param context
         * @throws IOException
         * @throws InterruptedException
         */
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] split = line.split("\t");
            String friend = split[0];
            String[] persons = split[1].split(",");
            Arrays.sort(persons);//sort so that a pair is always emitted as "A-B", never also as "B-A"

            for(int i = 0 ; i < persons.length - 1 ; i ++){
                for(int j = i+1 ; j < persons.length ; j ++){
                    //emit <person-person, friend>: all common friends of a pair end up in the same reduce call
                    context.write(new Text(persons[i] + "-" + persons[j]),new Text(friend));
                }
            }
        }
    }

    /**
     * Input: <person-person, their common friends>
     */
    static class FriendReduce2 extends Reducer<Text, Text, Text, Text>{
        private Text k = new Text();
        private Text v = new Text();
        @Override
        protected void reduce(Text person_person, Iterable<Text> friends, Context context) throws IOException, InterruptedException {
            StringBuffer sb = new StringBuffer();
            for(Text f : friends){
                sb.append(f.toString()).append(" ");
            }
            context.write(person_person,new Text(sb.toString()));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        String input = "D:\\mr\\qq\\out1";
        String output = "D:\\mr\\qq\\out2";
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(Friend2.class);

        job.setMapperClass(FriendMapper2.class);
        job.setReducerClass(FriendReduce2.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job,new Path(input));
        FileOutputFormat.setOutputPath(job,new Path(output));

        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);

    }
}

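After this job we have the common friends of every pair of users. Since two users who share friends may well know each other, these pairs are the candidates for "people you may know".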
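The core of the second job is the pair generation: for each friend, sort the users who list that friend and emit every unordered pair with the friend as the value; grouping by pair then collects that pair's common friends. In the sketch below, i runs up to the second-to-last user and j up to the last one, so every pair is produced exactly once. It takes the inverted map (friend to users) as input and is a plain-Java illustration with a hypothetical class name, not the Hadoop job itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of Friend2: friend -> users  becomes  "userX-userY" -> common friends. */
class CommonFriends {
    static Map<String, List<String>> pairs(Map<String, List<String>> fans) {
        Map<String, List<String>> common = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : fans.entrySet()) {
            String[] persons = e.getValue().toArray(new String[0]);
            Arrays.sort(persons);                               // "A-B" only, never "B-A"
            for (int i = 0; i < persons.length - 1; i++) {      // every user except the last
                for (int j = i + 1; j < persons.length; j++) {  // up to and including the last
                    common.computeIfAbsent(persons[i] + "-" + persons[j], k -> new ArrayList<>())
                          .add(e.getKey());
                }
            }
        }
        return common;
    }
}
```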

5: Using a GroupingComparator to compute the maximum per group


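Let's prepare some order data:

order1,200
order1,300
order2,1000
order2,300
order2,900
order3,9000
order3,200
order3,1000

Each line is one sale within an order (order number, amount). The goal is to find the largest amount in each order.

5.1: Defining a JavaBean

Define a bean that implements both serialization and comparison (WritableComparable):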

package com.zxj.hadoop.demo.mapreduce.groupingcomporator;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

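/**
 * Map output key: order ID plus amount, sorted by order ID ascending, then amount descending.
 */
public class OrderBean implements WritableComparable<OrderBean>{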

    private Text itemid;
    private DoubleWritable amount;

    public OrderBean() {
    }

    public OrderBean(Text itemid, DoubleWritable amount) {
        set(itemid, amount);

    }

    public void set(Text itemid, DoubleWritable amount) {

        this.itemid = itemid;
        this.amount = amount;

    }



    public Text getItemid() {
        return itemid;
    }

    public DoubleWritable getAmount() {
        return amount;
    }



    @Override
    public int compareTo(OrderBean o) {
        int cmp = this.itemid.compareTo(o.getItemid());
        if (cmp == 0) {
            cmp = -this.amount.compareTo(o.getAmount());
        }
        return cmp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(itemid.toString());
        out.writeDouble(amount.get());
        
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        String readUTF = in.readUTF();
        double readDouble = in.readDouble();
        
        this.itemid = new Text(readUTF);
        this.amount= new DoubleWritable(readDouble);
    }


    @Override
    public String toString() {

        return itemid.toString() + "\t" + amount.get();
        
    }

}

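5.2: Defining a GroupingComparator

As we know, all values with the same key are passed to the same reduce call. If the key is a JavaBean with several fields, the beans of one order are different keys and would not be grouped into a single reduce call. The grouping comparator makes all beans of the same order go into one reduce call.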

package com.zxj.hadoop.demo.mapreduce.groupingcomporator;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

/**
 * @Author 朱小杰
 * Date 2017-08-26 .17:31
 * Description a reduce-side GroupingComparator that treats a group of beans as the same key
 * (used for grouping)
 * @author
 */
public class ItemidGroupingComparator extends WritableComparator {

    /**
     * This constructor is required: it tells the framework which class to instantiate (via reflection) when deserializing keys to compare
     */
    protected ItemidGroupingComparator() {
        super(OrderBean.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        OrderBean b1 = (OrderBean) a;
        OrderBean b2 = (OrderBean) b;
        //compare only the itemid field; if it is equal, MapReduce treats the two beans as the same key (one group)
        return b1.getItemid().compareTo(b2.getItemid());
    }
}

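We also know that MapReduce sorts by key. Because the grouping comparator treats all beans of one order as a single key, each order produces only one reduce call, and because of the sort order defined in compareTo (amount descending within an order), the bean with the largest amount comes first and is the key passed to that call.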
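The combined effect of the sort order and the grouping comparator can be simulated in plain Java: sort the records by (itemid ascending, amount descending) as OrderBean.compareTo does, then start a new group only when the itemid changes, as ItemidGroupingComparator does. The first record of each group is the key that reduce() receives. This sketch (MaxPerOrderSketch is an illustrative name) assumes a single reducer; with several reducers a custom Partitioner on itemid would also be needed, because the grouping comparator only groups keys that have already reached the same reducer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of secondary sort + grouping: max amount per order. */
class MaxPerOrderSketch {
    static Map<String, Double> maxPerOrder(List<String> lines) {
        List<String[]> keys = new ArrayList<>();
        for (String line : lines) {
            keys.add(line.split(","));                  // {itemid, amount}
        }
        // sort comparator, mirroring OrderBean.compareTo: itemid ascending, then amount descending
        keys.sort((a, b) -> {
            int cmp = a[0].compareTo(b[0]);
            if (cmp == 0) {
                cmp = Double.compare(Double.parseDouble(b[1]), Double.parseDouble(a[1]));
            }
            return cmp;
        });
        // grouping comparator, mirroring ItemidGroupingComparator: a new group starts only when itemid changes
        Map<String, Double> result = new LinkedHashMap<>();
        String currentGroup = null;
        for (String[] k : keys) {
            if (!k[0].equals(currentGroup)) {
                currentGroup = k[0];
                // one "reduce call" per group; its key is the group's first (largest) record
                result.put(k[0], Double.parseDouble(k[1]));
            }
        }
        return result;
    }
}
```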

5.3: The map code

The map code is simple:

    static class SecondarySortMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable>{
        
        OrderBean bean = new OrderBean();
        
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            String line = value.toString();
            String[] fields = StringUtils.split(line, ",");
            
            bean.set(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[1])));
            
            context.write(bean, NullWritable.get());
            
        }
        
    }

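The mapper simply emits the bean as the key and NullWritable as the value.

5.4: The reduce code

    static class SecondarySortReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable>{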
        
        
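        //By the time reduce is called, all beans with the same itemid form one group and the one with the largest amount comes first, so it is the key seen here; the rest of the group is not needed
        @Override
        protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {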
            context.write(key, NullWritable.get());
        }
    }

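As explained above, each order produces a single reduce call whose key is the bean with the largest amount, so we can write it out directly.

5.5: The driver class

The driver differs slightly from the previous ones: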

    public static void main(String[] args) throws Exception {
        
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        
        job.setJarByClass(SecondarySort.class);
        
        job.setMapperClass(SecondarySortMapper.class);
        job.setReducerClass(SecondarySortReducer.class);
        
        
        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);
        
        FileInputFormat.setInputPaths(job, new Path("D:\\mr\\groupcompatrator\\input"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\mr\\groupcompatrator\\out1"));
        
        //register the custom GroupingComparator class here
        job.setGroupingComparatorClass(ItemidGroupingComparator.class);
        
        job.waitForCompletion(true);
        
    }

Run it and the result looks like this:

6: Custom output location


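So far the output has always been saved to a file system, with MapReduce doing the writing for us. Can it be written somewhere else, such as a relational database, a cache, or Hive? Yes, it can.

6.1: A custom FileOutputFormat

The driver's main method has always contained a line like this: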

FileOutputFormat.setOutputPath(job, new Path("D:\\mr\\wordcount\\out1"));

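This line specifies the output location. As you might guess, the output format in use is FileOutputFormat or one of its subclasses (by default TextOutputFormat). So we extend it; it is an abstract class.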

package com.zxj.hadoop.demo.mapreduce.outputformat;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

/**
 * @Author 朱小杰
 * Date 2017-08-26 .19:08
 * Description when writing output, MapReduce first calls getRecordWriter() on this class to obtain a RecordWriter, then calls that object's write method for each record
 */
public class MyOutputFormat extends FileOutputFormat<Text, LongWritable> {
    @Override
    public RecordWriter<Text, LongWritable> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {
        return new MyRecordWriter();
    }

    /**
     * Custom RecordWriter: writes "key value" lines to a fixed local file.
     * Every task using this format opens the same path, so this is only safe with a single reduce task.
     */
    static class MyRecordWriter extends RecordWriter<Text, LongWritable> {
        private final BufferedWriter writer;
        public MyRecordWriter() throws IOException {
            //fail fast instead of swallowing the exception and hitting a NullPointerException later in write()
            writer = new BufferedWriter(new FileWriter("d:/myFileFormat"));
        }

        @Override
        public void write(Text key, LongWritable value) throws IOException, InterruptedException {
            writer.write(key.toString() + " " + value.toString());
            writer.newLine();
            writer.flush();
        }

        @Override
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            writer.close();
        }
    }
}

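In the code above we defined our own OutputFormat and wrote the output to drive D:. In the same way, if we want to write to a relational database, a cache, or any other store, we can extend this class and are no longer limited to the file system.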
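To make the "any destination" point concrete, here is a minimal plain-Java sketch that runs without Hadoop. SimpleRecordWriter is a hypothetical stand-in for Hadoop's RecordWriter (same write/close idea, simplified signatures), and a Map plays the role of an external cache or database table. The writer buffers records and pushes them in batches, a common choice when every write to an external store costs a network round trip; close() flushes whatever is left.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.RecordWriter<K, V>
// (the real close() also takes a TaskAttemptContext and both methods may throw IOException).
abstract class SimpleRecordWriter<K, V> {
    abstract void write(K key, V value);

    abstract void close();
}

// Writes word counts into a Map that stands in for an external cache or database table.
class CacheRecordWriter extends SimpleRecordWriter<String, Long> {
    private final Map<String, Long> store;
    private final List<Map.Entry<String, Long>> buffer = new ArrayList<>();
    private final int batchSize;

    CacheRecordWriter(Map<String, Long> store, int batchSize) {
        this.store = store;
        this.batchSize = batchSize;
    }

    @Override
    void write(String key, Long value) {
        buffer.add(new AbstractMap.SimpleEntry<>(key, value));
        if (buffer.size() >= batchSize) {
            flush(); // one round trip per batch instead of one per record
        }
    }

    private void flush() {
        for (Map.Entry<String, Long> e : buffer) {
            store.put(e.getKey(), e.getValue());
        }
        buffer.clear();
    }

    @Override
    void close() {
        flush(); // the last, partial batch must not be lost
    }
}
```

Forgetting the flush in close() is the classic bug with this design: the job reports success, but the tail of the output never reaches the store.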

Wiring MyOutputFormat into the job takes only one line:

        Job job = Job.getInstance(conf);


        // use the custom OutputFormat
        job.setOutputFormatClass(MyOutputFormat.class);

Here we set the output format. Even though the custom format writes its data to the root of drive D:, the input and output path arguments must still be set, i.e. these two lines:

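        // input location (hard-coded here; it could also be taken from the program arguments)
        FileInputFormat.setInputPaths(job, new Path("D:\\mr\\wordcount\\input"));
        // output directory (hard-coded here; it could also be taken from the program arguments)
        FileOutputFormat.setOutputPath(job, new Path("D:\\mr\\wordcount\\out1"));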

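You might wonder: specifying the input makes sense, but why the output? Because we extend FileOutputFormat, the job must have an output directory: before the job runs, FileOutputFormat checks that an output directory is set and that it does not exist yet. The directory still receives a file, but not the data: after a successful run it holds only a marker file (_SUCCESS) indicating the outcome. The location set inside the custom format is where the data really goes.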
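This behavior can be modeled in plain Java. The sketch below is an imitation, not Hadoop's code, and the class name is chosen for illustration: checkOutputSpecs mirrors the check FileOutputFormat performs before the job starts (output directory set and not yet existing, which is why rerunning a job into the same out1 directory fails), and commit mirrors the empty _SUCCESS marker a successful job leaves behind.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Plain-Java imitation of how FileOutputFormat treats the output directory.
class OutputDirCheck {

    // like checkOutputSpecs(): the directory must be set and must not exist yet
    static void checkOutputSpecs(Path outDir) throws IOException {
        if (outDir == null) {
            throw new IllegalStateException("Output directory not set.");
        }
        if (Files.exists(outDir)) {
            throw new FileAlreadyExistsException("Output directory " + outDir + " already exists");
        }
    }

    // like a successful job commit: the directory ends up holding an empty _SUCCESS marker
    static Path commit(Path outDir) throws IOException {
        Files.createDirectories(outDir);
        return Files.createFile(outDir.resolve("_SUCCESS"));
    }
}
```

In practice this means deleting out1 (or choosing a new path) before every rerun, even though the real data goes elsewhere.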

Here is the complete main method of the driver. A custom output format does not affect the map or reduce code, so those are not shown.

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // "local" is already the default, so this line is optional
        conf.set("mapreduce.framework.name", "local");
        // in local mode, input and output can be on the local disk or in HDFS, depending on the settings below
        // the local file system is the default, so this can be omitted
        //conf.set("fs.defaultFS","file:///");
        //conf.set("fs.defaultFS","hdfs://server1:9000/");

        Job job = Job.getInstance(conf);

        // lets Hadoop locate the jar that contains this class
        job.setJarByClass(Driver.class);

        // use the custom OutputFormat
        job.setOutputFormatClass(MyOutputFormat.class);

        // Mapper class
        job.setMapperClass(WordCountMapper.class);
        // Reducer class
        job.setReducerClass(WordCountReduce.class);

        // map output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // input location (hard-coded here; it could also be taken from the program arguments)
        FileInputFormat.setInputPaths(job, new Path("D:\\mr\\wordcount\\input"));
        // output directory (hard-coded here; it could also be taken from the program arguments)
        FileOutputFormat.setOutputPath(job, new Path("D:\\mr\\wordcount\\out1"));

        // submit the job without waiting for it to finish
        //job.submit();
        try {
            // true: block until the job finishes and print its progress
            job.waitForCompletion(true);
        } catch (ClassNotFoundException | InterruptedException e) {
            e.printStackTrace();
        }
    }

The only change is the job.setOutputFormatClass(...) line. Then write any word-count program and run it. First, look at the directory passed to FileOutputFormat.setOutputPath():

Clearly, this is the file MapReduce writes when the run completes to indicate its result, not the data itself.

Now look at drive D:

Opening the file shows the final result.

That is all for custom output. By implementing this class, we are free to choose where the data is stored.

7: Custom input data

To be added...

8: Global counters

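When running MapReduce jobs we often need counters, for example to know how many records were processed and how many invalid records were dropped. Counters are updated inside each task and summed by the framework across all tasks, so the totals are global to the job.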

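import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MultiOutputs {
    // custom counters defined as an enum
    enum MyCounter { MALFORMED, NORMAL }

    static class CommaMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            // example validity rule (an assumption): blank lines count as malformed and are skipped
            if (line.trim().isEmpty()) {
                context.getCounter(MyCounter.MALFORMED).increment(1);
                return;
            }

            for (String word : line.split(",")) {
                context.write(new Text(word), new LongWritable(1));
            }
            // increment an enum-based counter by 1
            context.getCounter(MyCounter.NORMAL).increment(1);
            // increment a counter created on the fly from a group name and a counter name
            context.getCounter("counterGroupa", "countera").increment(1);
            // set a counter to an absolute value instead of incrementing it
            context.getCounter("counterGroupa", "counterb").setValue(10);
        }
    }
}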
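The bookkeeping behind counters can be modeled in plain Java, using nothing beyond the JDK. CounterSketch is a hypothetical class, not Hadoop's Counters API: each counter lives in a group, an enum counter uses the enum's class name as its group, increment adds, setValue overwrites, and addAll shows why the counters are "global": the framework sums what every task reports into job-wide totals.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of MapReduce counters: every counter belongs to a group and has a name.
class CounterSketch {
    enum MyCounter { MALFORMED, NORMAL }

    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    private Map<String, Long> group(String name) {
        return groups.computeIfAbsent(name, g -> new TreeMap<>());
    }

    // like context.getCounter("group", "name").increment(n)
    void increment(String group, String name, long n) {
        group(group).merge(name, n, Long::sum);
    }

    // like context.getCounter(MyCounter.X).increment(n): the enum's class name is the group
    void increment(Enum<?> counter, long n) {
        increment(counter.getDeclaringClass().getName(), counter.name(), n);
    }

    // like getCounter(...).setValue(v): overwrite instead of add
    void setValue(String group, String name, long v) {
        group(group).put(name, v);
    }

    long get(String group, String name) {
        return group(group).getOrDefault(name, 0L);
    }

    // what the framework does at job level: sum the counters reported by all tasks
    void addAll(CounterSketch taskCounters) {
        taskCounters.groups.forEach((g, counters) ->
                counters.forEach((name, value) -> increment(g, name, value)));
    }
}
```

Enum counters are usually the safer choice: a typo in a string group or counter name silently creates a new counter, whereas a typo in an enum constant does not compile.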

9: Chaining multiple jobs and defining their execution order

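Remember the MR programs we wrote earlier for computing QQ friends and for finding the most frequent word in a novel? Each of them used two MapReduce jobs, with the second job working on the output of the first.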

Back then we waited for the first job to finish and started the second one by hand, but this step can be automated: several jobs can be linked together.

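import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class JobChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // create and configure each job as usual (mapper, reducer, input/output paths);
        // the second job's input path is normally the first job's output path
        Job job1 = Job.getInstance(conf, "job1");
        Job job2 = Job.getInstance(conf, "job2");
        Job job3 = Job.getInstance(conf, "job3");

        ControlledJob cJob1 = new ControlledJob(job1.getConfiguration());
        ControlledJob cJob2 = new ControlledJob(job2.getConfiguration());
        ControlledJob cJob3 = new ControlledJob(job3.getConfiguration());

        cJob1.setJob(job1);
        cJob2.setJob(job2);
        cJob3.setJob(job3);

        // declare the dependencies between jobs
        cJob2.addDependingJob(cJob1); // job 2 depends on job 1
        cJob3.addDependingJob(cJob2); // job 3 depends on job 2

        JobControl jobControl = new JobControl("RecommendationJob");
        jobControl.addJob(cJob1);
        jobControl.addJob(cJob2);
        jobControl.addJob(cJob3);

        // JobControl is a Runnable: run it in its own thread and poll until all jobs have finished
        Thread jobControlThread = new Thread(jobControl);
        jobControlThread.start();
        while (!jobControl.allFinished()) {
            Thread.sleep(500);
        }
        jobControl.stop();

        // report failed jobs, if any
        if (!jobControl.getFailedJobList().isEmpty()) {
            System.err.println("Failed jobs: " + jobControl.getFailedJobList());
            System.exit(1);
        }
    }
}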
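The rule JobControl applies can be sketched in plain Java. JobChainSketch is a hypothetical model, not the Hadoop class: a job starts only once all the jobs it depends on have succeeded, and if a dependency fails the dependent job is never started (ControlledJob reports this state as DEPENDENT_FAILED). Unlike the real JobControl, which can run independent jobs at the same time, this sketch runs them one after another, and the runner callback stands in for actually submitting a job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model of JobControl's scheduling rule (jobs run one at a time here).
class JobChainSketch {
    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    private final Map<String, List<String>> deps = new LinkedHashMap<>();
    private final Map<String, State> states = new LinkedHashMap<>();

    void addJob(String name, String... dependsOn) {
        deps.put(name, Arrays.asList(dependsOn));
        states.put(name, State.WAITING);
    }

    /** Runs jobs in dependency order; runner returns true if a job succeeded. Returns the start order. */
    List<String> run(Predicate<String> runner) {
        List<String> started = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (String job : deps.keySet()) {
                if (states.get(job) != State.WAITING) {
                    continue;
                }
                boolean ready = true;
                for (String d : deps.get(job)) {
                    State s = states.get(d);
                    if (s == State.FAILED || s == State.DEPENDENT_FAILED) {
                        // a failed dependency means this job will never be started
                        states.put(job, State.DEPENDENT_FAILED);
                        progress = true;
                        ready = false;
                        break;
                    }
                    if (s != State.SUCCESS) {
                        ready = false; // still waiting for this dependency
                    }
                }
                if (ready) {
                    started.add(job);
                    states.put(job, runner.test(job) ? State.SUCCESS : State.FAILED);
                    progress = true;
                }
            }
        }
        return started;
    }

    State stateOf(String job) {
        return states.get(job);
    }
}
```

The order in which jobs are added does not matter; only the declared dependencies decide when each one may start.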

10: Tuning MapReduce parameters

10.1: Resource-related parameters

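//The following parameters take effect when set in your own MR application
(1) mapreduce.map.memory.mb: upper limit of resources (in MB) a Map Task may use; default 1024. If a Map Task actually uses more than this, it is forcibly killed.
(2) mapreduce.reduce.memory.mb: upper limit of resources (in MB) a Reduce Task may use; default 1024. If a Reduce Task actually uses more than this, it is forcibly killed.
(3) mapreduce.map.java.opts: JVM options for Map Tasks, where you can set the Java heap size and similar options, e.g.
"-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc" (Hadoop replaces @taskid@ with the actual task id); default: "". Keep -Xmx below mapreduce.map.memory.mb, because the JVM also uses memory outside the heap.
(4) mapreduce.reduce.java.opts: JVM options for Reduce Tasks, where you can set the Java heap size and similar options, e.g.
"-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc"; default: ""
(5) mapreduce.map.cpu.vcores: maximum number of CPU cores each Map task may use; default 1
(6) mapreduce.reduce.cpu.vcores: maximum number of CPU cores each Reduce task may use; default 1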

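//These must be set in the server configuration files before YARN starts in order to take effect
(7) yarn.scheduler.minimum-allocation-mb      1024   minimum memory allocated to an application container
(8) yarn.scheduler.maximum-allocation-mb      8192   maximum memory allocated to an application container
(9) yarn.scheduler.minimum-allocation-vcores    1    minimum number of virtual cores per container
(10) yarn.scheduler.maximum-allocation-vcores   32   maximum number of virtual cores per container
(11) yarn.nodemanager.resource.memory-mb   8192   total memory a NodeManager makes available to containers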

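//Key parameters for shuffle performance; they can be set cluster-wide in mapred-site.xml and overridden per job
(12) mapreduce.task.io.sort.mb   100         //size of the map-side circular (sort) buffer used by the shuffle, default 100 MB
(13) mapreduce.map.sort.spill.percent   0.8    //fill ratio at which the circular buffer starts spilling to disk, default 80%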
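A little arithmetic shows how these numbers interact. The helpers below are plain Java and only approximate. containerMemoryMb assumes the Hadoop 2.x behavior of rounding a request (for example mapreduce.map.memory.mb) up to a multiple of yarn.scheduler.minimum-allocation-mb, as the Capacity Scheduler does; other schedulers may round differently. spillThresholdBytes multiplies mapreduce.task.io.sort.mb by mapreduce.map.sort.spill.percent, and estimatedSpills ignores the per-record metadata that shares the sort buffer. With the defaults above, a 1500 MB map request becomes a 2048 MB container, four such containers fit on an 8192 MB NodeManager, and a 100 MB buffer starts spilling at 80 MB.

```java
// Back-of-the-envelope arithmetic for the resource and shuffle parameters above.
class ResourceMath {

    /**
     * Roughly how the YARN scheduler sizes a container in Hadoop 2.x (e.g. the Capacity
     * Scheduler): round the request up to a multiple of the minimum allocation.
     * Requests above the maximum allocation are rejected rather than shrunk.
     */
    static int containerMemoryMb(int requestedMb, int minAllocMb, int maxAllocMb) {
        if (requestedMb > maxAllocMb) {
            throw new IllegalArgumentException("requested " + requestedMb
                    + " MB exceeds yarn.scheduler.maximum-allocation-mb = " + maxAllocMb);
        }
        int rounded = ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
        return Math.max(rounded, minAllocMb);
    }

    /** How many containers of that size fit into yarn.nodemanager.resource.memory-mb. */
    static int containersPerNode(int nodeMemoryMb, int containerMb) {
        return nodeMemoryMb / containerMb;
    }

    /** Bytes the map-side sort buffer holds before a spill starts (io.sort.mb * spill.percent). */
    static long spillThresholdBytes(int sortMb, double spillPercent) {
        return (long) (sortMb * 1024L * 1024L * spillPercent);
    }

    /**
     * Rough spill count for a map task emitting outputBytes of serialized records.
     * Ignores the per-record metadata that also occupies the buffer, so real counts can be higher.
     */
    static long estimatedSpills(long outputBytes, int sortMb, double spillPercent) {
        long threshold = spillThresholdBytes(sortMb, spillPercent);
        return (outputBytes + threshold - 1) / threshold; // ceiling division
    }
}
```

For example, raising mapreduce.task.io.sort.mb from 100 to 200 roughly halves the number of spill files for a map task with 300 MB of output (4 down to 2 by this estimate), at the cost of a larger heap requirement for that task.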

10.2: Fault-tolerance parameters

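(1) mapreduce.map.maxattempts: maximum number of attempts for each Map Task; once this many attempts have failed, the Map Task is considered failed. Default: 4.
(2) mapreduce.reduce.maxattempts: maximum number of attempts for each Reduce Task; once this many attempts have failed, the Reduce Task is considered failed. Default: 4.
(3) mapreduce.map.failures.maxpercent: if the percentage of failed Map Tasks exceeds this value, the whole job fails; default 0. If your application can tolerate losing part of its input, set it to a value greater than 0, e.g. 5: as long as no more than 5% of the Map Tasks fail, the job is still considered successful (a Map Task counts as failed once it exceeds mapreduce.map.maxattempts, and its input then produces no output).
(4) mapreduce.reduce.failures.maxpercent: if the percentage of failed Reduce Tasks exceeds this value, the whole job fails; default 0.
(5) mapreduce.task.timeout: task timeout, a parameter that often needs tuning. If a task makes no progress for this long (it neither reads new input nor writes any output), it is considered blocked, possibly forever. To keep a user program that blocks forever from never exiting, this timeout (in milliseconds) is enforced; the default is 300000. If your program spends a long time on each input record (for example, accessing a database or fetching data over the network), increase this value, or report progress from the task with context.progress(). When the value is too small, a typical error is "AttemptID:attempt_14267829456721_123456_m_000224_0 Timed out after 300 secsContainer killed by the ApplicationMaster.".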
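The two failure rules above can be written out in plain Java. This is a simplified model with hypothetical names, not Hadoop's code: a task is given up on once its failed attempts reach maxattempts, and the job fails when failedTasks * 100 > maxPercent * totalTasks, so with the default of 0 a single failed task fails the whole job.

```java
// Simplified model of the task-level and job-level failure rules above.
class FailureTolerance {

    // a task is given up on once this many of its attempts have failed (mapreduce.*.maxattempts)
    static boolean taskFailed(int failedAttempts, int maxAttempts) {
        return failedAttempts >= maxAttempts;
    }

    // job-level rule for mapreduce.map.failures.maxpercent / mapreduce.reduce.failures.maxpercent
    static boolean jobFails(int failedTasks, int totalTasks, int maxPercent) {
        return failedTasks * 100L > (long) maxPercent * totalTasks;
    }
}
```

With maxpercent = 5 and 100 map tasks, up to 5 failed map tasks are tolerated and the sixth fails the job.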

10.3: Running MapReduce jobs locally

mapreduce.framework.name=local
mapreduce.jobtracker.address=local
fs.defaultFS=file:///

10.4: Efficiency and stability parameters

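(1) mapreduce.map.speculative: whether speculative execution is enabled for Map Tasks (a backup attempt of a slow task is launched and whichever attempt finishes first wins); default true
(2) mapreduce.reduce.speculative: whether speculative execution is enabled for Reduce Tasks; default true
(3) mapreduce.job.user.classpath.first & mapreduce.task.classpath.user.precedence: when the same class exists both in the user's jar and in Hadoop's jars, which one takes precedence; default false, meaning the class from Hadoop's jars is used.
(4) mapreduce.input.fileinputformat.split.minsize: the minimum split size FileInputFormat uses when computing splits
(5) mapreduce.input.fileinputformat.split.maxsize: the maximum split size FileInputFormat uses when computing splits (by default the split size equals the block size, i.e. 134217728 bytes = 128 MB)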
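The split size follows a simple formula, max(minSize, min(maxSize, blockSize)), which is what FileInputFormat.computeSplitSize() returns; with the defaults (minSize 1, maxSize Long.MAX_VALUE) it is just the block size. The plain-Java sketch below reproduces that formula and, for a single splittable file, the loop in FileInputFormat.getSplits() with its 10% slack (SPLIT_SLOP = 1.1), which lets a slightly oversized remainder stay in the last split. The class name is chosen for illustration.

```java
// Split-size rule of FileInputFormat, reproduced in plain Java.
class SplitSize {
    // FileInputFormat keeps cutting full splits while more than 1.1 splits' worth remains
    static final double SPLIT_SLOP = 1.1;

    // same formula as FileInputFormat.computeSplitSize(blockSize, minSize, maxSize)
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // number of splits for one splittable file of fileLength bytes
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the remainder (up to 1.1 x splitSize) becomes the last split
        }
        return splits;
    }
}
```

So to get fewer, larger splits, raise split.minsize above the block size; to get more, smaller splits, lower split.maxsize below it. A 140 MB file with 128 MB blocks still yields a single split because of the slack.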

Reposted from: https://www.cnblogs.com/zhuxiaojie/p/7384677.html
