MapReduce Workflow Explanation and Source Code Analysis

MapReduce Workflow Explanation

I. Introduction

From the user's point of view, you only need to write the map operation and the reduce operation.

MapReduce takes a relatively long time to compute over data; it is a batch-processing framework, not a low-latency one.

The whole process is divided into map and reduce: map processes the raw input data, and reduce processes the output of map.

II. How It Works

1. The map phase

block: a physical concept; the default block size is 128 MB.

split: the amount of data a single map task processes; by default the split size equals the block size (it can be tuned, as shown in the sketch after this list).

maptask: the map task; one split corresponds to one map task.

A map task processes split data, i.e. the original, unprocessed input.

kvbuffer: the in-memory destination for a map task's intermediate output. It is a circular (ring) buffer with a default size of 100 MB; a spill threshold is configured, 80% by default, and once the threshold is reached the buffer is spilled.

spill: writing the intermediate results in the circular buffer out to disk; for a large enough input this produces many small spill files of roughly 80 MB each.

partition: the number of partitions is exactly the same as the number of reduce tasks.

During a spill, the partition (i.e. the target reduce task) of each key is computed ahead of time.

sort: records are sorted first by partition, then quick-sorted by key within each partition.

merge: the many small spill files are merged into a single large file using merge sort, ordered first by partition and then by key.
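If the default "one split per block" behavior is not what you want, the minimum and maximum split sizes can be adjusted from the driver before the job is submitted. A minimal sketch, assuming a Job object named job like the one created in WordCountJob below (the 64 MB / 256 MB values are arbitrary examples, not recommendations):

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// splitSize = max(minSize, min(maxSize, blockSize)), so raising minSize enlarges splits
// and lowering maxSize shrinks them
FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // splits never smaller than 64 MB
FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);  // splits never larger than 256 MB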

2. The reduce phase

fetch: pull the intermediate data a reducer needs from the map task nodes.

reducetask: all temporary records with the same key must be pulled to the same reduce task for computation; one reduce task may receive several different keys, but every record for a given key must land in the same reduce task.

output: because each run produces different results, and to avoid problems with overly large output files, the results of each run are written to HDFS by default (a quick way to inspect them is shown right after this list).
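Once a job finishes, the reducer output can be inspected directly on HDFS. A minimal example, assuming the output directory pattern used in WordCountJob below; the exact directory name (it contains a timestamp) and the number of part-r-* files depend on your run and on the number of reduce tasks:

hdfs dfs -ls /shsxt/java/
hdfs dfs -cat /shsxt/java/武动乾坤_result<timestamp>/part-r-00000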

3. Setting up a MapReduce cluster

The setup below assumes an existing Hadoop HA (high availability) environment.

Edit the mapred-site.xml file

cp mapred-site.xml.template mapred-site.xml

vim mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

Edit the yarn-site.xml file

vim yarn-site.xml

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>mr_shsxt</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>node03</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>node01</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>node01:2181,node02:2181,node03:2181</value>
</property>

Copy the configuration files (mapred-site.xml and yarn-site.xml) to the other nodes

scp mapred-site.xml yarn-site.xml root@node02:`pwd`

scp mapred-site.xml yarn-site.xml root@node03:`pwd`

Start ZooKeeper first

zkServer.sh start

zkServer.sh status

Start DFS and YARN

start-all.sh   (equivalent to running start-dfs.sh followed by start-yarn.sh)

On the standby ResourceManager node, run:

yarn-daemon.sh start resourcemanager
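To verify that ResourceManager HA came up correctly, query the state of each ResourceManager; rm1 and rm2 are the ids configured in yarn-site.xml above, so one should report active and the other standby:

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2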

MapReduce Source Code Analysis

1. Submitting the job from the client

1.1 Creating the job class

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
* The main (driver) class of the job
* @author Administrator
*
*/
public class WordCountJob {
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// obtain the configuration object
Configuration configuration = new Configuration(true);
// create the job object for this run
Job job = Job.getInstance(configuration);
// set the main class of the job jar
job.setJarByClass(WordCountJob.class);
// set the job name
job.setJobName("shsxt-WorkCount");
// set the number of reduce tasks
job.setNumReduceTasks(3);
// set the input path (the file to process)
FileInputFormat.setInputPaths(job, new Path("/shsxt/java/武动乾坤.txt"));
// optionally set the split size
// CombineTextInputFormat.setMaxInputSplitSize(job, 1024 * 1024 * 100);
// CombineTextInputFormat.setMinInputSplitSize(job, 1024 * 1024 * 1);
// set the output path (where the results will be written)
FileOutputFormat.setOutputPath(job,
  new Path("/shsxt/java/武动乾坤_result" + System.currentTimeMillis()));
// define the map output key/value types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// set the Mapper class
job.setMapperClass(WordCountMapper.class);
// set the Reducer class
job.setReducerClass(WordCountReducer.class);
// submit the job and wait for completion
job.waitForCompletion(true);
}
// MapTask

}
  • The input file must already be uploaded to HDFS and the path must be correct; the job name set here must match the name of the exported jar package.
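To run the job on the cluster, the usual workflow is to export the three classes into a jar and submit it with the hadoop command. A minimal example; the jar file name is hypothetical and must match whatever you exported, and the main class may need its package prefix:

hadoop jar shsxt-WorkCount.jar WordCountJob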

1.2 Creating the mapper class

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
private Text text = new Text();
private IntWritable one = new IntWritable(1);
@Override
protected void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context) throws IOException, InterruptedException {
// split the line into words on non-word characters
String[] ss = value.toString().split("\\W+");
for (String string : ss) {
// skip the empty token that split() produces when the line starts with a delimiter
if (string.isEmpty()) {
continue;
}
text.set(string);
context.write(text, one);
}

}

}

1.3 Creating the reducer class

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private Text text = new Text("ly");   // leftover field, not used below
private int count = 1000;             // not used; shadowed by the local count in reduce()
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
// iterate over all values for this key
Iterator<IntWritable> iterator = values.iterator();
// create a counter
int count = 0;
// sum the values
while (iterator.hasNext()) {
count += iterator.next().get();
}
context.write(key, new IntWritable(count));
}
}
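Because this reducer only sums its input values, it can also be reused as a combiner so that each map task pre-aggregates its output before the shuffle, which cuts down the data transferred to the reducers. A minimal, optional addition to WordCountJob (this line is not in the original job class):

// run WordCountReducer on each map task's local output before the shuffle
job.setCombinerClass(WordCountReducer.class);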

2. Map source code analysis

1. split (input splitting)

Walking through job.waitForCompletion(true) in the Job class

/**
*Submit the job to the cluster and wait for it to finish
*/
job.waitForCompletion(true);
// JobState is an enum with only two states: public static enum JobState {DEFINE, RUNNING};
// if the state is DEFINE, the job is submitted directly
   if (state == JobState.DEFINE) {
     //(------------------------------submit())
       submit();
  }
//Monitor a job and print status in real-time as progress is made and tasks fail.
// monitor the current job's status in real time
if (verbose) {
     monitorAndPrintJob();

Walking through submit() in the Job class

//(------------------------------submit())
// check that the job is still in the DEFINE state
ensureState(JobState.DEFINE);
// switch to the new API
setUseNewAPI();
// create the Cluster object from the configuration
connect();
//(---------------------- connect)
if (cluster == null) {
     cluster =
       ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
                  public Cluster run()
                         throws IOException, InterruptedException,
                                ClassNotFoundException {
                    return new Cluster(getConfiguration());
                  }
                });
  }
// obtain the job submitter from the cluster's file system and client
final JobSubmitter submitter =
       getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
     // submit the job as the current user via ugi.doAs
    public JobStatus run() throws IOException, InterruptedException,
     ClassNotFoundException {
         // actually submit the job
         // go to (----------------- submitter)
       return submitter.submitJobInternal(Job.this, cluster);
    }
  });
// the job is now in the RUNNING state
state = JobState.RUNNING;

Walking through submitter.submitJobInternal

//Internal method for submitting jobs to the system
//validate the job's output specs
   checkSpecs(job);
// generate a new job id and set it on the job
JobID jobId = submitClient.getNewJobID();
   job.setJobID(jobId);
// set the job's submission directory
Path submitJobDir = new Path(jobStagingArea, jobId.toString());
//Create the splits for the job
//(---------------------------writeSplits)
int maps = writeSplits(job, submitJobDir);
 List<InputSplit> splits = input.getSplits(job);
// convert the list of splits into an array
   T[] array = (T[]) splits.toArray(new InputSplit[splits.size()]);
   // sort the splits into order based on size, so that the biggest go first
   Arrays.sort(array, new SplitComparator());
// write the split files
   JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
       jobSubmitDir.getFileSystem(conf), array);
// return the number of splits
   return array.length;
// finally submit the job (the number of splits, and therefore of map tasks, is now fixed)
status = submitClient.submitJob(
         jobId, submitJobDir.toString(), job.getCredentials());

Walking through writeSplits

// get the configuration
Configuration conf = job.getConfiguration();
//Create an object for the given class and initialize it from conf
// create the input format class via ReflectionUtils and initialize it with conf
//JobContext job = org.apache.hadoop.mapreduce.task.JobContextImpl
//input = org.apache.hadoop.mapreduce.lib.input.TextInputFormat.class
InputFormat<?> input =
     ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
// compute the splits through the input format object
// getSplits = org.apache.hadoop.mapreduce.lib.input.FileInputFormat
//Generate the list of files and make them into FileSplits
List<InputSplit> splits = input.getSplits(job);
// determine the minimum and maximum split sizes
long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
long maxSize = getMaxSplitSize(job);
// create the list of splits
List<InputSplit> splits = new ArrayList<InputSplit>();
// get the files to process
List<FileStatus> files = listStatus(job);
// iterate over the files
for (FileStatus file: files) {
    // get the file path
     Path path = file.getPath();
    // get the file length
     long length = file.getLen();
    // check whether the file is empty
     if (length != 0) {
        // the blocks that make up the current file
       BlockLocation[] blkLocations;
       if (file instanceof LocatedFileStatus) {
         blkLocations = ((LocatedFileStatus) file).getBlockLocations();
      } else {
         FileSystem fs = path.getFileSystem(job.getConfiguration());
         blkLocations = fs.getFileBlockLocations(file, 0, length);
      }
       // check whether the file can be split; compressed files (without a splittable codec) cannot
       if (isSplitable(job, path)) {
         //Get the block size of the file
         long blockSize = file.getBlockSize();
        // compute the final split size
        //(-----------------------------computeSplitSize)
         long splitSize = computeSplitSize(blockSize, minSize, maxSize);
         // bytes remaining to be assigned to splits
         long bytesRemaining = length;
           // keep cutting while the remainder is more than 1.1 times splitSize (SPLIT_SLOP);
           // with a 128 MB split size the last split is at most 140.8 MB, and a standalone tail split is always just over 12.8 MB
         while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
             // the block index for this split (0 for the first split)
           int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
             // add the split to the list, together with the hosts that hold the block
           splits.add(makeSplit(path, length-bytesRemaining, splitSize,
                       blkLocations[blkIndex].getHosts(),
                       blkLocations[blkIndex].getCachedHosts()));
             // subtract this split from the remaining length
           bytesRemaining -= splitSize;
        }
// the last split has size in (0, 1.1] * splitSize; add it to the list
         if (bytesRemaining != 0) {
           int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
           splits.add(makeSplit(path, length-bytesRemaining, bytesRemaining,
                      blkLocations[blkIndex].getHosts(),
                      blkLocations[blkIndex].getCachedHosts()));
        }
      } else {
           // not splitable
         splits.add(makeSplit(path, 0, length, blkLocations[0].getHosts(),
                     blkLocations[0].getCachedHosts()));
      }
    } else {
       //Create empty hosts array for zero length files
       // for a zero-length file, add a single split with an empty hosts array
       splits.add(makeSplit(path, 0, length, new String[0]));
    }
  }
// Save the number of input files for metrics/loadgen
   job.getConfiguration().setLong(NUM_INPUT_FILES, files.size());
   sw.stop();
   if (LOG.isDebugEnabled()) {
     LOG.debug("Total # of splits generated by getSplits: " + splits.size()
         + ", TimeTaken: " + sw.elapsedMillis());
  }
// return the list of splits
   return splits;
}
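A quick worked example of the SPLIT_SLOP (1.1) rule in the loop above, written as a standalone sketch rather than Hadoop code; the 300 MB and 130 MB file sizes are made up for illustration:

// assuming splitSize = 128 MB, as with the default block size
long splitSize = 128L * 1024 * 1024;
double SPLIT_SLOP = 1.1;
long bytesRemaining = 300L * 1024 * 1024;   // a 300 MB file
int numSplits = 0;
while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
    bytesRemaining -= splitSize;            // cut a full 128 MB split
    numSplits++;
}
if (bytesRemaining != 0) {
    numSplits++;                            // the 44 MB tail becomes the last split
}
// numSplits == 3 here; a 130 MB file would instead become a single 130 MB split,
// because 130 / 128 = 1.02 <= 1.1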

Walking through computeSplitSize

// how the split size is computed:
// to make splitSize larger than blockSize, set minSize larger than blockSize;
// to make splitSize smaller than blockSize, set maxSize smaller than blockSize
return Math.max(minSize, Math.min(maxSize, blockSize));
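Plugging in the defaults (minSize = 1, maxSize = Long.MAX_VALUE, blockSize = 128 MB) gives a split size equal to the block size; a minimal sketch of the two tuning directions, with arbitrary example values:

long blockSize = 128L * 1024 * 1024;
// default: splitSize == blockSize == 128 MB
long defaultSplit = Math.max(1L, Math.min(Long.MAX_VALUE, blockSize));
// raise minSize above blockSize to get 256 MB splits
long biggerSplit  = Math.max(256L * 1024 * 1024, Math.min(Long.MAX_VALUE, blockSize));
// lower maxSize below blockSize to get 64 MB splits
long smallerSplit = Math.max(1L, Math.min(64L * 1024 * 1024, blockSize));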

2. MapTask

// package: org.apache.hadoop.mapred (MapTask.run)
 public void run(final JobConf job, final TaskUmbilicalProtocol umbilical)
   throws IOException, ClassNotFoundException, InterruptedException {
   this.umbilical = umbilical;
   if (isMapTask()) {
     // If there are no reducers then there won't be any sort. Hence the map
     // phase will govern the entire attempt's progress.
      // if there are no reducers, only the map phase runs (no sort phase)
     if (conf.getNumReduceTasks() == 0) {
       mapPhase = getProgress().addPhase("map", 1.0f);
    } else {
       // If there are reducers then the entire attempt's progress will be
       // split between the map phase (67%) and the sort phase (33%).
          // register the sort (spill) phase
       mapPhase = getProgress().addPhase("map", 0.667f);
       sortPhase  = getProgress().addPhase("sort", 0.333f);
    }
  }
   TaskReporter reporter = startReporter(umbilical);
// check whether the new API is used
   boolean useNewApi = job.getUseNewMapper();
     // initialize task-level data
   initialize(job, getJobID(), reporter, useNewApi);

   // check if it is a cleanupJobTask
   if (jobCleanup) {
     runJobCleanupTask(umbilical, reporter);
     return;
  }
   if (jobSetup) {
     runJobSetupTask(umbilical, reporter);
     return;
  }
   if (taskCleanup) {
     runTaskCleanupTask(umbilical, reporter);
     return;
  }
//(------------------------------- runNewMapper)
   if (useNewApi) {
     runNewMapper(job, splitMetaInfo, umbilical, reporter);
  } else {
     runOldMapper(job, splitMetaInfo, umbilical, reporter);
  }
   done(umbilical, reporter);
}

Walking through runNewMapper

// create the task attempt context object
taskContext =
     new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job,
                                                                 getTaskID(),
                                                                 reporter);
// make a mapper
// create the mapper via reflection; in this job it is our WordCountMapper class
   org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE> mapper =
    (org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE>)
       ReflectionUtils.newInstance(taskContext.getMapperClass(), job);
// make the input format
// by default this is org.apache.hadoop.mapreduce.lib.input.TextInputFormat
   org.apache.hadoop.mapreduce.InputFormat<INKEY,INVALUE> inputFormat =
    (org.apache.hadoop.mapreduce.InputFormat<INKEY,INVALUE>)
       ReflectionUtils.newInstance(taskContext.getInputFormatClass(), job);
// the record reader; NewTrackingRecordReader wraps the real reader (decorator-style)
input = new NewTrackingRecordReader<INKEY,INVALUE>(split, inputFormat, reporter, taskContext);
// the real internal component that reads the data
//(-------------------------createRecordReader)
this.real = inputFormat.createRecordReader(split, taskContext);
// the output record writer
org.apache.hadoop.mapreduce.RecordWriter output = null;
   // get an output object
   if (job.getNumReduceTasks() == 0) {
     output =
       new NewDirectOutputCollector(taskContext, job, umbilical, reporter);
  } else {
     output = new NewOutputCollector(taskContext, job, umbilical, reporter);
  }
// create a sorting collector
collector = createSortingCollector(job, reporter);
// the collector class is org.apache.hadoop.mapred.MapOutputBuffer.class
MapOutputCollector<KEY, VALUE> collector =
         ReflectionUtils.newInstance(subclazz, job);
// initialize the collector
//(------------------------------ collector.init(context))
collector.init(context);
// create the partitioner, by default org.apache.hadoop.mapreduce.lib.partition.HashPartitioner.class
//(-------------------HashPartitioner.class)
partitions = jobContext.getNumReduceTasks();
     if (partitions > 1) {
       partitioner = (org.apache.hadoop.mapreduce.Partitioner<K,V>)
         ReflectionUtils.newInstance(jobContext.getPartitionerClass(), job);
    } else {
       partitioner = new org.apache.hadoop.mapreduce.Partitioner<K,V>() {
         @Override
         public int getPartition(K key, V value, int numPartitions) {
           return partitions - 1;
        }
// create the map context object
mapContext =
     new MapContextImpl<INKEY, INVALUE, OUTKEY, OUTVALUE>(job,
                                                          getTaskID(),
                                                          input, output,
                                                          committer,
                                                          reporter, split);
// wrap the map context for the mapper
mapperContext =
         new WrappedMapper<INKEY, INVALUE, OUTKEY, OUTVALUE>().getMapContext(
             mapContext);
try {
   // initialize the line record reader
   //(----------------------------------input.initialize)
     input.initialize(split, mapperContext);
   //(------------------mapper.run)
     mapper.run(mapperContext);
     mapPhase.complete();
     setPhase(TaskStatus.Phase.SORT);
   // update the task status
     statusUpdate(umbilical);
   // close the record reader
     input.close();
     input = null;
     output.close(mapperContext);
   // the output has been flushed; release the writer
     output = null;
  } finally {
     closeQuietly(input);
     closeQuietly(output, mapperContext);
  }

Walking through HashPartitioner.class

public class HashPartitioner<K, V> extends Partitioner<K, V> {

 /** Use {@link Object#hashCode()} to partition. */
 public int getPartition(K key, V value,
                         int numReduceTasks) {
     // mask the sign bit so the result is non-negative, then take the remainder modulo numReduceTasks
   return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}

}
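If the default hash-based routing is not appropriate, a custom partitioner can be plugged in with job.setPartitionerClass(...). A minimal sketch with an arbitrary routing rule (the class name and the rule are made up for illustration); like HashPartitioner, it must return a value in [0, numReduceTasks):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numReduceTasks) {
    if (key.getLength() == 0) {
      return 0;
    }
    // send words starting with a-m to one bucket, everything else to another
    char c = Character.toLowerCase(key.toString().charAt(0));
    int bucket = (c >= 'a' && c <= 'm') ? 0 : 1;
    return bucket % numReduceTasks;
  }
}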

Walking through collector.init(context)

  public void init(MapOutputCollector.Context context
                  ) throws IOException, ClassNotFoundException {
     job = context.getJobConf();
     reporter = context.getReporter();
     mapTask = context.getMapTask();
     mapOutputFile = mapTask.getMapOutputFile();
     sortPhase = mapTask.getSortPhase();
     spilledRecordsCounter = reporter.getCounter(TaskCounter.SPILLED_RECORDS);
     partitions = job.getNumReduceTasks();
     rfs = ((LocalFileSystem)FileSystem.getLocal(job)).getRaw();

     //sanity checks
     final float spillper =
       job.getFloat(JobContext.MAP_SORT_SPILL_PERCENT, (float)0.8);
     final int sortmb = job.getInt(JobContext.IO_SORT_MB, 100);
     indexCacheMemoryLimit = job.getInt(JobContext.INDEX_CACHE_MEMORY_LIMIT,
                                        INDEX_CACHE_MEMORY_LIMIT_DEFAULT);
      // validate the spill threshold (must be in (0, 1])
     if (spillper > (float)1.0 || spillper <= (float)0.0) {
       throw new IOException("Invalid \"" + JobContext.MAP_SORT_SPILL_PERCENT +
           "\": " + spillper);
    }
     if ((sortmb & 0x7FF) != sortmb) {
       throw new IOException(
           "Invalid \"" + JobContext.IO_SORT_MB + "\": " + sortmb);
    }
     sorter = ReflectionUtils.newInstance(job.getClass("map.sort.class",
           QuickSort.class, IndexedSorter.class), job);
     // buffers and accounting
     int maxMemUsage = sortmb << 20;
     maxMemUsage -= maxMemUsage % METASIZE;
      // the circular (ring) buffer, 100 MB by default
     kvbuffer = new byte[maxMemUsage];
     bufvoid = kvbuffer.length;
     kvmeta = ByteBuffer.wrap(kvbuffer)
        .order(ByteOrder.nativeOrder())
        .asIntBuffer();
     setEquator(0);
     bufstart = bufend = bufindex = equator;
     kvstart = kvend = kvindex;

     maxRec = kvmeta.capacity() / NMETA;
     softLimit = (int)(kvbuffer.length * spillper);
     bufferRemaining = softLimit;
     LOG.info(JobContext.IO_SORT_MB + ": " + sortmb);
     LOG.info("soft limit at " + softLimit);
     LOG.info("bufstart = " + bufstart + "; bufvoid = " + bufvoid);
     LOG.info("kvstart = " + kvstart + "; length = " + maxRec);

     // k/v serialization
     comparator = job.getOutputKeyComparator();
      // get the map output key and value classes
     keyClass = (Class<K>)job.getMapOutputKeyClass();
     valClass = (Class<V>)job.getMapOutputValueClass();
     serializationFactory = new SerializationFactory(job);
     keySerializer = serializationFactory.getSerializer(keyClass);
     keySerializer.open(bb);
     valSerializer = serializationFactory.getSerializer(valClass);
     valSerializer.open(bb);

     // output counters
     mapOutputByteCounter = reporter.getCounter(TaskCounter.MAP_OUTPUT_BYTES);
     mapOutputRecordCounter =
       reporter.getCounter(TaskCounter.MAP_OUTPUT_RECORDS);
     fileOutputByteCounter = reporter
        .getCounter(TaskCounter.MAP_OUTPUT_MATERIALIZED_BYTES);

     // compression
     if (job.getCompressMapOutput()) {
       Classextends CompressionCodec> codecClass =
         job.getMapOutputCompressorClass(DefaultCodec.class);
       codec = ReflectionUtils.newInstance(codecClass, job);
    } else {
       codec = null;
    }

     // combiner
     final Counters.Counter combineInputCounter =
       reporter.getCounter(TaskCounter.COMBINE_INPUT_RECORDS);
     combinerRunner = CombinerRunner.create(job, getTaskID(),
                                            combineInputCounter,
                                            reporter, null);
     if (combinerRunner != null) {
       final Counters.Counter combineOutputCounter =
         reporter.getCounter(TaskCounter.COMBINE_OUTPUT_RECORDS);
       combineCollector= new CombineOutputCollector<K,V>(combineOutputCounter, reporter, job);
    } else {
       combineCollector = null;
    }
     spillInProgress = false;
     minSpillsForCombine = job.getInt(JobContext.MAP_COMBINE_MIN_SPILLS, 3);
      // the spill thread
     spillThread.setDaemon(true);
     spillThread.setName("SpillThread");
     spillLock.lock();
     try {
       spillThread.start();
       while (!spillThreadRunning) {
         spillDone.await();
      }
    } catch (InterruptedException e) {
       throw new IOException("Spill thread failed to initialize", e);
    } finally {
       spillLock.unlock();
    }
     if (sortSpillException != null) {
       throw new IOException("Spill thread failed to initialize",sortSpillException);
    }
  }
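The 100 MB buffer (mapreduce.task.io.sort.mb) and the 0.8 spill threshold (mapreduce.map.sort.spill.percent) read in this init() method can be overridden per job. A minimal sketch in the driver, assuming the configuration object from WordCountJob; the values are examples, not recommendations:

// enlarge the in-memory sort buffer to 200 MB and only spill at 90% full
configuration.setInt("mapreduce.task.io.sort.mb", 200);
configuration.setFloat("mapreduce.map.sort.spill.percent", 0.90f);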

Walking through mapper.run

 public void run(Context context) throws IOException, InterruptedException {
   // task setup
    setup(context);
   try {
       // loop over the input records (mapContext --> MapContextImpl --> LineRecordReader.nextKeyValue())
    //(-------------------------context.nextKeyValue())
       while (context.nextKeyValue()) {
       // context.getCurrentKey() --> the offset of the current line
      // context.getCurrentValue() --> the contents of the current line
       map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
  } finally {
       // task cleanup
     cleanup(context);
  }
}

Walking through context.nextKeyValue()

 public boolean nextKeyValue() throws IOException {
   if (key == null) {
       // create the key object (a LongWritable)
     key = new LongWritable();
  }
   // the key is the byte offset at which the current line starts
   key.set(pos);
   if (value == null) {
     // create the value object (a Text)
     value = new Text();
  }
   int newSize = 0;
   // We always read one extra line, which lies outside the upper
   // split limit i.e. (end - 1)
   // except for the last split, we always read one extra line
   // getFilePosition() checks whether the current read position is still inside the split
   // in.needAdditionalRecordAfterSplit(): even past the end, the first line of the next split may still need to be read
   while (getFilePosition() <= end || in.needAdditionalRecordAfterSplit()) {
     if (pos == 0) {
       // at the very beginning of the file, skip a possible UTF-8 byte order mark
       newSize = skipUtfByteOrderMark();
    } else {
       // read one line and store it in value
       newSize = in.readLine(value, maxLineLength, maxBytesToConsume(pos));
       // advance the position to the start of the next line
       pos += newSize;
    }

     if ((newSize == 0) || (newSize < maxLineLength)) {
       break;
    }

     // line too long. try again
     LOG.info("Skipped line of size " + newSize + " at pos " +
              (pos - newSize));
  }
   if (newSize == 0) {
     key = null;
     value = null;
     return false;
  } else {
     return true;
  }
}

Walking through createRecordReader

  @Override
 public RecordReader<LongWritable, Text>
   createRecordReader(InputSplit split,
                      TaskAttemptContext context) {
   String delimiter = context.getConfiguration().get(
       "textinputformat.record.delimiter");
   byte[] recordDelimiterBytes = null;
   if (null != delimiter)
     recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
      // return a line record reader (LineRecordReader)
   return new LineRecordReader(recordDelimiterBytes);
}
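As createRecordReader shows, the record delimiter is taken from the textinputformat.record.delimiter property, so records do not have to be newline-terminated. A minimal sketch, assuming the configuration object from WordCountJob and a '|'-delimited input file (an arbitrary example):

// treat '|' instead of the newline as the end-of-record marker
configuration.set("textinputformat.record.delimiter", "|");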

Walking through input.initialize

 public void initialize(InputSplit genericSplit,
                           TaskAttemptContext context) throws IOException {
      FileSplit split = (FileSplit) genericSplit;
      Configuration job = context.getConfiguration();
      // the maximum number of bytes to read per line; the default is Integer.MAX_VALUE
      this.maxLineLength = job.getInt(MAX_LINE_LENGTH, Integer.MAX_VALUE);
      //The position of the first byte in the file to process
      // i.e. the offset of the first byte of this split within the file
      start = split.getStart();
      // the position just past the last byte of this split
      end = start + split.getLength();
      // the path of the file this split belongs to
      final Path file = split.getPath();
       // open the file and seek to the start of the split
       final FileSystem fs = file.getFileSystem(job);
       fileIn = fs.open(file);
       
       CompressionCodec codec = new CompressionCodecFactory(job).getCodec(file);
       if (null!=codec) {
         isCompressedInput = true;
         decompressor = CodecPool.getDecompressor(codec);
         if (codec instanceof SplittableCompressionCodec) {
           final SplitCompressionInputStream cIn =
            ((SplittableCompressionCodec)codec).createInputStream(
               fileIn, decompressor, start, end,
               SplittableCompressionCodec.READ_MODE.BYBLOCK);
           in = new CompressedSplitLineReader(cIn, job,
               this.recordDelimiterBytes);
           start = cIn.getAdjustedStart();
           end = cIn.getAdjustedEnd();
           filePosition = cIn;
        } else {
           in = new SplitLineReader(codec.createInputStream(fileIn,
               decompressor), job, this.recordDelimiterBytes);
           filePosition = fileIn;
        }
      } else {
         fileIn.seek(start);
         in = new UncompressedSplitLineReader(
             fileIn, job, this.recordDelimiterBytes, split.getLength());
         filePosition = fileIn;
      }
       // If this is not the first split, we always throw away first record
       // because we always (except the last split) read one extra line in
       // next() method.
       // if this is not the first split, skip the first (partial) line when reading;
       // every split except the last one reads one extra line into the next split
       if (start != 0) {
         start += in.readLine(new Text(), 0, maxBytesToConsume(start));
      }
       this.pos = start;
    }
     

3. Reduce source code analysis

1. ReduceTask

// package: org.apache.hadoop.mapred (ReduceTask.run)
public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
   throws IOException, InterruptedException, ClassNotFoundException {
   job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());

   if (isMapOrReduce()) {
     copyPhase = getProgress().addPhase("copy");
     sortPhase  = getProgress().addPhase("sort");
     reducePhase = getProgress().addPhase("reduce");
  }
   // start thread that will handle communication with parent
   TaskReporter reporter = startReporter(umbilical);
   
   boolean useNewApi = job.getUseNewReducer();
   //(------------------initialize(job, getJobID(), reporter, useNewApi))
   initialize(job, getJobID(), reporter, useNewApi);

   // check if it is a cleanupJobTask
   if (jobCleanup) {
     runJobCleanupTask(umbilical, reporter);
     return;
  }
   if (jobSetup) {
     runJobSetupTask(umbilical, reporter);
     return;
  }
   if (taskCleanup) {
     runTaskCleanupTask(umbilical, reporter);
     return;
  }
   
   // Initialize the codec
   codec = initCodec();
   //(---------------------------------RawKeyValueIterator)
   // the raw key/value iterator
   RawKeyValueIterator rIter = null;
   // the shuffle plugin; Shuffle.class is used by default
   // package: org.apache.hadoop.mapreduce.task.reduce.Shuffle.class
   ShuffleConsumerPlugin shuffleConsumerPlugin = null;
   
   Class combinerClass = conf.getCombinerClass();
   CombineOutputCollector combineCollector =
    (null != combinerClass) ?
    new CombineOutputCollector(reduceCombineOutputCounter, reporter, conf) : null;

   Class<? extends ShuffleConsumerPlugin> clazz =
         job.getClass(MRConfig.SHUFFLE_CONSUMER_PLUGIN, Shuffle.class, ShuffleConsumerPlugin.class);

   shuffleConsumerPlugin = ReflectionUtils.newInstance(clazz, job);
   LOG.info("Using ShuffleConsumerPlugin: " + shuffleConsumerPlugin);

   ShuffleConsumerPlugin.Context shuffleContext =
     new ShuffleConsumerPlugin.Context(getTaskID(), job, FileSystem.getLocal(job), umbilical,
                 super.lDirAlloc, reporter, codec,
                 combinerClass, combineCollector,
                 spilledRecordsCounter, reduceCombineInputCounter,
                 shuffledMapsCounter,
                 reduceShuffleBytes, failedShuffleCounter,
                 mergedMapOutputsCounter,
                 taskStatus, copyPhase, sortPhase, this,
                 mapOutputFile, localMapFiles);
   shuffleConsumerPlugin.init(shuffleContext);
//(---------------------------shuffleConsumerPlugin.run())
   rIter = shuffleConsumerPlugin.run();

   // free up the data structures
   mapOutputFilesOnDisk.clear();
   
   sortPhase.complete();                         // sort is complete
   setPhase(TaskStatus.Phase.REDUCE);
   statusUpdate(umbilical);
   // the map output key type
   Class keyClass = job.getMapOutputKeyClass();
   // the map output value type
   Class valueClass = job.getMapOutputValueClass();
   // the grouping comparator
   //(----------------------job.getOutputValueGroupingComparator())
   RawComparator comparator = job.getOutputValueGroupingComparator();

   if (useNewApi) {
     //(-------------------------------- runNewReducer)
     runNewReducer(job, umbilical, reporter, rIter, comparator,
                   keyClass, valueClass);
  } else {
     runOldReducer(job, umbilical, reporter, rIter, comparator,
                   keyClass, valueClass);
  }

   shuffleConsumerPlugin.close();
   done(umbilical, reporter);
}

Walking through initialize(job, getJobID(), reporter, useNewApi)

jobContext = new JobContextImpl(job, id, reporter);
taskContext = new TaskAttemptContextImpl(job, taskId, reporter);
outputFormat =ReflectionUtils.newInstance(taskContext.getOutputFormatClass(), job);
committer = outputFormat.getOutputCommitter(taskContext);

Walking through RawKeyValueIterator and shuffleConsumerPlugin.run()

 public RawKeyValueIterator run() throws IOException, InterruptedException {
   // Scale the maximum events we fetch per RPC call to mitigate OOM issues
   // on the ApplicationMaster when a thundering herd of reducers fetch events
  // cap the number of events fetched per RPC to avoid OOM (out-of-memory) problems
   int eventsPerReducer = Math.max(MIN_EVENTS_TO_FETCH,
       MAX_RPC_OUTSTANDING_EVENTS / jobConf.getNumReduceTasks());
   int maxEventsToFetch = Math.min(MAX_EVENTS_TO_FETCH, eventsPerReducer);

   // Start the map-completion events fetcher thread
   final EventFetcher<K,V> eventFetcher =
     new EventFetcher<K,V>(reduceId, umbilical, scheduler, this,
         maxEventsToFetch);
   eventFetcher.start();
   
   // Start the map-output fetcher threads
   boolean isLocal = localMapFiles != null;
  // if the map files are local, start 1 fetcher; otherwise start 5 fetchers by default
   final int numFetchers = isLocal ? 1 :
     jobConf.getInt(MRJobConfig.SHUFFLE_PARALLEL_COPIES, 5);
   Fetcher<K,V>[] fetchers = new Fetcher[numFetchers];
   if (isLocal) {
     fetchers[0] = new LocalFetcher<K, V>(jobConf, reduceId, scheduler,
         merger, reporter, metrics, this, reduceTask.getShuffleSecret(),
         localMapFiles);
     fetchers[0].start();
  } else {
     for (int i=0; i < numFetchers; ++i) {
       fetchers[i] = new Fetcher<K,V>(jobConf, reduceId, scheduler, merger,
                                      reporter, metrics, this,
                                      reduceTask.getShuffleSecret());
       // start() invokes run(), which in turn calls copyFromHost(host)
         fetchers[i].start();
    }
  }
   
   // Wait for shuffle to complete successfully
   while (!scheduler.waitUntilDone(PROGRESS_FREQUENCY)) {
     reporter.progress();
     
     synchronized (this) {
       if (throwable != null) {
         throw new ShuffleError("error in shuffle in " + throwingThreadName,
                                throwable);
      }
    }
  }

   // Stop the event-fetcher thread
   eventFetcher.shutDown();
   
   // Stop the map-output fetcher threads
   for (Fetcher<K,V> fetcher : fetchers) {
     fetcher.shutDown();
  }
   
   // stop the scheduler
   scheduler.close();

   copyPhase.complete(); // copy is already complete
   taskStatus.setPhase(TaskStatus.Phase.SORT);
   reduceTask.statusUpdate(umbilical);

   // Finish the on-going merges...
   RawKeyValueIterator kvIter = null;
   try {
     kvIter = merger.close();
  } catch (Throwable e) {
     throw new ShuffleError("Error while doing final merge " , e);
  }

   // Sanity check
   synchronized (this) {
     if (throwable != null) {
       throw new ShuffleError("error in shuffle in " + throwingThreadName,
                              throwable);
    }
  }
   
   return kvIter;
}
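The default of 5 fetcher threads seen above comes from MRJobConfig.SHUFFLE_PARALLEL_COPIES, i.e. the mapreduce.reduce.shuffle.parallelcopies property, and can be raised for jobs with many map tasks. A minimal sketch, assuming the configuration object from WordCountJob; the value is an example, not a recommendation:

// let each reducer copy map output from up to 10 hosts in parallel
configuration.setInt("mapreduce.reduce.shuffle.parallelcopies", 10);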

Walking through kvIter = merger.close()

kvIter = merger.close();
//(||||||||||||||||||||||||||||||||||||||||||||||||||||)
final RawComparator<K> comparator =
  (RawComparator<K>)job.getOutputKeyComparator();
return finalMerge(jobConf, rfs, memory, disk);
//(||||||||||||||||||||||||||||||||||||||||||||||||||||)
return Merger.merge(job, fs, keyClass, valueClass,
   finalSegments, finalSegments.size(), tmpDir,
   comparator, reporter, spilledRecordsCounter, null, null);
//(||||||||||||||||||||||||||||||||||||||||||||||||||||)
return merge(conf, fs, keyClass, valueClass, segments, mergeFactor, tmpDir,
   comparator, reporter, false, readsCounter, writesCounter,mergePhase);
//(||||||||||||||||||||||||||||||||||||||||||||||||||||)
return new MergeQueue<K, V>(conf, fs, segments, comparator, reporter,
   sortSegments,
   TaskType.REDUCE).merge(keyClass, valueClass,
                       mergeFactor, tmpDir,
                       readsCounter, writesCounter,
                       mergePhase);
//(||||||||||||||||||||||||||||||||||||||||||||||||||||)
// kvIter is ultimately an org.apache.hadoop.mapred.Merger.MergeQueue
org.apache.hadoop.mapred.Merger.MergeQueue

Walking through job.getOutputValueGroupingComparator()

// get the configured grouping comparator
public RawComparator getOutputValueGroupingComparator() {
   Class<? extends RawComparator> theClass = getClass(
     JobContext.GROUP_COMPARATOR_CLASS, null, RawComparator.class);
   // if no grouping comparator is configured, fall back to the key comparator
   if (theClass == null) {
     return getOutputKeyComparator();
  }
   
   return ReflectionUtils.newInstance(theClass, this);
}
// if no key comparator is set either, the comparator registered for the key class is used (mapreduce.map.output.key.class)
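Registering your own comparator here (job.setGroupingComparatorClass(...)) lets several distinct keys be fed into a single reduce() call, which is the basis of the secondary-sort pattern. A minimal sketch; the class name and the grouping rule are made up for illustration:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class WordGroupingComparator extends WritableComparator {
  protected WordGroupingComparator() {
    super(Text.class, true);   // create Text instances so compare() receives deserialized keys
  }

  @Override
  public int compare(WritableComparable a, WritableComparable b) {
    // group words by their first character only, so e.g. "hadoop" and "hdfs"
    // are handed to the same reduce() call
    String x = a.toString();
    String y = b.toString();
    char cx = x.isEmpty() ? '\0' : x.charAt(0);
    char cy = y.isEmpty() ? '\0' : y.charAt(0);
    return Character.compare(cx, cy);
  }
}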

Walking through runNewReducer

 private <INKEY,INVALUE,OUTKEY,OUTVALUE> void runNewReducer(JobConf job,
                    final TaskUmbilicalProtocol umbilical,
                    final TaskReporter reporter,
                    RawKeyValueIterator rIter,
                    RawComparator<INKEY> comparator,
                    Class<INKEY> keyClass,
                    Class<INVALUE> valueClass
                    ) throws IOException,InterruptedException,
                             ClassNotFoundException {
   // wrap value iterator to report progress.
   final RawKeyValueIterator rawIter = rIter;
   rIter = new RawKeyValueIterator() {
     public void close() throws IOException {
       rawIter.close();
    }
     public DataInputBuffer getKey() throws IOException {
       return rawIter.getKey();
    }
     public Progress getProgress() {
       return rawIter.getProgress();
    }
     public DataInputBuffer getValue() throws IOException {
       return rawIter.getValue();
    }
     public boolean next() throws IOException {
       boolean ret = rawIter.next();
       reporter.setProgress(rawIter.getProgress().getProgress());
       return ret;
    }
  };
   // make a task context so we can get the classes
   org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
     new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job,
         getTaskID(), reporter);
   // make a reducer
   // in our job this is WordCountReducer.class
   org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE> reducer =
    (org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE>)
       ReflectionUtils.newInstance(taskContext.getReducerClass(), job);
  // create the record writer
  // the real writer is an org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.LineRecordWriter
  // inside NewTrackingRecordWriter: this.real = new LineRecordWriter<K, V>(fileOut, keyValueSeparator);
   org.apache.hadoop.mapreduce.RecordWriter<OUTKEY,OUTVALUE> trackedRW =
     new NewTrackingRecordWriter<OUTKEY, OUTVALUE>(this, taskContext);
   job.setBoolean("mapred.skip.on", isSkipping());
   job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
    //(---------------------------- reducerContext)                            
   org.apache.hadoop.mapreduce.Reducer.Context
        reducerContext = createReduceContext(reducer, job, getTaskID(),
                                              rIter, reduceInputKeyCounter,
                                              reduceInputValueCounter,
                                              trackedRW,
                                              committer,
                                              reporter, comparator, keyClass,
                                              valueClass);
   try {
     //(----------------------reducer.run)
     reducer.run(reducerContext);
  } finally {
     trackedRW.close(reducerContext);
  }
}
 

Walking through reducerContext

//org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer
//org.apache.hadoop.mapreduce.task.ReduceContextImpl
reducerContext = new WrappedReducer-->new ReduceContextImpl

Walking through reducer.run

WrappedReducer.nextKey();
ReduceContextImpl.nextKey();
// deserialize the current key
key = keyDeserializer.deserialize(key);
// deserialize the current value
value = valueDeserializer.deserialize(value);
// check whether another record exists
hasMore = input.next();
// if there is a next record
if (hasMore) {
   // read the next key
     nextKey = input.getKey();
   // check whether the next key is the same as the current key
     nextKeyIsSame = comparator.compare(currentRawKey.getBytes(), 0,
                                    currentRawKey.getLength(),
                                    nextKey.getData(),
                                    nextKey.getPosition(),
                                    nextKey.getLength() - nextKey.getPosition()
                                        ) == 0;
  } else {
     nextKeyIsSame = false;
  }
   inputValueCounter.increment(1);
// context.getValues() returns an iterator over the values of the current key (return iterable;)
