tanggao1314

研究MapReduce源码之实现自定义LineRecordReader完成多行读取文件内容

TextInputFormat是Hadoop默认的数据输入格式,但是它只能一行一行的读记录，如果要读取多行怎么办？
很简单自己写一个输入格式，然后写一个对应的Recordreader就可以了，但是要实现确不是这么简单的

首先看看TextInputFormat是怎么实现一行一行读取的

大家看一看源码

public class TextInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text> 
    createRecordReader(InputSplit split,
                       TaskAttemptContext context) {
    String delimiter = context.getConfiguration().get(
        "textinputformat.record.delimiter");
    byte[] recordDelimiterBytes = null;
    if (null != delimiter)
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    return new LineRecordReader(recordDelimiterBytes);
  }
//这个对文件做压缩用的
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    final CompressionCodec codec =
      new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
    if (null == codec) {
      return true;
    }
    return codec instanceof SplittableCompressionCodec;
  }

}

我们只要看第一个createRecordReader方法即可，从源码分析可知，它new了一个LineRecordReader，那么我们再来看看LineRecordReader的源码，看看这小子的内部世界

public class LineRecordReader extends RecordReader<LongWritable, Text> {
  private static final Log LOG = LogFactory.getLog(LineRecordReader.class);
  public static final String MAX_LINE_LENGTH = 
    "mapreduce.input.linerecordreader.line.maxlength";

  private long start;
  private long pos;
  private long end;
  private SplitLineReader in;
  private FSDataInputStream fileIn;
  private Seekable filePosition;
  private int maxLineLength;
  private LongWritable key;
  private Text value;
  private boolean isCompressedInput;
  private Decompressor decompressor;
  private byte[] recordDelimiterBytes;

  public LineRecordReader() {
  }

  public LineRecordReader(byte[] recordDelimiter) {
    this.recordDelimiterBytes = recordDelimiter;
  }

  public void initialize(InputSplit genericSplit,
                         TaskAttemptContext context) throws IOException {
    FileSplit split = (FileSplit) genericSplit;
    Configuration job = context.getConfiguration();
    this.maxLineLength = job.getInt(MAX_LINE_LENGTH, Integer.MAX_VALUE);
    start = split.getStart();
    end = start + split.getLength();
    final Path file = split.getPath();

    // open the file and seek to the start of the split
    final FileSystem fs = file.getFileSystem(job);
    fileIn = fs.open(file);

    CompressionCodec codec = new CompressionCodecFactory(job).getCodec(file);
    if (null!=codec) {
      isCompressedInput = true; 
      decompressor = CodecPool.getDecompressor(codec);
      if (codec instanceof SplittableCompressionCodec) {
        final SplitCompressionInputStream cIn =
          ((SplittableCompressionCodec)codec).createInputStream(
            fileIn, decompressor, start, end,
            SplittableCompressionCodec.READ_MODE.BYBLOCK);
        in = new CompressedSplitLineReader(cIn, job,
            this.recordDelimiterBytes);
        start = cIn.getAdjustedStart();
        end = cIn.getAdjustedEnd();
        filePosition = cIn;
      } else {
        in = new SplitLineReader(codec.createInputStream(fileIn,
            decompressor), job, this.recordDelimiterBytes);
        filePosition = fileIn;
      }
    } else {
      fileIn.seek(start);
      in = new UncompressedSplitLineReader(
          fileIn, job, this.recordDelimiterBytes, split.getLength());
      filePosition = fileIn;
    }
    // If this is not the first split, we always throw away first record
    // because we always (except the last split) read one extra line in
    // next() method.
    if (start != 0) {
      start += in.readLine(new Text(), 0, maxBytesToConsume(start));
    }
    this.pos = start;
  }


  private int maxBytesToConsume(long pos) {
    return isCompressedInput
      ? Integer.MAX_VALUE
      : (int) Math.max(Math.min(Integer.MAX_VALUE, end - pos), maxLineLength);
  }

  private long getFilePosition() throws IOException {
    long retVal;
    if (isCompressedInput && null != filePosition) {
      retVal = filePosition.getPos();
    } else {
      retVal = pos;
    }
    return retVal;
  }

  private int skipUtfByteOrderMark() throws IOException {
    // Strip BOM(Byte Order Mark)
    // Text only support UTF-8, we only need to check UTF-8 BOM
    // (0xEF,0xBB,0xBF) at the start of the text stream.
    int newMaxLineLength = (int) Math.min(3L + (long) maxLineLength,
        Integer.MAX_VALUE);
    int newSize = in.readLine(value, newMaxLineLength, maxBytesToConsume(pos));
    // Even we read 3 extra bytes for the first line,
    // we won't alter existing behavior (no backwards incompat issue).
    // Because the newSize is less than maxLineLength and
    // the number of bytes copied to Text is always no more than newSize.
    // If the return size from readLine is not less than maxLineLength,
    // we will discard the current line and read the next line.
    pos += newSize;
    int textLength = value.getLength();
    byte[] textBytes = value.getBytes();
    if ((textLength >= 3) && (textBytes[0] == (byte)0xEF) &&
        (textBytes[1] == (byte)0xBB) && (textBytes[2] == (byte)0xBF)) {
      // find UTF-8 BOM, strip it.
      LOG.info("Found UTF-8 BOM and skipped it");
      textLength -= 3;
      newSize -= 3;
      if (textLength > 0) {
        // It may work to use the same buffer and not do the copyBytes
        textBytes = value.copyBytes();
        value.set(textBytes, 3, textLength);
      } else {
        value.clear();
      }
    }
    return newSize;
  }

  public boolean nextKeyValue() throws IOException {
    if (key == null) {
      key = new LongWritable();
    }
    key.set(pos);
    if (value == null) {
      value = new Text();
    }
    int newSize = 0;
    // We always read one extra line, which lies outside the upper
    // split limit i.e. (end - 1)
    while (getFilePosition() <= end || in.needAdditionalRecordAfterSplit()) {
      if (pos == 0) {
        newSize = skipUtfByteOrderMark();
      } else {
        newSize = in.readLine(value, maxLineLength, maxBytesToConsume(pos));
        pos += newSize;
      }

      if ((newSize == 0) || (newSize < maxLineLength)) {
        break;
      }

      // line too long. try again
      LOG.info("Skipped line of size " + newSize + " at pos " + 
               (pos - newSize));
    }
    if (newSize == 0) {
      key = null;
      value = null;
      return false;
    } else {
      return true;
    }
  }

  @Override
  public LongWritable getCurrentKey() {
    return key;
  }

  @Override
  public Text getCurrentValue() {
    return value;
  }

  /** * Get the progress within the split */
  public float getProgress() throws IOException {
    if (start == end) {
      return 0.0f;
    } else {
      return Math.min(1.0f, (getFilePosition() - start) / (float)(end - start));
    }
  }

  public synchronized void close() throws IOException {
    try {
      if (in != null) {
        in.close();
      }
    } finally {
      if (decompressor != null) {
        CodecPool.returnDecompressor(decompressor);
      }
    }
  }
}

（里面96-97行
in = new CompressedSplitLineReader(cIn, job, this.recordDelimiterBytes);

108-109行
in = new CompressedSplitLineReader(cIn, job,this.recordDelimiterBytes);

这是用来压缩的，大家不用管，知道有这回事就行了
其实这两个类都继承自SplitLineReader，是与压缩有关的，后面我们自定义的时候不用改，粘贴复制过来就行。
）
从上面发现了一个问题，看源码的第57行

private SplitLineReader in;

它引入了一个SplitLineReader 类,用这个小子来读取每一行，不信？你看源码的182-197行，如下（我的基于2.6.4版本的源码，不同的版本代码应该差别不大）

 while (getFilePosition() <= end || in.needAdditionalRecordAfterSplit()) {
      if (pos == 0) {
        newSize = skipUtfByteOrderMark();
      } else {
        newSize = in.readLine(value, maxLineLength, maxBytesToConsume(pos));
        pos += newSize;
      }

      if ((newSize == 0) || (newSize < maxLineLength)) {
        break;
      }

      // line too long. try again
      LOG.info("Skipped line of size " + newSize + " at pos " + 
               (pos - newSize));
    }

发现没有 ===》 newSize = in.readLine(value, maxLineLength, maxBytesToConsume(pos));

它用了SplitLineReader 里面的一个方法readLine来读取，所以我们就得继续跟踪去看看SplitLineReader 这个小子的庐山真面目，下面看SplitLineReader 的源码

public class SplitLineReader extends org.apache.hadoop.util.LineReader {
  public SplitLineReader(InputStream in, byte[] recordDelimiterBytes) {
    super(in, recordDelimiterBytes);
  }

  public SplitLineReader(InputStream in, Configuration conf,
      byte[] recordDelimiterBytes) throws IOException {
    super(in, conf, recordDelimiterBytes);
  }

  public boolean needAdditionalRecordAfterSplit() {
    return false;
  }
}

发现这家伙继承自LineReader（快接近我们的目标了，坚持看下去），发现这小子里面根本就没有readLine方法，大家是不是觉得我在忽悠大家，哈哈，我没有忽悠大家，它源码里面确实没有，但是但是，它可是继承了LineReader这个类，说不定他的父类LineReader有了，好，不信我们去看看LineReader
继续跟踪到LineReader的源码

public class LineReader implements Closeable {
  private static final int DEFAULT_BUFFER_SIZE = 64 * 1024;
  private int bufferSize = DEFAULT_BUFFER_SIZE;
  private InputStream in;
  private byte[] buffer;
  // the number of bytes of real data in the buffer
  private int bufferLength = 0;
  // the current position in the buffer
  private int bufferPosn = 0;

  private static final byte CR = '\r';
  private static final byte LF = '\n';

  // The line delimiter
  private final byte[] recordDelimiterBytes;

  /** * Create a line reader that reads from the given stream using the * default buffer-size (64k). * @param in The input stream * @throws IOException */
  public LineReader(InputStream in) {
    this(in, DEFAULT_BUFFER_SIZE);
  }

  /** * Create a line reader that reads from the given stream using the * given buffer-size. * @param in The input stream * @param bufferSize Size of the read buffer * @throws IOException */
  public LineReader(InputStream in, int bufferSize) {
    this.in = in;
    this.bufferSize = bufferSize;
    this.buffer = new byte[this.bufferSize];
    this.recordDelimiterBytes = null;
  }

  /** * Create a line reader that reads from the given stream using the * <code>io.file.buffer.size</code> specified in the given * <code>Configuration</code>. * @param in input stream * @param conf configuration * @throws IOException */
  public LineReader(InputStream in, Configuration conf) throws IOException {
    this(in, conf.getInt("io.file.buffer.size", DEFAULT_BUFFER_SIZE));
  }

  /** * Create a line reader that reads from the given stream using the * default buffer-size, and using a custom delimiter of array of * bytes. * @param in The input stream * @param recordDelimiterBytes The delimiter */
  public LineReader(InputStream in, byte[] recordDelimiterBytes) {
    this.in = in;
    this.bufferSize = DEFAULT_BUFFER_SIZE;
    this.buffer = new byte[this.bufferSize];
    this.recordDelimiterBytes = recordDelimiterBytes;
  }

  /** * Create a line reader that reads from the given stream using the * given buffer-size, and using a custom delimiter of array of * bytes. * @param in The input stream * @param bufferSize Size of the read buffer * @param recordDelimiterBytes The delimiter * @throws IOException */
  public LineReader(InputStream in, int bufferSize,
      byte[] recordDelimiterBytes) {
    this.in = in;
    this.bufferSize = bufferSize;
    this.buffer = new byte[this.bufferSize];
    this.recordDelimiterBytes = recordDelimiterBytes;
  }

  /** * Create a line reader that reads from the given stream using the * <code>io.file.buffer.size</code> specified in the given * <code>Configuration</code>, and using a custom delimiter of array of * bytes. * @param in input stream * @param conf configuration * @param recordDelimiterBytes The delimiter * @throws IOException */
  public LineReader(InputStream in, Configuration conf,
      byte[] recordDelimiterBytes) throws IOException {
    this.in = in;
    this.bufferSize = conf.getInt("io.file.buffer.size", DEFAULT_BUFFER_SIZE);
    this.buffer = new byte[this.bufferSize];
    this.recordDelimiterBytes = recordDelimiterBytes;
  }


  /** * Close the underlying stream. * @throws IOException */
  public void close() throws IOException {
    in.close();
  }

  /** * Read one line from the InputStream into the given Text. * * @param str the object to store the given line (without newline) * @param maxLineLength the maximum number of bytes to store into str; * the rest of the line is silently discarded. * @param maxBytesToConsume the maximum number of bytes to consume * in this call. This is only a hint, because if the line cross * this threshold, we allow it to happen. It can overshoot * potentially by as much as one buffer length. * * @return the number of bytes read including the (longest) newline * found. * * @throws IOException if the underlying stream throws */
  public int readLine(Text str, int maxLineLength,
                      int maxBytesToConsume) throws IOException {
    if (this.recordDelimiterBytes != null) {
      return readCustomLine(str, maxLineLength, maxBytesToConsume);
    } else {
      return readDefaultLine(str, maxLineLength, maxBytesToConsume);
    }
  }

  protected int fillBuffer(InputStream in, byte[] buffer, boolean inDelimiter)
      throws IOException {
    return in.read(buffer);
  }

  /** * Read a line terminated by one of CR, LF, or CRLF. */
  private int readDefaultLine(Text str, int maxLineLength, int maxBytesToConsume)
  throws IOException {
    /* We're reading data from in, but the head of the stream may be * already buffered in buffer, so we have several cases: * 1. No newline characters are in the buffer, so we need to copy * everything and read another buffer from the stream. * 2. An unambiguously terminated line is in buffer, so we just * copy to str. * 3. Ambiguously terminated line is in buffer, i.e. buffer ends * in CR. In this case we copy everything up to CR to str, but * we also need to see what follows CR: if it's LF, then we * need consume LF as well, so next call to readLine will read * from after that. * We use a flag prevCharCR to signal if previous character was CR * and, if it happens to be at the end of the buffer, delay * consuming it until we have a chance to look at the char that * follows. */
    str.clear();
    int txtLength = 0; //tracks str.getLength(), as an optimization
    int newlineLength = 0; //length of terminating newline
    boolean prevCharCR = false; //true of prev char was CR
    long bytesConsumed = 0;
    do {
      int startPosn = bufferPosn; //starting from where we left off the last time
      if (bufferPosn >= bufferLength) {
        startPosn = bufferPosn = 0;
        if (prevCharCR) {
          ++bytesConsumed; //account for CR from previous read
        }
        bufferLength = fillBuffer(in, buffer, prevCharCR);
        if (bufferLength <= 0) {
          break; // EOF
        }
      }
      for (; bufferPosn < bufferLength; ++bufferPosn) { //search for newline
        if (buffer[bufferPosn] == LF) {
          newlineLength = (prevCharCR) ? 2 : 1;
          ++bufferPosn; // at next invocation proceed from following byte
          break;
        }
        if (prevCharCR) { //CR + notLF, we are at notLF
          newlineLength = 1;
          break;
        }
        prevCharCR = (buffer[bufferPosn] == CR);
      }
      int readLength = bufferPosn - startPosn;
      if (prevCharCR && newlineLength == 0) {
        --readLength; //CR at the end of the buffer
      }
      bytesConsumed += readLength;
      int appendLength = readLength - newlineLength;
      if (appendLength > maxLineLength - txtLength) {
        appendLength = maxLineLength - txtLength;
      }
      if (appendLength > 0) {
        str.append(buffer, startPosn, appendLength);
        txtLength += appendLength;
      }
    } while (newlineLength == 0 && bytesConsumed < maxBytesToConsume);

    if (bytesConsumed > Integer.MAX_VALUE) {
      throw new IOException("Too many bytes before newline: " + bytesConsumed);
    }
    return (int)bytesConsumed;
  }

  /** * Read a line terminated by a custom delimiter. */
  private int readCustomLine(Text str, int maxLineLength, int maxBytesToConsume)
      throws IOException {
   /* We're reading data from inputStream, but the head of the stream may be * already captured in the previous buffer, so we have several cases: * * 1. The buffer tail does not contain any character sequence which * matches with the head of delimiter. We count it as a * ambiguous byte count = 0 * * 2. The buffer tail contains a X number of characters, * that forms a sequence, which matches with the * head of delimiter. We count ambiguous byte count = X * * // *** eg: A segment of input file is as follows * * " record 1792: I found this bug very interesting and * I have completely read about it. record 1793: This bug * can be solved easily record 1794: This ." * * delimiter = "record"; * * supposing:- String at the end of buffer = * "I found this bug very interesting and I have completely re" * There for next buffer = "ad about it. record 179 ...." * * The matching characters in the input * buffer tail and delimiter head = "re" * Therefore, ambiguous byte count = 2 **** // * * 2.1 If the following bytes are the remaining characters of * the delimiter, then we have to capture only up to the starting * position of delimiter. That means, we need not include the * ambiguous characters in str. * * 2.2 If the following bytes are not the remaining characters of * the delimiter ( as mentioned in the example ), * then we have to include the ambiguous characters in str. */
    str.clear();
    int txtLength = 0; // tracks str.getLength(), as an optimization
    long bytesConsumed = 0;
    int delPosn = 0;
    int ambiguousByteCount=0; // To capture the ambiguous characters count
    do {
      int startPosn = bufferPosn; // Start from previous end position
      if (bufferPosn >= bufferLength) {
        startPosn = bufferPosn = 0;
        bufferLength = fillBuffer(in, buffer, ambiguousByteCount > 0);
        if (bufferLength <= 0) {
          if (ambiguousByteCount > 0) {
            str.append(recordDelimiterBytes, 0, ambiguousByteCount);
            bytesConsumed += ambiguousByteCount;
          }
          break; // EOF
        }
      }
      for (; bufferPosn < bufferLength; ++bufferPosn) {
        if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
          delPosn++;
          if (delPosn >= recordDelimiterBytes.length) {
            bufferPosn++;
            break;
          }
        } else if (delPosn != 0) {
          bufferPosn--;
          delPosn = 0;
        }
      }
      int readLength = bufferPosn - startPosn;
      bytesConsumed += readLength;
      int appendLength = readLength - delPosn;
      if (appendLength > maxLineLength - txtLength) {
        appendLength = maxLineLength - txtLength;
      }
      bytesConsumed += ambiguousByteCount;
      if (appendLength >= 0 && ambiguousByteCount > 0) {
        //appending the ambiguous characters (refer case 2.2)
        str.append(recordDelimiterBytes, 0, ambiguousByteCount);
        ambiguousByteCount = 0;
        // since it is now certain that the split did not split a delimiter we
        // should not read the next record: clear the flag otherwise duplicate
        // records could be generated
        unsetNeedAdditionalRecordAfterSplit();
      }
      if (appendLength > 0) {
        str.append(buffer, startPosn, appendLength);
        txtLength += appendLength;
      }
      if (bufferPosn >= bufferLength) {
        if (delPosn > 0 && delPosn < recordDelimiterBytes.length) {
          ambiguousByteCount = delPosn;
          bytesConsumed -= ambiguousByteCount; //to be consumed in next
        }
      }
    } while (delPosn < recordDelimiterBytes.length 
        && bytesConsumed < maxBytesToConsume);
    if (bytesConsumed > Integer.MAX_VALUE) {
      throw new IOException("Too many bytes before delimiter: " + bytesConsumed);
    }
    return (int) bytesConsumed; 
  }

  /** * Read from the InputStream into the given Text. * @param str the object to store the given line * @param maxLineLength the maximum number of bytes to store into str. * @return the number of bytes read including the newline * @throws IOException if the underlying stream throws */
  public int readLine(Text str, int maxLineLength) throws IOException {
    return readLine(str, maxLineLength, Integer.MAX_VALUE);
  }

  /** * Read from the InputStream into the given Text. * @param str the object to store the given line * @return the number of bytes read including the newline * @throws IOException if the underlying stream throws */
  public int readLine(Text str) throws IOException {
    return readLine(str, Integer.MAX_VALUE, Integer.MAX_VALUE);
  }

  protected int getBufferPosn() {
    return bufferPosn;
  }

  protected int getBufferSize() {
    return bufferSize;
  }

  protected void unsetNeedAdditionalRecordAfterSplit() {
    // needed for custom multi byte line delimiters only
    // see MAPREDUCE-6549 for details
  }
}

它里面有很多方法，真的有我们要的readLine方法，说明我的推断没有错，没有忽悠大家，它重载了好几个readLine方法
其他我们不去管，我们只管对我们有用的，如下

 public int readLine(Text str, int maxLineLength,
                      int maxBytesToConsume) throws IOException {
    if (this.recordDelimiterBytes != null) {
      return readCustomLine(str, maxLineLength, maxBytesToConsume);
    } else {
      return readDefaultLine(str, maxLineLength, maxBytesToConsume);
    }
  }

它里面调用了readCustomLine方法和readDefaultLine方法，下面看看这两个方法

 private int readCustomLine(Text str, int maxLineLength, int maxBytesToConsume)
      throws IOException {
    str.clear();
    int txtLength = 0; // tracks str.getLength(), as an optimization
    long bytesConsumed = 0;
    int delPosn = 0;
    int ambiguousByteCount=0; // To capture the ambiguous characters count
    do {
      int startPosn = bufferPosn; // Start from previous end position
      if (bufferPosn >= bufferLength) {
        startPosn = bufferPosn = 0;
        bufferLength = fillBuffer(in, buffer, ambiguousByteCount > 0);
        if (bufferLength <= 0) {
          if (ambiguousByteCount > 0) {
            str.append(recordDelimiterBytes, 0, ambiguousByteCount);
            bytesConsumed += ambiguousByteCount;
          }
          break; // EOF
        }
      }
      for (; bufferPosn < bufferLength; ++bufferPosn) {
        if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
          delPosn++;
          if (delPosn >= recordDelimiterBytes.length) {
            bufferPosn++;
            break;
          }
        } else if (delPosn != 0) {
          bufferPosn--;
          delPosn = 0;
        }
      }
      int readLength = bufferPosn - startPosn;
      bytesConsumed += readLength;
      int appendLength = readLength - delPosn;
      if (appendLength > maxLineLength - txtLength) {
        appendLength = maxLineLength - txtLength;
      }
      bytesConsumed += ambiguousByteCount;
      if (appendLength >= 0 && ambiguousByteCount > 0) {
        //appending the ambiguous characters (refer case 2.2)
        str.append(recordDelimiterBytes, 0, ambiguousByteCount);
        ambiguousByteCount = 0;
        // since it is now certain that the split did not split a delimiter we
        // should not read the next record: clear the flag otherwise duplicate
        // records could be generated
        unsetNeedAdditionalRecordAfterSplit();
      }
      if (appendLength > 0) {
        str.append(buffer, startPosn, appendLength);
        txtLength += appendLength;
      }
      if (bufferPosn >= bufferLength) {
        if (delPosn > 0 && delPosn < recordDelimiterBytes.length) {
          ambiguousByteCount = delPosn;
          bytesConsumed -= ambiguousByteCount; //to be consumed in next
        }
      }
    } while (delPosn < recordDelimiterBytes.length 
        && bytesConsumed < maxBytesToConsume);
    if (bytesConsumed > Integer.MAX_VALUE) {
      throw new IOException("Too many bytes before delimiter: " + bytesConsumed);
    }
    return (int) bytesConsumed; 
  }

 private int readDefaultLine(Text str, int maxLineLength, int maxBytesToConsume)
  throws IOException {
    str.clear();
    int txtLength = 0; //tracks str.getLength(), as an optimization
    int newlineLength = 0; //length of terminating newline
    boolean prevCharCR = false; //true of prev char was CR
    long bytesConsumed = 0;
    do {
      int startPosn = bufferPosn; //starting from where we left off the last time
      if (bufferPosn >= bufferLength) {
        startPosn = bufferPosn = 0;
        if (prevCharCR) {
          ++bytesConsumed; //account for CR from previous read
        }
        bufferLength = fillBuffer(in, buffer, prevCharCR);
        if (bufferLength <= 0) {
          break; // EOF
        }
      }
      for (; bufferPosn < bufferLength; ++bufferPosn) { //search for newline
        if (buffer[bufferPosn] == LF) {
          newlineLength = (prevCharCR) ? 2 : 1;
          ++bufferPosn; // at next invocation proceed from following byte
          break;
        }
        if (prevCharCR) { //CR + notLF, we are at notLF
          newlineLength = 1;
          break;
        }
        prevCharCR = (buffer[bufferPosn] == CR);
      }
      int readLength = bufferPosn - startPosn;
      if (prevCharCR && newlineLength == 0) {
        --readLength; //CR at the end of the buffer
      }
      bytesConsumed += readLength;
      int appendLength = readLength - newlineLength;
      if (appendLength > maxLineLength - txtLength) {
        appendLength = maxLineLength - txtLength;
      }
      if (appendLength > 0) {
        str.append(buffer, startPosn, appendLength);
        txtLength += appendLength;
      }
    } while (newlineLength == 0 && bytesConsumed < maxBytesToConsume);

    if (bytesConsumed > Integer.MAX_VALUE) {
      throw new IOException("Too many bytes before newline: " + bytesConsumed);
    }
    return (int)bytesConsumed;
  }

注意注意readCustomLine和readDefaultLine方法的第一句都用了一句代码（重点，后面我们自定义时要改的地方）
他们都用了str.clear();
这句代码什么意思，意思是系统每读完一行，它会清空这一行的值！！！
如果我们自定义读取多行的时候，肯定不能清空它，因为我们需要它来计数第二行的位置
比如
123，
456
789，
111
如果一次读两行的话假如我把第一行清空了，那么我第二行的偏移量就得不到正确的值了，读出来的值本应该是
123，456
789，111
但是如果清空了的话就读出来少了一行
变成了
456
111

所以我们只有独到最后一行才清空值，它前面的行都不能清空

这就要求我们到时候自己重载一个方法了，到时候再说，大家接着看

有没有晕了，没有？那就好，我都说得这么清楚了，还晕的话，大家就先休息一下

我帮大家理一理：看都用到了哪些类

最开始TextInputFormat里面用到了LineRecordReader，LineRecordReader里面用到了SplitLineReader，而SplitLineReader里面用到了LineReader
自定义的时候思路按这个来
TextInputFormat–》LineRecordReader–》SplitLineReader–》LineReader

下面写第一个自定义的TextInputFormat

package com.my.input;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;


public class myInputFormat extends FileInputFormat<Text,Text> {
    //用来压缩的
    @Override
      protected boolean isSplitable(JobContext context, Path file) {
        final CompressionCodec codec =
          new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
        if (null == codec) {
          return true;
        }
        return codec instanceof SplittableCompressionCodec;
      }

    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException, InterruptedException {
        context.setStatus(genericSplit.toString());
        return new MyRecordReader(context.getConfiguration());
    }

}

接着写一个自定义的LineRecordReader
其中修改了182行开始的以下代码

因为我这里要实现输出多行，所以写了一个for循环,又由于我前面说得前面的行不能清空，所以要加一个boolean标志量
把

 while (getFilePosition() <= end || in.needAdditionalRecordAfterSplit()) {
      if (pos == 0) {
        newSize = skipUtfByteOrderMark();
      } else {
        newSize = in.readLine(value, maxLineLength, maxBytesToConsume(pos));
        pos += newSize;
      }

      if ((newSize == 0) || (newSize < maxLineLength)) {
        break;
      }

      // line too long. try again
      LOG.info("Skipped line of size " + newSize + " at pos " + 
               (pos - newSize));
    }

改为

boolean clear = true;
        for (int i = 1; i <= 2; i++) {
            if (i == 2) {
                clear = false;
            }
            while (getFilePosition() <= end || in.needAdditionalRecordAfterSplit()) {
                if (pos == 0) {
                    newSize = skipUtfByteOrderMark();
                } else {
                    newSize = in.readLine(value, maxLineLength, maxBytesToConsume(pos), clear);
                    pos += newSize;
                }

                if ((newSize == 0) || (newSize < maxLineLength)) {
                    break;
                }

                // line too long. try again
                LOG.info("Skipped line of size " + newSize + " at pos " + (pos - newSize));
            }
        }

再写自定义SplitLineReader

package com.my.lingRecordReader;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;

@InterfaceAudience.Private
@InterfaceStability.Unstable
public class MySplitLineReader extends MyLineReader {
  public MySplitLineReader(InputStream in, byte[] recordDelimiterBytes) {
    super(in, recordDelimiterBytes);
  }

  public MySplitLineReader(InputStream in, Configuration conf,
      byte[] recordDelimiterBytes) throws IOException {
    super(in, conf, recordDelimiterBytes);
  }

  public boolean needAdditionalRecordAfterSplit() {
    return false;
  }
}

最后写自定义LineReader
它重载了一个readLine(Text str, int maxLineLength, int maxBytesToConsume, boolean clear)方法来实现不清空前面读取的行的值

package com.my.lingRecordReader;

/** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */

import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;

/** * A class that provides a line reader from an input stream. Depending on the * constructor used, lines will either be terminated by: * <ul> * <li>one of the following: '\n' (LF) , '\r' (CR), or '\r\n' (CR+LF).</li> * <li><em>or</em>, a custom byte sequence delimiter</li> * </ul> * In both cases, EOF also terminates an otherwise unterminated line. */
@InterfaceAudience.LimitedPrivate({ "MapReduce" })
@InterfaceStability.Unstable
public class MyLineReader implements Closeable {
    private static final int DEFAULT_BUFFER_SIZE = 64 * 1024;
    private int bufferSize = DEFAULT_BUFFER_SIZE;
    private InputStream in;
    private byte[] buffer;
    // the number of bytes of real data in the buffer
    private int bufferLength = 0;
    // the current position in the buffer
    private int bufferPosn = 0;

    private static final byte CR = '\r';
    private static final byte LF = '\n';

    // The line delimiter
    private final byte[] recordDelimiterBytes;

    /** * Create a line reader that reads from the given stream using the default * buffer-size (64k). * * @param in * The input stream * @throws IOException */
    public MyLineReader(InputStream in) {
        this(in, DEFAULT_BUFFER_SIZE);
    }

    /** * Create a line reader that reads from the given stream using the given * buffer-size. * * @param in * The input stream * @param bufferSize * Size of the read buffer * @throws IOException */
    public MyLineReader(InputStream in, int bufferSize) {
        this.in = in;
        this.bufferSize = bufferSize;
        this.buffer = new byte[this.bufferSize];
        this.recordDelimiterBytes = null;
    }

    /** * Create a line reader that reads from the given stream using the * <code>io.file.buffer.size</code> specified in the given * <code>Configuration</code>. * * @param in * input stream * @param conf * configuration * @throws IOException */
    public MyLineReader(InputStream in, Configuration conf) throws IOException {
        this(in, conf.getInt("io.file.buffer.size", DEFAULT_BUFFER_SIZE));
    }

    /** * Create a line reader that reads from the given stream using the default * buffer-size, and using a custom delimiter of array of bytes. * * @param in * The input stream * @param recordDelimiterBytes * The delimiter */
    public MyLineReader(InputStream in, byte[] recordDelimiterBytes) {
        this.in = in;
        this.bufferSize = DEFAULT_BUFFER_SIZE;
        this.buffer = new byte[this.bufferSize];
        this.recordDelimiterBytes = recordDelimiterBytes;
    }

    /** * Create a line reader that reads from the given stream using the given * buffer-size, and using a custom delimiter of array of bytes. * * @param in * The input stream * @param bufferSize * Size of the read buffer * @param recordDelimiterBytes * The delimiter * @throws IOException */
    public MyLineReader(InputStream in, int bufferSize, byte[] recordDelimiterBytes) {
        this.in = in;
        this.bufferSize = bufferSize;
        this.buffer = new byte[this.bufferSize];
        this.recordDelimiterBytes = recordDelimiterBytes;
    }

    /** * Create a line reader that reads from the given stream using the * <code>io.file.buffer.size</code> specified in the given * <code>Configuration</code>, and using a custom delimiter of array of * bytes. * * @param in * input stream * @param conf * configuration * @param recordDelimiterBytes * The delimiter * @throws IOException */
    public MyLineReader(InputStream in, Configuration conf, byte[] recordDelimiterBytes) throws IOException {
        this.in = in;
        this.bufferSize = conf.getInt("io.file.buffer.size", DEFAULT_BUFFER_SIZE);
        this.buffer = new byte[this.bufferSize];
        this.recordDelimiterBytes = recordDelimiterBytes;
    }

    /** * Close the underlying stream. * * @throws IOException */
    public void close() throws IOException {
        in.close();
    }

    /** * Read one line from the InputStream into the given Text. * * @param str * the object to store the given line (without newline) * @param maxLineLength * the maximum number of bytes to store into str; the rest of the * line is silently discarded. * @param maxBytesToConsume * the maximum number of bytes to consume in this call. This is * only a hint, because if the line cross this threshold, we * allow it to happen. It can overshoot potentially by as much as * one buffer length. * * @return the number of bytes read including the (longest) newline found. * * @throws IOException * if the underlying stream throws */
    public int readLine(Text str, int maxLineLength, int maxBytesToConsume) throws IOException {
        if (this.recordDelimiterBytes != null) {
            return readCustomLine(str, maxLineLength, maxBytesToConsume);
        } else {
            return readDefaultLine(str, maxLineLength, maxBytesToConsume);
        }
    }

    public int readLine(Text str, int maxLineLength, int maxBytesToConsume, boolean clear) throws IOException {
        if (this.recordDelimiterBytes != null) {
            return readCustomLine(str, maxLineLength, maxBytesToConsume,clear);
        } else {
            return readDefaultLine(str, maxLineLength, maxBytesToConsume,clear);
        }
    }

    protected int fillBuffer(InputStream in, byte[] buffer, boolean inDelimiter) throws IOException {
        return in.read(buffer);
    }

    private int readDefaultLine(Text str, int maxLineLength, int maxBytesToConsume, boolean clear) throws IOException {
        if (clear) {
            str.clear();
        }
        int txtLength = 0; // tracks str.getLength(), as an optimization
        int newlineLength = 0; // length of terminating newline
        boolean prevCharCR = false; // true of prev char was CR
        long bytesConsumed = 0;
        do {
            int startPosn = bufferPosn; // starting from where we left off the
                                        // last time
            if (bufferPosn >= bufferLength) {
                startPosn = bufferPosn = 0;
                if (prevCharCR) {
                    ++bytesConsumed; // account for CR from previous read
                }
                bufferLength = fillBuffer(in, buffer, prevCharCR);
                if (bufferLength <= 0) {
                    break; // EOF
                }
            }
            for (; bufferPosn < bufferLength; ++bufferPosn) { // search for
                                                                // newline
                if (buffer[bufferPosn] == LF) {
                    newlineLength = (prevCharCR) ? 2 : 1;
                    ++bufferPosn; // at next invocation proceed from following
                                    // byte
                    break;
                }
                if (prevCharCR) { // CR + notLF, we are at notLF
                    newlineLength = 1;
                    break;
                }
                prevCharCR = (buffer[bufferPosn] == CR);
            }
            int readLength = bufferPosn - startPosn;
            if (prevCharCR && newlineLength == 0) {
                --readLength; // CR at the end of the buffer
            }
            bytesConsumed += readLength;
            int appendLength = readLength - newlineLength;
            if (appendLength > maxLineLength - txtLength) {
                appendLength = maxLineLength - txtLength;
            }
            if (appendLength > 0) {
                str.append(buffer, startPosn, appendLength);
                txtLength += appendLength;
            }
        } while (newlineLength == 0 && bytesConsumed < maxBytesToConsume);

        if (bytesConsumed > Integer.MAX_VALUE) {
            throw new IOException("Too many bytes before newline: " + bytesConsumed);
        }
        return (int) bytesConsumed;
    }

    /** * Read a line terminated by one of CR, LF, or CRLF. */
    private int readDefaultLine(Text str, int maxLineLength, int maxBytesToConsume) throws IOException {
        str.clear();
        int txtLength = 0; // tracks str.getLength(), as an optimization
        int newlineLength = 0; // length of terminating newline
        boolean prevCharCR = false; // true of prev char was CR
        long bytesConsumed = 0;
        do {
            int startPosn = bufferPosn; // starting from where we left off the
                                        // last time
            if (bufferPosn >= bufferLength) {
                startPosn = bufferPosn = 0;
                if (prevCharCR) {
                    ++bytesConsumed; // account for CR from previous read
                }
                bufferLength = fillBuffer(in, buffer, prevCharCR);
                if (bufferLength <= 0) {
                    break; // EOF
                }
            }
            for (; bufferPosn < bufferLength; ++bufferPosn) { // search for
                                                                // newline
                if (buffer[bufferPosn] == LF) {
                    newlineLength = (prevCharCR) ? 2 : 1;
                    ++bufferPosn; // at next invocation proceed from following
                                    // byte
                    break;
                }
                if (prevCharCR) { // CR + notLF, we are at notLF
                    newlineLength = 1;
                    break;
                }
                prevCharCR = (buffer[bufferPosn] == CR);
            }
            int readLength = bufferPosn - startPosn;
            if (prevCharCR && newlineLength == 0) {
                --readLength; // CR at the end of the buffer
            }
            bytesConsumed += readLength;
            int appendLength = readLength - newlineLength;
            if (appendLength > maxLineLength - txtLength) {
                appendLength = maxLineLength - txtLength;
            }
            if (appendLength > 0) {
                str.append(buffer, startPosn, appendLength);
                txtLength += appendLength;
            }
        } while (newlineLength == 0 && bytesConsumed < maxBytesToConsume);

        if (bytesConsumed > Integer.MAX_VALUE) {
            throw new IOException("Too many bytes before newline: " + bytesConsumed);
        }
        return (int) bytesConsumed;
    }

    private int readCustomLine(Text str, int maxLineLength, int maxBytesToConsume, boolean clear) throws IOException {
        if (clear) {
            str.clear();
        }
        int txtLength = 0; // tracks str.getLength(), as an optimization
        long bytesConsumed = 0;
        int delPosn = 0;
        int ambiguousByteCount = 0; // To capture the ambiguous characters count
        do {
            int startPosn = bufferPosn; // Start from previous end position
            if (bufferPosn >= bufferLength) {
                startPosn = bufferPosn = 0;
                bufferLength = fillBuffer(in, buffer, ambiguousByteCount > 0);
                if (bufferLength <= 0) {
                    if (ambiguousByteCount > 0) {
                        str.append(recordDelimiterBytes, 0, ambiguousByteCount);
                        bytesConsumed += ambiguousByteCount;
                    }
                    break; // EOF
                }
            }
            for (; bufferPosn < bufferLength; ++bufferPosn) {
                if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
                    delPosn++;
                    if (delPosn >= recordDelimiterBytes.length) {
                        bufferPosn++;
                        break;
                    }
                } else if (delPosn != 0) {
                    bufferPosn--;
                    delPosn = 0;
                }
            }
            int readLength = bufferPosn - startPosn;
            bytesConsumed += readLength;
            int appendLength = readLength - delPosn;
            if (appendLength > maxLineLength - txtLength) {
                appendLength = maxLineLength - txtLength;
            }
            bytesConsumed += ambiguousByteCount;
            if (appendLength >= 0 && ambiguousByteCount > 0) {
                // appending the ambiguous characters (refer case 2.2)
                str.append(recordDelimiterBytes, 0, ambiguousByteCount);
                ambiguousByteCount = 0;
                // since it is now certain that the split did not split a
                // delimiter we
                // should not read the next record: clear the flag otherwise
                // duplicate
                // records could be generated
                unsetNeedAdditionalRecordAfterSplit();
            }
            if (appendLength > 0) {
                str.append(buffer, startPosn, appendLength);
                txtLength += appendLength;
            }
            if (bufferPosn >= bufferLength) {
                if (delPosn > 0 && delPosn < recordDelimiterBytes.length) {
                    ambiguousByteCount = delPosn;
                    bytesConsumed -= ambiguousByteCount; // to be consumed in
                                                            // next
                }
            }
        } while (delPosn < recordDelimiterBytes.length && bytesConsumed < maxBytesToConsume);
        if (bytesConsumed > Integer.MAX_VALUE) {
            throw new IOException("Too many bytes before delimiter: " + bytesConsumed);
        }
        return (int) bytesConsumed;
    }

    /** * Read a line terminated by a custom delimiter. */
    private int readCustomLine(Text str, int maxLineLength, int maxBytesToConsume) throws IOException {
        str.clear();
        int txtLength = 0; // tracks str.getLength(), as an optimization
        long bytesConsumed = 0;
        int delPosn = 0;
        int ambiguousByteCount = 0; // To capture the ambiguous characters count
        do {
            int startPosn = bufferPosn; // Start from previous end position
            if (bufferPosn >= bufferLength) {
                startPosn = bufferPosn = 0;
                bufferLength = fillBuffer(in, buffer, ambiguousByteCount > 0);
                if (bufferLength <= 0) {
                    if (ambiguousByteCount > 0) {
                        str.append(recordDelimiterBytes, 0, ambiguousByteCount);
                        bytesConsumed += ambiguousByteCount;
                    }
                    break; // EOF
                }
            }
            for (; bufferPosn < bufferLength; ++bufferPosn) {
                if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
                    delPosn++;
                    if (delPosn >= recordDelimiterBytes.length) {
                        bufferPosn++;
                        break;
                    }
                } else if (delPosn != 0) {
                    bufferPosn--;
                    delPosn = 0;
                }
            }
            int readLength = bufferPosn - startPosn;
            bytesConsumed += readLength;
            int appendLength = readLength - delPosn;
            if (appendLength > maxLineLength - txtLength) {
                appendLength = maxLineLength - txtLength;
            }
            bytesConsumed += ambiguousByteCount;
            if (appendLength >= 0 && ambiguousByteCount > 0) {
                // appending the ambiguous characters (refer case 2.2)
                str.append(recordDelimiterBytes, 0, ambiguousByteCount);
                ambiguousByteCount = 0;
                // since it is now certain that the split did not split a
                // delimiter we
                // should not read the next record: clear the flag otherwise
                // duplicate
                // records could be generated
                unsetNeedAdditionalRecordAfterSplit();
            }
            if (appendLength > 0) {
                str.append(buffer, startPosn, appendLength);
                txtLength += appendLength;
            }
            if (bufferPosn >= bufferLength) {
                if (delPosn > 0 && delPosn < recordDelimiterBytes.length) {
                    ambiguousByteCount = delPosn;
                    bytesConsumed -= ambiguousByteCount; // to be consumed in
                                                            // next
                }
            }
        } while (delPosn < recordDelimiterBytes.length && bytesConsumed < maxBytesToConsume);
        if (bytesConsumed > Integer.MAX_VALUE) {
            throw new IOException("Too many bytes before delimiter: " + bytesConsumed);
        }
        return (int) bytesConsumed;
    }

    /** * Read from the InputStream into the given Text. * * @param str * the object to store the given line * @param maxLineLength * the maximum number of bytes to store into str. * @return the number of bytes read including the newline * @throws IOException * if the underlying stream throws */
    public int readLine(Text str, int maxLineLength) throws IOException {
        return readLine(str, maxLineLength, Integer.MAX_VALUE);
    }

    /** * Read from the InputStream into the given Text. * * @param str * the object to store the given line * @return the number of bytes read including the newline * @throws IOException * if the underlying stream throws */
    public int readLine(Text str) throws IOException {
        return readLine(str, Integer.MAX_VALUE, Integer.MAX_VALUE);
    }

    protected int getBufferPosn() {
        return bufferPosn;
    }

    protected int getBufferSize() {
        return bufferSize;
    }

    protected void unsetNeedAdditionalRecordAfterSplit() {
        // needed for custom multi byte line delimiters only
        // see MAPREDUCE-6549 for details
    }
}

下面是两个帮助类，用来压缩的，自定义一下（其实是复制源码，没有改动，只改了类名，因为我改了整个继承结构，所以只能加一个压缩的）

package com.my.lingRecordReader;
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SplitCompressionInputStream;


/**
 * Line reader for compressed splits
 *
 * Reading records from a compressed split is tricky, as the
 * LineRecordReader is using the reported compressed input stream
 * position directly to determine when a split has ended.  In addition the
 * compressed input stream is usually faking the actual byte position, often
 * updating it only after the first compressed block after the split is
 * accessed.
 *
 * Depending upon where the last compressed block of the split ends relative
 * to the record delimiters it can be easy to accidentally drop the last
 * record or duplicate the last record between this split and the next.
 *
 * Split end scenarios:
 *
 * 1) Last block of split ends in the middle of a record
 *      Nothing special that needs to be done here, since the compressed input
 *      stream will report a position after the split end once the record
 *      is fully read.  The consumer of the next split will discard the
 *      partial record at the start of the split normally, and no data is lost
 *      or duplicated between the splits.
 *
 * 2) Last block of split ends in the middle of a delimiter
 *      The line reader will continue to consume bytes into the next block to
 *      locate the end of the delimiter.  If a custom delimiter is being used
 *      then the next record must be read by this split or it will be dropped.
 *      The consumer of the next split will not recognize the partial
 *      delimiter at the beginning of its split and will discard it along with
 *      the next record.
 *
 *      However for the default delimiter processing there is a special case
 *      because CR, LF, and CRLF are all valid record delimiters.  If the
 *      block ends with a CR then the reader must peek at the next byte to see
 *      if it is an LF and therefore part of the same record delimiter.
 *      Peeking at the next byte is an access to the next block and triggers
 *      the stream to report the end of the split.  There are two cases based
 *      on the next byte:
 *
 *      A) The next byte is LF
 *           The split needs to end after the current record is returned.  The
 *           consumer of the next split will discard the first record, which
 *           is degenerate since LF is itself a delimiter, and start consuming
 *           records after that byte.  If the current split tries to read
 *           another record then the record will be duplicated between splits.
 *
 *      B) The next byte is not LF
 *           The current record will be returned but the stream will report
 *           the split has ended due to the peek into the next block.  If the
 *           next record is not read then it will be lost, as the consumer of
 *           the next split will discard it before processing subsequent
 *           records.  Therefore the next record beyond the reported split end
 *           must be consumed by this split to avoid data loss.
 *
 * 3) Last block of split ends at the beginning of a delimiter
 *      This is equivalent to case 1, as the reader will consume bytes into
 *      the next block and trigger the end of the split.  No further records
 *      should be read as the consumer of the next split will discard the
 *      (degenerate) record at the beginning of its split.
 *
 * 4) Last block of split ends at the end of a delimiter
 *      Nothing special needs to be done here. The reader will not start
 *      examining the bytes into the next block until the next record is read,
 *      so the stream will not report the end of the split just yet.  Once the
 *      next record is read then the next block will be accessed and the
 *      stream will indicate the end of the split.  The consumer of the next
 *      split will correctly discard the first record of its split, and no
 *      data is lost or duplicated.
 *
 *      If the default delimiter is used and the block ends at a CR then this
 *      is treated as case 2 since the reader does not yet know without
 *      looking at subsequent bytes whether the delimiter has ended.
 *
 * NOTE: It is assumed that compressed input streams *never* return bytes from
 *       multiple compressed blocks from a single read.  Failure to do so will
 *       violate the buffering performed by this class, as it will access
 *       bytes into the next block after the split before returning all of the
 *       records from the previous block.
 */
@InterfaceAudience.Private
@InterfaceStability.Unstable
public class MyCompressedSplitLineReader extends MySplitLineReader {

  SplitCompressionInputStream scin;
  private boolean usingCRLF;
  private boolean needAdditionalRecord = false;
  private boolean finished = false;

  public MyCompressedSplitLineReader(SplitCompressionInputStream in,
                                   Configuration conf,
                                   byte[] recordDelimiterBytes)
                                       throws IOException {
    super(in, conf, recordDelimiterBytes);
    scin = in;
    usingCRLF = (recordDelimiterBytes == null);
  }

  @Override
  protected int fillBuffer(InputStream in, byte[] buffer, boolean inDelimiter)
      throws IOException {
    int bytesRead = in.read(buffer);

    // If the split ended in the middle of a record delimiter then we need
    // to read one additional record, as the consumer of the next split will
    // not recognize the partial delimiter as a record.
    // However if using the default delimiter and the next character is a
    // linefeed then next split will treat it as a delimiter all by itself
    // and the additional record read should not be performed.
    if (inDelimiter && bytesRead > 0) {
      if (usingCRLF) {
        needAdditionalRecord = (buffer[0] != '\n');
      } else {
        needAdditionalRecord = true;
      }
    }
    return bytesRead;
  }

  @Override
  public int readLine(Text str, int maxLineLength, int maxBytesToConsume)
      throws IOException {
    int bytesRead = 0;
    if (!finished) {
      // only allow at most one more record to be read after the stream
      // reports the split ended
      if (scin.getPos() > scin.getAdjustedEnd()) {
        finished = true;
      }

      bytesRead = super.readLine(str, maxLineLength, maxBytesToConsume);
    }
    return bytesRead;
  }

  @Override
  public boolean needAdditionalRecordAfterSplit() {
    return !finished && needAdditionalRecord;
  }
}

第二个与压缩有关的类

package com.my.lingRecordReader;

/** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */


import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.io.Text;

/** * SplitLineReader for uncompressed files. * This class can split the file correctly even if the delimiter is multi-bytes. */
@InterfaceAudience.Private
@InterfaceStability.Unstable
public class MyUncompressedSplitLineReader extends MySplitLineReader {
  private boolean needAdditionalRecord = false;
  private long splitLength;
  /** Total bytes read from the input stream. */
  private long totalBytesRead = 0;
  private boolean finished = false;
  private boolean usingCRLF;

  public MyUncompressedSplitLineReader(FSDataInputStream in, Configuration conf,
      byte[] recordDelimiterBytes, long splitLength) throws IOException {
    super(in, conf, recordDelimiterBytes);
    this.splitLength = splitLength;
    usingCRLF = (recordDelimiterBytes == null);
  }

  @Override
  protected int fillBuffer(InputStream in, byte[] buffer, boolean inDelimiter)
      throws IOException {
    int maxBytesToRead = buffer.length;
    if (totalBytesRead < splitLength) {
      maxBytesToRead = Math.min(maxBytesToRead,
                                (int)(splitLength - totalBytesRead));
    }
    int bytesRead = in.read(buffer, 0, maxBytesToRead);

    // If the split ended in the middle of a record delimiter then we need
    // to read one additional record, as the consumer of the next split will
    // not recognize the partial delimiter as a record.
    // However if using the default delimiter and the next character is a
    // linefeed then next split will treat it as a delimiter all by itself
    // and the additional record read should not be performed.
    if (totalBytesRead == splitLength && inDelimiter && bytesRead > 0) {
      if (usingCRLF) {
        needAdditionalRecord = (buffer[0] != '\n');
      } else {
        needAdditionalRecord = true;
      }
    }
    if (bytesRead > 0) {
      totalBytesRead += bytesRead;
    }
    return bytesRead;
  }

  @Override
  public int readLine(Text str, int maxLineLength, int maxBytesToConsume)
      throws IOException {
    int bytesRead = 0;
    if (!finished) {
      // only allow at most one more record to be read after the stream
      // reports the split ended
      if (totalBytesRead > splitLength) {
        finished = true;
      }

      bytesRead = super.readLine(str, maxLineLength, maxBytesToConsume);
    }
    return bytesRead;
  }

  @Override
  public boolean needAdditionalRecordAfterSplit() {
    return !finished && needAdditionalRecord;
  }

  @Override
  protected void unsetNeedAdditionalRecordAfterSplit() {
    needAdditionalRecord = false;
  }
}

最后就可以来测试了

先看看测试前的文件内容

package com.my.lingRecordReader;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import com.my.input.myInputFormat;

public class myTest {
    /** * * @author 汤高 */
    //Map过程
    static int count=0;
    public static class MyTestMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        /*** * */
        @Override
        protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, LongWritable>.Context context)
                throws IOException, InterruptedException {
            //默认的map的value是每一行,我这里自定义的是以空格分割
            count++;
            //String[] vs = value.toString().split(",");
            //for (String v : vs) {
                //写出去
                context.write(new Text(value), key);
            //}
            System.out.println("========>"+count);
        }
    }

    public static void main(String[] args) {

        Configuration conf=new Configuration();
        try {
            //args从控制台获取路径 解析得到域名
            String[] paths=new GenericOptionsParser(conf,args).getRemainingArgs();
            if(paths.length<2){
                throw new RuntimeException("必須輸出 輸入 和输出路径");
            }
            //得到一个Job 并设置名字
            Job job=Job.getInstance(conf,"myTest");
            //设置Jar 使本程序在Hadoop中运行
            job.setJarByClass(myTest.class);
            //设置Map处理类
            job.setMapperClass(MyTestMapper.class);
            job.setInputFormatClass(MyTextInputFormat.class);
            //设置map的输出类型,因为不一致,所以要设置
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
            //设置输入和输出目录
            FileInputFormat.addInputPath(job, new Path(paths[0]));
            FileOutputFormat.setOutputPath(job, new Path(paths[1] + System.currentTimeMillis()));// 整合好结果后输出的位置
            //启动运行
            System.exit(job.waitForCompletion(true) ? 0:1);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }


}

测试结果：

看第2列的偏移量，发现已经实现了一次读多行(我测试的是2行)

到此所有分析已经完了，研究源码真不容易，花了我一个晚上去研究hadoop的源码，然后再花了几个小时把这些内容写成博客，所以，码字不易，转载请指明出处http://blog.csdn.net/tanggao1314/article/details/51307642

你可能感兴趣的:(mapreduce,源码,hadoop)

H5动态生日祝福源码 cas215asd 源码 html5
源码名称：动态生日祝福源码源码介绍：一款H5动态生日祝福源码，源码带有文字敲入效果与蛋糕生成特效。需求环境：H5下载地址：https://www.changyouzuhao.cn/14540.html
基于python+django的旅游信息网站-旅游景点门票管理系统源码+运行步骤冷琴1996 Python系统设计 python django 旅游
该系统是基于python+django开发的旅游景点门票管理系统。是给师弟做的课程作业。大家学习过程中，遇到问题可以在github咨询作者。学习过程问题可以留言哦演示地址前台地址：http://travel.gitapp.cn后台地址：http://travel.gitapp.cn/admin后台管理帐号：用户名：admin123密码：admin123源码地址https://github.com/
50个常见的python毕业设计/课程设计（源码+文档）冷琴1996 Python系统设计 python 课程设计开发语言
计算机课程设计/毕业设计指南，为计算机相关专业毕业生提供源码、数据库安装、远程调试等相关服务，提供功能讲解视频。下面是50个基于python/django/vue的毕业设计/课程设计。1.网上商城系统这是一个基于python+vue开发的商城网站，平台采用B/S结构，后端采用主流的Python语言进行开发，前端采用主流的Vue.js进行开发。整个平台包括前台和后台两个部分。前台功能包括：首页、商品
大数据学习（75）-大数据组件总结 viperrrrrrr 大数据 impala yarn hdfs hive CDH mapreduce
大数据学习系列专栏：哲学语录:用力所能及，改变世界。如果觉得博主的文章还不错的话，请点赞+收藏⭐️+留言支持一下博主哦一、CDHCDH（ClouderaDistributionIncludingApacheHadoop)是由Cloudera公司提供的一个集成了ApacheHadoop以及相关生态系统的发行版本。CDH是一个大数据平台，简化和加速了大数据处理分析的部署和管理。CDH提供Hadoop的
Sqoop安装部署愿与狸花过一生大数据 sqoop hadoop hive
ApacheSqoop简介Sqoop（SQL-to-Hadoop）是Apache开源项目，主要用于：将关系型数据库中的数据导入Hadoop分布式文件系统（HDFS）或相关组件（如Hive、HBase）。将Hadoop处理后的数据导出回关系型数据库。核心特性批量数据传输支持从数据库表到HDFS/Hive的全量或增量数据迁移。并行化处理基于MapReduce实现并行导入导出，提升大数据量场景的效率。自
分享Python7个爬虫小案例（附源码）人工智能-猫猫爬虫 python 开发语言
在这篇文章中，我们将分享7个Python爬虫的小案例，帮助大家更好地学习和了解Python爬虫的基础知识。以下是每个案例的简介和源代码：1.爬取豆瓣电影Top250这个案例使用BeautifulSoup库爬取豆瓣电影Top250的电影名称、评分和评价人数等信息，并将这些信息保存到CSV文件中。importrequestsfrombs4importBeautifulSoupimportcsv#请求U
基于cesium的二三维地图程序员小美博客毕业设计源码分享 java 开源 vue
一、项目简介基于cesium的二三维地图二、实现功能支持虚线和阴影支持以标注的方式显示属性支持要素查询支持二三维度地球显示支持小数据量文件矢量动态切片三、技术选型Cesiumproj4jsturftext-encodinggeojson-topojsonshpjs四、界面展示五、源码地址回复：地图
基于python+django+mysql的小区物业管理系统源码+运行步骤冷琴1996 Python系统设计 python 开发语言
该系统是基于python+django开发的小区物业管理系统。适用场景：大学生、课程作业、毕业设计。学习过程中，如遇问题可以在github给作者留言。主要功能有：业主管理、报修管理、停车管理、资产管理、小区管理、用户管理、日志管理、系统信息。源码学习技术。演示地址http://wuye.gitapp.cn/admin后台管理帐号：用户名：admin123密码：admin123源码地址https:/
适合阅读源码的 Java 优质开源框架、库盘点（初级友好项目、中级进阶项目、高级深入项目）我命由我12345 Java -项目 java 开源开发语言 java-ee spring boot spring intellij-idea
一、初级友好项目1、JUnit5基本介绍：JUnit5是单元测试框架，代码简洁，适合学习测试驱动开发（TDD）和设计模式GitHub地址：https://github.com/junit-team/junit5特点：代码量适中，模块化设计，适合学习测试框架的实现原理2、Guava基本介绍：Guava是Google核心库，包含集合、缓存、字符串处理等工具类GitHub地址：https://githu
nextjs 实现rag知识库检索增强的ai问答app *goliter * web开发学习人工智能
AI-Chat-一个基于LLM大语言模型的知识库问答系统项目源码：https://github.com/goliter/ai-chat项目简介AI-Chat是一个基于Next.js和React开发的现代化大语言模型的知识库问答系统。该平台提供了简易的对话界面，支持上传文件进行知识库的构建，让用户在与大语言模型进行问答时给与大模型知识库内的相关内容。主要功能上传文件构建属于自己的知识库支持doc,t
基于python+django的家教预约网站-家教信息管理系统源码+运行步骤冷琴1996 Python系统设计 python django 开发语言
该系统是基于python+django开发的家教预约网站。是给师妹做的课程作业。大家在学习过程中，遇到问题可以在github给作者留言。共同学习进步哦效果演示前台地址：http://jiajiao.gitapp.cn后台地址：http://jiajiao.gitapp.cn/admin后台管理帐号：用户名：admin123密码：admin123源码地址https://github.com/geee
mysql数据库应用与开发姜桂洪课后答案_清华大学出版社-图书详情-《MySQL数据库应用与开发》... 韦盛江课后答案
前言Oracle公司的MySQL是目前最流行的关系数据库管理系统之一。MySQL所使用的SQL语言是用于访问数据库的最常用标准化语言。MySQL数据库以其精巧灵活、运行速度快、经济适用性强、开放源码等优势，作为网站数据库获得许多中小型网站的开发公司的青睐。MySQL性能卓越，搭配PHP和Apache可组成良好的软件开发环境，并且已经大量部署到中小型企业和高校的教学平台。本书从教学实际需求出发，结合
python基于Django的旅游景点数据分析及可视化的设计与实现 7blk7 qq2295116502 python django 数据分析
目录项目介绍技术栈具体实现截图Scrapy爬虫框架关键技术和使用的工具环境等的说明解决的思路开发流程爬虫核心代码展示系统设计论文书写大纲详细视频演示源码获取项目介绍大数据分析是现下比较热门的词汇，通过分析之后可以得到更多深入且有价值的信息。现实的科技手段中，越来越多的应用都会涉及到大数据随着大数据时代的到来，数据挖掘、分析与应用成为多个行业的关键,本课题首先介绍了网络爬虫的基本概念以及技术实现方法
源码篇：python生成《蔬菜店销售数据分析报告》案例 IT小本本 python python 数据分析开发语言
本文将通过Python实现一个完整的蔬菜销售数据分析项目，涵盖数据生成、清洗、分析及可视化全流程。我们将利用模拟数据生成技术创建90天的销售记录，通过Pandas进行数据处理，结合Matplotlib和Seaborn实现多样化的可视化图表，并最终生成动态交互报告。一、数据生成：模拟真实销售场景为了模拟真实的蔬菜销售数据，我们设计了包含10种蔬菜（白菜、土豆、西红柿等）的90天销售记录。数据生成逻辑
JAVA毕业设计BS架构考研交流学习平台设计与实现计算机源码+lw文档+系统+调试部署+数据库瑞致网络 java 开发语言 jvm
JAVA毕业设计BS架构考研交流学习平台设计与实现计算机源码+lw文档+系统+调试部署+数据库JAVA毕业设计BS架构考研交流学习平台设计与实现计算机源码+lw文档+系统+调试部署+数据库本源码技术栈：项目架构：B/S架构开发语言：Java语言开发软件：ideaeclipse前端技术：Layui、HTML、CSS、JS、JQuery等技术后端技术：JAVA运行环境：Win10、JDK1.8数据库：
[附源码]Python计算机毕业设计SSM基于B-S的心理健康管理系统（程序+LW) Python、JAVA毕设程序源码 java 开发语言
环境配置：Jdk1.8+Tomcat7.0+Mysql+HBuilderX（Webstorm也行）+Eclispe（IntelliJIDEA,Eclispe,MyEclispe,Sts都支持）。项目技术：SSM+mybatis+Maven+Vue等等组成，B/S模式+Maven管理等等。环境需要1.运行环境：最好是javajdk1.8，我们在这个平台上运行的。其他版本理论上也可以。2.IDE环境：
计算机毕业设计JavaBS景区票务管理系统设计与实现(源码+系统+mysql数据库+lw文档）毅铭科技数据库
计算机毕业设计JavaBS景区票务管理系统设计与实现(源码+系统+mysql数据库+lw文档）计算机毕业设计JavaBS景区票务管理系统设计与实现(源码+系统+mysql数据库+lw文档）本源码技术栈：项目架构：B/S架构开发语言：Java语言开发软件：ideaeclipse前端技术：Layui、HTML、CSS、JS、JQuery等技术后端技术：JAVA运行环境：Win10、JDK1.8数据库：
ssh命令满分对我强制爱 linux 服务器运维 spark
ssh命令无需密码也可登录要先关闭防火墙，命令如下：systemctlstopfirewalldsystemctldisablefirewalldsystemctlstatusfirewalldeg：目标：hadoop100通过ssh访问hadoop101,hadoop102时不需要密码，其他两台设备也类似。具体操作如下：1.在hadoop100中生成公钥和密码。ssh-keygen-trsa三次
cv2 orb 图像拼接_图像拼接Opencv源码重构是佐罗而非索隆 cv2 orb 图像拼接
请看赵春江https://me.csdn.net/zhaocj的主页，他已经对Opencv图像拼接流程中的代码做了很详细的解释。前人栽树，后人乘凉。一.本文所做的事1.重构了Opencv图像拼接的源代码，整个代码是面向过程的；2.在赵春江源码分析基础上，对一些细节部分进行说明。代码链接：https://github.com/mhhai/ImageStitch二.特征点检测一切起源于这段代码Ptrf
springboot基于java的企业档案管理信息系统 QQ80213251 java spring boot 后端
收藏关注不迷路！！文末获取源码+数据库感兴趣的可以先收藏起来，还有大家在毕设选题（免费咨询指导选题），项目以及论文编写等相关问题都可以给我留言咨询，希望帮助更多的人文章目录前言详细视频演示一、项目介绍二、功能介绍三、核心代码数据库参考四、效果图五、文章目录六、源码获取前言企业档案管理信息系统是一种旨在提高文件资料归档、检索和利用效率的信息化解决方案。该系统通过电子化手段对企业的各类文档和档案进行归
python+flask计算机毕业设计基于Android平台的景区移动端旅游软件系统（程序+开题+论文） Node.js彤彤程序 python flask 课程设计
本系统（程序+源码+数据库+调试部署+开发环境）带论文文档1万字以上，文末可获取，系统界面在最后面。系统程序文件列表开题报告内容研究背景随着移动互联网技术的飞速发展，智能手机已成为人们日常生活中不可或缺的一部分，特别是在旅游领域，移动端应用以其便捷性、实时性和个性化服务的特点，极大地改变了人们的旅游体验方式。当前，旅游市场日益繁荣，游客对于旅游信息获取、行程规划、景点导航、票务预订及个性化服务的需
Hive面试题御风行云天面试题大全 hive hadoop 数据仓库面试
Hive面试题1Hive基础概念1.1解释Hive是什么以及它的用途Hive的主要用途：1.2描述Hive架构和组件1.HiveCLI/Beeline和WebUI2.HiveQL3.HiveDriver（驱动）4.Metastore5.Compiler（编译器）6.Optimizer（优化器）7.Executor（执行器）8.HadoopCoreComponents（核心组件）9.HiveUDFs
深入分析串口使用rs485功能的内部机制之使用gpio控制传输方向读取rs485温湿度传感器数据（第一期） @曙光， linux 网络嵌入式
前言首先这是一篇涉及内核分析的，学习这篇文章最好是打开内核源码跟着我的分析去看，我参考的内核源码是linux5.4内核，也可以辅助ai去分析。ModbusRTU读取rs485温湿度传感器使用ModbusRTU读取rs485温湿度传感器有俩种方法，第一种采用gpio控制数据的传输方向：高电平表示主发从收，低电平表示主收从发。第二种采用硬件流控的方法使用串口的rts引脚和cts引脚自动控制收发方向，接
基于AT89C52单片机的智能导盲杖报警设计七月小卖铺单片机单片机嵌入式硬件
点击链接获取Keil源码与ProjectBackups仿真图：https://download.csdn.net/download/qq_64505944/90498287?spm=1001.2014.3001.5503C+22部分参考设计如下：摘要超声波测距技术因其具有较强的指向性、低能耗、较长的传播距离等优点，已成为广泛应用于各类传感器技术和自动控制技术相结合的测距方案之一。超声波传感器利用声
【AI大模型应用开发】RAG-Fusion框架：忘掉 RAG，未来是 RAG-Fusion 同学小张大模型人工智能笔记 chatgpt agi embedding RAG prompt
大家好，我是同学小张，+v:jasper_8017一起交流，持续学习C++进阶、OpenGL、WebGL知识和AI大模型应用实战案例，持续分享，欢迎大家点赞+关注，共同学习和进步。RAG目前很火，但是也有一些不足的地方。有不足就有改进方法。本文我们来看一个方法：RAG-Fusion，理解其原理，并看一下其实现源码。文章目录0.RAG的不足1.RAG-Fusion原理概述2.步骤拆解与代码示例2.1
#Hadoop全分布式安装 #mysql安装 #hive安装砸吧砸吧 hadoop hive yarn mysql
分布式（多台机器部署不同组件）与集群（多台机器部署相同组件）概念。Linux基础命令linux具有文件数：目录、文件，从根目录开始，路径具有唯一性。pwd：显示当前路径特殊符号：/：根目录.：隐藏文件，如果路径以.开始，表示当前目录下..：当前目录下的上一级~：当前目录的home目录--help：帮助命令使用linux常用操作命令tab键：自动补全ls：显示指定目录内容默认：当前路径-a：显示所有
基于BCLinux制作Apache HTTPD 2.4.63 的RPM安装包 IT布道 apache
在这之前，我写过一篇《基于CentOS7制作ApacheHTTPD2.4.58的RPM安装包》的文章。本文大部分内容和之前差不多，但因为操作系统由CentOS7变成了BC-Linux，所以，有些内容就可以删减了。编译环境：操作系统：BC-Linuxhttpd版本：2.4.63制作工具：rpmbuild（这个之前的文章有介绍，看这里）下载httpd源码：官网目前的最新版本是2.4.63(2025.1
Pollinations AI文生图html源码酷爱码 html HTML
源码介绍用deepseek辅助制作了一个电脑端文生图小程序，html语言的，接口使用的是Pollinations，上传服务器访问首页即可一次生成4张，提示词最好用英文，点击小图可以预览大图，也可以点击下载按钮直接下载截图预览源码免费获取PollinationsAI文生图html源码
Android Compose 图标按钮深度剖析：从源码到实践(四) &有梦想的咸鱼& Android开发大全 Androiod Compose原理 android
AndroidCompose图标按钮深度剖析：从源码到实践一、引言在现代Android应用开发中，用户界面的交互性和美观性至关重要。图标按钮作为一种常见的UI元素，以其简洁直观的特点，在提升用户体验方面发挥着重要作用。AndroidCompose作为Google推出的新一代声明式UI工具包，为开发者提供了创建图标按钮的便捷方式。本文将深入AndroidCompose框架的图标按钮模块，从源码级别进
C#：实现二个数组求并集(附完整源码) 源代码大师 C#算法完整教程 c#linq 开发语言
C#：实现二个数组求并集下面是C#代码，用于计算两个数组的并集：usingSystem;usingSystem.Linq;classProgram{staticvoidMain(string
apache 安装linux windows 墙头上一根草 apache inux windows
linux安装Apache 有两种方式一种是手动安装通过二进制的文件进行安装，另外一种就是通过yum 安装，此中安装方式，需要物理机联网。以下分别介绍两种的安装方式通过二进制文件安装Apache需要的软件有apr,apr-util,pcre 1，安装 apr 下载地址：htt
fill_parent、wrap_content和match_parent的区别 Cb123456 match_parent fill_parent
fill_parent、wrap_content和match_parent的区别: 1）fill_parent 设置一个构件的布局为fill_parent将强制性地使构件扩展，以填充布局单元内尽可能多的空间。这跟Windows控件的dockstyle属性大体一致。设置一个顶部布局或控件为fill_parent将强制性让它布满整个屏幕。 2） wrap_conte
网页自适应设计天子之骄 html css 响应式设计页面自适应
网页自适应设计网页对浏览器窗口的自适应支持变得越来越重要了。自适应响应设计更是异常火爆。再加上移动端的崛起，更是如日中天。以前为了适应不同屏幕分布率和浏览器窗口的扩大和缩小，需要设计几套css样式，用js脚本判断窗口大小，选择加载。结构臃肿，加载负担较大。现笔者经过一定时间的学习，有所心得，故分享于此，加强交流，共同进步。同时希望对大家有所
[sql server] 分组取最大最小常用sql 一炮送你回车库 SQL Server
--分组取最大最小常用sql--测试环境if OBJECT_ID('tb') is not null drop table tb;gocreate table tb( col1 int, col2 int, Fcount int)insert into tbselect 11,20,1 union allselect 11,22,1 union allselect 1
ImageIO写图片输出到硬盘 3213213333332132 java image
package awt; import java.awt.Color; import java.awt.Font; import java.awt.Graphics; import java.awt.image.BufferedImage; import java.io.File; import java.io.IOException; import javax.imagei
自己的String动态数组宝剑锋梅花香 java 动态数组数组
数组还是好说，学过一两门编程语言的就知道，需要注意的是数组声明时需要把大小给它定下来，比如声明一个字符串类型的数组：String str[]=new String[10]; 但是问题就来了，每次都是大小确定的数组，我需要数组大小不固定随时变化怎么办呢？动态数组就这样应运而生，龙哥给我们讲的是自己用代码写动态数组，并非用的ArrayList 看看字符
pinyin4j工具类 darkranger .net
pinyin4j工具类Java工具类 2010-04-24 00:47:00 阅读69 评论0 字号：大中小引入pinyin4j-2.5.0.jar包: pinyin4j是一个功能强悍的汉语拼音工具包，主要是从汉语获取各种格式和需求的拼音，功能强悍，下面看看如何使用pinyin4j。本人以前用AscII编码提取工具，效果不理想，现在用pinyin4j简单实现了一个。功能还不是很完美，
StarUML学习笔记----基本概念 aijuans UML建模
介绍StarUML的基本概念，这些都是有效运用StarUML?所需要的。包括对模型、视图、图、项目、单元、方法、框架、模型块及其差异以及UML轮廓。模型、视与图（Model, View and Diagram） &
Activiti最终总结 avords Activiti id 工作流
1、流程定义ID：ProcessDefinitionId，当定义一个流程就会产生。 2、流程实例ID：ProcessInstanceId，当开始一个具体的流程时就会产生，也就是不同的流程实例ID可能有相同的流程定义ID。 3、TaskId，每一个userTask都会有一个Id这个是存在于流程实例上的。 4、TaskDefinitionKey和（ActivityImpl activityId
从省市区多重级联想到的，react和jquery的差别 bee1314 jquery UI react
在我们的前端项目里经常会用到级联的select，比如省市区这样。通常这种级联大多是动态的。比如先加载了省，点击省加载市，点击市加载区。然后数据通常ajax返回。如果没有数据则说明到了叶子节点。针对这种场景，如果我们使用jquery来实现，要考虑很多的问题，数据部分，以及大量的dom操作。比如这个页面上显示了某个区，这时候我切换省，要把市重新初始化数据，然后区域的部分要从页面
Eclipse快捷键大全 bijian1013 java eclipse 快捷键
Ctrl+1 快速修复(最经典的快捷键,就不用多说了)Ctrl+D: 删除当前行 Ctrl+Alt+↓ 复制当前行到下一行(复制增加)Ctrl+Alt+↑ 复制当前行到上一行(复制增加)Alt+↓ 当前行和下面一行交互位置(特别实用,可以省去先剪切,再粘贴了)Alt+↑ 当前行和上面一行交互位置(同上)Alt+← 前一个编辑的页面Alt+→ 下一个编辑的页面(当然是针对上面那条来说了)Alt+En
js 笔记函数征客丶 JavaScript
一、函数的使用 1.1、定义函数变量 var vName = funcation(params){ } 1.2、函数的调用函数变量的调用： vName(params); 函数定义时自发调用：(function(params){})(params); 1.3、函数中变量赋值 var a = 'a'; var ff
【Scala四】分析Spark源代码总结的Scala语法二 bit1129 scala
1. Some操作在下面的代码中，使用了Some操作：if (self.partitioner == Some(partitioner))，那么Some(partitioner)表示什么含义？首先partitioner是方法combineByKey传入的变量， Some的文档说明： /** Class `Some[A]` represents existin
java 匿名内部类 BlueSkator java匿名内部类
组合优先于继承 Java的匿名类，就是提供了一个快捷方便的手段，令继承关系可以方便地变成组合关系继承只有一个时候才能用，当你要求子类的实例可以替代父类实例的位置时才可以用继承。在Java中内部类主要分为成员内部类、局部内部类、匿名内部类、静态内部类。内部类不是很好理解，但说白了其实也就是一个类中还包含着另外一个类如同一个人是由大脑、肢体、器官等身体结果组成，而内部类相
盗版win装在MAC有害发热，苹果的东西不值得买，win应该不用 ljy325 游戏 apple windows XP OS
Mac mini 型号: MC270CH-A RMB:5,688 Apple 对windows的产品支持不好,有以下问题: 1.装完了xp,发现机身很热虽然没有运行任何程序！貌似显卡跑游戏发热一样，按照那样的发热量,那部机子损耗很大,使用寿命受到严重的影响! 2.反观安装了Mac os的展示机，发热量很小，运行了1天温度也没有那么高 &nbs
读《研磨设计模式》-代码笔记-生成器模式-Builder bylijinnan java 设计模式
声明：本文只为方便我个人查阅和理解，详细的分析以及源代码请移步原作者的博客http://chjavach.iteye.com/ /** * 生成器模式的意图在于将一个复杂的构建与其表示相分离，使得同样的构建过程可以创建不同的表示（GoF） * 个人理解： * 构建一个复杂的对象，对于创建者（Builder）来说，一是要有数据来源(rawData)，二是要返回构
JIRA与SVN插件安装 chenyu19891124 SVN jira
JIRA安装好后提交代码并要显示在JIRA上，这得需要用SVN的插件才能看见开发人员提交的代码。 1.下载svn与jira插件安装包，解压后在安装包(atlassian-jira-subversion-plugin-0.10.1) 2.解压出来的包里下的lib文件夹下的jar拷贝到(C:\Program Files\Atlassian\JIRA 4.3.4\atlassian-jira\WEB
常用数学思想方法 comsci 工作
对于搞工程和技术的朋友来讲，在工作中常常遇到一些实际问题，而采用常规的思维方式无法很好的解决这些问题，那么这个时候我们就需要用数学语言和数学工具，而使用数学工具的前提却是用数学思想的方法来描述问题。。下面转帖几种常用的数学思想方法，仅供学习和参考函数思想　　把某一数学问题用函数表示出来，并且利用函数探究这个问题的一般规律。这是最基本、最常用的数学方法
pl/sql集合类型 daizj oracle 集合 type pl/sql
--集合类型 /* 单行单列的数据，使用标量变量单行多列数据，使用记录单列多行数据，使用集合（。。。） *集合：类似于数组也就是。pl/sql集合类型包括索引表（pl/sql table）、嵌套表（Nested Table）、变长数组（VARRAY）等 */ /* --集合方法 &n
[Ofbiz]ofbiz初用 dinguangx 电商 ofbiz
从github下载最新的ofbiz（截止2015-7-13），从源码进行ofbiz的试用 1. 加载测试库 ofbiz内置derby，通过下面的命令初始化测试库 ./ant load-demo (与load-seed有一些区别) 2. 启动内置tomcat ./ant start 或 ./startofbiz.sh 或 java -jar ofbiz.jar &
结构体中最后一个元素是长度为0的数组 dcj3sjt126com c gcc
在Linux源代码中，有很多的结构体最后都定义了一个元素个数为0个的数组，如/usr/include/linux/if_pppox.h中有这样一个结构体： struct pppoe_tag { __u16 tag_type; __u16 tag_len; &n
Linux cp 实现强行覆盖 dcj3sjt126com linux
发现在Fedora 10 /ubutun 里面用cp -fr src dest，即使加了-f也是不能强行覆盖的，这时怎么回事的呢？一两个文件还好说，就输几个yes吧，但是要是n多文件怎么办，那还不输死人呢？下面提供三种解决办法。方法一我们输入alias命令，看看系统给cp起了一个什么别名。 [root@localhost ~]# aliasalias cp=’cp -i’a
Memcached(一)、HelloWorld frank1234 memcached
一、简介高性能的架构离不开缓存，分布式缓存中的佼佼者当属memcached，它通过客户端将不同的key hash到不同的memcached服务器中，而获取的时候也到相同的服务器中获取，由于不需要做集群同步，也就省去了集群间同步的开销和延迟，所以它相对于ehcache等缓存来说能更好的支持分布式应用，具有更强的横向伸缩能力。二、客户端选择一个memcached客户端，我这里用的是memc
Search in Rotated Sorted Array II hcx2013 search
Follow up for "Search in Rotated Sorted Array":What if duplicates are allowed? Would this affect the run-time complexity? How and why? Write a function to determine if a given ta
Spring4新特性——更好的Java泛型操作API jinnianshilongnian spring4 generic type
Spring4新特性——泛型限定式依赖注入 Spring4新特性——核心容器的其他改进 Spring4新特性——Web开发的增强 Spring4新特性——集成Bean Validation 1.1(JSR-349)到SpringMVC Spring4新特性——Groovy Bean定义DSL Spring4新特性——更好的Java泛型操作API Spring4新
CentOS安装JDK liuxingguome centos
1、行卸载原来的： [root@localhost opt]# rpm -qa | grep java tzdata-java-2014g-1.el6.noarch java-1.7.0-openjdk-1.7.0.65-2.5.1.2.el6_5.x86_64 java-1.6.0-openjdk-1.6.0.0-11.1.13.4.el6.x86_64 [root@localhost
二分搜索专题2-在有序二维数组中搜索一个元素 OpenMind 二维数组算法二分搜索
1,设二维数组p的每行每列都按照下标递增的顺序递增。用数学语言描述如下：p满足 (1),对任意的x1，x2，y，如果x1<x2,则p(x1,y)<p(x2,y); (2),对任意的x，y1,y2, 如果y1<y2,则p(x,y1)<p(x,y2); 2,问题：给定满足1的数组p和一个整数k，求是否存在x0,y0使得p(x0,y0)=k? 3,算法分析： (
java 随机数 Math与Random SaraWon java Math Random
今天需要在程序中产生随机数，知道有两种方法可以使用，但是使用Math和Random的区别还不是特别清楚，看到一篇文章是关于的，觉得写的还挺不错的，原文地址是 http://www.oschina.net/question/157182_45274?sort=default&p=1#answers 产生1到10之间的随机数的两种实现方式： //Math Math.roun
oracle创建表空间 tugn oracle
create temporary tablespace TXSJ_TEMP tempfile 'E:\Oracle\oradata\TXSJ_TEMP.dbf' size 32m autoextend on next 32m maxsize 2048m extent m
使用Java8实现自己的个性化搜索引擎 yangshangchuan java superword 搜索引擎 java8 全文检索
需要对249本软件著作实现句子级别全文检索，这些著作均为PDF文件，不使用现有的框架如lucene，自己实现的方法如下： 1、从PDF文件中提取文本，这里的重点是如何最大可能地还原文本。提取之后的文本，一个句子一行保存为文本文件。 2、将所有文本文件合并为一个单一的文本文件，这样，每一个句子就有一个唯一行号。 3、对每一行文本进行分词，建立倒排表，倒排表的格式为：词=包含该词的总行数N=行号