Duplicate Data Detection with Hadoop and CDC

A course at my university requires a Hadoop assignment, so I dug out a Hadoop project I did two years ago and am tidying it up into this post, though I have forgotten quite a lot of the details.

CDC (Content-Defined Chunking) is a deduplication algorithm that works in many application environments. Here Hadoop is used to parallelize the algorithm; note that this project does not actually delete duplicate data, it only detects the parts that two files have in common.

Hadoop version: 1.0.3

Operating system: Ubuntu 12.04

Overall approach: the two files are divided into many chunks at boundaries chosen with Rabin fingerprints; each chunk that is read is stored in a ChunkInfo object, and the reading logic lives in CDC_RecordReader. The chunks are sent to the map function, which computes an MD5 hash for each one; chunks with the same MD5 are tallied by the reduce function, which outputs the chunk information.
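
To make the boundary rule concrete before diving into the Hadoop classes, here is a minimal standalone sketch of content-defined chunking. It is only an illustration: Arrays.hashCode stands in for the Rabin fingerprint used later in the post, and the window size, step, mask, and magic value are arbitrary example choices, not necessarily the ones the project uses.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class CdcSketch {
	// A position becomes a chunk boundary whenever (hash(window) & mask) == magic.
	public static List<Long> boundaries(byte[] data, int window, int step,
			long mask, long magic) {
		List<Long> cuts = new ArrayList<Long>();
		cuts.add(0L);
		for (int i = 0; i + window <= data.length; i += step) {
			long fp = Arrays.hashCode(Arrays.copyOfRange(data, i, i + window));
			if ((fp & mask) == magic) {
				cuts.add((long) (i + window)); // cut right after this window
			}
		}
		if (cuts.get(cuts.size() - 1) != data.length) {
			cuts.add((long) data.length); // the end of the data closes the last chunk
		}
		return cuts;
	}

	public static void main(String[] args) {
		byte[] data = new byte[64 * 1024];
		new Random(42).nextBytes(data);
		System.out.println(boundaries(data, 128, 10, 8191, 1111));
	}
}

Because the boundaries depend only on the bytes inside the window, two files that share a long run of identical content produce identical chunks in that region, which is what makes the later MD5 comparison meaningful.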

Now a quick walk through the code. First is ChunkInfo.java. This class implements the Writable interface (the interface a custom Hadoop data type must implement). I named it ChunkInfo, and the meaning of each field is documented in the comments. It overrides two methods, readFields and write: readFields reads the object's data from the input stream supplied by the InputFormat, and write serializes the object for the next stage.

I wrote this code in my junior year, so please forgive any mistakes.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.EOFException;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class ChunkInfo implements Writable {
	public int chunk_id; // chunk id
	public int chunk_size; // chunk size in bytes
	public int chunk_filenum; // number of files this chunk belongs to
	public int chunk_num; // total number of times this chunk appears across all files
	public byte blockBytes[] = null; // raw bytes of the chunk
	public String chunk_filename; // name of the file the chunk comes from
	public String hashValue; // hash value of the chunk, normally its MD5

	public ChunkInfo() {
		chunk_id = 0;
		chunk_size = 8 * 1024;
		chunk_filename = "4321";
		chunk_filenum = 1;
		chunk_num = 1;
		hashValue = " ";
		blockBytes = new byte[chunk_size];
	}

	/**
	 * Constructor.
	 * 
	 * @param id       chunk id
	 * @param size     chunk size in bytes
	 * @param filename name of the file the chunk belongs to
	 * @param filenum  number of files containing the chunk
	 * @param chunknum number of occurrences of the chunk
	 * @param hash     hash value of the chunk (normally MD5)
	 * @param bytes    raw bytes of the chunk
	 */
	public ChunkInfo(int id, int size, String filename, int filenum,
			int chunknum, String hash, byte bytes[]) {
		chunk_id = id;
		chunk_size = size;
		chunk_filename = filename;
		chunk_filenum = filenum;
		chunk_num = chunknum;
		hashValue = hash;
		blockBytes = bytes;
	}

	@Override
	public void readFields(DataInput arg0) throws IOException {
		// Read the fields of this object from the input stream.
		try {
			chunk_id = arg0.readInt(); // chunk id
			chunk_size = arg0.readInt(); // chunk size
			chunk_filenum = arg0.readInt(); // number of files containing this chunk
			chunk_num = arg0.readInt(); // total number of occurrences of this chunk
			hashValue = arg0.readUTF(); // hash value of the chunk (normally MD5)
			chunk_filename = arg0.readUTF(); // file name of the chunk
			// int length = arg0.readInt();
			// arg0.readFully(blockBytes, 0, length); // raw bytes of the chunk
		} catch (EOFException e) {
			return; // stop reading once the end of the stream is reached
		}
	}

	@Override
	public void write(DataOutput arg0) throws IOException {
		// Serialize the fields of this object to the output stream after the map phase.
		arg0.writeInt(chunk_id);
		arg0.writeInt(chunk_size);
		arg0.writeInt(chunk_filenum);
		arg0.writeInt(chunk_num);
		arg0.writeUTF(hashValue);
		arg0.writeUTF(chunk_filename);
		// arg0.writeInt(blockBytes.length);
		// arg0.write(blockBytes);
	}

	/*
	 * @Override public int compareTo(Object o) {
	 *     // Treat two ChunkInfo objects as equal when their hash values match.
	 *     ChunkInfo test = (ChunkInfo) o;
	 *     if (this.hashValue.equals(test.hashValue)) return 0;
	 *     else return -1;
	 * }
	 */

	public String toString() {
		return this.chunk_id + " " + this.chunk_size + " "
				+ this.chunk_filename.toString() + " "
				+ this.hashValue.toString() + " " + this.chunk_num + " "
				+ this.chunk_filenum;
	}
}
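
To sanity-check the Writable implementation outside of a MapReduce job, a round trip through a byte stream looks roughly like this. This is a small test sketch added for illustration and is not part of the original project; the file name and hash string are made-up values.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

public class ChunkInfoRoundTrip {
	public static void main(String[] args) throws Exception {
		// Write the fields of one ChunkInfo out, read them back into a fresh
		// object, and compare the string forms.
		ChunkInfo original = new ChunkInfo(1, 4096, "input/a.txt", 1, 1,
				"d41d8cd98f00b204e9800998ecf8427e", new byte[4096]);

		ByteArrayOutputStream bytes = new ByteArrayOutputStream();
		original.write(new DataOutputStream(bytes));

		ChunkInfo copy = new ChunkInfo();
		copy.readFields(new DataInputStream(
				new ByteArrayInputStream(bytes.toByteArray())));

		System.out.println(original);
		System.out.println(copy); // should print the same fields as the line above
	}
}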

CDC_RecordReader.java defines how records are read from a given file; it is created by CDC_FileInputFormat and must extend RecordReader. In initialize, Rabin fingerprints are used to find the split points of the whole file, and these are stored in an ArrayList. Then nextKeyValue is called repeatedly, each time treating the bytes between a pair of adjacent split points in the ArrayList as one chunk.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

import SerialAlgorithm.RabinHashFunction;

public class CDC_RecordReader extends RecordReader<IntWritable, ChunkInfo> {

	public int chunkId;
	public FileSplit fileSplit;
	public int chunkSize = 8 * 1024; // size of one chunk
	public String filename; // name of the file being read
	public FSDataInputStream fileIn; // input stream from the distributed file system
	public Path filePath; // path in the distributed file system
	public FileSystem fileSystem; // the distributed file system
	public long start; // position of the first byte of the file
	public long pos; // current read position
	public long end; // end position of the file
	public byte buffer[]; // buffer holding the file contents
	public Configuration conf;
	public IntWritable key = new IntWritable(0);
	public byte[] tempbytes = new byte[2];
	public ChunkInfo value = new ChunkInfo(0, chunkSize, " ", 0, 0, " ",
			tempbytes);
	public int chunkMask; // mask used when comparing fingerprints
	private List<Long> list = new ArrayList<Long>(); // offsets of the chunk boundaries
	private RabinHashFunction rabin = new RabinHashFunction(); // computes Rabin fingerprints
	private long magicValue = 1111; // an arbitrarily chosen value

	CDC_RecordReader() {

	}

	@Override
	public void close() throws IOException {
		// TODO Auto-generated method stub
		if (fileIn != null) {
			fileIn.close();
		}
	}

	@Override
	public IntWritable getCurrentKey() throws IOException, InterruptedException {
		// TODO Auto-generated method stub
		return key;
	}

	@Override
	public ChunkInfo getCurrentValue() throws IOException, InterruptedException {
		// TODO Auto-generated method stub
		return value;
	}

	@Override
	public float getProgress() throws IOException, InterruptedException {
		// TODO Auto-generated method stub
		if (start == end) {
			return 0.0f;
		} else {
			return Math.min(1.0f, (pos - start) / (float) (end - start));
		}
	}

	@Override
	public void initialize(InputSplit arg0, TaskAttemptContext arg1)
			throws IOException, InterruptedException {
		// TODO Auto-generated method stub
		conf = arg1.getConfiguration();
		this.fileSplit = (FileSplit) arg0;
		this.filePath = this.fileSplit.getPath();
		this.chunkId = 0;
		this.start = fileSplit.getStart();
		this.pos = this.start;

		try {
			this.fileSystem = filePath.getFileSystem(conf);
			this.filename = this.filePath.toString();
			this.fileIn = fileSystem.open(filePath);
			fileIn.seek(start);
			// Copy the whole file into out, then get it back as a byte array.
			ByteArrayOutputStream out = new ByteArrayOutputStream();
			buffer = new byte[4096];
			int n = 0;
			while ((n = fileIn.read(buffer)) != -1) {
				out.write(buffer, 0, n); // write only the bytes actually read
			}
			fileIn.close();
			buffer = out.toByteArray();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		this.markBytesArray(buffer, 10, 128);

	}

	@Override
	public boolean nextKeyValue() throws IOException, InterruptedException {
		// TODO Auto-generated method stub
		int i = this.chunkId;
		this.chunkId++; // advance past the chunk we are about to emit
		if((i + 1) >= list.size())
			return false;
		key.set(i);
		value.blockBytes=new byte[(int) (list.get(i+1)-list.get(i))];
		for(int j = 0; j < value.blockBytes.length; j++){
			value.blockBytes[j] = buffer[(int) (list.get(i) + j)];
		}
		value.chunk_filename = filename;
		value.chunk_filenum = 1;
		value.chunk_num = 1;
		value.chunk_id = chunkId;
		return true;
	}

	/**
	 * Compute the bit mask used when comparing Rabin fingerprint values,
	 * based on the expected chunk size exp_chunk_size.
	 *
	 * @param exp_chunk_size expected chunk size in bytes
	 * @return the mask applied to the fingerprint
	 */
	private int calculateMask(int exp_chunk_size) {
		int a = 0;
		a = (int) (Math.log(exp_chunk_size) / Math.log(2));
		a = (int) Math.pow(2, a) - 1;
		return a;
	}

	/**
	 * Use Rabin fingerprints to choose chunk boundaries inside the byte array
	 * and save the boundary offsets in the list.
	 *
	 * @param bytes          the bytes of the input file
	 * @param step           how far the window slides each iteration
	 * @param substring_size the window length
	 */
	private void markBytesArray(byte bytes[], int step, int substring_size) {
		chunkMask = this.calculateMask(chunkSize); // mask derived from the expected chunk size
		// Slide a window of substring_size bytes over the array in steps of `step`,
		// fingerprint each window, and record the positions whose masked
		// fingerprint matches the magic value.
		list.add((long) 0);
		for (int i = 0; i < bytes.length; i += step) {
			byte test[] = null; // bytes of the current window
			if (i + substring_size < bytes.length) {
				test = new byte[substring_size];
			} else
				test = new byte[bytes.length - i];
			// Copy the current window out of bytes so its fingerprint can be computed.
			for (int j = 0; j < test.length; j++) {
				test[j] = bytes[i + j];
			}
			long temp = rabin.hash(test); // compute the Rabin fingerprint
			temp = temp & chunkMask; // keep only the low bits selected by the mask
			// A position becomes a chunk boundary when the masked fingerprint
			// equals the preset magicValue.
			if (temp == magicValue) {
				list.add((long) (i + test.length));
			} else
				continue;
		}
		if (list.get(list.size() - 1) != bytes.length) {
			list.add((long) (bytes.length - 1));
		}
	}
}
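
The nextKeyValue method above copies the bytes between two adjacent offsets one by one; the same slicing can be written more compactly with Arrays.copyOfRange. The following is only an equivalent illustration of how the recorded offsets turn into chunks, not code the job actually uses, and the sample data and offsets are made up.

import java.util.Arrays;
import java.util.List;

public class ChunkSlicer {
	// Chunk i is simply the bytes between offsets.get(i) and offsets.get(i + 1).
	public static byte[] chunkAt(byte[] buffer, List<Long> offsets, int i) {
		int from = offsets.get(i).intValue();
		int to = offsets.get(i + 1).intValue();
		return Arrays.copyOfRange(buffer, from, to);
	}

	public static void main(String[] args) {
		byte[] buffer = "abcdefghij".getBytes();
		List<Long> offsets = Arrays.asList(0L, 4L, 10L); // e.g. boundaries at 0, 4 and the end
		System.out.println(new String(chunkAt(buffer, offsets, 0))); // "abcd"
		System.out.println(new String(chunkAt(buffer, offsets, 1))); // "efghij"
	}
}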


CDC_FileInputFormat simply extends FileInputFormat and overrides createRecordReader, returning a CDC_RecordReader.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class CDC_FileInputFormat extends
		FileInputFormat<IntWritable, ChunkInfo> {

	@Override
	public RecordReader<IntWritable, ChunkInfo> createRecordReader(
			InputSplit arg0, TaskAttemptContext arg1) throws IOException,
			InterruptedException {
		// TODO Auto-generated method stub
		return new CDC_RecordReader();
	}

}

CDC_Hadoop is the driver class of the program. It contains the main method together with the Mapper and Reducer classes: the Mapper computes the MD5 hash of each chunk, and the Reducer counts the chunks that share a hash and emits the result. The main method wires up the classes written above; the input and output paths are hard-coded in the source.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CDC_Hadoop {

	public static class CDCMapper extends
			Mapper<IntWritable, ChunkInfo, Text, ChunkInfo> {
		MD5Util MD5 = new MD5Util();

		public void map(IntWritable key, ChunkInfo value, Context context)
				throws IOException, InterruptedException {
			String hashValue = MD5Util.getMD5String(value.blockBytes);
			Text keyOfReduce = new Text();
			keyOfReduce.set(hashValue);
			value.hashValue = hashValue;
			System.out.println(value.toString());
			context.write(keyOfReduce, new ChunkInfo(value.chunk_id,
					value.chunk_size, value.chunk_filename,
					value.chunk_filenum, value.chunk_num, value.hashValue,
					value.blockBytes));

		}
	}

	public static class CDCReducer extends
			Reducer<Text, ChunkInfo, Text, IntWritable> {
		IntWritable temp = new IntWritable(0); // used for testing
		Text hashValue = new Text();
		int id = 1;

		public void reduce(Text key, Iterable<ChunkInfo> values, Context context)
				throws IOException, InterruptedException {
			int countChunkNum = 0;
			int countFileNum = 0;
			String filename = " ";
			int i = 0;
			ChunkInfo one = new ChunkInfo();
			// Iterate over all chunks that share this hash value.
			for (ChunkInfo chunk : values) {
				countChunkNum++;
				if (i == 0) {
					filename = chunk.chunk_filename;
				}
				if (chunk.chunk_filename.equals(filename)) {
					countFileNum++;
				}
				System.out.println(chunk.toString());
				i++;
			}

			one.chunk_filename = filename;
			one.chunk_filenum = countFileNum;
			one.chunk_num = countChunkNum;
			one.hashValue = key.toString();
			one.chunk_id = id;
			temp.set(0);
			hashValue.set(one.toString());
			context.write(hashValue, temp);
			id++;

		}
	}

	public static void main(String[] args) throws Exception {
		long startTime = System.currentTimeMillis(); // record the start time
		Configuration conf = new Configuration();
		Job job = new Job(conf, "CDC");
		job.setJarByClass(CDC_Hadoop.class);
		
//		Path in = new Path("hdfs://localhost:9000/user/justyoung/input");
//		Path in2 = new Path("hdfs://localhost:9000/user/justyoung/input2");
//		Path out = new Path("hdfs://localhost:9000/user/justyoung/CDCoutput");
		Path in = new Path("/home/justyoung/input");
		Path in2 = new Path("/home/justyoung/input2");
		Path out = new Path("/home/justyoung/CDCoutput");
		
		FileInputFormat.setInputPaths(job, in, in2);
		FileOutputFormat.setOutputPath(job, out);

		job.setMapperClass(CDCMapper.class);
		// job.setCombinerClass(FSPReducer.class);
		job.setReducerClass(CDCReducer.class);
		job.setInputFormatClass(CDC_FileInputFormat.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(ChunkInfo.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		if (job.waitForCompletion(true)) {
			long endTime = System.currentTimeMillis(); // record the finish time
			System.out.println("Job running time: " + (endTime - startTime) + "ms");
			System.exit(0);
		}
		System.exit(1);

	}

}


Now let's see the program in action:

1. First I upload two copies of the same file to HDFS with the following commands:

hadoop dfs -put ~/RemoteSSH.py input
hadoop dfs -put ~/RemoteSSH.py input2

2. I use the Hadoop Eclipse plugin, so the job can be run directly from Eclipse, as shown below:

[Figure 1: running the job from the Eclipse plugin]

3. From the output we can see that the two files share the same chunk:

[Figure 2: job output showing the duplicate chunk shared by the two files]


Finally, here is the source for computing MD5 values and Rabin fingerprints. Both pieces of code were adapted from other blogs, but unfortunately I have forgotten the original sources.

First, the MD5 code:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MD5Util {
	/**
	 * Characters used to turn bytes into their hexadecimal representation;
	 * Apache uses the same character set when verifying downloaded files.
	 */
	protected static char hexDigits[] = { '0', '1', '2', '3', '4', '5', '6',
			'7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };

	protected static MessageDigest messagedigest = null;
	static {
		try {
			messagedigest = MessageDigest.getInstance("MD5");
		} catch (NoSuchAlgorithmException nsaex) {
			System.err.println(MD5Util.class.getName()
					+ " initialization failed: MessageDigest does not support MD5.");
			nsaex.printStackTrace();
		}
	}
	
	/**
	 * Compute the MD5 checksum of a string.
	 * 
	 * @param s
	 * @return
	 */
	public static String getMD5String(String s) {
		return getMD5String(s.getBytes());
	}
	
	/**
	 * Check whether the MD5 checksum of a string matches a known MD5 value.
	 * 
	 * @param password  the string to check
	 * @param md5PwdStr the known MD5 checksum
	 * @return
	 */
	public static boolean checkPassword(String password, String md5PwdStr) {
		String s = getMD5String(password);
		return s.equals(md5PwdStr);
	}
	
	/**
	 * Compute the MD5 checksum of a file.
	 * 
	 * @param file
	 * @return
	 * @throws IOException
	 */
	public static String getFileMD5String(File file) throws IOException {		
		InputStream fis;
	    fis = new FileInputStream(file);
	    byte[] buffer = new byte[1024];
	    int numRead = 0;
	    while ((numRead = fis.read(buffer)) > 0) {
	    	messagedigest.update(buffer, 0, numRead);
	    }
	    fis.close();
		return bufferToHex(messagedigest.digest());
	}

	/**
	 * JDK 1.4 does not support an update method that takes a MappedByteBuffer,
	 * and MappedByteBuffer should be used with care: once FileChannel.map is
	 * called, the MappedByteBuffer holds a handle inside the system that
	 * FileChannel.close cannot release, and FileChannel offers no unmap-like
	 * method, so the mapped file may become impossible to delete.
	 * 
	 * Not recommended.
	 * 
	 * @param file
	 * @return
	 * @throws IOException
	 */
	public static String getFileMD5String_old(File file) throws IOException {
		FileInputStream in = new FileInputStream(file);
		FileChannel ch = in.getChannel();
		MappedByteBuffer byteBuffer = ch.map(FileChannel.MapMode.READ_ONLY, 0,
				file.length());
		messagedigest.update(byteBuffer);
		return bufferToHex(messagedigest.digest());
	}

	public static String getMD5String(byte[] bytes) {
		messagedigest.update(bytes);
		return bufferToHex(messagedigest.digest());
	}

	private static String bufferToHex(byte bytes[]) {
		return bufferToHex(bytes, 0, bytes.length);
	}

	private static String bufferToHex(byte bytes[], int m, int n) {
		StringBuffer stringbuffer = new StringBuffer(2 * n);
		int k = m + n;
		for (int l = m; l < k; l++) {
			appendHexPair(bytes[l], stringbuffer);
		}
		return stringbuffer.toString();
	}

	private static void appendHexPair(byte bt, StringBuffer stringbuffer) {
		char c0 = hexDigits[(bt & 0xf0) >> 4]; // convert the high 4 bits of the byte; >>> (unsigned right shift) would behave the same here
		char c1 = hexDigits[bt & 0xf]; // convert the low 4 bits of the byte
		stringbuffer.append(c0);
		stringbuffer.append(c1);
	}
	
}
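
For reference, using this class directly looks like the following sketch; the input strings are arbitrary examples.

public class MD5Demo {
	public static void main(String[] args) {
		// MD5 of a string and of a raw byte array; both go through getMD5String.
		System.out.println(MD5Util.getMD5String("hello"));
		System.out.println(MD5Util.getMD5String("hello".getBytes()));
	}
}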


The Rabin fingerprint code:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.URL;

/**
 * We compute the checksum using Broder's implementation of
 * Rabin's fingerprinting algorithm. Fingerprints offer 
 * provably strong probabilistic guarantees that two 
 * different strings will not have the same fingerprint. 
 * Other checksum algorithms, such as MD5 and SHA, do not 
 * offer such provable guarantees, and are also more 
 * expensive to compute than Rabin fingerprint.
 *
 * A disadvantage is that these faster functions are 
 * efficiently invertible (that is, one can easily build an 
 * URL that hashes to a particular location), a fact that  
 * might be used by malicious users for nefarious purposes.
 *
 * Using Rabin's fingerprinting function, the probability of
 * collision of two strings s1 and s2 can be bounded (in an adversarial
 * model for s1 and s2) by max(|s1|,|s2|)/2**(l-1), where |s1| is the 
 * length of the string s1 in bits.
 * 
 * The advantage of choosing Rabin fingerprints (which are based on random
 * irreducible polynomials) rather than some arbitrary hash function is that
 * their probability of collision is well understood. Furthermore Rabin 
 * fingerprints can be computed very efficiently in software and we can
 * take advantage of their algebraic properties when we compute the
 * fingerprints of "sliding windows".
 *
 * M. O. Rabin
 * Fingerprinting by random polynomials.
 * Center for Research in Computing Technology
 * Harvard University Report TR-15-81
 * 1981
 * 
 * A. Z. Broder
 * Some applications of Rabin's fingerprinting method
 * In R.Capicelli, A. De Santis and U. Vaccaro editors
 * Sequences II:Methods in Communications, Security, and Computer Science
 * pages 143-152
 * Springer-Verlag
 * 1993
 *
 */
public final class RabinHashFunction implements Serializable {

        private final static int P_DEGREE = 64;
        private final static int READ_BUFFER_SIZE = 2048;
        private final static int X_P_DEGREE = 1 << (P_DEGREE - 1);

       /* public static void main(String args[]) {
                RabinHashFunction h = new RabinHashFunction();
                System.out.println(h.hash(args[0]));
        }
*/
        private final byte[] buffer;

        //private long POLY = Long.decode("0x0060034000F0D50A").longValue();
        private long POLY = Long.decode("0x004AE1202C306041").longValue() | 1<<63;

        private final long[] table32, table40, table48, table54;
        private final long[] table62, table70, table78, table84;

        /**
         *  Constructor for the RabinHashFunction64 object
         *
         *@param  P  Description of the Parameter
         */
        public RabinHashFunction() {
                table32 = new long[256];
                table40 = new long[256];
                table48 = new long[256];
                table54 = new long[256];
                table62 = new long[256];
                table70 = new long[256];
                table78 = new long[256];
                table84 = new long[256];
                buffer = new byte[READ_BUFFER_SIZE];
                long[] mods = new long[P_DEGREE];
                mods[0] = POLY;
                for (int i = 0; i < 256; i++) {
                        table32[i] = 0;
                        table40[i] = 0;
                        table48[i] = 0;
                        table54[i] = 0;
                        table62[i] = 0;
                        table70[i] = 0;
                        table78[i] = 0;
                        table84[i] = 0;
                }
                for (int i = 1; i < P_DEGREE; i++) {
                        mods[i] = mods[i - 1] << 1;
                        if ((mods[i - 1] & X_P_DEGREE) != 0) {
                                mods[i] = mods[i] ^ POLY;
                        }
                }
                for (int i = 0; i < 256; i++) {
                        long c = i;
                        for (int j = 0; j < 8 && c != 0; j++) {
                                if ((c & 1) != 0) {
                                        table32[i] = table32[i] ^ mods[j];
                                        table40[i] = table40[i] ^ mods[j + 8];
                                        table48[i] = table48[i] ^ mods[j + 16];
                                        table54[i] = table54[i] ^ mods[j + 24];
                                        table62[i] = table62[i] ^ mods[j + 32];
                                        table70[i] = table70[i] ^ mods[j + 40];
                                        table78[i] = table78[i] ^ mods[j + 48];
                                        table84[i] = table84[i] ^ mods[j + 56];
                                }
                                c >>>= 1;
                        }
                }
        }

        /**
         *  Return the Rabin hash value of an array of bytes.
         *
         *@param  A  the array of bytes
         *@return    the hash value
         */
        public long hash(byte[] A) {
                return hash(A, 0, A.length, 0);
        }

        /**
         *  Description of the Method
         *
         *@param  A       Description of the Parameter
         *@param  offset  Description of the Parameter
         *@param  length  Description of the Parameter
         *@param  w       Description of the Parameter
         *@return         Description of the Return Value
         */
        private long hash(byte[] A, int offset, int length, long ws) {
                long w = ws;
                int start = length % 8;
                for (int s = offset; s < offset + start; s++) {
                        w = (w << 8) ^ (A[s] & 0xFF);
                }
                for (int s = offset + start; s < length + offset; s += 8) {
                        w =
                                table32[(int) (w & 0xFF)]
                                        ^ table40[(int) ((w >>> 8) & 0xFF)]
                                        ^ table48[(int) ((w >>> 16) & 0xFF)]
                                        ^ table54[(int) ((w >>> 24) & 0xFF)]
                                        ^ table62[(int) ((w >>> 32) & 0xFF)]
                                        ^ table70[(int) ((w >>> 40) & 0xFF)]
                                        ^ table78[(int) ((w >>> 48) & 0xFF)]
                                        ^ table84[(int) ((w >>> 56) & 0xFF)]
                                        ^ (long) (A[s] << 56)
                                        ^ (long) (A[s + 1] << 48)
                                        ^ (long) (A[s + 2] << 40)
                                        ^ (long) (A[s + 3] << 32)
                                        ^ (long) (A[s + 4] << 24)
                                        ^ (long) (A[s + 5] << 16)
                                        ^ (long) (A[s + 6] << 8)
                                        ^ (long) (A[s + 7]);
                }
                return w;
        }

        /**
         *  Return the Rabin hash value of an array of chars.
         *
         *@param  A  the array of chars
         *@return    the hash value
         */
        public long hash(char[] A) {
                long w = 0;
                int start = A.length % 4;
                for (int s = 0; s < start; s++) {
                        w = (w << 16) ^ (A[s] & 0xFFFF);
                }
                for (int s = start; s < A.length; s += 4) {
                        w =
                                table32[(int) (w & 0xFF)]
                                        ^ table40[(int) ((w >>> 8) & 0xFF)]
                                        ^ table48[(int) ((w >>> 16) & 0xFF)]
                                        ^ table54[(int) ((w >>> 24) & 0xFF)]
                                        ^ table62[(int) ((w >>> 32) & 0xFF)]
                                        ^ table70[(int) ((w >>> 40) & 0xFF)]
                                        ^ table78[(int) ((w >>> 48) & 0xFF)]
                                        ^ table84[(int) ((w >>> 56) & 0xFF)]
                                        ^ ((long) (A[s] & 0xFFFF) << 48)
                                        ^ ((long) (A[s + 1] & 0xFFFF) << 32)
                                        ^ ((long) (A[s + 2] & 0xFFFF) << 16)
                                        ^ ((long) (A[s + 3] & 0xFFFF));
                }
                return w;
        }

        /**
         *  Computes the Rabin hash value of the contents of a file.
         *
         *@param  f                       the file to be hashed
         *@return                         the hash value of the file
         *@throws  FileNotFoundException  if the file cannot be found
         *@throws  IOException            if an error occurs while reading the file
         */
        public long hash(File f) throws FileNotFoundException, IOException {
                FileInputStream fis = new FileInputStream(f);
                try {
                        return hash(fis);
                } finally {
                        fis.close();
                }
        }

        /**
         *  Computes the Rabin hash value of the data from an InputStream.
         *
         *@param  is            the InputStream to hash
         *@return               the hash value of the data from the InputStream
         *@throws  IOException  if an error occurs while reading from the
         *      InputStream
         */
        public long hash(InputStream is) throws IOException {
                long hashValue = 0;
                int bytesRead;
                synchronized (buffer) {
                        while ((bytesRead = is.read(buffer)) > 0) {
                                hashValue = hash(buffer, 0, bytesRead, hashValue);
                        }
                }
                return hashValue;
        }

        /**
         *  Returns the Rabin hash value of an array of integers. This method is the
         *  most efficient of all the hash methods, so it should be used when
         *  possible.
         *
         *@param  A  array of integers
         *@return    the hash value
         */
        public long hash(int[] A) {
                long w = 0;
                int start = 0;
                if (A.length % 2 == 1) {
                        w = A[0] & 0xFFFFFFFF;
                        start = 1;
                }
                for (int s = start; s < A.length; s += 2) {
                        w =
                                table32[(int) (w & 0xFF)]
                                        ^ table40[(int) ((w >>> 8) & 0xFF)]
                                        ^ table48[(int) ((w >>> 16) & 0xFF)]
                                        ^ table54[(int) ((w >>> 24) & 0xFF)]
                                        ^ table62[(int) ((w >>> 32) & 0xFF)]
                                        ^ table70[(int) ((w >>> 40) & 0xFF)]
                                        ^ table78[(int) ((w >>> 48) & 0xFF)]
                                        ^ table84[(int) ((w >>> 56) & 0xFF)]
                                        ^ ((long) (A[s] & 0xFFFFFFFF) << 32)
                                        ^ (long) (A[s + 1] & 0xFFFFFFFF);
                }
                return w;
        }

        /**
         *  Returns the Rabin hash value of an array of longs. This method is the
         *  most efficient of all the hash methods, so it should be used when
         *  possible.
         *
         *@param  A  array of integers
         *@return    the hash value
         */
        public long hash(long[] A) {
                long w = 0;
                for (int s = 0; s < A.length; s++) {
                        w =
                                table32[(int) (w & 0xFF)]
                                        ^ table40[(int) ((w >>> 8) & 0xFF)]
                                        ^ table48[(int) ((w >>> 16) & 0xFF)]
                                        ^ table54[(int) ((w >>> 24) & 0xFF)]
                                        ^ table62[(int) ((w >>> 32) & 0xFF)]
                                        ^ table70[(int) ((w >>> 40) & 0xFF)]
                                        ^ table78[(int) ((w >>> 48) & 0xFF)]
                                        ^ table84[(int) ((w >>> 56) & 0xFF)]
                                        ^ (A[s]);
                }
                return w;
        }

        /**
         *  Description of the Method
         *
         *@param  obj              Description of the Parameter
         *@return                  Description of the Return Value
         *@exception  IOException  Description of the Exception
         */
        public long hash(Object obj) throws IOException {
                return hash((Serializable) obj);
        }

        /**
         *  Returns the Rabin hash value of a serializable object.
         *
         *@param  obj           the object to be hashed
         *@return               the hash value
         *@throws  IOException  if serialization fails
         */
        public long hash(Serializable obj) throws IOException {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                ObjectOutputStream oos = null;
                try {
                        oos = new ObjectOutputStream(baos);
                        oos.writeObject(obj);
                        return hash(baos.toByteArray());
                } finally {
                        oos.close();
                        baos.close();
                        oos = null;
                        baos = null;
                }
        }

        /**
         *  Computes the Rabin hash value of a String.
         *
         *@param  s  the string to be hashed
         *@return    the hash value
         */
        public long hash(String s) {
                return hash(s.toCharArray());
        }

        /**
         *  Computes the Rabin hash value of the contents of a file, specified by
         *  URL.
         *
         *@param  url           the URL of the file to be hashed
         *@return               the hash value of the file
         *@throws  IOException  if an error occurs while reading from the URL
         */
        public long hash(URL url) throws IOException {
                InputStream is = url.openStream();
                try {
                        return hash(is);
                } finally {
                        is.close();
                }
        }

}
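
And a quick sketch of using the fingerprint class on a single window of bytes, in the same way CDC_RecordReader does. This assumes RabinHashFunction is available on the classpath (in the project it lives in the SerialAlgorithm package); the window contents and mask value are just examples.

public class RabinDemo {
	public static void main(String[] args) {
		RabinHashFunction rabin = new RabinHashFunction();
		byte[] window = "some sliding window contents".getBytes();
		long fp = rabin.hash(window); // 64-bit Rabin fingerprint of the window
		long masked = fp & 8191;      // keep only the low bits, as the chunker does
		System.out.println(fp + " -> " + masked);
	}
}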




