Fan Min (闵帆)

300 Lines of Java a Day (Days 51-60: kNN and NB)

Day 51: The kNN Classifier

The original kNN paper is: T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, IT-13, pages 21–27, 1967.

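This code is about 300 lines long and takes three days to finish. Today, type in the code and get it running; tomorrow and the day after involve modifying the program. You are expected to master it thoroughly.
Characteristics of kNN:

Simple. There is no training phase, which is why it is also called lazy learning. It resembles an open-book exam: the answers are looked up in existing data.
Fundamental. Looking for similar things is how humans commonly make sense of the world, and it is built into the genes of humans and other animals. Of course, humans can be fooled too: someone once mistook a neighbor's Alocasia (an ornamental plant) for taro, ate it, and was poisoned.
Effective. Never underestimate kNN; on many datasets it is hard to design an algorithm that beats it.
Adaptable. It works for both classification and regression, and for many kinds of data.
Extensible. Designing a different distance measure can produce surprisingly good results.
Data usually needs to be normalized first, since attributes with large value ranges would otherwise dominate the distance.
High complexity. This is the most important drawback of kNN. Predicting one test instance costs $O((m + k) n)$, where $n$ is the number of training instances, $m$ is the number of condition attributes, and $k$ is the number of neighbors: computing all distances costs $O(mn)$, and selecting $k$ neighbors by repeated scans costs $O(kn)$. See computeNearests() in the code.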
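
The normalization mentioned above can be sketched as follows. This is a minimal min-max example on a plain double[][] matrix, not part of the original program; the class name MinMaxNormalizer and the method normalize() are made up for illustration, and mapping a constant column to 0 is an arbitrary choice.

```java
/** Min-max normalization sketch (hypothetical helper, not from the original code). */
final class MinMaxNormalizer {

	/**
	 * Rescale every column of paraData to the range [0, 1].
	 * A constant column is mapped to 0 (an assumption made for this sketch).
	 */
	static double[][] normalize(double[][] paraData) {
		int tempRows = paraData.length;
		int tempColumns = paraData[0].length;
		double[][] resultData = new double[tempRows][tempColumns];

		for (int j = 0; j < tempColumns; j++) {
			// Find the range of column j.
			double tempMin = Double.MAX_VALUE;
			double tempMax = -Double.MAX_VALUE;
			for (int i = 0; i < tempRows; i++) {
				tempMin = Math.min(tempMin, paraData[i][j]);
				tempMax = Math.max(tempMax, paraData[i][j]);
			} // Of for i

			// Rescale column j.
			double tempRange = tempMax - tempMin;
			for (int i = 0; i < tempRows; i++) {
				resultData[i][j] = (tempRange == 0) ? 0 : (paraData[i][j] - tempMin) / tempRange;
			} // Of for i
		} // Of for j

		return resultData;
	}// Of normalize
}// Of class MinMaxNormalizer
```

To use it with the classifier below, one would normalize the condition attributes only and leave the class attribute untouched.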

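Notes on the code:
1. Two distance measures.
2. A random split of the data.
3. Flexible use of indirect addressing: trainingSet and testingSet are both integer arrays that store indices.
4. Reading an arff file. This requires the weka.jar package.
5. Finding the neighbors.
6. Voting.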

package machinelearning.knn;

import java.io.FileReader;
import java.util.Arrays;
import java.util.Random;

import weka.core.*;

/**
 * kNN classification.
 * 
 * @author Fan Min.
 */
public class KnnClassification {

	/**
	 * Manhattan distance.
	 */
	public static final int MANHATTAN = 0;

	/**
	 * Euclidean distance.
	 */
	public static final int EUCLIDEAN = 1;

	/**
	 * The distance measure.
	 */
	public int distanceMeasure = EUCLIDEAN;

	/**
	 * A random instance;
	 */
	public static final Random random = new Random();

	/**
	 * The number of neighbors.
	 */
	int numNeighbors = 7;

	/**
	 * The whole dataset.
	 */
	Instances dataset;

	/**
	 * The training set. Represented by the indices of the data.
	 */
	int[] trainingSet;

	/**
	 * The testing set. Represented by the indices of the data.
	 */
	int[] testingSet;

	/**
	 * The predictions.
	 */
	int[] predictions;

	/**
	 *********************
	 * The first constructor.
	 * 
	 * @param paraFilename
	 *            The arff filename.
	 *********************
	 */
	public KnnClassification(String paraFilename) {
		try {
			FileReader fileReader = new FileReader(paraFilename);
			dataset = new Instances(fileReader);
			// The last attribute is the decision class.
			dataset.setClassIndex(dataset.numAttributes() - 1);
			fileReader.close();
		} catch (Exception ee) {
			System.out.println("Error occurred while trying to read \'" + paraFilename
					+ "\' in KnnClassification constructor.\r\n" + ee);
			System.exit(0);
		} // Of try
	}// Of the first constructor

	/**
	 *********************
	 * Get random indices for data randomization.
	 * 
	 * @param paraLength
	 *            The length of the sequence.
	 * @return An array of indices, e.g., {4, 3, 1, 5, 0, 2} with length 6.
	 *********************
	 */
	public static int[] getRandomIndices(int paraLength) {
		int[] resultIndices = new int[paraLength];

		// Step 1. Initialize.
		for (int i = 0; i < paraLength; i++) {
			resultIndices[i] = i;
		} // Of for i

		// Step 2. Randomly swap.
		int tempFirst, tempSecond, tempValue;
		for (int i = 0; i < paraLength; i++) {
			// Generate two random indices.
			tempFirst = random.nextInt(paraLength);
			tempSecond = random.nextInt(paraLength);

			// Swap.
			tempValue = resultIndices[tempFirst];
			resultIndices[tempFirst] = resultIndices[tempSecond];
			resultIndices[tempSecond] = tempValue;
		} // Of for i

		return resultIndices;
	}// Of getRandomIndices

	/**
	 *********************
	 * Split the data into training and testing parts.
	 * 
	 * @param paraTrainingFraction
	 *            The fraction of the training set.
	 *********************
	 */
	public void splitTrainingTesting(double paraTrainingFraction) {
		int tempSize = dataset.numInstances();
		int[] tempIndices = getRandomIndices(tempSize);
		int tempTrainingSize = (int) (tempSize * paraTrainingFraction);

		trainingSet = new int[tempTrainingSize];
		testingSet = new int[tempSize - tempTrainingSize];

		for (int i = 0; i < tempTrainingSize; i++) {
			trainingSet[i] = tempIndices[i];
		} // Of for i

		for (int i = 0; i < tempSize - tempTrainingSize; i++) {
			testingSet[i] = tempIndices[tempTrainingSize + i];
		} // Of for i
	}// Of splitTrainingTesting

	/**
	 *********************
	 * Predict for the whole testing set. The results are stored in predictions.
	 * 
	 * @see #predictions
	 *********************
	 */
	public void predict() {
		predictions = new int[testingSet.length];
		for (int i = 0; i < predictions.length; i++) {
			predictions[i] = predict(testingSet[i]);
		} // Of for i
	}// Of predict

	/**
	 *********************
	 * Predict for the given instance.
	 * 
	 * @param paraIndex
	 *            The index of the instance in the dataset.
	 * @return The prediction.
	 *********************
	 */
	public int predict(int paraIndex) {
		int[] tempNeighbors = computeNearests(paraIndex);
		int resultPrediction = simpleVoting(tempNeighbors);

		return resultPrediction;
	}// Of predict

	/**
	 *********************
	 * The distance between two instances.
	 * 
	 * @param paraI
	 *            The index of the first instance.
	 * @param paraJ
	 *            The index of the second instance.
	 * @return The distance.
	 *********************
	 */
	public double distance(int paraI, int paraJ) {
		double resultDistance = 0;
		double tempDifference;
		switch (distanceMeasure) {
		case MANHATTAN:
			for (int i = 0; i < dataset.numAttributes() - 1; i++) {
				tempDifference = dataset.instance(paraI).value(i) - dataset.instance(paraJ).value(i);
				if (tempDifference < 0) {
					resultDistance -= tempDifference;
				} else {
					resultDistance += tempDifference;
				} // Of if
			} // Of for i
			break;

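		case EUCLIDEAN:
			// This is the squared Euclidean distance. The square root is
			// omitted because it does not change the ranking of neighbors.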
			for (int i = 0; i < dataset.numAttributes() - 1; i++) {
				tempDifference = dataset.instance(paraI).value(i) - dataset.instance(paraJ).value(i);
				resultDistance += tempDifference * tempDifference;
			} // Of for i
			break;
		default:
			System.out.println("Unsupported distance measure: " + distanceMeasure);
		}// Of switch

		return resultDistance;
	}// Of distance

	/**
	 *********************
	 * Get the accuracy of the classifier.
	 * 
	 * @return The accuracy.
	 *********************
	 */
	public double getAccuracy() {
		// A double divides an int gets another double.
		double tempCorrect = 0;
		for (int i = 0; i < predictions.length; i++) {
			if (predictions[i] == dataset.instance(testingSet[i]).classValue()) {
				tempCorrect++;
			} // Of if
		} // Of for i

		return tempCorrect / testingSet.length;
	}// Of getAccuracy

	/**
	 ************************************
	 * Compute the nearest k neighbors, where k is numNeighbors. Select one
	 * neighbor in each scan. In fact we can scan only once. You may implement
	 * it by yourself.
	 * 
	 * @param paraCurrent
	 *            current instance. We compare it with all training instances.
	 * @return the indices of the nearest instances.
	 ************************************
	 */
	public int[] computeNearests(int paraCurrent) {
		int[] resultNearests = new int[numNeighbors];
		boolean[] tempSelected = new boolean[trainingSet.length];
		double tempMinimalDistance;
		int tempMinimalIndex = 0;

		// Compute all distances to avoid redundant computation.
		double[] tempDistances = new double[trainingSet.length];
		for (int i = 0; i < trainingSet.length; i ++) {
			tempDistances[i] = distance(paraCurrent, trainingSet[i]);
		}//Of for i
		
		// Select the nearest numNeighbors indices.
		for (int i = 0; i < numNeighbors; i++) {
			tempMinimalDistance = Double.MAX_VALUE;

			for (int j = 0; j < trainingSet.length; j++) {
				if (tempSelected[j]) {
					continue;
				} // Of if

				if (tempDistances[j] < tempMinimalDistance) {
					tempMinimalDistance = tempDistances[j];
					tempMinimalIndex = j;
				} // Of if
			} // Of for j

			resultNearests[i] = trainingSet[tempMinimalIndex];
			tempSelected[tempMinimalIndex] = true;
		} // Of for i

		System.out.println("The nearest of " + paraCurrent + " are: " + Arrays.toString(resultNearests));
		return resultNearests;
	}// Of computeNearests

	/**
	 ************************************
	 * Voting using the instances.
	 * 
	 * @param paraNeighbors
	 *            The indices of the neighbors.
	 * @return The predicted label.
	 ************************************
	 */
	public int simpleVoting(int[] paraNeighbors) {
		int[] tempVotes = new int[dataset.numClasses()];
		for (int i = 0; i < paraNeighbors.length; i++) {
			tempVotes[(int) dataset.instance(paraNeighbors[i]).classValue()]++;
		} // Of for i

		int tempMaximalVotingIndex = 0;
		int tempMaximalVoting = 0;
		for (int i = 0; i < dataset.numClasses(); i++) {
			if (tempVotes[i] > tempMaximalVoting) {
				tempMaximalVoting = tempVotes[i];
				tempMaximalVotingIndex = i;
			} // Of if
		} // Of for i

		return tempMaximalVotingIndex;
	}// Of simpleVoting

	/**
	 *********************
	 * The entrance of the program.
	 * 
	 * @param args
	 *            Not used now.
	 *********************
	 */
	public static void main(String args[]) {
		KnnClassification tempClassifier = new KnnClassification("D:/data/iris.arff");
		tempClassifier.splitTrainingTesting(0.8);
		tempClassifier.predict();
		System.out.println("The accuracy of the classifier is: " + tempClassifier.getAccuracy());
	}// Of main

}// Of class KnnClassification

iris.arff can be downloaded from https://github.com/FanSmale/sampledata/. If the site is hard to reach, copy the content below and save it as iris.arff.

@RELATION iris

@ATTRIBUTE sepallength	REAL
@ATTRIBUTE sepalwidth 	REAL
@ATTRIBUTE petallength 	REAL
@ATTRIBUTE petalwidth	REAL
@ATTRIBUTE class 	{Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica

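Day 52: The kNN Classifier (continued)

Re-implement computeNearests so that a single scan of the training set yields the $k$ neighbors. Hint: combine the current code with the idea of insertion sort. The time complexity is $O(kn)$, where the $O(n)$ factor comes from scanning the training set and the $O(k)$ factor from each insertion.
Add a setDistanceMeasure() method.
Add a setNumNeighbors() method.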
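
One possible shape of the single-scan idea is sketched below. It is not the reference solution: it works on a plain array of precomputed distances rather than on the classifier's fields, and the names OnePassNearest and nearestIndices() are hypothetical. A sorted buffer of size $k$ is maintained, and each candidate is inserted as in insertion sort.

```java
/** One-pass selection of the k nearest neighbors (hypothetical helper for illustration). */
final class OnePassNearest {

	/**
	 * Return the positions of the paraK smallest values in paraDistances,
	 * nearest first, scanning the array only once.
	 */
	static int[] nearestIndices(double[] paraDistances, int paraK) {
		int tempK = Math.max(0, Math.min(paraK, paraDistances.length));
		int[] resultIndices = new int[tempK];
		if (tempK == 0) {
			return resultIndices;
		} // Of if

		// The distances of the current best candidates, in ascending order.
		double[] tempBest = new double[tempK];
		// The number of buffer slots in use.
		int tempSize = 0;

		for (int i = 0; i < paraDistances.length; i++) {
			// A full buffer rejects candidates no better than the k-th nearest.
			if (tempSize == tempK && paraDistances[i] >= tempBest[tempK - 1]) {
				continue;
			} // Of if

			// Insertion step: shift larger entries one slot to the right.
			int j = (tempSize < tempK) ? tempSize : tempK - 1;
			while (j > 0 && tempBest[j - 1] > paraDistances[i]) {
				tempBest[j] = tempBest[j - 1];
				resultIndices[j] = resultIndices[j - 1];
				j--;
			} // Of while
			tempBest[j] = paraDistances[i];
			resultIndices[j] = i;

			if (tempSize < tempK) {
				tempSize++;
			} // Of if
		} // Of for i

		return resultIndices;
	}// Of nearestIndices
}// Of class OnePassNearest
```

For the distances {0.5, 0.1, 0.9, 0.3, 0.2} and k = 3, the method returns the positions {1, 4, 3}. Inside KnnClassification the returned positions would still have to be mapped through trainingSet to obtain dataset indices.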

Day 53: The kNN Classifier (continued)

Add a weightedVoting() method in which closer neighbors get a larger say. Support at least two weighting schemes.
Implement leave-one-out testing.
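
As a starting point for the weighted-voting exercise, here is a hedged sketch with two weighting schemes. Everything in it is an assumption made for illustration: the class name, the scheme constants, the 1e-6 smoothing constant, and the signature, which takes labels and distances as plain arrays (nearest first) instead of reading them from the dataset.

```java
/** Weighted voting sketch with two weighting schemes (hypothetical, for illustration). */
final class WeightedVotingSketch {

	/** Weight = 1 / (distance + small constant). */
	static final int INVERSE_DISTANCE = 0;

	/** Weight = k - rank, so the nearest of k neighbors gets weight k. */
	static final int RANK_BASED = 1;

	/**
	 * @param paraLabels
	 *            The class labels of the neighbors, nearest first.
	 * @param paraDistances
	 *            The distances of the neighbors, in the same order.
	 * @param paraNumClasses
	 *            The number of classes.
	 * @param paraScheme
	 *            INVERSE_DISTANCE or RANK_BASED.
	 * @return The label with the largest total weight.
	 */
	static int weightedVoting(int[] paraLabels, double[] paraDistances, int paraNumClasses, int paraScheme) {
		double[] tempVotes = new double[paraNumClasses];
		for (int i = 0; i < paraLabels.length; i++) {
			double tempWeight;
			if (paraScheme == INVERSE_DISTANCE) {
				// The small constant avoids division by zero for identical points.
				tempWeight = 1.0 / (paraDistances[i] + 1e-6);
			} else {
				tempWeight = paraLabels.length - i;
			} // Of if
			tempVotes[paraLabels[i]] += tempWeight;
		} // Of for i

		// Ties are resolved in favor of the smaller class index.
		int resultLabel = 0;
		for (int i = 1; i < paraNumClasses; i++) {
			if (tempVotes[i] > tempVotes[resultLabel]) {
				resultLabel = i;
			} // Of if
		} // Of for i

		return resultLabel;
	}// Of weightedVoting
}// Of class WeightedVotingSketch
```

With labels {0, 1, 1, 1} and distances {0.1, 1.0, 1.0, 1.0}, the inverse-distance scheme picks class 0 because one very close neighbor outweighs three distant ones, while the rank-based scheme picks class 1. This shows that the choice of scheme can change the prediction.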

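Day 54: Recommendation Based on the M-distance

Here I slip in some of my own work: the source code of the paper Mei Zheng, Fan Min, Heng-Ru Zhang, Wen-Bin Chen, Fast recommendations with the M-distance, IEEE Access 4 (2016) 1464–1468.

The rating table is given in compressed form, as (user, item, rating) triples. See movielens-943u1682m.txt at https://github.com/FanSmale/sampledata/.
The first few lines of the data are: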
0,0,5
0,1,3
0,2,4
0,3,3
0,4,3
0,5,5
0,6,4
…
1,0,4
1,9,2
1,12,4
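Here, "0,2,4" means that user 0 gave item 2 a rating of 4. User 1 has no ratings for items 1, 2, and so on, which means that user has not watched those movies. When the numbers of users and items are large, compressed storage is a must.
That is all the code behind one paper. Admittedly, the paper itself is simple. The M-distance is simply the distance between two users (or items) computed from their average ratings.
Time to show off some math. Let the average rating of item $j$ be $\overline{r_{\cdot j}}$.
With item-based recommendation, the neighbor set of item $j$ with respect to user $i$ is
$$N_{ij} = \{1 \leq j' \leq m \mid j' \neq j, r_{ij'} \neq 0, |\overline{r_{\cdot j}} - \overline{r_{\cdot j'}}| < \epsilon\} \tag{1}$$
where $m$ is the number of items and $r_{ij'} \neq 0$ means that user $i$ has rated item $j'$.
The predicted rating of user $i$ for item $j$ is:
$$p_{ij} = \frac{\sum_{j' \in N_{ij}} r_{ij'}}{|N_{ij}|} \tag{2}$$
The neighborhood is not controlled by $k$: every item whose distance is below the radius (i.e., $\epsilon$) is a neighbor. With the M-distance, this approach works better.
Testing uses leave-one-out, which is only practical for very efficient algorithms.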
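
Leave-one-out stays cheap here because the average of an item can be corrected in constant time when one rating is removed. The short derivation below spells this out; the symbol $d_j$ (the number of users who rated item $j$, stored in itemDegrees in the code) and the superscript $(-i)$ are notation assumed for this note, not taken from the paper.

```latex
% Average of item j with the rating of user i left out.
% d_j is the number of ratings of item j (assumed notation).
\overline{r_{\cdot j}}^{(-i)}
  = \frac{d_j \, \overline{r_{\cdot j}} - r_{ij}}{d_j - 1}

% Worked example: item j has ratings 5, 3, 4, so d_j = 3 and the average is 4.
% Leaving out the rating 5 gives
\overline{r_{\cdot j}}^{(-i)} = \frac{3 \times 4 - 5}{3 - 1} = 3.5
```

This corresponds to Step 1 of leaveOneOutPrediction() in the listing that follows.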

package machinelearning.knn;

import java.io.*;

/**
 * Recommendation with M-distance.
 * 
 * @author Fan Min.
 */
public class MBR {

	/**
	 * Default rating for 1-5 points.
	 */
	public static final double DEFAULT_RATING = 3.0;

	/**
	 * The total number of users.
	 */
	private int numUsers;

	/**
	 * The total number of items.
	 */
	private int numItems;

	/**
	 * The total number of ratings (non-zero values)
	 */
	private int numRatings;

	/**
	 * The predictions.
	 */
	private double[] predictions;

	/**
	 * Compressed rating matrix. User-item-rating triples.
	 */
	private int[][] compressedRatingMatrix;

	/**
	 * The degree of each user (how many items he has rated).
	 */
	private int[] userDegrees;

	/**
	 * The average rating of each user.
	 */
	private double[] userAverageRatings;

	/**
	 * The degree of each item (how many users have rated it).
	 */
	private int[] itemDegrees;

	/**
	 * The average rating of each item.
	 */
	private double[] itemAverageRatings;

	/**
	 * The first user start from 0. Let the first user has x ratings, the second
	 * user will start from x.
	 */
	private int[] userStartingIndices;

	/**
	 * Number of non-neighbor objects.
	 */
	private int numNonNeighbors;

	/**
	 * The radius (delta) for determining the neighborhood.
	 */
	private double radius;

	/**
	 ************************* 
	 * Construct the rating matrix.
	 * 
	 * @param paraFilename
	 *            the rating filename.
	 * @param paraNumUsers
	 *            number of users
	 * @param paraNumItems
	 *            number of items
	 * @param paraNumRatings
	 *            number of ratings
	 ************************* 
	 */
	public MBR(String paraFilename, int paraNumUsers, int paraNumItems, int paraNumRatings) throws Exception {
		// Step 1. Initialize these arrays
		numItems = paraNumItems;
		numUsers = paraNumUsers;
		numRatings = paraNumRatings;

		userDegrees = new int[numUsers];
		userStartingIndices = new int[numUsers + 1];
		userAverageRatings = new double[numUsers];
		itemDegrees = new int[numItems];
		compressedRatingMatrix = new int[numRatings][3];
		itemAverageRatings = new double[numItems];

		predictions = new double[numRatings];

		System.out.println("Reading " + paraFilename);

		// Step 2. Read the data file.
		File tempFile = new File(paraFilename);
		if (!tempFile.exists()) {
			System.out.println("File " + paraFilename + " does not exist.");
			System.exit(0);
		} // Of if
		BufferedReader tempBufReader = new BufferedReader(new FileReader(tempFile));
		String tempString;
		String[] tempStrArray;
		int tempIndex = 0;
		userStartingIndices[0] = 0;
		userStartingIndices[numUsers] = numRatings;
		while ((tempString = tempBufReader.readLine()) != null) {
			// Each line has three values
			tempStrArray = tempString.split(",");
			compressedRatingMatrix[tempIndex][0] = Integer.parseInt(tempStrArray[0]);
			compressedRatingMatrix[tempIndex][1] = Integer.parseInt(tempStrArray[1]);
			compressedRatingMatrix[tempIndex][2] = Integer.parseInt(tempStrArray[2]);

			userDegrees[compressedRatingMatrix[tempIndex][0]]++;
			itemDegrees[compressedRatingMatrix[tempIndex][1]]++;

			if (tempIndex > 0) {
				// Starting to read the data of a new user.
				if (compressedRatingMatrix[tempIndex][0] != compressedRatingMatrix[tempIndex - 1][0]) {
					userStartingIndices[compressedRatingMatrix[tempIndex][0]] = tempIndex;
				} // Of if
			} // Of if
			tempIndex++;
		} // Of while
		tempBufReader.close();

		double[] tempUserTotalScore = new double[numUsers];
		double[] tempItemTotalScore = new double[numItems];
		for (int i = 0; i < numRatings; i++) {
			tempUserTotalScore[compressedRatingMatrix[i][0]] += compressedRatingMatrix[i][2];
			tempItemTotalScore[compressedRatingMatrix[i][1]] += compressedRatingMatrix[i][2];
		} // Of for i

		for (int i = 0; i < numUsers; i++) {
			userAverageRatings[i] = tempUserTotalScore[i] / userDegrees[i];
		} // Of for i
		for (int i = 0; i < numItems; i++) {
			itemAverageRatings[i] = tempItemTotalScore[i] / itemDegrees[i];
		} // Of for i
	}// Of the first constructor

	/**
	 ************************* 
	 * Set the radius (delta).
	 * 
	 * @param paraRadius
	 *            The given radius.
	 ************************* 
	 */
	public void setRadius(double paraRadius) {
		if (paraRadius > 0) {
			radius = paraRadius;
		} else {
			radius = 0.1;
		} // Of if
	}// Of setRadius

	/**
	 ************************* 
	 * Leave-one-out prediction. The predicted values are stored in predictions.
	 * 
	 * @see predictions
	 ************************* 
	 */
	public void leaveOneOutPrediction() {
		double tempItemAverageRating;
		// Make each line of the code shorter.
		int tempUser, tempItem, tempRating;
		System.out.println("\r\nLeaveOneOutPrediction for radius " + radius);

		numNonNeighbors = 0;
		for (int i = 0; i < numRatings; i++) {
			tempUser = compressedRatingMatrix[i][0];
			tempItem = compressedRatingMatrix[i][1];
			tempRating = compressedRatingMatrix[i][2];

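			// Step 1. Recompute average rating of the current item with the
			// current rating left out. If the item has only one rating, the
			// result is NaN, no neighbor passes the test below, and the
			// default rating is used.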
			tempItemAverageRating = (itemAverageRatings[tempItem] * itemDegrees[tempItem] - tempRating)
					/ (itemDegrees[tempItem] - 1);

			// Step 2. Recompute neighbors, at the same time obtain the ratings
			// Of neighbors.
			int tempNeighbors = 0;
			double tempTotal = 0;
			int tempComparedItem;
			for (int j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++) {
				tempComparedItem = compressedRatingMatrix[j][1];
				if (tempItem == tempComparedItem) {
					continue;// Ignore itself.
				} // Of if

				if (Math.abs(tempItemAverageRating - itemAverageRatings[tempComparedItem]) < radius) {
					tempTotal += compressedRatingMatrix[j][2];
					tempNeighbors++;
				} // Of if
			} // Of for j

			// Step 3. Predict as the average value of neighbors.
			if (tempNeighbors > 0) {
				predictions[i] = tempTotal / tempNeighbors;
			} else {
				predictions[i] = DEFAULT_RATING;
				numNonNeighbors++;
			} // Of if
		} // Of for i
	}// Of leaveOneOutPrediction

	/**
	 ************************* 
	 * Compute the MAE based on the deviation of each leave-one-out.
	 * 
	 * @return The mean absolute error.
	 ************************* 
	 */
	public double computeMAE() throws Exception {
		double tempTotalError = 0;
		for (int i = 0; i < predictions.length; i++) {
			tempTotalError += Math.abs(predictions[i] - compressedRatingMatrix[i][2]);
		} // Of for i

		return tempTotalError / predictions.length;
	}// Of computeMAE

	/**
	 ************************* 
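	 * Compute the RMSE (root mean squared error) based on the deviation of
	 * each leave-one-out. The method name keeps its original spelling, RSME.
	 * 
	 * @return The root mean squared error.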
	 ************************* 
	 */
	public double computeRSME() throws Exception {
		double tempTotalError = 0;
		for (int i = 0; i < predictions.length; i++) {
			tempTotalError += (predictions[i] - compressedRatingMatrix[i][2])
					* (predictions[i] - compressedRatingMatrix[i][2]);
		} // Of for i

		double tempAverage = tempTotalError / predictions.length;

		return Math.sqrt(tempAverage);
	}// Of computeRSME

	/**
	 ************************* 
	 * The entrance of the program.
	 * 
	 * @param args
	 *            Not used now.
	 ************************* 
	 */
	public static void main(String[] args) {
		try {
			MBR tempRecommender = new MBR("D:/data/movielens-943u1682m.txt", 943, 1682, 100000);

			for (double tempRadius = 0.2; tempRadius < 0.6; tempRadius += 0.1) {
				tempRecommender.setRadius(tempRadius);

				tempRecommender.leaveOneOutPrediction();
				double tempMAE = tempRecommender.computeMAE();
				double tempRSME = tempRecommender.computeRSME();

				System.out.println("Radius = " + tempRadius + ", MAE = " + tempMAE + ", RSME = " + tempRSME
						+ ", numNonNeighbors = " + tempRecommender.numNonNeighbors);
			} // Of for tempRadius
		} catch (Exception ee) {
			System.out.println(ee);
		} // Of try
	}// Of main
}// Of class MBR

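Day 55: Recommendation Based on the M-distance (continued)

Yesterday's code implements item-based recommendation. Today, implement user-based recommendation yourself; it only requires adding to the existing code.
Hint: the data is stored user-first, so for item-based recommendation the loop header

j = userStartingIndices[tempUser]; j < userStartingIndices[tempUser + 1]; j++

visits all ratings of tempUser. User-based recommendation has no such convenience. There are two ways around this:

Transpose the compressed matrix, swapping the roles of users and items. This requires extra code but has low complexity. Recommended.
Scan the whole dataset instead of one contiguous block. This is simple to implement but has high complexity.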
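
The first option can be sketched with a counting sort. The code below is only an illustration under assumed names (CompressedTranspose, transpose(), paraItemStarts). It keeps each triple in (user, item, rating) layout and merely reorders the rows item-first, filling an index array that plays the same role for items as userStartingIndices does for users.

```java
import java.util.Arrays;

/** Reorder a user-first compressed rating matrix into item-first order (hypothetical sketch). */
final class CompressedTranspose {

	/**
	 * @param paraTriples
	 *            The (user, item, rating) triples in user-first order.
	 * @param paraNumItems
	 *            The number of items.
	 * @param paraItemStarts
	 *            An output array of length paraNumItems + 1. Item i occupies
	 *            rows paraItemStarts[i] to paraItemStarts[i + 1] - 1 of the result.
	 * @return Copies of the triples in item-first order.
	 */
	static int[][] transpose(int[][] paraTriples, int paraNumItems, int[] paraItemStarts) {
		// Step 1. Count the ratings of each item.
		int[] tempCounts = new int[paraNumItems];
		for (int[] tempTriple : paraTriples) {
			tempCounts[tempTriple[1]]++;
		} // Of for tempTriple

		// Step 2. Prefix sums give the starting row of each item.
		paraItemStarts[0] = 0;
		for (int i = 0; i < paraNumItems; i++) {
			paraItemStarts[i + 1] = paraItemStarts[i] + tempCounts[i];
		} // Of for i

		// Step 3. Place each triple at the next free row of its item.
		int[] tempNext = Arrays.copyOf(paraItemStarts, paraNumItems);
		int[][] resultTriples = new int[paraTriples.length][];
		for (int[] tempTriple : paraTriples) {
			resultTriples[tempNext[tempTriple[1]]++] = tempTriple.clone();
		} // Of for tempTriple

		return resultTriples;
	}// Of transpose
}// Of class CompressedTranspose
```

The whole transposition takes time linear in the number of ratings plus the number of items, and afterwards all ratings of one item sit in a contiguous block, just as the ratings of one user do in the original layout.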

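Day 56: kMeans Clustering

kMeans is the most commonly used clustering algorithm.

kMeans should stop when the centers converge. The code takes a shortcut and uses Arrays.equals() to check whether the cluster assignments have stopped changing.
The dataset is iris, so the last attribute (the class label) is not used. For a dataset without a decision attribute, the code needs to be modified accordingly.
The data is not normalized.
getRandomIndices() is identical to the one in kNN and is simply copied over. It really belongs in SimpleTools.java, but it is short, so it stays here to keep the class self-contained.
distance() is similar to the one in kNN. Take care not to use the decision attribute, and note that the parameters differ: the second parameter is a real-valued vector, because a center may be virtual, with no actual object located at that point.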

package machinelearning.kmeans;

import java.io.FileReader;
import java.util.Arrays;
import java.util.Random;
import weka.core.Instances;

/**
 * kMeans clustering.
 * 
 * @author Fan Min.
 */
public class KMeans {

	/**
	 * Manhattan distance.
	 */
	public static final int MANHATTAN = 0;

	/**
	 * Euclidean distance.
	 */
	public static final int EUCLIDEAN = 1;

	/**
	 * The distance measure.
	 */
	public int distanceMeasure = EUCLIDEAN;

	/**
	 * A random instance;
	 */
	public static final Random random = new Random();

	/**
	 * The data.
	 */
	Instances dataset;

	/**
	 * The number of clusters.
	 */
	int numClusters = 2;

	/**
	 * The clusters.
	 */
	int[][] clusters;

	/**
	 ******************************* 
	 * The first constructor.
	 * 
	 * @param paraFilename
	 *            The data filename.
	 ******************************* 
	 */
	public KMeans(String paraFilename) {
		dataset = null;
		try {
			FileReader fileReader = new FileReader(paraFilename);
			dataset = new Instances(fileReader);
			fileReader.close();
		} catch (Exception ee) {
			System.out.println("Cannot read the file: " + paraFilename + "\r\n" + ee);
			System.exit(0);
		} // Of try
	}// Of the first constructor

	/**
	 ******************************* 
	 * A setter.
	 ******************************* 
	 */
	public void setNumClusters(int paraNumClusters) {
		numClusters = paraNumClusters;
	}// Of the setter

	/**
	 *********************
	 * Get random indices for data randomization.
	 * 
	 * @param paraLength
	 *            The length of the sequence.
	 * @return An array of indices, e.g., {4, 3, 1, 5, 0, 2} with length 6.
	 *********************
	 */
	public static int[] getRandomIndices(int paraLength) {
		int[] resultIndices = new int[paraLength];

		// Step 1. Initialize.
		for (int i = 0; i < paraLength; i++) {
			resultIndices[i] = i;
		} // Of for i

		// Step 2. Randomly swap.
		int tempFirst, tempSecond, tempValue;
		for (int i = 0; i < paraLength; i++) {
			// Generate two random indices.
			tempFirst = random.nextInt(paraLength);
			tempSecond = random.nextInt(paraLength);

			// Swap.
			tempValue = resultIndices[tempFirst];
			resultIndices[tempFirst] = resultIndices[tempSecond];
			resultIndices[tempSecond] = tempValue;
		} // Of for i

		return resultIndices;
	}// Of getRandomIndices

	/**
	 *********************
	 * The distance between two instances.
	 * 
	 * @param paraI
	 *            The index of the first instance.
	 * @param paraArray
	 *            The array representing a point in the space.
	 * @return The distance.
	 *********************
	 */
	public double distance(int paraI, double[] paraArray) {
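		// This must be a double: with an int, the compound assignments below
		// would silently truncate every partial sum.
		double resultDistance = 0;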
		double tempDifference;
		switch (distanceMeasure) {
		case MANHATTAN:
			for (int i = 0; i < dataset.numAttributes() - 1; i++) {
				tempDifference = dataset.instance(paraI).value(i) - paraArray[i];
				if (tempDifference < 0) {
					resultDistance -= tempDifference;
				} else {
					resultDistance += tempDifference;
				} // Of if
			} // Of for i
			break;

		case EUCLIDEAN:
			for (int i = 0; i < dataset.numAttributes() - 1; i++) {
				tempDifference = dataset.instance(paraI).value(i) - paraArray[i];
				resultDistance += tempDifference * tempDifference;
			} // Of for i
			break;
		default:
			System.out.println("Unsupported distance measure: " + distanceMeasure);
		}// Of switch

		return resultDistance;
	}// Of distance

	/**
	 ******************************* 
	 * Clustering.
	 ******************************* 
	 */
	public void clustering() {
		int[] tempOldClusterArray = new int[dataset.numInstances()];
		tempOldClusterArray[0] = -1;
		int[] tempClusterArray = new int[dataset.numInstances()];
		Arrays.fill(tempClusterArray, 0);
		double[][] tempCenters = new double[numClusters][dataset.numAttributes() - 1];

		// Step 1. Initialize centers.
		int[] tempRandomOrders = getRandomIndices(dataset.numInstances());
		for (int i = 0; i < numClusters; i++) {
			for (int j = 0; j < tempCenters[0].length; j++) {
				tempCenters[i][j] = dataset.instance(tempRandomOrders[i]).value(j);
			} // Of for j
		} // Of for i

		int[] tempClusterLengths = null;
		while (!Arrays.equals(tempOldClusterArray, tempClusterArray)) {
			System.out.println("New loop ...");
			tempOldClusterArray = tempClusterArray;
			tempClusterArray = new int[dataset.numInstances()];

			// Step 2.1 Minimization. Assign cluster to each instance.
			int tempNearestCenter;
			double tempNearestDistance;
			double tempDistance;

			for (int i = 0; i < dataset.numInstances(); i++) {
				tempNearestCenter = -1;
				tempNearestDistance = Double.MAX_VALUE;

				for (int j = 0; j < numClusters; j++) {
					tempDistance = distance(i, tempCenters[j]);
					if (tempNearestDistance > tempDistance) {
						tempNearestDistance = tempDistance;
						tempNearestCenter = j;
					} // Of if
				} // Of for j
				tempClusterArray[i] = tempNearestCenter;
			} // Of for i

			// Step 2.2 Mean. Find new centers.
			tempClusterLengths = new int[numClusters];
			Arrays.fill(tempClusterLengths, 0);
			double[][] tempNewCenters = new double[numClusters][dataset.numAttributes() - 1];
			// Arrays.fill(tempNewCenters, 0);
			for (int i = 0; i < dataset.numInstances(); i++) {
				for (int j = 0; j < tempNewCenters[0].length; j++) {
					tempNewCenters[tempClusterArray[i]][j] += dataset.instance(i).value(j);
				} // Of for j
				tempClusterLengths[tempClusterArray[i]]++;
			} // Of for i

			// Step 2.3 Now average
			for (int i = 0; i < tempNewCenters.length; i++) {
				for (int j = 0; j < tempNewCenters[0].length; j++) {
					tempNewCenters[i][j] /= tempClusterLengths[i];
				} // Of for j
			} // Of for i

			System.out.println("Now the new centers are: " + Arrays.deepToString(tempNewCenters));
			tempCenters = tempNewCenters;
		} // Of while

		// Step 3. Form clusters.
		clusters = new int[numClusters][];
		int[] tempCounters = new int[numClusters];
		for (int i = 0; i < numClusters; i++) {
			clusters[i] = new int[tempClusterLengths[i]];
		} // Of for i

		for (int i = 0; i < tempClusterArray.length; i++) {
			clusters[tempClusterArray[i]][tempCounters[tempClusterArray[i]]] = i;
			tempCounters[tempClusterArray[i]]++;
		} // Of for i

		System.out.println("The clusters are: " + Arrays.deepToString(clusters));
	}// Of clustering

	/**
	 ******************************* 
	 * Test the clustering on the iris dataset.
	 ******************************* 
	 */
	public static void testClustering() {
		KMeans tempKMeans = new KMeans("D:/data/iris.arff");
		tempKMeans.setNumClusters(3);
		tempKMeans.clustering();
	}// Of testClustering

	/**
	 ************************* 
	 * A testing method.
	 ************************* 
	 */
	public static void main(String args[]) {
		testClustering();
	}// Of main

}// Of class KMeans

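Day 57: kMeans Clustering (continued)

After obtaining the virtual centers, replace each of them with the actual point nearest to it, then cluster again.
Today is mainly about pacing. After all, kMeans deserves two days of work.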
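
The replacement step can be sketched as follows. The sketch works on plain arrays and uses the squared Euclidean distance; the class and method names are hypothetical, and wiring the result back into clustering() is left as the exercise.

```java
/** Replace virtual centers by the nearest actual points (hypothetical sketch). */
final class ActualCenterSketch {

	/** The squared Euclidean distance between two points of equal dimension. */
	static double squaredDistance(double[] paraA, double[] paraB) {
		double resultDistance = 0;
		for (int i = 0; i < paraA.length; i++) {
			double tempDifference = paraA[i] - paraB[i];
			resultDistance += tempDifference * tempDifference;
		} // Of for i
		return resultDistance;
	}// Of squaredDistance

	/**
	 * For each virtual center, find the index of the data point closest to it.
	 */
	static int[] nearestActualCenters(double[][] paraData, double[][] paraCenters) {
		int[] resultIndices = new int[paraCenters.length];
		for (int i = 0; i < paraCenters.length; i++) {
			double tempBestDistance = Double.MAX_VALUE;
			for (int j = 0; j < paraData.length; j++) {
				double tempDistance = squaredDistance(paraData[j], paraCenters[i]);
				if (tempDistance < tempBestDistance) {
					tempBestDistance = tempDistance;
					resultIndices[i] = j;
				} // Of if
			} // Of for j
		} // Of for i
		return resultIndices;
	}// Of nearestActualCenters
}// Of class ActualCenterSketch
```

For the points (0,0), (1,1), (10,10), (11,11) and the virtual centers (0.4,0.4) and (10.6,10.6), the actual centers are the points with indices 0 and 3.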

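Day 58: The NB Algorithm for Nominal Data

Naive Bayes is an algorithm derived from the posterior probability formula. It relies on an independence assumption that looks unsound mathematically, yet it performs well in machine learning practice. Before writing the program, first study the post "NB algorithm (nominal and numerical data, analyzed with the Java program)".

All of the program is listed today, but today covers only the classification of nominal data. You may therefore copy only the methods related to nominal data (start from main() and follow the calls selectively), and copy the numerical-data code tomorrow. The code at line 421 only tests the case where the training and testing sets differ, so there is no need to copy it.
You must work through a small example of your own (say 10 objects, 3 condition attributes, 2 classes) to aid understanding.
You need to look up the relevant background knowledge.
You need to understand what each dimension of the three-dimensional array means: The conditional probabilities for all classes over all attributes on all values. Note that the array is not rectangular; for example, different attributes may have different numbers of values.
The same data is used here for both training and testing. To split it into training and testing sets, refer to the kNN code.
Initializing tempPseudoProbability to 0 is wrong. It makes no difference on class-balanced datasets, but gives wrong results on imbalanced ones. I lost a 50-yuan bet over this. Ouch!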
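
To make the jagged three-dimensional array concrete, here is a small sketch that counts nominal values per class and applies Laplacian smoothing. It is not the code from the listing: the class name, the method name, and the plain int[][] input (value indices instead of a Weka Instances object) are assumptions. It uses the usual formula (count + 1) / (class count + number of values of the attribute).

```java
/** Smoothed conditional probabilities for nominal data (hypothetical sketch). */
final class NominalNbSketch {

	/**
	 * @param paraData
	 *            paraData[i][j] is the value index of attribute j for instance i.
	 * @param paraLabels
	 *            paraLabels[i] is the class index of instance i.
	 * @param paraNumClasses
	 *            The number of classes.
	 * @param paraNumValues
	 *            paraNumValues[j] is the number of distinct values of attribute j.
	 * @return result[c][j][v], the smoothed estimate of P(attribute j = v | class c).
	 */
	static double[][][] smoothedConditionals(int[][] paraData, int[] paraLabels, int paraNumClasses,
			int[] paraNumValues) {
		int tempNumConditions = paraNumValues.length;
		double[] tempClassCounts = new double[paraNumClasses];

		// Allocate the jagged array: the third dimension depends on the attribute.
		double[][][] resultProbabilities = new double[paraNumClasses][tempNumConditions][];
		for (int c = 0; c < paraNumClasses; c++) {
			for (int j = 0; j < tempNumConditions; j++) {
				resultProbabilities[c][j] = new double[paraNumValues[j]];
			} // Of for j
		} // Of for c

		// Count how often each value occurs within each class.
		for (int i = 0; i < paraData.length; i++) {
			tempClassCounts[paraLabels[i]]++;
			for (int j = 0; j < tempNumConditions; j++) {
				resultProbabilities[paraLabels[i]][j][paraData[i][j]]++;
			} // Of for j
		} // Of for i

		// Laplacian smoothing turns the counts into probabilities.
		for (int c = 0; c < paraNumClasses; c++) {
			for (int j = 0; j < tempNumConditions; j++) {
				for (int v = 0; v < paraNumValues[j]; v++) {
					resultProbabilities[c][j][v] = (resultProbabilities[c][j][v] + 1)
							/ (tempClassCounts[c] + paraNumValues[j]);
				} // Of for v
			} // Of for j
		} // Of for c

		return resultProbabilities;
	}// Of smoothedConditionals
}// Of class NominalNbSketch
```

With four instances {0,0}, {0,1}, {1,2}, {1,2}, labels {0, 0, 1, 1}, and attributes with 2 and 3 values, the estimate for class 0, attribute 0, value 0 is (2 + 1) / (2 + 2) = 0.75, and a value never seen in a class still receives a nonzero probability such as (0 + 1) / (2 + 3) = 0.2.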

package datastructure.nb;

import java.io.FileReader;
import java.util.Arrays;
import java.util.Random;

import weka.core.*;

/**
 * The Naive Bayes algorithm.
 * 
 * @author Fan Min.
 */
public class NaiveBayes {
	/**
	 ************************* 
	 * An inner class to store parameters.
	 ************************* 
	 */
	private class GaussianParamters {
		double mu;
		double sigma;

		public GaussianParamters(double paraMu, double paraSigma) {
			mu = paraMu;
			sigma = paraSigma;
		}// Of the constructor

		public String toString() {
			return "(" + mu + ", " + sigma + ")";
		}// Of toString
	}// Of GaussianParamters

	/**
	 * The data.
	 */
	Instances dataset;

	/**
	 * The number of classes. For binary classification it is 2.
	 */
	int numClasses;

	/**
	 * The number of instances.
	 */
	int numInstances;

	/**
	 * The number of conditional attributes.
	 */
	int numConditions;

	/**
	 * The prediction, including queried and predicted labels.
	 */
	int[] predicts;

	/**
	 * Class distribution.
	 */
	double[] classDistribution;

	/**
	 * Class distribution with Laplacian smooth.
	 */
	double[] classDistributionLaplacian;

	/**
	 * To calculate the conditional probabilities for all classes over all
	 * attributes on all values.
	 */
	double[][][] conditionalCounts;

	/**
	 * The conditional probabilities with Laplacian smooth.
	 */
	double[][][] conditionalProbabilitiesLaplacian;

	/**
	 * The Gaussian parameters.
	 */
	GaussianParamters[][] gaussianParameters;

	/**
	 * Data type.
	 */
	int dataType;

	/**
	 * Nominal.
	 */
	public static final int NOMINAL = 0;

	/**
	 * Numerical.
	 */
	public static final int NUMERICAL = 1;

	/**
	 ********************
	 * The constructor.
	 * 
	 * @param paraFilename
	 *            The given file.
	 ********************
	 */
	public NaiveBayes(String paraFilename) {
		dataset = null;
		try {
			FileReader fileReader = new FileReader(paraFilename);
			dataset = new Instances(fileReader);
			fileReader.close();
		} catch (Exception ee) {
			System.out.println("Cannot read the file: " + paraFilename + "\r\n" + ee);
			System.exit(1);
		} // Of try

		dataset.setClassIndex(dataset.numAttributes() - 1);
		numConditions = dataset.numAttributes() - 1;
		numInstances = dataset.numInstances();
		numClasses = dataset.attribute(numConditions).numValues();
	}// Of the constructor

	/**
	 ********************
	 * The second constructor.
	 * 
	 * @param paraInstances
	 *            The given dataset.
	 ********************
	 */
	public NaiveBayes(Instances paraInstances) {
		dataset = paraInstances;

		dataset.setClassIndex(dataset.numAttributes() - 1);
		numConditions = dataset.numAttributes() - 1;
		numInstances = dataset.numInstances();
		numClasses = dataset.attribute(numConditions).numValues();
	}// Of the constructor

	/**
	 ********************
	 * Set the data type.
	 ********************
	 */
	public void setDataType(int paraDataType) {
		dataType = paraDataType;
	}// Of setDataType

	/**
	 ********************
	 * Calculate the class distribution with Laplacian smooth.
	 ********************
	 */
	public void calculateClassDistribution() {
		classDistribution = new double[numClasses];
		classDistributionLaplacian = new double[numClasses];

		double[] tempCounts = new double[numClasses];
		for (int i = 0; i < numInstances; i++) {
			int tempClassValue = (int) dataset.instance(i).classValue();
			tempCounts[tempClassValue]++;
		} // Of for i

		for (int i = 0; i < numClasses; i++) {
			classDistribution[i] = tempCounts[i] / numInstances;
			classDistributionLaplacian[i] = (tempCounts[i] + 1) / (numInstances + numClasses);
		} // Of for i

		System.out.println("Class distribution: " + Arrays.toString(classDistribution));
		System.out.println("Class distribution Laplacian: " + Arrays.toString(classDistributionLaplacian));
	}// Of calculateClassDistribution

	/**
	 ********************
	 * Calculate the conditional probabilities with Laplacian smooth. ONLY scan
	 * the dataset once. There was a simpler one, I have removed it because the
	 * time complexity is higher.
	 ********************
	 */
	public void calculateConditionalProbabilities() {
		conditionalCounts = new double[numClasses][numConditions][];
		conditionalProbabilitiesLaplacian = new double[numClasses][numConditions][];

		// Allocate space
		for (int i = 0; i < numClasses; i++) {
			for (int j = 0; j < numConditions; j++) {
				int tempNumValues = (int) dataset.attribute(j).numValues();
				conditionalCounts[i][j] = new double[tempNumValues];
				conditionalProbabilitiesLaplacian[i][j] = new double[tempNumValues];
			} // Of for j
		} // Of for i

		// Count the numbers
		int[] tempClassCounts = new int[numClasses];
		for (int i = 0; i < numInstances; i++) {
			int tempClass = (int) dataset.instance(i).classValue();
			tempClassCounts[tempClass]++;
			for (int j = 0; j < numConditions; j++) {
				int tempValue = (int) dataset.instance(i).value(j);
				conditionalCounts[tempClass][j][tempValue]++;
			} // Of for j
		} // Of for i

		// Now for the real probability with Laplacian
		for (int i = 0; i < numClasses; i++) {
			for (int j = 0; j < numConditions; j++) {
				int tempNumValues = (int) dataset.attribute(j).numValues();
				for (int k = 0; k < tempNumValues; k++) {
					conditionalProbabilitiesLaplacian[i][j][k] = (conditionalCounts[i][j][k] + 1)
							/ (tempClassCounts[i] + tempNumValues);
					// I wrote a bug here. This is an alternative approach,
					// however its performance is better in the mushroom dataset.
					// conditionalProbabilitiesLaplacian[i][j][k] =
					// (numInstances * conditionalCounts[i][j][k] + 1)
					// / (numInstances * tempClassCounts[i] + tempNumValues);
				} // Of for k
			} // Of for j
		} // Of for i

		System.out.println("Conditional counts: " + Arrays.deepToString(conditionalCounts));
	}// Of calculateConditionalProbabilities

	/**
	 ********************
	 * Calculate the Gaussian parameters (mean and standard deviation) of each
	 * attribute for each class.
	 ********************
	 */
	public void calculateGausssianParameters() {
		gaussianParameters = new GaussianParamters[numClasses][numConditions];

		double[] tempValuesArray = new double[numInstances];
		int tempNumValues = 0;
		double tempSum = 0;

		for (int i = 0; i < numClasses; i++) {
			for (int j = 0; j < numConditions; j++) {
				tempSum = 0;

				// Obtain values for this class.
				tempNumValues = 0;
				for (int k = 0; k < numInstances; k++) {
					if ((int) dataset.instance(k).classValue() != i) {
						continue;
					} // Of if

					tempValuesArray[tempNumValues] = dataset.instance(k).value(j);
					tempSum += tempValuesArray[tempNumValues];
					tempNumValues++;
				} // Of for k

				// Obtain parameters.
				double tempMu = tempSum / tempNumValues;

				double tempSigma = 0;
				for (int k = 0; k < tempNumValues; k++) {
					tempSigma += (tempValuesArray[k] - tempMu) * (tempValuesArray[k] - tempMu);
				} // Of for k
				tempSigma /= tempNumValues;
				tempSigma = Math.sqrt(tempSigma);

				gaussianParameters[i][j] = new GaussianParamters(tempMu, tempSigma);
			} // Of for j
		} // Of for i

		System.out.println(Arrays.deepToString(gaussianParameters));
	}// Of calculateGausssianParameters

	/**
	 ********************
	 * Classify all instances, the results are stored in predicts[].
	 ********************
	 */
	public void classify() {
		predicts = new int[numInstances];
		for (int i = 0; i < numInstances; i++) {
			predicts[i] = classify(dataset.instance(i));
		} // Of for i
	}// Of classify

	/**
	 ********************
	 * Classify an instance.
	 ********************
	 */
	public int classify(Instance paraInstance) {
		if (dataType == NOMINAL) {
			return classifyNominal(paraInstance);
		} else if (dataType == NUMERICAL) {
			return classifyNumerical(paraInstance);
		} // Of if

		return -1;
	}// Of classify

	/**
	 ********************
	 * Classify an instance with nominal data.
	 ********************
	 */
	public int classifyNominal(Instance paraInstance) {
		// Find the biggest one
		double tempBiggest = Double.NEGATIVE_INFINITY;
		int resultBestIndex = 0;
		for (int i = 0; i < numClasses; i++) {
			double tempPseudoProbability = Math.log(classDistributionLaplacian[i]);
			for (int j = 0; j < numConditions; j++) {
				int tempAttributeValue = (int) paraInstance.value(j);

				tempPseudoProbability += Math.log(conditionalProbabilitiesLaplacian[i][j][tempAttributeValue]);
			} // Of for j

			if (tempBiggest < tempPseudoProbability) {
				tempBiggest = tempPseudoProbability;
				resultBestIndex = i;
			} // Of if
		} // Of for i

		return resultBestIndex;
	}// Of classifyNominal

	/**
	 ********************
	 * Classify an instance with numerical data.
	 ********************
	 */
	public int classifyNumerical(Instance paraInstance) {
		// Find the biggest one
		// A fixed floor such as -10000 can be undercut when sigma is small.
		double tempBiggest = Double.NEGATIVE_INFINITY;
		int resultBestIndex = 0;

		for (int i = 0; i < numClasses; i++) {
			double tempPseudoProbability = Math.log(classDistributionLaplacian[i]);
			for (int j = 0; j < numConditions; j++) {
				double tempAttributeValue = paraInstance.value(j);
				double tempSigma = gaussianParameters[i][j].sigma;
				double tempMu = gaussianParameters[i][j].mu;

				tempPseudoProbability += -Math.log(tempSigma)
						- (tempAttributeValue - tempMu) * (tempAttributeValue - tempMu) / (2 * tempSigma * tempSigma);
			} // Of for j

			if (tempBiggest < tempPseudoProbability) {
				tempBiggest = tempPseudoProbability;
				resultBestIndex = i;
			} // Of if
		} // Of for i

		return resultBestIndex;
	}// Of classifyNumerical

	/**
	 ********************
	 * Compute accuracy.
	 ********************
	 */
	public double computeAccuracy() {
		double tempCorrect = 0;
		for (int i = 0; i < numInstances; i++) {
			if (predicts[i] == (int) dataset.instance(i).classValue()) {
				tempCorrect++;
			} // Of if
		} // Of for i

		double resultAccuracy = tempCorrect / numInstances;
		return resultAccuracy;
	}// Of computeAccuracy

	/**
	 ************************* 
	 * Test nominal data.
	 ************************* 
	 */
	public static void testNominal() {
		System.out.println("Hello, Naive Bayes. I only want to test the nominal data.");
		String tempFilename = "D:/data/mushroom.arff";

		NaiveBayes tempLearner = new NaiveBayes(tempFilename);
		tempLearner.setDataType(NOMINAL);
		tempLearner.calculateClassDistribution();
		tempLearner.calculateConditionalProbabilities();
		tempLearner.classify();

		System.out.println("The accuracy is: " + tempLearner.computeAccuracy());
	}// Of testNominal

	/**
	 ************************* 
	 * Test numerical data.
	 ************************* 
	 */
	public static void testNumerical() {
		System.out.println("Hello, Naive Bayes. I only want to test the numerical data with Gaussian assumption.");
		// String tempFilename = "D:/data/iris.arff";
		String tempFilename = "D:/data/iris-imbalance.arff";

		NaiveBayes tempLearner = new NaiveBayes(tempFilename);
		tempLearner.setDataType(NUMERICAL);
		tempLearner.calculateClassDistribution();
		tempLearner.calculateGausssianParameters();
		tempLearner.classify();

		System.out.println("The accuracy is: " + tempLearner.computeAccuracy());
	}// Of testNumerical

	/**
	 ************************* 
	 * Test this class.
	 * 
	 * @param args
	 *            Not used now.
	 ************************* 
	 */
	public static void main(String[] args) {
		testNominal();
		testNumerical();
		// testNominal(0.8);
	}// Of main

	/**
	 *********************
	 * Get random indices for data randomization.
	 * 
	 * @param paraLength
	 *            The length of the sequence.
	 * @return An array of indices, e.g., {4, 3, 1, 5, 0, 2} with length 6.
	 *********************
	 */
	public static int[] getRandomIndices(int paraLength) {
		Random random = new Random();
		int[] resultIndices = new int[paraLength];

		// Step 1. Initialize.
		for (int i = 0; i < paraLength; i++) {
			resultIndices[i] = i;
		} // Of for i

		// Step 2. Randomly swap.
		int tempFirst, tempSecond, tempValue;
		for (int i = 0; i < paraLength; i++) {
			// Generate two random indices.
			tempFirst = random.nextInt(paraLength);
			tempSecond = random.nextInt(paraLength);

			// Swap.
			tempValue = resultIndices[tempFirst];
			resultIndices[tempFirst] = resultIndices[tempSecond];
			resultIndices[tempSecond] = tempValue;
		} // Of for i

		return resultIndices;
	}// Of getRandomIndices

	/**
	 *********************
	 * Split the data into training and testing parts.
	 * 
	 * @param paraTrainingFraction
	 *            The fraction of the training set.
	 *********************
	 */
	public static Instances[] splitTrainingTesting(Instances paraDataset, double paraTrainingFraction) {
		int tempSize = paraDataset.numInstances();
		int[] tempIndices = getRandomIndices(tempSize);
		int tempTrainingSize = (int) (tempSize * paraTrainingFraction);

		// Empty datasets.
		Instances tempTrainingSet = new Instances(paraDataset);
		tempTrainingSet.delete();
		Instances tempTestingSet = new Instances(tempTrainingSet);

		for (int i = 0; i < tempTrainingSize; i++) {
			tempTrainingSet.add(paraDataset.instance(tempIndices[i]));
		} // Of for i

		for (int i = 0; i < tempSize - tempTrainingSize; i++) {
			tempTestingSet.add(paraDataset.instance(tempIndices[tempTrainingSize + i]));
		} // Of for i

		tempTrainingSet.setClassIndex(tempTrainingSet.numAttributes() - 1);
		tempTestingSet.setClassIndex(tempTestingSet.numAttributes() - 1);
		Instances[] resultInstancesArray = new Instances[2];
		resultInstancesArray[0] = tempTrainingSet;
		resultInstancesArray[1] = tempTestingSet;

		return resultInstancesArray;
	}// Of splitTrainingTesting

	/**
	 ********************
	 * Classify all instances of the given testing set with the trained model.
	 * 
	 * @param paraTestingSet
	 *            The testing set.
	 * @return The accuracy on the testing set.
	 ********************
	 */
	public double classify(Instances paraTestingSet) {
		double tempCorrect = 0;
		int[] tempPredicts = new int[paraTestingSet.numInstances()];
		for (int i = 0; i < tempPredicts.length; i++) {
			tempPredicts[i] = classify(paraTestingSet.instance(i));
			if (tempPredicts[i] == (int) paraTestingSet.instance(i).classValue()) {
				tempCorrect++;
			} // Of if
		} // Of for i

		System.out.println("" + tempCorrect + " correct over " + tempPredicts.length + " instances.");
		double resultAccuracy = tempCorrect / tempPredicts.length;
		return resultAccuracy;
	}// Of classify

	/**
	 ************************* 
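	 * Test nominal data with separate training and testing sets.
	 * 
	 * @param paraTrainingFraction
	 *            The fraction of the training set.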
	 ************************* 
	 */
	public static void testNominal(double paraTrainingFraction) {
		System.out.println("Hello, Naive Bayes. I only want to test the nominal data.");
		String tempFilename = "D:/data/mushroom.arff";
		// String tempFilename = "D:/data/voting.arff";

		Instances tempDataset = null;
		try {
			FileReader fileReader = new FileReader(tempFilename);
			tempDataset = new Instances(fileReader);
			fileReader.close();
		} catch (Exception ee) {
			System.out.println("Cannot read the file: " + tempFilename + "\r\n" + ee);
			System.exit(1);
		} // Of try

		Instances[] tempDatasets = splitTrainingTesting(tempDataset, paraTrainingFraction);
		NaiveBayes tempLearner = new NaiveBayes(tempDatasets[0]);
		tempLearner.setDataType(NOMINAL);
		tempLearner.calculateClassDistribution();
		tempLearner.calculateConditionalProbabilities();

		double tempAccuracy = tempLearner.classify(tempDatasets[1]);

		System.out.println("The accuracy is: " + tempAccuracy);
	}// Of testNominal
}// Of class NaiveBayes
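
A side note on getRandomIndices(): swapping two random positions paraLength times does yield a permutation, but not every permutation is equally likely. A commonly used alternative is the Fisher-Yates shuffle, sketched below. This is a swapped-in technique, not the author's method; the class name `ShuffleSketch`, the method name, and the explicit `Random` parameter (added here so that a seed can make runs repeatable) are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical alternative to getRandomIndices() using Fisher-Yates. */
class ShuffleSketch {
	/**
	 * Return a uniformly random permutation of 0, 1, ..., paraLength - 1.
	 */
	static int[] shuffledIndices(int paraLength, Random paraRandom) {
		int[] resultIndices = new int[paraLength];
		for (int i = 0; i < paraLength; i++) {
			resultIndices[i] = i;
		} // Of for i

		// Walk from the end; swap position i with a random position in [0, i].
		for (int i = paraLength - 1; i > 0; i--) {
			int tempOther = paraRandom.nextInt(i + 1);
			int tempValue = resultIndices[i];
			resultIndices[i] = resultIndices[tempOther];
			resultIndices[tempOther] = tempValue;
		} // Of for i

		return resultIndices;
	}// Of shuffledIndices

	public static void main(String[] args) {
		// A fixed seed makes the split reproducible between runs.
		System.out.println(Arrays.toString(shuffledIndices(6, new Random(2024))));
	}// Of main
}// Of class ShuffleSketch
```

The result can be used exactly like the output of getRandomIndices(): the first part of the array indexes the training set and the rest indexes the testing set.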

Day 59: NB algorithm for numerical data

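Today, add the code that handles numerical data.
Assume that the values of every attribute follow a Gaussian distribution. Other assumptions are also possible.
Use the probability density directly in Bayes' formula in place of a probability value.
As you can see, handling numerical data is no more complicated than handling nominal data.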
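
The per-attribute term in classifyNumerical() is the logarithm of the Gaussian density with the constant $-\frac{1}{2}\log(2\pi)$ dropped. Since that constant is the same for every class, dropping it does not change which class scores highest. A minimal sketch to check this numerically follows; the class `GaussianScoreSketch` and its method names are hypothetical, and the numbers in main() are made up for illustration.

```java
/** Hypothetical check that the shortened score differs by a constant only. */
class GaussianScoreSketch {
	/** The full logarithm of the Gaussian density N(paraMu, paraSigma^2). */
	static double logDensity(double paraX, double paraMu, double paraSigma) {
		double tempDifference = paraX - paraMu;
		return -0.5 * Math.log(2 * Math.PI) - Math.log(paraSigma)
				- tempDifference * tempDifference / (2 * paraSigma * paraSigma);
	}// Of logDensity

	/** The shortened term, in the same form as in classifyNumerical(). */
	static double score(double paraX, double paraMu, double paraSigma) {
		double tempDifference = paraX - paraMu;
		return -Math.log(paraSigma) - tempDifference * tempDifference / (2 * paraSigma * paraSigma);
	}// Of score

	public static void main(String[] args) {
		double tempX = 5.0;
		// Two made-up (mu, sigma) pairs, one per class.
		double[][] tempParameters = { { 5.1, 0.4 }, { 6.3, 0.6 } };
		for (double[] tempPair : tempParameters) {
			double tempFull = logDensity(tempX, tempPair[0], tempPair[1]);
			double tempShort = score(tempX, tempPair[0], tempPair[1]);
			// The gap is the same for both classes.
			System.out.println("full = " + tempFull + ", short = " + tempShort + ", gap = " + (tempFull - tempShort));
		} // Of for tempPair
	}// Of main
}// Of class GaussianScoreSketch
```

Both forms divide by sigma, so a class in which some attribute takes a single constant value (sigma equal to 0) needs separate handling, for example a small lower bound on sigma; that guard is not part of the code above.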

Day 60: Summary

Describe what you have learned over these 10 days, in no fewer than 10 points.
