Lu597203933

LIRE Source Code Analysis, Part 1

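Lucene Image Retrieval (LIRE) is an open-source Java framework for searching images by image. I had some free time over the past few days, so I read part of the source code, added some comments, and am sharing them here.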

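This post covers BOVWBuilder.java, KMeans.java, and Cluster.java. Together they encode local features by word frequency, using the BOF (bag of features) model. The idea is to extract local features (for example SIFT) from N images and stack them into one matrix, then run k-means clustering on that matrix to obtain a number of cluster centers. For a new image, we find the nearest cluster center for each of its feature points and increment that center's count by 1. The result is a vector whose dimensionality equals the number of cluster centers.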
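The encoding step described above can be sketched in a few lines of plain Java. This is only an illustration, not LIRE code: the class name BovwEncodingSketch, its method names, and the toy 2-D data are made up for this example, and squared Euclidean distance is assumed as the distance measure.

```java
import java.util.Arrays;

// Minimal bag-of-visual-words encoder. All names here are illustrative, not part of LIRE.
class BovwEncodingSketch {

    // Squared Euclidean distance between two vectors of equal length (assumed metric).
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the center closest to the descriptor (ties go to the lower index).
    static int nearestCenter(double[] descriptor, double[][] centers) {
        int best = 0;
        double bestDistance = squaredDistance(descriptor, centers[0]);
        for (int i = 1; i < centers.length; i++) {
            double d = squaredDistance(descriptor, centers[i]);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    // One bin per center; every descriptor votes for its nearest center.
    static int[] encode(double[][] descriptors, double[][] centers) {
        int[] histogram = new int[centers.length];
        for (double[] descriptor : descriptors) {
            histogram[nearestCenter(descriptor, centers)]++;
        }
        return histogram;
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {10, 10}, {0, 10}};
        double[][] descriptors = {{1, 1}, {9, 9}, {0.5, 0.2}, {1, 9}};
        System.out.println(Arrays.toString(encode(descriptors, centers))); // prints [2, 1, 1]
    }
}
```

With three centers the image is encoded as a 3-dimensional vector, regardless of how many descriptors it has.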

Everything starts from the index() function in BOVWBuilder...
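As orientation before the listing, here is a standalone sketch of the stopping rule that index() applies to the clustering: two steps always run, then steps continue while the stress changes by more than max(20, featureCount / 1000) and the step counter is below 12. The class StoppingRuleSketch and the DoubleSupplier standing in for KMeans.clusteringStep() are assumptions made for this example.

```java
import java.util.function.DoubleSupplier;

// Standalone sketch of the convergence loop in index(). Names are illustrative only.
class StoppingRuleSketch {

    // Same formula as in index(): at least 20, otherwise one thousandth of the feature count.
    static double threshold(int featureCount) {
        return Math.max(20d, featureCount / 1000d);
    }

    // 'step' stands in for KMeans.clusteringStep(); returns the number of steps executed.
    static int run(DoubleSupplier step, int featureCount) {
        double lastStress = step.getAsDouble();   // step 1 always runs
        double newStress = step.getAsDouble();    // step 2 always runs
        int executed = 2;
        int cstep = 3;
        double threshold = threshold(featureCount);
        while (Math.abs(newStress - lastStress) > threshold && cstep < 12) {
            lastStress = newStress;
            newStress = step.getAsDouble();
            executed++;
            cstep++;
        }
        return executed;
    }

    public static void main(String[] args) {
        double[] stress = {1000, 600, 400, 390, 389};   // made-up stress values
        int[] next = {0};
        int steps = run(() -> stress[next[0]++], 5000);
        System.out.println(steps); // prints 4: the change from 400 to 390 is below the threshold of 20
    }
}
```

Because the counter starts at 3 and the loop requires it to be below 12, at most 11 clustering steps run in total, even if the stress never settles.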

BOVWBuilder.java (with comments)

package lmc.imageretrieval.imageanalysis.bovw;

import java.io.File;
import java.io.IOException;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;

import javax.swing.ProgressMonitor;

import lmc.imageretrieval.imageanalysis.Histogram;
import lmc.imageretrieval.imageanalysis.LireFeature;
import lmc.imageretrieval.tools.DocumentBuilder;
import lmc.imageretrieval.utils.SerializationUtils;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Term;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.Version;

public class BOVWBuilder {
    IndexReader reader;
    // number of documents used to build the vocabulary / clusters.
    private int numDocsForVocabulary = 500;
    private int numClusters = 512;
    private Cluster[] clusters = null;
    DecimalFormat df = (DecimalFormat) NumberFormat.getNumberInstance();
    private ProgressMonitor pm = null;

    protected LireFeature lireFeature;
    protected String localFeatureFieldName;
    protected String visualWordsFieldName;
    protected String localFeatureHistFieldName;
    protected String clusterFile;

    public static boolean DELETE_LOCAL_FEATURES = true;
    /**
     *
     * @param reader
     * @deprecated
     */
    public BOVWBuilder(IndexReader reader) {
        this.reader = reader;
    }

    /**
     * Creates a new instance of the BOVWBuilder using the given reader. The numDocsForVocabulary
     * indicates how many documents of the index are used to build the vocabulary (clusters).
     *
     * @param reader               the reader used to open the Lucene index,
     * @param numDocsForVocabulary gives the number of documents for building the vocabulary (clusters).
     * @deprecated
     */
    public BOVWBuilder(IndexReader reader, int numDocsForVocabulary) {
        this.reader = reader;
        this.numDocsForVocabulary = numDocsForVocabulary;
    }

    /**
     * Creates a new instance of the BOVWBuilder using the given reader. The numDocsForVocabulary
     * indicates how many documents of the index are used to build the vocabulary (clusters). The numClusters gives
     * the number of clusters k-means should find. Note that this number should be lower than the number of features,
     * otherwise an exception will be thrown while indexing.
     *
     * @param reader               the index reader
     * @param numDocsForVocabulary the number of documents that should be sampled for building the visual vocabulary
     * @param numClusters          the size of the visual vocabulary
     * @deprecated
     */
    public BOVWBuilder(IndexReader reader, int numDocsForVocabulary, int numClusters) {
        this.numDocsForVocabulary = numDocsForVocabulary;
        this.numClusters = numClusters;
        this.reader = reader;
    }

    /**
     * Creates a new instance of the BOVWBuilder using the given reader. TODO: write
     *
     * @param reader               the index reader
     * @param lireFeature          lireFeature used
     */
    public BOVWBuilder(IndexReader reader, LireFeature lireFeature) {
        this.reader = reader;
        this.lireFeature = lireFeature;
    }

    /**
     * Creates a new instance of the BOVWBuilder using the given reader. The numDocsForVocabulary
     * indicates how many documents of the index are used to build the vocabulary (clusters).
     * TODO: write
     *
     * @param reader               the index reader
     * @param lireFeature          lireFeature used
     * @param numDocsForVocabulary the number of documents that should be sampled for building the visual vocabulary
     */
    public BOVWBuilder(IndexReader reader, LireFeature lireFeature, int numDocsForVocabulary) {
        this.numDocsForVocabulary = numDocsForVocabulary;
        this.reader = reader;
        this.lireFeature = lireFeature;
    }

    /**
     * Creates a new instance of the BOVWBuilder using the given reader. The numDocsForVocabulary
     * indicates how many documents of the index are used to build the vocabulary (clusters). The numClusters gives
     * the number of clusters k-means should find. Note that this number should be lower than the number of features,
     * otherwise an exception will be thrown while indexing. TODO: write
     *
     * @param reader               the index reader
     * @param lireFeature          lireFeature used
     * @param numDocsForVocabulary the number of documents that should be sampled for building the visual vocabulary
     * @param numClusters          the size of the visual vocabulary
     */
    public BOVWBuilder(IndexReader reader, LireFeature lireFeature, int numDocsForVocabulary, int numClusters) {
        this.numDocsForVocabulary = numDocsForVocabulary;
        this.numClusters = numClusters;
        this.reader = reader;
        this.lireFeature = lireFeature;
    }

    protected void init() {
        localFeatureFieldName = lireFeature.getFieldName();
        visualWordsFieldName = lireFeature.getFieldName() + DocumentBuilder.FIELD_NAME_BOVW;
        localFeatureHistFieldName = lireFeature.getFieldName()+ DocumentBuilder.FIELD_NAME_BOVW_VECTOR;
        clusterFile = "./clusters-bovw" + lireFeature.getFeatureName() +  ".dat";
    }

    /**
     * Uses an existing index, where each and every document should have a set of local features. A number of
     * random images (numDocsForVocabulary) is selected and clustered to get a vocabulary of visual words
     * (the cluster means). For all images a histogram on the visual words is created and added to the documents.
     * Pre-existing histograms are deleted, so this method can be used for re-indexing.
     *
     * @throws java.io.IOException
     */
    public void index() throws IOException {
        init();
        df.setMaximumFractionDigits(3);
        // find the documents for building the vocabulary:
        HashSet<Integer> docIDs = selectVocabularyDocs();    // select the ids of all documents used for clustering
        KMeans k = new KMeans(numClusters);
        // fill the KMeans object:
        LinkedList<double[]> features = new LinkedList<double[]>();
        // Needed for check whether the document is deleted.
        Bits liveDocs = MultiFields.getLiveDocs(reader);
        for (Iterator<Integer> iterator = docIDs.iterator(); iterator.hasNext(); ) {
            int nextDoc = iterator.next();
            if (reader.hasDeletions() && !liveDocs.get(nextDoc)) continue; // if it is deleted, just ignore it.
            Document d = reader.document(nextDoc);   // load the document
            features.clear();
            IndexableField[] fields = d.getFields(localFeatureFieldName);   // fetch the local feature fields (e.g. SIFT points)
            String file = d.getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0];   // fetch the image path / identifier
            for (int j = 0; j < fields.length; j++) {
                LireFeature f = getFeatureInstance();
                // read the descriptor
                f.setByteArrayRepresentation(fields[j].binaryValue().bytes, fields[j].binaryValue().offset, fields[j].binaryValue().length);
                // copy the data over to new array ...  (not used)
                //double[] feat = new double[f.getDoubleHistogram().length];
                //System.arraycopy(f.getDoubleHistogram(), 0, feat, 0, feat.length);
                features.add(f.getDoubleHistogram());
            }
            k.addImage(file, features);    // associate the descriptors with the image
        }
        if (pm != null) { // set to 5 of 100 before clustering starts.
            pm.setProgress(5);
            pm.setNote("Starting clustering");
        }
        if (k.getFeatureCount() < numClusters) {    // fewer features than cluster centers: throw an exception
            // this cannot work. You need more data points than clusters.
            // report the total feature count; the local list only holds the features of the last document
            throw new UnsupportedOperationException("Only " + k.getFeatureCount() + " features found to cluster in " + numClusters + ". Try to use fewer clusters or more images.");
        }
        // do the clustering:
        System.out.println("Number of local features: " + df.format(k.getFeatureCount()));
        System.out.println("Starting clustering ...");
        k.init();        // initialize the cluster centers
        System.out.println("Step.");
        double time = System.currentTimeMillis();
        double laststress = k.clusteringStep();    // run one clustering step and get the overall stress (error)

        if (pm != null) { // set to 8 of 100 after first step.
            pm.setProgress(8);
            pm.setNote("Step 1 finished");
        }

        System.out.println(getDuration(time) + " -> Next step.");
        time = System.currentTimeMillis();
        double newStress = k.clusteringStep();    // second clustering step

        if (pm != null) { // set to 11 of 100 after second step.
            pm.setProgress(11);
            pm.setNote("Step 2 finished");
        }

        // critical part: Give the difference in between steps as a constraint for accuracy vs. runtime trade off.
        double threshold = Math.max(20d, (double) k.getFeatureCount() / 1000d);   // stop once the stress changes by no more than max(20, featureCount/1000) between two steps
        System.out.println("Threshold = " + df.format(threshold));
        int cstep = 3;
        while (Math.abs(newStress - laststress) > threshold && cstep < 12) {    // also stop once the step counter reaches 12
            System.out.println(getDuration(time) + " -> Next step. Stress difference ~ |" + (int) newStress + " - " + (int) laststress + "| = " + df.format(Math.abs(newStress - laststress)));
            time = System.currentTimeMillis();
            laststress = newStress;
            newStress = k.clusteringStep();
            if (pm != null) { // set to XX of 100 after second step.
                pm.setProgress(cstep * 3 + 5);
                pm.setNote("Step " + cstep + " finished");
            }
            cstep++;
        }
        // Serializing clusters to a file on the disk ...
        clusters = k.getClusters();    // get the cluster centers
//        for (int i = 0; i < clusters.length; i++) {
//            Cluster cluster = clusters[i];
//            System.out.print(cluster.getMembers().size() + ", ");
//        }
//        System.out.println();
        Cluster.writeClusters(clusters, clusterFile);  // write the cluster centers to a file on disk
        //  create & store histograms:
        System.out.println("Creating histograms ...");
        time = System.currentTimeMillis();
//        int[] tmpHist = new int[numClusters];
        @SuppressWarnings("deprecation")
        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_4_10_2,
                new WhitespaceAnalyzer(Version.LUCENE_4_10_2));
        conf.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter iw = new IndexWriter(((DirectoryReader) reader).directory(), conf);
        if (pm != null) { // set to 50 of 100 after clustering.
            pm.setProgress(50);
            pm.setNote("Clustering finished");
        }
        // parallelized indexing
        LinkedList<Thread> threads = new LinkedList<Thread>();  // list of worker threads
        int numThreads = 8;     // use 8 threads
        // careful: copy reader to RAM for faster access when reading ...
//        reader = IndexReader.open(new RAMDirectory(reader.directory()), true);
        int step = reader.maxDoc() / numThreads;   // give each thread an equal share of the documents
        for (int part = 0; part < numThreads; part++) {
            Indexer indexer = null;
            if (part < numThreads - 1) indexer = new Indexer(part * step, (part + 1) * step, iw, null);
            else indexer = new Indexer(part * step, reader.maxDoc(), iw, pm);
            Thread t = new Thread(indexer);
            threads.add(t);    
            t.start();
        }
        for (Iterator<Thread> iterator = threads.iterator(); iterator.hasNext(); ) {
            Thread next = iterator.next();
            try {
                next.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        if (pm != null) { // set to 95 of 100 after indexing.
            pm.setProgress(95);
            pm.setNote("Indexing finished, optimizing index now.");
        }

        System.out.println(getDuration(time));
        iw.commit();
        // this one does the "old" commit(), it removes the deleted SURF features.
        iw.forceMerge(1);
        iw.close();
        if (pm != null) { // set to 100 of 100 after optimization.
            pm.setProgress(100);
            pm.setNote("Indexing & optimization finished");
            pm.close();
        }
        System.out.println("Finished.");
    }

    // This function is not used
    public void indexMissing() throws IOException {
        init();
        // Reading clusters from disk:
        clusters = Cluster.readClusters(clusterFile);
        //  create & store histograms:
        System.out.println("Creating histograms ...");
        LireFeature f = getFeatureInstance();

        // Needed for check whether the document is deleted.
        Bits liveDocs = MultiFields.getLiveDocs(reader);

        // based on bug report from Einav Itamar
        @SuppressWarnings("deprecation")
        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_4_10_2,
                new WhitespaceAnalyzer(Version.LUCENE_4_10_2));
        IndexWriter iw = new IndexWriter(((DirectoryReader) reader).directory(), conf);
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.hasDeletions() && !liveDocs.get(i)) continue; // if it is deleted, just ignore it.
            Document d = reader.document(i);
            // Only if there are no values yet:
            if (d.getValues(visualWordsFieldName) == null || d.getValues(visualWordsFieldName).length == 0) {
                createVisualWords(d, f);
                // now write the new one. we use the identifier to update ;)
                iw.updateDocument(new Term(DocumentBuilder.FIELD_NAME_IDENTIFIER, d.getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0]), d);
            }
        }
        iw.commit();
        // added to permanently remove the deleted docs.
        iw.forceMerge(1);
        iw.close();
        System.out.println("Finished.");
    }

    /**
     * Takes one single document and creates the visual words and adds them to the document. The same document is returned.
     *
     * @param d the document to use for adding the visual words
     * @return
     * @throws IOException
     */
    public Document getVisualWords(Document d) throws IOException {     // compute the BoVW feature for document d
        clusters = Cluster.readClusters(clusterFile);   // read the cluster centers
        LireFeature f = getFeatureInstance();
        createVisualWords(d, f);    // create the BoVW feature

        return d;
    }


    @SuppressWarnings("unused")    // no longer used
    private void quantize(double[] histogram) {
        double max = 0;
        for (int i = 0; i < histogram.length; i++) {
            max = Math.max(max, histogram[i]);
        }
        for (int i = 0; i < histogram.length; i++) {
            histogram[i] = (int) Math.floor((histogram[i] * 128d) / max);
        }
    }

    /**
     * Find the appropriate cluster for a given feature.
     *
     * @param f
     * @return the index of the cluster.
     */
    private int clusterForFeature(Histogram f) {   // find the cluster center nearest to a feature point and return its index
        double distance = clusters[0].getDistance(f);
        double tmp;
        int result = 0;
        for (int i = 1; i < clusters.length; i++) {
            tmp = clusters[i].getDistance(f);
            if (tmp < distance) {
                distance = tmp;
                result = i;
            }
        }
        return result;
    }

    private String arrayToVisualWordString(double[] hist) {   // stored as a string like this; it does not seem very useful to me
        StringBuilder sb = new StringBuilder(1024);
        for (int i = 0; i < hist.length; i++) {
            int visualWordIndex = (int) hist[i];
            for (int j = 0; j < visualWordIndex; j++) {
                // sb.append('v');
                sb.append(Integer.toHexString(i));
                sb.append(' ');
            }
        }
        return sb.toString();
    }
    // select the images (documents) used for clustering
    private HashSet<Integer> selectVocabularyDocs() throws IOException {
        // need to make sure that this is not running forever ...
        int loopCount = 0;
        float maxDocs = reader.maxDoc();    // total number of documents
        int capacity = (int) Math.min(numDocsForVocabulary, maxDocs);
        if (capacity < 0) capacity = (int) (maxDocs / 2);   // if negative (e.g. -1), use half of the documents
        HashSet<Integer> result = new HashSet<Integer>(capacity);
        int tmpDocNumber, tmpIndex;
        LinkedList<Integer> docCandidates = new LinkedList<Integer>();
        // three cases:
        //
        // either it's more or the same number as documents
        if (numDocsForVocabulary >= maxDocs) {   // at least as many requested as available: use all documents for clustering
            for (int i = 0; i < maxDocs; i++) {
                result.add(i);
            }
            return result;
        } else if (numDocsForVocabulary >= maxDocs - 100) { // within [maxDocs-100, maxDocs)
            for (int i = 0; i < maxDocs; i++) {
                result.add(i);         // add all of them first
            }
            while (result.size() > numDocsForVocabulary) {   // randomly drop surplus images until numDocsForVocabulary remain
                result.remove((int) Math.floor(Math.random() * result.size()));
            }
            return result;
        } else {             // remaining case: numDocsForVocabulary is below maxDocs-100
            for (int i = 0; i < maxDocs; i++) {
                docCandidates.add(i);    // add all candidates first
            }
            for (int r = 0; r < capacity; r++) { // capacity equals numDocsForVocabulary unless that value was negative
                boolean worksFine = false;
                do {
                    tmpIndex = (int) Math.floor(Math.random() * (double) docCandidates.size());
                    tmpDocNumber = docCandidates.get(tmpIndex);
                    docCandidates.remove(tmpIndex);
                    // does the document exist, and has it already been chosen?
                    // check if the selected doc number is valid: not null, not deleted and not already chosen.
                    worksFine = (reader.document(tmpDocNumber) != null) && !result.contains(tmpDocNumber);
                } while (!worksFine);
                result.add(tmpDocNumber);
                // need to make sure that this is not running forever ...
                if (loopCount++ > capacity * 100)
                    throw new UnsupportedOperationException("Could not get the documents, maybe there are not enough documents in the index?");
            }
            return result;
        }
    }

//    protected abstract LireFeature getFeatureInstance();

    protected LireFeature getFeatureInstance() {
        LireFeature result = null;
        try {
            result =  lireFeature.getClass().newInstance();
        } catch (InstantiationException e) {
            e.printStackTrace();
        } catch (IllegalAccessException e) {
            e.printStackTrace();
        }
        return result;
    }

    private class Indexer implements Runnable {     // private worker class that builds the index
        int start, end;
        IndexWriter iw;
        ProgressMonitor pm = null;

        private Indexer(int start, int end, IndexWriter iw, ProgressMonitor pm) {
            this.start = start;
            this.end = end;
            this.iw = iw;
            this.pm = pm;
        }

        public void run() {               // thread entry point
            LireFeature f = getFeatureInstance();   // get a fresh feature instance
            for (int i = start; i < end; i++) {
                try {
                    Document d = reader.document(i);    // load the i-th document
                    createVisualWords(d, f);
                    iw.updateDocument(new Term(DocumentBuilder.FIELD_NAME_IDENTIFIER, d.getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0]), d);
                    if (pm != null) {
                        double len = (double) (end - start);
                        double percent = (double) (i - start) / len * 45d + 50;
                        pm.setProgress((int) percent);
                        pm.setNote("Creating visual words, ~" + (int) percent + "% finished");
                    }
//                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    private void createVisualWords(Document d, LireFeature f)
    {
        double[] tmpHist = new double[numClusters];     
        Arrays.fill(tmpHist, 0d);
        IndexableField[] fields = d.getFields(localFeatureFieldName);
        // remove the fields if they are already there ...
        // remove the following two fields from the document in case they already exist
        d.removeField(visualWordsFieldName);
        d.removeField(localFeatureHistFieldName);

        // find the appropriate cluster for each feature:
        for (int j = 0; j < fields.length; j++) {      // read each descriptor
            f.setByteArrayRepresentation(fields[j].binaryValue().bytes, fields[j].binaryValue().offset, fields[j].binaryValue().length);
            tmpHist[clusterForFeature((Histogram) f)]++;      // increment the bin of the cluster center nearest to this feature point
        }
        //quantize(tmpHist);   // tmpHist is the final result
        d.add(new TextField(visualWordsFieldName, arrayToVisualWordString(tmpHist), Field.Store.YES));    // stored as a string; does not seem very useful
        d.add(new StoredField(localFeatureHistFieldName, SerializationUtils.toByteArray(tmpHist)));   // stored as a byte array
        // remove local features to save some space if requested:
        if (DELETE_LOCAL_FEATURES) {
            d.removeFields(localFeatureFieldName);     // remove the original local feature fields
        }

        // for debugging ..
//        System.out.println(d.getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0] + " " + Arrays.toString(tmpHist));
    }

    private String getDuration(double time) {
        double min = (System.currentTimeMillis() - time) / (1000 * 60);
        double sec = (min - Math.floor(min)) * 60;
        return String.format("%02d:%02d", (int) min, (int) sec);
    }

    public void setProgressMonitor(ProgressMonitor pm) {
        this.pm = pm;
    }

}
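The string built by arrayToVisualWordString may look redundant next to the byte-array histogram, but it has one useful property: bin i with count c is written as c copies of the hex token for i, so after whitespace tokenization the term frequency of each token equals the histogram value. Presumably this is what lets the visual words be searched like ordinary text, although that is an assumption on my part and not something this listing shows. A standalone sketch follows; the class VisualWordStringSketch and the helper termFrequencies are made up for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrates the token encoding of a visual word histogram. Names are illustrative only.
class VisualWordStringSketch {

    // Same idea as arrayToVisualWordString: bin i with count c becomes c copies of hex(i).
    static String toVisualWordString(double[] hist) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hist.length; i++) {
            int count = (int) hist[i];
            for (int j = 0; j < count; j++) {
                sb.append(Integer.toHexString(i)).append(' ');
            }
        }
        return sb.toString();
    }

    // Splits on whitespace, roughly as a whitespace tokenizer would, and counts each token.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> frequencies = new TreeMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                frequencies.merge(token, 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        double[] hist = new double[12];
        hist[1] = 2;    // visual word 1 occurs twice
        hist[10] = 1;   // visual word 10 (hex "a") occurs once
        hist[11] = 3;   // visual word 11 (hex "b") occurs three times
        String words = toVisualWordString(hist);
        System.out.println(words);                  // prints "1 1 a b b b " (with a trailing space)
        System.out.println(termFrequencies(words)); // prints {1=2, a=1, b=3}
    }
}
```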

KMeans.java (with comments)

package lmc.imageretrieval.imageanalysis.bovw;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

import lmc.imageretrieval.imageanalysis.Histogram;
import lmc.imageretrieval.utils.StatsUtils;

public class KMeans {
    protected List<Image> images = new LinkedList<Image>();
    protected int countAllFeatures = 0, numClusters = 256;
    protected ArrayList<double[]> features = null;
    protected Cluster[] clusters = null;
    protected HashMap<double[], Integer> featureIndex = null;

    public KMeans() {

    }

    public KMeans(int numClusters) {
        this.numClusters = numClusters;
    }

    public void addImage(String identifier, List<double[]> features) {    // add an image
        images.add(new Image(identifier, features));
        countAllFeatures += features.size();
    }

    public int getFeatureCount() {
        return countAllFeatures;
    }

    public void init() {      // initialize the cluster centers
        // create a set of all features:
        features = new ArrayList<double[]>(countAllFeatures);
        for (Image image : images) {
            if (image.features.size() > 0)         // put all descriptors into features
                for (double[] histogram : image.features) {
                    if (!hasNaNs(histogram)) features.add(histogram);
                }
            else {
                System.err.println("Image with no features: " + image.identifier);
            }
        }
        // --- check if there are (i) enough images and (ii) enough features
        if (images.size() < 500) {   // fewer than 500 images: print a warning
            System.err.println("WARNING: Please note that this approach has been implemented for big data and *a lot of images*. " +
                    "You might not get appropriate results with a small number of images employed for constructing the visual vocabulary.");
        }
        if (features.size() < numClusters*2) {   // warn if there are fewer features than twice the number of cluster centers
            System.err.println("WARNING: Please note that the number of local features, in this case " + features.size() + ", is " +
                    "smaller than the recommended minimum number, which is two times the number of visual words, in your case 2*" + numClusters +
                    ". Please adapt your data and either use images with more local features or more images for creating the visual vocabulary.");
        }
        if (features.size() < numClusters + 1) {    // the number of features must be at least numClusters + 1
            System.err.println("CRITICAL: The number of features is smaller than the number of clusters. This cannot work as there has to be at least one " +
                    "feature per cluster. Aborting process now.");
            System.out.println("images: " + images.size());
            System.out.println("features: " + features.size());
            System.out.println("clusters: " + numClusters);
            System.exit(1);
        }
        // find first clusters:
        clusters = new Cluster[numClusters];          // initial cluster centers
        Set<Integer> medians = selectInitialMedians(numClusters);
        assert(medians.size() == numClusters); // this has to be the same ...
        Iterator<Integer> mediansIterator = medians.iterator();
        for (int i = 0; i < clusters.length; i++) {
            double[] descriptor = features.get(mediansIterator.next());
            clusters[i] = new Cluster(new double[descriptor.length]);   // implicitly setting the length of the mean array.
            System.arraycopy(descriptor, 0, clusters[i].mean, 0, descriptor.length);
        }
    }

    protected Set<Integer> selectInitialMedians(int numClusters) {
        return StatsUtils.drawSample(numClusters, features.size());
    }

    /**
     * Do one step and return the overall stress (squared error). You should do this until
     * the error is below a threshold or doesn't change a lot in between two subsequent steps.
     *
     * @return
     */
    public double clusteringStep() {            // one clustering iteration
        for (int i = 0; i < clusters.length; i++) {
            clusters[i].members.clear();             // clear all members of this cluster
        }
        reOrganizeFeatures();     // reassign every feature to its nearest cluster center
        recomputeMeans();          // recompute the cluster means
        return overallStress();        // return the overall stress, used as the stopping criterion
    }

    protected boolean hasNaNs(double[] histogram) {    // check whether the histogram contains a NaN (not a number)
        boolean hasNaNs = false;
        for (int i = 0; i < histogram.length; i++) {
            if (Double.isNaN(histogram[i])) {
                hasNaNs = true;
                break;
            }
        }
        if (hasNaNs) {
            System.err.println("Found a NaN in init");
//            System.out.println("image.identifier = " + image.identifier);
            for (int j = 0; j < histogram.length; j++) {
                double v = histogram[j];
                System.out.print(v + ", ");
            }
            System.out.println("");
        }
        return hasNaNs;
    }

    /**
     * Re-shuffle all features.
     */
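    protected void reOrganizeFeatures() {            // recompute the distance from each point to the cluster centers and decide which cluster it belongs to
        for (int k = 0; k < features.size(); k++) {     // find the cluster center nearest to feature k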
            double[] f = features.get(k);
            Cluster best = clusters[0];
            double minDistance = clusters[0].getDistance(f);
            for (int i = 1; i < clusters.length; i++) {
                double v = clusters[i].getDistance(f);   // uses Euclidean distance
                if (minDistance > v) {
                    best = clusters[i];
                    minDistance = v;
                }
            }
            best.members.add(k);
        }
    }

    /**
     * Computes the mean per cluster (averaged vector)
     */
    protected void recomputeMeans() {        // recompute the cluster centers
        int length = features.get(0).length;
        for (int i = 0; i < clusters.length; i++) {
            Cluster cluster = clusters[i];
            double[] mean = cluster.mean;
            for (int j = 0; j < length; j++) {
                mean[j] = 0;
                for (Integer member : cluster.members) {
                    mean[j] += features.get(member)[j];
                }
                if (cluster.members.size() > 1)
                    mean[j] = mean[j] / (double) cluster.members.size();
            }
            if (cluster.members.size() == 1) {         // this cluster contains only one point
                System.err.println("** There is just one member in cluster " + i);
            } else if (cluster.members.size() < 1) {   // this cluster has no points
                System.err.println("** There is NO member in cluster " + i);
                // fill it with a random member?!?
                int index = (int) Math.floor(Math.random()*features.size());    // pick a random feature as the new center of this cluster
                System.arraycopy(features.get(index), 0, clusters[i].mean, 0, clusters[i].mean.length);
            }

        }
    }

    /**
     * Squared error in classification.
     *
     * @return
     */
    protected double overallStress() {         // compute the overall stress; note that the code sums absolute differences, not squared ones
        double v = 0;
        int length = features.get(0).length;
        for (int i = 0; i < clusters.length; i++) {
            for (Integer member : clusters[i].members) {
                float tmpStress = 0;
                for (int j = 0; j < length; j++) {
//                    if (Float.isNaN(features.get(member).descriptor[j])) System.err.println("Error: there is a NaN in cluster " + i + " at member " + member);
                    tmpStress += Math.abs(clusters[i].mean[j] - features.get(member)[j]);
                }
                v += tmpStress;
            }
        }
        return v;
    }

    public Cluster[] getClusters() {
        return clusters;
    }

    public List<Image> getImages() {
        return images;
    }

    /**
     * Get the number of desired clusters.
     *
     * @return
     */
    public int getNumClusters() {
        return numClusters;
    }

    public void setNumClusters(int numClusters) {
        this.numClusters = numClusters;
    }

    private HashMap<double[], Integer> createIndex() {
        featureIndex = new HashMap<double[], Integer>(features.size());
        for (int i = 0; i < clusters.length; i++) {
            Cluster cluster = clusters[i];
            for (Iterator<Integer> fidit = cluster.members.iterator(); fidit.hasNext(); ) {
                int fid = fidit.next();
                featureIndex.put(features.get(fid), i);
            }
        }
        return featureIndex;
    }

    /**
     * Used to find the cluster of a feature actually used in the clustering process (so
     * it is known by the k-means class).
     *
     * @param f the feature to search for
     * @return the index of the Cluster
     */
    public int getClusterOfFeature(Histogram f) {
        if (featureIndex == null) createIndex();
        return featureIndex.get(f);
    }
}

class Image {
    public List<double[]> features;
    public String identifier;
    public float[] localFeatureHistogram = null;
    private final int QUANT_MAX_HISTOGRAM = 256;

    Image(String identifier, List<double[]> features) {
        this.features = new LinkedList<double[]>();
        this.features.addAll(features);
        this.identifier = identifier;
    }

    public float[] getLocalFeatureHistogram() {
        return localFeatureHistogram;
    }

    public void setLocalFeatureHistogram(float[] localFeatureHistogram) {
        this.localFeatureHistogram = localFeatureHistogram;
    }

    public void initHistogram(int bins) {
        localFeatureHistogram = new float[bins];
        for (int i = 0; i < localFeatureHistogram.length; i++) {
            localFeatureHistogram[i] = 0;
        }
    }

    // Scale the visual-word histogram so that its largest bin equals QUANT_MAX_HISTOGRAM.
    public void normalizeFeatureHistogram() {
        float max = 0;
        for (int i = 0; i < localFeatureHistogram.length; i++) {
            max = Math.max(localFeatureHistogram[i], max);
        }
        if (max == 0) return;   // all bins are empty; dividing would yield NaN
        for (int i = 0; i < localFeatureHistogram.length; i++) {
            localFeatureHistogram[i] = (localFeatureHistogram[i] * QUANT_MAX_HISTOGRAM) / max;
        }
    }

    public void printHistogram() {
        for (int i = 0; i < localFeatureHistogram.length; i++) {
            System.out.print(localFeatureHistogram[i] + " ");

        }
        System.out.println("");
    }
}
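
The Image class stores the bag-of-visual-words histogram for one image. The encoding step described at the start of this post (assign every local feature to its nearest cluster center, count the hits per center, then scale) can be sketched roughly as follows. This is a standalone illustration rather than the LIRE code: BovwHistogramSketch, nearest and encode are hypothetical names, and squared L2 distance is assumed as the metric.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of LIRE.
class BovwHistogramSketch {
    static final int QUANT_MAX_HISTOGRAM = 256;

    // Index of the center with the smallest squared L2 distance to feature f.
    public static int nearest(double[][] centers, double[] f) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double d = 0;
            for (int j = 0; j < f.length; j++) {
                double diff = centers[c][j] - f[j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // Counts the features per center, then scales so the largest bin is QUANT_MAX_HISTOGRAM.
    public static float[] encode(double[][] centers, double[][] features) {
        float[] hist = new float[centers.length];
        for (double[] f : features) {
            hist[nearest(centers, f)]++;
        }
        float max = 0;
        for (float v : hist) {
            max = Math.max(max, v);
        }
        if (max == 0) {
            return hist;   // no features: leave the histogram at zero
        }
        for (int i = 0; i < hist.length; i++) {
            hist[i] = hist[i] * QUANT_MAX_HISTOGRAM / max;
        }
        return hist;
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {10, 10}};
        double[][] features = {{1, 1}, {0, 2}, {9, 9}};
        // Two features fall on center 0 and one on center 1.
        System.out.println(Arrays.toString(encode(centers, features)));   // prints [256.0, 128.0]
    }
}
```

The resulting vector has exactly as many dimensions as there are cluster centers, which is what makes images with different numbers of local features comparable.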

Cluster.java (with comments)

package lmc.imageretrieval.imageanalysis.bovw;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;

import lmc.imageretrieval.imageanalysis.Histogram;
import lmc.imageretrieval.utils.MetricsUtils;
import lmc.imageretrieval.utils.SerializationUtils;

public class Cluster implements Comparable<Object> {
    double[] mean;
    HashSet<Integer> members = new HashSet<Integer>();

    private double stress = 0;

    public Cluster() {
        this.mean = new double[4 * 4 * 8];
        Arrays.fill(mean, 0f);
    }

    public Cluster(double[] mean) {
        this.mean = mean;
    }

    public String toString() {
        StringBuilder sb = new StringBuilder(512);
        for (Integer integer : members) {
            sb.append(integer);
            sb.append(", ");
        }
        for (int i = 0; i < mean.length; i++) {
            sb.append(mean[i]);
            sb.append(';');
        }
        return sb.toString();
    }

    public int compareTo(Object o) {
        return ((Cluster) o).members.size() - members.size();
    }

    public double getDistance(Histogram f) {
        return getDistance(f.getDoubleHistogram());
    }

    public double getDistance(double[] f) {
//        L1
//        return MetricsUtils.distL1(mean, f);

//        L2
        return MetricsUtils.distL2(mean, f);
    }

    /**
     * Creates a byte array representation from the clusters mean.
     *
     * @return the clusters mean as byte array.
     */
    public byte[] getByteRepresentation() {
        return SerializationUtils.toByteArray(mean);
    }

    public void setByteRepresentation(byte[] data) {
        mean = SerializationUtils.toDoubleArray(data);
    }

    public static void writeClusters(Cluster[] clusters, String file) throws IOException {   // write the cluster centers to disk
        FileOutputStream fout = new FileOutputStream(file);
        fout.write(SerializationUtils.toBytes(clusters.length));   // number of cluster centers
        fout.write(SerializationUtils.toBytes((clusters[0].getMean()).length));  // length (dimension) of each center vector
        for (int i = 0; i < clusters.length; i++) {
            fout.write(clusters[i].getByteRepresentation());   // write each cluster center
        }
        fout.close();
    }

    // TODO: re-visit here to make the length variable (depending on the actual feature size).
    public static Cluster[] readClusters(String file) throws IOException {    // read the cluster centers from disk
        FileInputStream fin = new FileInputStream(file);
        byte[] tmp = new byte[4];
        fin.read(tmp, 0, 4);
        Cluster[] result = new Cluster[SerializationUtils.toInt(tmp)];
        fin.read(tmp, 0, 4);
        int size = SerializationUtils.toInt(tmp);
        tmp = new byte[size * 8];
        for (int i = 0; i < result.length; i++) {
            int bytesRead = fin.read(tmp, 0, size * 8);
            if (bytesRead != size * 8) System.err.println("Didn't read enough bytes ...");
            result[i] = new Cluster();
            result[i].setByteRepresentation(tmp);
        }
        fin.close();
        return result;
    }

    public double getStress() {
        return stress;
    }

    public void setStress(double stress) {
        this.stress = stress;
    }

    public HashSet<Integer> getMembers() {
        return members;
    }

    public void setMembers(HashSet<Integer> members) {
        this.members = members;
    }

    /**
     * Returns the cluster mean
     *
     * @return the cluster mean vector
     */
    public double[] getMean() {
        return mean;
    }
}
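
writeClusters and readClusters imply a simple file layout: an int holding the number of clusters, an int holding the vector length, then count × length doubles of 8 bytes each. The sketch below reproduces that layout with JDK classes only. It assumes big-endian encoding (what DataOutputStream writes), which may or may not match SerializationUtils, and ClusterFileSketch, write and read are hypothetical names. One difference from readClusters: readInt and readDouble throw an EOFException on a truncated file instead of continuing after a short read.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical demo class, not part of LIRE.
class ClusterFileSketch {

    // Layout: [int count][int size][count * size doubles].
    public static void write(double[][] means, File file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            out.writeInt(means.length);      // number of cluster centers
            out.writeInt(means[0].length);   // dimension of each center
            for (double[] mean : means) {
                for (double v : mean) {
                    out.writeDouble(v);
                }
            }
        }
    }

    public static double[][] read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            int count = in.readInt();
            int size = in.readInt();
            double[][] means = new double[count][size];
            for (int i = 0; i < count; i++) {
                for (int j = 0; j < size; j++) {
                    means[i][j] = in.readDouble();   // throws EOFException if the file is truncated
                }
            }
            return means;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("clusters", ".dat");
        f.deleteOnExit();
        write(new double[][]{{1.5, 2.5}, {3.0, 4.0}}, f);
        double[][] back = read(f);
        System.out.println(back.length + " clusters of dimension " + back[0].length);
    }
}
```

Because the vector length is stored in the header, a reader written this way does not need the hard-coded 4 * 4 * 8 size used by the default Cluster constructor.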
