Fast retrieval of Cassandra rows, part 3: the key cache

Most queries in a Cassandra application are keyed by rowkey, so the heart of such a query is locating the data that belongs to that rowkey. Because Cassandra is a log-structured NoSQL database, the fragments of a row are only merged together once they end up in a single sstable (through compaction); at query time, even with the rowkey in hand, the position of that rowkey has to be looked up in every sstable that might contain part of the row.

Within each sstable's group of files, Index.db already records where every rowkey sits in the Data.db file, but Index.db lives on disk: if every query had to go through it, each lookup would cost an extra IO. Hence the key cache. It records, per sstable, the position of a rowkey inside the Data.db file and keeps that mapping in memory, saving one IO and speeding up retrieval by rowkey.
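Conceptually, the per-sstable lookup just described looks like the sketch below. It is only an illustration of the flow (in the real SSTableReader the cache check happens inside getPosition itself, as shown in section 2); the helper findPosition is made up, while getCachedPosition and getPosition are actual SSTableReader methods.

// Conceptual sketch of the read path: ask an sstable for the position of a rowkey;
// the key cache short-circuits the Index.db lookup when it already knows the
// offset into Data.db.
static RowIndexEntry findPosition(SSTableReader sstable, DecoratedKey key)
{
    RowIndexEntry cached = sstable.getCachedPosition(key, true);   // in-memory key cache lookup
    if (cached != null)
        return cached;                                             // cache hit: no Index.db IO needed
    return sstable.getPosition(key, SSTableReader.Operator.EQ);    // cache miss: go through the on-disk index
}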

While Cassandra is running, the caches as a whole are managed and controlled by CacheService:

public final AutoSavingCache<KeyCacheKey, RowIndexEntry> keyCache;

Each sstable (the group of files that make it up) is mapped in memory to one SSTableReader object, which controls access to that sstable.

So SSTableReader holds a reference to the key cache:

private InstrumentingCache<KeyCacheKey, RowIndexEntry> keyCache;

1. Initialization of the key cache. The key cache is switched on when a CF's in-memory ColumnFamilyStore object is created, according to the cache configuration parameters.

The relevant key cache properties are introduced below.

Key cache capacity: the key_cache_size_in_mb option in cassandra.yaml. If it is set, that value is the in-memory capacity of the key cache; otherwise, if 5% of the heap is larger than 100 MB the capacity is 100 MB, else it is 5% of the heap.
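In other words, when the option is left unset the effective capacity is min(5% of the heap, 100 MB). A small illustrative helper (this is not the actual DatabaseDescriptor code, and the exact Runtime call Cassandra uses to measure the heap may differ):

// Sketch of the sizing rule above: use key_cache_size_in_mb when it is set,
// otherwise fall back to 5% of the heap, capped at 100 MB.
static long keyCacheSizeInMB(Long keyCacheSizeInMbFromYaml)
{
    if (keyCacheSizeInMbFromYaml != null)
        return keyCacheSizeInMbFromYaml;
    long heapInMB = Runtime.getRuntime().maxMemory() / 1024 / 1024; // assumed heap measure
    return Math.min((long) (heapInMB * 0.05), 100L);
}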

Key cache structure: public final AutoSavingCache<KeyCacheKey, RowIndexEntry> keyCache; is an AutoSavingCache object, and AutoSavingCache extends InstrumentingCache.

InstrumentingCache has four fields:

private volatile boolean capacitySetManually; - whether the cache capacity has been changed manually.

    private final ICache<K, V> map; - the in-memory structure that actually holds the cached data. For the key cache the implementation is org.apache.cassandra.cache.ConcurrentLinkedHashCache. It stores the mapping from KeyCacheKey to RowIndexEntry: a KeyCacheKey is the rowkey plus the identifying information of an sstable, and the RowIndexEntry is the offset of that rowkey inside the sstable's Data.db file.

    private final String type; - the cache type; Cassandra currently has two, the key cache and the row cache.

  private CacheMetrics metrics; - the cache's metrics, mainly recording things such as the hit rate.

AutoSavingCache then adds four fields:

    public static final Set<CacheService.CacheType> flushInProgress = new NonBlockingHashSet<CacheService.CacheType>(); - records which cache flush tasks are in progress; at any moment only one flush may run per cache.

    protected volatile ScheduledFuture<?> saveTask; - the scheduled task that saves the cache, described in detail later.

    protected final CacheService.CacheType cacheType; - the type of this cache.

private CacheSerializer<K, V> cacheLoader; - the serializer for cache data. For the key cache the implementation is KeyCacheSerializer; when the key cache is persisted, its serialize method is called for each entry. The serialization format is:

rowkey length | rowkey | sstable identifier (generation, the "ssid") | a flag marking whether the sstable uses the promoted (upgraded) index version | the serialized RowIndexEntry if the index was promoted, and nothing otherwise.

The serialization of RowIndexEntry itself was already covered in the indexinfo post.
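To make that layout concrete, here is a hedged, simplified sketch of the write side. The helper name serializeEntry and the way the RowIndexEntry is passed in are made up (the real KeyCacheSerializer fetches the entry from the key cache itself), but the fields and serializers are the ones described above.

// Simplified sketch of one key cache entry on disk: rowkey (with length), sstable
// generation ("ssid"), promoted-index flag, then the RowIndexEntry only when the
// sstable was written with promoted indexes. Not the exact KeyCacheSerializer code.
static void serializeEntry(KeyCacheKey key, RowIndexEntry entry, DataOutput out) throws IOException
{
    ByteBufferUtil.writeWithLength(ByteBuffer.wrap(key.key), out);  // rowkey length + rowkey bytes
    out.writeInt(key.desc.generation);                              // sstable generation (ssid)
    boolean promoted = key.desc.version.hasPromotedIndexes;         // promoted index layout?
    out.writeBoolean(promoted);
    if (!promoted)
        return;                                                     // old layout: the position is re-resolved at load time
    RowIndexEntry.serializer.serialize(entry, out);                 // offset into Data.db (plus index info)
}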

Next, the fields of KeyCacheKey:

 public final Descriptor desc; - the sstable's descriptor (its identifying information).
   
public final byte[] key; - the rowkey bytes.
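A simplified sketch of what this class amounts to is shown below; the class name is changed to mark it as a sketch, and the real KeyCacheKey additionally implements CacheKey and exposes getPathInfo() so that each entry can be written to the cache file of its keyspace/CF. The point is that equality covers both the sstable descriptor and the raw rowkey bytes, so the same rowkey cached for two different sstables yields two distinct entries.

// Sketch of KeyCacheKey: an sstable descriptor plus the rowkey bytes, with
// equals/hashCode over both, so it can serve as the map key of the key cache.
public class KeyCacheKeySketch
{
    public final Descriptor desc;   // which sstable this entry belongs to
    public final byte[] key;        // the raw rowkey bytes

    public KeyCacheKeySketch(Descriptor desc, ByteBuffer key)
    {
        this.desc = desc;
        this.key = ByteBufferUtil.getArray(key);
    }

    @Override
    public boolean equals(Object o)
    {
        if (!(o instanceof KeyCacheKeySketch))
            return false;
        KeyCacheKeySketch that = (KeyCacheKeySketch) o;
        return desc.equals(that.desc) && Arrays.equals(key, that.key);
    }

    @Override
    public int hashCode()
    {
        return 31 * desc.hashCode() + Arrays.hashCode(key);
    }
}

The key cache initialization code in CacheService is shown below: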

/**
     * We can use Weighers.singleton() because Long can't be leaking memory
     * @return auto saving cache object
     */
    private AutoSavingCache<KeyCacheKey, RowIndexEntry> initKeyCache()
    {
        logger.info("Initializing key cache with capacity of {} MBs.", DatabaseDescriptor.getKeyCacheSizeInMB());

        long keyCacheInMemoryCapacity = DatabaseDescriptor.getKeyCacheSizeInMB() * 1024 * 1024;

        // as values are constant size we can use singleton weigher
        // where 48 = 40 bytes (average size of the key) + 8 bytes (size of value)
        ICache<KeyCacheKey, RowIndexEntry> kc;
        if (MemoryMeter.isInitialized())
        {
            kc = ConcurrentLinkedHashCache.create(keyCacheInMemoryCapacity);
        }
        else
        {
            logger.warn("MemoryMeter uninitialized (jamm not specified as java agent); KeyCache size in JVM Heap will not be calculated accurately. " +
                        "Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead");
            /* We don't know the overhead size because memory meter is not enabled. */
            EntryWeigher<KeyCacheKey, RowIndexEntry> weigher = new EntryWeigher<KeyCacheKey, RowIndexEntry>()
            {
                public int weightOf(KeyCacheKey key, RowIndexEntry entry)
                {
                    return key.key.length + entry.serializedSize();
                }
            };
            kc = ConcurrentLinkedHashCache.create(keyCacheInMemoryCapacity, weigher);
        }
        AutoSavingCache<KeyCacheKey, RowIndexEntry> keyCache = new AutoSavingCache<KeyCacheKey, RowIndexEntry>(kc, CacheType.KEY_CACHE, new KeyCacheSerializer());

        int keyCacheKeysToSave = DatabaseDescriptor.getKeyCacheKeysToSave();

        logger.info("Scheduling key cache save to each {} seconds (going to save {} keys).",
                DatabaseDescriptor.getKeyCacheSavePeriod(),
                    keyCacheKeysToSave == Integer.MAX_VALUE ? "all" : keyCacheKeysToSave);

        keyCache.scheduleSaving(DatabaseDescriptor.getKeyCacheSavePeriod(), keyCacheKeysToSave);

        return keyCache;
    }

/**
 * Wraps an ICache in requests + hits tracking.
 */
public class InstrumentingCache<K, V>
{
    private volatile boolean capacitySetManually;
    private final ICache<K, V> map;
    private final String type;

    private CacheMetrics metrics;

    public InstrumentingCache(String type, ICache<K, V> map)
    {
        this.map = map;
        this.type = type;
        this.metrics = new CacheMetrics(type, map);
    }

    public void put(K key, V value)
    {
        map.put(key, value);
    }

    public boolean putIfAbsent(K key, V value)
    {
        return map.putIfAbsent(key, value);
    }

    public boolean replace(K key, V old, V value)
    {
        return map.replace(key, old, value);
    }

    public V get(K key)
    {
        V v = map.get(key);
        metrics.requests.mark();
        if (v != null)
            metrics.hits.mark();
        return v;
    }

    public V getInternal(K key)
    {
        return map.get(key);
    }

    public void remove(K key)
    {
        map.remove(key);
    }

    public long getCapacity()
    {
        return map.capacity();
    }

    public boolean isCapacitySetManually()
    {
        return capacitySetManually;
    }

    public void updateCapacity(long capacity)
    {
        map.setCapacity(capacity);
    }

    public void setCapacity(long capacity)
    {
        updateCapacity(capacity);
        capacitySetManually = true;
    }

    public int size()
    {
        return map.size();
    }

    public long weightedSize()
    {
        return map.weightedSize();
    }

    public void clear()
    {
        map.clear();
        metrics = new CacheMetrics(type, map);
    }

    public Set<K> getKeySet()
    {
        return map.keySet();
    }

    public Set<K> hotKeySet(int n)
    {
        return map.hotKeySet(n);
    }

    public boolean containsKey(K key)
    {
        return map.containsKey(key);
    }

    public boolean isPutCopying()
    {
        return map.isPutCopying();
    }

    public CacheMetrics getMetrics()
    {
        return metrics;
    }
}


package org.apache.cassandra.cache;

import java.io.*;
import java.nio.ByteBuffer;
import java.util.*;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import org.cliffc.high_scale_lib.NonBlockingHashSet;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.cassandra.config.CFMetaData;
import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.db.Table;
import org.apache.cassandra.db.compaction.CompactionInfo;
import org.apache.cassandra.db.compaction.CompactionManager;
import org.apache.cassandra.db.compaction.OperationType;
import org.apache.cassandra.db.ColumnFamilyStore;
import org.apache.cassandra.io.FSWriteError;
import org.apache.cassandra.io.util.FileUtils;
import org.apache.cassandra.io.util.LengthAvailableInputStream;
import org.apache.cassandra.io.util.SequentialWriter;
import org.apache.cassandra.service.CacheService;
import org.apache.cassandra.service.StorageService;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.cassandra.utils.Pair;

public class AutoSavingCache<K extends CacheKey, V> extends InstrumentingCache<K, V>
{
    private static final Logger logger = LoggerFactory.getLogger(AutoSavingCache.class);

    /** True if a cache flush is currently executing: only one may execute at a time. */
    public static final Set<CacheService.CacheType> flushInProgress = new NonBlockingHashSet<CacheService.CacheType>();

    protected volatile ScheduledFuture<?> saveTask;
    protected final CacheService.CacheType cacheType;

    private CacheSerializer<K, V> cacheLoader;
    private static final String CURRENT_VERSION = "b";

    public AutoSavingCache(ICache<K, V> cache, CacheService.CacheType cacheType, CacheSerializer<K, V> cacheloader)
    {
        super(cacheType.toString(), cache);
        this.cacheType = cacheType;
        this.cacheLoader = cacheloader;
    }

    public File getCachePath(String ksName, String cfName, String version)
    {
        return DatabaseDescriptor.getSerializedCachePath(ksName, cfName, cacheType, version);
    }

    public Writer getWriter(int keysToSave)
    {
        return new Writer(keysToSave);
    }

    public void scheduleSaving(int savePeriodInSeconds, final int keysToSave)
    {
        if (saveTask != null)
        {
            saveTask.cancel(false); // Do not interrupt an in-progress save
            saveTask = null;
        }
        if (savePeriodInSeconds > 0)
        {
            Runnable runnable = new Runnable()
            {
                public void run()
                {
                    submitWrite(keysToSave);
                }
            };
            saveTask = StorageService.optionalTasks.scheduleWithFixedDelay(runnable,
                                                                           savePeriodInSeconds,
                                                                           savePeriodInSeconds,
                                                                           TimeUnit.SECONDS);
        }
    }

    public int loadSaved(ColumnFamilyStore cfs)
    {
        int count = 0;
        long start = System.currentTimeMillis();

        // old cache format that only saves keys
        File path = getCachePath(cfs.table.name, cfs.columnFamily, null);
        if (path.exists())
        {
            DataInputStream in = null;
            try
            {
                logger.info(String.format("reading saved cache %s", path));
                in = new DataInputStream(new LengthAvailableInputStream(new BufferedInputStream(new FileInputStream(path)), path.length()));
                Set<ByteBuffer> keys = new HashSet<ByteBuffer>();
                while (in.available() > 0)
                {
                    keys.add(ByteBufferUtil.readWithLength(in));
                    count++;
                }
                cacheLoader.load(keys, cfs);
            }
            catch (Exception e)
            {
                logger.debug(String.format("harmless error reading saved cache %s fully, keys loaded so far: %d", path.getAbsolutePath(), count), e);
                return count;
            }
            finally
            {
                FileUtils.closeQuietly(in);
            }
        }

        // modern format, allows both key and value (so key cache load can be purely sequential)
        path = getCachePath(cfs.table.name, cfs.columnFamily, CURRENT_VERSION);
        if (path.exists())
        {
            DataInputStream in = null;
            try
            {
                logger.info(String.format("reading saved cache %s", path));
                in = new DataInputStream(new LengthAvailableInputStream(new BufferedInputStream(new FileInputStream(path)), path.length()));
                List<Future<Pair<K, V>>> futures = new ArrayList<Future<Pair<K, V>>>();
                while (in.available() > 0)
                {
                    Future<Pair<K, V>> entry = cacheLoader.deserialize(in, cfs);
                    // Key cache entry can return null, if the SSTable doesn't exist.
                    if (entry == null)
                        continue;
                    futures.add(entry);
                    count++;
                }

                for (Future<Pair<K, V>> future : futures)
                {
                    Pair<K, V> entry = future.get();
                    put(entry.left, entry.right);
                }
            }
            catch (Exception e)
            {
                logger.debug(String.format("harmless error reading saved cache %s", path.getAbsolutePath()), e);
            }
            finally
            {
                FileUtils.closeQuietly(in);
            }
        }
        if (logger.isDebugEnabled())
            logger.debug(String.format("completed reading (%d ms; %d keys) saved cache %s",
                    System.currentTimeMillis() - start, count, path));
        return count;
    }

    public Future submitWrite(int keysToSave)
    {
        return CompactionManager.instance.submitCacheWrite(getWriter(keysToSave));
    }

    public void reduceCacheSize()
    {
        if (getCapacity() > 0)
        {
            int newCapacity = (int) (DatabaseDescriptor.getReduceCacheCapacityTo() * weightedSize());

            logger.warn(String.format("Reducing %s capacity from %d to %s to reduce memory pressure",
                                      cacheType, getCapacity(), newCapacity));

            setCapacity(newCapacity);
        }
    }

    public class Writer extends CompactionInfo.Holder
    {
        private final Set<K> keys;
        private final CompactionInfo info;
        private long keysWritten;

        protected Writer(int keysToSave)
        {
            if (keysToSave >= getKeySet().size())
                keys = getKeySet();
            else
                keys = hotKeySet(keysToSave);

            OperationType type;
            if (cacheType == CacheService.CacheType.KEY_CACHE)
                type = OperationType.KEY_CACHE_SAVE;
            else if (cacheType == CacheService.CacheType.ROW_CACHE)
                type = OperationType.ROW_CACHE_SAVE;
            else
                type = OperationType.UNKNOWN;

            info = new CompactionInfo(new CFMetaData(Table.SYSTEM_KS, cacheType.toString(), null, null, null),
                                      type,
                                      0,
                                      keys.size(),
                                      "keys");
        }

        public CacheService.CacheType cacheType()
        {
            return cacheType;
        }

        public CompactionInfo getCompactionInfo()
        {
            // keyset can change in size, thus total can too
            return info.forProgress(keysWritten, Math.max(keysWritten, keys.size()));
        }

        public void saveCache()
        {
            logger.debug("Deleting old {} files.", cacheType);
            deleteOldCacheFiles();

            if (keys.isEmpty())
            {
                logger.debug("Skipping {} save, cache is empty.", cacheType);
                return;
            }

            long start = System.currentTimeMillis();

            HashMap<Pair<String, String>, SequentialWriter> writers = new HashMap<Pair<String, String>, SequentialWriter>();

            try
            {
                for (K key : keys)
                {
                    Pair<String, String> path = key.getPathInfo();
                    SequentialWriter writer = writers.get(path);
                    if (writer == null)
                    {
                        writer = tempCacheFile(path);
                        writers.put(path, writer);
                    }

                    try
                    {
                        cacheLoader.serialize(key, writer.stream);
                    }
                    catch (IOException e)
                    {
                        throw new FSWriteError(e, writer.getPath());
                    }

                    keysWritten++;
                }
            }
            finally
            {
                for (SequentialWriter writer : writers.values())
                    FileUtils.closeQuietly(writer);
            }

            for (Map.Entry<Pair<String, String>, SequentialWriter> info : writers.entrySet())
            {
                Pair<String, String> path = info.getKey();
                SequentialWriter writer = info.getValue();

                File tmpFile = new File(writer.getPath());
                File cacheFile = getCachePath(path.left, path.right, CURRENT_VERSION);

                cacheFile.delete(); // ignore error if it didn't exist
                if (!tmpFile.renameTo(cacheFile))
                    logger.error("Unable to rename " + tmpFile + " to " + cacheFile);
            }

            logger.info(String.format("Saved %s (%d items) in %d ms", cacheType, keys.size(), System.currentTimeMillis() - start));
        }

        private SequentialWriter tempCacheFile(Pair<String, String> pathInfo)
        {
            File path = getCachePath(pathInfo.left, pathInfo.right, CURRENT_VERSION);
            File tmpFile = FileUtils.createTempFile(path.getName(), null, path.getParentFile());
            return SequentialWriter.open(tmpFile, true);
        }

        private void deleteOldCacheFiles()
        {
            File savedCachesDir = new File(DatabaseDescriptor.getSavedCachesLocation());

            if (savedCachesDir.exists() && savedCachesDir.isDirectory())
            {
                for (File file : savedCachesDir.listFiles())
                {
                    if (file.isFile() && file.getName().endsWith(cacheType.toString()))
                    {
                        if (!file.delete())
                            logger.warn("Failed to delete {}", file.getAbsolutePath());
                    }

                    if (file.isFile() && file.getName().endsWith(CURRENT_VERSION + ".db"))
                    {
                        if (!file.delete())
                            logger.warn("Failed to delete {}", file.getAbsolutePath());
                    }
                }
            }
        }
    }

    public interface CacheSerializer<K extends CacheKey, V>
    {
        void serialize(K key, DataOutput out) throws IOException;

        Future<Pair<K, V>> deserialize(DataInputStream in, ColumnFamilyStore cfs) throws IOException;

        @Deprecated
        void load(Set<ByteBuffer> buffer, ColumnFamilyStore cfs);
    }
}

2. Generating and using key cache data

Key cache entries come mainly from queries, or from loading the persisted key cache data when the system starts.

For queries that filter by rowkey, the SSTableReader is used to look up the position of that rowkey. If the key cache is not null, the lookup goes through the key cache first; otherwise it goes through the regular indexinfo lookup, and the result is then written into the key cache. There is no need to worry about masses of repeated lookups for keys that do not exist in a particular sstable: the bloom filter already screens out well over ninety percent of such cases.

public void cacheKey(DecoratedKey key, RowIndexEntry info)
    {
        CFMetaData.Caching caching = metadata.getCaching();

        if (caching == CFMetaData.Caching.NONE
            || caching == CFMetaData.Caching.ROWS_ONLY
            || keyCache == null
            || keyCache.getCapacity() == 0)
        {
            return;
        }

        KeyCacheKey cacheKey = new KeyCacheKey(descriptor, key.key);
        logger.trace("Adding cache entry for {} -> {}", cacheKey, info);
        keyCache.put(cacheKey, info);
    }

    public RowIndexEntry getCachedPosition(DecoratedKey key, boolean updateStats)
    {
        return getCachedPosition(new KeyCacheKey(descriptor, key.key), updateStats);
    }

    private RowIndexEntry getCachedPosition(KeyCacheKey unifiedKey, boolean updateStats)
    {
        if (keyCache != null && keyCache.getCapacity() > 0)
            return updateStats ? keyCache.get(unifiedKey) : keyCache.getInternal(unifiedKey);
        return null;
    }

    /**
     * Get position updating key cache and stats.
     * @see #getPosition(org.apache.cassandra.db.RowPosition, org.apache.cassandra.io.sstable.SSTableReader.Operator, boolean)
     */
    public RowIndexEntry getPosition(RowPosition key, Operator op)
    {
        return getPosition(key, op, true);
    }

    /**
     * @param key The key to apply as the rhs to the given Operator. A 'fake' key is allowed to
     * allow key selection by token bounds but only if op != EQ
     * @param op The Operator defining matching keys: the nearest key to the target matching the operator wins.
     * @param updateCacheAndStats true if updating stats and cache
     * @return The index entry corresponding to the key, or null if the key is not present
     */
    public RowIndexEntry getPosition(RowPosition key, Operator op, boolean updateCacheAndStats)
    {
        // first, check bloom filter
        if (op == Operator.EQ)
        {
            assert key instanceof DecoratedKey; // EQ only make sense if the key is a valid row key
            if (!bf.isPresent(((DecoratedKey)key).key))
            {
                logger.debug("Bloom filter allows skipping sstable {}", descriptor.generation);
                return null;
            }
        }

        // next, the key cache (only make sense for valid row key)
        if ((op == Operator.EQ || op == Operator.GE) && (key instanceof DecoratedKey))
        {
            DecoratedKey decoratedKey = (DecoratedKey)key;
            KeyCacheKey cacheKey = new KeyCacheKey(descriptor, decoratedKey.key);
            RowIndexEntry cachedPosition = getCachedPosition(cacheKey, updateCacheAndStats);
            if (cachedPosition != null)
            {
                logger.trace("Cache hit for {} -> {}", cacheKey, cachedPosition);
                Tracing.trace("Key cache hit for sstable {}", descriptor.generation);
                return cachedPosition;
            }
        }

        // next, see if the sampled index says it's impossible for the key to be present
        long sampledPosition = getIndexScanPosition(key);
        if (sampledPosition == -1)
        {
            if (op == Operator.EQ && updateCacheAndStats)
                bloomFilterTracker.addFalsePositive();
            // we matched the -1th position: if the operator might match forward, we'll start at the first
            // position. We however need to return the correct index entry for that first position.
            if (op.apply(1) >= 0)
            {
                sampledPosition = 0;
            }
            else
            {
                Tracing.trace("Index sample allows skipping sstable {}", descriptor.generation);
                return null;
            }
        }

        // scan the on-disk index, starting at the nearest sampled position.
        // The check against IndexInterval is to be exit the loop in the EQ case when the key looked for is not present
        // (bloom filter false positive). But note that for non-EQ cases, we might need to check the first key of the
        // next index position because the searched key can be greater the last key of the index interval checked if it
        // is lesser than the first key of next interval (and in that case we must return the position of the first key
        // of the next interval).
        int i = 0;
        Iterator<FileDataInput> segments = ifile.iterator(sampledPosition, INDEX_FILE_BUFFER_BYTES);
        while (segments.hasNext() && i <= DatabaseDescriptor.getIndexInterval())
        {
            FileDataInput in = segments.next();
            try
            {
                while (!in.isEOF() && i <= DatabaseDescriptor.getIndexInterval())
                {
                    i++;

                    ByteBuffer indexKey = ByteBufferUtil.readWithShortLength(in);

                    boolean opSatisfied; // did we find an appropriate position for the op requested
                    boolean exactMatch; // is the current position an exact match for the key, suitable for caching

                    // Compare raw keys if possible for performance, otherwise compare decorated keys.
                    if (op == Operator.EQ)
                    {
                        opSatisfied = exactMatch = indexKey.equals(((DecoratedKey) key).key);
                    }
                    else
                    {
                        DecoratedKey indexDecoratedKey = decodeKey(partitioner, descriptor, indexKey);
                        int comparison = indexDecoratedKey.compareTo(key);
                        int v = op.apply(comparison);
                        opSatisfied = (v == 0);
                        exactMatch = (comparison == 0);
                        if (v < 0)
                        {
                            Tracing.trace("Partition index lookup allows skipping sstable {}", descriptor.generation);
                            return null;
                        }
                    }

                    if (opSatisfied)
                    {
                        // read data position from index entry
                        RowIndexEntry indexEntry = RowIndexEntry.serializer.deserialize(in, descriptor.version);
                        if (exactMatch && updateCacheAndStats)
                        {
                            assert key instanceof DecoratedKey; // key can be == to the index key only if it's a true row key
                            DecoratedKey decoratedKey = (DecoratedKey)key;

                            if (logger.isTraceEnabled())
                            {
                                // expensive sanity check!  see CASSANDRA-4687
                                FileDataInput fdi = dfile.getSegment(indexEntry.position);
                                DecoratedKey keyInDisk = SSTableReader.decodeKey(partitioner, descriptor, ByteBufferUtil.readWithShortLength(fdi));
                                if (!keyInDisk.equals(key))
                                    throw new AssertionError(String.format("%s != %s in %s", keyInDisk, key, fdi.getPath()));
                                fdi.close();
                            }

                            // store exact match for the key
                            cacheKey(decoratedKey, indexEntry);
                        }
                        if (op == Operator.EQ && updateCacheAndStats)
                            bloomFilterTracker.addTruePositive();
                        Tracing.trace("Partition index lookup complete for sstable {}", descriptor.generation);
                        return indexEntry;
                    }

                    RowIndexEntry.serializer.skip(in, descriptor.version);
                }
            }
            catch (IOException e)
            {
                markSuspect();
                throw new CorruptSSTableException(e, in.getPath());
            }
            finally
            {
                FileUtils.closeQuietly(in);
            }
        }

        if (op == Operator.EQ && updateCacheAndStats)
            bloomFilterTracker.addFalsePositive();
        Tracing.trace("Partition index lookup complete (bloom filter false positive) for sstable {}", descriptor.generation);
        return null;
    }


3. Persisting the key cache

This mainly involves the Writer inner class of AutoSavingCache.

The key cache is not persisted in order to enlarge its capacity by pushing entries that no longer fit in memory out to disk. It is persisted so that at startup the key cache data can be loaded back quickly, which improves query performance right after the node starts.

The implementation lives in AutoSavingCache: when the key cache is created, a background task is submitted that persists it every key_cache_save_period (4 hours by default).

If the key cache holds no more than key_cache_keys_to_save entries (100), all of them are persisted; otherwise the hottest key_cache_keys_to_save entries (selected via hotKeySet, as in the Writer constructor above) are persisted.

Each time the key cache is saved, the old cache files are deleted first and new ones are written. The files live in the directory given by the saved_caches_directory option in cassandra.yaml and are named "ksName-cfName-cacheType-version.db"; the selected key cache entries are then serialized into the corresponding files.
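For example, with CURRENT_VERSION = "b" (see the AutoSavingCache source above), a hypothetical keyspace MyKeyspace with a CF named Users would end up with roughly the following file; the directory is whatever saved_caches_directory points to, and both the names and the path below are made up for illustration.

// Hypothetical example of the resulting key cache file; names and path are made up.
File savedCachesDir = new File("/var/lib/cassandra/saved_caches");                // saved_caches_directory
File keyCacheFile   = new File(savedCachesDir, "MyKeyspace-Users-KeyCache-b.db"); // ksName-cfName-cacheType-version.db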

Each CF has its own cache file, so a separate file has to be created per CF; the key cache is then traversed entry by entry, and each entry is written into the cache file of the CF it belongs to.


4. Loading the persisted key cache files

This is mainly the public int loadSaved(ColumnFamilyStore cfs) method of AutoSavingCache.

Each CF loads the persisted key cache file while its in-memory ColumnFamilyStore structure is being built. If only the unversioned cache file exists (i.e. a ksName-cfName-cacheType.db file), loading it only yields the rowkeys; the load method then takes the rowkeys one by one, runs a normal lookup to obtain the RowIndexEntry, and puts the result into the key cache collection.

If a versioned cache file exists (ksName-cfName-cacheType-version.db), the corresponding KeyCacheKey and RowIndexEntry objects are simply deserialized from it.

public void load(Set<ByteBuffer> buffers, ColumnFamilyStore cfs)
        {
            for (ByteBuffer key : buffers)
            {
                DecoratedKey dk = cfs.partitioner.decorateKey(key);

                for (SSTableReader sstable : cfs.getSSTables())
                {
                    RowIndexEntry entry = sstable.getPosition(dk, Operator.EQ);
                    if (entry != null)
                        keyCache.put(new KeyCacheKey(sstable.descriptor, key), entry);
                }
            }
        }
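For the versioned format, the read side mirrors the layout written at save time: read back the rowkey and the sstable generation, match the entry to a live sstable, and rebuild the RowIndexEntry. Below is only a hedged, simplified sketch; the real KeyCacheSerializer.deserialize returns a Future, and when the sstable no longer exists it also has to skip the serialized RowIndexEntry bytes so the stream stays aligned. The helper name readEntry is made up.

// Simplified sketch of loading one versioned key cache entry; not the exact
// KeyCacheSerializer.deserialize code (the Future wrapper, error handling and
// stream-skipping are omitted).
static void readEntry(DataInputStream in, ColumnFamilyStore cfs,
                      InstrumentingCache<KeyCacheKey, RowIndexEntry> keyCache) throws IOException
{
    ByteBuffer key = ByteBufferUtil.readWithLength(in);             // rowkey
    int generation = in.readInt();                                   // sstable generation (ssid)
    boolean promoted = in.readBoolean();                             // promoted-index flag written at save time

    SSTableReader sstable = null;
    for (SSTableReader reader : cfs.getSSTables())                   // match the entry back to a live sstable
        if (reader.descriptor.generation == generation)
            sstable = reader;
    if (sstable == null)
        return;                                                      // the sstable was compacted away since the save

    RowIndexEntry entry = promoted
                        ? RowIndexEntry.serializer.deserialize(in, sstable.descriptor.version)
                        : sstable.getPosition(cfs.partitioner.decorateKey(key), SSTableReader.Operator.EQ);
    if (entry != null)
        keyCache.put(new KeyCacheKey(sstable.descriptor, key), entry);
}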

