Heritrix 3.1.0 Source Code Analysis (Part 9)

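In Heritrix 3.1.0, the Frontier component manages the URL queues with a BDB (Berkeley DB Java Edition, "JE") database, which stores the CrawlURI objects. Let's first look at how Heritrix 3.1.0 implements its BDB module.

To create a BDB database you first have to set up a database environment. In Heritrix 3.1.0's BDB module, the EnhancedEnvironment class wraps the BDB environment (it extends JE's Environment). If you are not familiar with BDB, it is worth searching for an introduction first.

The source of EnhancedEnvironment is as follows: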

/**
 * Version of BDB_JE Environment with additional convenience features, such as
 * a shared, cached StoredClassCatalog. (Additional convenience caching of
 * Databases and StoredCollections may be added later.)
 *
 * @author gojomo
 */
public class EnhancedEnvironment extends Environment {
    StoredClassCatalog classCatalog;
    Database classCatalogDB;

    /**
     * Constructor
     *
     * @param envHome directory in which to open environment
     * @param envConfig config options
     * @throws DatabaseException
     */
    public EnhancedEnvironment(File envHome, EnvironmentConfig envConfig) throws DatabaseException {
        super(envHome, envConfig);
    }

    /**
     * Return a StoredClassCatalog backed by a Database in this environment,
     * either pre-existing or created (and cached) if necessary.
     *
     * @return the cached class catalog
     */
    public StoredClassCatalog getClassCatalog() {
        if(classCatalog == null) {
            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setAllowCreate(true);
            dbConfig.setReadOnly(this.getConfig().getReadOnly());
            try {
                classCatalogDB = openDatabase(null, "classCatalog", dbConfig);
                classCatalog = new StoredClassCatalog(classCatalogDB);
            } catch (DatabaseException e) {
                // TODO Auto-generated catch block
                throw new RuntimeException(e);
            }
        }
        return classCatalog;
    }

    @Override
    public synchronized void close() throws DatabaseException {
        if(classCatalogDB!=null) {
            classCatalogDB.close();
        }
        super.close();
    }

    /**
     * Create a temporary test environment in the given directory.
     * @param dir target directory
     * @return EnhancedEnvironment
     */
    public static EnhancedEnvironment getTestEnvironment(File dir) {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        envConfig.setTransactional(false);
        EnhancedEnvironment env;
        try {
            env = new EnhancedEnvironment(dir, envConfig);
        } catch (DatabaseException e) {
            throw new RuntimeException(e);
        }
        return env;
    }
}

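From the source we can see that, in addition to JE's Environment functionality, this class adds a StoredClassCatalog getClassCatalog() method. A StoredClassCatalog is needed when BDB stores custom (serialized) objects. On its first call the method opens (creating it if necessary) a database named "classCatalog" as classCatalogDB, builds the StoredClassCatalog on top of it and caches it; close() then closes that database before closing the environment itself.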
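The lazy-creation and close-ordering pattern above can be sketched without JE. This is a minimal illustration, not Heritrix code: LazyCatalogSketch and FakeDb are hypothetical stand-ins for EnhancedEnvironment and the classCatalog database, and marking getCatalog() synchronized is this sketch's own choice (in the excerpt only close() is synchronized).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: FakeDb stands in for the JE Database behind the class catalog.
class LazyCatalogSketch {
    static class FakeDb {
        boolean closed;
        void close() { closed = true; }
    }

    final List<String> closeLog = new ArrayList<>(); // records the order of close calls
    private FakeDb catalogDb;  // like classCatalogDB: null until first requested
    private int opens = 0;     // how many times the catalog database was "opened"

    // Like getClassCatalog(): create on first call, then return the cached instance.
    // synchronized is an assumption of this sketch, not taken from the original.
    synchronized FakeDb getCatalog() {
        if (catalogDb == null) {
            catalogDb = new FakeDb();
            opens++;
        }
        return catalogDb;
    }

    int openCount() { return opens; }

    // Like close(): close the catalog database first, then the environment itself.
    synchronized void close() {
        if (catalogDb != null) {
            catalogDb.close();
            closeLog.add("catalog");
        }
        closeLog.add("environment"); // stands in for super.close()
    }

    public static void main(String[] args) {
        LazyCatalogSketch env = new LazyCatalogSketch();
        System.out.println(env.getCatalog() == env.getCatalog()); // true
        System.out.println(env.openCount());                      // 1
        env.close();
        System.out.println(env.closeLog);                         // [catalog, environment]
    }
}
```

Closing the catalog before the environment matters because JE expects all Database handles to be closed before their Environment is closed.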

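So where are BDB databases actually created and operated on? That is the job of the BdbModule class, analyzed next (BdbModule implements a number of interfaces, which I will not go into for now).

BdbModule's source is fairly long, so I will not paste all of it; I will only quote the relevant code as the analysis goes along.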

    private static class DatabasePlusConfig implements Serializable {
        private static final long serialVersionUID = 1L;
        public transient Database database;
        public BdbConfig config;
    }

    /**
     * Configuration object for databases.  Needed because
     * {@link DatabaseConfig} is not serializable.  Also it prevents invalid
     * configurations.  (All databases opened through this module must be
     * deferred-write, because otherwise they can't sync(), and you can't
     * run a checkpoint without doing sync() first.)
     *
     * @author pjack
     *
     */
    public static class BdbConfig implements Serializable {
        private static final long serialVersionUID = 1L;

        boolean allowCreate;
        boolean sortedDuplicates;
        boolean transactional;
        boolean deferredWrite = true;

        public BdbConfig() {
        }

        public boolean isAllowCreate() {
            return allowCreate;
        }

        public void setAllowCreate(boolean allowCreate) {
            this.allowCreate = allowCreate;
        }

        public boolean getSortedDuplicates() {
            return sortedDuplicates;
        }

        public void setSortedDuplicates(boolean sortedDuplicates) {
            this.sortedDuplicates = sortedDuplicates;
        }

        public DatabaseConfig toDatabaseConfig() {
            DatabaseConfig result = new DatabaseConfig();
            result.setDeferredWrite(deferredWrite);
            result.setTransactional(transactional);
            result.setAllowCreate(allowCreate);
            result.setSortedDuplicates(sortedDuplicates);
            return result;
        }

        public boolean isTransactional() {
            return transactional;
        }

        public void setTransactional(boolean transactional) {
            this.transactional = transactional;
        }

        public void setDeferredWrite(boolean b) {
            this.deferredWrite = true;
        }
    }

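The code above defines the static nested classes DatabasePlusConfig and BdbConfig. The former is private, so it can only be instantiated inside BdbModule; the latter is public and can be created by outside code.

DatabasePlusConfig simply pairs a Database database handle (marked transient, so it is not serialized) with the BdbConfig config that was used to open it.

BdbConfig wraps the BDB database configuration. After its fields are set, its toDatabaseConfig() method builds the JE DatabaseConfig object. Note that setDeferredWrite(boolean b) ignores its argument and always sets deferredWrite to true, which matches the class comment: every database opened through this module must be deferred-write.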

    public DatabaseConfig toDatabaseConfig() {
        DatabaseConfig result = new DatabaseConfig();
        result.setDeferredWrite(deferredWrite);
        result.setTransactional(transactional);
        result.setAllowCreate(allowCreate);
        result.setSortedDuplicates(sortedDuplicates);
        return result;
    }
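Why a separate serializable config class? A small JDK-only sketch makes the point: a holder shaped like DatabasePlusConfig survives Java serialization with its config intact, while the transient live handle comes back as null and has to be reopened. TransientHandleSketch, FakeHandle, MiniBdbConfig and Holder are hypothetical names; JE's DatabaseConfig is not used here.

```java
import java.io.*;

class TransientHandleSketch {
    // Stand-in for a live, non-serializable JE Database handle.
    static class FakeHandle { }

    // Mirrors BdbConfig's fields; plain booleans serialize fine.
    static class MiniBdbConfig implements Serializable {
        private static final long serialVersionUID = 1L;
        boolean allowCreate;
        boolean sortedDuplicates;
        boolean transactional;
        boolean deferredWrite = true;

        void setDeferredWrite(boolean b) {
            this.deferredWrite = true; // argument ignored, exactly as in the excerpt
        }
    }

    // Mirrors DatabasePlusConfig: transient handle plus serializable config.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        transient FakeHandle database;
        MiniBdbConfig config;
    }

    // Serialize and deserialize an object in memory.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        h.database = new FakeHandle();
        h.config = new MiniBdbConfig();
        h.config.allowCreate = true;
        h.config.setDeferredWrite(false);

        Holder copy = (Holder) roundTrip(h);
        System.out.println(copy.database == null);     // true: handle was not serialized
        System.out.println(copy.config.allowCreate);   // true
        System.out.println(copy.config.deferredWrite); // true: the setter ignored false
    }
}
```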

The next part of BdbModule holds the BDB environment settings; these parameters are used later by the method that instantiates the environment:

    protected ConfigPath dir = new ConfigPath("bdbmodule subdirectory","state");
    public ConfigPath getDir() {
        return dir;
    }
    public void setDir(ConfigPath dir) {
        this.dir = dir;
    }

    int cachePercent = -1;
    public int getCachePercent() {
        return cachePercent;
    }
    public void setCachePercent(int cachePercent) {
        this.cachePercent = cachePercent;
    }

    boolean useSharedCache = true;
    public boolean getUseSharedCache() {
        return useSharedCache;
    }
    public void setUseSharedCache(boolean useSharedCache) {
        this.useSharedCache = useSharedCache;
    }

    /**
     * Expected number of concurrent threads; used to tune nLockTables
     * according to JE FAQ
     * http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#33
     */
    int expectedConcurrency = 64;
    public int getExpectedConcurrency() {
        return expectedConcurrency;
    }
    public void setExpectedConcurrency(int expectedConcurrency) {
        this.expectedConcurrency = expectedConcurrency;
    }

    /**
     * Whether to use hard-links to log files to collect/retain
     * the BDB log files needed for a checkpoint. Default is true.
     * May not work on Windows (especially on pre-NTFS filesystems).
     * If false, the BDB 'je.cleaner.expunge' value will be set to
     * 'false', as well, meaning BDB will *not* delete obsolete JDB
     * files, but only rename the '.DEL'. They will have to be
     * manually deleted to free disk space, but .DEL files referenced
     * in any checkpoint's 'jdbfiles.manifest' should be retained to
     * keep the checkpoint valid.
     */
    boolean useHardLinkCheckpoints = true;
    public boolean getUseHardLinkCheckpoints() {
        return useHardLinkCheckpoints;
    }
    public void setUseHardLinkCheckpoints(boolean useHardLinkCheckpoints) {
        this.useHardLinkCheckpoints = useHardLinkCheckpoints;
    }

    private transient EnhancedEnvironment bdbEnvironment;

    private transient StoredClassCatalog classCatalog;

Next, note two particularly important member variables:

    @SuppressWarnings("rawtypes")
    private Map<String,ObjectIdentityCache> oiCaches =
        new ConcurrentHashMap<String,ObjectIdentityCache>();

    private Map<String,DatabasePlusConfig> databases =
        new ConcurrentHashMap<String,DatabasePlusConfig>();

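Both are Map members (ConcurrentHashMaps keyed by database name), so you can think of them as map containers. The former holds the cache-management objects (used in the BdbFrontier module to manage the work-queue cache); the latter holds DatabasePlusConfig objects and hands out BDB Database instances to callers.

Let's look at the initialization method start() (a method of Spring's Lifecycle interface, which BdbModule implements):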

    public synchronized void start() {
        if (isRunning()) {
            return;
        }

        isRunning = true;

        try {
            boolean isRecovery = false;
            if(recoveryCheckpoint!=null) {
                isRecovery = true;
                doRecover();
            }

            setup(getDir().getFile(), !isRecovery);
        } catch (DatabaseException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
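The control flow of start() is easy to lose among the exception handling, so here is a stripped-down sketch of just that flow: start() is idempotent, recovery (if a checkpoint is configured) runs before setup, and setup is told to allow creation only on a fresh start. StartSketch, its calls list and the checkpoint name used in the example are hypothetical; the real doRecover() and setup() do file and JE work that is replaced here by log entries.

```java
import java.util.ArrayList;
import java.util.List;

class StartSketch {
    private boolean isRunning = false;
    private final String recoveryCheckpoint; // null means "fresh crawl, no checkpoint"
    final List<String> calls = new ArrayList<>(); // records what start() invoked

    StartSketch(String recoveryCheckpoint) {
        this.recoveryCheckpoint = recoveryCheckpoint;
    }

    synchronized boolean isRunning() { return isRunning; }

    synchronized void start() {
        if (isRunning()) {
            return;                 // a second start() is a no-op
        }
        isRunning = true;
        boolean isRecovery = false;
        if (recoveryCheckpoint != null) {
            isRecovery = true;
            doRecover();            // put the checkpoint's log files back first
        }
        setup(!isRecovery);         // allowCreate only when not recovering
    }

    // Stand-ins for the real methods; they only record the call.
    void doRecover() { calls.add("doRecover"); }
    void setup(boolean create) { calls.add("setup(create=" + create + ")"); }

    public static void main(String[] args) {
        StartSketch fresh = new StartSketch(null);
        fresh.start();
        fresh.start();
        System.out.println(fresh.calls);   // [setup(create=true)]
        StartSketch resumed = new StartSketch("cp-demo");
        resumed.start();
        System.out.println(resumed.calls); // [doRecover, setup(create=false)]
    }
}
```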

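doRecover() restores state from a checkpoint; setup(getDir().getFile(), !isRecovery) initializes the environment wrapper EnhancedEnvironment and the StoredClassCatalog object. Because the second argument is !isRecovery, a recovering module opens the environment with allowCreate set to false, and setup() then runs a DbBackup start/end pair to freeze the last log file so the originating checkpoint is not disturbed: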

    protected void setup(File f, boolean create)
    throws DatabaseException, IOException {
        EnvironmentConfig config = new EnvironmentConfig();
        config.setAllowCreate(create);
        config.setLockTimeout(75, TimeUnit.MINUTES); // set to max
        if(getCachePercent()>0) {
            config.setCachePercent(getCachePercent());
        }
        config.setSharedCache(getUseSharedCache());

        // we take the advice literally from...
        // http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#33
        long nLockTables = getExpectedConcurrency()-1;
        while(!BigInteger.valueOf(nLockTables).isProbablePrime(Integer.MAX_VALUE)) {
            nLockTables--;
        }
        config.setConfigParam("je.lock.nLockTables", Long.toString(nLockTables));

        // triple this value to 6K because stats show many faults
        config.setConfigParam("je.log.faultReadSize", "6144");

        if(!getUseHardLinkCheckpoints()) {
            // to support checkpoints by textual manifest only,
            // prevent BDB's cleaner from deleting log files
            config.setConfigParam("je.cleaner.expunge", "false");
        } // else leave whatever other setting was already in place

        org.archive.util.FileUtils.ensureWriteableDirectory(f);
        this.bdbEnvironment = new EnhancedEnvironment(f, config);
        this.classCatalog = this.bdbEnvironment.getClassCatalog();
        if(!create) {
            // freeze last log file -- so that originating checkpoint isn't fouled
            DbBackup dbBackup = new DbBackup(bdbEnvironment);
            dbBackup.startBackup();
            dbBackup.endBackup();
        }
    }
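One detail in setup() deserves a closer look: the je.lock.nLockTables value. The source comments cite the JE FAQ for this tuning; the loop starts at expectedConcurrency - 1 and steps down until it reaches a (probable) prime. The sketch below extracts that loop into a standalone method; LockTablesSketch is a hypothetical name. With the default expectedConcurrency of 64, 63 and 62 are composite, so the value written is 61. The loop presumes expectedConcurrency is at least 3, which the default satisfies.

```java
import java.math.BigInteger;

class LockTablesSketch {
    // Same loop as BdbModule.setup(): largest prime not exceeding expectedConcurrency - 1.
    static long nLockTables(int expectedConcurrency) {
        long n = expectedConcurrency - 1;
        while (!BigInteger.valueOf(n).isProbablePrime(Integer.MAX_VALUE)) {
            n--;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(nLockTables(64));  // 61 (63 = 7*9, 62 is even)
        System.out.println(nLockTables(100)); // 97
    }
}
```

For numbers this small, isProbablePrime with a very high certainty is effectively exact, so the loop is deterministic in practice.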

Databases are opened with openDatabase(String name, BdbConfig config, boolean usePriorData):

    /**
     * Open a Database inside this BdbModule's environment, and
     * remember it for automatic close-at-module-stop.
     *
     * @param name
     * @param config
     * @param usePriorData
     * @return
     * @throws DatabaseException
     */
    public Database openDatabase(String name, BdbConfig config, boolean usePriorData)
    throws DatabaseException {
        if (bdbEnvironment == null) {
            // proper initialization hasn't occurred
            throw new IllegalStateException("BdbModule not started");
        }
        if (databases.containsKey(name)) {
            DatabasePlusConfig dpc = databases.get(name);
            if(dpc.config == config) {
                // object-identical configs: OK to share DB
                return dpc.database;
            }
            // unshared config object: might be name collision; error
            throw new IllegalStateException("Database already exists: " +name);
        }

        DatabasePlusConfig dpc = new DatabasePlusConfig();
        if (!usePriorData) {
            try {
                bdbEnvironment.truncateDatabase(null, name, false);
            } catch (DatabaseNotFoundException e) {
                // Ignored
            }
        }
        dpc.database = bdbEnvironment.openDatabase(null, name, config.toDatabaseConfig());
        dpc.config = config;
        databases.put(name, dpc);
        return dpc.database;
    }

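When called, the method first checks whether the Map<String,DatabasePlusConfig> databases member already holds a database under that name. If it does, the existing Database is returned only when the caller passes the very same BdbConfig object (reference equality, ==); any other config object is treated as a name collision and an IllegalStateException is thrown. Otherwise the database is created: when usePriorData is false, any existing data is truncated first, and the new handle is registered in databases.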
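The name registry rule can be shown in isolation. The sketch below is hypothetical (DbRegistrySketch, Config, Handle and Entry stand in for BdbModule, BdbConfig, Database and DatabasePlusConfig) and keeps the original's reference-equality check. One deliberate difference: open() is synchronized here so the check-then-create step is atomic, whereas the excerpted openDatabase() is not synchronized and uses containsKey/get/put separately.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DbRegistrySketch {
    static class Config { }              // stands in for BdbConfig
    static class Handle {                 // stands in for a JE Database
        final String name;
        Handle(String name) { this.name = name; }
    }
    static class Entry {                  // mirrors DatabasePlusConfig
        Handle database;
        Config config;
    }

    private final Map<String, Entry> databases = new ConcurrentHashMap<>();

    synchronized Handle open(String name, Config config) {
        Entry existing = databases.get(name);
        if (existing != null) {
            if (existing.config == config) {   // same config object: share the handle
                return existing.database;
            }
            throw new IllegalStateException("Database already exists: " + name);
        }
        Entry e = new Entry();
        e.database = new Handle(name);         // real code: truncate if needed, then open
        e.config = config;
        databases.put(name, e);
        return e.database;
    }
}
```

Reference equality is strict: two BdbConfig objects with identical field values still count as different configs.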

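The next method returns a StoredQueue. The element type held by the StoredQueue is given by the Class<K> clazz parameter, and the database configuration comes from StoredQueue's own static StoredQueue.databaseConfig(). Because that method returns a new BdbConfig on every call, calling getStoredQueue() twice with the same dbname would, judging from openDatabase() above, fail with "Database already exists":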

    public <K extends Serializable> StoredQueue<K> getStoredQueue(String dbname, Class<K> clazz, boolean usePriorData) {
        try {
            Database queueDb;
            queueDb = openDatabase(dbname,
                    StoredQueue.databaseConfig(), usePriorData);
            return new StoredQueue<K>(queueDb, clazz, getClassCatalog());
        } catch (DatabaseException e) {
            throw new RuntimeException(e);
        }
    }

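When the StoredQueue is instantiated, the StoredClassCatalog passed in is used to create an EntryBinding<E>, which converts between serializable Java objects and BDB entry data (K is a serializable type, <K extends Serializable>; Heritrix also has other EntryBinding implementations, such as KryoBinding<K>). As hookupDatabase() below shows, if clazz is a type that TupleBinding.getPrimitiveBinding() supports (a primitive wrapper such as Long, or String), that binding is used; otherwise a SerialBinding<E> backed by the catalog is created.

A short digression: let's look inside the StoredQueue source. StoredQueue extends AbstractQueue<E> and implements the queue operations with the elements stored in a BDB database: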

/**
 * Queue backed by a JE Collections StoredSortedMap.
 *
 * @author gojomo
 *
 * @param <E>
 */
public class StoredQueue<E extends Serializable> extends AbstractQueue<E>  {
    @SuppressWarnings("unused")
    private static final Logger logger =
        Logger.getLogger(StoredQueue.class.getName());

    transient StoredSortedMap<Long,E> queueMap; // Long -> E
    transient Database queueDb; // Database
    AtomicLong tailIndex; // next spot for insert
    transient volatile E peekItem = null;

    /**
     * Create a StoredQueue backed by the given Database.
     *
     * The Class of values to be queued may be provided; there is only a
     * benefit when a primitive type is specified. A StoredClassCatalog
     * must be provided if a primitive type is not supplied.
     *
     * @param db
     * @param clsOrNull
     * @param classCatalog
     */
    public StoredQueue(Database db, Class<E> clsOrNull, StoredClassCatalog classCatalog) {
        hookupDatabase(db, clsOrNull, classCatalog);
        tailIndex = new AtomicLong(queueMap.isEmpty() ? 0L : queueMap.lastKey()+1);
    }

    /**
     * @param db
     * @param clsOrNull
     * @param classCatalog
     */
    public void hookupDatabase(Database db, Class<E> clsOrNull, StoredClassCatalog classCatalog) {
        EntryBinding<E> valueBinding = TupleBinding.getPrimitiveBinding(clsOrNull);
        if(valueBinding == null) {
            valueBinding = new SerialBinding<E>(classCatalog, clsOrNull);
        }
        queueDb = db;
        queueMap = new StoredSortedMap<Long,E>(
                db,
                TupleBinding.getPrimitiveBinding(Long.class),
                valueBinding,
                true);
    }

    @Override
    public Iterator<E> iterator() {
        return queueMap.values().iterator();
    }

    @Override
    public int size() {
        try {
            return Math.max(0,
                    (int)(tailIndex.get()
                          - queueMap.firstKey()));
        } catch (IllegalStateException ise) {
            return 0;
        } catch (NoSuchElementException nse) {
            return 0;
        } catch (NullPointerException npe) {
            return 0;
        }
    }

    @Override
    public boolean isEmpty() {
        if(peekItem!=null) {
            return false;
        }
        try {
            return queueMap.isEmpty();
        } catch (IllegalStateException de) {
            return true;
        }
    }

    public boolean offer(E o) {
        long targetIndex = tailIndex.getAndIncrement();
        queueMap.put(targetIndex, o);
        return true;
    }

    public synchronized E peek() {
        if(peekItem == null) {
            if(queueMap.isEmpty()) {
                return null;
            }
            peekItem = queueMap.remove(queueMap.firstKey());
        }
        return peekItem;
    }

    public synchronized E poll() {
        E head = peek();
        peekItem = null;
        return head;
    }

    /**
     * A suitable DatabaseConfig for the Database backing a StoredQueue.
     * (However, it is not necessary to use these config options.)
     *
     * @return DatabaseConfig suitable for queue
     */
    public static BdbModule.BdbConfig databaseConfig() {
        BdbModule.BdbConfig dbConfig = new BdbModule.BdbConfig();
        dbConfig.setTransactional(false);
        dbConfig.setAllowCreate(true);
        return dbConfig;
    }

    public void close() {
        try {
            queueDb.sync();
            queueDb.close();
        } catch (DatabaseException e) {
            throw new RuntimeException(e);
        }
    }
}

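JE's collections API provides StoredSortedMap<Long,E> as a sorted-map view over the data in a BDB database. So we can think of a StoredQueue as a queue whose data lives in a BDB database behind the StoredSortedMap wrapper: keys are consecutive Long indexes, offer() stores each element under tailIndex.getAndIncrement(), and peek() removes the entry with the smallest key from the database and keeps it in peekItem until poll() hands it out. On construction, tailIndex resumes from lastKey()+1, so a reopened database keeps appending after its existing entries.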
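To make the indexing scheme concrete, here is a minimal in-memory sketch of the same algorithm. It is a stand-in, not Heritrix code: the hypothetical IndexedMapQueue uses a JDK ConcurrentSkipListMap where StoredQueue uses a BDB-backed StoredSortedMap, so nothing is persisted. It keeps one quirk of the original as far as I can read it: once peek() has moved the head into peekItem, size() (tailIndex minus the first remaining key) no longer counts that element, although isEmpty() still does.

```java
import java.util.AbstractQueue;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

class IndexedMapQueue<E> extends AbstractQueue<E> {
    private final ConcurrentNavigableMap<Long, E> queueMap; // index -> element
    private final AtomicLong tailIndex;                     // next key for offer()
    private volatile E peekItem = null;                      // head already removed from the map

    // Like the StoredQueue constructor: resume after the highest existing key.
    IndexedMapQueue(ConcurrentNavigableMap<Long, E> existing) {
        this.queueMap = existing;
        this.tailIndex = new AtomicLong(existing.isEmpty() ? 0L : existing.lastKey() + 1);
    }

    IndexedMapQueue() { this(new ConcurrentSkipListMap<Long, E>()); }

    @Override
    public Iterator<E> iterator() { return queueMap.values().iterator(); }

    @Override
    public int size() {
        try {
            return Math.max(0, (int) (tailIndex.get() - queueMap.firstKey()));
        } catch (NoSuchElementException e) {
            return 0; // map is empty
        }
    }

    @Override
    public boolean isEmpty() {
        return peekItem == null && queueMap.isEmpty();
    }

    @Override
    public boolean offer(E o) {
        queueMap.put(tailIndex.getAndIncrement(), o);
        return true;
    }

    @Override
    public synchronized E peek() {
        if (peekItem == null) {
            if (queueMap.isEmpty()) {
                return null;
            }
            peekItem = queueMap.remove(queueMap.firstKey()); // take the smallest index
        }
        return peekItem;
    }

    @Override
    public synchronized E poll() {
        E head = peek();
        peekItem = null;
        return head;
    }
}
```

Since removals always take the smallest key and inserts always use a fresh larger one, the keys stay contiguous, which is what makes the subtraction in size() valid.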

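The next part handles cache management: caching of objects that implement the IdentityCacheable interface. BdbWorkQueue, for example, implements it indirectly, which is how the work-queue objects are cached. The cache behind ObjectIdentityBdbManualCache is itself stored in a BDB database.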

    /**
     * Get an ObjectIdentityBdbCache, backed by a BDB Database of the
     * given name, with the given value class type. If 'recycle' is true,
     * reuse values already in the database; otherwise start with an
     * empty cache.
     *
     * @param <V>
     * @param dbName
     * @param recycle
     * @param valueClass
     * @return
     * @throws DatabaseException
     */
    public <V extends IdentityCacheable> ObjectIdentityBdbManualCache<V> getOIBCCache(String dbName, boolean recycle,
            Class<? extends V> valueClass)
    throws DatabaseException {
        if (!recycle) {
            try {
                bdbEnvironment.truncateDatabase(null, dbName, false);
            } catch (DatabaseNotFoundException e) {
                // ignored
            }
        }
        ObjectIdentityBdbManualCache<V> oic = new ObjectIdentityBdbManualCache<V>();
        oic.initialize(bdbEnvironment, dbName, valueClass, classCatalog);
        oiCaches.put(dbName, oic);
        return oic;
    }

    public <V extends IdentityCacheable> ObjectIdentityCache<V> getObjectCache(String dbName, boolean recycle,
            Class<V> valueClass)
    throws DatabaseException {
        return getObjectCache(dbName, recycle, valueClass, valueClass);
    }

    /**
     * Get an ObjectIdentityCache, backed by a BDB Database of the given
     * name, with objects of the given valueClass type. If 'recycle' is
     * true, reuse values already in the database; otherwise start with
     * an empty cache.
     *
     * @param <V>
     * @param dbName
     * @param recycle
     * @param valueClass
     * @return
     * @throws DatabaseException
     */
    public <V extends IdentityCacheable> ObjectIdentityCache<V> getObjectCache(String dbName, boolean recycle,
            Class<V> declaredClass, Class<? extends V> valueClass)
    throws DatabaseException {
        @SuppressWarnings("unchecked")
        ObjectIdentityCache<V> oic = oiCaches.get(dbName);
        if(oic!=null) {
            return oic;
        }
        oic =  getOIBCCache(dbName, recycle, valueClass);
        return oic;
    }
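getObjectCache() is a get-or-create lookup keyed by database name, and doCheckpoint() later syncs every cache registered in oiCaches. The sketch below models both steps with hypothetical names (CacheRegistrySketch, Cache). It uses ConcurrentHashMap.computeIfAbsent so the check-and-create is atomic; that is this sketch's choice, since the excerpt does get() and then put() inside getOIBCCache(). As in the original, once a cache exists for a name, a later call's recycle flag has no effect.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class CacheRegistrySketch {
    // Stand-in for ObjectIdentityBdbManualCache; only tracks what the example checks.
    static class Cache {
        final String dbName;
        final boolean recycled;
        int syncs = 0;
        Cache(String dbName, boolean recycled) {
            this.dbName = dbName;
            this.recycled = recycled;
        }
        synchronized void sync() { syncs++; }
    }

    private final Map<String, Cache> oiCaches = new ConcurrentHashMap<>();
    final AtomicInteger created = new AtomicInteger(); // how many caches were built

    // Get-or-create by database name.
    Cache getObjectCache(String dbName, boolean recycle) {
        return oiCaches.computeIfAbsent(dbName, name -> {
            created.incrementAndGet();
            // In BdbModule this is where the database is truncated when
            // recycle == false and the cache is initialized on top of it.
            return new Cache(name, recycle);
        });
    }

    // Like the first loop of doCheckpoint(): flush every registered cache.
    void syncAll() {
        for (Cache c : oiCaches.values()) {
            c.sync();
        }
    }
}
```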

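Further on come checkpointing and recovery from a checkpoint. doCheckpoint() first syncs every cache in oiCaches and every database in databases, forces a JE checkpoint, then uses DbBackup to list the log files in the backup set; each file is written to jdbfiles.manifest as a "filename,length" line and, when useHardLinkCheckpoints is true, hard-linked into the checkpoint directory. doRecover() reverses the process: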

    public void doCheckpoint(Checkpoint checkpointInProgress) throws IOException {
        // First sync objectCaches
        for (@SuppressWarnings("rawtypes") ObjectIdentityCache oic : oiCaches.values()) {
            oic.sync();
        }

        try {
            // sync all databases
            for (DatabasePlusConfig dbc: databases.values()) {
                dbc.database.sync();
            }

            // Do a force checkpoint.  Thats what a sync does (i.e. doSync).
            CheckpointConfig chkptConfig = new CheckpointConfig();
            chkptConfig.setForce(true);

            // Mark Hayes of sleepycat says:
            // "The default for this property is false, which gives the current
            // behavior (allow deltas).  If this property is true, deltas are
            // prohibited -- full versions of internal nodes are always logged
            // during the checkpoint. When a full version of an internal node
            // is logged during a checkpoint, recovery does not need to process
            // it at all.  It is only fetched if needed by the application,
            // during normal DB operations after recovery. When a delta of an
            // internal node is logged during a checkpoint, recovery must
            // process it by fetching the full version of the node from earlier
            // in the log, and then applying the delta to it.  This can be
            // pretty slow, since it is potentially a large amount of
            // random I/O."
            // chkptConfig.setMinimizeRecoveryTime(true);
            bdbEnvironment.checkpoint(chkptConfig);
            LOGGER.fine("Finished bdb checkpoint.");

            DbBackup dbBackup = new DbBackup(bdbEnvironment);
            try {
                dbBackup.startBackup();

                File envCpDir = new File(dir.getFile(),checkpointInProgress.getName());
                org.archive.util.FileUtils.ensureWriteableDirectory(envCpDir);
                File logfilesList = new File(envCpDir,"jdbfiles.manifest");
                String[] filedata = dbBackup.getLogFilesInBackupSet();
                for (int i=0; i<filedata.length;i++) {
                    File f = new File(dir.getFile(),filedata[i]);
                    filedata[i] += ","+f.length();
                    if(getUseHardLinkCheckpoints()) {
                        File hardLink = new File(envCpDir,filedata[i]);
                        if (!FilesystemLinkMaker.makeHardLink(f.getAbsolutePath(), hardLink.getAbsolutePath())) {
                            LOGGER.log(Level.SEVERE, "unable to create required checkpoint link "+hardLink);
                        }
                    }
                }
                FileUtils.writeLines(logfilesList,Arrays.asList(filedata));
                LOGGER.fine("Finished processing bdb log files.");
            } finally {
                dbBackup.endBackup();
            }
        } catch (DatabaseException e) {
            throw new IOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    protected void doRecover() throws IOException {
        File cpDir = new File(dir.getFile(),recoveryCheckpoint.getName());
        File logfilesList = new File(cpDir,"jdbfiles.manifest");
        List<String> filesAndLengths = FileUtils.readLines(logfilesList);
        HashMap<String,Long> retainLogfiles = new HashMap<String,Long>();
        for(String line : filesAndLengths) {
            String[] fileAndLength = line.split(",");
            long expectedLength = Long.valueOf(fileAndLength[1]);
            retainLogfiles.put(fileAndLength[0],expectedLength);

            // check for files in checkpoint directory; relink to environment as necessary
            File cpFile = new File(cpDir, line);
            File destFile = new File(dir.getFile(), fileAndLength[0]);
            if(cpFile.exists()) {
                if(cpFile.length()!=expectedLength) {
                    LOGGER.warning(cpFile.getName()+" expected "+expectedLength+" actual "+cpFile.length());
                    // TODO: is truncation necessary?
                }
                if(destFile.exists()) {
                    if(!destFile.delete()) {
                        LOGGER.log(Level.SEVERE, "unable to delete obstructing file "+destFile);
                    }
                }
                int status = CLibrary.INSTANCE.link(cpFile.getAbsolutePath(), destFile.getAbsolutePath());
                if (status!=0) {
                    LOGGER.log(Level.SEVERE, "unable to create required restore link "+destFile);
                }
            }
        }

        IOFileFilter filter = FileFilterUtils.orFileFilter(
                FileFilterUtils.suffixFileFilter(".jdb"),
                FileFilterUtils.suffixFileFilter(".del"));
        filter = FileFilterUtils.makeFileOnly(filter);

        // reverify environment directory is as it was at checkpoint time,
        // deleting any extra files
        for(File f : dir.getFile().listFiles((FileFilter)filter)) {
            if(retainLogfiles.containsKey(f.getName())) {
                // named file still exists under original name
                long expectedLength = retainLogfiles.get(f.getName());
                if(f.length()!=expectedLength) {
                    LOGGER.warning(f.getName()+" expected "+expectedLength+" actual "+f.length());
                    // TODO: truncate? this unexpected length mismatch
                    // probably only happens if there was already a recovery
                    // where the affected file was the last of the set, in
                    // which case BDB appends a small amount of (harmless?) data
                    // to the previously-undersized file
                }
                retainLogfiles.remove(f.getName());
                continue;
            }
            // file as now-named not in restore set; check if un-".DEL" renaming needed
            String undelName = f.getName().replace(".del", ".jdb");
            if(retainLogfiles.containsKey(undelName)) {
                // file if renamed matches desired file name
                long expectedLength = retainLogfiles.get(undelName);
                if(f.length()!=expectedLength) {
                    LOGGER.warning(f.getName()+" expected "+expectedLength+" actual "+f.length());
                    // TODO: truncate to expected size?
                }
                if(!f.renameTo(new File(f.getParentFile(),undelName))) {
                    throw new IOException("Unable to rename " + f + " to " +
                            undelName);
                }
                retainLogfiles.remove(undelName);
            }
            // file not needed; delete/move-aside
            if(!f.delete()) {
                LOGGER.warning("unable to delete "+f);
                org.archive.util.FileUtils.moveAsideIfExists(f);
            }
            // TODO: log/warn of ruined later checkpoints?
        }
        if(retainLogfiles.size()>0) {
            // some needed files weren't present
            LOGGER.severe("Checkpoint corrupt, needed log files missing: "+retainLogfiles);
        }
    }
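Two details are easy to miss here. First, doCheckpoint() appends ",length" to filedata[i] before creating the hard link, so the link inside the checkpoint directory is named like "name,length"; that is why doRecover() looks it up with new File(cpDir, line) using the whole manifest line. Second, after a successful ".del" to ".jdb" rename the excerpt has no continue, so it appears to fall through to f.delete() on the old, now nonexistent path, which returns false and logs a spurious "unable to delete" warning. The sketch below models only the manifest format and the per-file decision logic as pure functions, with no file I/O; CheckpointManifestSketch, Action and plan() are hypothetical names, and plan() uses continue after the rename decision.

```java
import java.util.*;

class CheckpointManifestSketch {
    // One manifest line, as built by doCheckpoint(): "<logfile>,<length>".
    static String manifestLine(String logFile, long length) {
        return logFile + "," + length;
    }

    // Parse manifest lines into name -> expected length, as doRecover() does.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> retain = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            retain.put(parts[0], Long.valueOf(parts[1]));
        }
        return retain;
    }

    enum Action { KEEP, RENAME_TO_JDB, DELETE }

    // Decide what recovery does with each *.jdb / *.del file in the environment
    // directory; names the manifest needs but that were not found go to missingOut.
    static Map<String, Action> plan(Collection<String> filesInDir,
                                    Map<String, Long> manifest,
                                    Set<String> missingOut) {
        Set<String> remaining = new HashSet<>(manifest.keySet());
        Map<String, Action> plan = new TreeMap<>();
        for (String name : filesInDir) {
            if (remaining.remove(name)) {           // present under its original name
                plan.put(name, Action.KEEP);
                continue;
            }
            String undelName = name.replace(".del", ".jdb");
            if (remaining.remove(undelName)) {      // cleaner renamed it; rename it back
                plan.put(name, Action.RENAME_TO_JDB);
                continue;
            }
            plan.put(name, Action.DELETE);          // not part of the checkpoint
        }
        missingOut.addAll(remaining);               // the original logs these as corrupt
        return plan;
    }
}
```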

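Finally, there is getStoredMap(String dbName, Class<K> keyClass, Class<V> valueClass, boolean allowDuplicates, boolean usePriorData), which creates a temporary DisposableStoredSortedMap<K,V> (a subclass of JE's StoredSortedMap; think of it as a temporary map container stored in a BDB database behind the StoredSortedMap wrapper). The keyClass and valueClass parameters give the key and value types. If dbName is null, a name of the form "tempMap-<identityHashCode>-<sn>" is synthesized, and the anonymous subclass overrides dispose() so that disposing the map also removes its entry from the databases map: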

    /**
     * Creates a database-backed TempStoredSortedMap for transient
     * reporting requirements. Calling the returned map's destroy()
     * method when done discards the associated Database.
     *
     * @param <K>
     * @param <V>
     * @param dbName Database name to use; if null a name will be synthesized
     * @param keyClass Class of keys; should be a Java primitive type
     * @param valueClass Class of values; may be any serializable type
     * @param allowDuplicates whether duplicate keys allowed
     * @return
     */
    public <K,V> DisposableStoredSortedMap<K, V> getStoredMap(String dbName, Class<K> keyClass, Class<V> valueClass, boolean allowDuplicates, boolean usePriorData) {
        BdbConfig config = new BdbConfig();
        config.setSortedDuplicates(allowDuplicates);
        config.setAllowCreate(!usePriorData);
        Database mapDb;
        if(dbName==null) {
            dbName = "tempMap-"+System.identityHashCode(this)+"-"+sn;
            sn++;
        }
        final String openName = dbName;
        try {
            mapDb = openDatabase(openName,config,usePriorData);
        } catch (DatabaseException e) {
            throw new RuntimeException(e);
        }
        EntryBinding<V> valueBinding = TupleBinding.getPrimitiveBinding(valueClass);
        if(valueBinding == null) {
            valueBinding = new SerialBinding<V>(classCatalog, valueClass);
        }
        DisposableStoredSortedMap<K,V> storedMap = new DisposableStoredSortedMap<K, V>(
                mapDb,
                TupleBinding.getPrimitiveBinding(keyClass),
                valueBinding,
                true) {
                    @Override
                    public void dispose() {
                        super.dispose();
                        DatabasePlusConfig dpc = BdbModule.this.databases.remove(openName);
                        if (dpc == null) {
                            BdbModule.LOGGER.log(Level.WARNING,"No such database: " + openName);
                        }
                    }
        };
        return storedMap;
    }
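The naming and cleanup contract of getStoredMap() can be sketched in memory. Everything here is hypothetical: TempMapSketch and DisposableMap stand in for BdbModule and DisposableStoredSortedMap, a TreeMap replaces the BDB-backed map, and the serial counter is an AtomicLong because the declaration of sn is not shown in the excerpt. Registering the name with putIfAbsent is also a simplification of openDatabase()'s config-identity rule.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class TempMapSketch {
    final Map<String, Object> databases = new ConcurrentHashMap<>(); // name -> registration
    private final AtomicLong sn = new AtomicLong();                   // serial for synthesized names

    // In-memory stand-in for DisposableStoredSortedMap.
    class DisposableMap<K, V> extends TreeMap<K, V> {
        final String dbName;
        DisposableMap(String dbName) { this.dbName = dbName; }

        void dispose() {
            clear(); // stands in for super.dispose(), which discards the Database
            if (databases.remove(dbName) == null) {
                System.err.println("No such database: " + dbName);
            }
        }
    }

    <K, V> DisposableMap<K, V> getStoredMap(String dbName) {
        if (dbName == null) {
            dbName = "tempMap-" + System.identityHashCode(this) + "-" + sn.getAndIncrement();
        }
        if (databases.putIfAbsent(dbName, new Object()) != null) {
            throw new IllegalStateException("Database already exists: " + dbName);
        }
        return new DisposableMap<>(dbName);
    }
}
```

Dropping the registry entry in dispose() matters: without it, a later openDatabase() call with the same name would find the stale entry and throw.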

    

This analysis still leaves many open questions; we will continue in later posts.

---------------------------------------------------------------------------

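This Heritrix 3.1.0 source code analysis series is the author's original work.

Please credit the source when reposting: 博客园 (cnblogs), 刺猬的温驯

Article link: http://www.cnblogs.com/chenying99/archive/2013/04/14/3019757.html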
