Dissecting The Nutch Crawler - Aside: net.nutch.util.NutchConf

 
Original English source: DissectingTheNutchCrawler
When reposting this article, please credit the source: http://blog.csdn.net/pwlazy

Aside: net.nutch.util.NutchConf

If you have been reading the code along with our discussion, you may have noticed several "private static final" variables at the start of the "command" class definitions. For example, net.nutch.db.WebDBInjector has these definitions for DEFAULT_INTERVAL and NEW_INJECTED_PAGE_SCORE:

private static final byte DEFAULT_INTERVAL =
    (byte)NutchConf.getInt("db.default.fetch.interval", 30);

private static final float NEW_INJECTED_PAGE_SCORE =
    NutchConf.getFloat("db.score.injected", 2.0f);

The values are loaded by calls to net.nutch.util.NutchConf, which is, intuitively enough, a class that loads configuration files. It has two static variables, "List resourceNames" and "Properties properties", and several static methods that manipulate them. Here's a summary of its operations:

  1. "resourceNames" is initialized with the strings "nutch-default.xml" and "nutch-site.xml".

  2. "properties" is initially null.

  3. A call to one of the "getXXX" methods results in a call to getProps(). If (properties == null), loadResource() is called once for each value in "resourceNames", in order.

  4. loadResource() loads each file, parses the XML, and sets values in "properties" according to the file's contents.
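
The four steps above can be sketched in a few lines of Java. This is only an illustration of the lazy-loading pattern, not the actual Nutch source. The class name ConfSketch, the in-memory FAKE_FILES map (standing in for real files), the <conf> root element, the property/name/value XML layout, and the sample values are all assumptions made so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical stand-in for a NutchConf-style class; not the Nutch source. */
final class ConfSketch {
    // Step 1: the resource names, kept in load order.
    private static final List<String> resourceNames =
            Arrays.asList("nutch-default.xml", "nutch-site.xml");

    // Step 2: stays null until the first getXXX call.
    private static Properties properties;

    // Assumption: in-memory "files" with an assumed XML layout, so no disk access is needed.
    static final Map<String, String> FAKE_FILES = new HashMap<>();
    static {
        FAKE_FILES.put("nutch-default.xml",
                "<conf>"
                + "<property><name>db.default.fetch.interval</name><value>30</value></property>"
                + "<property><name>db.score.injected</name><value>1.0</value></property>"
                + "</conf>");
        FAKE_FILES.put("nutch-site.xml",
                "<conf>"
                + "<property><name>db.score.injected</name><value>2.0</value></property>"
                + "</conf>");
    }

    // Step 3: every getter goes through getProps(), which loads on first use.
    private static synchronized Properties getProps() {
        if (properties == null) {
            Properties loaded = new Properties();
            for (String name : resourceNames) {
                loadResource(loaded, name);
            }
            properties = loaded;
        }
        return properties;
    }

    // Step 4: parse one resource and copy its name/value pairs into the target.
    // In this sketch a key set by a later resource overwrites an earlier one.
    private static void loadResource(Properties target, String name) {
        String xml = FAKE_FILES.get(name);
        if (xml == null) {
            return; // this sketch simply skips a missing resource
        }
        try (InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String key = p.getElementsByTagName("name").item(0)
                        .getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0)
                        .getTextContent().trim();
                target.setProperty(key, value);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Cannot load " + name, e);
        }
    }

    // Typed getters fall back to the caller's default when the key is absent.
    static int getInt(String key, int defaultValue) {
        String v = getProps().getProperty(key);
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    static float getFloat(String key, float defaultValue) {
        String v = getProps().getProperty(key);
        return v == null ? defaultValue : Float.parseFloat(v);
    }
}
```

With the sample data in this sketch, ConfSketch.getInt("db.default.fetch.interval", 99) returns 30 from the first resource, ConfSketch.getFloat("db.score.injected", 0f) returns 2.0f because the second resource is loaded last, and a key present in neither resource returns the default the caller supplied. This also shows why a "private static final" field initialized this way is fixed once the class is loaded: the value is read from the configuration a single time.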


