Let's look at its member variables:
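public static final String DIR_NAME = "fetchlist"; // directory to write to on disk
private final static byte CUR_VERSION = 2;         // current version number
private boolean fetch;                             // whether to fetch the page so it can be updated later
private Page page;                                 // the page currently being fetched
private String[] anchors;                          // links contained in the fetched page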
byte version = in.readByte();                 // read version
if (version > CUR_VERSION)                    // check version
  throw new VersionMismatchException(CUR_VERSION, version);

fetch = in.readByte() != 0;                   // read fetch flag

page = Page.read(in);                         // read page

if (version > 1) {                            // anchors added in version 2
  anchors = new String[in.readInt()];         // read anchors
  for (int i = 0; i < anchors.length; i++) {
    anchors[i] = UTF8.readString(in);
  }
} else {
  anchors = new String[0];
}
It also provides a static function that reads all the fields, then builds and returns a FetchListEntry object:
public static FetchListEntry read(DataInput in) throws IOException {
  FetchListEntry result = new FetchListEntry();
  result.readFields(in);
  return result;
}
public final void write(DataOutput out) throws IOException {
  out.writeByte(CUR_VERSION);                 // store current version
  out.writeByte((byte)(fetch ? 1 : 0));       // write fetch flag
  page.write(out);                            // write page
  out.writeInt(anchors.length);               // write anchors
  for (int i = 0; i < anchors.length; i++) {
    UTF8.writeString(out, anchors[i]);
  }
}
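The version-gated read and the matching write can be exercised end to end with a small round trip. The following is a minimal sketch, not Nutch code: the class `EntrySketch` and its helpers are hypothetical, the `Page` field is left out, a plain `IOException` replaces `VersionMismatchException`, and `writeUTF`/`readUTF` from `java.io` stand in for Nutch's `UTF8.writeString`/`UTF8.readString`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for FetchListEntry; the Page field is omitted.
class EntrySketch {
    static final byte CUR_VERSION = 2;

    boolean fetch;
    String[] anchors = new String[0];

    // Always writes the current layout: version, fetch flag, anchor count, anchors.
    void write(DataOutput out) throws IOException {
        out.writeByte(CUR_VERSION);
        out.writeByte(fetch ? 1 : 0);
        out.writeInt(anchors.length);
        for (String anchor : anchors) {
            out.writeUTF(anchor); // assumption: replaces UTF8.writeString
        }
    }

    // Reads any layout up to CUR_VERSION; version 1 records carry no anchors.
    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version > CUR_VERSION) {
            throw new IOException("version " + version + " is newer than " + CUR_VERSION);
        }
        fetch = in.readByte() != 0;
        if (version > 1) {
            anchors = new String[in.readInt()];
            for (int i = 0; i < anchors.length; i++) {
                anchors[i] = in.readUTF();
            }
        } else {
            anchors = new String[0];
        }
    }

    // Serializes an entry to a byte array.
    static byte[] toBytes(boolean fetch, String... anchors) {
        try {
            EntrySketch e = new EntrySketch();
            e.fetch = fetch;
            e.anchors = anchors;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            e.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Mirrors the static read(DataInput) factory: construct, fill, return.
    static EntrySketch fromBytes(byte[] bytes) {
        try {
            EntrySketch e = new EntrySketch();
            e.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return e;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        EntrySketch copy = fromBytes(toBytes(true, "home", "next page"));
        System.out.println(copy.fetch + " " + copy.anchors.length);  // true 2

        // A hand-built version 1 record: version byte and fetch flag only.
        EntrySketch old = fromBytes(new byte[] {1, 1});
        System.out.println(old.fetch + " " + old.anchors.length);    // true 0
    }
}
```

The writer only ever emits the current layout, while the reader accepts every layout up to `CUR_VERSION`. That asymmetry is what lets old records stay readable after a field is added.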
/*********************************************
 * A row in the Page Database.
 * <pre>
 *   type    name       description
 * ---------------------------------------------------------------
 *   byte    VERSION    - A byte indicating the version of this entry.
 *   String  URL        - The url of a page.  This is the primary key.
 *   128bit  ID         - The MD5 hash of the contents of the page.
 *   64bit   DATE       - The date this page should be refetched.
 *   byte    RETRIES    - The number of times we've failed to fetch this page.
 *   byte    INTERVAL   - Frequency, in days, this page should be refreshed.
 *   float   SCORE      - Multiplied into the score for hits on this page.
 *   float   NEXTSCORE  - Multiplied into the score for hits on this page.
 * </pre>
 *
 * @author Mike Cafarella
 * @author Doug Cutting
 *********************************************/
private final static byte CUR_VERSION = 4;

private static final byte DEFAULT_INTERVAL =
  (byte)NutchConf.get().getInt("db.default.fetch.interval", 30);

private UTF8 url;
private MD5Hash md5;
private long nextFetch = System.currentTimeMillis();
private byte retries;
private byte fetchInterval = DEFAULT_INTERVAL;
private int numOutlinks;
private float score = 1.0f;
private float nextScore = 1.0f;
public void readFields(DataInput in) throws IOException {
  byte version = in.readByte();                 // read version
  if (version > CUR_VERSION)                    // check version
    throw new VersionMismatchException(CUR_VERSION, version);

  url.readFields(in);
  md5.readFields(in);
  nextFetch = in.readLong();
  retries = in.readByte();
  fetchInterval = in.readByte();
  numOutlinks = (version > 2) ? in.readInt() : 0;   // added in Version 3
  score = (version>1) ? in.readFloat() : 1.0f;      // score added in version 2
  nextScore = (version>3) ? in.readFloat() : 1.0f;  // 2nd score added in V4
}
public void write(DataOutput out) throws IOException {
  out.writeByte(CUR_VERSION);                   // store current version
  url.write(out);
  md5.write(out);
  out.writeLong(nextFetch);
  out.write(retries);
  out.write(fetchInterval);
  out.writeInt(numOutlinks);
  out.writeFloat(score);
  out.writeFloat(nextScore);
}
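One detail in `write` is easy to misread: `out.write(retries)` calls `DataOutput.write(int)`, which emits only the low-order byte, so it pairs correctly with `readByte()` in `readFields`. The sketch below illustrates this and the fallback to defaults for older records. It is an illustration under stated assumptions, not Nutch code: the class `PageTailSketch` is hypothetical, it keeps only the fixed-size fields that follow `url` and `md5`, and its `write` takes a version argument purely so that older layouts can be simulated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical reduction of Page to the fields that follow url and md5.
class PageTailSketch {
    static final byte CUR_VERSION = 4;

    long nextFetch;
    byte retries;
    byte fetchInterval = 30;
    int numOutlinks;
    float score = 1.0f;
    float nextScore = 1.0f;

    // Writes the layout of the given version so that old records can be simulated.
    static byte[] write(PageTailSketch p, int version) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(version);
            out.writeLong(p.nextFetch);
            out.write(p.retries);        // DataOutput.write(int): a single byte
            out.write(p.fetchInterval);  // likewise a single byte
            if (version > 2) out.writeInt(p.numOutlinks);
            if (version > 1) out.writeFloat(p.score);
            if (version > 3) out.writeFloat(p.nextScore);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same gating as readFields: fields absent from old layouts get defaults.
    static PageTailSketch read(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            byte version = in.readByte();
            if (version > CUR_VERSION) {
                throw new IOException("unsupported version " + version);
            }
            PageTailSketch p = new PageTailSketch();
            p.nextFetch = in.readLong();
            p.retries = in.readByte();
            p.fetchInterval = in.readByte();
            p.numOutlinks = (version > 2) ? in.readInt() : 0;
            p.score = (version > 1) ? in.readFloat() : 1.0f;
            p.nextScore = (version > 3) ? in.readFloat() : 1.0f;
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PageTailSketch p = new PageTailSketch();
        p.retries = 3;
        p.numOutlinks = 7;
        p.score = 2.5f;
        System.out.println(write(p, 4).length);  // 23 = 1 + 8 + 1 + 1 + 4 + 4 + 4
        System.out.println(write(p, 2).length);  // 15 = 1 + 8 + 1 + 1 + 4

        PageTailSketch old = read(write(p, 2));
        System.out.println(old.numOutlinks + " " + old.score + " " + old.nextScore);  // 0 2.5 1.0
    }
}
```

Note that the fields are read in storage order, not in the order they were introduced: `numOutlinks` (version 3) sits before `score` (version 2) in the stream, and each conditional read simply skips what an older record does not contain.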
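These methods read and write structures at various levels of granularity. The code is fairly straightforward, so I won't go into further detail.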
One more class worth covering is Content:
public final class Content extends VersionedWritable
As we can see, it extends the VersionedWritable class, which implements reading and writing of the version field.
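The idea of pushing the version byte into a base class can be sketched as follows. This illustrates the pattern only and is not Nutch's `VersionedWritable` source: the names `VersionedDemo`, `Versioned`, and `Note` are hypothetical, and the exact-match check on the version byte is an assumption about how such a base class would behave.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VersionedDemo {
    // Hypothetical base class: owns the version byte so subclasses need not.
    static abstract class Versioned {
        abstract byte getVersion();

        void write(DataOutput out) throws IOException {
            out.writeByte(getVersion());
        }

        void readFields(DataInput in) throws IOException {
            byte version = in.readByte();
            if (version != getVersion()) {  // assumption: exact match required
                throw new IOException("expected version " + getVersion()
                        + ", found " + version);
            }
        }
    }

    // Hypothetical subclass: calls super first, then handles its own fields.
    static class Note extends Versioned {
        String text = "";

        @Override
        byte getVersion() {
            return 1;
        }

        @Override
        void write(DataOutput out) throws IOException {
            super.write(out);       // write version
            out.writeUTF(text);
        }

        @Override
        void readFields(DataInput in) throws IOException {
            super.readFields(in);   // check version
            text = in.readUTF();
        }
    }

    static byte[] toBytes(String text) {
        try {
            Note note = new Note();
            note.text = text;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            note.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String fromBytes(byte[] bytes) {
        try {
            Note note = new Note();
            note.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return note.text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromBytes(toBytes("hello")));  // hello
    }
}
```

With this arrangement a subclass only has to call `super.write(out)` and `super.readFields(in)` as its first statement, which is the shape the Content methods follow.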
Let's first look at the member variables:
public static final String DIR_NAME = "content";

private final static byte VERSION = 1;

private String url;
private String base;
private byte[] content;
private String contentType;
private Properties metadata;
super.readFields(in);                         // check version

url = UTF8.readString(in);                    // read url
base = UTF8.readString(in);                   // read base

content = WritableUtils.readCompressedByteArray(in);

contentType = UTF8.readString(in);            // read contentType

int propertyCount = in.readInt();             // read metadata
metadata = new Properties();
for (int i = 0; i < propertyCount; i++) {
  metadata.put(UTF8.readString(in), UTF8.readString(in));
}
public final void write(DataOutput out) throws IOException {
  super.write(out);                           // write version

  UTF8.writeString(out, url);                 // write url
  UTF8.writeString(out, base);                // write base

  WritableUtils.writeCompressedByteArray(out, content); // write content

  UTF8.writeString(out, contentType);         // write contentType

  out.writeInt(metadata.size());              // write metadata
  Iterator i = metadata.entrySet().iterator();
  while (i.hasNext()) {
    Map.Entry e = (Map.Entry)i.next();
    UTF8.writeString(out, (String)e.getKey());
    UTF8.writeString(out, (String)e.getValue());
  }
}
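Two things happen here that the other classes do not need: the page bytes are compressed, and the metadata is stored as a count followed by key/value string pairs. A minimal sketch of both follows. It is not Nutch code: the class `ContentSketch` is hypothetical, `writeUTF`/`readUTF` stand in for `UTF8.writeString`/`UTF8.readString`, and a length-prefixed GZIP buffer is an assumed stand-in for `WritableUtils.writeCompressedByteArray`, whose actual on-disk format may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ContentSketch {
    // Assumed format: compressed length as an int, then the GZIP bytes.
    static void writeCompressed(DataOutput out, byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buf)) {
            gzip.write(data);
        }
        byte[] packed = buf.toByteArray();
        out.writeInt(packed.length);
        out.write(packed);
    }

    static byte[] readCompressed(DataInput in) throws IOException {
        byte[] packed = new byte[in.readInt()];
        in.readFully(packed);
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                plain.write(chunk, 0, n);
            }
        }
        return plain.toByteArray();
    }

    // Pair count first, then each key and value as a string.
    static void writeMetadata(DataOutput out, Properties metadata) throws IOException {
        out.writeInt(metadata.size());
        for (Map.Entry<Object, Object> e : metadata.entrySet()) {
            out.writeUTF((String) e.getKey());
            out.writeUTF((String) e.getValue());
        }
    }

    static Properties readMetadata(DataInput in) throws IOException {
        int count = in.readInt();
        Properties metadata = new Properties();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            metadata.put(key, value);
        }
        return metadata;
    }

    static byte[] roundTripContent(byte[] content) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writeCompressed(new DataOutputStream(buf), content);
            return readCompressed(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Properties roundTripMetadata(Properties metadata) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writeMetadata(new DataOutputStream(buf), metadata);
            return readMetadata(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties meta = new Properties();
        meta.setProperty("Content-Type", "text/html");
        System.out.println(roundTripMetadata(meta).getProperty("Content-Type"));  // text/html
        System.out.println(roundTripContent(new byte[1000]).length);              // 1000
    }
}
```

Writing the pair count before the pairs is what lets the reader size its loop without any end-of-record marker, and the same reasoning applies to the length prefix in front of the compressed bytes.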