Solr 5.3.1 Flow Notes (2)
1. Main actions (endpoints)
/$[new_core]/update/extract, /$[new_core]/update/
where $[new_core] is the name of the actual core.
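For illustration, here is a small helper that builds these two URLs for a given core. It assumes Solr listens on http://localhost:8983 under the default /solr context; the class and method names are hypothetical. `literal.id` (set a field value directly on the extracted document) and `commit=true` are standard request parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the update endpoints of one core.
// Assumption: Solr runs at http://localhost:8983 with the default /solr context.
class SolrEndpoints {
    static final String BASE = "http://localhost:8983/solr";

    // /solr/<core>/update/extract, with the document id passed as a literal field
    static String extractUrl(String core, String docId) {
        return BASE + "/" + enc(core) + "/update/extract"
                + "?literal.id=" + enc(docId) + "&commit=true";
    }

    // /solr/<core>/update, the plain update handler
    static String updateUrl(String core) {
        return BASE + "/" + enc(core) + "/update";
    }

    // Form-encodes a value (spaces become '+'), which is fine for query parameters.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/new_core/update/extract?literal.id=doc+1&commit=true
        System.out.println(extractUrl("new_core", "doc 1"));
    }
}
```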
2. The corresponding handler definition in conf/solrconfig.xml:
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="lowernames">true</str>
<str name="fmap.meta">ignored_</str>
<str name="fmap.content">_text_</str>
</lst>
</requestHandler>
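The three defaults above rename fields produced by extraction: `lowernames=true` lowercases field names (replacing other characters with underscores), `fmap.content=_text_` sends the extracted body text to `_text_`, and `fmap.meta=ignored_` sends the `meta` field to `ignored_`. The sketch below is a simplified, standalone model of that renaming, not Solr's code; the class name `FieldMapper`, the exact regex, and applying lowernames before fmap are assumptions.

```java
import java.util.Locale;
import java.util.Map;

// Simplified model (hypothetical, not Solr's implementation) of the
// lowernames + fmap.* defaults configured for /update/extract.
class FieldMapper {
    private final boolean lowernames;
    private final Map<String, String> fmap; // source field name -> target field name

    FieldMapper(boolean lowernames, Map<String, String> fmap) {
        this.lowernames = lowernames;
        this.fmap = fmap;
    }

    String map(String name) {
        if (lowernames) {
            // Assumed normalization: lowercase, runs of non-word characters -> '_'
            name = name.toLowerCase(Locale.ROOT).replaceAll("\\W+", "_");
        }
        // fmap.<source>=<target>; unmapped names pass through unchanged
        return fmap.getOrDefault(name, name);
    }

    public static void main(String[] args) {
        FieldMapper m = new FieldMapper(true, Map.of("meta", "ignored_", "content", "_text_"));
        System.out.println(m.map("Content-Type")); // content_type
        System.out.println(m.map("content"));      // _text_
    }
}
```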
Source file location:
solr/contrib/extraction/src/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java
3. Main entry point: SolrDispatchFilter (a servlet Filter rather than a Servlet)
abstract interface javax.servlet.Filter ...
public abstract void init(FilterConfig arg0) throws ServletException;
public abstract void doFilter(ServletRequest arg0, ServletResponse arg1, FilterChain arg2) throws...
public abstract void destroy();
abstract class BaseSolrFilter implements Filter ...
Contains only a static initializer block (related to logging configuration)
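Because the logging setup sits in a static block of the base class, it runs exactly once, when the JVM first initializes the class (for example, when the first subclass instance is created), before the filter's own `init` does any work. A minimal sketch of that Java mechanism, with hypothetical class names and a counter in place of the real logging setup:

```java
// Hypothetical stand-in for BaseSolrFilter: a base class whose only content
// is a static initializer block.
abstract class BaseFilterSketch {
    static int initCount = 0; // records how often the static block has run

    static {
        // The real BaseSolrFilter does logging-related setup here;
        // this sketch only counts executions.
        initCount++;
    }
}

// Hypothetical stand-in for SolrDispatchFilter.
class DispatchFilterSketch extends BaseFilterSketch {
    public static void main(String[] args) {
        new DispatchFilterSketch();
        new DispatchFilterSketch();
        // The static block ran once, not once per instance.
        System.out.println(BaseFilterSketch.initCount); // 1
    }
}
```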
SolrDispatchFilter extends BaseSolrFilter ...
public void init(FilterConfig config) ...
|-> o.a.s.c.CoreContainer
cores = new CoreContainer(nodeConfig, extraProperties, true);
cores.load();
|-> org.apache.solr.handler.component.HttpShardHandlerFactory
|-> org.apache.solr.update.UpdateShardHandler
|-> zkSys.initZooKeeper(this, solrHome, cfg.getCloudConfig());
|-> o.a.s.c.CoresLocator
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) ...
|-> HttpSolrCall call = getHttpSolrCall(...
|-> Action result = call.call();
|-> private void init() throws...
|-> handler = cores.getRequestHandler(path);
public void destroy() {...
|-> cores.shutdown();
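The filter's lifecycle can be summarized as: `init` builds and loads the core container, `doFilter` resolves a handler from the request path and runs it, and `destroy` shuts the container down. The standalone model below shows only that shape. `MiniContainer` and `MiniDispatchFilter` are hypothetical stand-ins; the real classes also set up ZooKeeper, shard handlers and core discovery, and resolve the core name and the in-core path separately, whereas this sketch keys handlers by the full path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for CoreContainer: maps request paths to handlers.
class MiniContainer {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();
    private boolean loaded;

    void load() { // cf. cores.load()
        handlers.put("/new_core/update", body -> "indexed: " + body);
        handlers.put("/new_core/update/extract", body -> "extracted: " + body);
        loaded = true;
    }

    Function<String, String> getRequestHandler(String path) { // cf. cores.getRequestHandler(path)
        return loaded ? handlers.get(path) : null;
    }

    void shutdown() { // cf. cores.shutdown()
        handlers.clear();
        loaded = false;
    }
}

// Hypothetical stand-in for SolrDispatchFilter's init/doFilter/destroy.
class MiniDispatchFilter {
    private MiniContainer cores;

    void init() {
        cores = new MiniContainer();
        cores.load();
    }

    String doFilter(String path, String body) {
        Function<String, String> handler = cores.getRequestHandler(path);
        if (handler == null) {
            return "no handler for " + path; // simplified; the real filter has other fallbacks
        }
        return handler.apply(body);
    }

    void destroy() {
        cores.shutdown();
    }

    public static void main(String[] args) {
        MiniDispatchFilter f = new MiniDispatchFilter();
        f.init();
        System.out.println(f.doFilter("/new_core/update/extract", "report.pdf")); // extracted: report.pdf
        f.destroy();
    }
}
```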
DirectUpdateHandler2
SolrCoreState
CachingDirectoryFactory
NRTCachingDirectoryFactory
ContextHandler
4. Class hierarchy
SolrInfoMBean interface
SolrRequestHandler extends SolrInfoMBean
RequestHandlerBase implements SolrRequestHandler, SolrInfoMBean, NestedRequestHandler
ContentStreamHandlerBase extends RequestHandlerBase
ExtractingRequestHandler extends ContentStreamHandlerBase implements SolrCoreAware
Main methods:
interface SolrRequestHandler
public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp);
abstract class RequestHandlerBase implements SolrRequestHandler...
public abstract void handleRequestBody( SolrQueryRequest req, SolrQueryResponse rsp ) throws Exception;
public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {...
|-> handleRequestBody( req, rsp );
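`handleRequest` / `handleRequestBody` is a template method: the base class owns the public entry point and common bookkeeping, and each concrete handler implements only the body. A standalone sketch of the pattern follows. All names are hypothetical, and the request counting and error capture only loosely mirror what RequestHandlerBase does (it keeps request statistics and records exceptions on the response).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SolrRequestHandler.
interface SketchRequestHandler {
    void handleRequest(String req, List<String> rsp);
}

// Hypothetical stand-in for RequestHandlerBase.
abstract class SketchRequestHandlerBase implements SketchRequestHandler {
    int numRequests = 0;
    int numErrors = 0;

    // Subclasses implement only the request-specific part.
    abstract void handleRequestBody(String req, List<String> rsp) throws Exception;

    @Override
    public final void handleRequest(String req, List<String> rsp) {
        numRequests++;
        try {
            handleRequestBody(req, rsp);
        } catch (Exception e) {
            // Failures are recorded on the response instead of propagating.
            numErrors++;
            rsp.add("error: " + e.getMessage());
        }
    }
}

// Hypothetical concrete handler.
class EchoHandler extends SketchRequestHandlerBase {
    @Override
    void handleRequestBody(String req, List<String> rsp) throws Exception {
        if (req.isEmpty()) {
            throw new Exception("empty request");
        }
        rsp.add("echo: " + req);
    }

    public static void main(String[] args) {
        EchoHandler h = new EchoHandler();
        List<String> rsp = new ArrayList<>();
        h.handleRequest("hello", rsp);
        h.handleRequest("", rsp);
        System.out.println(rsp);         // [echo: hello, error: empty request]
        System.out.println(h.numErrors); // 1
    }
}
```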
5. Data-handling methods
abstract class ContentStreamHandlerBase extends RequestHandlerBase ...
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {...
|-> ContentStreamLoader documentLoader = newLoader(req, processor);
|-> documentLoader.load(req, rsp, stream, processor);
ExtractingRequestHandler extends ContentStreamHandlerBase ...
protected ContentStreamLoader newLoader(SolrQueryRequest req, UpdateRequestProcessor processor) {...
|-> new ExtractingDocumentLoader(req, processor, config, factory);
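As shown above, the overridden factory method (newLoader in ExtractingRequestHandler) creates the matching document loader class, ExtractingDocumentLoader, and that class's load method does the actual processing of the incoming content.
Next, let's look at what this document loader actually does.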
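This is a factory method inside a fixed algorithm: ContentStreamHandlerBase's `handleRequestBody` asks the subclass for a loader via `newLoader`, then feeds each content stream to that loader's `load`. A dependency-free sketch of that shape, with hypothetical names and plain strings standing in for requests, streams and the update processor:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ContentStreamLoader.
abstract class SketchLoader {
    abstract void load(String stream, List<String> processor) throws Exception;
}

// Hypothetical stand-in for ContentStreamHandlerBase: the request flow is fixed,
// only the loader returned by newLoader() varies between subclasses.
abstract class SketchContentStreamHandlerBase {
    int loadersCreated = 0;

    protected abstract SketchLoader newLoader();

    void handleRequestBody(List<String> streams, List<String> processor) throws Exception {
        SketchLoader documentLoader = newLoader(); // one loader per request in this sketch
        loadersCreated++;
        for (String stream : streams) {
            documentLoader.load(stream, processor);
        }
    }
}

// Hypothetical stand-in for ExtractingRequestHandler returning an ExtractingDocumentLoader.
class SketchExtractingHandler extends SketchContentStreamHandlerBase {
    @Override
    protected SketchLoader newLoader() {
        return new SketchLoader() {
            @Override
            void load(String stream, List<String> processor) {
                // The real loader parses the stream with Tika; here we just trim and record it.
                processor.add("doc(" + stream.trim() + ")");
            }
        };
    }

    public static void main(String[] args) throws Exception {
        List<String> processed = new ArrayList<>();
        new SketchExtractingHandler().handleRequestBody(List.of(" a ", "b"), processed);
        System.out.println(processed); // [doc(a), doc(b)]
    }
}
```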
abstract class ContentStreamLoader ...
public abstract void load(SolrQueryRequest req,
SolrQueryResponse rsp,
ContentStream stream,
UpdateRequestProcessor processor) throws Exception;
ExtractingDocumentLoader extends ContentStreamLoader ...
public void load(SolrQueryRequest req, SolrQueryResponse rsp, ...
The main calls it makes are (to be analyzed in detail later):
1)
SolrContentHandler handler = factory.createSolrContentHandler(metadata, params, req.getSchema());
ContentHandler parsingHandler = handler;
2)
parser.parse(inputStream, parsingHandler, metadata, context);
tika-core-1.7.jar (Tika was formerly a Lucene subproject)
org.apache.tika.parser.Parser
3)
addDoc(handler);
|-> processor.processAdd(template);
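Taken together, the three calls form a pipeline: create a SAX content handler, let the parser push content into it, then build a document and hand it to the update processor. The sketch below models that pipeline without Solr or Tika. As an analogy only, the JDK's own SAX parser plays the role of Tika's Parser (Tika parsers emit XHTML SAX events into a ContentHandler), and `ContentCollector` plays the role of SolrContentHandler; all class names, and the `newDocument` and `Processor` shapes, are hypothetical simplifications.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Dependency-free model of the three calls in ExtractingDocumentLoader.load.
class LoadPipelineSketch {

    // Stand-in for SolrContentHandler: a SAX handler that collects body text.
    static class ContentCollector extends DefaultHandler {
        private final StringBuilder content = new StringBuilder();
        private final Map<String, String> literals;

        ContentCollector(Map<String, String> literals) {
            this.literals = literals;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            content.append(ch, start, length);
        }

        // Builds the document from literal fields plus the extracted text.
        Map<String, String> newDocument() {
            Map<String, String> doc = new LinkedHashMap<>(literals);
            doc.put("_text_", content.toString().trim()); // cf. fmap.content=_text_
            return doc;
        }
    }

    // Stand-in for UpdateRequestProcessor; only processAdd is modeled.
    interface Processor {
        void processAdd(Map<String, String> doc);
    }

    static void load(String xhtml, Map<String, String> literals, Processor processor) throws Exception {
        // 1) create the content handler
        ContentCollector handler = new ContentCollector(literals);
        // 2) the parser pushes SAX events into the handler (Tika's role in Solr)
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        // 3) addDoc: build the document and pass it to the processor
        processor.processAdd(handler.newDocument());
    }

    public static void main(String[] args) throws Exception {
        List<Map<String, String>> indexed = new ArrayList<>();
        load("<html><body><p>Hello Solr</p></body></html>", Map.of("id", "doc1"), indexed::add);
        System.out.println(indexed); // [{id=doc1, _text_=Hello Solr}]
    }
}
```

Swapping the JDK parser for Tika would change only step 2; the handler and processor sides keep the same shape.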