Solr dataimport source code analysis (Part 13)

This article analyzes the EntityProcessor family of classes, which we may call entity processors. Each kind of data source has its own entity processor, hiding the differences between data sources.

This article only covers the entity processor for database data sources; the other entity processors work along similar lines.

EntityProcessor is an abstract class that defines the methods for fetching Map-typed rows from a data source (covering added, modified, and deleted data):

/**
 * <p>
 * An instance of entity processor serves an entity. It is reused throughout the
 * import process.
 * </p>
 * <p/>
 * <p>
 * Implementations of this abstract class must provide a public no-args constructor.
 * </p>
 * <p/>
 * <p>
 * Refer to <a
 * href="http://wiki.apache.org/solr/DataImportHandler">http://wiki.apache.org/solr/DataImportHandler</a>
 * for more details.
 * </p>
 * <p/>
 * <b>This API is experimental and may change in the future.</b>
 *
 * @version $Id: EntityProcessor.java 824359 2009-10-12 14:31:54Z ehatcher $
 * @since solr 1.3
 */
public abstract class EntityProcessor {

  /**
   * This method is called when it starts processing an entity. When it comes
   * back to the entity it is called again. So it can reset anything at that point.
   * For a rootmost entity this is called only once for an ingestion. For sub-entities , this
   * is called multiple once for each row from its parent entity
   *
   * @param context The current context
   */
  public abstract void init(Context context);

  /**
   * This method helps streaming the data for each row . The implementation
   * would fetch as many rows as needed and gives one 'row' at a time. Only this
   * method is used during a full import
   *
   * @return A 'row'.  The 'key' for the map is the column name and the 'value'
   *         is the value of that column. If there are no more rows to be
   *         returned, return 'null'
   */
  public abstract Map<String, Object> nextRow();

  /**
   * This is used for delta-import. It gives the pks of the changed rows in this
   * entity
   *
   * @return the pk vs value of all changed rows
   */
  public abstract Map<String, Object> nextModifiedRowKey();

  /**
   * This is used during delta-import. It gives the primary keys of the rows
   * that are deleted from this entity. If this entity is the root entity, solr
   * document is deleted. If this is a sub-entity, the Solr document is
   * considered as 'changed' and will be recreated
   *
   * @return the pk vs value of all changed rows
   */
  public abstract Map<String, Object> nextDeletedRowKey();

  /**
   * This is used during delta-import. This gives the primary keys and their
   * values of all the rows changed in a parent entity due to changes in this
   * entity.
   *
   * @return the pk vs value of all changed rows in the parent entity
   */
  public abstract Map<String, Object> nextModifiedParentRowKey();

  /**
   * Invoked for each parent-row after the last row for this entity is processed. If this is the root-most
   * entity, it will be called only once in the import, at the very end.
   */
  public abstract void destroy();

  /**
   * Invoked after the transformers are invoked. EntityProcessors can add, remove or modify values
   * added by Transformers in this method.
   *
   * @param r The transformed row
   * @since solr 1.4
   */
  public void postTransform(Map<String, Object> r) {
  }

  /**
   * Invoked when the Entity processor is destroyed towards the end of import.
   *
   * @since solr 1.4
   */
  public void close() {
    //no-op
  }
}

The subclass EntityProcessorBase is the base class for all concrete entity processors and defines their shared methods. The most important is Map<String, Object> getNext(), which fetches one Map-typed row at a time from the iterator Iterator<Map<String, Object>> rowIterator (the DIHCacheSupport cacheSupport object is used for caching):

protected Map<String, Object> getNext() {
    if(cacheSupport==null) {
      try {
        if (rowIterator == null)
          return null;
        if (rowIterator.hasNext())
          return rowIterator.next();
        query = null;
        rowIterator = null;
        return null;
      } catch (Exception e) {
        SolrException.log(log, "getNext() failed for query '" + query + "'", e);
        query = null;
        rowIterator = null;
        wrapAndThrow(DataImportHandlerException.WARN, e);
        return null;
      }
    } else  {
      return cacheSupport.getCacheData(context, query, rowIterator);
    }
  }
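To make this streaming contract concrete, here is a minimal, self-contained sketch (class and method names are hypothetical; the cacheSupport branch and error handling are omitted). A caller keeps pulling rows until getNext() returns null, at which point the iterator is dropped, mirroring the method above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical standalone sketch of the row-streaming contract of
// EntityProcessorBase.getNext(): one Map-typed row per call, null on exhaustion.
public class RowStreamSketch {
  private Iterator<Map<String, Object>> rowIterator;

  public RowStreamSketch(List<Map<String, Object>> rows) {
    this.rowIterator = rows.iterator();
  }

  // Same shape as getNext() without the cacheSupport branch: return the next
  // row, or null (and drop the iterator) when there are no more rows.
  public Map<String, Object> getNext() {
    if (rowIterator == null) return null;
    if (rowIterator.hasNext()) return rowIterator.next();
    rowIterator = null;
    return null;
  }

  // A consumer drains the processor the way DocBuilder does: loop until null.
  public static int drain(RowStreamSketch p) {
    int count = 0;
    while (p.getNext() != null) count++;
    return count;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("id", 1);
    RowStreamSketch p = new RowStreamSketch(Arrays.asList(row));
    System.out.println(drain(p)); // prints 1
  }
}
```

Returning null both as the end-of-data signal and after exhaustion means callers never need a separate hasNext check, which is why every nextRow()-style method below ends with `return getNext();`.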

The SqlEntityProcessor class is the entity processor for database data sources:

/**
 * <p>
 * An {@link EntityProcessor} instance which provides support for reading from
 * databases. It is used in conjunction with {@link JdbcDataSource}. This is the default
 * {@link EntityProcessor} if none is specified explicitly in data-config.xml
 * </p>
 * <p/>
 * <p>
 * Refer to <a
 * href="http://wiki.apache.org/solr/DataImportHandler">http://wiki.apache.org/solr/DataImportHandler</a>
 * for more details.
 * </p>
 * <p/>
 * <b>This API is experimental and may change in the future.</b>
 *
 * @version $Id: SqlEntityProcessor.java 1065312 2011-01-30 16:08:25Z rmuir $
 * @since solr 1.3
 */
public class SqlEntityProcessor extends EntityProcessorBase {
  private static final Logger LOG = LoggerFactory.getLogger(SqlEntityProcessor.class);
  //the data source
  protected DataSource<Iterator<Map<String, Object>>> dataSource;
  //initialize the data source
  @Override
  @SuppressWarnings("unchecked")
  public void init(Context context) {
    super.init(context);
    dataSource = context.getDataSource();
  }
  //initialize the row iterator (fetched from the data source by the given query)
  protected void initQuery(String q) {
    try {
      DataImporter.QUERY_COUNT.get().incrementAndGet();
      rowIterator = dataSource.getData(q);
      this.query = q;
    } catch (DataImportHandlerException e) {
      throw e;
    } catch (Exception e) {
      LOG.error( "The query failed '" + q + "'", e);
      throw new DataImportHandlerException(DataImportHandlerException.SEVERE, e);
    }
  }

  @Override
  public Map<String, Object> nextRow() {
    if (rowIterator == null) {
      String q = getQuery();
      initQuery(context.replaceTokens(q));
    }
    return getNext();
  }

  @Override
  public Map<String, Object> nextModifiedRowKey() {
    if (rowIterator == null) {
      String deltaQuery = context.getEntityAttribute(DELTA_QUERY);
      if (deltaQuery == null)
        return null;
      initQuery(context.replaceTokens(deltaQuery));
    }
    return getNext();
  }

  @Override
  public Map<String, Object> nextDeletedRowKey() {
    if (rowIterator == null) {
      String deletedPkQuery = context.getEntityAttribute(DEL_PK_QUERY);
      if (deletedPkQuery == null)
        return null;
      initQuery(context.replaceTokens(deletedPkQuery));
    }
    return getNext();
  }

  @Override
  public Map<String, Object> nextModifiedParentRowKey() {
    if (rowIterator == null) {
      String parentDeltaQuery = context.getEntityAttribute(PARENT_DELTA_QUERY);
      if (parentDeltaQuery == null)
        return null;
      LOG.info("Running parentDeltaQuery for Entity: "
              + context.getEntityAttribute("name"));
      initQuery(context.replaceTokens(parentDeltaQuery));
    }
    return getNext();
  }

  public String getQuery() {
    String queryString = context.getEntityAttribute(QUERY);
    if (Context.FULL_DUMP.equals(context.currentProcess())) {
      return queryString;
    }
    if (Context.DELTA_DUMP.equals(context.currentProcess())) {
      String deltaImportQuery = context.getEntityAttribute(DELTA_IMPORT_QUERY);
      if(deltaImportQuery != null) return deltaImportQuery;
    }
    LOG.warn("'deltaImportQuery' attribute is not specified for entity : "+ entityName);
    return getDeltaImportQuery(queryString);
  }

  public String getDeltaImportQuery(String queryString) {
    StringBuilder sb = new StringBuilder(queryString);
    if (SELECT_WHERE_PATTERN.matcher(queryString).find()) {
      sb.append(" and ");
    } else {
      sb.append(" where ");
    }
    boolean first = true;
    String[] primaryKeys = context.getEntityAttribute("pk").split(",");
    for (String primaryKey : primaryKeys) {
      if (!first) {
        sb.append(" and ");
      }
      first = false;
      Object val = context.resolve("dataimporter.delta." + primaryKey);
      if (val == null) {
        Matcher m = DOT_PATTERN.matcher(primaryKey);
        if (m.find()) {
          val = context.resolve("dataimporter.delta." + m.group(1));
        }
      }
      sb.append(primaryKey).append(" = ");
      if (val instanceof Number) {
        sb.append(val.toString());
      } else {
        sb.append("'").append(val.toString()).append("'");
      }
    }
    return sb.toString();
  }

  private static Pattern SELECT_WHERE_PATTERN = Pattern.compile(
          "^\\s*(select\\b.*?\\b)(where).*", Pattern.CASE_INSENSITIVE);

  public static final String QUERY = "query";

  public static final String DELTA_QUERY = "deltaQuery";

  public static final String DELTA_IMPORT_QUERY = "deltaImportQuery";

  public static final String PARENT_DELTA_QUERY = "parentDeltaQuery";

  public static final String DEL_PK_QUERY = "deletedPkQuery";

  public static final Pattern DOT_PATTERN = Pattern.compile(".*?\\.(.*)$");
}
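The WHERE-clause rewriting in getDeltaImportQuery() is easy to try in isolation. Below is a standalone approximation (not the original class: the pkValues map stands in for the resolved "dataimporter.delta.&lt;pk&gt;" variables, and the DOT_PATTERN fallback is omitted). It appends the primary-key conditions with "and" when SELECT_WHERE_PATTERN detects an existing WHERE clause, and with "where" otherwise; numeric values are inlined while everything else is single-quoted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Standalone approximation of SqlEntityProcessor.getDeltaImportQuery():
// append pk conditions to the full-import query to fetch one changed row.
public class DeltaQuerySketch {
  // Same pattern as in SqlEntityProcessor: does the SELECT already have a WHERE?
  private static final Pattern SELECT_WHERE_PATTERN = Pattern.compile(
      "^\\s*(select\\b.*?\\b)(where).*", Pattern.CASE_INSENSITIVE);

  // pkValues stands in for the resolved "dataimporter.delta.<pk>" variables.
  public static String deltaQuery(String queryString, Map<String, Object> pkValues) {
    StringBuilder sb = new StringBuilder(queryString);
    sb.append(SELECT_WHERE_PATTERN.matcher(queryString).find() ? " and " : " where ");
    boolean first = true;
    for (Map.Entry<String, Object> pk : pkValues.entrySet()) {
      if (!first) sb.append(" and ");
      first = false;
      Object val = pk.getValue();
      sb.append(pk.getKey()).append(" = ");
      // numbers are inlined as-is, everything else is single-quoted
      sb.append(val instanceof Number ? val.toString() : "'" + val + "'");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> pk = new LinkedHashMap<>();
    pk.put("id", 5);
    System.out.println(deltaQuery("select * from item", pk));
    // prints: select * from item where id = 5
  }
}
```

Note that this path is only a fallback: as getQuery() shows, when the entity explicitly declares a deltaImportQuery attribute, that query is used directly instead of this generated one.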

Next we analyze the EntityProcessorWrapper class, which extends the abstract EntityProcessor class and decorates a concrete entity processor (the decorator pattern).

Its important members are as follows:

 //the wrapped entity processor
 EntityProcessor delegate;
  private DocBuilder docBuilder;

  String onError;
  Context context;
  protected VariableResolverImpl resolver;
  String entityName;

  protected List<Transformer> transformers;

  protected List<Map<String, Object>> rowcache;

Its constructor initializes the wrapped member object:

public EntityProcessorWrapper(EntityProcessor delegate, DocBuilder docBuilder) {
    this.delegate = delegate;
    this.docBuilder = docBuilder;
  }

Its init method calls the wrapped object's init method (which obtains the data source):

@Override
  public void init(Context context) {
    rowcache = null;
    this.context = context;
    resolver = (VariableResolverImpl) context.getVariableResolver();
    //context has to be set correctly . keep the copy of the old one so that it can be restored in destroy
    if (entityName == null) {
      onError = resolver.replaceTokens(context.getEntityAttribute(ON_ERROR));
      if (onError == null) {
          onError = ABORT;
      }
      entityName = context.getEntityAttribute(DataConfig.NAME);
    }
    delegate.init(context);
  }

Its remaining methods all delegate to the corresponding methods of the wrapped concrete entity processor, adding functionality such as data transformation on top; this article will not analyze them in detail.
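That delegation-plus-transformation idea can be sketched with simplified, hypothetical interfaces (the real wrapper also handles rowcache, onError policies, and transformers that emit multiple rows, all omitted here): the wrapper forwards nextRow() to the wrapped processor and pipes each row through the configured transformers in order.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical, simplified stand-in for the EntityProcessor contract.
interface RowSource {
  Map<String, Object> nextRow();
}

// Decorator in the style of EntityProcessorWrapper: a delegate plus transformers.
class TransformingWrapper implements RowSource {
  private final RowSource delegate; // the wrapped processor
  private final List<UnaryOperator<Map<String, Object>>> transformers;

  TransformingWrapper(RowSource delegate,
                      List<UnaryOperator<Map<String, Object>>> transformers) {
    this.delegate = delegate;
    this.transformers = transformers;
  }

  @Override
  public Map<String, Object> nextRow() {
    Map<String, Object> row = delegate.nextRow();
    if (row == null) return null; // wrapped source exhausted
    for (UnaryOperator<Map<String, Object>> t : transformers) {
      row = t.apply(row); // apply each transformer in order
    }
    return row;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("name", "solr");
    Iterator<Map<String, Object>> it = Collections.singletonList(row).iterator();
    RowSource base = () -> it.hasNext() ? it.next() : null;
    UnaryOperator<Map<String, Object>> upper = r -> {
      r.put("name", r.get("name").toString().toUpperCase());
      return r;
    };
    RowSource wrapped = new TransformingWrapper(base, Collections.singletonList(upper));
    System.out.println(wrapped.nextRow().get("name")); // prints SOLR
  }
}
```

Because the wrapper implements the same contract it decorates, DocBuilder can drive any entity through a single interface while transformation concerns stay out of the concrete processors.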

---------------------------------------------------------------------------

This Solr dataimport source code analysis series is the author's original work.

When reprinting, please credit the source: cnblogs, 刺猬的温驯.

Permalink: http://www.cnblogs.com/chenying99/archive/2013/05/04/3059397.html
