Learn Solr 5 with Yida: Bulk-Indexing JSON Data

Suppose you have a batch of JSON data like this:

 

[
  {"id":"1", "name":"Red Lobster", "city":"San Francisco, CA", "type":"Sit-down Chain", "state":"California", "tags":["sea food", "sit down"], "price":33.00},
  {"id":"2", "name":"Red Lobster", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["sea food", "sit-down"], "price":22.00},
  {"id":"3", "name":"Red Lobster", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["sea food", "sit-down"], "price":29.00},
  {"id":"4", "name":"McDonalds", "city":"San Francisco, CA", "type":"Fast Food", "state":"California", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":9.00},
  {"id":"5", "name":"McDonalds", "city":"Atlanta, GA", "type":"Fast Food", "state":"Georgia", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"6", "name":"McDonalds", "city":"New York, NY", "type":"Fast Food", "state":"New York", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"7", "name":"McDonalds", "city":"Chicago, IL", "type":"Fast Food", "state":"Illinois", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"8", "name":"McDonalds", "city":"Austin, TX", "type":"Fast Food", "state":"Texas", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"9", "name":"Pizza Hut", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["pizza", "sit-down", "delivery"], "price":15.00},
  {"id":"10", "name":"Pizza Hut", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["pizza", "sit-down", "delivery"], "price":24.00},
  {"id":"11", "name":"Pizza Hut", "city":"Austin, TX", "type":"Sit-down Chain", "state":"Texas", "tags":["pizza", "sit-down", "delivery"], "price":18.00},
  {"id":"12", "name":"Freddy's Pizza Shop", "city":"Los Angeles, CA", "type":"Local Sit-down", "state":"California", "tags":["pizza", "pasta", "sit-down"], "price":25.00},
  {"id":"13", "name":"The Iberian Pig", "city":"Atlanta, GA", "type":"Upscale", "state":"Georgia", "tags":["spanish", "tapas", "sit-down", "upscale"], "price":45.00},
  {"id":"14", "name":"Sprig", "city":"Atlanta, GA", "type":"Local Sit-down", "state":"Georgia", "tags":["sit-down", "gluten-free", "southern cuisine"], "price":15.00},
  {"id":"15", "name":"Starbucks", "city":"San Francisco, CA", "type":"Coffee Shop", "state":"California", "tags":["coffee", "breakfast"], "price":7.50},
  {"id":"16", "name":"Starbucks", "city":"Atlanta, GA", "type":"Coffee Shop", "state":"Georgia", "tags":["coffee", "breakfast"], "price":4.00},
  {"id":"17", "name":"Starbucks", "city":"New York, NY", "type":"Coffee Shop", "state":"New York", "tags":["coffee", "breakfast"], "price":6.50},
  {"id":"18", "name":"Starbucks", "city":"Chicago, IL", "type":"Coffee Shop", "state":"Illinois", "tags":["coffee", "breakfast"], "price":6.00},
  {"id":"19", "name":"Starbucks", "city":"Austin, TX", "type":"Coffee Shop", "state":"Texas", "tags":["coffee", "breakfast"], "price":5.00},
  {"id":"20", "name":"Starbucks", "city":"Greenville, SC", "type":"Coffee Shop", "state":"South Carolina", "tags":["coffee", "breakfast"], "price":3.00}
]

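What if you want to import it into Solr for indexing? The Solr Web UI can do this directly. The Documents menu on the left is for importing documents (updating existing ones is supported too), and the plural "Documents" signals that several documents can be imported in one batch, as shown:
[Figure 1]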
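Document Type is the format of your source data: JSON, XML, CSV, and so on.

Commit Within is the maximum number of milliseconds that may elapse before the submitted documents are committed. It is a deadline that Solr commits by, not a timeout on the request.

Overwrite controls whether an incoming document replaces an existing document with the same unique key. When it is set to false, Solr skips that check and simply appends the new document, so duplicates can appear.

Boost sets the document's weight; the default is 1.0.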
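When you talk to Solr's update handler directly instead of using the form, the same three options can travel inside a JSON "add" command. The following is a minimal sketch rather than what the Web UI literally sends; the helper name `build_add_command` and the sample document are made up for illustration, while `commitWithin`, `overwrite`, and `boost` are the key names Solr 5 accepts in an "add" command.

```python
import json

def build_add_command(doc, commit_within_ms=1000, overwrite=True, boost=1.0):
    """Wrap one document in a Solr JSON "add" command (hypothetical helper)."""
    return json.dumps({
        "add": {
            "doc": doc,
            "commitWithin": commit_within_ms,  # commit no later than this many ms after the add
            "overwrite": overwrite,            # replace an existing doc with the same unique key
            "boost": boost,                    # index-time document boost
        }
    })

# Illustrative document, not the full record from the data set above.
command = build_add_command({"id": "1", "name": "Red Lobster"})
print(command)
```

Raising `commit_within_ms` lets Solr batch more documents into a single commit, at the cost of new documents becoming searchable a little later.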

If you only need to import a single JSON object, just set Document Type to JSON. Once you do, the Document(s) text box is filled with a sample, as shown:
[Figure 2]
You can also set Document Type to Solr Command (raw XML or JSON), but then the JSON has to follow a specific format:

{
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
   ............. and so on.
}

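Here each {.........} is one of your Document objects and the rest is fixed structure. Repeating the "add" key is unusual for JSON, but Solr's update handler accepts it. This format makes up for the weakness of the JSON Document Type, which imports documents one at a time and is too slow for bulk loads: with Solr Command you can import many Documents in a single submission.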
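Writing that wrapper by hand gets tedious for more than a few records. Here is a rough sketch of how an ordinary JSON array could be turned into the repeated-"add" form. The function name `to_solr_commands` and the two short sample documents are illustrative assumptions, not part of the original example.

```python
import json

def to_solr_commands(docs):
    """Render a list of documents as one JSON object with a repeated "add" key."""
    # json.dumps cannot emit duplicate keys from a dict, so each entry is
    # serialized on its own and the pieces are joined by hand.
    entries = ['"add": ' + json.dumps({"doc": doc}) for doc in docs]
    return "{\n  " + ",\n  ".join(entries) + "\n}"

# Shortened sample records for illustration.
docs = [
    {"id": "1", "name": "Red Lobster", "price": 33.00},
    {"id": "2", "name": "Red Lobster", "price": 22.00},
]
body = to_solr_commands(docs)
print(body)
```

Note that a standard JSON parser reading this text back into a dictionary would keep only the last "add" entry, so treat the result as a string to paste or post, not as data to reload.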

 

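If you need to import XML data, set Document Type to XML, as shown:
[Figure 3]
Your XML data goes between the <doc></doc> tags. It has the same drawback as the JSON Document Type: only one document can be imported at a time. To import XML in bulk, again choose Solr Command (raw XML or JSON); the data should then look something like this: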

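<add>
    <doc>
        <field name="id">xxxx</field>
        <field name="name">xxxxxxxx</field>
        <field name="age">xxxxxxxx</field>
    </doc>

    <doc>
        <field name="id">xxxx</field>
        <field name="name">xxxxxxxx</field>
        <field name="age">xxxxxxxx</field>
    </doc>

    <doc>
        <field name="id">xxxx</field>
        <field name="name">xxxxxxxx</field>
        <field name="age">xxxxxxxx</field>
    </doc>

    ............ and so on
</add>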
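Solr's XML update format expects each value in a `<field name="...">` element inside `<doc>`. As a hedged illustration, a batch like this could be generated with Python's standard library; the helper name `build_add_xml` and the sample record are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

def build_add_xml(docs):
    """Build a Solr <add> message containing one <doc> per dictionary."""
    root = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(root, "doc")
        for name, value in doc.items():
            # A multi-valued field becomes one <field> element per value.
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", {"name": name})
                field.text = str(item)
    return ET.tostring(root, encoding="unicode")

# Illustrative record with a multi-valued "tags" field.
xml_body = build_add_xml([
    {"id": "1", "name": "Red Lobster", "tags": ["sea food", "sit down"]},
])
print(xml_body)
```

Building the tree with ElementTree instead of string concatenation means characters such as `&` or `<` in field values are escaped automatically.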

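To update a Document, send it again inside <add> with the same unique key and, with Overwrite enabled, Solr replaces the old copy; there is no separate <update> element. To remove documents, use <delete>. I covered this earlier when discussing post.jar; see "Learn Solr 5 with Yida: Mastering post.jar" for details. With that background, here is a walkthrough using JSON data. Suppose I have this JSON data, as shown:
[Figure 4]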
First, extract the fields from the JSON data and define them in the schema.xml configuration file, as shown:
[Figure 5]
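Picking the fields out of the JSON by eye works for a small sample, but it can also be scripted. The sketch below guesses a field list from sample documents and prints candidate schema.xml lines. It is only a starting point: the type names `string` and `float` are assumed to exist as field types in your schema, the helper names are made up, and the definitions in the screenshot above may well differ (for example, a tokenized text type for `name`).

```python
from xml.sax.saxutils import quoteattr

def infer_fields(docs):
    """Collect field names with a guessed Solr type and multiValued flag."""
    fields = {}
    for doc in docs:
        for name, value in doc.items():
            multi = isinstance(value, list)
            sample = value[0] if multi and value else value
            is_number = isinstance(sample, (int, float)) and not isinstance(sample, bool)
            entry = fields.setdefault(
                name, {"type": "float" if is_number else "string", "multiValued": False}
            )
            # Once any document holds a list for this field, treat it as multi-valued.
            entry["multiValued"] = entry["multiValued"] or multi
    return fields

def to_schema_lines(fields):
    """Format the inferred fields as schema.xml <field> declarations."""
    template = '<field name=%s type="%s" indexed="true" stored="true" multiValued="%s"/>'
    return [
        template % (quoteattr(name), info["type"], str(info["multiValued"]).lower())
        for name, info in fields.items()
    ]

sample_docs = [
    {"id": "1", "name": "Red Lobster", "tags": ["sea food", "sit down"], "price": 33.00},
]
fields = infer_fields(sample_docs)
lines = to_schema_lines(fields)
print("\n".join(lines))
```

Review the generated lines before pasting them in: a default schema.xml usually already declares `id`, and declaring it twice will cause an error.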
Next, convert the plain JSON into the format Solr recognizes, as shown:
[Figure 6]

{
	"add": {
		"doc": {"id":"1", "name":"Red Lobster", "city":"San Francisco, CA", "type":"Sit-down Chain", "state":"California", "tags":["sea food", "sit down"], "price":33.00}
	},
	"add": {
		"doc": {"id":"2", "name":"Red Lobster", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["sea food", "sit-down"], "price":22.00}
	},
	"add": {
		"doc": {"id":"3", "name":"Red Lobster", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["sea food", "sit-down"], "price":29.00}
	},
	"add": {
		"doc": {"id":"4", "name":"McDonalds", "city":"San Francisco, CA", "type":"Fast Food", "state":"California", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":9.00}
	},
	"add": {
		"doc": {"id":"5", "name":"McDonalds", "city":"Atlanta, GA", "type":"Fast Food", "state":"Georgia", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"6", "name":"McDonalds", "city":"New York, NY", "type":"Fast Food", "state":"New York", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"7", "name":"McDonalds", "city":"Chicago, IL", "type":"Fast Food", "state":"Illinois", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"8", "name":"McDonalds", "city":"Austin, TX", "type":"Fast Food", "state":"Texas", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"9", "name":"Pizza Hut", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["pizza", "sit-down", "delivery"], "price":15.00}
	},
	"add": {
		"doc": {"id":"10", "name":"Pizza Hut", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["pizza", "sit-down", "delivery"], "price":24.00}
	},
	"add": {
		"doc": {"id":"11", "name":"Pizza Hut", "city":"Austin, TX", "type":"Sit-down Chain", "state":"Texas", "tags":["pizza", "sit-down", "delivery"], "price":18.00}
	},
	"add": {
		"doc": {"id":"12", "name":"Freddy's Pizza Shop", "city":"Los Angeles, CA", "type":"Local Sit-down", "state":"California", "tags":["pizza", "pasta", "sit-down"], "price":25.00}
	},
	"add": {
		"doc": {"id":"13", "name":"The Iberian Pig", "city":"Atlanta, GA", "type":"Upscale", "state":"Georgia", "tags":["spanish", "tapas", "sit-down", "upscale"], "price":45.00}
	},
	"add": {
		"doc": {"id":"14", "name":"Sprig", "city":"Atlanta, GA", "type":"Local Sit-down", "state":"Georgia", "tags":["sit-down", "gluten-free", "southern cuisine"], "price":15.00}
	},
	"add": {
		"doc": {"id":"15", "name":"Starbucks", "city":"San Francisco, CA", "type":"Coffee Shop", "state":"California", "tags":["coffee", "breakfast"], "price":7.50}
	},
	"add": {
		"doc": {"id":"16", "name":"Starbucks", "city":"Atlanta, GA", "type":"Coffee Shop", "state":"Georgia", "tags":["coffee", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"17", "name":"Starbucks", "city":"New York, NY", "type":"Coffee Shop", "state":"New York", "tags":["coffee", "breakfast"], "price":6.50}
	},
	"add": {
		"doc": {"id":"18", "name":"Starbucks", "city":"Chicago, IL", "type":"Coffee Shop", "state":"Illinois", "tags":["coffee", "breakfast"], "price":6.00}
	},
	"add": {
		"doc": {"id":"19", "name":"Starbucks", "city":"Austin, TX", "type":"Coffee Shop", "state":"Texas", "tags":["coffee", "breakfast"], "price":5.00}
	},
	"add": {
		"doc": {"id":"20", "name":"Starbucks", "city":"Greenville, SC", "type":"Coffee Shop", "state":"South Carolina", "tags":["coffee", "breakfast"], "price":3.00}
	}
}
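The same command body can also be posted to Solr over HTTP instead of being pasted into the Web UI. The snippet below is a sketch under assumptions: the base URL, port, and core name `restaurants` are hypothetical and need to match your own deployment (a Tomcat install typically listens on a different port), and the one-document body stands in for the full text above. It only builds the request; the actual send is commented out because it needs a running Solr instance.

```python
import urllib.request

def build_update_request(body, base_url="http://localhost:8983/solr/restaurants"):
    """Build (but do not send) a POST to Solr's /update handler."""
    return urllib.request.Request(
        url=base_url + "/update?commit=true",  # commit right after the documents are added
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Stand-in body; in practice this would be the full command text shown above.
body = '{"add": {"doc": {"id": "1", "name": "Red Lobster"}}}'
request = build_update_request(body)
print(request.get_method(), request.full_url)

# Sending requires a running Solr instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```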

Then start Tomcat and proceed as shown:
[Figure 7]
 

After submitting, run a query, as shown:
[Figure 8]
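The check can be made outside the Web UI as well, by calling the `/select` handler. This is a small sketch that only constructs the query URL; the base URL and core name `restaurants` are again assumptions, and `build_select_url` is a made-up helper. `q`, `rows`, and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode

def build_select_url(query="*:*", rows=20,
                     base_url="http://localhost:8983/solr/restaurants"):
    """Build a URL for Solr's /select handler that returns JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return base_url + "/select?" + params

# "*:*" matches every document, so 20 rows should return the whole data set.
all_docs_url = build_select_url()
# A field query, for example every Pizza Hut entry.
pizza_url = build_select_url('name:"Pizza Hut"')
print(all_docs_url)
print(pizza_url)
```

Opening either URL in a browser, or fetching it with `urllib.request.urlopen`, should return the matching documents once the batch has been committed.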

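Pay attention to the Document Type option: if you choose JSON here, you will get an exception like this:
[Figure 9]
The configuration and test data for this example are in the attachment below. If you run into any problems while following along, please contact me. Java developers of all levels are also welcome to join the group to exchange ideas and learn together.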

   Yida's QQ:        7-3-6-0-3-1-3-0-5

   Yida's QQ group:  1-0-5-0-9-8-8-0-6

 

 

   

 

 

   
