Pitfalls when using php-elasticsearch [Complete]

2019.09.17 16:30:00

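  1. Creating an index / changing settings
    	// Create the index (these snippets assume `use Elasticsearch\ClientBuilder;` at the top of the file)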
    	public function create_index(){
    		$params = [
    			'index' => 'my_index',
    			'body'  => [
    				'settings' => [
    					'number_of_shards' => 2,
    					'number_of_replicas' => 0,
    				]
    			]
    		];
    		$client = ClientBuilder::create()->build();
    		$response = $client->indices()->create($params);
    		var_dump($response);
    	}
    
    
    	// Change the index settings
    	public function put_setting(){
    		$params = [
    			'index' => 'person',
    			'body'  => [
    				'settings' => [
    					'number_of_replicas' => 10,
    				]
    			],
    		];
    		$client = ClientBuilder::create()->build();
    		var_dump($client->indices()->putSettings($params));
    	}

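    The number of shards of an existing index cannot be changed through put_setting (only settings such as number_of_replicas can). This is a pitfall: the index structure and capacity have to be planned carefully when the index is first created, otherwise scaling out later is painful.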
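
    One common way around the fixed shard count is to create a second index with the desired settings and copy the data across. The sketch below only builds the two request arrays; build_reshard_requests() and the index name my_index_v2 are hypothetical, and the commented-out client calls assume a client version that provides reindex().

```php
<?php
// Hypothetical helper (not part of elasticsearch-php): builds the request
// arrays for creating a new index and copying documents into it.
function build_reshard_requests(string $source, string $dest, int $shards, int $replicas = 0): array
{
    if ($shards < 1) {
        throw new InvalidArgumentException('number_of_shards must be at least 1');
    }

    return [
        // Passed to $client->indices()->create(): shard count is fixed here.
        'create' => [
            'index' => $dest,
            'body'  => [
                'settings' => [
                    'number_of_shards'   => $shards,
                    'number_of_replicas' => $replicas,
                ],
            ],
        ],
        // Passed to $client->reindex(): copies documents from old to new.
        'reindex' => [
            'body' => [
                'source' => ['index' => $source],
                'dest'   => ['index' => $dest],
            ],
        ],
    ];
}

$requests = build_reshard_requests('my_index', 'my_index_v2', 4);
echo json_encode($requests['create']['body']), PHP_EOL;

// Assumed usage against a live cluster:
// $client->indices()->create($requests['create']);
// $client->reindex($requests['reindex']);
```

    Keeping the request building separate from the client calls makes the shard plan easy to review before anything is sent to the cluster.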

  2. Updating the mapping
    	// Update the mapping
    	public function put_mapping(){
    		$mapping = [
    			'properties' => [
    				'address' => [
    					'type' => 'keyword',
    				],
    				'email'  => [
    					'type' => 'keyword',
    				]
    			]
    		];
    		$params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'body'  => $mapping,
    		];
    		$client = ClientBuilder::create()->build();
    		var_dump($client->indices()->putMapping($params));
    	}

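    Updating the mapping of an existing index differs from setting it at creation time: you must name the mapping type being modified. Note also that new fields are appended to the mapping already stored in ES; fields that existed before do not disappear.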

  3. Bulk operations
    	// Bulk-create documents
    	public function bulk_create_another(){
    		$params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'body'  => [],
    		];
    
    		for ($i =1; $i<=10;$i++){
    			$params['body'][] = [
    				'create' => [    // 'index' also creates documents, but overwrites an existing _id; 'create' fails if the _id exists
    					'_id' => $i,
    				]
    			];
    			$params['body'][] = [
    				'name' => 'PHPerJiang'.$i,
    				'age'  => $i,
    				'sex'  => $i%2,
    			];
    		}
    		$client = ClientBuilder::create()->build();
    		var_dump($client->bulk($params));
    	}
    
    	// Bulk update
    	public function bulk_update_another(){
    		$params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'body'  => []
    		];
    		for($i = 1; $i <= 10; $i++){
    			$params['body'][] = [
    				'update' => [
    					'_id' => $i
    				]
    			];
    			$params['body'][] = [
    				'doc' => [
    					'name' => 'PHPerJiang'.$i*2,
    					'age'  => $i*3,
    					'sex'  => $i%2,
    				]
    			];
    		}
    		$client = ClientBuilder::create()->build();
    		var_dump($client->bulk($params));
    	}
    
    	// Bulk delete
    	public function bulk_delete_another(){
    		$params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'body'  => [],
    		];
    		for ($i = 1; $i <= 10; $i++){
    			$params['body'][] = [
    				'delete' => [
    					'_id' => $i,
    				]
    			];
    		}
    		$client = ClientBuilder::create()->build();
    		var_dump($client->bulk($params));
    	}

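    For bulk create, update, and delete, pay attention to how body is written. The request names the index, the type, and the body, and each operation in the body has two parts: an action line (which operation, on which _id) and a data line. Also note that for an update the data line must wrap the fields in a doc key, otherwise ES reports an error, whereas for a create the data line holds the fields directly and needs no wrapper. A delete has only the action line.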
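
    The two-part layout of the bulk body can be sketched as a small builder. build_bulk_body() is a hypothetical helper, not part of elasticsearch-php, and the sample documents are made up; it only shows which actions get a data line and how that line is shaped.

```php
<?php
// Hypothetical helper, not part of elasticsearch-php: turns a list of
// operations into the alternating action/data lines that bulk() expects.
function build_bulk_body(array $operations): array
{
    $body = [];
    foreach ($operations as $op) {
        $meta = ['_id' => $op['id']];
        switch ($op['action']) {
            case 'create':
                $body[] = ['create' => $meta];
                $body[] = $op['data'];            // fields directly
                break;
            case 'update':
                $body[] = ['update' => $meta];
                $body[] = ['doc' => $op['data']]; // fields wrapped in 'doc'
                break;
            case 'delete':
                $body[] = ['delete' => $meta];    // no data line at all
                break;
            default:
                throw new InvalidArgumentException('Unsupported bulk action: ' . $op['action']);
        }
    }
    return $body;
}

$params = [
    'index' => 'person',
    'type'  => 'doc',
    'body'  => build_bulk_body([
        ['action' => 'create', 'id' => 1, 'data' => ['name' => 'PHPerJiang1', 'age' => 1]],
        ['action' => 'update', 'id' => 1, 'data' => ['age' => 3]],
        ['action' => 'delete', 'id' => 1],
    ]),
];
echo json_encode($params['body']), PHP_EOL;

// Assumed usage: $client->bulk($params);
```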

  4. Partially updating a document

    	// Partial update: if the body contains a doc parameter, the fields inside doc are merged with the existing fields.
    	public function update_doc(){
    		$params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'id'    => 2,
    			'body'  => [
    				'doc' => [
    					'bbb' => '3'
    				]
    			]
    		];
    		$client = ClientBuilder::create()->build();
    		var_dump($client->update($params));
        }

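    If the body contains a doc parameter, the fields in doc are merged with the document's existing fields, much like PHP's array_merge(): a field that does not yet exist in ES is created.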
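
    The array_merge() comparison can be tried locally without a cluster. The stored document and the partial doc below are made-up sample values; the point is only that new keys are added and existing keys are overwritten.

```php
<?php
// Made-up example of a stored document's _source.
$existing = ['name' => 'PHPerJiang2', 'age' => 2, 'sex' => 0];

// The partial document that would be sent under 'doc'.
$doc = ['bbb' => '3', 'age' => 6];

// array_merge(): later string keys overwrite earlier ones, new keys are appended.
$merged = array_merge($existing, $doc);

echo json_encode($merged), PHP_EOL;
// prints {"name":"PHPerJiang2","age":6,"sex":0,"bbb":"3"}
```

    The analogy holds for flat fields. For nested objects, array_replace_recursive() is probably the closer comparison, since array_merge() replaces a nested array wholesale.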

Update 2019-09-19

  1. Updating a document with a script
    $params = [
        'index' => 'my_index',
        'type' => 'my_type',
        'id' => 'my_id',
        'body' => [
            'script' => 'ctx._source.counter += count',
            'params' => [
                'count' => 4
            ]
        ]
    ];
    
    $response = $client->update($params);

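    This is how the PHP-ElasticSearch documentation writes it. In practice it is a pitfall: written this way, the request fails with an error saying the parameter count cannot be found. The correct form is as follows

    	// Update data with a script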
        public function update_doc_by_script(){
    		$params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'id'    => 2,
    			'body'  => [
    				'script' => [
    					'lang' => 'painless',
    					'source' => 'ctx._source.age += params.count',
    					'params' => ['count' => 1],
    				]
    			]
    		];
    		$client = ClientBuilder::create()->build();
    		var_dump($client->update($params));
        }

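    The parameters only work when they are placed inside the script parameter and referenced as params.count. At this point I began to have serious doubts about the documentation.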
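
    To avoid repeating the nesting mistake, the request can be assembled by a small function that always puts params inside script. build_script_update() is a hypothetical helper, not something the client library provides.

```php
<?php
// Hypothetical helper, not part of elasticsearch-php: builds update() params
// with 'params' nested inside 'script', which is the layout that worked above.
function build_script_update(string $index, string $type, $id, string $source, array $scriptParams = []): array
{
    $script = [
        'lang'   => 'painless',
        'source' => $source,
    ];
    if ($scriptParams !== []) {
        // Values are read in the script as params.<name>.
        $script['params'] = $scriptParams;
    }

    return [
        'index' => $index,
        'type'  => $type,
        'id'    => $id,
        'body'  => ['script' => $script],
    ];
}

$params = build_script_update('person', 'doc', 2, 'ctx._source.age += params.count', ['count' => 1]);
echo json_encode($params['body']), PHP_EOL;

// Assumed usage: $client->update($params);
```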

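Update 2019-09-20

The official php-es documentation contains many errors, so use it selectively.

  1. Updating data with a script and setting a default value when the field is missing. The documentation does it like this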
    $params = [
        'index' => 'my_index',
        'type' => 'my_type',
        'id' => 'my_id',
        'body' => [
            'script' => 'ctx._source.counter += count',
            'params' => [
                'count' => 4
            ],
            'upsert' => [
                'counter' => 1
            ]
        ]
    ];
    
    $response = $client->update($params);

    First, the way the documentation uses script is wrong, so let us fix the script first, as in the code below. Note that the age1 field used here does not exist in ES.

    $params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'id'    => 8,
    			'body'  => [
    				'script' => [
    					'lang' => 'painless',
    					'source' => "ctx._source.age1 += params.count",
    					'params' => [
    						'count' => 5,
    					],
    				],
    				'upsert' => [
    					'count' => 1
    				]
    			],
    		];

    When we run the script above, it fails because the field cannot be found

    Message: {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[first-node][127.0.0.1:9300][indices:data/write/update[s]]"}],"type":"illegal_argument_exception","reason":"failed to execute script","caused_by":{"type":"script_exception","reason":"runtime error","script_stack":["ctx._source.age1 += params.count"," ^---- HERE"],"script":"ctx._source.age1 += params.count","lang":"painless","caused_by":{"type":"null_pointer_exception","reason":null}}},"status":400}

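    In practice the upsert parameter did not take effect, which is the second error in the documentation. (upsert is only applied when the document itself does not exist; document 8 already exists and merely lacks the age1 field, so the script still runs against a null value.) The correct form is as follows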

    $params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'id'    => 8,
    			'body'  => [
    				'script' => [
    					'lang' => 'painless',
    					'source' => "ctx._source.age1 = (ctx._source.age1 ?: 2) + params.count",
    					'params' => [
    						'count' => 5,
    					],
    				],
    			],
    		];

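    Inside the script we check whether the age1 field exists: if it does, the increment is applied to it; if it does not, it is given a default value of 2, and the field is then added to the document in the ES index. Note that the ?: in the script is Painless-specific syntax; for details see https://www.elastic.co/guide/en/elasticsearch/reference/5.4/modules-scripting-painless-syntax.html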
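
    The default-then-increment logic of the script can be mirrored in plain PHP to check the expected values. apply_increment() is a hypothetical function written only for this illustration; it uses PHP's ?? operator, because PHP's own ?: also replaces falsy values such as 0, whereas the Painless ?: only tests for null.

```php
<?php
// Hypothetical local model of:
//   ctx._source.age1 = (ctx._source.age1 ?: 2) + params.count
function apply_increment(array $source, string $field, int $default, int $count): array
{
    // ?? falls back to $default only when the field is missing or null.
    $source[$field] = ($source[$field] ?? $default) + $count;
    return $source;
}

// Field missing: starts from the default 2, then adds 5.
$missing = apply_increment(['age' => 8], 'age1', 2, 5);

// Field present: the existing value is incremented.
$present = apply_increment(['age' => 8, 'age1' => 22], 'age1', 2, 5);

echo $missing['age1'], ' ', $present['age1'], PHP_EOL; // prints "7 27"
```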

  2. bool queries in search: filter / should / must / must_not

    public function search_complex(){
    		$params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'body'  => [
    				'query' => [
    					'bool' => [
    						'filter' => [
    							'term' => ['age1' => 22]
    						],
    						'must' => [
    							['term' => ['age' =>8]],
    							['term' => ['sex' =>0]]
    						],
    					],
    				],
    			],
    		];
    		$client = ClientBuilder::create()->build();
    		echo json_encode($client->search($params));
        }

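    Search clauses fall into filtering (filter) and querying (must / must_not / should). Using filter alone under bool produces no relevance score; using must / must_not / should alone, or combining them with filter, returns a score. If you want to filter and still get a relevance score, there are two ways to do it:

    		// Option 1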
    		$params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'body'  => [
    				'query' => [
    					'bool' => [
    						'filter' => [
    							'term' => ['age1' => 22]
    						],
    						'must' => [
    							'match_all' => new stdClass()
    						]
    					],
    				],
    			],
    		];
    
    
    
    		// Option 2
    		$params = [
    			'index' => 'person',
    			'type'  => 'doc',
    			'body'  => [
    				'query' => [
    					'constant_score' => [
    						'boost' => 2,
    						'filter' => [
    							'term' => ['sex' => 0]
    						],
    					],
    				],
    			],
    		];

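    Option 1 combines must with filter: match_all inside must matches everything, that is, all documents that remain after the filter. Option 2 uses constant_score in place of bool: every document that passes the filter gets the same score, 1 by default, and boost adjusts that score, so different filters can be weighted to produce different scores.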
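
    Both options can be wrapped in small builders so the filter clause is written once. The two function names below are hypothetical. The sketch also shows why match_all needs new stdClass(): an empty PHP array is encoded as [] rather than {}.

```php
<?php
// Hypothetical builder for option 1: bool filter plus a match_all in must.
function filter_with_match_all(array $filter): array
{
    return [
        'bool' => [
            'filter' => $filter,
            // stdClass encodes as the empty JSON object {} that match_all expects.
            'must'   => ['match_all' => new stdClass()],
        ],
    ];
}

// Hypothetical builder for option 2: constant_score with an optional boost.
function filter_with_constant_score(array $filter, $boost = 1): array
{
    return [
        'constant_score' => [
            'boost'  => $boost,
            'filter' => $filter,
        ],
    ];
}

$optionOne = filter_with_match_all(['term' => ['age1' => 22]]);
$optionTwo = filter_with_constant_score(['term' => ['sex' => 0]], 2);

echo json_encode($optionOne), PHP_EOL;
echo json_encode($optionTwo), PHP_EOL;

// Assumed usage:
// $client->search(['index' => 'person', 'type' => 'doc', 'body' => ['query' => $optionOne]]);
```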
