"Programmer's Life" series: the P0 incident that nearly got a programmer fired

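The more you know, the more you realize you don't know.

Like first, then read; make it a habit.

GitHub https://github.com/JavaFamily already collects mind maps of interview topics from top-tier tech companies, my contact details, and technical discussion groups. Stars and feedback are welcome.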

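Preface

This is a true story. As you all know, many companies grade their incidents by severity, and this is the P0 incident that Aobing (敖丙) took the blame for at work. It nearly got me fired, the whole thing was a real nail-biter, and my heart almost gave out.

Incident levels mainly apply to production environments and are graded much like bug severities.

P0 is the highest level: crashes, pages that cannot be opened, a broken main flow, a missing core feature, or anything with a very wide blast radius (even if the bug itself is minor).

P1 is a high-severity incident, typically affecting a branch of a main feature, a secondary flow, or a core secondary feature. After that come P2, P3, and so on; the exact definitions depend on the company.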

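Main text

Aobing used to own the company's product search. The business was growing so fast that the product table soon passed ten million rows, query RT (response time) kept climbing, and the product team wanted to search products along more dimensions.

Until then we had only searched by product name, but e-commerce sites usually let you search along many dimensions.

For example, when you search on Taobao you will find that the product name, color, tags, and many other dimensions all lead to the item. In the search shown below I only typed 【帅丙】, and you can see that items come back even though the name never contains the two characters 帅丙 together; anything with 帅 and 丙 in it shows up.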

[Image: Taobao search results for 帅丙]

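As you know, a traditional relational database handles this with something like name like %帅丙%, and the results can only be rows whose name contains 帅丙, right?

But you also want 帅丙 to match other fields, such as size, keywords, or price. That is a multi-dimensional search, and a traditional relational database struggles with it.

When it came to choosing the technology, search engines were the first thing that came to mind.

The popular options at the time were Apache Lucene, Elasticsearch, and Solr.

I will cover ELK (Elasticsearch, Logstash, Kibana) and Canal in later posts. I really do spoil you; I hope it doesn't go to your heads.

After a flurry of research, I came to this conclusion:

Relatively speaking, Solr is the better fit for static search.

For real-time search in a distributed setup, Elasticsearch is the better fit.

Our product data does need to be real time: when someone changes a price in the back office, it has to be synced out right away, otherwise things blow up.

By now I think you and I have the same answer in mind: Elasticsearch, an engine that is a bit different.

I will only give a short introduction here and leave the details to a dedicated chapter later. If I wrote everything now, where would I find material for future posts?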

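Elasticsearch is a search server built on Lucene.

It provides a distributed, multi-tenant full-text search engine with a RESTful web interface.

Elasticsearch is written in Java and, at the time of writing, was released as open source under the Apache License. It is a popular enterprise search engine.

Elasticsearch is used in cloud computing and offers near-real-time search; it is stable, reliable, fast, and easy to install and use. Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages.

According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also built on Lucene.

Anyone who has read my earlier posts knows what we do before committing to a technology: design!

We need to understand its strengths, its weaknesses, the common pitfalls, and the contingency plan for when things go wrong, as well as its data synchronization and persistence mechanisms; in short, high availability.

Again, I won't go into detail here because I will write about all of it later. Let me just show you the design I did at the time.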

[Images: the initial design drafts]

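This is only the earliest demo. I won't show the detailed final version because it contains a lot of internal company logic.

Still, you can see that Aobing really did think a lot of things through. As the saying goes, never fight a battle you are not prepared for!

With the design done, I got straight to using it.

Honestly, it was great. It really is easy to use, the learning curve is gentle, the query syntax takes minutes to pick up, and the official documentation explains every feature very clearly.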

https://www.elastic.co/cn/

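Then came the main event. You all know that I work on e-commerce campaigns, which means very high traffic pouring in, and as usual we launched another campaign.

It was a dark and windy night with a cool breeze blowing. Aobing was sitting back in his chair, holding a battered teacup, sipping the tartary buckwheat tea his grandmother had roasted, and enjoying a moment of peace.

Suddenly, operations called with an emergency: CPU on the ES cluster had hit 99% and it was about to go down. My heart sank, though I was still relieved that the cluster had not actually crashed.

Then he added: "Bad news, the cluster is down!"

Aobing, deceased. The end.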


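Just kidding, but at that moment I really did want to die. Within a minute of the crash, users were reporting that search was not responding. My first thought was to restart, so I dashed over, opened my laptop, logged in to the machines, and ran the restart command.

It worked. Yes, it worked, and it looked like a false alarm. But only 10 seconds later the cluster was back at 99%. What?!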


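All I could do was restart again. This time it stayed up, for a long, long time, right until the campaign ended.

Finding the problem

But this time production had been affected: search had been unresponsive for 3 minutes. I figured I would be collecting my final paycheck from finance the next day and going home early for the New Year.

Luckily my leader said it was fine: find the problem first and fix it.

You all know Aobing is a genius, so my first move was to check the logs. I logged in and saw no errors from ES. Then I checked our own service: nothing but timeout errors. Damn. My head was buzzing.

I kept thinking about why my search had gone down. Had someone searched for something weird?

I opened my search logs!!!

No way. Who on earth searches for a string of Chinese this long, around 250 characters?

But then I thought, a long query should not take the service down. Had I written a bug?

A drop of sweat 💦 ran down my cheek. I looked around, saw that nobody had noticed how nervous I was, and wiped it away, pretending to be calm.

I thought it over carefully: the query was long, but even a database would have coped with it, so why did ES choke? Could it be...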


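ES has a bug! That's it, it must be ES's fault.

But why would that happen? I couldn't just tell my boss that; I'd still get fired!

So I went through the code. I had wrapped the keyword in wildcards so that it would match more content, much like LIKE in a database. With ES wildcards that means putting "*" before and after the keyword, as in *帅丙*.

Later I found that the wildcards were indeed the culprit, so let me play Detective Conan and explain why this happens.

Many developers with an RDBMS/SQL background, when they first step into the Elasticsearch world, naturally reach for the wildcard query to implement fuzzy matching (for example, completing user input), because it is the closest thing to SQL's LIKE and feels very comfortable to use.

However, my outage shows that abusing the wildcard query can have disastrous consequences.

I started by reproducing the problem.

How to reproduce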

  1. Create an index containing a single document

POST test_index/type1/?refresh=true
{
  "foo": "bar"
}

  2. Run a wildcard query with a long string that has the wildcard * at both ends

POST /test_index/_search
{
  "query": {
    "wildcard": {
      "foo": {
        "value": "*轻轻的我走了，正如我轻轻的来；我轻轻的招手，作别西天的云彩。那河畔的金柳，是夕阳中的新娘；波光里的艳影，在我的心头荡漾。软泥上的青荇，油油的在水底招摇；在康河的柔波里，我甘心做一条水草！那榆荫下的一潭，不是清泉，是天上虹；揉碎在浮藻间，沉淀着彩虹似的梦。寻梦？撑一支长篙，向青草更青处漫溯；满载一船星辉，在星辉斑斓里放歌。但我不能放歌，悄悄是别离的笙箫；夏虫也为我沉默，沉默是今晚的康桥！悄悄的我走了，正如我悄悄的来；我挥一挥衣袖，不带走一片云彩。*"
      }
    }
  }
}

  3. Check the result

{
  "took": 3445,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}


Even with no hits, the query took a staggering 3.4 seconds (the test machine was a MacBook Pro with an i7 CPU), and CPU usage showed a sharp spike while it ran.

[Image: CPU usage spike during the query]

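The production queries are much more complex than this example and hit several fields at once. In actual tests, a single query could run for more than ten seconds.

With enough long-string queries like that, the cluster is effectively DoS'ed.

Digging into the root cause

Why is this query so expensive on an index with a single document? Intuitively it should return instantly!

Before answering, one more test helps: keep increasing the length of the query string, and beyond a certain length ES simply throws an exception. The cause reported on the ES server is: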

Caused by: org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 22082 states and 34182 transitions would result in more than 10000 states. at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741) ~[lucene-core-6.4.1.jar:6.4.1

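Explanation: the exception comes from the org.apache.lucene.util.automaton package. Literally, it says the automaton is too complex to determinize: because there are so many states and transitions, determinizing it would require more than the limit of 10000 states.

After reading a lot of material online, I finally pieced together the whole story.

To speed up wildcard and regular-expression matching, Lucene (since 4.0) compiles the input pattern into a DFA (Deterministic Finite Automaton). A DFA built from a pattern with wildcards can be very complex and expensive to construct.

For example, the DFA built from a*bc looks like the figure below: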

[Image: DFA constructed from the pattern a*bc]

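How Lucene builds the DFA

I looked at the relevant Lucene code. The construction goes roughly as follows:

  1. The toAutomaton method in org.apache.lucene.search.WildcardQuery walks the input wildcard pattern, turns each character into an automaton, and then concatenates the per-character automata into a new one.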
public static Automaton toAutomaton(Term wildcardquery) {
        List<Automaton> automata = new ArrayList<>();
        String wildcardText = wildcardquery.text();
        for (int i = 0; i < wildcardText.length();) {
            final int c = wildcardText.codePointAt(i);
            int length = Character.charCount(c);
            switch(c) {
                case WILDCARD_STRING:
                    automata.add(Automata.makeAnyString());
                    break;
                case WILDCARD_CHAR:
                    automata.add(Automata.makeAnyChar());
                    break;
                case WILDCARD_ESCAPE:
                    // add the next codepoint instead, if it exists
                    if (i + length < wildcardText.length()) {
                        final int nextChar = wildcardText.codePointAt(i + length);
                        length += Character.charCount(nextChar);
                        automata.add(Automata.makeChar(nextChar));
                        break;
                    } // else fallthru, lenient parsing with a trailing \
                default:
                    automata.add(Automata.makeChar(c));
            }
            i += length;
        }
        return Operations.concatenate(automata);
    }
  2. The automaton produced at this point is nondeterministic, that is, a Non-deterministic Finite Automaton (NFA).

  3. The determinize method in org.apache.lucene.util.automaton.Operations then converts the NFA into a DFA.

 /**
  * Determinizes the given automaton.
  * <p>
  * Worst case complexity: exponential in number of states.
  * @param maxDeterminizedStates Maximum number of states created when
  *  determinizing. Higher numbers allow this operation to consume more
  *  memory but allow more complex automatons. Use
  *  DEFAULT_MAX_DETERMINIZED_STATES as a decent default if you don't know
  *  how many to allow.
  * @throws TooComplexToDeterminizeException if determinizing a creates an
  *  automaton with more than maxDeterminizedStates
  */

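The code comment says that in the worst case this step is exponential in the number of states!

To keep it from producing too many states and burning too much memory and CPU, the class caps the maximum number of states: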

 /**
  * Default maximum number of states that {@link Operations#determinize} should create.
  */

 public static final int DEFAULT_MAX_DETERMINIZED_STATES = 10000;

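With wildcards at both ends and a very long string, determinization produces a huge number of states and can even exceed the limit.

So what exactly is the difference between an NFA and a DFA, and how do you convert one into the other?

There is plenty of mathematical material and many papers online. Given my limited knowledge of algorithms, I did not have the energy to dig deeper.

A rough understanding is this: for a given input symbol, an NFA may move from one state to several possible states, whereas a DFA has exactly one state to move to, so a DFA is faster at string matching.

A DFA is fast at search time, but building it can be expensive, especially for a leading wildcard combined with a long string.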
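
To make the "exponential in the number of states" remark concrete, here is a small, self-contained sketch of the textbook subset construction. It is not Lucene's code and it does not model the wildcard pattern from the incident; it uses a classic toy NFA (assumed here purely for illustration) that accepts strings over {a, b} whose n-th symbol from the end is 'a'. That NFA has n + 1 states, yet the equivalent DFA needs 2^n states. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class SubsetConstructionDemo {

    /**
     * One step of the toy NFA. A set of NFA states is encoded as a bitmask:
     * bit s is set when NFA state s is active. States are 0..n.
     */
    static long step(long states, boolean symbolIsA, int n) {
        long next = 0L;
        for (int s = 0; s <= n; s++) {
            if ((states & (1L << s)) == 0) {
                continue; // state s is not active
            }
            if (s == 0) {
                next |= 1L; // state 0 loops on both symbols
                if (symbolIsA) {
                    next |= 1L << 1; // guess: this 'a' is the n-th symbol from the end
                }
            } else if (s < n) {
                next |= 1L << (s + 1); // states 1..n-1 advance on any symbol
            }
            // state n is accepting and has no outgoing transitions
        }
        return next;
    }

    /** Counts the DFA states reachable from the start set {0} via subset construction. */
    static int countDfaStates(int n) {
        Set<Long> seen = new HashSet<>();
        Deque<Long> work = new ArrayDeque<>();
        seen.add(1L);
        work.add(1L);
        while (!work.isEmpty()) {
            long current = work.poll();
            for (boolean symbolIsA : new boolean[] {true, false}) {
                long next = step(current, symbolIsA, n);
                if (seen.add(next)) {
                    work.add(next); // a subset we have not seen is a new DFA state
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 12; n++) {
            System.out.println("NFA states: " + (n + 1) + ", DFA states: " + countDfaStates(n));
        }
    }
}
```

Each DFA state here is a set of NFA states, which is why the count can double every time the NFA grows by one state. A cap such as DEFAULT_MAX_DETERMINIZED_STATES exists to stop this kind of growth before it exhausts memory and CPU.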

Recall that the official Elasticsearch documentation carries a specific note on the wildcard query: avoid terms that start with a wildcard.

" Note that this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow wildcard queries, a wildcard term should not start with one of the wildcards * or ?."

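After looking at how the wildcard query is implemented underneath, that sentence is easy to understand!

Summary: a wildcard query should never start with a wildcard. If you really have no choice, you must limit the length of the string users can enter.

Better still, switch to a different approach and do the work at index time: choose a suitable tokenizer, such as the nGram tokenizer, to preprocess the data, and then use the much cheaper term query to get equivalent fuzzy search.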
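
The idea behind the n-gram approach can be sketched in a few lines of plain Java. This is not Elasticsearch's nGram tokenizer, only a toy in-memory inverted index (all names are hypothetical) showing why substring matching becomes an exact lookup once every n-gram has been indexed up front. It splits on Java chars rather than code points, which is a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class NGramIndex {
    private final int minGram;
    private final int maxGram;
    // n-gram -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    NGramIndex(int minGram, int maxGram) {
        this.minGram = minGram;
        this.maxGram = maxGram;
    }

    /** All substrings of text whose length is between min and max, in order of position. */
    static List<String> nGrams(String text, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = min; len <= max && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    /** Index time: pay the cost once by storing every n-gram of the text. */
    void add(int docId, String text) {
        for (String gram : nGrams(text, minGram, maxGram)) {
            postings.computeIfAbsent(gram, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Query time: a plain exact lookup, with no automaton to build. */
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }
}
```

The trade-off is a larger index in exchange for cheap queries. In this sketch a query term longer than maxGram finds nothing; a fuller version would split the query into grams as well and intersect the results.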

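For search-as-you-type scenarios, consider trying the completion suggester or the phrase/term suggesters first; they perform better at the cost of being somewhat less fuzzy. Only when the suggester returns nothing should you fall back to fuzzier but slower queries such as wildcard, regex, or fuzzy.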
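
The "cheap first, expensive only as a fallback" ordering can be expressed as a small helper. This is a sketch under assumptions: the strategies are plain functions standing in for the real suggester and wildcard calls, and the names are hypothetical rather than part of any Elasticsearch client.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class FallbackSearch {

    /**
     * Tries each strategy in order and returns the first non-empty result.
     * Later (more expensive) strategies are not invoked once an earlier one hits.
     */
    @SafeVarargs
    static List<String> firstNonEmpty(String input, Function<String, List<String>>... strategies) {
        for (Function<String, List<String>> strategy : strategies) {
            List<String> hits = strategy.apply(input);
            if (hits != null && !hits.isEmpty()) {
                return hits;
            }
        }
        return Collections.emptyList();
    }
}
```

Passing the suggester lookup first and the wildcard lookup last means the expensive query only runs for inputs the cheap one cannot answer.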

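Addendum: someone asked whether regex and fuzzy queries have the same problem. They do, because underneath they work like wildcard: the pattern is compiled into a DFA to speed up string matching.

Looking back: why did the cluster recover after the first restart and then go down again right away? The user searched twice...

Solution

Fixing this is actually simple. Since we know long keywords cause trouble, just limit them. Go try a certain search engine or a certain shopping site and see whether they limit query length.

I pasted a long run of Chinese characters into Baidu and this is what came back; Taobao returns a default page when the query is too long.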

[Images: Baidu and Taobao responses to an overlong query]

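If your product must show the user something, that's easy: analyze the query and pull out a few hot words, or fall back to trending products.

What did I do? If the string is longer than 50 characters, I return an empty array right away. That is a better experience for the user; if you return a parameter error or a generic error, people will think you have a bug, right?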
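
A minimal sketch of that guard might look like the following. The limit of 50 comes from the text above; everything else (the class name, the backend function standing in for the real ES call, trimming, and counting code points rather than chars) is an assumption for illustration, not the production code.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class SearchGuard {
    // Limit taken from the "longer than 50" rule described above.
    static final int MAX_KEYWORD_LENGTH = 50;

    /**
     * Returns an empty list for missing, blank, or overlong keywords without
     * touching the backend; otherwise delegates to the supplied search function.
     */
    static List<String> search(String keyword, Function<String, List<String>> backend) {
        if (keyword == null) {
            return Collections.emptyList();
        }
        String trimmed = keyword.trim();
        // Count code points so characters outside the BMP are counted once.
        int length = trimmed.codePointCount(0, trimmed.length());
        if (length == 0 || length > MAX_KEYWORD_LENGTH) {
            return Collections.emptyList();
        }
        return backend.apply(trimmed);
    }
}
```

Because the check runs before the backend is called, an overlong keyword never reaches the wildcard query at all.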

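Summary

Actually, I didn't get tagged with any incident level, haha. This did count as an incident, but I'm so lovable that my boss felt sorry for me and certainly wasn't going to blame me. Mainly, my design had already considered many options and scenarios; I just didn't expect this pitfall. (Inner voice: Aobing, you scoundrel, clickbait again! People thought you had lost your job and were about to support you!)

I hope this incident shows how much the design matters when you choose a technology. Even if you can't think of everything, you won't be helpless when a real problem hits. Not every incident can be fixed with a restart like this one. Don't count on luck; stay humble.

Ramblings

Aobing is moving up in the world again: I got recognition from a member of the Alibaba Cloud messaging middleware team, and it turned out to be my senior, Fengyun (风云, her nickname at work)!!!

She is a keen learner, and everyone should learn from outstanding people like her. She doesn't work in tech but never stops learning. Honestly, my eyes are welling up again.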


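Don't go yet, vote!!!

I'm planning to rename my WeChat official account, JavaFamily. The name doesn't quite hit the mark, but I can't call it 敖丙 because that has been registered as a trademark. I asked the talented folks in my group, and so far there are two I like:

  • 帅丙
  • 三太子敖丙
  • Something else (leave me a comment)

This name may stay with me for a long time, maybe for life, so I'd love your suggestions, haha.

Don't ask why it has to be related to the name 敖丙; ask again and I quit!

That's my nickname at work, so 😂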


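Follow me so you don't get lost

That's all for this post, everyone. If you have read this far, you are a real talent.

I will keep publishing several posts a week on interviews at top-tier internet companies and on common tech stacks. Thank you so much for reading this far. If you think this post is decent and that Aobing knows a thing or two, please like 👍, follow ❤️, and share 👥; it really means a lot to me!!!

Writing takes effort, and your support and recognition are my biggest motivation. See you in the next post!

By Aobing [original]

If you find any mistakes in this post, please point them out; I would be very grateful!


Posts are updated every week. Search WeChat for 三太子敖丙 to read them first and nudge me for updates (the official account is one or two posts ahead of the blog). This post is included at GitHub https://github.com/JavaFamily, which has mind maps of interview topics from top-tier companies and many of my documents. Stars and contributions are welcome; use the topic lists to prepare for interviews. Let's get better together.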
