Endeca Japanese Segmentation (Advanced)

 Answers for the Japanese Base dictionary question:

 

There are two types of CJK searches which will be available:

 

1.)    CJK Natural Search – ナチュラルサーチ – flexible search – currently available

a.       This includes Hankaku to Zenkaku Katakana normalization and Zenkaku to Hankaku Alphanumeric normalization. (single-byte vs. double-byte normalization)

        i.      Users may turn this normalization off if they do not want it.

b.      Onbiki searches (a search for マネージャー returns the same results as マネジャ)

        i.      Users may disable this feature if they do not want it.

c.       Thesaurus (synonym) mappings (these are generally created manually in Developer Studio or IAP Workbench)

        i.      Another option is to use a batch script to create the thesaurus.xml file.

d.      Phrased Search

        i.      A search for “朝日新聞” will not return “は月曜だったので早く起きようと思ったが、寝坊して新聞を読む暇もなかった。”

1.       CJK Natural Search is semi-intelligent, in that it only returns results that contain the phrase 朝日新聞 itself.

        ii.     Users can switch this to a fully wildcarded search, so that the search in (i) returns everything. (Customers who run many numeric data searches may prefer this.)

e.      Multiple phrased search

        i.      Users can submit multiple phrases in one query. Submitting 朝日(space)新聞 sends “朝日” and “新聞” as independent phrases, so all results containing “新聞” are returned as well as all results containing “朝日”.

f.      Stop words

        i.      If you do not want certain words to be included in the search index, you can add them to the stop word list. For example: Sonyプレステ、任天堂DS

1.       Adding の as a stop word lets users submit shortcut searches such as Sonyプレステ or 任天堂DS, because the の is ignored when matching.

a.       Stop words may be useful for part numbers and merchandise.

b.      Other stop words include: は, に, が, を
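
The Natural Search behaviors listed above (width normalization, onbiki folding, stop words, phrase and multiple-phrase matching) can be illustrated with a small Python sketch. This is not Endeca code and does not reflect how Endeca implements these features. The names normalize and matches, their option flags, and the use of Unicode NFKC folding are assumptions made only for illustration, and stop words are removed by naive character replacement rather than by tokenization.

```python
import unicodedata

ONBIKI = "\u30fc"      # the long-vowel mark ー
STOP_WORDS = ("の",)   # illustrative only; the list above also names は に が を


def normalize(text, fold_width=True, drop_onbiki=True, stop_words=STOP_WORDS):
    """Toy normalization, applied identically to documents and queries."""
    if fold_width:
        # NFKC maps half-width katakana to full-width katakana and
        # full-width alphanumerics to their ASCII forms.
        text = unicodedata.normalize("NFKC", text)
    if drop_onbiki:
        text = text.replace(ONBIKI, "")
    for word in stop_words:
        # Naive removal: a real engine would drop tokens, not raw characters.
        text = text.replace(word, "")
    return text


def matches(query, document, **options):
    """True if any space-separated phrase of the query occurs contiguously."""
    # Split first, so a space in the query yields independent phrases.
    phrases = [normalize(p, **options) for p in query.split()]
    target = normalize(document, **options)
    return any(bool(p) and p in target for p in phrases)


if __name__ == "__main__":
    doc = "は月曜だったので早く起きようと思ったが、寝坊して新聞を読む暇もなかった。"
    print(matches("朝日新聞", doc))                 # single phrase
    print(matches("朝日 新聞", doc))                # two independent phrases
    print(matches("マネージャー", "マネジャ"))        # onbiki folding
    print(matches("Sonyプレステ", "Sonyのプレステ"))  # の as a stop word
```

Passing fold_width=False or drop_onbiki=False mirrors the opt-out choices described in the list; a fully wildcarded search would need a different matching rule and is not shown here.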

 

2.)    CJK Linguistics Search – 言語サーチ

a.       This is currently in beta and is expected to be available around the end of Q2.

b.      Should contain the following features:

        i.      Customizable dictionary

        ii.     Spell correction

        iii.    Stemming

        iv.     Base dictionary – created dynamically and dependent on the data (how many original words it contains is unknown)

        v.      Half-width and full-width normalization

This article is from the “平行线的凝聚” blog; please contact the author before reposting.
