"settings":{
"analysis": { # 自定义分词
"filter": {
"自定义过滤器": {
"type": "edge_ngram", # 过滤器类型
"min_gram": "1", # 最小边界
"max_gram": "6" # 最大边界
}
}, # 过滤器
"char_filter": {}, # 字符过滤器
"tokenizer": {}, # 分词
"analyzer": {
"自定义分词器名称": {
"type": "custom",
"tokenizer": "上述自定义分词名称或自带分词",
"filter": [
"上述自定义过滤器名称或自带过滤器"
],
"char_filter": [
"上述自定义字符过滤器名称或自带字符过滤器"
]
}
} # 分词器
}
}
Testing analyzer output:
1. Test an analyzer against a specific index:
POST /discovery-user/_analyze
{
"analyzer": "analyzer_ngram",
"text":"i like cats"
}
2. Test a built-in analyzer without targeting an index:
POST _analyze
{
"analyzer": "standard", # english,ik_max_word,ik_smart
"text":"i like cats"
}
Definition: a character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For example, it can strip HTML elements or convert digits such as 0123 into the words 零一二三.
An analyzer may have zero or more character filters, which are applied in order.
Built-in character filters in ES 7:
html_strip: removes HTML elements
The html_strip character filter strips out HTML elements like <b> and decodes HTML entities like &amp;.
mapping: replaces characters according to a mapping
The mapping character filter replaces any occurrences of the specified strings with the specified replacements.
pattern_replace: replaces characters matching a regular expression
The pattern_replace character filter replaces any characters matching a regular expression with the specified replacement.
html_strip accepts an escaped_tags parameter:
"char_filter": {
"my_char_filter": {
"type": "html_strip",
"escaped_tags": ["b"]
}
}
escaped_tags: An array of HTML tags which should not be stripped from the original text.
POST my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "I'm so happy!
"
}
I'm so happy! # 忽略了b标签
The mapping character filter accepts a map of keys and values. Whenever it encounters a string of characters that is the same as a key, it replaces them with the value associated with that key.
Replacements are allowed to be the empty string.
The mapping character filter accepts the following parameters (one of the two is required):
mappings:
An array of mappings, with each element having the form key => value.
mappings_path:
A path, either absolute or relative to the config directory, to a UTF-8 encoded text mappings file containing a key => value mapping per line.
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": [
"一 => 0",
"二 => 1",
"# => ", # 映射值可以为空
"一二三 => 老虎" # 映射可以多个字符
]
}
}
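To see a mapping in action without creating an index, the _analyze API also accepts inline char_filter definitions. A minimal sketch (the keyword tokenizer is chosen here only so the whole string stays one token; the mappings are illustrative):
POST _analyze
{
"tokenizer": "keyword",
"char_filter": [
{
"type": "mapping",
"mappings": [ "一 => 0", "二 => 1" ]
}
],
"text": "一二"
}
Expected single term: [ 01 ]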
The pattern_replace character filter uses a regular expression to match characters which should be replaced with the specified replacement string. The replacement string can refer to capture groups in the regular expression.
Beware of Pathological Regular Expressions
Inefficient (pathological) regular expressions can cause a StackOverflowError; ES 7 regular expressions follow Java's Pattern syntax.
The pattern_replace character filter accepts the following parameters:
pattern:
A Java regular expression. Required.
replacement:
The replacement string, which can reference capture groups using the $1..$9 syntax.
flags:
Java regular expression flags. Flags should be pipe-separated, eg "CASE_INSENSITIVE|COMMENTS".
123-456-789 → 123_456_789:
"char_filter": {
"my_char_filter": {
"type": "pattern_replace",
"pattern": "(\\d+)-(?=\\d)",
"replacement": "$1_"
}
}
Using a replacement string that changes the length of the original text will work for search purposes, but will result in incorrect highlighting.
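The same pattern_replace filter can be tried through _analyze with an inline definition (a sketch; standard tokenizer assumed). The digits should come out joined by underscores as a single token:
POST _analyze
{
"tokenizer": "standard",
"char_filter": [
{
"type": "pattern_replace",
"pattern": "(\\d+)-(?=\\d)",
"replacement": "$1_"
}
],
"text": "My credit card is 123-456-789"
}
[ My, credit, card, is, 123_456_789 ]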
A token filter receives the token stream and may add, remove, or change tokens. For example, a lowercase token filter converts all tokens to lowercase, a stop token filter removes common words (stop words) like the from the token stream, and a synonym token filter introduces synonyms into the token stream.
Token filters are not allowed to change the position or character offsets of each token.
An analyzer may have zero or more token filters, which are applied in order.
The asciifolding token filter converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
It accepts a preserve_original setting, which defaults to false; if true, the original token is kept in addition to the folded token.
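For example, folding accented characters to plain ASCII (a minimal _analyze sketch using the built-in asciifolding filter):
POST _analyze
{
"tokenizer": "standard",
"filter": [ "asciifolding" ],
"text": "açaí à la carte"
}
[ acai, a, la, carte ]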
The length token filter removes tokens that are shorter or longer than the configured bounds.
Settings:
min
The minimum number. Defaults to 0.
max
The maximum number. Defaults to Integer.MAX_VALUE, which is 2^31-1 or 2147483647
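A minimal sketch of the length filter via _analyze (inline definition; max 4 keeps only the short tokens):
POST _analyze
{
"tokenizer": "whitespace",
"filter": [
{ "type": "length", "min": 0, "max": 4 }
],
"text": "the quick brown fox jumps over the lazy dog"
}
[ the, fox, over, the, lazy, dog ]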
The lowercase token filter normalizes token text to lowercase; the language parameter selects a language other than English (e.g. greek, irish, turkish).
The uppercase token filter normalizes token text to uppercase.
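A sketch of registering a language-specific lowercase filter in the index settings (the filter and analyzer names here are only illustrative):
"settings": {
"analysis": {
"filter": {
"greek_lowercase": {
"type": "lowercase",
"language": "greek"
}
},
"analyzer": {
"greek_lowercase_example": {
"tokenizer": "standard",
"filter": [ "greek_lowercase" ]
}
}
}
}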
The ngram token filter splits each token into n-grams; for example, it can provide n-gram handling of embedded English words inside a Chinese analyzer.
Settings:
min_gram
Defaults to 1.
max_gram
Defaults to 2.
The index-level setting index.max_ngram_diff controls the maximum allowed difference between max_gram and min_gram.
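With the default min_gram 1 / max_gram 2, the ngram filter splits each token into 1- and 2-character grams, for example:
POST _analyze
{
"tokenizer": "standard",
"filter": [ "ngram" ],
"text": "Quick fox"
}
[ Q, Qu, u, ui, i, ic, c, ck, k, f, fo, o, ox, x ]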
The edge_ngram token filter emits n-grams anchored to the start of each token: 123 yields 1, 12, 123 but never 2 or 23.
Settings:
min_gram
Defaults to 1.
max_gram
Defaults to 2.
side
deprecated. Either front or back. Defaults to front.
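The 1 / 12 / 123 example above can be reproduced with an inline edge_ngram filter (a sketch; the keyword tokenizer keeps 123 as a single token first):
POST _analyze
{
"tokenizer": "keyword",
"filter": [
{ "type": "edge_ngram", "min_gram": 1, "max_gram": 3 }
],
"text": "123"
}
[ 1, 12, 123 ]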
The decimal_digit token filter converts Unicode digits to 0-9; for example, the Arabic-Indic digit ٢ becomes 2.
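A minimal _analyze sketch of decimal_digit (the sample text mixes Devanagari digits with Latin words; chosen only for illustration):
POST _analyze
{
"tokenizer": "whitespace",
"filter": [ "decimal_digit" ],
"text": "१-one two-२ ३"
}
[ 1-one, two-2, 3 ]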
A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.
The tokenizer is also responsible for recording the order or position of each term and the start and end character offsets of the original word which the term represents.
An analyzer must have exactly one tokenizer.
Testing a tokenizer:
POST _analyze
{
"tokenizer": "tokenzer名称",
"text": "分词文本:The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The following tokenizers are usually used for tokenizing full text into individual words.
Configuration:
max_token_length
The maximum token length. If a token is seen that exceeds this length then it is split at max_token_length intervals. Defaults to 255
Tokens longer than this are split at max_token_length intervals; e.g. with max_token_length 3, abcd becomes abc, d.
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "standard",
"max_token_length": 5
}
}
}
}
}
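Assuming the settings above were created in an index named my_index, analyzing the sample sentence should split every token longer than 5 characters (jumped becomes jumpe, d):
POST my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
[ The, 2, QUICK, Brown, Foxes, jumpe, d, over, the, lazy, dog's, bone ]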
These tokenizers break up text or words into small fragments.
The ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then it emits N-grams of each word of the specified length.
Example output:
POST _analyze
{
"tokenizer": "ngram",
"text": "Quick Fox"
}
The above sentence would produce the following terms:
[ Q, Qu, u, ui, i, ic, c, ck, k, "k ", " ", " F", F, Fo, o, ox, x ]
With the default settings, the ngram tokenizer treats the initial text as a single token and produces N-grams with minimum length 1 and maximum length 2.
Configuration
min_gram
Minimum length of characters in a gram. Defaults to 1.
max_gram
Maximum length of characters in a gram. Defaults to 2.
token_chars
Character classes that should be included in a token. Elasticsearch will split on characters that don't belong to the classes specified. Defaults to [] (keep all characters).
Character classes may be any of the following:
letter — for example a, b, ï or 京
digit — for example 3 or 7
whitespace — for example " " or "\n"
punctuation — for example ! or "
symbol — for example $ or √
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3,
"token_chars": [
"letter",
"digit"
]
}
}
}
}
}
POST my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "2 Quick Foxes."
}
Only letter and digit characters are kept; the tokenizer splits on everything else:
[ Qui, uic, ick, Fox, oxe, xes ]
The edge_ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then it emits N-grams of each word where the start of the N-gram is anchored to the beginning of the word.
Edge n-grams are anchored to the beginning of the word: abc can produce ab and abc, but never bc.
Its parameters are the same as for the ngram tokenizer.
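With its defaults (min_gram 1, max_gram 2), the edge_ngram tokenizer emits only the first one and two characters of each input, for example:
POST _analyze
{
"tokenizer": "edge_ngram",
"text": "Quick Fox"
}
[ Q, Qu ]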
The following tokenizers are usually used with structured text like identifiers, email addresses, zip codes, and paths, rather than with full text.
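For instance, the path_hierarchy tokenizer (one of these structured-text tokenizers) emits every ancestor path of its input:
POST _analyze
{
"tokenizer": "path_hierarchy",
"text": "/one/two/three"
}
[ /one, /one/two, /one/two/three ]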
built-in analyzers vs. custom analyzers:
If you do not find an analyzer suitable for your needs, you can create a custom analyzer which combines the appropriate character filters, tokenizer, and token filters.
The built-in analyzers can be used directly without any configuration; some of them, however, support configuration options to alter their behaviour.
Example: the standard analyzer with the stopwords parameter:
"analysis": {
"analyzer": {
"自定义分词器名称": {
"type": "standard",
"stopwords": "_english_" # 支持英语停用词 即分词忽略the a等
}
}
}
The standard analyzer is the default analyzer, used if none is specified.
Example output:
POST _analyze
{
"analyzer": "standard",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
standard analyzer parameters:
max_token_length
The maximum token length. If a token is seen that exceeds this length then it is split at max_token_length intervals. Defaults to 255.
e.g. with max_token_length 5, jumped is split into jumpe, d.
stopwords
A pre-defined stop words list like _english_ or an array containing a list of stop words. Defaults to _none_
For example _english_ or a custom array such as ["a", "the"]; defaults to _none_.
stopwords_path
The path to a file containing stop words.
"analyzer": {
"my_english_analyzer": {
"type": "standard",
"max_token_length": 5, # token最长为5
"stopwords": "_english_" # 忽略英语停用词
}
}
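Assuming this analyzer was registered in an index called my_index, the sample sentence should come out with stop words removed and jumped split at 5 characters:
POST my_index/_analyze
{
"analyzer": "my_english_analyzer",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
[ 2, quick, brown, foxes, jumpe, d, over, lazy, dog's, bone ]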
Definition
The standard analyzer consists of:
- Tokenizer: Standard Tokenizer
- Token Filters: Lower Case Token Filter, Stop Token Filter (disabled by default)
If you need to customize the standard analyzer beyond the configuration parameters then you need to recreate it as a custom analyzer (type: custom) and modify it, usually by adding token filters:
"analysis": {
"analyzer": {
"rebuilt_standard": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase"
]
}
}
}
A rebuilt standard analyzer cannot take parameters such as max_token_length or stopwords; the equivalent behaviour has to be added through token filters (e.g. the lowercase filter above, or a stop filter).
The simple analyzer breaks text into terms whenever it encounters a character which is not a letter. All terms are lower cased.
Example output:
POST _analyze
{
"analyzer": "simple",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
The simple analyzer is not configurable.
Definition: the simple analyzer consists of the Lower Case Tokenizer only.
The whitespace analyzer breaks text into terms whenever it encounters a whitespace character.
Example output:
POST _analyze
{
"analyzer": "whitespace",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]
The whitespace analyzer is not configurable.
Definition: the whitespace analyzer consists of the Whitespace Tokenizer only.
The stop analyzer is the same as the simple analyzer but adds support for removing stop words. It defaults to using the _english_ stop words.
Example output:
POST _analyze
{
"analyzer": "stop",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ quick, brown, foxes, jumped, over, lazy, dog, s, bone ]
Configuration:
stopwords
A pre-defined stop words list like _english_ or an array containing a list of stop words. Defaults to _english_.
stopwords_path
The path to a file containing stop words. This path is relative to the Elasticsearch config directory.
"analyzer": {
"my_stop_analyzer": {
"type": "stop",
"stopwords": ["the", "over"]
}
}
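With this my_stop_analyzer registered in an index (called my_index here only for illustration), the sample sentence should lose only "the" and "over":
POST my_index/_analyze
{
"analyzer": "my_stop_analyzer",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
[ quick, brown, foxes, jumped, lazy, dog, s, bone ]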
Definition: rebuilding the stop analyzer as a custom analyzer:
"settings": {
"analysis": {
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
}
},
"analyzer": {
"rebuilt_stop": {
"tokenizer": "lowercase",
"filter": [
"english_stop"
]
}
}
}
}
The keyword analyzer is a "noop" analyzer which returns the entire input string as a single token (input equals output).
Example output:
POST _analyze
{
"analyzer": "keyword",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following single term:
[ The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. ]
The keyword analyzer is not configurable.
Definition: the keyword analyzer consists of the Keyword Tokenizer only.
The pattern analyzer uses a regular expression to split the text into terms. The regular expression should match the token separators, not the tokens themselves. It defaults to \W+, i.e. all non-word characters.
Example output:
POST _analyze
{
"analyzer": "pattern",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
Configuration (optional parameters):
pattern
A Java regular expression. Defaults to \W+.
flags
Java regular expression flags. Flags should be pipe-separated, eg "CASE_INSENSITIVE|COMMENTS".
lowercase
Should terms be lowercased or not. Defaults to true.
stopwords
A pre-defined stop words list like _english_ or an array containing a list of stop words. Defaults to _none_.
stopwords_path
The path to a file containing stop words.
"analyzer": {
"my_email_analyzer": {
"type": "pattern",
"pattern": "\\W|_",
"lowercase": true
}
}
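With this my_email_analyzer registered in an index (my_index here for illustration), an email address splits on non-word characters and underscores:
POST my_index/_analyze
{
"analyzer": "my_email_analyzer",
"text": "John_Smith@foo-bar.com"
}
[ john, smith, foo, bar, com ]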
Example: CamelCase tokenization
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"camel": {
"type": "pattern",
"pattern": "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"
}
}
}
}
}
GET my_index/_analyze
{
"analyzer": "camel",
"text": "MooseX::FTPClass2_beta"
}
[ moose, x, ftp, class, 2, beta ]
The regular expression above, broken down:
([^\p{L}\d]+) # swallow non letters and numbers,
| (?<=\D)(?=\d) # or non-number followed by number,
| (?<=\d)(?=\D) # or number followed by non-number,
| (?<=[ \p{L} && [^\p{Lu}]]) # or lower case
(?=\p{Lu}) # followed by upper case,
| (?<=\p{Lu}) # or upper case
(?=\p{Lu} # followed by upper case
[\p{L}&&[^\p{Lu}]] # then lower case
)
Definition: rebuilding the pattern analyzer as a custom analyzer:
{
"settings": {
"analysis": {
"tokenizer": {
"split_on_non_word": {
"type": "pattern",
"pattern": "\\W+"
}
},
"analyzer": {
"rebuilt_pattern": {
"tokenizer": "split_on_non_word",
"filter": [
"lowercase"
]
}
}
}
}
}
Language analyzers
Configurable parameters:
stopwords
stem_exclusion: The stem_exclusion parameter allows you to specify an array of lowercase words that should not be stemmed. Internally, this functionality is implemented by adding the keyword_marker token filter with the keywords set to the value of the stem_exclusion parameter.
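A minimal sketch of passing stem_exclusion to a language analyzer (the analyzer name and word list are only illustrative):
"settings": {
"analysis": {
"analyzer": {
"my_english_analyzer": {
"type": "english",
"stem_exclusion": [ "organization", "organizations" ]
}
}
}
}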
english analyzer
The english analyzer could be reimplemented as a custom analyzer as follows:
PUT /english_example
{
"settings": {
"analysis": {
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"english_keywords": {
"type": "keyword_marker",
"keywords": ["example"]
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
}
},
"analyzer": {
"rebuilt_english": {
"tokenizer": "standard",
"filter": [
"english_possessive_stemmer",
"lowercase",
"english_stop",
"english_keywords",
"english_stemmer"
]
}
}
}
}
}
fingerprint analyzer
Input text is lowercased, normalized to remove extended characters, sorted, deduplicated and concatenated into a single token. If a stopword list is configured, stop words will also be removed.
Example output:
POST _analyze
{
"analyzer": "fingerprint",
"text": "Yes yes, Gödel said this sentence is consistent and."
}
The above sentence would produce the following single term:
[ and consistent godel is said sentence this yes ]
Configuration:
separator
The character to use to concatenate the terms. Defaults to a space.
max_output_size
The maximum token size to emit. Defaults to 255. Tokens larger than this size will be discarded.
stopwords
stopwords_path
{
"settings": {
"analysis": {
"analyzer": {
"my_fingerprint_analyzer": {
"type": "fingerprint",
"stopwords": "_english_",
"max_output_size": 222,
"separator": ","
}
}
}
}
}
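With this configuration the earlier sample sentence should come out as a single comma-joined token, stop words removed (a sketch; the index name my_index and the exact normalization are assumptions):
POST my_index/_analyze
{
"analyzer": "my_fingerprint_analyzer",
"text": "Yes yes, Gödel said this sentence is consistent and."
}
[ consistent,godel,said,sentence,yes ]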
Definition
When the built-in analyzers do not fulfill your needs, you can create a custom analyzer which uses the appropriate combination of character filters, a tokenizer, and token filters.
Configuration:
tokenizer
A built-in or customised tokenizer. (Required)
char_filter
An optional array of built-in or customised character filters.
filter
An optional array of built-in or customised token filters.
position_increment_gap
When indexing an array of text values, Elasticsearch inserts a fake "gap" between the last term of one value and the first term of the next value to ensure that a phrase query doesn’t match two terms from different array elements. Defaults to 100
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom", # 自定义的analyzer其type固定custom
"tokenizer": "standard",
"char_filter": [
"html_strip"
],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
}
}
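Assuming this custom analyzer is created in an index named my_index, HTML is stripped, tokens are lowercased, and accents are folded to ASCII:
POST my_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text": "Is this <b>déjà vu</b>?"
}
[ is, this, deja, vu ]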