[[partial-matching]]
== Partial Matching
A keen observer will notice that all the queries so far in this book have
operated on whole terms.(((“partial matching”))) To match something, the smallest unit had to be a
single term. You can find only terms that exist in the inverted index.
But what happens if you want to match parts of a term but not the whole thing?
Partial matching allows users to specify a portion of the term they are
looking for and find any words that contain that fragment.
The requirement to match on part of a term is less common in the full-text
search-engine world than you might think. If you have come from an SQL
background, you likely have, at some stage of your career,
implemented a poor man’s full-text search using SQL constructs like this:
WHERE text LIKE "*quick*"
  AND text LIKE "*brown*"
  AND text LIKE "*fox*" <1>

<1> *fox* would match ``fox'' and ``foxes.''
Of course, with Elasticsearch, we have the analysis process and the inverted
index that remove the need for such brute-force techniques. To handle the
case of matching both ``fox'' and ``foxes,'' we could simply use a stemmer to
index words in their root form. There is no need to match partial terms.
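For instance, a minimal sketch of such a mapping (the my_stemmed_index and my_type names are illustrative, not from this chapter) could rely on the built-in english analyzer, whose stemmer reduces foxes to fox at index time:

PUT /my_stemmed_index
{
    "mappings": {
        "my_type": {
            "properties": {
                "text": {
                    "type":     "string",
                    "analyzer": "english"
                }
            }
        }
    }
}

A match query for either fox or foxes would then find both forms, because both are indexed as the same root term.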
That said, on some occasions partial matching can be useful.
Common use (((“partial matching”, “common use cases”)))cases include the following:

* Matching postal codes, product serial numbers, or other not_analyzed
  values that start with a particular prefix or match a wildcard pattern
  or even a regular expression

* search-as-you-type—displaying the most likely results before the
  user has finished typing the search terms

* Matching in languages like German or Dutch, which contain long compound
  words, like Weltgesundheitsorganisation (World Health Organization)
We will start by examining prefix matching on exact-value not_analyzed
fields.
=== Postcodes and Structured Data
We will use United Kingdom postcodes (postal codes in the United States) to illustrate how(((“partial matching”, “postcodes and structured data”))) to use partial matching with
structured data. UK postcodes have a well-defined structure. For instance, the
postcode W1V 3DG
can(((“postcodes (UK), partial matching with”))) be broken down as follows:
W1V::
This outer part identifies the postal area and district:

* W indicates the area (one or two letters)
* 1V indicates the district (one or two numbers, possibly followed by a letter)

3DG::
This inner part identifies a street or building:

* 3 indicates the sector (one number)
* DG indicates the unit (two letters)
Let’s assume that we are indexing postcodes as exact-value not_analyzed
fields, so we could create our index as follows:
PUT /my_index
{
    "mappings": {
        "address": {
            "properties": {
                "postcode": {
                    "type":  "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}
// SENSE: 130_Partial_Matching/10_Prefix_query.json
And index some (((“indexing”, “postcodes”)))postcodes:
PUT /my_index/address/1
{ "postcode": "W1V 3DG" }

PUT /my_index/address/2
{ "postcode": "W2F 8HW" }

PUT /my_index/address/3
{ "postcode": "W1F 7HW" }

PUT /my_index/address/4
{ "postcode": "WC1N 1LZ" }

PUT /my_index/address/5
{ "postcode": "SW5 0BE" }
// SENSE: 130_Partial_Matching/10_Prefix_query.json
Now our data is ready to be queried.
[[prefix-query]]
=== prefix Query
To find all postcodes beginning with W1
, we could use a (((“prefix query”)))(((“postcodes (UK), partial matching with”, “prefix query”)))simple prefix
query:
GET /my_index/address/_search
{
    "query": {
        "prefix": {
            "postcode": "W1"
        }
    }
}
// SENSE: 130_Partial_Matching/10_Prefix_query.json
The prefix
query is a low-level query that works at the term level. It
doesn’t analyze the query string before searching. It assumes that you have
passed it the exact prefix that you want to find.
By default, the prefix
query does no relevance scoring. It just finds
matching documents and gives them all a score of 1
. Really, it behaves more
like a filter than a query. The only practical difference between the
prefix
query and the prefix
filter is that the filter can be cached.
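As a sketch of the filter form (reusing the same postcode data, and wrapping the filter in a filtered query, which is one way to apply a filter in the version of Elasticsearch covered here):

GET /my_index/address/_search
{
    "query": {
        "filtered": {
            "filter": {
                "prefix": {
                    "postcode": "W1"
                }
            }
        }
    }
}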
==================================================
Previously, we said that ``you can find only terms that exist in the inverted
index,'' but we haven't done anything special to index these postcodes; each
postcode is simply indexed as the exact value specified in each document. So
how does the prefix query work?
[role="pagebreak-after"]
Remember that the inverted index consists(((“inverted index”, “for postcodes”))) of a sorted list of unique terms (in
this case, postcodes). For each term, it lists the IDs of the documents
containing that term in the postings list. The inverted index for our
example documents looks something like this:
Term:          Doc IDs:
-------------------------
"SW5 0BE" | 5
"W1F 7HW" | 3
"W1V 3DG" | 1
"W2F 8HW" | 2
"WC1N 1LZ" | 4
-------------------------
To support prefix matching on the fly, the query does the following:

1. Skip through the terms list to find the first term beginning with W1.
2. Collect the associated document IDs.
3. Move to the next term.
4. If the term also begins with W1, the query repeats from step 2; otherwise, we're finished.

==================================================

While this works fine for our small example, imagine that our inverted index
contains a million postcodes beginning with W1
. The prefix query
would need to visit all one million terms in order to calculate the result!
And the shorter the prefix, the more terms need to be visited. If we were to
look for the prefix W
instead of W1
, perhaps we would match 10 million
terms instead of just one million.
CAUTION: The prefix
query or filter are useful for ad hoc prefix matching, but
should be used with care. (((“prefix query”, “caution with”))) They can be used freely on fields with a small
number of terms, but they scale poorly and can put your cluster under a lot of
strain. Try to limit their impact on your cluster by using a long prefix;
this reduces the number of terms that need to be visited.
Later in this chapter, we present an alternative index-time solution that
makes prefix matching much more efficient. But first, we’ll take a look at
two related queries: the wildcard
and regexp
queries.
=== wildcard and regexp Queries
The wildcard
query is a low-level, term-based query (((“wildcard query”)))(((“partial matching”, “wildcard and regexp queries”)))similar in nature to the
prefix
query, but it allows you to specify a pattern instead of just a prefix.
It uses the standard shell wildcards: ?
matches any character, and *
matches zero or more characters.(((“postcodes (UK), partial matching with”, “wildcard queries”)))
This query would match the documents containing W1F 7HW
and W2F 8HW
:
GET /my_index/address/_search
{
    "query": {
        "wildcard": {
            "postcode": "W?F*HW" <1>
        }
    }
}
// SENSE: 130_Partial_Matching/15_Wildcard_regexp.json
<1> The ?
matches the 1
and the 2
, while the *
matches the space
and the 7
and 8
.
Imagine now that you want to match all postcodes just in the W
area. A
prefix match would also include postcodes starting with WC
, and you would
have a similar problem with a wildcard match. We want to match only postcodes
that begin with a W
, followed by a number.(((“postcodes (UK), partial matching with”, “regexp query”)))(((“regexp query”))) The regexp
query allows you to
write these more complicated patterns:
GET /my_index/address/_search
{
    "query": {
        "regexp": {
            "postcode": "W[0-9].+" <1>
        }
    }
}
// SENSE: 130_Partial_Matching/15_Wildcard_regexp.json
<1> The regular expression says that the term must begin with a W
, followed
by any number from 0 to 9, followed by one or more other characters.
The wildcard
and regexp
queries work in exactly the same way as the
prefix
query. They also have to scan the list of terms in the inverted
index to find all matching terms, and gather document IDs term by term. The
only difference between them and the prefix
query is that they support more-complex patterns.
This means that the same caveats apply. Running these queries on a field with
many unique terms can be resource intensive indeed. Avoid using a
pattern that starts with a wildcard (for example, *foo
or, as a regexp, .*foo
).
Whereas prefix matching can be made more efficient by preparing your data at
index time, wildcard and regular expression matching can be done only
at query time. These queries have their place but should be used sparingly.
==================================================

The prefix
, wildcard
, and regexp
queries operate on terms. If you use
them to query an analyzed
field, they will examine each term in the
field, not the field as a whole.(((“prefix query”, “on analyzed fields”)))(((“wildcard query”, “on analyzed fields”)))(((“regexp query”, “on analyzed fields”)))(((“analyzed fields”, “prefix, wildcard, and regexp queries on”)))
For instance, let's say that our title
field contains ``Quick brown fox'' which produces the terms quick,
brown, and fox.

This query would match:

{ "regexp": { "title": "br.*" }}

But neither of these queries would match:

{ "regexp": { "title": "Qu.*" }} <1>
{ "regexp": { "title": "quick br*" }} <2>

<1> The term in the index is quick, not Quick.
<2> quick and brown are separate terms.
=================================================
=== Query-Time Search-as-You-Type
Leaving postcodes behind, let’s take a look at how prefix matching can help
with full-text queries. (((“partial matching”, “query time search-as-you-type”))) Users have become accustomed to seeing search results
before they have finished typing their query–so-called instant search, or
search-as-you-type. (((“search-as-you-type”)))(((“instant search”))) Not only do users receive their search results in less
time, but we can guide them toward results that actually exist in our index.
For instance, if a user types in johnnie walker bl
, we would like to show results for Johnnie Walker Black Label and Johnnie Walker Blue
Label before they can finish typing their query.
As always, there are more ways than one to skin a cat! We will start by
looking at the way that is simplest to implement. You don’t need to prepare your
data in any way; you can implement search-as-you-type at query time on any
full-text field.
In <>, we introduced the match_phrase
query, which matches
all the specified words in the same positions relative to each other. For
query-time search-as-you-type, we can use a specialization of this query,
called (((“prefix query”, “match_phrase_prefix query”)))(((“match_phrase_prefix query”)))the match_phrase_prefix
query:
{
    "match_phrase_prefix" : {
        "brand" : "johnnie walker bl"
    }
}
// SENSE: 130_Partial_Matching/20_Match_phrase_prefix.json
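For reference, here is a sketch of the same clause embedded in a full search request; the my_index and my_type names are placeholders, since the original snippet shows only the query clause:

GET /my_index/my_type/_search
{
    "query": {
        "match_phrase_prefix": {
            "brand": "johnnie walker bl"
        }
    }
}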
This query behaves in the same way as the match_phrase
query, except that it
treats the last word in the query string as a prefix. In other words, the
preceding example would look for the following:
* johnnie
* Followed by walker
* Followed by bl
If you were to run this query through the validate-query
API, it would
produce this explanation:
"johnnie walker bl*"
Like the match_phrase
query, it accepts a slop
parameter (see <>) to
make the word order and relative positions (((“slop parameter”, “match_phrase_prefix query”)))(((“match_phrase_prefix query”, “slop parameter”)))somewhat less rigid:
{
    "match_phrase_prefix" : {
        "brand" : {
            "query": "walker johnnie bl", <1>
            "slop":  10
        }
    }
}
// SENSE: 130_Partial_Matching/20_Match_phrase_prefix.json
<1> Even though the words are in the wrong order, the query still matches
because we have set a high enough slop
value to allow some flexibility
in word positions.
However, it is always only the last word in the query string that is treated
as a prefix.
Earlier, in <>, we warned about the perils of the prefix–how
prefix
queries can be resource intensive. The same is true in this
case.(((“match_phrase_prefix query”, “caution with”))) A prefix of a
could match hundreds of thousands of terms. Not only
would matching on this many terms be resource intensive, but it would also not be
useful to the user.
We can limit the impact (((“match_phrase_prefix query”, “max_expansions”)))(((“max_expansions parameter”)))of the prefix expansion by setting max_expansions
to
a reasonable number, such as 50:
{
    "match_phrase_prefix" : {
        "brand" : {
            "query":          "johnnie walker bl",
            "max_expansions": 50
        }
    }
}
// SENSE: 130_Partial_Matching/20_Match_phrase_prefix.json
The max_expansions
parameter controls how many terms the prefix is allowed
to match. It will find the first term starting with bl
and keep collecting
terms (in alphabetical order) until it either runs out of terms with prefix
bl
, or it has more terms than max_expansions
.
Don’t forget that we have to run this query every time the user types another
character, so it needs to be fast. If the first set of results isn’t what users are after, they’ll keep typing until they get the results that they want.
=== Index-Time Optimizations
All of the solutions we’ve talked about so far are implemented at
query time. (((“index time optimizations”)))(((“partial matching”, “index time optimizations”)))They don’t require any special mappings or indexing patterns;
they simply work with the data that you’ve already indexed.
The flexibility of query-time operations comes at a cost: search performance.
Sometimes it may make sense to move the cost away from the query. In a real-
time web application, an additional 100ms may be too much latency to tolerate.
By preparing your data at index time, you can make your searches more flexible
and improve performance. You still pay a price: increased index size and
slightly slower indexing throughput, but it is a price you pay once at index
time, instead of paying it on every query.
Your users will thank you.
=== Ngrams for Partial Matching
As we have said before, ``You can find only terms that exist in the inverted
index.'' Although the prefix, wildcard, and regexp queries demonstrated that
that is not strictly true, it is true that doing a single-term lookup is
much faster than iterating through the terms list to find matching terms on
the fly.(((“partial matching”, “index time optimizations”, “n-grams”))) Preparing your data for partial matching ahead of time will increase
your search performance.
Preparing your data at index time means choosing the right analysis chain, and
the tool that we use for partial matching is the n-gram.(((“n-grams”))) An n-gram can be
best thought of as a moving window on a word. The n stands for a length.
If we were to n-gram the word quick
, the results would depend on the length
we have chosen:
[horizontal]
* Length 1 (unigram):   [ q, u, i, c, k ]
* Length 2 (bigram):    [ qu, ui, ic, ck ]
* Length 3 (trigram):   [ qui, uic, ick ]
* Length 4 (four-gram): [ quic, uick ]
* Length 5 (five-gram): [ quick ]
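As an illustration, the bigram row above could be produced with a plain ngram token filter. This is only a sketch; the bigram_test index and the filter and analyzer names are invented for the example:

PUT /bigram_test
{
    "settings": {
        "analysis": {
            "filter": {
                "bigrams_filter": {
                    "type":     "ngram",
                    "min_gram": 2,
                    "max_gram": 2
                }
            },
            "analyzer": {
                "bigrams": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter":    [ "lowercase", "bigrams_filter" ]
                }
            }
        }
    }
}

GET /bigram_test/_analyze?analyzer=bigrams
quick

Running the analyze request on the word quick should return qu, ui, ic, and ck.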
Plain n-grams are useful for matching somewhere within a word, a technique
that we will use in <>. However, for search-as-you-type,
we use a specialized form of n-grams called edge n-grams. (((“edge n-grams”))) Edge
n-grams are anchored to the beginning of the word. Edge n-gramming the word
quick
would result in this:
q
qu
qui
quic
quick
You may notice that this conforms exactly to the letters that a user searching for “quick” would type. In other words, these are the
perfect terms to use for instant search!
=== Index-Time Search-as-You-Type
The first step to setting up index-time search-as-you-type is to(((“search-as-you-type”, “index time”)))(((“partial matching”, “index time search-as-you-type”))) define our
analysis chain, which we discussed in <>, but we will
go over the steps again here.
==== Preparing the Index
The first step is to configure a (((“partial matching”, “index time search-as-you-type”, “preparing the index”)))custom edge_ngram
token filter,(((“edge_ngram token filter”))) which we
will call the autocomplete_filter
:
{
    "filter": {
        "autocomplete_filter": {
            "type":     "edge_ngram",
            "min_gram": 1,
            "max_gram": 20
        }
    }
}
This configuration says that, for any term that this token filter receives,
it should produce an n-gram anchored to the start of the word of minimum
length 1 and maximum length 20.
Then we need to use this token filter in a custom analyzer,(((“analyzers”, “autocomplete custom analyzer”))) which we will call
the autocomplete
analyzer:
{
    "analyzer": {
        "autocomplete": {
            "type":      "custom",
            "tokenizer": "standard",
            "filter": [
                "lowercase",
                "autocomplete_filter" <1>
            ]
        }
    }
}
<1> Our custom edge-ngram token filter
This analyzer will tokenize a string into individual terms by using the
standard
tokenizer, lowercase each term, and then produce edge n-grams of each
term, thanks to our autocomplete_filter
.
The full request to create the index and instantiate the token filter and
analyzer looks like this:
PUT /my_index
{
    "settings": {
        "number_of_shards": 1, <1>
        "analysis": {
            "filter": {
                "autocomplete_filter": { <2>
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter" <3>
                    ]
                }
            }
        }
    }
}
// SENSE: 130_Partial_Matching/35_Search_as_you_type.json
<1> See <>.
<2> First we define our custom token filter.
<3> Then we use it in an analyzer.
You can test this new analyzer to make sure it is behaving correctly by using
the analyze
API:
GET /my_index/_analyze?analyzer=autocomplete
quick brown
// SENSE: 130_Partial_Matching/35_Search_as_you_type.json
The results show us that the analyzer is working correctly. It returns these
terms:
q
qu
qui
quic
quick
b
br
bro
brow
brown
To use the analyzer, we need to apply it to a field, which we can do
with(((“update-mapping API, applying custom autocomplete analyzer to a field”))) the update-mapping
API:
PUT /my_index/_mapping/my_type
{
    "my_type": {
        "properties": {
            "name": {
                "type":     "string",
                "analyzer": "autocomplete"
            }
        }
    }
}
// SENSE: 130_Partial_Matching/35_Search_as_you_type.json
Now, we can index some test documents:
POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "name": "Brown foxes" }
{ "index": { "_id": 2 }}
{ "name": "Yellow furballs" }
// SENSE: 130_Partial_Matching/35_Search_as_you_type.json
==== Querying the Field
If you test out a query for ``brown fo'' by using ((("partial matching", "index time search-as-you-type", "querying the field")))a simple
match query
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": "brown fo"
        }
    }
}
// SENSE: 130_Partial_Matching/35_Search_as_you_type.json
you will see that both documents match, even though the Yellow furballs
doc contains neither brown
nor fo
:
{
    "hits": [
        {
            "_id": "1",
            "_score": 1.5753809,
            "_source": {
                "name": "Brown foxes"
            }
        },
        {
            "_id": "2",
            "_score": 0.012520773,
            "_source": {
                "name": "Yellow furballs"
            }
        }
    ]
}
As always, the validate-query
API shines some light:
GET /my_index/my_type/_validate/query?explain
{
    "query": {
        "match": {
            "name": "brown fo"
        }
    }
}
// SENSE: 130_Partial_Matching/35_Search_as_you_type.json
The explanation
shows us that the query is looking for edge n-grams of every
word in the query string:
name:b name:br name:bro name:brow name:brown name:f name:fo
The name:f
condition is satisfied by the second document because
furballs
has been indexed as f
, fu
, fur
, and so forth. In retrospect, this
is not surprising. The same autocomplete
analyzer is being applied both at
index time and at search time, which in most situations is the right thing to
do. This is one of the few occasions when it makes sense to break this rule.
We want to ensure that our inverted index contains edge n-grams of every word,
but we want to match only the full words that the user has entered (brown
and fo
). (((“analyzers”, “changing search analyzer from index analyzer”))) We can do this by using the autocomplete
analyzer at
index time and the standard
analyzer at search time. One way to change the
search analyzer is just to specify it in the query:
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": {
                "query":    "brown fo",
                "analyzer": "standard" <1>
            }
        }
    }
}
// SENSE: 130_Partial_Matching/35_Search_as_you_type.json
<1> This overrides the analyzer
setting on the name
field.
Alternatively, we can specify (((“search_analyzer parameter”)))(((“index_analyzer parameter”)))the index_analyzer
and search_analyzer
in
the mapping for the name
field itself. Because we want to change only the
search_analyzer
, we can update the existing mapping without having to
reindex our data:
PUT /my_index/my_type/_mapping
{
    "my_type": {
        "properties": {
            "name": {
                "type":            "string",
                "index_analyzer":  "autocomplete", <1>
                "search_analyzer": "standard" <2>
            }
        }
    }
}
// SENSE: 130_Partial_Matching/35_Search_as_you_type.json
<1> Use the autocomplete
analyzer at index time to produce edge n-grams of
every term.
<2> Use the standard
analyzer at search time to search only on the terms
that the user has entered.
If we were to repeat the validate-query
request, it would now give us this
explanation:
name:brown name:fo
Repeating our query correctly returns just the Brown foxes
document.
Because most of the work has been done at index time, all this query needs to
do is to look up the two terms brown
and fo
, which is much more efficient
than the match_phrase_prefix
approach of having to find all terms beginning
with fo
.
.Completion Suggester
Using edge n-grams for search-as-you-type is easy to set up, flexible, and
fast. However, sometimes it is not fast enough. Latency matters, especially
when you are trying to provide instant feedback. Sometimes the fastest way of
searching is not to search at all.
The http://bit.ly/1IChV5j[completion suggester] in
Elasticsearch(((“completion suggester”))) takes a completely different approach. You feed it a list
of all possible completions, and it builds them into a _finite state
transducer_, an(((“Finite State Transducer”))) optimized data structure that resembles a big graph. To
search for suggestions, Elasticsearch starts at the beginning of the graph and
moves character by character along the matching path. Once it has run out of
user input, it looks at all possible endings of the current path to produce a
list of suggestions.
This data structure lives in memory and makes prefix lookups extremely fast,
much faster than any term-based query could be. It is an excellent match for
autocompletion of names and brands, whose words are usually organized in a
common order: ``Johnny Rotten'' rather than ``Rotten Johnny.''
When word order is less predictable, edge n-grams can be a better solution
than the completion suggester. This particular cat may be skinned in myriad
ways.
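As a rough sketch of what using the completion suggester involves (the suggest_demo index, product type, and name_suggest field are hypothetical, not part of this chapter's examples):

PUT /suggest_demo
{
    "mappings": {
        "product": {
            "properties": {
                "name_suggest": {
                    "type": "completion"
                }
            }
        }
    }
}

PUT /suggest_demo/product/1
{ "name_suggest": "Johnnie Walker Black Label" }

POST /suggest_demo/_suggest
{
    "brands": {
        "text": "johnnie walker bl",
        "completion": {
            "field": "name_suggest"
        }
    }
}

The suggestions returned by _suggest would include the full brand names, without executing a normal search at all.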
==== Edge n-grams and Postcodes
The edge n-gram approach can(((“postcodes (UK), partial matching with”, “using edge n-grams”)))(((“edge n-grams”, “and postcodes”))) also be used for structured data, such as the
postcodes example from <>. Of course, the postcode field would need to be
analyzed instead of not_analyzed, but we could use the keyword tokenizer to
treat the postcodes as if they were not_analyzed.

==================================================
The keyword
tokenizer is the no-operation tokenizer, the tokenizer that does
nothing. Whatever string it receives as input, it emits exactly the same
string as a single token. It can therefore be used for values that we would
normally treat as not_analyzed
but that require some other analysis
transformation such as lowercasing.
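For example, a quick check with the analyze API (assuming the tokenizer and filters query-string parameters available in this version of Elasticsearch) shows the whole input surviving as a single, lowercased token:

GET /_analyze?tokenizer=keyword&filters=lowercase
W1V 3DG

This should return the single term w1v 3dg.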
==================================================
This example uses the keyword
tokenizer to convert the postcode string into a token stream, so that we can use the edge n-gram token filter:
{
    "analysis": {
        "filter": {
            "postcode_filter": {
                "type":     "edge_ngram",
                "min_gram": 1,
                "max_gram": 8
            }
        },
        "analyzer": {
            "postcode_index": { <1>
                "tokenizer": "keyword",
                "filter":    [ "postcode_filter" ]
            },
            "postcode_search": { <2>
                "tokenizer": "keyword"
            }
        }
    }
}
// SENSE: 130_Partial_Matching/35_Postcodes.json
<1> The postcode_index
analyzer would use the postcode_filter
to turn postcodes into edge n-grams.
<2> The postcode_search
analyzer would treat search terms as
if they were not_analyzed.
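A sketch of how these analyzers might then be applied to the postcode field, reusing the index_analyzer/search_analyzer split shown earlier (the address type is carried over from the earlier postcode example):

PUT /my_index/_mapping/address
{
    "address": {
        "properties": {
            "postcode": {
                "type":            "string",
                "index_analyzer":  "postcode_index",
                "search_analyzer": "postcode_search"
            }
        }
    }
}

A match query for W1 would then behave like a prefix lookup, because the edge n-grams were generated at index time and the search term is left whole.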
[[ngrams-compound-words]]
=== Ngrams for Compound Words
Finally, let’s take a look at how n-grams can be used to search languages with
compound words. (((“languages”, “using many compound words, indexing of”)))(((“n-grams”, “using with compound words”)))(((“partial matching”, “using n-grams for compound words”)))(((“German”, “compound words in”))) German is famous for combining several small words into one
massive compound word in order to capture precise or complex meanings. For
example:
Aussprachewörterbuch::
Pronunciation dictionary
Militärgeschichte::
Military history
Weißkopfseeadler::
White-headed sea eagle, or bald eagle
Weltgesundheitsorganisation::
World Health Organization
Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz::
The law concerning the delegation of duties for the supervision of cattle
marking and the labeling of beef
Somebody searching for ``Wörterbuch'' (dictionary) would probably expect to
see ``Aussprachewörterbuch'' in the results list. Similarly, a search for
``Adler'' (eagle) should include ``Weißkopfseeadler.''
One approach to indexing languages like this is to break compound words into
their constituent parts using the http://bit.ly/1ygdjjC[compound word token filter].
However, the quality of the results depends on how good your compound-word
dictionary is.
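For reference, a sketch of that approach using the dictionary_decompounder token filter (the index name and the tiny word list here are purely illustrative; a real dictionary would be far larger):

PUT /my_decompound_index
{
    "settings": {
        "analysis": {
            "filter": {
                "german_decompounder": {
                    "type":      "dictionary_decompounder",
                    "word_list": [ "welt", "gesundheit", "organisation" ]
                }
            },
            "analyzer": {
                "german_compounds": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "german_decompounder"
                    ]
                }
            }
        }
    }
}

With this analyzer, Weltgesundheitsorganisation would be indexed both as the whole word and as the dictionary words it contains, so a search for Gesundheit could match it.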
Another approach is just to break all words into n-grams and to search for any
matching fragments–the more fragments that match, the more relevant the
document.
Given that an n-gram is a moving window on a word, an n-gram of any length
will cover all of the word. We want to choose a length that is long enough
to be meaningful, but not so long that we produce far too many unique terms.
A trigram (length 3) is (((“trigrams”)))probably a good starting point:
PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "trigrams_filter": {
                    "type":     "ngram",
                    "min_gram": 3,
                    "max_gram": 3
                }
            },
            "analyzer": {
                "trigrams": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "trigrams_filter"
                    ]
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "text": {
                    "type":     "string",
                    "analyzer": "trigrams" <1>
                }
            }
        }
    }
}
// SENSE: 130_Partial_Matching/40_Compound_words.json
<1> The text
field uses the trigrams
analyzer to index its contents as
n-grams of length 3.
Testing the trigrams analyzer with the analyze
API
GET /my_index/_analyze?analyzer=trigrams
Weißkopfseeadler
// SENSE: 130_Partial_Matching/40_Compound_words.json
returns these terms:
wei, eiß, ißk, ßko, kop, opf, pfs, fse, see, eea, ead, adl, dle, ler
We can index our example compound words to test this approach:
POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "text": "Aussprachewörterbuch" }
{ "index": { "_id": 2 }}
{ "text": "Militärgeschichte" }
{ "index": { "_id": 3 }}
{ "text": "Weißkopfseeadler" }
{ "index": { "_id": 4 }}
{ "text": "Weltgesundheitsorganisation" }
{ "index": { "_id": 5 }}
{ "text": "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz" }
// SENSE: 130_Partial_Matching/40_Compound_words.json
A search for ``Adler'' (eagle) becomes a query for the three terms adl, dle,
and ler:
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "text": "Adler"
        }
    }
}
// SENSE: 130_Partial_Matching/40_Compound_words.json
which correctly matches “Weißkopfsee-adler”:
{
    "hits": [
        {
            "_id": "3",
            "_score": 3.3191128,
            "_source": {
                "text": "Weißkopfseeadler"
            }
        }
    ]
}
// SENSE: 130_Partial_Matching/40_Compound_words.json
A similar query for ``Gesundheit'' (health) correctly matches
``Welt-__gesundheit__-sorganisation,'' but it also matches
``Militär-__ges__-chichte'' and
``Rindfleischetikettierungsüberwachungsaufgabenübertragungs-__ges__-etz,''
both of which also contain the trigram ges
.
Judicious use of the minimum_should_match
parameter can remove these
spurious results by requiring that a minimum number of trigrams must be
present for a document to be considered a match:
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "text": {
                "query":                "Gesundheit",
                "minimum_should_match": "80%"
            }
        }
    }
}
// SENSE: 130_Partial_Matching/40_Compound_words.json
This is a bit of a shotgun approach to full-text search and can result in a
large inverted index, but it is an effective generic way of indexing languages
that use many compound words or that don’t use whitespace between words,
such as Thai.
This technique is used to increase recall—the number of relevant
documents that a search returns. It is usually used in combination with
other techniques, such as shingles (see <>) to improve precision and
the relevance score of each document.