(See the article on how to get a working copy of the repository.)
In the previous article, we presented an approach for capturing similarity between words that was concerned with the syntactic similarity of two strings. Today, we are back to discuss another approach that is more concerned with the meaning of words. Semantic similarity is a confidence score that reflects the semantic relation between the meanings of two sentences. It is difficult to achieve a high accuracy score because exact semantic meaning can only be fully understood in a particular context.
The goals of this article are to:
Before we go any further, let us start with a brief introduction to the groundwork.
WordNet is a lexical database which is available online, and provides a large repository of English lexical items. There is a multilingual WordNet for European languages which is structured in the same way as the English language WordNet.
WordNet was designed to establish the connections between four types of parts of speech (POS): noun, verb, adjective, and adverb. The smallest unit in WordNet is the synset, which represents a specific meaning of a word. It includes the word, its explanation, and its synonyms. The specific meaning of one word under one type of POS is called a sense. Each sense of a word is in a different synset. Synsets are equivalent to senses: structures containing sets of terms with synonymous meanings. Each synset has a gloss that defines the concept it represents. For example, the words night, nighttime, and dark constitute a single synset that has the following gloss: the time after sunset and before sunrise while it is dark outside. Synsets are connected to one another through explicit semantic relations. Some of these relations (hypernym and hyponym for nouns, hypernym and troponym for verbs) constitute is-a-kind-of hierarchies, while the holonym and meronym relations (for nouns) constitute is-a-part-of hierarchies.
For example, a tree is a kind of plant, so tree is a hyponym of plant and plant is a hypernym of tree. Analogously, a trunk is a part of a tree, so trunk is a meronym of tree and tree is a holonym of trunk. For one word and one type of POS, if there is more than one sense, WordNet organizes them from the most frequently used to the least frequently used (based on SemCor frequency counts).
Malcolm Crowe and Troy Simpson have developed an Open-Source .NET Framework library for WordNet, called WordNet.Net.
WordNet.Net was originally created by Malcolm Crowe, and it was known as a C# library for WordNet. It was created for WordNet 1.6, and stayed in its original form until after the release of WordNet 2.0 when Troy gained permission from Malcolm to use the code for freeware dictionary/thesaurus projects. Finally, after WordNet 2.1 was released, Troy released his version of Malcolm's library as an LGPL library known as WordNet.Net (with permission from Princeton and Malcolm Crowe, and in consultation with the Free Software Foundation), which was updated to work with the WordNet 2.1 database.
At the time of this writing, the WordNet.Net library has only been open source for a short time, but it is expected to mature as more projects such as this one spawn from its availability. Bug fixing and extensions to Malcolm's original library had been ongoing for over a year and a half prior to the release of the Open Source project. This is the project address of WordNet.Net.
Given two sentences, the measurement determines how similar the meanings of the two sentences are. The higher the score, the more similar the meanings.
Here are the steps for computing semantic similarity between two sentences:
Each sentence is partitioned into a list of words, and the stop words are removed. Stop words are frequently occurring, insignificant words that appear in a database record, article, web page, etc.
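As a rough illustration of this step (the Preprocessor class and the tiny stop-word list below are invented for the example and are not the project's actual code), one might write:

using System;
using System.Collections.Generic;
using System.Linq;

class Preprocessor
{
    // A tiny illustrative stop-word list; a real system would use a much larger one.
    static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "a", "an", "the", "of", "in", "on", "and", "or", "is", "are", "to" });

    // Splits a sentence into lower-cased word tokens and drops the stop words.
    public static List<string> Tokenize(string sentence)
    {
        char[] separators = { ' ', ',', '.', ';', ':', '!', '?' };
        return sentence.ToLower()
                       .Split(separators, StringSplitOptions.RemoveEmptyEntries)
                       .Where(w => !StopWords.Contains(w))
                       .ToList();
    }
}

For instance, Tokenize("The Defense Ministry of the country") would return the tokens defense, ministry, and country.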
This task is to identify the correct part of speech (POS, such as noun, verb, pronoun, adverb, ...) of each word in the sentence. The algorithm takes a sentence and a specified tag set (a finite list of POS tags) as input. The output is a single best POS tag for each word. There are two types of taggers: the first attaches syntactic roles to each word (subject, object, ...), while the second attaches only functional roles (noun, verb, ...). A lot of work has been done on POS tagging. Taggers can be classified as rule-based or stochastic. Rule-based taggers use hand-written rules to resolve tag ambiguity; an example of rule-based tagging is Brill's tagger (the Eric Brill algorithm). Stochastic taggers resolve tagging ambiguities by using a training corpus to compute the probability of a given word having a given tag in a given context, for example, taggers based on Hidden Markov Models or maximum likelihood estimation.
There are two samples included for using the Brill Tagger from a C# application. The Brill Tagger tools, libraries, and samples can be found under the 3rd_Party_Tools_Data folder in the source repository.
One of the available ports is Steven Abbott's VB.NET port of the original Brill Tagger, which has in turn been ported to C# by Troy Simpson. The other is a VC++ port by Paul Maddox. The C# test program for Paul Maddox's port uses a wrapper to read stdout directly from the command line application. The wrapper was created using a template by Mike Mayer.
See the respective test applications for working examples on using the Brill Tagger from C#. The port of Steven Abbott's work is fairly new, but after some testing, it is likely that Paul's VC++ port will be deprecated and replaced with Troy's C# port of Steven's VB.NET work.
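As a toy illustration of the stochastic idea only (this is not Brill's algorithm and not the library's tagger; the ToyTagger class and its tiny frequency table are invented for the example), a most-frequent-tag tagger can be sketched like this:

using System;
using System.Collections.Generic;
using System.Linq;

class ToyTagger
{
    // Tiny hand-made word -> most frequent tag table (illustrative only).
    static readonly Dictionary<string, string> MostFrequentTag = new Dictionary<string, string>
    {
        { "the", "DT" }, { "cat", "NN" }, { "sat", "VBD" }, { "on", "IN" }, { "mat", "NN" }
    };

    // Tags each word with its most frequent tag, defaulting to NN for unknown words.
    public static string Tag(string sentence)
    {
        var tagged = sentence.ToLower()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w + "/" + (MostFrequentTag.ContainsKey(w) ? MostFrequentTag[w] : "NN"));
        return string.Join(" ", tagged);
    }
}

// ToyTagger.Tag("The cat sat on the mat") -> "the/DT cat/NN sat/VBD on/IN the/DT mat/NN"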
We use the Porter stemming algorithm. Porter stemming is a process of removing the common morphological and inflectional endings from words. It can be thought of as a lexicon finite state transducer with the following steps: surface form -> split the word into possible morphemes -> intermediate form -> map stems to categories and affixes to meanings -> underlying form. For example: foxes -> fox + s -> fox.
(+) Currently, these components are not used in the semantic similarity project, but they will soon be integrated. To get an idea of how they work, you can use the PorterStemmer class and the Brill Tagger sample in the repository.
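A hypothetical usage sketch of the stemmer follows; the method name StemWord is a placeholder, so check the PorterStemmer class in the repository for the actual API:

// Hypothetical sketch only: StemWord is a placeholder name, not the class's real method.
var stemmer = new PorterStemmer();
string stem1 = stemmer.StemWord("foxes");   // expected: "fox"
string stem2 = stemmer.StemWord("running"); // expected: "run"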
As you are already aware, a word can have more than one sense, which can lead to ambiguity. For example, the word "interest" has different meanings in the following two contexts:
Disambiguation is the process of finding the most appropriate sense of a word used in a given sentence. The Lesk algorithm [13] uses dictionary definitions (glosses) to disambiguate a polysemous word in a sentence context. Its main idea is to count the number of words shared between two glosses: the more overlapping words, the more related the senses are.
To disambiguate a word, the gloss of each of its senses is compared to the glosses of every other word in a phrase. A word is assigned to the sense whose gloss shares the largest number of words in common with the glosses of the other words.
For example, in performing disambiguation for the phrase "pine cone", according to the Oxford Advanced Learner's Dictionary, the word "pine" has two senses:
The word "cone" has three senses:
By comparing each of the two gloss senses of the word "pine" with each of the three senses of the word "cone", it is found that the words "evergreen tree" occur in one sense of each of the two words. These two senses are therefore declared to be the most appropriate senses when the words "pine" and "cone" are used together.
The original Lesk algorithm begins anew for each word and does not use the senses it has previously assigned. This greedy method does not always work effectively. Therefore, if computation time is not critical, we should look for an optimal sense combination by applying a local search technique such as beam search. The main idea behind such methods is to reduce the search space by applying heuristics. A beam searcher limits its attention to only the k most promising candidates at each stage of the search process, where k is a predefined number.
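A minimal sketch of the beam search idea follows (this is an assumption about how it could be structured, not the project's implementation; the score delegate stands for whatever gloss-overlap scoring is used and must accept partial assignments):

using System;
using System.Collections.Generic;
using System.Linq;

static class BeamSenseSearch
{
    // sensesPerWord[i] = number of senses of word i in the sentence.
    // score = placeholder delegate scoring a (possibly partial) sense assignment.
    public static int[] Search(int[] sensesPerWord, Func<int[], double> score, int k)
    {
        var beam = new List<int[]> { new int[0] };   // partial assignments: one sense index per word so far
        foreach (int senseCount in sensesPerWord)
        {
            var expanded = new List<int[]>();
            foreach (var candidate in beam)
                for (int s = 0; s < senseCount; s++)
                    expanded.Add(candidate.Concat(new[] { s }).ToArray());
            // Keep only the k most promising candidates at this stage.
            beam = expanded.OrderByDescending(score).Take(k).ToList();
        }
        return beam.First();                         // best full sense combination found
    }
}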
The original Lesk algorithm relied only on the gloss of a word and was restricted to its overlap scoring mechanism. In this section, we introduce an adapted version of the algorithm [16] with some improvements to overcome these limitations:
To disambiguate each word in a sentence that has N words, we call the word currently being disambiguated the target word. The algorithm is described in the following steps:
(*) All of them are applied with the same rule.
When computing the relatedness between two synsets s1 and s2, the pair hype-hype means the gloss for the hypernym of s1 is compared to the gloss for the hypernym of s2. The pair hype-hypo means that the gloss for the hypernym of s1 is compared to the gloss for the hyponym of s2.
OverallScore(s1, s2) = Score(hype(s1)-hypo(s2)) + Score(gloss(s1)-hypo(s2)) + Score(hype(s1)-gloss(s2)) + ... (OverallScore(s1, s2) is equivalent to OverallScore(s2, s1)).
In the example of "pine cone", there are three senses of pine and 6 senses of cone, so we can have a total of 18 possible combinations. One of them is the right one.
To score the overlaps, we use a new scoring mechanism that differentiates between single-word and N-consecutive-word overlaps and effectively treats each gloss as a bag of words. It is based on Zipf's law, which says that the length of a word is inversely proportional to its frequency of use: the shortest words are used most often, while the longest ones are used least often.
Measuring the overlap between two strings is reduced to the problem of finding the longest common substring with maximal consecutive words. Each overlap containing N consecutive words contributes N^2 to the score of the gloss sense combination. For example, an overlap "ABC" has a score of 3^2 = 9, whereas the two overlaps "AB" and "C" have a score of 2^2 + 1^2 = 5.
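A minimal sketch of this scoring mechanism (assuming whitespace-tokenized glosses and ignoring any normalization the real code may do) repeatedly finds the longest common run of consecutive words, adds N^2, and removes it:

using System;
using System.Collections.Generic;
using System.Linq;

static class GlossOverlap
{
    // Scores the overlap of two glosses: each shared run of N consecutive words adds N^2.
    public static int Score(string gloss1, string gloss2)
    {
        var a = gloss1.ToLower().Split(' ').ToList();
        var b = gloss2.ToLower().Split(' ').ToList();
        int score = 0;
        while (true)
        {
            // Find the longest common run of consecutive words still present in both glosses.
            int bestLen = 0, bestA = 0, bestB = 0;
            for (int i = 0; i < a.Count; i++)
                for (int j = 0; j < b.Count; j++)
                {
                    int len = 0;
                    while (i + len < a.Count && j + len < b.Count && a[i + len] == b[j + len]) len++;
                    if (len > bestLen) { bestLen = len; bestA = i; bestB = j; }
                }
            if (bestLen == 0) break;
            score += bestLen * bestLen;        // N consecutive words contribute N^2
            a.RemoveRange(bestA, bestLen);     // remove the matched run so it is not counted twice
            b.RemoveRange(bestB, bestLen);
        }
        return score;
    }
}

// Score("a b c x", "a b c y") = 3^2 = 9; Score("a b x c", "a b y c") = 2^2 + 1^2 = 5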
If you intend to work further on this topic, you should also refer to the Hirst-St.Onge measure, which is based on finding lexical chains between synsets.
The above method allows us to find the most appropriate sense for each word in a sentence. To compute the similarity between two sentences, we build on the semantic similarity between word senses, which we capture using path length similarity.
In WordNet, the words of each part of speech (nouns, verbs, ...) are organized into taxonomies, where each node is a set of synonyms (a synset) representing a single sense. If a word has more than one sense, it appears in multiple synsets at various locations in the taxonomy. WordNet defines relations between synsets and relations between word senses. A relation between synsets is a semantic relation, while a relation between word senses is a lexical relation. The difference is that lexical relations are relations between members of two different synsets, whereas semantic relations hold between two whole synsets. For instance:
Using the example, the antonym of the tenth sense of the noun light (light#n#10) in WordNet is the first sense of the noun dark (dark#n#1). The synset to which it belongs is {light#n#10, lighting#n#1}. Clearly, it makes sense that light#n#10 is an antonym of dark#n#1, but lighting#n#1 is not an antonym of dark#n#1; therefore, the antonym relation needs to be a lexical relation, not a semantic relation. Semantic similarity is a special case of semantic relatedness where we only consider the IS-A relationship.
To measure the semantic similarity between two synsets, we use the hyponym/hypernym (is-a) relations. Due to the limitations of the is-a hierarchies, we only work with the "noun-noun" and "verb-verb" parts of speech.
A simple way to measure the semantic similarity between two synsets is to treat the taxonomy as an undirected graph and measure the distance between them in WordNet. As P. Resnik put it: "The shorter the path from one node to another, the more similar they are." Note that the path length is measured in nodes/vertices rather than in links/edges. The length of the path between two members of the same synset is 1 (the synonym relation).
This figure shows an example of the hyponym taxonomy in WordNet used for path length similarity measurement:
In the above figure, we observe that the length between car and auto is 1, between car and truck is 3, between car and bicycle is 4, and between car and fork is 12.
A shared parent of two synsets is known as a subsumer. The least common subsumer (LCS) of two synsets is the subsumer that does not have any children which are also subsumers of the two synsets. In other words, the LCS is the most specific subsumer of the two synsets. Returning to the above example, the LCS of {car, auto, ...} and {truck, ...} is {automotive, motor vehicle}, since {automotive, motor vehicle} is more specific than the common subsumer {wheeled vehicle}.
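To make the node-counting idea concrete, here is a small self-contained sketch; the tiny taxonomy is hand-built from the figure's example and is not read from WordNet:

using System;
using System.Collections.Generic;
using System.Linq;

static class PathLength
{
    // Shortest path length between two concepts, counted in nodes (so synonyms give 1).
    public static int Distance(Dictionary<string, string[]> edges, string start, string goal)
    {
        // Breadth-first search over the undirected is-a graph.
        var dist = new Dictionary<string, int> { { start, 1 } };
        var queue = new Queue<string>();
        queue.Enqueue(start);
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (node == goal) return dist[node];
            foreach (var next in Neighbours(edges, node).Where(n => !dist.ContainsKey(n)))
            {
                dist[next] = dist[node] + 1;
                queue.Enqueue(next);
            }
        }
        return int.MaxValue; // no path found
    }

    static IEnumerable<string> Neighbours(Dictionary<string, string[]> edges, string node)
    {
        string[] parents;
        var up = edges.TryGetValue(node, out parents) ? parents : new string[0];
        var down = edges.Where(e => e.Value.Contains(node)).Select(e => e.Key);
        return up.Concat(down);
    }
}

// Hand-built fragment of the figure's taxonomy (child -> hypernyms), for illustration only:
// var isA = new Dictionary<string, string[]>
// {
//     { "car", new[] { "motor vehicle" } },
//     { "truck", new[] { "motor vehicle" } },
//     { "motor vehicle", new[] { "wheeled vehicle" } },
//     { "bicycle", new[] { "wheeled vehicle" } }
// };
// PathLength.Distance(isA, "car", "truck") == 3, PathLength.Distance(isA, "car", "bicycle") == 4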
The path length gives us a simple way to compute the relatedness distance between two word senses. There are some issues that need to be addressed:
Using the Lexicon class, when considering a word, we first check whether it is a noun; if so, we treat it as a noun and disregard its verb or adjective senses. If it is not a noun, we check whether it is a verb, and so on.

There are many proposals for measuring the semantic similarity between two synsets: Wu & Palmer, Leacock and Chodorow, and P. Resnik. In this work, we experimented with two simple measurements:
Sim(s, t) = 1 / distance(s, t)

where distance(s, t) is the length of the path from s to t using node counting.

The second formula, which was used in the previous article, takes into account not only the length of the path but also the order of the senses involved in the path:
Sim(s, t) = SenseWeight(s) * SenseWeight(t) / PathLength

where:
s and t: denote the source and target words being compared.
SenseWeight: denotes a weight calculated from the frequency of use of this sense and the total frequency of use of all senses of the word.
PathLength: denotes the length of the connection path from s to t.
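The two measurements can be sketched in code as follows (the SenseWeight inputs are the sense frequency counts described above, for example from WordNet's SemCor-based tag counts):

// First measurement: similarity as the inverse of the node-counting distance.
static float Sim1(int pathLength)
{
    return 1.0f / pathLength;
}

// Weight of a sense: its frequency of use over the total frequency of all senses of the word.
static float SenseWeight(int senseFrequency, int totalFrequency)
{
    return (float)senseFrequency / totalFrequency;
}

// Second measurement: also weights each end by how frequently that sense is used.
static float Sim2(float senseWeightS, float senseWeightT, int pathLength)
{
    return senseWeightS * senseWeightT / pathLength;
}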
We will now describe the overall strategy for capturing semantic similarity between two sentences. Given two sentences X and Y, we denote by m the length of X and by n the length of Y. The major steps can be described as follows:
scoreSum <- 0;
foreach (X[i] in X)
{
    bestCandidate <- -1;
    bestScore <- -maxInt;
    foreach (Y[j] in Y)
    {
        if (Y[j] is still free && R[i, j] > bestScore)
        {
            bestScore <- R[i, j];
            bestCandidate <- j;
        }
    }
    if (bestCandidate != -1)
    {
        mark the bestCandidate as matched item;
        scoreSum <- scoreSum + bestScore;
    }
}
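The pseudo-code above can be turned into a small C# routine; this is only a sketch, assuming the pairwise word similarity scores have already been stored in a matrix R[m, n]:

// Greedy bipartite matching: each word of X is matched to the best still-free word of Y.
// R[i, j] holds the semantic similarity between X[i] and Y[j].
static float MatchScore(float[,] R)
{
    int m = R.GetLength(0), n = R.GetLength(1);
    var taken = new bool[n];               // has Y[j] already been matched?
    float scoreSum = 0;
    for (int i = 0; i < m; i++)
    {
        int bestCandidate = -1;
        float bestScore = float.MinValue;
        for (int j = 0; j < n; j++)
        {
            if (!taken[j] && R[i, j] > bestScore)
            {
                bestScore = R[i, j];
                bestCandidate = j;
            }
        }
        if (bestCandidate != -1)
        {
            taken[bestCandidate] = true;   // mark the best candidate as matched
            scoreSum += bestScore;
        }
    }
    return scoreSum;
}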
Here, match(X, Y) denotes the matching word tokens between X and Y. The similarity is computed by dividing the sum of the similarity values of all matched candidates of both sentences X and Y by the total number of tokens. An important point is that the result is built from the individual similarity values, so the overall similarity always reflects their influence. We apply this strategy with the MS1 formula.

For example, given two sentences X and Y with lengths of 3 and 2 respectively, suppose the bipartite matcher returns that X[1] has matched Y[1] with a score of 0.8, and X[2] has matched Y[2] with a score of 0.7:
The overall score is: 2 * (0.8 + 0.7) / (3 + 2) = 0.6.
To run this code, you should install WordNet 2.1. Currently, the source code is stored in the Google Code repository. Please read the article Using the WordNet.Net subversion repository before downloading the source code. The following code is used to test the semantic similarity function:
void Test()
{
    SemanticSimilarity semsim = new SemanticSimilarity();
    float score = semsim.GetScore("Defense Ministry", "Department of defence");
}
Time restrictions are a problem; whenever possible, we would like to:
In this article, you have seen a simple approach to capturing semantic similarity. This work may have many limitations, since we are not an NLP research group, and there are still some things that need to be improved. Once the final work is approved, we will move a copy to CodeProject. This process may take a few working days.
There is a Perl Open Source package for semantic similarity from T. Pedersen and his team. Unfortunately, we do not know Perl; it would be very helpful if someone could migrate it to .NET. We'll stop here for now and hope that others might be inspired to work on WordNet.Net to develop this open source library to make it more useful.
(*) is the co-author of this article.