UIM Analysis Technologies (continuously updated)

  1. Statistical and rule-based Natural Language Processing (NLP)
  2. Information Retrieval (IR)
  3. Machine learning
  4. Ontologies
  5. Automated reasoning
  6. Knowledge sources (e.g., CYC, WordNet, FrameNet)

Analysis Basics

Analysis Engine, Document, Annotator, Annotator Developer, Type, Type System, Feature, Annotation, CAS, Sofa, JCas, UIMA Context.

UIMA's basic building blocks are Analysis Engines (AEs). One way to think about AEs is as software agents that automatically discover and record metadata about original content. The descriptive information produced by AEs is generally referred to as analysis results.

The UIMA framework treats Analysis Engines as pluggable, composable, discoverable, managed objects. At the heart of AEs are the analysis algorithms that do all the work to analyze documents and record analysis results.
UIMA provides a basic component type intended to house the core analysis algorithms running inside AEs. Instances of this component are called Annotators.
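To make the AE/annotator split concrete, here is a minimal Python sketch under stated assumptions: it is not the Apache UIMA API (in UIMA's Java API an annotator typically extends `JCasAnnotator_ImplBase` and overrides `process(JCas)`). The classes `Cas`, `Annotation`, `Annotator`, `AnalysisEngine` and the two regex-based annotators are hypothetical stand-ins that show annotators recording analysis results and an engine composing them.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type_name: str  # e.g. "CapWord"
    begin: int      # start offset into the document text
    end: int        # end offset (exclusive)


@dataclass
class Cas:
    """Toy stand-in for a CAS: document text plus recorded analysis results."""
    text: str
    annotations: list = field(default_factory=list)


class Annotator:
    """Houses one analysis algorithm; subclasses override process()."""

    def process(self, cas: Cas) -> None:
        raise NotImplementedError


class CapitalizedWordAnnotator(Annotator):
    """Hypothetical annotator: records every capitalized word."""

    def process(self, cas: Cas) -> None:
        for m in re.finditer(r"\b[A-Z][a-z]+\b", cas.text):
            cas.annotations.append(Annotation("CapWord", m.start(), m.end()))


class NumberAnnotator(Annotator):
    """Hypothetical annotator: records every run of digits."""

    def process(self, cas: Cas) -> None:
        for m in re.finditer(r"\d+", cas.text):
            cas.annotations.append(Annotation("Number", m.start(), m.end()))


class AnalysisEngine:
    """Composes pluggable annotators and runs them in order over one CAS."""

    def __init__(self, *annotators: Annotator):
        self.annotators = list(annotators)

    def process(self, cas: Cas) -> Cas:
        for annotator in self.annotators:
            annotator.process(cas)
        return cas


engine = AnalysisEngine(CapitalizedWordAnnotator(), NumberAnnotator())
cas = engine.process(Cas("Alice paid 42 dollars to Bob."))
for a in cas.annotations:
    print(a.type_name, cas.text[a.begin:a.end])
```

Swapping or adding annotators only changes the list passed to `AnalysisEngine`; the annotators never call each other and communicate solely through the shared CAS.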

How annotators represent and share their results is an important part of the UIMA architecture. UIMA defines a Common Analysis Structure (CAS) precisely for these purposes.

The CAS is an object-based data structure that allows the representation of objects, properties and values. Object types may be related to each other in a single-inheritance hierarchy.
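The following Python sketch illustrates types, features and single inheritance as described above. It is a simplified model, not UIMA's type system implementation; the type names `TOP`, `Annotation`, `Person` and the feature `fullName` are hypothetical examples.

```python
class Type:
    """A type in a single-inheritance hierarchy: each type has at most one parent."""

    def __init__(self, name, parent=None, features=()):
        self.name = name
        self.parent = parent
        self.own_features = tuple(features)

    def all_features(self):
        # Inherited features first, then the ones declared on this type.
        inherited = self.parent.all_features() if self.parent else ()
        return inherited + self.own_features

    def subsumes(self, other):
        # True if `other` is this type or one of its descendants.
        t = other
        while t is not None:
            if t is self:
                return True
            t = t.parent
        return False


class FeatureStructure:
    """An object of some Type; its features (properties) hold values."""

    def __init__(self, type_, **values):
        unknown = set(values) - set(type_.all_features())
        if unknown:
            raise ValueError(f"{type_.name} has no feature(s) {sorted(unknown)}")
        self.type = type_
        self.values = values


top = Type("TOP")
annotation = Type("Annotation", top, ["begin", "end"])
person = Type("Person", annotation, ["fullName"])

alice = FeatureStructure(person, begin=0, end=5, fullName="Alice Smith")
print(person.all_features())                                     # ('begin', 'end', 'fullName')
print(annotation.subsumes(person), person.subsumes(annotation))  # True False
```

Because each type has exactly one parent, feature inheritance is a simple walk up a chain rather than a merge across several parents.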

AEs analyze one or more views of a document. Each view contains a specific subject of analysis (Sofa), plus a set of indexes holding metadata indexed by that view.
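A small Python sketch of views, assuming a toy model rather than the real API (Apache UIMA's Java `CAS` interface provides `createView` and `getView` for this purpose). `View`, `MultiViewCas` and the view names `html` and `plain` are hypothetical. The point is that each view pairs its own Sofa with its own index, so annotation offsets are only meaningful relative to that view's Sofa.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """One view: a subject of analysis (Sofa) plus that view's own index."""
    sofa: str
    index: list = field(default_factory=list)

    def add(self, type_name, begin, end):
        # Offsets refer to this view's Sofa, not to any other view.
        self.index.append((type_name, begin, end))

    def covered_text(self, begin, end):
        return self.sofa[begin:end]


class MultiViewCas:
    """Toy CAS holding several named views of the same document."""

    def __init__(self):
        self.views = {}

    def create_view(self, name, sofa):
        self.views[name] = View(sofa)
        return self.views[name]

    def get_view(self, name):
        return self.views[name]


cas = MultiViewCas()
html = cas.create_view("html", "<p>Hello <b>world</b></p>")
plain = cas.create_view("plain", "Hello world")

plain.add("Token", 0, 5)
plain.add("Token", 6, 11)
html.add("Tag", 0, 3)

print([plain.covered_text(b, e) for _, b, e in plain.index])  # ['Hello', 'world']
print(len(html.index), len(plain.index))                       # 1 2
```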

The two main interfaces that a UIMA component developer interacts with are the CAS and the UIMA Context. In addition to interacting with the CAS, the component developer can access external resources through the UIMA Context, the framework's resource manager interface.
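The value of the UIMA Context is that an annotator asks for resources by name instead of hard-coding file paths. This Python sketch models that idea under stated assumptions; in Apache UIMA's Java API, `UimaContext` offers methods such as `getConfigParameterValue` and `getResourceObject` for this role. `ContextSketch`, `DictionaryAnnotator`, the resource name `CityDictionary` and the parameter `caseSensitive` are all hypothetical.

```python
import re


class ContextSketch:
    """Hypothetical context: named resources and parameters supplied by the framework."""

    def __init__(self, resources, parameters=None):
        self._resources = dict(resources)
        self._parameters = dict(parameters or {})

    def get_resource(self, name):
        return self._resources[name]

    def get_parameter(self, name, default=None):
        return self._parameters.get(name, default)


class DictionaryAnnotator:
    """Marks words that appear in an externally supplied dictionary."""

    def initialize(self, context):
        # Look up the resource by name; where it is loaded from
        # (file, URL, database) is configured outside the annotator.
        self.dictionary = context.get_resource("CityDictionary")
        self.case_sensitive = context.get_parameter("caseSensitive", False)

    def process(self, text):
        found = []
        for m in re.finditer(r"\w+", text):
            word = m.group()
            key = word if self.case_sensitive else word.lower()
            if key in self.dictionary:
                found.append((m.start(), m.end(), word))
        return found


context = ContextSketch({"CityDictionary": {"paris", "tokyo"}})
annotator = DictionaryAnnotator()
annotator.initialize(context)
print(annotator.process("Flights from Paris to Tokyo"))
# [(13, 18, 'Paris'), (22, 27, 'Tokyo')]
```

Changing the dictionary or the matching behaviour only requires a different context; the annotator code stays the same.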

For every component specified in UIMA there are two parts required for its implementation:
    1. the declarative part and
    2. the code part.
The declarative part contains metadata describing the component, its identity, structure and behavior, and is called the Component Descriptor. Component descriptors are represented in XML. The code part implements the algorithm and may be a program in Java.
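To show how the two parts fit together, here is a Python sketch in which a framework reads a descriptor and instantiates the code it names. The XML is simplified and only modeled on UIMA descriptor element names (`frameworkImplementation`, `annotatorImplementationName`, `analysisEngineMetaData`); it omits the XML namespace and other required elements, so it is not a valid UIMA descriptor. `UppercaseAnnotator`, `example.UppercaseAnnotator` and `REGISTRY` are hypothetical; the registry stands in for loading a class by name.

```python
import xml.etree.ElementTree as ET

# Declarative part: simplified, hypothetical descriptor (namespace omitted).
DESCRIPTOR = """<analysisEngineDescription>
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <annotatorImplementationName>example.UppercaseAnnotator</annotatorImplementationName>
  <analysisEngineMetaData>
    <name>Uppercase Annotator</name>
  </analysisEngineMetaData>
</analysisEngineDescription>"""


class UppercaseAnnotator:
    """Code part: the algorithm. Returns the fully uppercase words."""

    def process(self, text):
        return [w for w in text.split() if w.isupper()]


# Hypothetical registry standing in for class loading by name.
REGISTRY = {"example.UppercaseAnnotator": UppercaseAnnotator}


def load_component(xml_text):
    # Read identity and implementation name from the descriptor,
    # then instantiate the matching code.
    root = ET.fromstring(xml_text)
    impl_name = root.findtext("annotatorImplementationName")
    name = root.findtext("analysisEngineMetaData/name")
    return name, REGISTRY[impl_name]()


name, annotator = load_component(DESCRIPTOR)
print(name, annotator.process("UIMA uses XML descriptors"))
```

Keeping metadata in the descriptor lets a framework discover and configure a component without running its code first.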