This is a review of current semantic web platforms and their usability for enterprise-level semantic web technology development. It is part of a research project funded by the UK Technology Strategy Board. The analysis is somewhat out of date but can still be used for reference.
GATE Platform
Strengths | Weaknesses |
1. Java-based, open-source (LGPL) framework; platform-independent;
2. Mature and robust platform: developed by the University of Sheffield since 1996;
3. GATE Embedded (pure Java) can be integrated into an enterprise platform;
4. An integrated development environment (GATE Developer): eases the burden of language processing with a comprehensive set of information extraction plug-ins;
5. Highly customizable for specific domain applications;
6. High interoperability: the flexible component model supports plug-ins for any additional NLP tool, visualisation plug-in or external storage solutions;
7. A large community of developers;
8. Semantic Indexing, storage and querying supported by GATE Mimir (based on BigOWLIM);
9. Extensibility of NER: can be configured to recognise new entities using a rule-based grammar & GATE gazetteer to identify named entities;
10. Advanced text processing: handles multiple input formats (text, HTML, XML, MS Word, etc.), multilingual content, batch processing and corpus processing;
11. Supports mapping named entities (NE) to an OWL ontology;
12. A benchmarking tool: tracks system performance (using metrics including precision, recall and F-measure) across a corpus over time;
|
1. Relatively difficult to set up and configure;
2. No Linked Data technology support (in-house solutions or additional plug-ins are needed);
3. No predefined ontology/taxonomy;
4. No native support for multiple serialisation formats for tag representations;
5. Customizations or additional plug-ins are necessary to extract NE relations and facts from unstructured text;
|
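
The precision, recall and F-measure metrics named under the benchmarking strength can be computed as in the minimal sketch below. It is not GATE's own benchmarking tool: the function name `prf`, the `(start, end, type)` tuple representation of annotations and the sample data are assumptions made for illustration, and only exact matches count as correct.

```python
from typing import Set, Tuple

# Hypothetical annotation representation: (start offset, end offset, entity type).
Annotation = Tuple[int, int, str]


def prf(gold: Set[Annotation], predicted: Set[Annotation], beta: float = 1.0):
    """Return (precision, recall, F-measure) using strict exact matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


# Made-up gold standard and system output for one document.
gold = {(0, 5, "Person"), (10, 19, "Location"), (25, 30, "Organisation"), (40, 46, "Person")}
predicted = {(0, 5, "Person"), (10, 19, "Location"), (25, 30, "Person")}

precision, recall, f1 = prf(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same function over every document of a fixed corpus after each pipeline change is one simple way to track performance over time.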
Opportunities/Trends | Threats |
1. Open source with no limitations on use;
2. Platform independent;
3. NER can be customised for specific customer requirements (backed by GATE Developer and a rule-based NLP language);
4. Able to improve the precision of NER in particular domains at minimal cost by leveraging GATE gazetteers;
5. The benchmarking tools allow us to keep tuning system performance;
6. Cloud computing support: end-to-end text and web processing solutions are available for commercial use to solve large-scale text processing problems, including web, text or opinion mining, indexing and search (full-text, conceptual, structural), without extra server hardware costs;
7. Extensive documentation & community support;
8. Commercial training modules are available;
9. Installing GATE and integrating it with the enterprise platform has low overhead; |
1. Expertise in natural language processing (including text analysis, corpus creation, gold standard establishment, etc.) is necessary;
2. Long learning curve;
|
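
How a gazetteer can raise NER precision in a narrow domain is sketched below as a plain dictionary lookup. This is not the GATE gazetteer API: the `GAZETTEER` entries, the `lookup` function and the longest-match policy are hypothetical and only illustrate the idea.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical domain gazetteer: lower-cased surface form -> entity type.
GAZETTEER: Dict[str, str] = {
    "university of sheffield": "Organisation",
    "sheffield": "Location",
    "gate": "Software",
}


def lookup(text: str, gazetteer: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched text, type) for non-overlapping matches."""
    # Longer entries come first so "University of Sheffield" wins over "Sheffield".
    entries = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(entry) for entry in entries) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


matches = lookup("GATE is developed by the University of Sheffield.", GAZETTEER)
for start, end, surface, entity_type in matches:
    print(start, end, surface, entity_type)
```

Adding customer-specific names to the dictionary changes what is recognised without touching any grammar rules, which is why the cost of this kind of tuning is low.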
KIM Platform
Strengths | Weaknesses |
1. Semantic annotation (beyond automatic annotation/tagging): named entity and entity relation extraction, with linkage to the structured knowledge of a domain;
2. High-performance NE recognition (F-measure), e.g. Person (0.87), Organisation (0.71), Location (0.90) and overall (0.85);
3. Fact extraction: supports storing the extracted facts and reasoning over them on top of a given ontology;
4. Linked Data-based text enrichment: supports text enrichment with DBpedia with better coverage (60% more than AlchemyAPI and 1560% more than OpenCalais) and comparable precision (9% worse than AlchemyAPI, 5% better than OpenCalais); see Evaluation Report;
5. Advanced text analysis: supports the extraction of facts from within documents;
6. Semantic Indexing & retrieval of content;
7. Semantic search and structured queries: based on semantic annotation, it supports complex filter and search operations based on the meaning of the words;
8. Mimir-based semantic enterprise search engine: combines a full-text index with a high-capacity semantic repository, scalable to millions of documents and billions of statements;
9. KIM ontology (KIMO): a predefined, simple upper-level ontology consisting of 250 general entity types and 100 entity relations, including Happenings (defining events and situations), meetings, military conflicts, different types of locations, government and other organisations, etc.;
10. Semantic gazetteer (beyond traditional NER): disambiguation of lexical resources with the identification of NE references (a link to a class in the ontology);
11. Inference support powered by OWLIM;
12. Visualisation with the Graph Knowledge Explorer;
13. Standard, consistent and scalable framework; |
1. Free for non-commercial use only;
2. Insufficient Linked Data technology support compared with OSF; in-house solutions are needed to compensate (possibly using owl:sameAs to link KIMO concepts to community URIs);
3. We have to propose our own solution, flexible enough to adapt the KIMO ontology to SKOS (owl:sameAs? Or just use the KIMO classification schemes?) |
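
One possible shape for the in-house linking mentioned in the weaknesses is sketched below: emit `owl:sameAs` statements in N-Triples syntax that connect local KIM-style URIs to community (here DBpedia) URIs. The `http://example.org/kim/` namespace and the mapping entries are placeholders, not real KIM identifiers, and the helper name `same_as_triples` is an assumption.

```python
from typing import Dict, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical mapping from local (KIM-style) URIs to community URIs.
# The example.org namespace is a placeholder, not the real KIM namespace.
LOCAL_TO_COMMUNITY: Dict[str, str] = {
    "http://example.org/kim/London": "http://dbpedia.org/resource/London",
    "http://example.org/kim/Sheffield": "http://dbpedia.org/resource/Sheffield",
}


def same_as_triples(mapping: Dict[str, str]) -> List[str]:
    """Serialise each mapping entry as one N-Triples owl:sameAs statement."""
    return [
        f"<{local}> <{OWL_SAME_AS}> <{community}> ."
        for local, community in sorted(mapping.items())
    ]


triples = same_as_triples(LOCAL_TO_COMMUNITY)
print("\n".join(triples))
```

Note that `owl:sameAs` asserts identity between individuals; when the things being aligned are classes, `owl:equivalentClass` or a SKOS mapping property may be the more appropriate choice, which is part of the open question raised above.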
Opportunities/Trends | Threats |
1. Widely deployed for over 10 years in media, life sciences, defence, financial intelligence, government and academic institutions;
2. Text analysis pipelines can be tailored to find new types of entities and facts;
3. Supports the use of the conceptual models and instance bases relevant to our domain;
4. Eases the burden of ontology design;
5. Shorter learning curve: easier for us to integrate GATE into the enterprise platform (needs further evaluation); a concise step-by-step guide on how to install and deploy the KIM platform is available;
6. All dependencies are open-source libraries: should be easy to adapt to the enterprise platform, although further checks are necessary;
7. Out-of-the-box solutions largely satisfy our urgent needs;
8. Higher likelihood of outperforming OpenCalais, given the higher NER F-measure and stronger data enrichment capability;
9. A large number of highly cited peer-reviewed papers on the KIM platform;
10. Many open-source GATE plug-ins and a large community;
11. Free for non-commercial use;
|
1. Long learning curve: a full-fledged platform with a range of complicated components;
2. Integration with the enterprise platform may surface unforeseen problems (should be evaluated progressively);
3. Difficult to become familiar with both the GATE platform and the KIM platform (with many extensions and plug-ins); |
Open Semantic Framework (OSF)
Strengths | Weaknesses |
1. Information extraction of domain-specific subject concepts and entities from unstructured text (backed by UMBEL and Wikipedia);
2. Disambiguation of information based on existing domain ontologies and entity dictionaries;
3. Support injection of semantic metadata as RDFa;
4. Multiple serialisation formats (interoperability): RDF and schema can be written in JSON, XML and CSV (irON parsers);
5. Supports federation of existing information assets through a clear separation of ‘ABox’ and ‘TBox’ (ref.): solutions for creating instance records and linking them to existing ontologies and schema (LinkageSchema); a separate ontologies layer;
6. A RESTful middleware framework for accessing and exposing RDF data via both an API and a SPARQL endpoint, with governance of rights and permissions;
7. Content negotiation support;
8. Flex-based data visualisation and presentation widgets (including filter, tabular, maps, bar, pie and linear charts, relationships (concept) browser, etc.; ref.);
9. Web-oriented architecture: adheres to Linked Data principles, with a distributed and modularised design; cross-browser visualisation support;
10. APIs, processes and methodologies are well documented; |
1. PHP-based implementation: extra effort is needed to integrate the JEE platform with PHP;
2. Extra PHP skills and maintenance costs (scalability, performance, etc.) on heterogeneous platforms;
3. Extra coding is needed to build an intermediate communication layer between the enterprise platform and a separate SW system (existing data models cannot be reused);
Notes: Scones-based named entity extraction is built on GATE and implemented via the PHP/Java Bridge. A working GATE application must run on the Tomcat instance used by the Scones web service endpoint. The GATE application uses the OWL ontology together with named entity dictionaries to tag documents. |
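
The multiple-serialisation and content-negotiation strengths can be illustrated with a small sketch that picks an output format from an HTTP `Accept` header and serialises one instance record accordingly. This is not the structWSF or irON API: the sample record, the `SERIALISERS` table and the `negotiate` helper are hypothetical, and the header parsing is deliberately simplified (q-values only, no wildcards).

```python
import csv
import io
import json
from typing import Callable, Dict

Record = Dict[str, str]


def to_json(record: Record) -> str:
    return json.dumps(record, sort_keys=True)


def to_csv(record: Record) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(record), lineterminator="\n")
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


# Hypothetical registry of supported media types.
SERIALISERS: Dict[str, Callable[[Record], str]] = {
    "application/json": to_json,
    "text/csv": to_csv,
}


def negotiate(accept_header: str, default: str = "application/json") -> str:
    """Return the supported media type with the highest q-value, else the default."""
    best_type, best_q = default, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in SERIALISERS and q > best_q:
            best_type, best_q = media_type, q
    return best_type


# Made-up instance record.
record = {"id": "ex:doc1", "label": "Sheffield", "type": "Location"}
chosen = negotiate("text/html;q=0.9, text/csv;q=0.8, application/json;q=0.5")
body = SERIALISERS[chosen](record)
print(chosen)
print(body)
```

Keeping the serialisers behind one lookup table means a new format can be added without changing the negotiation logic.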
Opportunities/Trends | Threats |
1. Open source, available under the Apache 2 licence;
2. Platform-independent;
3. GATE-based named entity extraction may make it easier for us to extend and customise it for specific requirements;
4. Named entity dictionary support (called ‘Gazetteer’ in ANNIE) enables us to improve the precision of NE recognition for a particular customer;
5. Provides a range of sound technical solutions and ideas required in KTP projects;
6. Closely aligned with current open standards and Linked Data principles;
7. Emerging and growing communities around Drupal, one of the most successful CMSs leveraging SW technology;
8. The structWSF-based solution includes configuration support for Amazon EC2/EBS, which enables us to host an EBS volume that can be attached to different running EC2 instances; |
1. An emerging platform;
2. Lack of objective third-party evaluations of OSF in terms of performance, scalability and suitability for a specific business context;
3. Extends the learning curve (PHP, integration of Java and PHP);
4. Usability;
5. Robustness still needs to be evaluated;
6. Depends on PHP expertise being available in the enterprise environment; |