
NLP Resources

Tools: Machine Translation, POS Taggers, NP chunking, Sequence models, Parsers, Semantic Parsers/SRL, NER, Coreference, Language models, Concordances, Summarization, Other

Corpora: Large collections, Particular languages, Treebanks, Discourse, WSD, Literature, Acquisition

SGML/XML

Dictionaries

Lexical/morphological resources

Courses, Syllabi, and other Educational Resources

Mailing lists

Other stuff on the Web: General, IR, IE/Wrappers, People, Societies

Tools

Machine Translation systems

Instructions

Building a baseline statistical phrase MT system Wonderful pages about how to download a bunch of tools and some data and put them together to build a very competent baseline statistical MT system: NAACL 2006 WMT or 2009 WMT.

Freely downloadable

EGYPT system System from 1999 JHU workshop. Mainly of historical interest.

GIZA++ and mkcls Franz Och. C++. GPL.

Thot Phrase-based model building kit

Phramer An Open-Source Java Statistical Phrase-Based MT Decoder

Moses A new open-source phrase-based MT decoder with functionality beyond Pharaoh.

Syntax Augmented Machine Translation via Chart Parsing Andreas Zollmann and Ashish Venugopal

Free, but getting them requires hassle

Pharaoh decoder Philip Koehn, ISI.

MTTK Machine Translation Tool Kit. Deng and Byrne.

Part of Speech Taggers

Freely downloadable

Stanford POS tagger Loglinear tagger in Java (by Kristina Toutanova)

hunpos An HMM tagger with pre-compiled models available for English and Hungarian. A reimplementation of TnT (see below) in OCaml. Runs on Linux, Mac OS X, and Windows.

MBT: Memory-based Tagger Based on TiMBL

TreeTagger A decision tree based tagger from the University of Stuttgart (Helmut Schmid). It's language independent, but comes complete with parameter files for English, German, Italian, Dutch, French, Old French, Spanish, Bulgarian, and Russian. (Linux, Sparc-Solaris, Windows, and Mac OS X versions. Binary distribution only.) Page has links to sites where you can run it online.

SVMTool POS Tagger based on SVMs (uses SVMlight). LGPL.

ACOPOST (formerly ICOPOST) Open source C taggers originally written by Ingo Schröder. Implements maximum entropy, HMM trigram, and transformation-based learning. C source available under GNU public license.

MXPOST : Adwait Ratnaparkhi's Maximum Entropy part of speech tagger Java POS tagger. A sentence boundary detector (MXTERMINATOR) is also included. Original version was only JDK1.1; later version worked with JDK1.3+. Class files, not source.

fnTBL A fast and flexible implementation of Transformation-Based Learning in C++. Includes a POS tagger, but also NP chunking and general chunking models.

mu-TBL An implementation of a Transformation-based Learner (a la Brill), usable for POS tagging and other things by Torbjörn Lager. Web demo also available. Prolog.

YamCha SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)

QTAG Part of speech tagger An HMM-based Java POS tagger from Birmingham U. (Oliver Mason). English and German parameter files. [Java class files, not source.]

The TOSCA/LOB tagger . Currently available for MS-DOS only. But the decision to make this famous system available is very interesting from an historical perspective, and for software sharing in academia more generally. LOB tag set.

The venerable Brill's Transformation-based learning Tagger A symbolic tagger, written in C. It's no longer available from a canonical location, but you might find a version from the Wikipedia page or you could try a reimplementation such as fnTBL .

Original Xerox Tagger A common lisp HMM tagger available by ftp .

Lingua-EN-Tagger Perl POS tagger by Maciej Ceglowski and Aaron Coburn. Version 0.11. (A bigram HMM tagger.)

Free, but require registration

TATOO The ISSCO tagger. HMM tagger. Need to register to download.

PoSTech Korean morphological analyzer and tagger Online registration.

TnT - A Statistical Part-of-Speech Tagger Trainable for various languages, comes with English and German pre-compiled models. Runs on Solaris and Linux.

Usable by email or on the web, but not distributed freely

Memory-based tagger From ILK group, Catholic University Brabant (Jakub Zavrel/Walter Daelemans). Does Dutch, English, Spanish, Swedish, Slovene. Other MBL demos are also available.

Birmingham tagger Accepts only plain ASCII email message contents. The tagset used is similar to the Brown/LOB/Penn set.

CLAWS tagger The UCREL CLAWS tagger is available for trial use on the web. (It's limited to 300 words though -- this site is more of an advertisement for licensing the real thing -- available as software for Suns or as a paid service.) You can also find info on CLAWS tagsets , though that page doesn't seem to link to the C7 tagset .

The AMALGAM tagger The AMALGAM Project also has various other useful resources, in particular a web guide to different tag sets in common use . The tagging is actually done by a (retrained) version of the Brill tagger (q.v.).

Xerox XRCE MLTT Part Of Speech Taggers Tags any of 14 languages (European and Arabic), online on the web.

Portuguese taggers on the web: Projecto Natura and a QTAG adaptation .

Not free

Lingsoft Lingsoft in Finland has (symbolic) analysis tools for many European languages. More information can be obtained by emailing [email protected] . There is an online demo .

Conexor Conexor in Finland has demonstrations of EngCG-style taggers and parsers, for English, Swedish, and Spanish.

Xerox Xerox has morphological analyzers and taggers for many languages. There are demos of some of their tools on the web. More information can be obtained by contacting Daniella Russo .

Infogistics Infogistics, an Edinburgh spinoff, has a tagging and NP/Verb group chunker available commercially, including an evaluation version.

No longer available

LT POS and LT TTT The Edinburgh Language Technology Group tagger and text tokenizer (and sentence splitter) were binary-only Solaris tools which no longer seem to be available.
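Several of the taggers listed above (hunpos, TnT, QTAG, the Xerox tagger, Lingua-EN-Tagger) are described as HMM taggers. As a rough illustration of what such a tagger does at decoding time, here is a minimal sketch of Viterbi decoding for a bigram HMM. It is not the code of any listed tool: the function name, the toy tag set, and all probabilities are made up for the example, and unseen events are simply floored to a tiny probability rather than smoothed properly.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    if not words:
        return []

    def lp(p):
        # Log-probability with a floor so unseen events do not give log(0).
        return math.log(max(p, floor))

    # best[i][t]: best log-probability of any tag path ending in tag t at position i.
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0.0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model: every number below is invented for illustration only.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.1, "VERB": 0.8},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.6, "dog": 0.05},
}

if __name__ == "__main__":
    print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
```

Real taggers estimate these tables from a tagged corpus and handle unknown words with suffix or capitalization features; a trigram model (as in TnT) conditions each tag on the two previous tags instead of one.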

NP chunking

Downloadable

YamCha SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)

Mark Greenwood's Noun Phrase Chunker A Java reimplementation of Ramshaw and Marcus (1995).

fnTBL A fast and flexible implementation of Transformation-Based Learning in C++. Includes a POS tagger, but also NP chunking and general chunking models.
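The chunkers above treat NP chunking as a per-token tagging problem, the formulation associated with Ramshaw and Marcus (1995) and the CoNLL 2000 shared task. The sketch below shows how a sequence of chunk tags can be turned back into noun phrase spans. It assumes a simplified B/I/O tag set (B starts a chunk, I continues one, O is outside any chunk), and the function name is hypothetical; an I with no open chunk is treated as starting a chunk, so tag sequences that only use B between adjacent chunks are also accepted.

```python
def iob_to_spans(tags):
    """Convert B/I/O chunk tags into (start, end) token spans, end exclusive."""
    spans = []
    start = None  # index where the currently open chunk began, if any
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new chunk begins here; close the previous one first.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        # tag == "I" with an open chunk: the chunk simply continues.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    tokens = ["The", "quick", "fox", "saw", "a", "dog"]
    tags = ["B", "I", "I", "O", "B", "I"]
    for begin, end in iob_to_spans(tags):
        print(" ".join(tokens[begin:end]))
```

Casting chunking as tagging is what lets general sequence learners such as YamCha or fnTBL be reused for it without a dedicated parser.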

Generic sequence models

Downloadable

CRF++ Generic CRF-based model in C++. Open source. By the author of YamCha.

Carafe Generic CRF-based sequence models in OCaml. Open source. By Ben Wellner.

FreeLing A large suite of language analyzers. Written in C++. Covers text preprocessing, morphology, NER, POS tagging, parsing.

Parsers

Information on available probabilistic parsers can be found on the FSNLP: probabilistic parsing links page.

Semantic Parsers

Downloadable

ASSERT PropBank semantic roles (and opinions, etc.) by Sameer Pradhan.

Shalmaneser FrameNet-based by Katrin Erk.

Tree Kernels in SVMlight by Alessandro Moschitti. A general package, but it has particularly been used for SRL.

Named Entity Recognition

Downloadable

Stanford Named Entity Recognizer A Java Conditional Random Field sequence model with trained models for Named Entity Recognition. Java. GPL. By Jenny Finkel.

LingPipe Tools include statistical named-entity recognition, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Java. GPL. By Bob Carpenter, Breck Baldwin and co.

YamCha SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)

Coreference (Anaphora) Resolution

Downloadable

BART A Beautiful Anaphora Resolution Toolkit. Java. By Yannick Versley and many others. Apache with GPL components.

Guitar Java. GPL.

Language modeling toolkits

Downloadable

IRSTLM Toolkit Compatible with SRILM, suitable for very large language models. LGPL. By Marcello Federico, Nicola Bertoldi et al.

CMU-Cambridge Statistical Language Modeling toolkit

Downloadable, but requires registration

The SRI Language Modeling toolkit by Andreas Stolcke is another good system for building language models, freely available for research purposes.

Not yet classified

Lextools A package of tools for creating weighted finite-state transducers (WFST) from high-level linguistic descriptions. Lextools binaries are available free for non-commercial use at: http://www.research.att.com/sw/tools/lextools/ . Supported platforms are: linux (i686), sgi (mips2) and sun4. Lextools is built on top of, and requires, the AT&T WFST toolkit (version 3.6), available free for non-commercial use from: http://www.research.att.com/sw/tools/fsm/

Friendly concordancing and text analysis tools

Wordsmith Tools (Mike Scott) The thing to get if you are working in the Windows world.

Text summarization tools

A prototype Java Summarisation applet (System Quirk)

MEAD A public domain portable multi-document summarization system. (Dragomir Radev and others.)

Other

Downloadable

Tilburg University's TiMBL Tilburg's Memory Based Learner by Walter Daelemans et al. A general near-neighbour-based machine learning package, but optimized for statistical NLP applications.

Time Expression taggers TIMEX2 standard taggers (site at Mitre).

NLTK An open source Python package for NLP application development with tools for tokenization, POS tagging, and parsing, by Ed Loper and Steven Bird.

Ted Pedersen's code Ngram Statistics Package: Perl code that implements: Fisher's exact test, the likelihood ratio, Pearson's chi squared test, the Dice Coefficient, and Mutual Information; Duluth Senseval-2 word sense disambiguation systems; Senseval-1 data in Senseval-2 format; various other WSD datasets in Senseval formats, and semantic distances derived via WordNet.

ISIP tools The main aim is a publicly available speech recognition system (alpha release available), but along the way there are also toolkits for discrete HMMs and statistical decision trees, and for various aspects of signal processing.

Mem . A Perl implementation of Generalized and Improved Iterative Scaling by Hugo WL ter Doest.

Automorphology A system (for Windows) for automatically learning the morphological forms of words in a corpus by John Goldsmith.

Wordnet Wordnet is available by ftp , compiled for a variety of machine types. For money, one can also get EuroWordNet for various European languages, an Italian/English/Spanish MultiWordNet and there's now a site for Global Wordnet . (See also Mappings between WordNet versions and Perl WordNet-Similarity module by Ted Pedersen, and WordNet Domains (coarse-grained sense topic classifications).)

Penn XTAG project A wide-coverage tree-adjoining grammar written in a mixture of C and Common Lisp. Also includes a large coverage morphological analyzer. Now includes more tools such as TCL/Tk tree viewer.

Dan Melamed's Assorted Tools A collection of various tools including a simulated annealing program, a post-processor for English stemming for the Penn XTAG morphology system, Good-Turing smoothing software, general text processing tools, text statistics tools and bitext geometry tools (mainly written in Perl 5).

MULTEXT Constructing corpora and tools for processing multilingual corpora. Contact: Jean Veronis [email protected] . Some stuff including a multilingual text editor is downloadable. MULTEXT-East has parallel versions of Orwell's 1984 available free (upon registration) for a number of Central European languages.

Naive Bayes algorithm Software from the Rainbow/Libbow software package that implements several algorithms for text categorization, including naive Bayes, TF.IDF, and probabilistic algorithms. Accompanies Tom Mitchell's ML text.

HDDI Text Data Mining API from Lehigh University.

Emdros: a text database engine for linguistic analysis and research

Chasen Japanese morphological analyzer. Descendant of JUMAN.
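The Ngram Statistics Package entry above lists several association measures for word pairs. As a rough illustration of what such measures compute, the sketch below derives the Dice coefficient, pointwise mutual information, and Pearson's chi-squared statistic from the counts of a 2x2 contingency table for a bigram. This is not the package's own code (which is written in Perl): the function and argument names are made up for the example, and the formulas are the standard textbook definitions. It assumes all marginal counts are nonzero and the joint count is positive.

```python
import math


def association_scores(n11, n1p, np1, npp):
    """Score a bigram (w1, w2) from corpus counts.

    n11: count of the bigram "w1 w2"
    n1p: count of bigrams whose first word is w1
    np1: count of bigrams whose second word is w2
    npp: total number of bigrams in the corpus
    """
    # Fill in the remaining observed cells of the 2x2 table.
    n12 = n1p - n11              # w1 followed by something other than w2
    n21 = np1 - n11              # w2 preceded by something other than w1
    n22 = npp - n1p - np1 + n11  # neither w1 first nor w2 second
    observed = [n11, n12, n21, n22]

    # Expected counts if the two words occurred independently.
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dice = 2 * n11 / (n1p + np1)
    pmi = math.log2(n11 * npp / (n1p * np1))
    return {"dice": dice, "pmi": pmi, "chi2": chi2}


if __name__ == "__main__":
    # Invented counts: the pair occurs 10 times in a corpus of 1000 bigrams.
    print(association_scores(n11=10, n1p=20, np1=20, npp=1000))
```

With small counts the chi-squared approximation is unreliable, which is one reason such packages also offer Fisher's exact test and the likelihood ratio.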

Free, but require registration

Stuttgart's IMS Corpus Workbench (CWB) A workbench for full-text retrieval from large corpora (with a query language and corpus indexing). Includes the Corpus Query Processor (CQP) and xkwic. Available free for research groups (currently only as Solaris 1/2 or Linux binaries), on signing a license agreement.

Gate University of Sheffield's General Architecture for Text Engineering. Primarily an Information Extraction system.

MITRE's Alembic Workbench A workbench for the development of tagged corpora. Includes a tagger based on Brill's TBL approach.

SNoW SNoW is a learning program that can be used as a general purpose multi-class classifier and is specifically tailored for learning in the presence of a very large number of features. The learning architecture is a sparse network of linear units over a pre-defined or incrementally acquired feature space (Dan Roth).

Unsure

INTEX a finite-state transducer analysis system for English, French, and Italian that runs under NextStep. Contact: Max Silberztein [email protected]

The PennTools page collects information on a variety of NLP systems, many of which are available externally.

Corpora

Large collections aimed at the NLP community

LDC (Linguistic Data Consortium) and its catalogue by year . Email: [email protected] . Provides the largest range of corpora on CD-ROM. Cost ranges from cheap (e.g., ACL-DCI disk) to pricey. CDs can be purchased individually; institutions can become members and receive discounts on CDs. There's an LDC Online service for searches over the web (mainly intended for members, but there are samplers available).

European Language Resources Association and its catalogue . Distribution agency is ELDA . Rapidly growing collection of materials in European languages.

ICAME (International Computer Archive of Modern English) Sells various corpora (including Brown and London-Lund). Information on corpora is available on the web, by sending the message help to [email protected] , or by ftp to nora.hd.uib.no . Also, manuals for these corpora.

Reuters @ NIST Reuters corpora are now distributed by NIST.

TRACTOR TELRI Research Archive of Computational Tools and Resources. Corpora, many multilingual, in European community languages. Small fee for joining in order to be able to get corpora (unless you have contributed corpora).

CLR (Consortium for Lexical Research) Email: [email protected] . Focuses more on language processing tools and lexicons, but does have some corpora. As of Feb 1996, you can get most of their stuff by anonymous ftp to clr.nmsu.edu . Their catalog is available as a postscript file.

OTA (Oxford Text Archive) Provides mainly literary texts. Has a bright new web site. Email: [email protected] . Most materials are available on the web or by anonymous ftp to ota.ox.ac.uk . Some require negotiations with the providers.

Leipzig Corpora Collection Sentence collections in MySQL database for 17 mainly European languages.

BNC (British National Corpus) A 100 million word corpus of British English. You can search it online from their simple web interface or via View , a much better interface by Mark Davies, and there is an index to genres by David Lee. And now, an XML edition .

European Corpus Initiative Multilingual Corpus I (ECI/MCI) A 98 million word corpus, covering most of the major European languages, as well as Turkish, Japanese, Russian, Chinese, and Malay. Cheap. You need to sign a license agreement, available at the WWW site. Also available from the LDC.

Survey of English Usage At the Department of English Language and Literature at University College London. Includes the British part of ICE , the International Corpus of English project. Now available tagged, and parsed for function. 83,419 sentences. Includes ICECUP, dedicated retrieval software. Also, Diachronic Corpus of Present-Day Spoken English (800,000 words, tagged and parsed, half from ICE-GB and half from London-Lund).

International Corpus of English (ICE) Million word collections of English from various world Englishes: ICE-NZ, ICE-HK, ICE-East Africa, etc. Several of them are downloadable from this site.

Corpora held by Lancaster University This link provides its own annotations.

The European Language Activity Network Promises a uniform query language for accessing corpora in all EU languages -- but isn't quite there yet.

Talkbank . Rich video and transcripts.

Particular languages

English

English language corpora available from the sites above are not repeated here.

Corpora by Geoffrey Sampson's team The SUSANNE corpus and the CHRISTINE corpus (SUSANNE markup of a speech corpus).

Michigan Corpus of Academic Spoken English (MICASE) . 1.7 million words from 1997-2001.

Penn-Helsinki Parsed Corpus of Middle English A syntactically annotated corpus of the Middle English prose samples in the Helsinki Corpus of Historical English, with additions. 1.3 million words. $200.

Corpus of Professional, Spoken American-English (CPSA) 2 million words from faculty and committee meetings and White House press conferences (50K word sample free on internet).

Lancaster Parsed Corpus

Dialogue Diversity Corpus (Bill Mann)

American National Corpus

Chinese

Chinese language corpora available from the sites above are not repeated here.

The Lancaster Corpus of Mandarin Chinese (LCMC) By Tony McEnery and Richard Xiao. Distinguished by being a balanced corpus, and freely available.

Multilingual

JRC-Acquis A parallel corpus of EU documents across all member states. 8 million words or more in each of 20 languages.

EMILLE/CIIL Monolingual written corpus data for 14 South Asian languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telugu and Urdu). Orthographically transcribed spoken data and parallel corpus data for five South Asian languages (Bengali, Gujarati, Hindi, Punjabi and Urdu). In addition, the parallel corpus contains the English originals from which the translations stored in the corpus were derived. All data in the corpus is CES and Unicode compliant. The EMILLE corpus totals some 94 million words. Downloadable.

OPUS An open source parallel corpus, aligned, in many languages, based on free Linux etc. manuals.

World Health Organization Computer Assisted Translation page . Also includes a good selection of links on Computer Assisted Translation. (See also the copyright page .)

Searchable Canadian Hansard French-English parallel texts (1986-1993) From the Laboratoire de Recherche Appliquée en Linguistique Informatique, Université de Montréal.

European Union web server Parallel text in all EU languages. (In particular try European legislation .)

TELRI CD-ROMs Parallel and other text in Central and Eastern European languages.

Bosnian

The Oslo Corpus of Bosnian Texts .

Czech

Parallel Czech-English Literature translations in Czech and English

Czech National Corpus project: SYN2000 100 million words of contemporary Czech.

French

Association des Bibliophiles Universels Various French literary works.

American and French Research on the Treasury of the French Language (ARTFL) 150 million word corpus of various genres of French. You have to be a member to use it (but membership is fairly cheap).

German

COSMAS Corpus Large (over a billion words!) online-searchable German and Austrian corpora. This is the publicly available part of the 1.85 billion word Mannheimer Corpus Collection.

NEGRA Corpus Saarland University Syntactically Annotated Corpus of German Newspaper Texts. 20,000 sentences, tagged, and with syntactic structures. Available free of charge to academics.

Russian

Russian National Corpus 150 million words, 5 million words POS-tagged, some in dependency treebank.

Library of Russian Internet Libraries Various literary works.

Slovene

Slovene-English parallel corpus 1 M words, free to download + on-line concordances.

Coming soon: Slovene reference corpus of 100 M words

Spanish and Portuguese

TychoBrahe Parsed Corpus of Historical Portuguese Over a million words of Portuguese from different historical periods, some of it morphologically analyzed/tagged. Free.

Information about Mark Davies' collection of (mainly historical) Spanish and Portuguese. It's not clear what their availability is.

The CUMBRE corpus. Contact Professor Aquilino Sánchez

The CRATER Spanish corpus (morphosyntactically tagged telecommunication manuals) is available by ftp.

Corpus resources for Portuguese In total about 70 million words, available free, from various sources (newswire, etc.)

Folha de S. Paulo newspaper 4 annual CDROMs with full text.

COMPARA Portuguese-English parallel corpus. (In general, various resources at Linguateca site.)

Swedish

Spraakdata, Department of Swedish, Göteborgs University. Has various searchable part of speech tagged Swedish corpora (Parole, Bank of Swedish, etc.), and some material in Zimbabwean languages.

Treebanks

Name	Language	Size	Availability	Comments
Penn Treebank	US English	2 million + words	Available (distributed by LDC)	1 million WSJ, 1 million speech, surface syntax (1970s TG)
BLLIP WSJ corpus	US English	30 million words	Available (distributed by LDC)	WSJ newswire. Automatically parsed, not hand checked. Same structure as Penn Treebank, except for some additional coreference marking
ICE-GB	UK English	1 million words (83,394 sentences)	Available; c. 500 pounds	British part of ICE, the International Corpus of English project. Tagged and parsed for function. Half spoken material.
NEGRA Corpus	German	20,000 sentences	Available free of charge to academics on completion of license agreement.	Saarland University Syntactically Annotated Corpus of German Newspaper Texts. Tagged, and with syntactic structures.
TIGER corpus	German	700,000 words	Available free of charge for research purposes on completion of license agreement.	German newspaper text (Frankfurter Rundschau). Semi-automatically parsed. They also have a good treebank search tool, TIGERSearch .
Alpino Dependency Treebank	Dutch	150,000 words	Freely downloadable	Assorted subcorpora. By far the largest is the full cdbl (newspaper) part of the Eindhoven corpus.
The Prague Dependency Treebank 1.0	Czech	500,000 words	Free on completion of license agreement (available through LDC).	Analyzed at the levels of parts of speech, syntactic functions (and, in the future, semantic roles) level in a dependency framework. Text from newspapers and weekly magazines.
TUT: Turin University Treebank	Italian	2,400 sentences	Free download.	Morphological analysis and dependency analysis. Penn Treebank translation. Civil law and newspaper texts.
Bulgarian Treebank	Bulgarian	n/a	POS-tagged texts and dependencies analyses are available (some are free on the web, others via a license agreement)	An under construction Bulgarian HPSG treebank.
Penn Chinese Treebank	Chinese	100,000 words	Available (LDC)	Based on Xinhua news articles. 1980s-style GB syntax.
Danish Dependency Treebank 1.0	Danish	100,000 words	Available free under the GPL.	Built on a portion of the Parole corpus.
Floresta Sintá(c)tica	Portuguese	168,000 words hand-corrected; 1,000,000 words automatically parsed	Hand corrected part is free web download; automatically parsed part available through email contact	Text from CETEMPúblico corpus . Phrase structure and dependency representations. Available in several formats, including Penn Treebank format.
Talbanken05	Swedish	300,000 words	Free download	Resurrects and modernizes an early treebank from the 1970s.

Verbmobil Tübingen : under construction treebanked corpus of German, English, and Japanese sentences from Verbmobil (appointment scheduling) data

Syntactic Spanish Database (SDB) University of Santiago de Compostela. 160,000 clauses / 1.5 million words.

CKIP Chinese Treebank (Taiwan) . Based on Academia Sinica corpus. (There's also a 100 sentence Chinese treebank at U. Maryland.)

LDC Korean Treebank .

Dublin-Essex Treebank project Deriving Linguistic Resources from Treebanks.

Discourse

CSTBank : Cross-document Structure Theory: marking sentence functional relationships across related documents.

Resources for Word Sense Disambiguation

The Senseval web site Has a comprehensive selection of resources for WSD, including a good list of WSD data resources , but not yet the new SEMCOR .

Ted Pedersen's code Includes various WSD systems.

SenseClusters Open source package for unsupervised discovery of word senses by clustering together instances of a word (or words) that are used in similar contexts in raw text, supporting a wide range of clustering techniques based on both context vectors and similarity matrices, and including links to SVDPACKC and CLUTO. Ted Pedersen and Amruta Purandare.

Evocation WordNet synset similarity judgments Judgments on how similar the meanings of synsets are and how common they are in the BNC from Jordan Boyd-Graber.

Literature

There are now quite large collections of online literature, available in various languages (though the majority are in English, of course). Below are pointers to some of the main collections:

Entirely or mainly English

Alex: A Catalogue of Electronic Texts on the Internet Seems to have one of the largest collections. Searching and browsing facilities through gopher menus. Many languages.

Wiretap Electronic Text Archive Extensive and good quality. Still in the gopher age, though.

The On-line Books Page The index here only covers books in English, but there are lots of links to other collections of material in all languages.

Project Gutenberg The oldest and largest project to get out of copyright literature online, freely available. (Or see the mirror, Sailor's Project Gutenberg site .)

The Electronic Text Center of the University of Virginia Large collection of SGML text, mainly in English, but also in other major languages.

Center for Electronic Texts in the Humanities Princeton/Rutgers collaboration. They didn't have it together with their web site when I stopped by, but they may soon.

Oxford Electronic Text Library Editions Available from Oxford University Press, 200 Madison Ave, NY, NY 10016 212-679-7300. The Complete Works of Jane Austen is $95.00, and is reviewed in Computers and the Humanities , 28:4-5 (Aug/Oct, 1994), 317-321.

Coreference annotated texts From University of Wolverhampton (R. Mitkov, C. Barbu et al.).

Acquisition data

CHILDES database . Database of child language transcriptions in English and many other languages. Texts are also available by ftp . Certain usage requirements. Manuals and programs for accessing the data (the CLAN concordancer) are also available online. Now in Unicode XML.

SGML/XML

Robin Cover's SGML/XML Web Page This is a wonderful compendium of information on SGML and XML, including information on the Text Encoding Initiative (TEI) . This document is also a guide to many text collections (ones usi

【原创】下雨天要游泳饶金霞家庭教育心理咨询
下午，我照着昨天与小儿的约定，在四点半，就来到幼儿园门口接孩子。老师打开大门，孩子从教室里走出来，一见到我就问:“老妈，泳衣准备好了吗？”我半蹲下来拥抱他说：“都放在车上啦！”儿子在我额头上亲一口说：“你真是世界上最讲信用的好妈妈！”我有点怀疑我这儿子有NLP的基因，总是能及时地给沟通者作出良好的回应，而且还会用米尔顿。其实看着这满天的乌云，我心里还在嘀咕，这场大雨可能不会等到我们去游泳场。果不其
LLM系统性学习完全指南（初学者必看系列） GA琥珀 LLM 学习人工智能语言模型
前言这篇文章将系统性的讲解LLM（LargeLanguageModels,LLM）的知识和应用。我们将从支撑整个领域的数学与机器学习基石出发，逐步剖析自然语言处理（NLP）的经典范式，深入探究引发革命的Transformer架构，并按时间顺序追溯从BERT、GPT-2到GPT-4、Llama及Gemini等里程碑式模型的演进。随后，我们将探讨如何将这些强大的基础模型转化为实用、安全的应用，涵盖对齐
解读Servlet原理篇二---GenericServlet与HttpServlet 周凡杨 java HttpServlet 源理 GenericService 源码
在上一篇《解读Servlet原理篇一》中提到，要实现javax.servlet.Servlet接口（即写自己的Servlet应用），你可以写一个继承自javax.servlet.GenericServletr的generic Servlet ，也可以写一个继承自java.servlet.http.HttpServlet的HTTP Servlet（这就是为什么我们自定义的Servlet通常是exte
MySQL性能优化 bijian1013 数据库 mysql
性能优化是通过某些有效的方法来提高MySQL的运行速度，减少占用的磁盘空间。性能优化包含很多方面，例如优化查询速度，优化更新速度和优化MySQL服务器等。本文介绍方法的主要有： a.优化查询 b.优化数据库结构
ThreadPool定时重试 dai_lm java ThreadPool thread timer timertask
项目需要当某事件触发时，执行http请求任务，失败时需要有重试机制，并根据失败次数的增加，重试间隔也相应增加，任务可能并发。由于是耗时任务，首先考虑的就是用线程来实现，并且为了节约资源，因而选择线程池。为了解决不定间隔的重试，选择Timer和TimerTask来完成 package threadpool; public class ThreadPoolTest {
Oracle 查看数据库的连接情况周凡杨 sql oracle 连接
首先要说的是，不同版本数据库提供的系统表会有不同，你可以根据数据字典查看该版本数据库所提供的表。 select * from dict where table_name like '%SESSION%'; 就可以查出一些表，然后根据这些表就可以获得会话信息 select sid,serial#,status,username,schemaname,osuser,terminal,ma
类的继承朱辉辉33 java
类的继承可以提高代码的重用行，减少冗余代码；还能提高代码的扩展性。Java继承的关键字是extends 格式:public class 类名（子类）extends 类名（父类）{ } 子类可以继承到父类所有的属性和普通方法，但不能继承构造方法。且子类可以直接使用父类的public和 protected属性，但要使用private属性仍需通过调用。子类的方法可以重写，但必须和父类的返回值类
android 悬浮窗特效肆无忌惮_ android
最近在开发项目的时候需要做一个悬浮层的动画，类似于支付宝掉钱动画。但是区别在于，需求是浮出一个窗口，之后边缩放边位移至屏幕右下角标签处。效果图如下：一开始考虑用自定义View来做。后来发现开线程让其移动很卡，ListView+动画也没法精确定位到目标点。后来想利用Dialog的dismiss动画来完成。自定义一个Dialog后，在styl
hadoop伪分布式搭建林鹤霄 hadoop
要修改4个文件 1: vim hadoop-env.sh 第九行 2: vim core-site.xml <configuration> &n
gdb调试命令 aigo gdb
原文：http://blog.csdn.net/hanchaoman/article/details/5517362 一、GDB常用命令简介 r run 运行.程序还没有运行前使用 c cuntinue
Socket编程的HelloWorld实例 alleni123 socket
public class Client { public static void main(String[] args) { Client c=new Client(); c.receiveMessage(); } public void receiveMessage(){ Socket s=null; BufferedRea
线程同步和异步百合不是茶线程同步异步
多线程和同步 : 如进程、线程同步，可理解为进程或线程A和B一块配合，A执行到一定程度时要依靠B的某个结果，于是停下来，示意B运行；B依言执行，再将结果给A；A再继续操作。所谓同步，就是在发出一个功能调用时，在没有得到结果之前，该调用就不返回，同时其它线程也不能调用这个方法多线程和异步:多线程可以做不同的事情,涉及到线程通知 &
JSP中文乱码分析 bijian1013 java jsp 中文乱码
在JSP的开发过程中，经常出现中文乱码的问题。首先了解一下Java中文问题的由来： Java的内核和class文件是基于unicode的，这使Java程序具有良好的跨平台性，但也带来了一些中文乱码问题的麻烦。原因主要有两方面，
js实现页面跳转重定向的几种方式 bijian1013 JavaScript 重定向
js实现页面跳转重定向有如下几种方式：一.window.location.href <script language="javascript"type="text/javascript"> window.location.href="http://www.baidu.c
【Struts2三】Struts2 Action转发类型 bit1129 struts2
在【Struts2一】 Struts Hello World http://bit1129.iteye.com/blog/2109365中配置了一个简单的Action，配置如下 <!DOCTYPE struts PUBLIC "-//Apache Software Foundation//DTD Struts Configurat
【HBase十一】Java API操作HBase bit1129 hbase
Admin类的主要方法注释： 1. 创建表 /** * Creates a new table. Synchronous operation. * * @param desc table descriptor for table * @throws IllegalArgumentException if the table name is res
nginx gzip ronin47 nginx gzip
Nginx GZip 压缩 Nginx GZip 模块文档详见：http://wiki.nginx.org/HttpGzipModule 常用配置片段如下： gzip on; gzip_comp_level 2; # 压缩比例，比例越大，压缩时间越长。默认是1 gzip_types text/css text/javascript; # 哪些文件可以被压缩 gzip_disable &q
java-7.微软亚院之编程判断俩个链表是否相交给出俩个单向链表的头指针，比如 h1 ， h2 ，判断这俩个链表是否相交 bylijinnan java
public class LinkListTest { /** * we deal with two main missions: * * A. * 1.we create two joined-List(both have no loop) * 2.whether list1 and list2 join * 3.print the join
Spring源码学习-JdbcTemplate batchUpdate批量操作 bylijinnan java spring
Spring JdbcTemplate的batch操作最后还是利用了JDBC提供的方法，Spring只是做了一下改造和封装 JDBC的batch操作： String sql = "INSERT INTO CUSTOMER " + "(CUST_ID, NAME, AGE) VALUES (?, ?, ?)";
[JWFD开源工作流]大规模拓扑矩阵存储结构最新进展 comsci 工作流
生成和创建类已经完成,构造一个100万个元素的矩阵模型,存储空间只有11M大,请大家参考我在博客园上面的文档"构造下一代工作流存储结构的尝试",更加相信的设计和代码将陆续推出......... 竞争对手的能力也很强.......,我相信..你们一定能够先于我们推出大规模拓扑扫描和分析系统的....
base64编码和url编码 cuityang base64 url
import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; import java.io.PrintWriter; import java.io.StringWriter; import java.io.UnsupportedEncodingException;
web应用集群Session保持 dalan_123 session
关于使用 memcached 或redis 存储 session ，以及使用 terracotta 服务器共享。建议使用 redis，不仅仅因为它可以将缓存的内容持久化，还因为它支持的单个对象比较大，而且数据类型丰富，不只是缓存 session，还可以做其他用途，一举几得啊。1、使用 filter 方法存储这种方法比较推荐，因为它的服务器使用范围比较多，不仅限于tomcat ，而且实现的原理比较简
Yii 框架里数据库操作详解-[增加、查询、更新、删除的方法 'AR模式'] dcj3sjt126com 数据库
public function getMinLimit () { $sql = "..."; $result = yii::app()->db->createCo
solr StatsComponent（聚合统计） eksliang solr聚合查询 solr stats
StatsComponent 转载请出自出处：http://eksliang.iteye.com/blog/2169134 http://eksliang.iteye.com/ 一、概述 Solr可以利用StatsComponent 实现数据库的聚合统计查询，也就是min、max、avg、count、sum的功能二、参数
百度一道面试题 greemranqq 位运算百度面试寻找奇数算法 bitmap 算法
那天看朋友提了一个百度面试的题目：怎么找出{1,1,2,3,3,4,4,4,5,5,5,5} 找出出现次数为奇数的数字. 我这里复制的是原话，当然顺序是不一定的，很多拿到题目第一反应就是用map,当然可以解决，但是效率不高。还有人觉得应该用算法xxx,我是没想到用啥算法好...！还有觉得应该先排序... 还有觉
Spring之在开发中使用SpringJDBC ihuning spring
在实际开发中使用SpringJDBC有两种方式： 1. 在Dao中添加属性JdbcTemplate并用Spring注入； JdbcTemplate类被设计成为线程安全的，所以可以在IOC 容器中声明它的单个实例，并将这个实例注入到所有的 DAO 实例中。JdbcTemplate也利用了Java 1.5 的特定(自动装箱，泛型，可变长度
JSON API 1.0 核心开发者自述 | 你所不知道的那些技术细节 justjavac json
2013年5月，Yehuda Katz 完成了JSON API(英文，中文) 技术规范的初稿。事情就发生在 RailsConf 之后，在那次会议上他和 Steve Klabnik 就 JSON 雏形的技术细节相聊甚欢。在沟通单一 Rails 服务器库—— ActiveModel::Serializers 和单一 JavaScript 客户端库——&
网站项目建设流程概述 macroli 工作
一.概念网站项目管理就是根据特定的规范、在预算范围内、按时完成的网站开发任务。二.需求分析项目立项　　我们接到客户的业务咨询，经过双方不断的接洽和了解，并通过基本的可行性讨论够，初步达成制作协议，这时就需要将项目立项。较好的做法是成立一个专门的项目小组，小组成员包括：项目经理，网页设计，程序员，测试员，编辑/文档等必须人员。项目实行项目经理制。客户的需求说明书　　第一步是需
AngularJs 三目运算表达式判断 qiaolevip 每天进步一点点学习永无止境众观千象 AngularJS
事件回顾：由于需要修改同一个模板，里面包含2个不同的内容，第一个里面使用的时间差和第二个里面名称不一样，其他过滤器，内容都大同小异。希望杜绝If这样比较傻的来判断if-show or not，继续追究其源码。 var b = "{{", a = "}}"; this.startSymbol = function(a) {
Spark算子：统计RDD分区中的元素及数量 superlxw1234 spark spark算子 Spark RDD分区元素
关键字：Spark算子、Spark RDD分区、Spark RDD分区元素数量 Spark RDD是被分区的，在生成RDD时候，一般可以指定分区的数量，如果不指定分区数量，当RDD从集合创建时候，则默认为该程序所分配到的资源的CPU核数，如果是从HDFS文件创建，默认为文件的Block数。可以利用RDD的mapPartitionsWithInd
Spring 3.2.x将于2016年12月31日停止支持 wiselyman Spring 3
Spring 团队公布在2016年12月31日停止对Spring Framework 3.2.x（包含tomcat 6.x）的支持。在此之前spring团队将持续发布3.2.x的维护版本。请大家及时准备及时升级到Spring
fis纯前端解决方案fis-pure zccst JavaScript
作者：zccst FIS通过插件扩展可以完美的支持模块化的前端开发方案，我们通过FIS的二次封装能力，封装了一个功能完备的纯前端模块化方案pure。 1，fis-pure的安装 $ fis install -g fis-pure $ pure -v 0.1.4 2，下载demo到本地 git clone https://github.com/hefangshi/f

NLP Resources

Contents

Tools

Machine Translation systems

Instructions

Freely downloadable

Free, but getting them requires hassle

Part of Speech Taggers

Freely downloadable

Free, but require registration

Usable by email or on the web, but not distributed freely

Not free

No longer available

NP chunking

Downloadable

Generic sequence models

Downloadable

Parsers

Semantic Parsers

Downloadable

Named Entity Recognition

Downloadable

Coreference (Anaphora) Resolution

Downloadable

Language modeling toolkits

Downloadable

Downloadable, but requires registration

Not yet classified

Friendly concordancing and text analysis tools

Text summarization tools

Other

Downloadable

Free, but require registration

Unsure

Corpora

Large collections aimed at the NLP community

Particular languages

English

Chinese

Multilingual

Bosnian

Czech

French

German

Russian

Slovene

Spanish and Portuguese

Swedish

Treebanks

Treebanks

Resources for Word Sense Disambiguation

Literature

Entirely or mainly English

Acquisition data

SGML/XML
