Information Retrieval Resources

Reposted from http://net.pku.edu.cn/~webg/IR-Guide.txt

Information Retrieval Resources (A Guide to Information Retrieval)
Organized by Hongfei Yan
Last updated on April 19, 2006

---------------------
Contents
   Books
   + Finding Out About: A Cognitive Perspective on Search Engine Technology
   and the WWW (Belew, R.K., 2000)
   http://www-cse.ucsd.edu/~rik/foa/
   + Foundations of Statistical Natural Language Processing
   (C. Manning and H. Schütze, 1999)
   + Information Retrieval, 2nd edition (C.J. van Rijsbergen, 1979)
   (full text)
   http://www.dcs.gla.ac.uk/Keith/Preface.html
   + Information Retrieval: A Survey (Ed Greengrass, 2000)
   http://www.csee.umbc.edu/cadip/readings/IR.report.120600.book.pdf
   + Information Retrieval: Data Structures & Algorithms
   (Frakes, W. and Baeza-Yates, R., 1992)
   http://www.dcc.uchile.cl/~rbaeza/iradsbook/irbook.html
   + Information Retrieval Interaction (Ingwersen, P., Taylor Graham, 1992)
   http://www.db.dk/pi/iri/
   + Managing Gigabytes: Compressing and Indexing Documents and Images,
   2nd edition (Ian H. Witten, Alistair Moffat, and Timothy Bell, 1999)
   + Mining the Web: Discovering Knowledge from Hypertext Data
   (Soumen Chakrabarti, 2003)
   + Modeling the Internet and the Web:
   Probabilistic Methods and Algorithms
   (Pierre Baldi, Paolo Frasconi and Padhraic Smyth, 2003)
   + Modern Information Retrieval
   (Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2000)
   + Readings in Information Retrieval.
   (Sparck-Jones, K. and Willett, P., 1997)
   + Search Engine: Principle, Technology and Systems
   搜索引擎-原理、技术与系统
   (Xiaoming Li, et al., 2005) (full text)
   http://sewm.pku.edu.cn/book/dlbook.html
   + The Geometry of Information Retrieval
   (C.J. van Rijsbergen, 2004)
   http://ir.dcs.gla.ac.uk/GeometryOfIR/
   + The Turn: Integration of Information Seeking and Retrieval in Context
   (Ingwersen, P., and Jarvelin, K., 2005)
   + TREC: Experiment and Evaluation in Information Retrieval
   (Voorhees, E.M., and Harman, D.K., 2005)
   http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10667

   Conferences and Workshops
   + CIKM: Conference on Information and Knowledge Management
   http://www.csee.umbc.edu/cikm/
   + SIGIR: Special Interest Group on Information Retrieval
   http://www.sigir.org/
   + WWW: International World Wide Web Conference
   http://www.iw3c2.org/
   + SEWM: Symposium of Search Engine and Web Mining
   全国搜索引擎和网上信息挖掘学术研讨会
   http://net.pku.edu.cn/~sewm/

   Courses
   + CMU Information Retrieval
   http://nyc.lti.cs.cmu.edu/classes/11-741/ (Spring 2006)
   Instructors: Jamie Callan and Yiming Yang
   + Cornell University The Structure of Information Networks (Spring 2006)
   http://www.cs.cornell.edu/courses/cs685/2006sp/
   Instructor: Jon Kleinberg
   + Peking University Web Based Information Architectures (Fall 2005)
   http://net.pku.edu.cn/~wbia/
   Instructors: Xiaoming Li, Jimin Wang and Bo Peng
   + Stanford Univ. Text Information Retrieval and Web Mining (Autumn 2005)
   http://www.stanford.edu/class/cs276/
   Instructors: Christopher Manning and Prabhakar Raghavan
   + UIUC Introduction to Text Information Systems (Spring 2006)
   http://sifaka.cs.uiuc.edu/course/498cxz06s/
   Instructor: ChengXiang Zhai
   + UMass Information Retrieval course (Spring 2005)
   http://ciir.cs.umass.edu/cmpsci646/
   Instructor: James Allan
   + Univ. of Washington Search Engines course
   http://courses.washington.edu/lis544/

   Evaluation Resources
   + CLEF: Cross-Language Evaluation Forum
   http://clef.iei.pi.cnr.it/
   + CWIRF: Chinese Web Information Retrieval Forum
   http://www.cwirf.org/
   + DUC: Document Understanding Conferences
   http://duc.nist.gov/
   + INEX: INitiative for the Evaluation of XML Retrieval
   http://inex.is.informatik.uni-duisburg.de/
   + NTCIR: NII-NACSIS Test Collection for IR Systems
   http://research.nii.ac.jp/ntcir/
   + TREC: Text REtrieval Conference
   http://trec.nist.gov/

   Journals
   + Briefings in Bioinformatics (full text)
   http://bib.oxfordjournals.org/archive/
   + Computational Linguistics, The MIT Press
   http://mitpress.mit.edu/catalog/item/default.asp?ttype=4&tid=10
   + Data & Knowledge Engineering (DKE), Elsevier
   http://www.elsevier.com/wps/find/journaldescription.cws_home/505608/description?navopenmenu=-2
   + D-Lib Magazine
   http://www.dlib.org/
   + Information Processing Letters, Elsevier
   http://www.elsevier.com/locate/issn/00200190
   + Information Processing and Management (IP&M), Elsevier
   http://www.elsevier.com/locate/infoproman
   + Information Retrieval, Springer
   http://www.springer.com/sgw/cda/frontpage/0,11855,3-0-70-35744790-detailsPage%253Djournal%257Cdescription%257Cdescription,00.html
   + Information Research
   http://informationr.net/ir
   + International Journal on Digital Libraries, Springer
   http://link.springer.de/link/service/journals/00799/index.htm
   + International Journal of Cooperative Information Systems (IJCIS),
   World Scientific
   http://ejournals.wspc.com.sg/ijcis/ijcis.shtml
   + International Journal on Document Analysis and Recognition, Springer
   http://link.springer.de/link/service/journals/10032/index.htm
   + International Journal of Intelligent Systems, Wiley
   http://www3.interscience.wiley.com/cgi-bin/jhome/36062
   + International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), World Scientific
   http://ejournals.wspc.com.sg/ijufks/ijufks.shtml
   + Journal of the American Society for Information Science and Technology (JASIST), Wiley
   http://www3.interscience.wiley.com/cgi-bin/jhome/76501873
   + Journal of Documentation (JDoc), Emerald
   http://www.emeraldinsight.com/0022-0418.htm
   + Journal of Intelligent Information Systems (JIIS), Springer
   http://www.wkap.nl/journalhome.htm/0925-9902
   + Knowledge and Information Systems (KAIS), Springer
   http://link.springer.de/link/service/journals/10115/index.htm
   + Natural Language Engineering, Cambridge University Press
   http://www.cambridge.org/journals/journal_catalogue.asp?mnemonic=NLE
   + Transactions On Information Systems (TOIS), ACM
   http://www.acm.org/tois/
   + Transactions on Knowledge and Data Engineering (TKDE), IEEE
   http://www.computer.org/tkde/

   List Archives
   + SIG-IRList, http://www.sigir.org/sigirlist/index.html

   Organizations and Special Interest Groups
   + Cambridge NLIP, http://www.cl.cam.ac.uk/Research/NL/
   + CMU LTI, http://www.lti.cs.cmu.edu/
   + DEC laboratories in Palo Alto, Calif.
   + Glasgow Information Retrieval Group, http://www.dcs.gla.ac.uk/ir/
   + Google Labs, http://labs.google.com/
   + Massachusetts CIIR, http://ciir.cs.umass.edu/
   + MSR Asia, Web Search & Data Mining Group
   http://research.microsoft.com/wsm/
   + Stanford InfoLab, http://infolab.stanford.edu/
   + UIUC Information Retrieval Group, http://sifaka.cs.uiuc.edu/ir/
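   + PKU Tianwang Group, http://sewm.pku.edu.cn/
   + Institute of Computational Linguistics, Peking University,
   http://icl.pku.edu.cn/
   + Fudan University Information Retrieval and Natural Language
   Processing Group,
   http://www.cs.fudan.edu.cn/mcwil/irnlp/
   + HIT Information Retrieval Group, http://ir.hit.edu.cn/
   #+ State Key Laboratory of Intelligent Technology and Systems,
   # Tsinghua University (URL could not be reached)
   # http://www.csai.tsinghua.edu.cn/
   + CAS Large-Scale Content Computing Group, http://159.226.40.18/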

   Researchers
   + ChengXiang Zhai, developing Lemur
   http://www-faculty.cs.uiuc.edu/~czhai/
   + Gerard Salton
   http://www.cs.cornell.edu/Info/Department/Annual95/Faculty/Salton.html
   + Karen Sparck Jones, developing IDF
   http://www.cl.cam.ac.uk/users/ksj/
   + Keith van Rijsbergen
   http://www.dcs.gla.ac.uk/~keith/
   + Jamie Callan,
   http://www.cs.cmu.edu/~callan/
   + Jon Kleinberg, developing HITS
   http://www.cs.cornell.edu/home/kleinber/
   + Li Xiaoming, developing Tianwang & Infomall
   + Nick Craswell, developing Terabyte Track
   http://research.microsoft.com/~nickcr
   + Susan Dumais, developing LSI
   http://research.microsoft.com/~sdumais/
   + Yiming Yang, developing text categorization
   http://www.cs.cmu.edu/~yiming/
   + Stephen Robertson,
   http://research.microsoft.com/users/robertson/
   + Tefko Saracevic
   http://www.scils.rutgers.edu/~tefko/
   + W. Bruce Croft
   http://ciir.cs.umass.edu/personnel/croft.html

   Research-related Resources
   + http://www-faculty.cs.uiuc.edu/~czhai/research.html

   Software
   + Apache Lucene: a full-featured text search engine library
   http://lucene.apache.org/java/docs/index.html
   + Gate: a general architecture for text engineering
   http://gate.ac.uk/
   + Lemur: A full-text search engine
   http://www.lemurproject.org/
   + MG: A full-text search engine
   http://www.math.utah.edu/pub/mg/
   + Porter Stemmer: English stemming algorithm
   http://www.tartarus.org/martin/PorterStemmer/
   + Nutch: an open source web search engine
   http://sourceforge.net/projects/nutch/
   + TSE: A Tiny Search Engine
   http://sewm.pku.edu.cn/src/TSE/

---------------------
References:
[1] Information Retrieval Resources, http://www.sigir.org/resources.html
[2] http://ir.dcs.gla.ac.uk/resources.html
[3] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
[4] Diekemar, Information Retrieval Links, Jan. 28, 1999.
   http://web.syr.edu/~diekemar/ir.html
[5] Chen Hongbiao, Studying Information Retrieval Online, Nov. 1999.
   http://159.226.40.18/freshman/resources/网上研习信息检索.doc
[6] Data Mining Research Institute, http://www.dmresearch.net/
[7] Speech and Natural Language Online, http://www.snlpinfo.com/index.php
[8] PKU SEWM Group, http://sewm.pku.edu.cn/
[9] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
[10] http://icl.pku.edu.cn/member/lisujian/maincontent.htm
[11] http://www.cs.fudan.edu.cn/mcwil/irnlp/link.htm
[12] Robert Krovetz, A Guide to the Literature of Information Retrieval,
   http://159.226.40.18/freshman/resources/guide-to-ir-lit.ps
[13] ACM Digital Library,
   http://portal.acm.org/portal.cfm
   http://acm.lib.tsinghua.edu.cn/acm/
[14] http://www.sigir.org/proceedings/Proc-Browse.html
[15] SIGIR,
   http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES278&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
[16] WWW, International World Wide Web Conference
   http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES968&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
[17] China Digital Journal Community, http://wanfang.calis.edu.cn/wf/szhqk/index.html

---------------------

More details are listed as follows
====================
CIIR
(The Center for Intelligent Information Retrieval,
University of Massachusetts, USA)
http://ciir.cs.umass.edu/

The Center for Intelligent Information Retrieval, a National Science
Foundation-created S/IUCRC Center, is one of the leading information retrieval
research labs in the world. The CIIR develops tools that provide effective
and efficient access to large, heterogeneous, distributed, text and
multimedia databases.

CIIR accomplishments include significant research advances in the areas of
distributed information retrieval, information filtering, topic detection,
multimedia indexing and retrieval, document image processing, terabyte
collections, data mining, summarization, resource discovery, interfaces
and visualization, and cross-lingual information retrieval.

The Center for Intelligent Information Retrieval continues to support the
emerging information infrastructure, both through research and technology
transfer. The goal of the CIIR is to develop tools that provide effective
and efficient access to large, heterogeneous, distributed, text and
multimedia databases.

====================
Glasgow Information Retrieval Group
http://www.dcs.gla.ac.uk/ir/
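The information retrieval research group at the University of Glasgow, UK,
led by Keith van Rijsbergen. The group gives equal weight to theory and
practice and aims to build efficient, novel, and successful multimedia
information retrieval systems that serve end users.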

The Information Retrieval Group led by Professor Keith van Rijsbergen has a
vigorous programme of research, based on both theory and experiment, aimed at
giving end-users novel, effective, and efficient access to the world of
multi-media information. The group, part of the Department of Computing Science,
University of Glasgow, has a strong research history in a wide area of
information retrieval research from theoretical modelling of the retrieval
process to advanced system building and to the user-oriented evaluation of
information retrieval systems. The group's interests also include many areas
of Web information retrieval such as link analysis, summarisation and the
development of novel interaction techniques (e.g., ostension, implicit feedback
and graphical visualisation). Our research preserves a strong emphasis on
the evaluation of interactive IR systems, and the group maintains strong links
with researchers in Human-Computer Interaction and Psychology.

------
Keith van Rijsbergen, http://www.dcs.gla.ac.uk/~keith/
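University of Glasgow, UK. A leading representative of the logical-inference
school of probabilistic IR, and author of the classic IR textbook
INFORMATION RETRIEVAL, which focuses on probabilistic approaches to
information retrieval.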

=====================
Cambridge NLIP Group
(Natural Language and Information Processing Group)
http://www.cl.cam.ac.uk/Research/NL/

Research in NLIP has been done in the Computer Laboratory for nearly fifty years.
The earliest work, by Roger Needham and Karen Sparck Jones, was on automatic
thesaurus construction, in the context of document retrieval and machine translation.
Subsequent research by Karen Sparck Jones during the 1960s and 70s focused on
statistical approaches to retrieval and included innovative work on term
weighting. From the later 1970s research in language processing developed,
with work on syntax, semantics and discourse processing.

------
Karen Sparck Jones, http://www.cl.cam.ac.uk/users/ksj/
Karen Sparck Jones has been one of the most influential figures in Computing
since the 1950s. Her work on Information Retrieval and Natural Language Processing
has never been as central as it is today, with its implications for
search engine technology, the semantic web and even bioinformatics.

In 1972, Karen Sparck Jones published in the Journal of Documentation the paper
which defined the term weighting scheme now known as inverse document frequency (IDF).
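
As a rough illustration of the idea behind IDF, the sketch below computes
idf(t) = log(N / n_t), where N is the number of documents and n_t is the
number of documents containing term t. This is one common textbook form and
not necessarily the exact formulation of the 1972 paper; the function name
and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return {term: idf} for a list of tokenized documents.

    Uses idf(t) = log(N / n_t), a common textbook form (an assumption here,
    not necessarily the formulation of the original paper).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term once per document, however often it occurs there.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy collection of three tokenized documents.
docs = [
    ["information", "retrieval", "systems"],
    ["information", "theory"],
    ["retrieval", "evaluation", "information"],
]
idf = inverse_document_frequency(docs)
```

A term that occurs in every document ("information" above) gets weight zero,
while a term confined to a single document ("theory") gets the largest
weight, which is the intuition behind down-weighting common terms.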

Karen Sparck Jones is emeritus Professor of Computers and Information at the
Computer Laboratory, University of Cambridge. She has worked in automatic
language and information processing research since the late fifties,
and has many publications including several books, most recently `Evaluating
Natural Language Processing Systems' with Julia Galliers, and `Readings in
Information Retrieval', edited with Peter Willett.

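Winner of the 1988 Salton Award. Another founder of the modern probabilistic
IR model. She has made substantial contributions to NLP, IR, and related
fields, and has done a great deal of organizational work. She currently works
at the Computer Laboratory, University of Cambridge, UK.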

====================
LTI
CMU (Carnegie Mellon University) Language Technologies Institute,
http://www.lti.cs.cmu.edu/

The Language Technologies Institute (LTI) of the School of Computer Science at
Carnegie Mellon University conducts research and provides graduate education
in all aspects of language technology and information management. The LTI was
established in 1996, as an expansion of the Center for Machine Translation
(CMT).

The Center for Machine Translation (CMT) was a research branch of the School
of Computer Science devoted to basic and applied research in all aspects of
natural language processing, with a primary focus on machine translation,
speech processing, and information retrieval. Containing a unique mix of
academic and industrial researchers specializing in various aspects of
computer science, artificial intelligence, computational linguistics and
theoretical linguistics, the CMT provided a rich and diverse environment for
collaboration among faculty, staff, visiting scholars, and qualified students.

------
Lemur Toolkit
Lemur is a collection of search engine algorithms and information retrieval
applications used for IR research, development and education. Lemur provides a
rich query language that supports search against simple texts, structured
(XML) texts, and texts annotated with part-of-speech, named-entity, and other
annotations used in NLP and text-mining applications. Lemur's search engines
comfortably support collections ranging from a few gigabytes to a few
terabytes of text. The software is distributed under an open-source license, and
is used widely in the IR research community.

====================
Stanford InfoLab
http://infolab.stanford.edu/

The Stanford WebBase Project
http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/

The Stanford WebBase project is investigating various issues in crawling,
storage, indexing, and querying of large collections of Web pages. The project
builds on the previous Google activity that was part of the DLI1 initiative.
The DLI2 WebBase project aims to build the necessary infrastructure to
facilitate the development and testing of new algorithms for clustering,
searching, mining, and classification of Web content.
====================
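PKU Tianwang Group, http://sewm.pku.edu.cn/

   The Network Laboratory of Peking University has been engaged in search
engine research and system development since 1997. It has accumulated deep
technical expertise, and its overall strength and academic influence have
consistently led the field in China. The "Tianwang" search engine system that
we developed is the most influential campus-born search engine in the country
and has been running continuously since October 1997. Tianwang has notable
strengths in incremental search, fast retrieval, and massive-scale information
storage. Its ongoing development has trained successive cohorts of students
with hands-on experience in processing massive volumes of web text, and these
students are widely welcomed by Chinese and foreign IT companies.
   Since 2001, building on its search engine technology, the group has
collected and archived the history of the Chinese Internet, forming the
"China Internet Information Museum". To date it holds 2 billion Chinese web
pages that appeared at various times, and it is currently the largest
historical web page collection and replay system in China. We have also
experimented with interdisciplinary research built on this archive.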

====================
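CAS Large-Scale Content Computing Group
http://159.226.40.18/

   The Information Retrieval team focuses on text retrieval research, has
participated in TREC many times, and has achieved strong research results.
The Tianluo retrieval system developed by the team is widely used in many
important national information departments. Current main research directions
include Web information acquisition and Web information retrieval.
   The Information Analysis team concentrates on the analysis and mining of
large-scale, multi-source, heterogeneous information, including text
classification and clustering, information filtering, personalized services,
natural language question answering, and shallow natural language processing.
The team has built a series of experimental platforms for text information
processing, which can currently be tried through the "Demos" (成果演示) link
on the home page. Notably, the team runs an open-source program, whose
high-performance word segmentation system ICTCLAS has been widely recognized
and used by researchers.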

====================
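Fudan University Information Retrieval and Natural Language Processing Group,
http://www.cs.fudan.edu.cn/mcwil/irnlp/

Large-scale text processing mainly studies techniques and methods for
processing natural language (Chinese information in particular), and covers
two areas. The first is foundational work, chiefly basic theory and
algorithms, including automatic word segmentation, unknown word recognition,
part-of-speech and concept tagging, syntactic analysis, and semantic analysis,
as well as collecting and organizing corpora. The second is applied technology
for Chinese information processing, including automatic indexing, text
retrieval, text summarization, text classification, and text filtering, and
especially the application of these techniques in a network environment.
This second area is the research focus of the text direction.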

====================
HIT-IRLab, http://ir.hit.edu.cn/

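   The HIT Information Retrieval Lab (HIT-IRLab) was founded in March 2001.
Its research directions include text retrieval, question answering, automatic
summarization, text mining, and language analysis. The lab treats language
analysis as its basic research and text filtering as its applied research. It
regards information extraction as the extension of language analysis from
sentence understanding to discourse understanding, and sentence retrieval as
an intelligent, precise retrieval technology supported by language analysis
and discourse understanding.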

====================
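SIGIR (ACM Special Interest Group on Information Retrieval)
TREC (Text REtrieval Conference)
MUC (Message Understanding Conference)
TIPSTER (the IR testbed program of the US Defense Advanced Research
Projects Agency)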

====================
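Institute of Computational Linguistics, Peking University
http://icl.pku.edu.cn/

   The Institute of Computational Linguistics at Peking University was
founded in 1986. It is devoted to research in three areas: computational
linguistics theory, foundational resources for language information
processing, and applied technology.
   Centered on computational linguistics and natural language processing, its
work follows three main directions. The first is research on and construction
of foundational resources: computational lexicography and machine
dictionaries, comprehensive language knowledge bases, corpus linguistics and
corpus processing technology, and terminology, automatic term extraction, and
term standardization. The second is basic theory and NLP models and methods:
foundations of computational linguistics, core NLP technology, modern Chinese
grammar, lexical/syntactic/semantic analysis of Chinese, statistical NLP
models, and information-theoretic methods for language processing. The third
is applied technology: methods, techniques, and system implementation for
machine translation, information retrieval and extraction, evaluation methods
and techniques for natural language information processing systems,
controlled Chinese and its computer-aided writing systems, and computer-aided
research on classical Chinese poetry.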

====================
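#State Key Laboratory of Intelligent Technology and Systems,
# Tsinghua University (URL could not be reached)
# http://www.csai.tsinghua.edu.cn/

   The State Key Laboratory of Intelligent Technology and Systems is hosted
by Tsinghua University. It opened to outside researchers in February 1990.
It mainly conducts basic and applied basic research on the fundamental
principles and methods of artificial intelligence, including intelligent
information processing, machine learning, intelligent control, and neural
network theory. It also studies applied technology and system integration
related to artificial intelligence, chiefly intelligent robots and the
processing of sound, graphics, images, text, and language.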

================
Susan Dumais,
http://research.microsoft.com/~sdumais/

I am interested in algorithms and interfaces for improved information
retrieval, as well as general issues in human-computer interaction. I
joined Microsoft Research in July 1997. I work on a wide variety of
information access and management issues, including: personal information
management, web search, question answering, information retrieval, text
categorization, collaborative filtering, interfaces for improved search and
navigation, and user/task modeling.

Prior to coming to Microsoft, I worked on a statistical method for
concept-based retrieval known as Latent Semantic Indexing. You can find
pointers to this work on the Bellcore (now Telcordia) LSI page.

===============
UIUC Information Retrieval Group
http://sifaka.cs.uiuc.edu/ir/

The Information Retrieval (IR) group is part of the Database and Information
Systems (DAIS) Lab of the Computer Science Department at University of
Illinois at Urbana-Champaign. We work on a wide spectrum of problems in the
general area of text information management, including retrieval,
organization, filtering, and mining of textual information, aiming at
developing advanced text information management techniques and systems that
help people make better use of text information.

------
ChengXiang Zhai,
http://www-faculty.cs.uiuc.edu/~czhai/

Research Interests: Information Retrieval, Text Mining, Natural Language
Processing, Bioinformatics

ChengXiang Zhai, of the University of Illinois at Urbana-Champaign, is
recognized for his work on user-centered, adaptive intelligent information
access. His techniques are expected to improve search-engine performance,
support better
information organization and enable understanding of large volumes of
information. Zhai's work in information retrieval is expected to enhance
curricula and provide new educational tools for the growing information
technology workforce.

===============
Stephen Robertson,
http://research.microsoft.com/users/robertson/

Stephen Robertson joined Microsoft Research Cambridge in April 1998.

In 1998, he was awarded the Tony Kent STRIX award by the Institute of
Information Scientists. In 2000, he was awarded the Salton Award by ACM SIGIR.
He is a Fellow of Girton College, Cambridge.

At Microsoft, he runs a group called Information Retrieval and Analysis, which
is concerned with core search processes such as term weighting, document
scoring and ranking algorithms, and combination of evidence from different
sources. These are studied theoretically through the use of formal models,
mainly statistical, and statistical methods including machine learning
methods, and experimentally, through activities such as the Text Retrieval
Conference (TREC) and with internally generated evaluation sets. The group
(with its Keenbow evaluation environment) has had some excellent results at
TREC. The group works closely with product groups to transfer ideas and
techniques.

His main research interests are in the design and evaluation of retrieval
systems. He is the author, jointly with Karen Sparck Jones, of a probabilistic
theory of information retrieval, which has been moderately influential. A
further development of that model, with Stephen Walker, led to the term
weighting and document ranking function known as Okapi BM25, which is used in
many experimental text retrieval systems.
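
To make the shape of a BM25-style ranking function concrete, here is a minimal
sketch. It is an illustration under stated assumptions, not the Okapi system's
own code: the function name is hypothetical, k1=1.2 and b=0.75 are just
conventional default values, and the IDF component uses the non-negative
variant log(1 + (N - n + 0.5) / (n + 0.5)) found in some open-source
implementations.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query.

    k1 controls term-frequency saturation and b controls document-length
    normalization; both defaults are conventional choices, assumed here.
    Expects a non-empty collection of non-empty documents.
    """
    n_docs = len(documents)
    avg_len = sum(len(tokens) for tokens in documents) / n_docs
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        term_freq = Counter(tokens)
        # Documents longer than average are penalized, shorter ones boosted.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical toy collection and query.
docs = [
    ["okapi", "bm25", "ranking"],
    ["term", "weighting"],
    ["bm25", "bm25", "term"],
]
scores = bm25_scores(["bm25"], docs)
```

In this toy collection the third document ranks first because it repeats the
query term, the first document ranks second, and the second document scores
zero because it does not contain the term at all.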

Prior to joining Microsoft, he was at City University London, where he retains
a part-time position as Professor of Information Systems in the Department of
Information Science. He was Head of Department for eight years,
during which time it achieved the highest possible rating in two successive
research assessment exercises. He also started the Centre for Interactive
Systems Research, the main research vehicle of which is the Okapi text
retrieval system, which has also done well at TREC.

Before joining City, he was a research fellow at University College London,
where he took his PhD in the School of Library Archive and Information
Studies. Before that he was in the research department at Aslib. He has an MSc
in Information Science from City and a first degree in mathematics from
Cambridge.

===================
Nick Craswell
http://research.microsoft.com/~nickcr

I am an associate researcher at Microsoft Research Cambridge, in the
Information Retrieval and Analysis Group.

Research Overview

I am interested in Web search evaluation, mostly on enterprise-scale webs but
also the World Wide Web. I built the VLC, VLC2, WT2g and .GOV test
collections, which have been made available to research groups around the
world. David Hawking and I coordinated the TREC Web Track experiments. I am
currently involved in the TREC Terabyte Track and Enterprise Track. Some
publications: Book chapter preprint (pdf), IR'01 (citeseer) and CSIRO'01
(pdf).

I also work on effective Web search, which means making use of information in
pages, link structure and URL structure to generate more useful Web search
results. Some papers: SIGIR'05 (pdf), SIGIR'01 (pdf), TOIS'03 (pdf) (copying
is by permission of ACM, Inc.) and ADCS'03 (pdf).

My PhD was in distributed information retrieval (thesis pdf) which means
building a system on top of multiple engines/databases that already exist. My
recent work in the area has considered whether (or when) DIR is really
practical. Some papers: ADC'99 (ps), DL'00 (pdf), ADC'03 (pdf) and ADC'04
(pdf).

===============
Web Search & Data Mining Group of MSR Asia
http://research.microsoft.com/wsm/

The goal of the Web Search & Data Mining Group of MSR Asia is to drive the
next generation of Web search by leveraging data mining, machine learning, and
knowledge discovery techniques for information analysis, organization,
retrieval, and visualization. In addition, in contrast with current Web search
methods, which essentially do document-level ranking and retrieval, the Web
Search & Data Mining Group has created search at the object level to bring
increased knowledge and intelligence to users.

A Glimpse at Several Core Innovations:

Large-scale Experimental Web Search Platform

The Web Search & Data Mining Group is creating a large scale search platform
to efficiently store, parse, index and search billions of Web pages and other
types of documents. The search platform is flexible enough to allow for
testing of various state-of-the-art search techniques that have been created
at the lab using new technologies.

Structuralizing the Web

The biggest challenge facing both users and search engines over the next
several decades is the continued unstructured growth of the Internet. As such,
search functions that can effectively and efficiently dig out
machine-understandable information and knowledge layers from unorganized and
unstructured Web data will be the key to supporting relevant search results.
To meet this challenge, the group is exploring technologies, namely Web
information extraction, deep Web mining, and Web structure mining that can
automatically classify structures and extract objects from the Web. The
information and knowledge gathered using these new techniques greatly improves
the performance of current Web search and even facilitates the creation of
more sophisticated next generation search technologies.

Vertical Search

Today's conventional search engines can be described as page-level search
engines whose main function is to rank web pages according to their relevance
to a given query. Driving the future of the search industry are functions that
delve deeper into vertical domains to provide knowledge and intelligence to
query results. At MSR Asia, the Web Search & Data Mining Group is addressing
the greatest challenges faced by vertical search including large scale web
classification, object-level information extraction, object identification and
integration, and object relationship mining and ranking. The results of these
efforts are leading to more advanced search engines that deliver intelligence
and insight to search results.

Mobile Search

The explosive growth of new computing devices such as handheld computers,
Windows Mobile-based PocketPCs, and SmartPhones is driving demand for greater
and more efficient information access. These devices, which leverage the power
of the Web and allow greater access to information than ever before, are still
not capable of performing at the level of a desktop PC. At MSR Asia, the Web
Search & Data Mining Group is inventing new technologies to improve the mobile
search and browsing experience and deliver the capabilities of a PC to users
of these new devices. Project initiatives include developing innovative
presentation schemes and user interfaces to facilitate search and browsing
tasks on mobile devices and developing context aware search technologies to
address the special information needs of mobile users.

Multimedia Search

The Web Search & Data Mining Group is conducting research into new
technologies that index multimedia content such as images, videos, and audio.
Through content analysis and advanced visualization techniques, the group is
transforming today's conventional text based search engines to include
multimedia content thus delivering more intelligent search results to users.
For example, the group recently developed a new multimedia news reader which
mines large archival news databases presenting text, map information, images,
and background music within a unique user interface providing readers with a
more efficient news search engine and a more enjoyable reading experience.

------
Wei-Ying Ma
http://research.microsoft.com/users/wyma/

Senior Researcher, Research Manager, Microsoft Research Asia

Dr. Wei-Ying Ma received the B.S. degree in electrical engineering from the
National Tsing Hua University in Taiwan in 1990, and the M.S. and Ph.D.
degrees in electrical and computer engineering from the University of
California at Santa Barbara in 1994 and 1997, respectively. From 1994 to 1997
he was engaged in the Alexandria Digital Library (ADL) project in UCSB while
completing his Ph.D. He developed a web-based image retrieval system called
Netra which has been frequently cited by other researchers and is regarded as
one of the most representative image retrieval systems. From 1997 to 2001, he
was with HP Labs where he worked in the field of multimedia adaptation and
distributed media services infrastructure. He joined Microsoft Research Asia
in 2001. Since then, he has been leading a research group to conduct research
in the areas of information retrieval, web search, data mining, mobile
browsing, and multimedia management. He currently serves as an Editor for the
ACM/Springer Multimedia Systems Journal and Associate Editor for ACM
Transactions on Information Systems (TOIS). He has served on the organizing and
program committees of many international conferences including ACM Multimedia,
ACM SIGIR, ACM CIKM, WWW, ICME, CVPR, SPIE Multimedia Storage and Archiving
Systems, SPIE Multimedia Communication and Networking, etc. He is also the
general co-chair of International Multimedia Modeling (MMM) Conference 2005
and International Conference on Image and Video Retrieval (CIVR) 2005. He has
published 5 book chapters and over 100 international journal and conference
papers.

====================
Google Labs
http://labs.google.com/

Google Labs is a playground for Google engineers and adventurous Google users.
Google staffers with wild and crazy ideas post their prototypes on Google Labs
and solicit feedback on how the technology could be used or improved. None of
these experiments are guaranteed to make it onto Google.com, as this is really
the first phase in the development process. Google users with a desire to jump
over the cutting edge are invited to check out any or all of the posted
prototypes and send their comments directly to the Googlers who developed
them. Please, remember to wear your safety goggles while using this site.

Labs.google.com, Google's technology playground.
Google labs showcases a few of our favorite ideas that aren't quite ready for
prime time. Your feedback can help us improve them. Please play with these
prototypes and send your comments directly to the Googlers who developed them.

Want to learn more about Google technology? Here are some papers.
http://labs.google.com/papers/index.html

Passionate about these topics? You should work at Google.
algorithms, artificial intelligence, compiler optimization,
computer architecture, computer graphics,
data compression, data mining, file system design,
genetic algorithms, information retrieval,
machine learning, natural language processing, operating systems,
profiling, robotics,
text processing, user interface design,
web information retrieval, and more!

http://www.google.com/press/podium.html
Google Press Center: The Google Podium
Here you'll find a selection of public presentations made by Google
executives. From time to time, we will continue to add transcripts, audio or
video clips and links to presentations hosted elsewhere.

====================
Jon Kleinberg
http://www.cs.cornell.edu/home/kleinber/

Professor of Computer Science, Cornell University

My research is concerned with algorithms that exploit the combinatorial
structure of networks and information. My recent work has included
* link analysis and modeling of the World Wide Web and related information networks;
* discrete optimization and network algorithms; and
* algorithmic approaches to clustering, indexing, and data mining.
====================
