By Marcus P. Zillman
Marcus P. Zillman, M.S., A.M.H.A., is Executive Director of the Virtual Private Library and Founder/Creator of BotSpot. He is the author of nine Internet MiniGuides 2005, the Internet Sources Manual and the eCurrent Awareness Resources 2005 Report. His Subject Tracer Information Blogs (41 and constantly growing), which include the latest resources on Deep Web Research and Bot Research, are freely available from the Virtual Private Library. His current white papers on searching and researching the Internet are located at WhitePapers.us. His personal blog dedicated to knowledge discovery, knowledge harvesting, information retrieval and Internet current awareness is available at Zillman.us. His free monthly newsletter is titled AwarenessWatch, and his monthly Internet Zillman Column has been archived since 1996.
Published January 17, 2005
Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering over the last year, and much of my information comes from the extensive research that I have completed over the years into the "invisible" or, as I like to call it, the "deep" web. The deep web covers somewhere in the vicinity of 600 billion pages of information, located throughout the World Wide Web in various files and formats, that the current Internet search engines either cannot find or have difficulty accessing. By comparison, the current search engines find about 8 billion pages at the time of this writing.
In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the World Wide Web by attempting to find files such as .pdf, .doc, .xls, and .ppt. These files are predominantly used by businesses to communicate information within their organization or to disseminate information from their organization to the external world. Searching for this information using deeper search techniques and the latest algorithms allows researchers to obtain a vast amount of corporate information that was previously unavailable or inaccessible. Research has also shown that even deeper information can be obtained from these files by searching and accessing their "properties" information. I wrote about this research in my personal blog a few months ago.
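As a rough illustration of searching by file type, the sketch below builds one query per document format named above. It assumes the search engine in use accepts a `filetype:` operator, which several engines do but which this article does not itself specify. The function names and the `www.example.com` endpoint are hypothetical placeholders, not a real service.

```python
from urllib.parse import urlencode

# File extensions named in the paragraph above.
DOCUMENT_TYPES = ("pdf", "doc", "xls", "ppt")


def build_filetype_queries(topic, extensions=DOCUMENT_TYPES):
    """Return one query string per extension, e.g. 'annual report filetype:pdf'.

    Assumption: the target engine understands the 'filetype:' operator.
    """
    return [f"{topic} filetype:{ext}" for ext in extensions]


def build_search_url(query, base_url="https://www.example.com/search"):
    """URL-encode a query for a placeholder search endpoint.

    base_url is hypothetical; substitute the search form of the engine you use.
    """
    return f"{base_url}?{urlencode({'q': query})}"


if __name__ == "__main__":
    for query in build_filetype_queries("annual report"):
        print(build_search_url(query))
```

Running one query per format, rather than a single combined query, keeps each result set separate so a researcher can see which document types a given organization actually publishes.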
This article and guide is designed to give you the resources you need to better understand the history of deep web research, as well as various classified resources that allow you to search through the currently available web to find those key nuggets of information that can only be found by understanding how to search the "deep web." This Deep Web Research 2005 article is divided into the following sections:
Articles, Papers, Audios and Videos (Current and Historical)
Cross Database Articles
Cross Database Search
Cross Database Search Tools
Presentations
Peer to Peer, File Sharing, Grid and Matrix Search Engines
Resources - Deep Web Research
Resources - Semantic Web Resources
Bot Research
Current Subject Tracer Information Blogs
Articles, Papers, Audios and Videos (Current and Historical)
Academic and Scholar Search Engines and Sources
http://zillman.blogspot.com/2004/12/academic-and-scholar-search-engines.html
A Crisis for Web Preservation by Florence Olsen
http://snipurl.com/78te
All of OCLC's WorldCat Heading Toward the Open Web by Barbara Quint
http://www.infotoday.com/newsbreaks/nb041011-2.shtml
Annotation for the Deep Web
http://csdl.computer.org/comp/mags/ex/2003/05/x5042abs.htm
Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery
http://portal.acm.org/citation.cfm?id=640423&dl=ACM&coll=portal
Benevolent "Virus" Helps Reveal the Hidden Web
http://www.syllabus.com/article.asp?id=9680
Bot Research
http://www.BotResearch.info/
Common Information Environment Seeks To Reveal the Hidden Web
http://society.guardian.co.uk/e-public/story/0,13927,1195901,00.html
Crawling the Hidden Web by Sriram Raghavan and Hector Garcia-Molina
http://citeseer.ist.psu.edu/461253.html
Current Awareness Discovery Tools on the Internet
http://snipurl.com/57jl
Data Extraction and Label Assignment for Web Databases
http://www2003.sztaki.hu/cdrom/papers/refereed/p470/p470-wang.htm
Deep Content - Guide To Effective Searching of the Internet
http://www.brightplanet.com/deepcontent/tutorials/search/index.asp
Deep Web - Exploring the Secrets of the Hidden Internet by Marcus P. Zillman, M.S., A.M.H.A., - 23 minutes - Internet/Technology Channel
http://www.planetearthradio.com/
Desperately seeking Web Search 2.0
http://snipurl.com/64im
DigiCULT Thematic Issue 6: Resource Discovery Technologies for the Heritage Sector, June 2004 (HiRes .pdf, 4.9 MB)
http://snipurl.com/7v46
Diving in the Deep End of the Web by Suzanne Ross
http://research.microsoft.com/displayArticle.aspx?id=1052
Easy Topic Maps
http://easytopicmaps.com/
Econtent: Invisible Web Catalog
http://snipurl.com/5tbc
Farewell, Web 1.0! We Hardly Knew Ye by Steven Levy
http://www.msnbc.msn.com/id/6214349/site/newsweek/
Finding the Invisible Web by Jennifer Laycock
http://websearch.about.com/library/weekly/aa061903a.htm
Fugitive Documents Evade Federal Depositories
http://snipurl.com/78te
Google Teams Up with 17 Colleges to Test Searches of Scholarly Materials By Jeffrey R. Young
http://chronicle.com/free/2004/04/2004040901n.htm
Graph Structure in the Web
http://www.almaden.ibm.com/cs/k53/www9.final/
Gray Literature: Resources for Locating Unpublished Research by Brian S. Mathews
http://snipurl.com/5i3b
Guardian Unlimited: Search for the Invisible Web
http://www.guardian.co.uk/online/story/0,3605,547140,00.html
Indexing Deep Web Content By Paul Bruemmer
http://www.searchengineguide.com/wi/2002/0327_wi2.html
In Search of the Deep Web
http://www.salon.com/tech/feature/2004/03/09/deep_web/index_np.html
Internet Insights - Thoughts about Federated Searching by Peter Jacso
http://hypatia.slis.hawaii.edu/~jacso/extra/infotoday/federated/federated.htm
Invisible Web Gets Deeper
http://www.searchenginewatch.com/sereport/article.php/2162871
Invisible Web Revealed
http://www.searchenginewatch.com/sereport/article.php/2167321
IR and IE on the Web - PhD and MSc Dissertations
http://www.webir.org/phd.html
JEP: The Deep Web
http://www.press.umich.edu/jep/07-01/bergman.html
Kent State University: Searching the Invisible Web
http://www.library.kent.edu/internet/invisible_web/
Library Journal: Breaking Through the Invisible Web
http://snipurl.com/5tbb
LLRX: Book Review: The Invisible Web
http://www.llrx.com/features/invisibleweb.htm
LLRX: Deep Web Research
http://www.llrx.com/features/deepweb.htm
LLRX: Mining Deeper Into the Invisible Web
http://www.llrx.com/features/mining.htm
LLRX: ResearchWire: Exposing the Invisible Web
http://www.llrx.com/columns/exposing.htm
Metadata? Thesauri? Taxonomies? Topic Maps! by Lars Marius Garshol
http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html
Mining Newsgroups Using Networks Arising From Social Behavior
http://www2003.sztaki.hu/cdrom/papers/refereed/p688/688-agrawal/index.html
Mining the Deep Web With Specialized Drills
http://snipurl.com/5tbd
Mining the Invisible Web
http://www.miningtheinvisibleweb.com/
Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews
http://www2003.sztaki.hu/cdrom/papers/refereed/p451/package/p451-dave.html
Mining Topic-Specific Concepts and Definitions on the Web
http://www2003.sztaki.hu/cdrom/papers/refereed/p646/p646-liu-XHTML/p646-liu.html
Modeling and Mining of Network Information Systems Publications
http://www.mathstat.dal.ca/~mominis/Publications.htm
Net Plan Builds in Search by Kimberly Patch
http://snipurl.com/5kn0
New Profusion Site Offers Better View of Invisible Web
http://www.searchenginewatch.com/sereport/article.php/2163591
Noisy Channels Models Provide Short Answers to FAQs
http://www.economist.com/printedition/displayStory.cfm?Story_ID=3127462
Old Search Engine, the Library, Tries to Fit Into a Google World
http://snipurl.com/78rr
Online or Invisible?
http://www.neci.nec.com/~lawrence/papers/online-nature01/
OntoMiner: Bootstrapping and Populating Ontologies From Domain Specific Web Sites
http://www.public.asu.edu/~hdavulcu/VLDB-WS03.pdf
OpenIndex - Creating a Public Internet Index
http://www.openindex.org/index.php
PhysicsWeb: The Physics of the Web
http://physicsweb.org/article/world/14/7/09
Publications about Web Analysis, Web Search, Citation Indexing, Digital Libraries, Machine Learning, Neural Networks [Steve Lawrence, NEC Research Institute]
http://www.neci.nec.com/~lawrence/papers.html
Researcher Retrain Thyself
http://www.infotoday.com/online/sep04/OnTheNet.shtml
Researchers Map of the Web
http://www.almaden.ibm.com/almaden/webmap_press.html
Scientific American: Featured Article: The Semantic Web
http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2
Scraping the Web for Implied Data
http://searchenginewatch.com/searchday/article.php/3374821
Search Engine Hunts for Gold Beneath the Surface of the Web
http://snipurl.com/5tbe
Search Engine Meeting 2004 Hague, The Netherlands - White Papers and Presentations
http://www.infonortics.com/searchengines/sh04/04pro.html
Search Engine Technology and Digital Libraries
http://www.dlib.org/dlib/june04/lossau/06lossau.html
Searching the Deep Web
http://www.dlib.org/dlib/january01/warnick/01warnick.html
Searching the Deep Web - Video
http://www.osti.gov/media/DeepWebVideo.html
Searching the Internet (White Paper, Audio and Video)
http://www.SearchingTheInternet.info/
Seeing the Invisible Web
http://lib.berkeley.edu/TeachingLib/Guides/Internet/InvWebPowerpoint/index.htm
Seeing through the 'invisible' Web
http://www.usatoday.com/tech/2001/10/15/invisible-web-search.htm
Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS) by A. Lopatenko
http://eprints.osti.gov/cgi-bin/dexpldcgi?qry1123892181;12
Smart Search - Advanced Search Engines Link Many Data Sources
http://gcn.com/23_24/tech-report/26999-1.html
Spidering Hacks
http://www.oreilly.com/catalog/spiderhks/
Structured Databases on the Web: Observations and Implications
http://snipurl.com/az5u
Technology Review: A Smarter Web
http://www.technologyreview.com/articles/frauenfelder1101.asp
Testbed for Information Extraction from Deep Web
http://www2004.org/proceedings/docs/2p346.pdf
The Deep Web
http://library.albany.edu/internet/deepweb.html
The Deep Web: Surfacing Hidden Value by Michael K. Bergman
http://www.press.umich.edu/jep/07-01/bergman.html
The Future Of News: The Digital Information Librarian
http://www.masternewmedia.org/2004/03/24/the_future_of_news_the.htm
The Hidden Potential of the Web
http://snipurl.com/5yv3
The Invisible Web by Chris Sherman
http://www.freepint.com/issues/080600.htm#feature
The Invisible Web for Educators
http://www3.dist214.k12.il.us/invisible/article/invisiblearticle.html
The Invisible Web: What it is, Why it exists, How to find it, and Its Inherent Ambiguity
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
The Invisible Web: Where Search Engines Fear To Go
http://www.powerhomebiz.com/vol25/invisible.htm
The Mechanics of Deep Net Meta Search
http://turbo10.com/papers/deepnet.pdf
The Seventh Asia Pacific Web Conference (APWeb05)
http://apweb05.csm.vu.edu.au/index.asp
The World Wide Web as a DeFacto Database: Using Technology to Find, Maintain and Update Current Company Information
http://www.intelliseek.com/whitepapers.asp
Traffic-Based Feedback on the Web by Jonathan Aizen, Daniel Huttenlocher, Jon Kleinberg, and Antal Novak
http://www.pnas.org/cgi/content/abstract/0307539100v1
What is the Invisible Web? A Crawler Perspective by Natalia Arroyo, Laboratorio de Internet
http://cybermetrics.wlv.ac.uk/AoIRASIST/arroyo.html
UMBC - AgentNews
http://agents.umbc.edu/agentnews/
Understanding Metadata
http://www.niso.org/standards/resources/UnderstandingMetadata.pdf
Web Characterization Project
http://wcp.oclc.org/
Web Pages Search Engine Based on DNS by Wang Liang, Guo Yi-Ping, and Fang Ming
http://arxiv.org/pdf/cs.NI/0403035
What is the Invisible Web?
http://websearch.about.com/library/weekly/aa061203a.htm
Using the Internet As a Dynamic Resource Tool for Knowledge Discovery
http://zillman.blogspot.com/2003_08_01_zillman_archive.html#106198657492603187
Yahoo and the Deep Web
http://news.com.com/2100-1024-5167931.html
ZDNet: I've Discovered the 'invisible Web'--Have You? Here's How!
http://reviews-zdnet.com.com/4520-6033_16-4206148.html
Cross Database Articles
Digital Libraries - Cross-Database Search: One-Stop Shopping
http://snipurl.com/5tbf
Search Tools Reports: Searching for Text Information in Databases
http://www.searchtools.com/info/database-search.html
The Right Solution: Federated Search Tools by Roy Tennant
http://snipurl.com/5zxp
UK Web Archiving Consortium
http://www.webarchive.org.uk/
Cross Database Search Services
Entrez - The Life Sciences Cross-Database Search Engine
http://www.ncbi.nlm.nih.gov/Entrez/index.html
EnergyFiles - Subject Pathways
http://energyfiles.osti.gov/
FlashPoint
http://flashpoint.lanl.gov/
GPO Access - Search Across Multiple Databases
http://www.gpoaccess.gov/multidb.html
HepLink -- Viral Hepatitis Deep Search Portal
http://www.heplink.org/
Hermes
http://www.ibt.unam.mx/biblioteca/
King County Library System
http://www.kcls.org/
NLM Gateway Search
http://gateway.nlm.nih.gov/gw/Cmd
SearchLight
http://searchlight.cdlib.org/cgi-bin/searchlight
SUMSearch
http://sumsearch.uthscsa.edu/
Cross Database Search Tools
Apple - Mac - Sherlock
http://www.asia.apple.com/sherlock/
askOnce
http://www.askonce.com/
Blue Angel Technologies
http://www.blueangeltech.com/
Bright Planet
http://brightplanet.com/
Copernic
http://www.copernic.com/en/index.html
Dublin Core Metadata Initiative (DCMI)
http://www.dublincore.org/
ENCompass Solutions
http://encompass.endinfosys.com/
Intelliseek
http://www.intelliseek.com/
Kepler - A Digital Library For Building Communities
http://kepler.cs.odu.edu/
MetaLib
http://www.exlibris-usa.com/metalib/
MetaSearch Initiative
http://www.niso.org/committees/MetaSearch-info.html
mod_oai Project - Getting OAI-PMH For Free
http://www.modoai.org/
MuseGlobal
http://www.museglobal.com/
Peter's PolySearch Engines
http://www2.hawaii.edu/~jacso/extra/poly-page.html
PBCore - The Public Broadcasting Metadata Dictionary
http://www.utah.edu/cpbmetadata/
Profusion
http://www.fusion.com/
Registry of Library Knowledge Bases
http://www.public.iastate.edu/~CYBERSTACKS/KBL.htm
Search Federal Research and Development
http://fedrnd.osti.gov/
SRW - Search/Retrieve Web Service
http://www.loc.gov/z3950/agency/zing/srw/
STINET Multisearch
http://multisearch.dtic.mil/
The Flamenco Search Interface Project
http://bailando.sims.berkeley.edu/flamenco.html
VIAF: The Virtual International Authority File
http://www.oclc.org/research/projects/viaf/default.htm
WebFeat
http://www.webfeat.org/
Peer to Peer, File Sharing, Grid and Matrix Search Engines
24/7 Downloads
http://www.247downloads.com/
An Efficient Scheme for Query Processing on Peer-to-Peer Networks
http://aeolusres.homestead.com/files/index.html
angrycoffee.com
http://www.AngryCoffee.com/
BadBlue
http://badblue.com/
Between Rhizomes and Trees: P2P Information Systems by Bryn Loban
http://www.firstmonday.org/issues/issue9_10/loban/index.html
Bibster
http://bibster.semanticweb.org/index.htm
BigChampagne
http://www.bigchampagne.com/
Bit Torrent Official Site
http://www.BitTorrent.com/
Bitzi - The Free Universal Media Catalog
http://www.bitzi.com/
Blubster
http://www.blubster.com/
BotSpot File-sharing Bots
http://www.botspot.com/BOTSPOT/Windows/Download_Bots/File-sharing_Bots/
Capn's PHP Gnutella Search
http://capnbry.net/gnutella/gs.php
Current P2P Search Implementations - P2P Networks
http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/p8.html#CurrentP2PSearchImplementations
DebateRoom.com - XDCC Search / File Sharing Portal
http://www.debateroom.com/
Distributed Search Engines
http://www.openp2p.com/pub/t/74
Distributed Search in P2P Networks
http://csdl.computer.org/comp/mags/ic/2002/01/w1068abs.htm
Earth Station 5
http://www.es5.com/
eDonkey2000 - Overnet
http://www.edonkey2000.com/
Filetopia
http://www.filetopia.org/
FreeCache
http://www.archive.org/web/freecache.php
Free Haven Project
http://www.freehaven.net/index.html
FuzzBox: Tangent Research Artificial Intelligence and Robotics
http://tangentresearch.com/research/ai/
Gnougat: Fully decentralised file caching from the JXTA Project
http://gnougat.jxta.org/
GNUnet - GNU Project - Free Software Foundation (FSF)
http://www.gnu.org/software/GNUnet/gnunet.html
Gnutella.com
http://www.gnutella.com/
gPulp
http://www.gpulp.com/
GRACE IST Project
http://www.grace-ist.org/
GRACE - GRid seArch and Categorization Engine
http://pertinax.cms.shu.ac.uk/projects/cmslb2/
Grid Resources
http://www.GridResources.info/
Grokster
http://www.Grokster.com/
grub.org - Open Source, Distributed Internet Crawler!
http://grub.org/
iMesh
http://www.iMesh.com/
International Workshop on Peer-to-Peer Knowledge Management (P2PKM)
http://www.p2pkm.org/
Internet Movie Database (IMDb)
http://www.imdb.com/
JXTA Project
http://www.jxta.org/
Kademlia: A Peer-to-peer Information System Based on the XOR Metric
http://citeseer.ist.psu.edu/529075.html
Kazaa Media Desktop
http://www.kazaa.com/us/index.htm
Legal P2P File Sharing Software
http://www.filesharesoftware.com/
Limewire
http://www.limewire.com/
LionShare P2P Project - Legitimate File-Sharing Among Individuals and Educational Institutions
http://lionshare.its.psu.edu
Locutus
http://locut.us/
MagnetLink
http://www.magnetlinks.org/
Mnet
http://mnet.sourceforge.net/
Morpheus :: Peer-to-Peer File Sharing Software
http://www.morpheus.com/
MusicBrainZ
http://www.MusicBrainZ.org/
MysterNetworks - The Evolution of Peer-to-Peer
http://www.mysternetworks.com/
NeuroGrid
http://www.neurogrid.net/
Open Directory - File Sharing
http://dmoz.org/Computers/Software/Internet/Clients/File_Sharing/
Open Directory - MP3 Search Engines
http://snipurl.com/5tbg
OpenNap: Open Source Napster Server
http://opennap.sourceforge.net/
OpenP2P.com
http://www.openp2p.com/
PeerMetrics.org
http://www.peermetrics.org/
Peer-to-Peer Topologies
http://www-db.stanford.edu/~schloss/hypercup/
Piolet
http://www.piolet.com/
Port Knocking
http://www.portknocking.org/
Project JXTA
http://www.jxta.org/
Shareaza
http://www.shareaza.com/
ShareSniffer
http://www.sharesniffer.com/
Skype
http://www.skype.com/
Snoopstar
http://www.snoopstar.com/
SourceForge: Project Info - ALPINE Network
http://sourceforge.net/projects/alpine/
Super Powered Peer To Peer
http://snipurl.com/9lzg
Surfy! - The Accurate Search Engine
http://www.surfy.com/index.html
The Anthill Project
http://www.cs.unibo.it/projects/anthill/
The Freenet Project
http://freenetproject.org/
TrustyFiles
http://www.trustyfiles.com/
UDDI Browser Search
http://soapclient.com/uddisearch.html
URLBlaze: URL Sharing Network
http://www.urlblaze.com/
WASTE
http://slackerbitch.free.fr/waste/
WebV2
http://www.webv2.com/
WideSource - Peer to Peer Search Engine
http://www.widesource.com/
Yahoo! Directory Peer-to-Peer File Sharing
http://dir.yahoo.com/Computers_and_Internet/Internet/Peer_to_Peer_File_Sharing/
Zebra
http://indexdata.dk/zebra/
Presentations
From Theory To Practice - Bielefeld Academic Search Engine
http://www.diglib.org/forums/Spring2004/summann0404.htm
Gumshoe Librarian
http://www.virtualchase.com/gumshoe/
Information Detective - Online Streaming Tutorial Videos On Searching the Internet including the Deep and Invisible Web
http://www.InformationDetective.com/
Quick Introduction to OWL Web Ontology Language
http://www.xfront.com/owl-quick-intro/sld001.htm
Searching the Deep Web - Dudley Knox Library Internet Guides - PowerPoint Slides
http://library.nps.navy.mil/home/Searching%20the%20Deep%20Web.ppt
Searching the Internet
http://www.SearchingTheInternet.info/
Searching the Internet: Using Brains and Bots
http://snipurl.com/5kza
Seeing the Invisible Web
http://lib.berkeley.edu/TeachingLib/Guides/Internet/InvWebPowerpoint/index.htm
Resources - Deep Web Research
AIRS Oxygen
http://www.airsdirectory.com/products/technologies/oxygen/
A Roadmap to Text Mining and Web Mining
http://www.cs.utexas.edu/users/pebronia/text-mining/
Beaucoup
http://www.beaucoup.com/
BrainBoost - Question Answering Search Engine
http://www.BrainBoost.com/
CiteLine Professional
http://www.citeline.com/pro_info.html
COLLATE - Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material
http://www.collate.de/
Comet Way
http://www.cometway.com/content.agent?page_name=Home
CompletePlanet
http://www.completeplanet.com/
Creative Commons RDF-Enhanced Search
http://search.creativecommons.org/index.jsp
Cyber Cemetery
http://govinfo.library.unt.edu/
CyberFiber
http://www.cyberfiber.com
Cybermetrics - First Generation Tools - Invisible Web
http://www.cindoc.csic.es/cybermetrics/search13.html
DART
http://www.dynago.com/dart/
Data Fountains: Open Source Internet Resource Discovery and Metadata/Full-Text Generation Service
http://infomine.ucr.edu/Data_Fountains/
Data Mining Resources
http://www.DataMiningResources.info/
Deep Web
http://www.deepwebtech.com/
Deep Web Search
http://www.mach9design.com/deep/deep1.html
Deep Web Search Tools
http://snipurl.com/722j
Deep Web Technologies
http://www.deepwebtech.com/
DigiCULT Resources - Resource Discovery & Information Retrieval
http://www.digicult.info/pages/resources.php?t=21
digitalAGORA
http://aut.edu/agora/
Direct Search
http://www.freepint.com/gary/direct.htm
EEVL's Ejournal Search Engines
http://www.eevl.ac.uk/eese/eese-eevl.html
ENDECA
http://www.endeca.com/
Find Articles
http://www.findarticles.com/PI/index.jhtml
Fossick
http://www.freepint.com/gary/direct.htm
Freely Accessible Databases for the Public
http://www.istl.org/01-winter/internet.html
GlobalSpec - Engineering Search Engine
http://search.globalspec.com/Search/WebSearch
Google Labs
http://labs.google.com/
Google Scholar
http://scholar.google.com/
HighWire Press - Largest Repository of Free Full-Text Life Science Articles in the World
http://highwire.stanford.edu/
iBoogie
http://www.iboogie.tv/
IncyWincy - The Invisible Web Search Engine
http://www.incywincy.com/
INFOMINE
http://infomine.ucr.edu/
Inquirus
http://inquirus.nj.nec.com/
Instant Information Systems
http://www.docdel.com/
Institutional Archives Registry
http://archives.eprints.org/eprints.php?action=browse
Intelligence Center
http://www.intelligence-center.com/
Intelliseek
http://www.executivelibrary.com/
Intellisonar
http://www.quigo.com/intellisonar.htm
Internet Archive
http://www.archive.org/
Invisible Library
http://www.invisiblelibrary.com/
Invisible Web - Hidden Pages and Websites
http://websearch.about.com/cs/invisibleweb/
Invisible-Web - Searchable Databases and Specialized Search Engines
http://invisible-web.net/
Kapow Web Collector
http://www.automated-info-solutions.com/products_main.html
KDnuggets: Data Mining, Web Mining, and Knowledge Discovery Guide
http://www.kdnuggets.com/
KeepMedia
http://www.keepmedia.com/
Khadoma
http://www.khadoma.com/
Knowledge Discovery
http://www.KnowledgeDiscovery.info/
Librarians' Index to the Internet
http://lii.org/
MagPortal
http://www.magportal.com/
Mappa.Mundi Magazine
http://mappa.mundi.net/
Medical Databases Online
http://www.medic8.com/MedicalDatabases.htm
Microsoft Web Search Research and Patents
http://www.webmasterworld.com/forum34/481.htm
Mining the Deep Web for Economic Data
http://www.citris-uc.org/projmatrix/project/display.action?project.id=33
Mining the Invisible Web
http://www.MiningTheInvisibleWeb.com/
Mooter Search
http://www.mooter.com/
MSN Sandbox
http://sandbox.msn.com/
NetNews Tracker
http://www.netnewstracker.com/
News Group Search
http://newsgroups.langenberg.com/
New Zealand Digital Library
http://www.nzdl.org/
OAI-PMH Implementation Guidelines - Conveying rights expressions about metadata in the OAI-PMH framework
http://www.openarchives.org/OAI/2.0/guidelines-rights.htm
OAIster
http://oaister.umdl.umich.edu/o/oaister/
OneLook Dictionary Search
http://www.onelook.com/
Open Archives Initiative
http://www.openarchives.org/
OpenIndex - Creating a Public Internet Index
http://www.openindex.org/index.php
Quigo Technologies
http://www.quigo.com/
Profusion
http://www.quigo.com/
Recommended Gateway Sites for the Deep Web
http://people.hws.edu/hunter/deepwebgate03.htm
RedLightGreen - Search for Books and Research Materials
http://www.redlightgreen.com/
reSearcher
http://www.theresearcher.ca/
Resource Discovery Network
http://www.rdn.ac.uk/
Science and Technology Sources on the Internet
http://www.library.ucsb.edu/istl/01-winter/internet.html
Scientific and Technical Information Network (STINET)
http://stinet.dtic.mil/
Science Commons
http://science.creativecommons.org/
Science.gov - FirstGov for Science - Government Science Portal
http://www.science.gov/
Scirus - Search Engine for Scientific Information
http://www.scirus.com/srsapp/
Search Adobe PDF Online
http://searchpdf.adobe.com/
SpeechBot - Audio Search Using Speech Recognition
http://speechbot.research.compaq.com/
STN International - Databases in Science and Technology
http://www.stn-international.de/
Testbed for Information Extraction from Deep Web
http://daisen.cc.kyushu-u.ac.jp/TBDW/
The Internet Sleuth
http://www.isleuth.com/
The Deep Web
http://library.albany.edu/internet/deepweb.html
The Deep Web: Surfacing Hidden Value
http://wfps.k12.mt.us/wfhs/library/deep_web.htm
The Invisible Web
http://www.invisibleweb.com/
The Invisible Web
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
The Invisible Web WebLog
http://ciquest.shef.ac.uk/invisible/
THOR: Deep Web Data Extraction
http://disl.cc.gatech.edu/THOR/
Those Dark Hiding Places: The Invisible Web Revealed
http://library.rider.edu/scholarly/rlackie/Invisible/Inv_Web.html
Turbo10
http://turbo10.com/
UNESCO Information Services - Databases
http://www.unesco.org/unesdi/
Universal Data Element Framework (UDEF)
http://www.udef.org/
Wall Street Executive Library
http://www.executivelibrary.com/
Web Data Extractors
http://zillman.blogspot.com/2004/09/web-data-extractors.html
Web Farming
http://webfarming.com/
Web Fountain by IBM
http://www-1.ibm.com/mediumbusiness/venture_development/emerging/wf.html
Web Intelligence Consortium
http://wi-consortium.org/
Web IR & IE
http://www.webir.org/
WebScales: Towards a Highly Scalable Metasearch Engine
http://www.cs.binghamton.edu/~meng/pub.d/PIreport03.html
Resources - Semantic Web Research
AIS SIGSEMIS - SIGSEMIS: Semantic Web and Information Systems
http://www.sigsemis.org/
Bibster
http://bibster.semanticweb.org/index.htm
DARPA Agent Markup Language
http://www.daml.org/
Dublin Core Services
http://www.describethis.com/
Foundation for Intelligent Physical Agents (FIPA)
http://www.fipa.org/
The FOAF Project - A Semantic Web Application
http://www.foaf-project.org/
HP Labs Semantic Web Research
http://www.hpl.hp.com/semweb/index.html
Infomesh's Semantic Web Introduction
http://infomesh.net/2001/swintro/
Jena - A Semantic Web Framework for Java
http://jena.sourceforge.net/
Journal on Web Semantics
http://www.websemanticsjournal.org/
KnowledgeNets
http://www.inf.fu-berlin.de/inst/ag-nbi/research/wissensnetze/
Knowledge Sifter: Agent-Based Ontology-Driven Search over Heterogeneous Databases Using Semantic Web Services
http://eceb.gmu.edu/pubs/Kerschberg_KS_IFIP0604.pdf
Language Engineering for the Semantic Web: A Digital Library for Endangered Languages
http://informationr.net/ir/9-3/paper176.html
MetaData at W3C
http://www.w3.org/Metadata/
MindSwap
http://www.MindSwap.org/
MuseoSuomi
http://museosuomi.cs.helsinki.fi/
OASIS - Advancing eBusiness Standards
http://www.oasis-open.org/home/index.php
OIL - Ontology Inference Layer
http://www.ontoknowledge.org/oil/index.shtml
OntoWeb Portal
http://ontoweb.aifb.uni-karlsruhe.de/
O'Reilly's Semantic Web Primer
http://www.xml.com/pub/a/2000/11/01/semanticweb/
pOWL - Semantic Web Development Plattform
http://powl.sourceforge.net/
RDF - Resource Description Framework
http://www.w3.org/RDF/
RDFWeb: Friend of a Friend (FOAF) Project
http://rdfweb.org/
Rules and Rule Markup Languages for the Semantic Web - RuleML-2003
http://tmitwww.tm.tue.nl/staff/gwagner/RuleML-2003.html
Science and the Semantic Web
http://www.mindswap.org/Science/
Semantic Blogging Demonstrator
http://jena.hpl.hp.com:3030/blojsom-hp/blog/
Semantic Blogging: Spreading the Semantic Web Meme
http://snipurl.com/66yj
Semantic Indexing
http://www.nitle.org/semantic_search.php
Semantic Markup Deconstructed Example
http://www.cs.umd.edu/users/hendler/sciam/walkthru.html
Semantic Planet Weblog
http://www.semanticplanet.com/
Semantic Routing BOF
http://www.neurogrid.net/SemanticRouting/SemanticRoutingBOF.htm
Semantic Translator for Enhanced Retrieval by the Bremen University
http://www.semantic-translation.de/
SemanticWeb.org - The Semantic Web Community Portal
http://www.semanticweb.org/
Semantic Web Activity Statement
http://www.w3.org/2001/sw/Activity.html
Semantic Web Application Platform - SWAP
http://www.w3.org/2000/10/swap/
Semantic Web Business SIG
http://business.semanticweb.org/
Semantic Web Challenges for Knowledge Management (136 pages .pdf)
http://www.sigsemis.org/July2004.pdf
Semantic Web Draws On the Power of Friends
http://www.freepint.com/issues/270504.htm#feature
Semantic Web for AURIS-MM
http://derpi.tuwien.ac.at/~andrei/AURIS-MM-plan.html
Semantic Web Laboratory
http://iit-iti.nrc-cnrc.gc.ca/projects-projets/sem-web-lab-web-sem_e.html
Semantic Web Publications
http://www.w3.org/2001/sw/#pub
Semantic Web Research Forum
http://snipurl.com/6or0
Semantic Web Research Group - OWL Site
http://owl.mindswap.org/
Semantic Web Roadmap
http://www.w3.org/DesignIssues/Semantic.html
Semantic Web W3C
http://www.w3.org/2001/sw/
Semaview - Published Resources Overview
http://www.semaview.com/resources/resources.html
SemText - Semantic Hypertext - Making Latent Semantics Blatant
http://semtext.org/mambo/index.php
SourceForge.net: Project Info - OWL API
http://sourceforge.net/projects/owlapi
Swoogle - Semantic Bot
http://swoogle.umbc.edu/
SWRL: A Semantic Web Rule Language Combining OWL and RuleML
http://www.daml.org/2003/11/swrl/
TAP - Building the Semantic Web
http://tap.stanford.edu/tap/
Technology Review: Sir Tim Berners-Lee - The Semantic Web
http://www.technologyreview.com/articles/04/10/frauenfelder1004.asp
The Cover Pages
http://xml.coverpages.org/
The Foundation for Semantic Interoperability on the WWW
http://talad.sis.pitt.edu/marut/soa/
The RDF Query Language (RQL)
http://139.91.183.30:9090/RDF/RQL/
The Semantic Grid
http://www.semanticgrid.org/
The Semantic Web: An Introduction
http://infomesh.net/2001/swintro/
The Semantic Web By Tim Berners-Lee, James Hendler and Ora Lassila
http://snipurl.com/297g
The Semantic Web In Breadth
http://logicerror.com/semanticWeb-long
The Semantic Web Is Your Friend
http://www.freepint.com/issues/270504.htm#feature
UDDI - Universal Description, Discovery, and Integration
http://www.uddi.org/
Web Service Modeling Ontology
http://www.wsmo.org/
WonderWeb
http://wonderweb.man.ac.uk/owl/
XML.com: Semantic Web
http://www.xml.com/pub/rg/Semantic_Web
XML.org
http://www.xml.org/
Yahoo Groups - SemanticWeb
http://groups.yahoo.com/group/semanticweb/
As I mentioned at the beginning of this article and guide, my current keynote presentation, Bots, Blogs and News Aggregators, was created from these various resources along with my resources for Bot Research. Identifying competent and usable resources from the deep web, as well as competent and usable tools to search it, is extremely important. The following resources for Bot Research will start you on your discovery of knowledge for deep web research:
Bot Research Resources and Sites
1st Spot
http://1st-spot.net/topic_agents.html
Agent-Based Software Development
http://www.ecs.soton.ac.uk/~mml/absd/index.html
Agent Construction Tools
http://www.agentbuilder.com/AgentTools/index.html
AgentLand
http://www.agentland.com/
AgentLink
http://www.AgentLink.org/
Agent Model Yields Leadership
http://snipurl.com/99mh
Agent Portal AI
http://www.agent.ai/
Agents Portal
http://aose.ift.ulaval.ca/
Alarm Growing Over Bot Software by Robert Lemos
http://news.com.com/2100-7349_3-5202236.html?tag=nefd.lede
ALICEBot
http://www.alicebot.org/
B.4.1 Search Robots - The Robots.txt File
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.4.1
Bot A Blog
http://www.BotABlog.com/
Botizen
http://www.botizen.com/
BotLaw - The Place for Legal Research on Intelligent Agents/Bots
http://www.botlaw.com/
BotSpot
http://www.botspot.com/
ChatterBots
http://www.ChatterBots.info/
ChatterBots at BotSpot
http://botspot.com/
Data Mining Resources
http://www.DataMiningResources.info/
Deep Web Research
http://www.deepwebresearch.info/
Design of a Parallel and Distributed Web Search Engine by Salvatore Orlando, Raffaele Perego, and Fabrizio Silvestri
http://arxiv.org/abs/cs.IR/0407053
Eliza - The Original ChatterBot
http://www-ai.ijs.si/eliza/eliza.html
Fantomas Spider Spy - The BotBase
http://fantomaster.com/fasvsspy01.html
FyberSearch
http://www.fybersearch.com/
GeneSys Middleware
http://sourceforge.net/projects/genesys-mw/
Google Guide
http://www.googleguide.com/
Indexing Robot Crawler Checklist
http://www.searchtools.com/robots/robot-checklist.html
Information Retrieval (IR) Software
http://www.ir-ware.biz/
Internet Agents - CWS Apps
http://cws.internet.com/32agents.html
Internet Mathematics
http://www.InternetMathematics.org/
KiwiLogic
http://www.kiwilogic.com/
Knowledge Discovery
http://www.knowledgediscovery.info/
KTweb
http://ktweb.org/
LifeFX
http://www.lifefx.com/
Modelling and Mining of Network Information Systems
http://www.mathstat.dal.ca/~mominis/index.html
MultiAgent
http://www.MultiAgent.com/
MySpiders
http://myspiders.informatics.indiana.edu/
NativeMinds
http://www.nativeminds.com/
Oxyus Search Engine
http://sourceforge.net/projects/oxyus/
Robots, Spiders and Other User Agents: A Resource for WebMasters
http://joseluis.pellicer.org/ua/
RobotsTxt.org
http://www.robotstxt.org/
Search Engine Robots
http://www.jafsoft.com/searchengines/webbots.html
Search Engine Watch News
http://www.searchenginewatch.com/
Search Tools - Information Guides and News
http://www.searchtools.com/
Semantic Indexing
http://www.nitle.org/semantic_search.php
Semantic Web
http://www.semanticweb.org/
ShoppingBots
http://www.ShoppingBots.info/
SocSciBot3
http://socscibot.wlv.ac.uk/
Spider Hunter
http://www.spiderhunter.com/
Spidering Hacks
http://www.oreilly.com/catalog/spiderhks/
Swoogle - Semantic Bot
http://swoogle.umbc.edu/
The CGI Resource Index: Programs and Scripts: Perl: Searching
http://cgi.resourceindex.com/Programs_and_Scripts/Perl/Searching/
The Intelligent Software Agents Lab
http://www-2.cs.cmu.edu/~softagents/
The Mobile Agent List
http://www.informatik.uni-stuttgart.de/ipvr/vs/projekte/mole/mal/preview/preview.html
The Search Engine Project (TSEP)
http://freshmeat.net/projects/tsep/
The Simon Laven Page
http://www.simonlaven.com/
The Web Robots Pages
http://www.robotstxt.org/wc/robots.html
TrademarkBots
http://www.trademarkbots.com/
Tucows SearchBots for Windows 95/98
http://tucows.icm.edu.pl/searchbot95.html
UMBC AgentWeb
http://agents.umbc.edu/
Webbot - the W3C libwww Robot
http://www.w3.org/Robot/
Web Data Extractors - White Paper Link Compilation
http://zillman.blogspot.com/2004_08_01_zillman_archive.html#109250380875057586
Web Intelligence Consortium
http://wi-consortium.org/
Web IR & IE
http://www.webir.org/
Worm Radar
http://wormradar.com/index.html
Current Subject Tracer Information Blogs
Subject Tracer Information Blogs created and developed by the Virtual Private Library combine the best of the latest tools on the Internet. Using bots, blogs and news aggregators, the Subject Tracer Information Blogs generate RSS feeds with the latest resources, creating a current flow of information through niche subject tracers. I am proud to be the creator of the Internet's first Subject Tracer Information Blogs:
Virtual Private Library
http://www.VirtualPrivateLibrary.com/
Agriculture Resources
http://www.AgricultureResources.info/
Artificial Intelligence Resources
http://www.AIResources.info/
Astronomy Resources
http://www.AstronomyResources.info/
Auction Resources
http://www.AuctionResources.info/
Biological Informatics
http://www.biologicalinformatics.info/
Bot Research
http://www.botresearch.info/
Business Intelligence Resources
http://www.biresources.info/
ChatterBots
http://www.ChatterBots.info/
Data Mining Resources
http://www.DataMiningResources.info/
Deep Web Research
http://www.deepwebresearch.info/
Directory Resources
http://www.DirectoryResources.info/
eCommerce Resources
http://www.eCommerceResources.info/
Elder Resources
http://www.ElderResources.info/
Employment Resources
http://www.EmploymentResources.info/
Entrepreneurial Resources
http://www.EntrepreneurialResources.info/
Financial Sources
http://www.FinancialSources.info/
Finding People
http://www.FindingPeople.info/
Games Resources
http://www.GamesResources.info/
Genealogy Resources
http://www.GenealogyResources.info/
Grant Resources
http://www.GrantResources.info/
Grid Resources
http://www.GridResources.info/
Healthcare Resources
http://www.healthcareresources.info/
Information Futures Markets
http://www.InformationFuturesMarkets.com/
Information Quality Resources
http://www.InformationQualityResources.info/
Internet Alerts
http://www.InternetAlerts.info/
Internet Demographics
http://www.internetdemographics.info/
Internet Experts
http://www.internetexperts.info/
Internet Hoaxes
http://www.internethoaxes.info/
Knowledge Discovery
http://www.knowledgediscovery.info/
Outsourcing/Offshoring Information and Resources
http://www.OutsourcingOffshore.us/
Privacy Resources
http://www.PrivacyResources.info/
Reference Resources
http://www.ReferenceResources.info/
Research Resources
http://www.researchresources.info/
RestStress
http://www.RestStress.com/
Script Resources
http://www.ScriptResources.info/
ShoppingBots
http://www.ShoppingBots.info/
Statistics Resources
http://www.statisticsresources.info/
Student Research
http://www.studentresearch.info/
Theology Resources
http://www.TheologyResources.info/
Tutorial Resources
http://www.TutorialResources.info/
World Wide Web Reference
http://www.WWWReference/
Deep Web Research 2005 is a very exciting place to search and to do research. New tools are constantly being created, and more databases and unique files are constantly being added. This all adds up to a phenomenal growth area of the world wide web, one that deserves your constant attention through search and current awareness that keeps you alert to the latest happenings and sources available on the Internet! This article is constantly updated, as its source is the Deep Web Research Subject Tracer Information Blog.