http://www.wired.com/wired/archive/8.02/autonomy_pr.html
The Quest for Meaning
The world's smartest search engine took 250 years to build. Autonomy is here.
By Steve Silberman
The past is never far away at the University of Cambridge. It can be as close as your morning toast, smeared with jam made for the Fellows at Christ's College each year from the fruit of a mulberry tree said to have been planted by John Milton.
Of course that's just a legend. The tree dates from decades before the author of Paradise Lost could have planted it, says Cambridge engineering professor Peter Rayner, holding forth in the garden behind a handsome edifice called the Masters' Lodge. I ask Rayner about the wing where we'll be having lunch, and I'm surprised when he advises me that it's new. "Oh, yes," he explains. "Darwin built it - large family, you know."
I've come to this medieval college town to learn more about the birthplace of another trace of the past that's bearing fruit in the present. A startup called Autonomy - one of the rising stars of UK high tech - is turning the obscure mathematical musings of 18th-century Presbyterian minister Thomas Bayes into a powerful new breed of software.
Autonomy's founder, Michael Lynch, likes to say that he's aiming for Autonomy to become "the Oracle of unstructured data." His company is thriving at the intersection of two Net-driven trends: the push toward personalizing services and the explosion of information in text form. Autonomy is one of a number of companies specializing in knowledge management, an industry growing as fast as the Internet itself. Larry Hawes, an analyst for the consulting firm the Delphi Group, says that in 1996, the combined software-license revenue for programs specializing in information-management tasks like text search and retrieval was $48 million. By the end of this year, Hawes estimates, the figure will be $604 million.
The Web - which essentially pasted a text interface on the front end of the burgeoning global network - has proven fortuitous for language-centered companies like Autonomy. Many firms are beginning to realize that the most crucial intellectual property in an office or business isn't neatly compartmentalized data in spreadsheets or databases, it's writing - all that messy, untagged, uncategorized verbiage that sprouts up like kudzu wherever people bounce ideas off one another. By offering knowledge-management tools for organizing the daily avalanche of email, Word documents, news, memos, Web pages, PowerPoint presentations, Lotus Notes, and online product descriptions, Autonomy has attracted a client roster that includes many of the world's largest media organizations and manufacturers, including the Associated Press, News Corp., Procter & Gamble, Lucent Technologies, Merrill Lynch, and the US Department of Defense.
The mathematical processes behind Autonomy's methods are complex, but the promise itself is simple: to enable computers to extract meaning from text and to use that meaning to better categorize and deliver useful information. While computers have long been able to identify strings of keywords, anyone who's used a search engine can testify to its limits. What makes Autonomy's products different is an underlying pattern-recognition algorithm, derived from Bayes' formulations, which empowers computers to act as if they possessed abilities we think of as subtly and profoundly human: comprehending context, generalizing from words to an idea, even understanding the unspoken by grasping the root concepts beneath the play of syntax.
If you're an engineer at BAE Systems (formerly British Aerospace) and you begin typing a memo about airfoil design, a program built by Autonomy will open a second window on the desktop with links to relevant research in the company archives, as well as items from the morning news you should see. It will also display the names of any of your colleagues who have done work on the subject. Another piece of Autonomy software performs a related function for readers of the BBC World Service's online version, alerting them - with hyperlinks created on the fly - to news items germane to the stories they're reading. And the next time you fire off an angry blast to your ISP, you can thank Autonomy if the wording of the automated reply seems unusually pertinent.
Though most people in the US have probably never heard of Autonomy or products like ActiveKnowledge and Portal-in-a-Box, that's sure to change. The company's stock price has jumped 1,000 percent on the European Easdaq index since an IPO in July 1998. Now that Autonomy has achieved a $1 billion market cap, Michael Lynch thinks the time is right for a dual listing on Nasdaq. Many industry analysts feel that the company has brought its ideas to the table at just the right time. "When Autonomy first came to the United States," says Hadley Reynolds, director of research for the Delphi Group, "the whole text-search market was considered a washed-up, has-been area. Companies that had been in there for the long haul, like Verity, were drying up. Lynch recognized that the Internet was going to create a problem that his technology could make a big dent in solving."
"What Autonomy has is so important and unique," adds Eric Brown, research director for Forrester Research, "it doesn't just belong 'in a box' - it belongs everywhere."
Not bad for a 3-year-old startup leveraging the 250-year-old brainstorm of a current resident of Bunhill Fields, one of the oldest graveyards in London.
It was while Lynch was a grad student at Cambridge, studying under Peter Rayner, head of the university's Signal Processing and Communication Research Group, that he began thinking about the implications of the curious notions of the Reverend Thomas Bayes. A statistician as well as a cleric, Bayes helped shape the foundation of modern probability theory. Rayner is one of a growing number of researchers turning to computers to take Bayes' method of statistical inference - using mathematics to predict the outcome of events - much further than the shy reverend might have dreamed.
Born in 1702, Bayes, a second-generation minister, served as his father's assistant in London and was made wealthy by his inheritance. Bayes spent the final 30 years of his life in a little spa town called Tunbridge Wells, filling notebooks with speculations on diverse subjects but formally publishing only two pieces in his lifetime. The first was an unsigned tract with the marvelous title Divine Benevolence: Or, An Attempt to Prove That the Principal End of the Divine Providence and Government Is the Happiness of His Creatures; the followup, a defense of Newton's calculus against an attack by a bishop who called Newton "an Infidel Mathematician." The latter work earned Bayes a fellowship in the Royal Society, the highest recognition he would earn in his lifetime.
When he died in 1761, Bayes left his papers and £100 to Richard Price, another clergyman whose speculations ventured beyond pondering how many divine agents could dance on the head of a pin. Price is celebrated for creating the first actuarial model for life insurance. Unlike his quiet and cautious friend Bayes, however, Price - a fiery and prolific public figure who wrote defenses of the American and French Revolutions - was an 18th-century genius of hype.
In one of Bayes' notebooks, crammed with musings on astronomy and electricity, Price discovered an unpublished work called An Essay Towards Solving a Problem in the Doctrine of Chances. In broad terms, Bayes had sketched a model for predicting events under conditions of uncertainty. The equation - now known as Bayes' theorem, or Bayes' rule - takes into account knowledge of past events and new observations to infer the probable occurrence of an event. Typically, the minister's son had been modest to a fault about the implications of his work. "So far as mathematics do not tend to make men more sober and rational thinkers, wiser and better men," Bayes wrote, "they are only to be considered as an amusement, which ought not to take us off from serious business."
Price disagreed. Upon publishing the essay after his friend's death, he replaced the self-deprecating introduction with a declaration that Bayes had not only solved "a problem that has never before been solved" but had even successfully "confirm[ed] the argument for the existence of the Deity." Hyperbole aside, Bayes had created a statistical model for harvesting wisdom from experience. Bayes' theorem chains probabilities, maximizing the amount of learned information brought to bear on a problem, and is especially well suited to predicting the outcome of situations where a mass of conflicting or overlapping influences converge on the result.
In the language of statistics, Bayes' theorem relates phenomena observed in the present (the evidence) to phenomena known to have occurred in the past (the prior) and ideas about what is going to happen (the model). Mathematically, the formula is represented as
\[ P(t \mid y) = \frac{P(y \mid t)\,P(t)}{P(y)} \]
Picture a short-order cook in a noisy restaurant kitchen, trying to work out which dish a customer actually ordered (t) from what he hears called out (y). He has some prior knowledge: Few customers order the smothered duck feet, while the steak frites is a perennial favorite. The cook can use information like this to calculate the prior probability that a given dish was ordered: P(t), or the probability of t. He also knows that one waiter tends to mispronounce any dish with a French name, that a bus roars past the restaurant every 10 minutes, and that the sizzling sounds from the skillets can obscure the difference between, say, ham and lamb. Taking these factors into account, he creates a complicated model of ways the orders might get confused: P(y|t), the probability of y, given t. Bayes' theorem allows the chef to chain the probabilities of all possible influences on what he hears to calculate the probability, P(t|y), that the order he heard was for the dish the customer chose.
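The cook's reasoning can be written out in a few lines of Python. This is a minimal sketch, not anything taken from Autonomy's software: the dish names, the probabilities, and the `posterior` function are all invented for illustration. It multiplies each dish's prior, P(t), by the chance of hearing a given order when that dish was called, P(y|t), then divides by P(y) so the results sum to one.

```python
def posterior(prior, likelihood, heard):
    """Return P(t | y=heard) for every dish t, using Bayes' theorem."""
    # Numerator of Bayes' theorem for each candidate dish: P(y|t) * P(t).
    joint = {t: likelihood[t].get(heard, 0.0) * p for t, p in prior.items()}
    # Denominator P(y): total probability of hearing this order at all.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the heard order is impossible under this model")
    return {t: j / evidence for t, j in joint.items()}


# Hypothetical numbers, made up for this example.
# prior[t]: how often each dish is actually ordered.
prior = {"ham": 0.5, "lamb": 0.3, "duck feet": 0.2}
# likelihood[t][y]: chance the cook hears y when dish t was called out.
likelihood = {
    "ham": {"ham": 0.7, "lamb": 0.3},
    "lamb": {"ham": 0.4, "lamb": 0.6},
    "duck feet": {"duck feet": 1.0},
}

result = posterior(prior, likelihood, "lamb")
for dish, p in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"P({dish} | heard 'lamb') = {p:.3f}")
```

With these made-up numbers, an order heard as "lamb" comes out somewhat more likely to be lamb than ham, though ham's popularity keeps it in the running.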
Doctors perform Bayesian exercises in pattern recognition constantly, relating probabilities and beliefs to observations in the dance we think of as seasoned judgment. If a patient has a sore throat, red spots on her skin, and a 102-degree fever, what is the probability that she has chicken pox (and not measles)? In a medical context, the pattern to recognize, P(t|y), is that of the underlying disease.
For humans, the detection of a meaningful signal in clouds of data smog happens subliminally all the time. Neither the short-order cook nor the doctor needs to consult a mathematical formula to confirm his or her reasoning.
Bayes' quest to organize and extend that process was necessarily limited by his tools - a pen, a notebook, and the time required to do his calculations. What has made his notions widely applicable in everyday situations is the invention of computers, which can chain millions of probabilities in a heartbeat. With Bayesian "reasoning engines" embedded in software to drive the recognition process, computers can begin to approach the everyday capabilities of the human mind for sifting through chaos and finding meaning. "Bayes gave us a key to a secret garden," says Lynch. "A lot of people have opened up the gate, looked at the first row of roses, said, 'That's nice,' and shut the gate. They don't realize there's a whole new country stretching out behind those roses. With the new, superpowerful computers, we can explore that country."
Bayes' theorem has proven an effective aid to many sorts of pattern-recognition tasks, such as fingerprint identification, facial matching, and handwriting analysis. Take the problem of teaching a computer to read scribbled-on bank checks. As anyone who's tried to scan a magazine article into a Word file knows, getting a computer to read clean, standard fonts accurately is dodgy enough. Add in the variables introduced by eccentric writing styles, ballpoint versus felt-tip pens, differing rates of ink absorption, plus crumpling and folding of the pages, and traditional optical character recognition can be rendered useless.
The Bayesian model allows a computer to incorporate prior knowledge of the billions of ways handwritten letters and numbers can stray from standard forms, training itself to read the writing on checks by "seeing" lots of examples, thus building a base of prior probabilities to factor into decisions. If, for example, in most of the past 1 million instances the computer discerned a wavy shape that turned out to be an s, then that loopy figure on the check is probably an s, too - unless it's followed by what's probably a 6, in which case it's more likely to be a dollar sign or an 8.
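The wavy-shape example can be sketched the same way. The code below is an illustration under stated assumptions, not a real check-reading system: `classify_shape`, the candidate characters, and every probability are hypothetical. The prior says what a wavy shape has usually turned out to be; the context term says how often each candidate is followed by a given character.

```python
def classify_shape(shape_prior, next_given_char, next_char=None):
    """Return the probability of each candidate character for one ambiguous shape."""
    scores = {}
    for char, p in shape_prior.items():
        if next_char is None:
            context = 1.0  # no neighbor observed, so the prior decides alone
        else:
            # P(next character | this candidate), zero if never seen together
            context = next_given_char[char].get(next_char, 0.0)
        scores[char] = p * context
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate is compatible with the observed context")
    return {c: s / total for c, s in scores.items()}


# Hypothetical figures: what a wavy shape turned out to be in past examples.
shape_prior = {"s": 0.90, "$": 0.05, "8": 0.05}
# Hypothetical figures: how often each candidate is followed by a "6".
next_given_char = {
    "s": {"6": 0.001},
    "$": {"6": 0.15},
    "8": {"6": 0.10},
}

alone = classify_shape(shape_prior, next_given_char)
before_six = classify_shape(shape_prior, next_given_char, next_char="6")
print(max(alone, key=alone.get))            # most probable reading with no context
print(max(before_six, key=before_six.get))  # most probable reading before a "6"
```

Seen alone, the shape is read as an s. Followed by a 6, the same shape is more plausibly a dollar sign or an 8, because letters are rarely followed by digits in this toy model.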
In the emerging world of computing applications that employ such networked Bayes reasoning engines, almost any observable phenomenon can be inferred as a symptom of a hidden cause - whether characters in a document, or the behavior of an office worker repeatedly clicking a button on his keyboard when the computer refuses to respond. If an email message has broken out in exclamation points, perhaps the disease is spam. In the case of the office worker, the ailment might be a toxic interface. If traditional computing seems designed for a binary universe only a microchip could love, Bayes nets are made for the world of uncertainty, conflicting truths, static, and frustratingly incomplete information sets we live in.
One of Peter Rayner's hobbies recalls the problem of the short-order cook: using Bayesian methods to extract clear audio signals from the thickets of random noise on old recordings, resurrecting the glory of '20s jazz musicians from scratchy gramophone discs. As we sipped Côtes du Rhône in the oak-paneled sanctuary of the Masters' Lodge, Rayner and Lynch discussed approaches to boosting the sound quality of MP3 files, searching databases of GIFs to find a particular image, and predicting the transport rate of pharmaceuticals through blood.
At Cambridge, researchers have applied the reverend's notions to disciplines as various as improving hearing aids and determining whether a given dose of a drug will sufficiently anesthetize a surgery patient. "This man of enormous importance for the 20th century - with a philosophy so far-reaching it makes Marx pale into insignificance - was essentially forgotten," Rayner tells me, adding that in a university environment designed to churn out MBAs, the wider implications of Bayes' work would have been overlooked long ago because it seemed to have few practical applications.
Lynch was recognized as an extraordinary, if unorthodox, student while still an undergraduate in the '80s. Rayner - a ruddy, alabaster-bearded, outspoken embodiment of the Cambridge lineage that produces infidel mathematicians - recalls that his now notably successful former student had a tough time getting out of bed. "Mike didn't do any work at all until a quarter of an hour before the exam, when he was miles away from any textbooks. But he used to invent these solutions which were very creative."
The broad sweep of Rayner's academic and cultural interests was a powerful influence on the young engineer, who says his mentor's insistence on problem solving over "hand-waving, headline-grabbing rubbish" encouraged him to think of innovative and practical applications for Bayes' work. It was over morning coffee with Rayner and other graduate students, says Lynch, that he first considered applying the 250-year-old theorem to the task of training computers to recognize patterns of meaning.
Lynch's first company - created in 1991 during his student years and fortified with an impulsive £2,000 loan offered in a pub - is called Neurodynamics. Working for, among others, companies in the British intelligence and defense industries, Neurodynamics uses neural-network technology and Bayesian methods to create applications that specialize in character, handwriting, and facial recognition, as well as surveillance. Lynch enjoyed cooking up solutions for high-level skunk works because, he says, "they have the most interesting problems."
One of the interesting problems Lynch addressed for British intelligence was how to enable computers to make sense of large volumes of words in many languages for a top-secret project. The young entrepreneur, who didn't have the necessary security clearances, was never told what sort of texts the technology would analyze - intercepted email, faxes, leaked documents? - but was instructed to perform his operations on newspaper stories from around the world. Out of that work came the chunk of code called the Dynamic Reasoning Engine, the Bayesian heart of every Autonomy product.
To determine whether two passages are concerned with the same fundamental ideas, Lynch realized, you don't need to know the meaning of each word. In fact, it's not even necessary to be able to speak the language. As long as you can teach the computer where one word ends and another begins, it can look at the ideas contained in a text as the outcomes of probabilities derived from the clustering of certain symbols. The symbol penguin, for instance, might refer to the Antarctic bird, a hockey team, or Batman's nemesis. If it clusters near certain other symbols in a passage, however - say, ice, South Pole, flightless, and black and white - penguin most likely refers to the bird. You can carry the process further: If those other words are present, there's an excellent chance the text is about penguins, even if the symbol for penguin itself is absent.
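One simple way to see the clustering idea in action is a naive Bayes text classifier, shown below. It is a stand-in chosen for illustration and makes no claim to reproduce the Dynamic Reasoning Engine, whose internals the article does not describe. The tiny training corpus, the topic labels, and the `train` and `classify` functions are all hypothetical. Note that the program never learns what any word means; it only counts which symbols tend to appear together.

```python
import math
from collections import Counter


def train(labeled_docs):
    """Count word occurrences per topic from (topic, text) pairs."""
    topic_counts = Counter()
    word_counts = {}
    vocab = set()
    for topic, text in labeled_docs:
        words = text.lower().split()
        topic_counts[topic] += 1
        word_counts.setdefault(topic, Counter()).update(words)
        vocab.update(words)
    return topic_counts, word_counts, vocab


def classify(text, model):
    """Return the most probable topic for text under naive Bayes."""
    topic_counts, word_counts, vocab = model
    total_docs = sum(topic_counts.values())
    scores = {}
    for topic, n_docs in topic_counts.items():
        counts = word_counts[topic]
        # Add-one smoothing keeps unseen words from zeroing out a topic.
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)  # log prior, P(topic)
        for word in text.lower().split():
            if word in vocab:  # skip words never seen in training
                score += math.log((counts[word] + 1) / denom)
        scores[topic] = score
    return max(scores, key=scores.get)


# Hypothetical training snippets, invented for this example.
docs = [
    ("bird", "penguin ice flightless antarctic black white"),
    ("bird", "flightless bird south pole ice krill"),
    ("hockey", "penguin hockey team goal ice rink pittsburgh"),
    ("hockey", "hockey season team playoff goal"),
    ("comics", "penguin batman gotham villain umbrella"),
    ("comics", "batman villain gotham comic"),
]
model = train(docs)

print(classify("flightless ice south pole", model))
print(classify("penguin batman gotham", model))
```

The first query is assigned to the bird topic even though the word penguin never appears in it, which is the effect described above: the surrounding symbols carry the idea.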
Lynch had zeroed in on the Achilles' heel of search engines. A search on penguin is just as likely to generate a list of pages about Penguin-Putnam books, the Purple Penguin Design Group, and why Linus Torvalds chose the bird as Linux's symbol as it is to uncover useful data on flightless aquatic birds of the tuxedoed, krill-munching variety.
Susan Dumais, senior researcher of adaptive systems and interaction for Microsoft, notes that a Web surfer who types printer into a search engine or help system is probably not seeking information on writing code for printer-driver software - even if the word appears 100 times in such a document, yielding a strong keyword match. The average person is probably looking for information on setting up a printer, trying to figure out why a printer isn't working, or looking for a good price on equipment. The prior knowledge of what most users are searching for can be factored into Bayesian information-retrieval strategies. The ability of Bayes nets to snare relationships among words that elude keyword-matching schemes "points to the rich way that human discourse is generated," Dumais observes, "out of words not said and all the finely shaded ways of saying things."
Many of the older firms specializing in text search and retrieval - such as Verity and Excalibur Technologies - have relied on teams of linguistic experts to create a custom taxonomy of terms for each client, charting the syntactic relationships among key concepts. This can take months. One advantage of the Bayesian approach is that the patterns naturally occurring in the texts "teach" the computer about the relationships between words - whether they're in English or Uzbek.
Lynch has no interest in fathering a multilingual search engine. He sees the search-engine business as locked up by established brands and is confident that the potential applications of Bayes' work to the torrent of digital text flooding our desktops are more far-reaching than aiming to build a better mousetrap.com. Instead, Lynch set about to kill off the search paradigm altogether by using Bayes' method to provide something better.
One of the products Autonomy created after it was spun off from Neurodynamics in 1996 was ActiveKnowledge, the BAE Systems tool that tips you off to relevant resources in the company archive when you begin typing a document. Another software program under the Autonomy direct-sales banner, Portal-in-a-Box, collects links to archived documents and to data on the Internet it thinks you'll be interested in - based on the texts you've clicked to and read in the past - and weaves it all into a custom-tailored page on your company's Web site. Portal-in-a-Box is like having your own personal clipping service that can read your mind increasingly well over time. Just as Bayes nets can get better and better at recognizing the differences between a saxophone solo and a burst of static, a smeared B and a water-stained 8, they can come to know what news topics you care about, what stocks you watch, and which email subject headers you deem sufficiently important to have zapped to your PDA. The more you interact with the items the software puts in front of you, the better it can predict how you'll react to an item, and it can even make an educated guess about what you'll want to do next.
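How software might get better at predicting your reactions the more you use it can be illustrated with the simplest Bayesian update there is. The sketch below is an assumption-laden toy, not a description of Portal-in-a-Box: the `TopicInterest` class and its starting values are hypothetical. It keeps a running count of items clicked and items skipped for one topic, starting from a neutral prior, and reports the estimated chance that the next item on that topic will be clicked.

```python
class TopicInterest:
    """Running estimate of how likely a reader is to click items on one topic."""

    def __init__(self, prior_clicks=1.0, prior_skips=1.0):
        # Pseudo-counts encode the prior belief; 1 and 1 means "no idea yet".
        self.clicks = prior_clicks
        self.skips = prior_skips

    def observe(self, clicked):
        """Fold one new observation into the counts."""
        if clicked:
            self.clicks += 1
        else:
            self.skips += 1

    def probability(self):
        """Posterior mean chance that the next item will be clicked."""
        return self.clicks / (self.clicks + self.skips)


# Hypothetical reader history: three clicks and one skip on a single topic.
stocks = TopicInterest()
before = stocks.probability()
for clicked in [True, True, False, True]:
    stocks.observe(clicked)
after = stocks.probability()
print(f"before: {before:.2f}, after: {after:.2f}")
```

Each new click or skip nudges the estimate, and early observations move it more than later ones, so the guess settles down as evidence accumulates.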
With the Dynamic Reasoning Engine at its core, Autonomy's cluster of applications combines Bayesian pattern recognition with neural networks, which use parallel pathways to mimic the action of the human nervous system. The company's products serve two primary markets: corporate knowledge management and new media. The same software that helps create breathe.net's personal portals for European business travelers enables Telia, a Swedish telecommunications firm, to market family-friendly Internet access with sex sites filtered out. The police force of Essex County, north of London, uses an Autonomy-powered database called Leo to turn up correspondences among criminal records, police reports, and emergency call transcripts. "The whole idea of police intelligence is putting together seemingly unimportant bits of information," explains Julian Robinson, the Essex Police Authority's system-development officer. "Autonomy products are the cornerstone around which we've built exactly what our officers want - a system that provides information with as little effort as possible."
Autonomy's low-maintenance software also won the company the BAE Systems contract over Excalibur Technologies' RetrievalWare after a three-month trial of both products. "The amount of administration required for Excalibur was horrendous," says Kevin Phillips, head of information systems at BAE Systems' Virtual University. "Autonomy just burbles along."
The Net is burying us in things others think we might want to see or buy. As an increasing amount of stuff competes for our attention and prime placement on our ever more pocketable interfaces, software agents that just burble along - gradually training themselves to know what we really care about without resorting to online questionnaires or battalions of text taggers - may be our only hope for digging ourselves out. When there are hundreds of digital TV channels, and only six buttons on our remotes, how will we navigate the stacks and directories to find what we want to watch? For European viewers, an Autonomy product launching later this year will bring programs a viewer is likely to be interested in to the top of the heap.
In a report released by Jupiter Communications last November, 46 percent of high-traffic ecommerce sites studied - up from 38 percent in 1998 - were unable to respond to an email request for support within five days. The solution, Jupiter suggests, is an automated email-response system. Tasks like intelligent email routing and customer management are natural niches for software that acts like it knows how to read. Bayesian reasoning engines implanted in a dozen programs might look at a single piece of email sent to Anywhere.com as it's filtered through security, routed to the appropriate recipients, posted to in-house bulletin boards, and then responded to without giving away trade secrets.
In May last year, Autonomy made its reasoning engine available to other companies that wanted to build their own software around it on an OEM basis. Autonomy licenses its code to these partners at various rates and earns royalties of 10 to 50 percent on products that use Autonomy software. Such agreements with equipment manufacturers are Autonomy's fastest-growing revenue segment, and a company source speculates that revenues from OEM deals will exceed the income from direct sales by 2001.
The next generation of Autonomy software - embedded in intranet sites - creates a single interface for Oracle databases and legacy mainframe resources, archived email and Lotus Notes, Excel spreadsheets, and Word files. By partnering with young companies, such as Corechange, Intraspect Software, Verge Software, Provenance Systems, and Hyperwave Information Management, that make so-called middle-office software to build intranet portals, Autonomy is aiming to become the language lobe in the evolving big brain of the modern corporation. In making tools that learn more about you the more you use them, Autonomy is also riding the wave of personal profiling and customization that could someday take the place of the traditional service industry.
Autonomy is just one of the companies putting Bayes' rule to use in ways its creator couldn't have imagined. The reverend is hard at work in Microsoft Office's wizards, which anticipate your needs by observing behaviors such as cursor movements and hesitations. The theorem also plays a role in the troubleshooting areas on Microsoft.com, where Bayesian methods of diagnosing user problems save the company hundreds of millions of dollars a year in service calls, says Eric Horvitz, one of 25 Bayesian specialists who work with Microsoft's product teams.
One of the most promising uses of these strategies, predicts Horvitz, will be in the development of what Microsoft calls continual computation. Anticipating a user's next move could cut the time spent launching frequently used apps. Likewise, your browser could pre-fetch potentially interesting pages and cache them for you in the background.
In Redmond, there's a prototype running on a desktop computer christened the Bayesian Receptionist. Using a voice interface rather than text, the Bayesian Receptionist greets visitors to the Conversational Architectures Group and answers questions as needed. Horvitz points out that the particular strength of Bayesian approaches - making accurate guesses under conditions of uncertainty - is especially relevant for interfaces that converse, because they have to depend on constant renegotiation of the subject at hand, following the flow of spontaneous exchange while navigating through topic hierarchies. "Uncertainty about communication is at the heart of conversation," Horvitz observes.
He believes the smart objects of the future will inevitably carry a piece of Bayes' legacy: "Data from Star Trek? He'll be Bayesian."
When he's not helping build the Bayesian future, Michael Lynch lives in a village outside Cambridge that's so small - population 120 - he asks me not to name it in print. Many houses in the area are covered with corn-thatch roofs in the ancient style, and the older residents pray in churches with stone towers built to keep watch for Viking raiders. The locals all ask after Lynch's beloved otter hound, Gromit, named for Nick Park's animated clay canine. Standing in a garden, watching red chickens peck under the hedges, you wouldn't know which century you had arrived in. Bayes' Tunbridge Wells must have been something like this, seemingly thousands of miles away from the "dark satanic mills" that appalled Blake at the dawn of the Industrial Revolution.
When Lynch bought his house three years ago, he dug a koi pond where a rubbish dump had been. In winter, he boils kettles of water to make holes in the ice so the fish can have more oxygen while they hibernate. He stocked the pond with fish he bought at a pet store, but in just a few weeks, wild carp appeared, swimming among the ones he'd placed there. Lynch thinks they may have come in as eggs clinging to the legs of migrating herons.
"It's amazing how absolutely pervasive life is, given half a chance," he observes. "You dig a pool of water, you leave a patch of earth, something will grow there."
Lynch sees the marriage of Bayes' ideas and modern processing power as characteristic of a new, more mature phase of technology - an era in which humanity will no longer believe it's standing at the center of the universe.
"Rules-based, Boolean computing assumes that we know best how to solve a problem," he says. "My background comes completely the other way. The problem tells you how to solve the problem. That's what the next generation of computing is going to be about: listening to the world."