What Is Web 2.0
Design Patterns and Business Models for the Next Generation of Software
by Tim O'Reilly
The bursting of the dot-com bubble in the fall of 2001 marked a turning point for the web. Many people concluded that the web was overhyped, when in fact bubbles and consequent shakeouts appear to be a common feature of all technological revolutions.
The concept of "Web 2.0" began with a conference brainstorming session between O'Reilly and MediaLive International. Dale Dougherty, web pioneer and O'Reilly VP, noted that far from having "crashed", the web was more important than ever, with exciting new applications and sites popping up with surprising regularity. What's more, the companies that had survived the collapse seemed to have some things in common. Could it be that the dot-com collapse marked some kind of turning point for the web, such that a call to action such as "Web 2.0" might make sense? We agreed that it did, and so the Web 2.0 Conference was born.
In the year and a half since, the term "Web 2.0" has clearly taken hold, with more than 9.5 million citations in Google. But there's still a huge amount of disagreement about just what Web 2.0 means, with some people decrying it as a meaningless marketing buzzword, and others accepting it as the new conventional wisdom.
This article is an attempt to clarify just what we mean by Web 2.0.
In our initial brainstorming, we formulated our sense of Web 2.0 by example:
Web 1.0                     -->  Web 2.0
DoubleClick                 -->  Google AdSense
Ofoto                       -->  Flickr
Akamai                      -->  BitTorrent
mp3.com                     -->  Napster
Britannica Online           -->  Wikipedia
personal websites           -->  blogging
evite                       -->  upcoming.org and EVDB
domain name speculation     -->  search engine optimization
page views                  -->  cost per click
screen scraping             -->  web services
publishing                  -->  participation
content management systems  -->  wikis
directories (taxonomy)      -->  tagging ("folksonomy")
stickiness                  -->  syndication
The list went on and on. But what was it that made us identify one application or approach as "Web 1.0" and another as "Web 2.0"? (The question is particularly urgent because the Web 2.0 meme has become so widespread that companies are now pasting it on as a marketing buzzword, with no real understanding of just what it means. The question is particularly difficult because many of those buzzword-addicted startups are definitely not Web 2.0, while some of the applications we identified as Web 2.0, like Napster and BitTorrent, are not even properly web applications!) We began trying to tease out the principles that are demonstrated in one way or another by the success stories of web 1.0 and by the most interesting of the new applications.
Like many important concepts, Web 2.0 doesn't have a hard boundary, but rather, a gravitational core. You can visualize Web 2.0 as a set of principles and practices that tie together a veritable solar system of sites that demonstrate some or all of those principles, at a varying distance from that core.
Figure 1 shows a "meme map" of Web 2.0 that was developed at a brainstorming session during FOO Camp, a conference at O'Reilly Media. It's very much a work in progress, but shows the many ideas that radiate out from the Web 2.0 core.
For example, at the first Web 2.0 conference, in October 2004, John Battelle and I listed a preliminary set of principles in our opening talk. The first of those principles was "The web as platform." Yet that was also a rallying cry of Web 1.0 darling Netscape, which went down in flames after a heated battle with Microsoft. What's more, two of our initial Web 1.0 exemplars, DoubleClick and Akamai, were both pioneers in treating the web as a platform. People don't often think of it as "web services," but in fact, ad serving was the first widely deployed web service, and the first widely deployed "mashup," to use another term that has gained currency of late. Every banner ad is served as a seamless cooperation between two websites, delivering an integrated page to a reader on yet another computer. Akamai also treats the network as the platform, and, at a deeper level of the stack, builds a transparent caching and content delivery network that eases bandwidth congestion.
Nonetheless, these pioneers provided useful contrasts because later entrants have taken their solution to the same problem even further, understanding something deeper about the nature of the new platform. Both DoubleClick and Akamai were Web 2.0 pioneers, yet we can also see how it's possible to realize more of the possibilities by embracing additional Web 2.0 design patterns.
Let's drill down for a moment into each of these three cases, teasing out some of the essential elements of difference.
If Netscape was the standard bearer for Web 1.0, Google is most certainly the standard bearer for Web 2.0, if only because their respective IPOs were defining events for each era. So let's start with a comparison of these two companies and their positioning.
Netscape framed "the web as platform" in terms of the old software paradigm: their flagship product was the web browser, a desktop application, and their strategy was to use their dominance in the browser market to establish a market for high-priced server products. Control over standards for displaying content and applications in the browser would, in theory, give Netscape the kind of market power enjoyed by Microsoft in the PC market. Much like the "horseless carriage" framed the automobile as an extension of the familiar, Netscape promoted a "webtop" to replace the desktop, and planned to populate that webtop with information updates and applets pushed to the webtop by information providers who would purchase Netscape servers.
In the end, both web browsers and web servers turned out to be commodities, and value moved "up the stack" to services delivered over the web platform.
Google, by contrast, began its life as a native web application, never sold or packaged, but delivered as a service, with customers paying, directly or indirectly, for the use of that service. None of the trappings of the old software industry are present. No scheduled software releases, just continuous improvement. No licensing or sale, just usage. No porting to different platforms so that customers can run the software on their own equipment, just a massively scalable collection of commodity PCs running open source operating systems plus homegrown applications and utilities that no one outside the company ever gets to see.
At bottom, Google requires a competency that Netscape never needed: database management. Google isn't just a collection of software tools, it's a specialized database. Without the data, the tools are useless; without the software, the data is unmanageable. Software licensing and control over APIs--the lever of power in the previous era--is irrelevant because the software never need be distributed but only performed, and also because without the ability to collect and manage the data, the software is of little use. In fact, the value of the software is proportional to the scale and dynamism of the data it helps to manage.
Google's service is not a server--though it is delivered by a massive collection of internet servers--nor a browser--though it is experienced by the user within the browser. Nor does its flagship search service even host the content that it enables users to find. Much like a phone call, which happens not just on the phones at either end of the call, but on the network in between, Google happens in the space between browser and search engine and destination content server, as an enabler or middleman between the user and his or her online experience.
While both Netscape and Google could be described as software companies, it's clear that Netscape belonged to the same software world as Lotus, Microsoft, Oracle, SAP, and other companies that got their start in the 1980's software revolution, while Google's fellows are other internet applications like eBay, Amazon, Napster, and yes, DoubleClick and Akamai.
Like Google, DoubleClick is a true child of the internet era. It harnesses software as a service, has a core competency in data management, and, as noted above, was a pioneer in web services long before web services even had a name. However, DoubleClick was ultimately limited by its business model. It bought into the '90s notion that the web was about publishing, not participation; that advertisers, not consumers, ought to call the shots; that size mattered, and that the internet was increasingly being dominated by the top websites as measured by MediaMetrix and other web ad scoring companies.
As a result, DoubleClick proudly cites on its website "over 2000 successful implementations" of its software. Yahoo! Search Marketing (formerly Overture) and Google AdSense, by contrast, already serve hundreds of thousands of advertisers apiece.
Overture and Google's success came from an understanding of what Chris Anderson refers to as "the long tail," the collective power of the small sites that make up the bulk of the web's content. DoubleClick's offerings require a formal sales contract, limiting their market to the few thousand largest websites. Overture and Google figured out how to enable ad placement on virtually any web page.
The Web 2.0 lesson: leverage customer self-service and algorithmic data management to reach out to the entire web, to the edges and not just the center, to the long tail and not just the head.
Not surprisingly, other web 2.0 success stories demonstrate this same behavior. eBay enables occasional transactions of only a few dollars between single individuals, acting as an automated intermediary. Napster (though shut down for legal reasons) built its network not by building a centralized song database, but by architecting a system in such a way that every downloader also became a server, and thus grew the network.
Like DoubleClick, Akamai is optimized to do business with the head, not the tail, with the center, not the edges. While it serves the benefit of the individuals at the edge of the web by smoothing their access to the high-demand sites at the center, it collects its revenue from those central sites.
BitTorrent, like other pioneers in the P2P movement, takes a radical approach to internet decentralization. Every client is also a server; files are broken up into fragments that can be served from multiple locations, transparently harnessing the network of downloaders to provide both bandwidth and data to other users. The more popular the file, in fact, the faster it can be served, as there are more users providing bandwidth and fragments of the complete file.
BitTorrent thus demonstrates a key Web 2.0 principle: the service automatically gets better the more people use it. While Akamai must add servers to improve service, every BitTorrent consumer brings his own resources to the party. There's an implicit "architecture of participation," a built-in ethic of cooperation, in which the service acts primarily as an intelligent broker, connecting the edges to each other and harnessing the power of the users themselves.
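To make the swarming mechanism concrete, here is a deliberately toy sketch in Python, not BitTorrent's actual protocol or data structures: a file is split into pieces, and every downloader serves whatever pieces it already holds, so the number of possible sources grows with the number of participants.

```python
import random

PIECES = 8  # the file is split into 8 pieces (illustrative only)

class Peer:
    """A downloader that also serves whatever pieces it already has."""
    def __init__(self, pieces=None):
        self.pieces = set(pieces or [])

    def missing(self):
        return set(range(PIECES)) - self.pieces

    def request_from(self, swarm):
        """Fetch one missing piece from any peer in the swarm that holds it."""
        for piece in self.missing():
            sources = [p for p in swarm if piece in p.pieces and p is not self]
            if sources:
                source = random.choice(sources)  # more peers, more possible sources
                self.pieces.add(piece)           # "download" the piece from that source
                return True
        return False

# One seed has the whole file; new downloaders join with nothing.
swarm = [Peer(range(PIECES))] + [Peer() for _ in range(4)]

# Each round, every downloader fetches a piece from whoever has it --
# including other downloaders, so capacity grows with popularity.
while any(peer.missing() for peer in swarm):
    for peer in swarm:
        peer.request_from(swarm)

print("all peers complete:", all(not p.missing() for p in swarm))
```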
The central principle behind the success of the giants born in the Web 1.0 era who have survived to lead the Web 2.0 era appears to be this: they have embraced the power of the web to harness collective intelligence. Hyperlinking is the foundation of the web; as users add new content and new sites, other users discover and link to them, so the web grows organically out of collective activity. Yahoo!, the first great internet success story, was born as a catalog of links, an aggregation of the best work of thousands, then millions, of web users. Google's breakthrough in search, PageRank, uses the link structure of the web rather than just the characteristics of documents to provide better results. eBay's product is the collective activity of all its users, and Amazon has made a science of user engagement, harnessing reviews and purchase patterns to improve its own results.
Now, innovative companies that pick up on this insight and perhaps extend it even further are making their mark on the web. Wikipedia, an online encyclopedia based on the unlikely notion that an entry can be added by any web user and edited by any other, is a radical experiment in trust and a profound change in the dynamics of content creation. Sites like del.icio.us and Flickr have pioneered a concept some people call "folksonomy," collaborative categorization of sites and photos using freely chosen keywords, or tags. And collaborative spam filtering aggregates the individual decisions of email users about what is and is not spam, outperforming systems that rely on analysis of the messages themselves.
The lesson: Network effects from user contributions are the key to market dominance in the Web 2.0 era.
One of the most highly touted features of the Web 2.0 era is the rise of blogging. Personal home pages have been around since the early days of the web, and the personal diary and daily opinion column around much longer than that, so just what is the fuss all about?
At its most basic, a blog is just a personal home page in diary format. But as Rich Skrenta notes, the chronological organization of a blog "seems like a trivial difference, but it drives an entirely different delivery, advertising and value chain."
One of the things that has made a difference is a technology called RSS. RSS is the most significant advance in the fundamental architecture of the web since early hackers realized that CGI could be used to create database-backed websites. RSS allows someone to link not just to a page, but to subscribe to it, with notification every time that page changes. Skrenta calls this "the incremental web." Others call it the "live web."
Now, of course, "dynamic websites" (i.e., database-backed sites with dynamically generated content) replaced static web pages well over ten years ago. What's dynamic about the live web are not just the pages, but the links. A link to a weblog is expected to point to a perennially changing page, with "permalinks" for any individual entry, and notification for each change. An RSS feed is thus a much stronger link than, say, a bookmark or a link to a single page.
RSS also means that the web browser is not the only means of viewing a web page. While some RSS aggregators, such as Bloglines, are web-based, others are desktop clients, and still others allow users of portable devices to subscribe to constantly updated content.
RSS is now being used to push not just notices of new blog entries, but also all kinds of data updates, including stock quotes, weather data, and photo availability.
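To make "subscribing" concrete, here is a rough Python sketch of what an aggregator does at bottom: poll a feed and surface entries it has not seen before. The feed URL is a placeholder, and a real aggregator would add scheduling, conditional requests, and persistent state.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/feed.rss"  # placeholder feed address

def fetch_items(url):
    """Download an RSS 2.0 feed and return (guid-or-link, title) pairs."""
    with urllib.request.urlopen(url) as response:
        root = ET.fromstring(response.read())
    items = []
    for item in root.iter("item"):
        guid = item.findtext("guid") or item.findtext("link")
        items.append((guid, item.findtext("title")))
    return items

seen = set()

def poll():
    """One polling pass: report anything published since the last pass."""
    for guid, title in fetch_items(FEED_URL):
        if guid not in seen:
            seen.add(guid)
            print("new entry:", title)

poll()  # an aggregator would call this on a schedule
```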
But RSS is only part of what makes a weblog different from an ordinary web page. Tom Coates remarks on the significance of the permalink:
It may seem like a trivial piece of functionality now, but it was effectively the device that turned weblogs from an ease-of-publishing phenomenon into a conversational mess of overlapping communities. For the first time it became relatively easy to gesture directly at a highly specific post on someone else's site and talk about it. Discussion emerged. Chat emerged. And - as a result - friendships emerged or became more entrenched. The permalink was the first - and most successful - attempt to build bridges between weblogs.
In many ways, the combination of RSS and permalinks adds many of the features of NNTP, the Network News Transfer Protocol of Usenet, onto HTTP, the web protocol. The "blogosphere" can be thought of as a new, peer-to-peer equivalent to Usenet and bulletin boards, the conversational watering holes of the early internet. Not only can people subscribe to each other's sites, and easily link to individual comments on a page, but also, via a mechanism known as trackbacks, they can see when anyone else links to their pages, and can respond, either with reciprocal links, or by adding comments.
Interestingly, two-way links were the goal of early hypertext systems like Xanadu. Hypertext purists have celebrated trackbacks as a step towards two-way links. But note that trackbacks are not properly two-way--rather, they are really (potentially) symmetrical one-way links that create the effect of two-way links.
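To give a flavor of how lightweight the trackback mechanism is, here is a hedged Python sketch of the ping one weblog sends another after linking to one of its posts. The endpoint and post URLs are placeholders; the form fields (title, url, excerpt, blog_name) follow the convention common to trackback implementations of the period, though individual blog platforms may differ.

```python
import urllib.parse
import urllib.request

def send_trackback(trackback_url, post_url, title, excerpt, blog_name):
    """POST a simple form-encoded ping telling another weblog we linked to it."""
    data = urllib.parse.urlencode({
        "url": post_url,        # the post on *our* site that links to theirs
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }).encode("utf-8")
    request = urllib.request.Request(
        trackback_url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()  # the receiving blog replies with a small XML status

# Placeholder addresses for illustration only.
send_trackback(
    trackback_url="https://example.org/trackback/42",
    post_url="https://myblog.example.com/2005/08/response",
    title="A response",
    excerpt="I disagree with the post's second point...",
    blog_name="My Weblog",
)
```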
If an essential part of Web 2.0 is harnessing collective intelligence, turning the web into a kind of global brain, the blogosphere is the equivalent of constant mental chatter in the forebrain, the voice we hear in all of our heads. It may not reflect the deep structure of the brain, which is often unconscious, but is instead the equivalent of conscious thought. And as a reflection of conscious thought and attention, the blogosphere has begun to have a powerful effect.
First, because search engines use link structure to help predict useful pages, bloggers, as the most prolific and timely linkers, have a disproportionate role in shaping search engine results. Second, because the blogging community is so highly self-referential, bloggers paying attention to other bloggers magnifies their visibility and power. The "echo chamber" that critics decry is also an amplifier.
If it were merely an amplifier, blogging would be uninteresting. But like Wikipedia, blogging harnesses collective intelligence as a kind of filter. What James Surowiecki calls "the wisdom of crowds" comes into play, and much as PageRank produces better results than analysis of any individual document, the collective attention of the blogosphere selects for value.
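Since PageRank keeps coming up, a compact sketch of the idea may help: treat links as votes and let each page's score depend on the scores of the pages that link to it. This is a simplified power-iteration illustration, not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outlinks in links.items():
            if not outlinks:          # dangling page: spread its rank evenly
                share = damping * rank[page] / len(pages)
                for p in pages:
                    new_rank[p] += share
            else:                     # a link passes on a share of the linker's rank
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# A tiny link graph: heavily linked-to pages end up with higher rank.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(graph))
```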
While mainstream media may see individual blogs as competitors, what is really unnerving is that the competition is with the blogosphere as a whole. This is not just a competition between sites, but a competition between business models.
Every significant internet application to date has been backed by a specialized database: Google's web crawl, Yahoo!'s directory (and web crawl), Amazon's database of products, eBay's database of products and sellers, MapQuest's map databases, Napster's distributed song database. As Hal Varian remarked in a personal conversation last year, "SQL is the new HTML." Database management is a core competency of Web 2.0 companies, so much so that we have sometimes referred to these applications as "infoware" rather than merely software.
This fact leads to a key question: Who owns the data?
In the internet era, one can already see a number of cases where control over the database has led to market control and outsized financial returns. The monopoly on domain name registry initially granted by government fiat to Network Solutions (later purchased by Verisign) was one of the first great moneymakers of the internet. While we've argued that business advantage via controlling software APIs is much more difficult in the age of the internet, control of key data sources is not, especially if those data sources are expensive to create or amenable to increasing returns via network effects.
Look at the copyright notices at the base of every map served by MapQuest, maps.yahoo.com, maps.msn.com, or maps.google.com, and you'll see the line "Maps copyright NavTeq, TeleAtlas," or with the new satellite imagery services, "Images copyright Digital Globe." These companies made substantial investments in their databases (NavTeq alone reportedly invested $750 million to build their database of street addresses and directions. Digital Globe spent $500 million to launch their own satellite to improve on government-supplied imagery.) NavTeq has gone so far as to imitate Intel's familiar Intel Inside logo: Cars with navigation systems bear the imprint, "NavTeq Onboard." Data is indeed the Intel Inside of these applications, a sole source component in systems whose software infrastructure is largely open source or otherwise commodified.
The now hotly contested web mapping arena demonstrates how a failure to understand the importance of owning an application's core data will eventually undercut its competitive position. MapQuest pioneered the web mapping category in 1995, yet when Yahoo!, and then Microsoft, and most recently Google, decided to enter the market, they were easily able to offer a competing application simply by licensing the same data.
Contrast, however, the position of Amazon.com. Like competitors such as Barnesandnoble.com, its original database came from ISBN registry provider R.R. Bowker. But unlike MapQuest, Amazon relentlessly enhanced the data, adding publisher-supplied data such as cover images, table of contents, index, and sample material. Even more importantly, they harnessed their users to annotate the data, such that after ten years, Amazon, not Bowker, is the primary source for bibliographic data on books, a reference source for scholars and librarians as well as consumers. Amazon also introduced their own proprietary identifier, the ASIN, which corresponds to the ISBN where one is present, and creates an equivalent namespace for products without one. Effectively, Amazon "embraced and extended" their data suppliers.
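As a loose sketch of this "embrace and extend" move, the snippet below keeps compatibility with the supplier's identifier (the ISBN) where one exists and mints a house identifier otherwise. The identifier format and function names here are invented for illustration; they are not Amazon's actual scheme.

```python
import itertools

_counter = itertools.count(1)
_catalog = {}  # house identifier -> product record

def register(product, isbn=None):
    """Assign a house identifier: reuse the ISBN when present, mint one otherwise."""
    if isbn:
        house_id = isbn                      # stays compatible with the supplier's namespace
    else:
        house_id = "X%09d" % next(_counter)  # invented format for non-book products
    _catalog[house_id] = product
    return house_id

print(register({"title": "The Cathedral & the Bazaar"}, isbn="0596001088"))
print(register({"title": "Garden hose, 50 ft"}))  # no ISBN, gets a minted identifier
```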
Imagine if MapQuest had done the same thing, harnessing their users to annotate maps and directions, adding layers of value. It would have been much more difficult for competitors to enter the market just by licensing the base data.
The recent introduction of Google Maps provides a living laboratory for the competition between application vendors and their data suppliers. Google's lightweight programming model has led to the creation of numerous value-added services in the form of mashups that link Google Maps with other internet-accessible data sources. Paul Rademacher's housingmaps.com, which combines Google Maps with Craigslist apartment rental and home purchase data to create an interactive housing search tool, is the pre-eminent example of such a mashup.
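In spirit, such a mashup is just a join between two web data sources performed by a small script or by code running in the browser. The sketch below is schematic only: the listings feed and geocoding endpoint are placeholder URLs, not the actual Craigslist or Google Maps interfaces that housingmaps.com uses.

```python
import json
import urllib.parse
import urllib.request

LISTINGS_URL = "https://example.com/apartments.json"    # placeholder listing source
GEOCODER_URL = "https://example.com/geocode?address="   # placeholder geocoder

def fetch_json(url):
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def geocode(address):
    """Turn a street address into (lat, lon) using the placeholder geocoder."""
    result = fetch_json(GEOCODER_URL + urllib.parse.quote(address))
    return result["lat"], result["lon"]

def build_map_points():
    """Join rental listings with coordinates so they can be plotted on a map."""
    points = []
    for listing in fetch_json(LISTINGS_URL):
        lat, lon = geocode(listing["address"])
        points.append({"lat": lat, "lon": lon,
                       "price": listing["price"], "url": listing["url"]})
    return points

if __name__ == "__main__":
    for point in build_map_points():
        print(point)
```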
At present, these mashups are mostly innovative experiments, done by hackers. But entrepreneurial activity follows close behind. And already, one can see that for at least one class of developer, Google has taken the role of data source away from Navteq and inserted themselves as a favored intermediary. We expect to see battles between data suppliers and application vendors in the next few years, as both realize just how important certain classes of data will become as building blocks for Web 2.0 applications.
The race is on to own certain classes of core data: location, identity, calendaring of public events, product identifiers and namespaces. In many cases, where there is significant cost to create the data, there may be an opportunity for an Intel Inside style play, with a single source for the data. In others, the winner will be the company that first reaches critical mass via user aggregation, and turns that aggregated data into a system service.
For example, in the area of identity, PayPal, Amazon's 1-click, and the millions of users of communications systems may all be legitimate contenders to build a network-wide identity database.
A further point must be noted with regard to data, and that is user concerns about privacy and their rights to their own data. In many of the early web applications, copyright is only loosely enforced. For example, Amazon lays claim to any reviews submitted to the site, but in the absence of enforcement, people may repost the same review elsewhere. However, as companies begin to realize that control over data may be their chief source of competitive advantage, we may see heightened attempts at control.
Much as the rise of proprietary software led to the Free Software movement, we expect the rise of proprietary databases to result in a Free Data movement within the next decade. One can see early signs of this countervailing trend in open data projects such as Wikipedia, the Creative Commons, and in software projects like Greasemonkey, which allows users to take control of how data is displayed on their computer.
As noted above in the discussion of Google vs. Netscape, one of the defining characteristics of internet era software is that it is delivered as a service, not as a product. This fact leads to a number of fundamental changes in the business model of such a company. Chief among them: operations must become a core competency. Google's or Yahoo!'s expertise in product development must be matched by an expertise in daily operations.
It's no accident that Google's system administration, networking, and load balancing techniques are perhaps even more closely guarded secrets than their search algorithms. Google's success at automating these processes is a key part of their cost advantage over competitors.
It's also no accident that scripting languages such as Perl, Python, PHP, and now Ruby, play such a large role at web 2.0 companies. Perl was famously described by Hassan Schroeder, Sun's first webmaster, as "the duct tape of the internet." Dynamic languages (often called scripting languages and looked down on by the software engineers of the era of software artifacts) are the tool of choice for system and network administrators, as well as application developers building dynamic systems that require constant change.