PROGRAMMING THE WORLD WIDE WEB Chapter 1 Fundamentals

 

Preview
• Many countries have been changed forever by
the advent of the World Wide Web.
• It has had some downsides, for example,
pornography and destructive ideas.
• Many of us use the Internet and the World
Wide Web to
– Communicate with friends, relatives, and
business associates through e-mail
– Shop for virtually anything
– Dig up a limitless variety of information
Designed by


©

Preview
• Constructing the software that provides all of
this information requires knowledge of
– Markup languages
– Meta-markup languages
– Programming skills in a myriad of different
programming languages
– Some specific to the WWW
– Some designed for general purpose computing
• This book provides the required background
and a basis for acquiring the knowledge and
skills necessary to build WWW sites

Contents
• Fundamentals
• Introduction to HTML
• Cascading Style Sheets
• The Basics of Perl
• Using Perl for CGI Programming
• The Basics of JavaScript
• JavaScript and HTML Documents
• Dynamic Documents with JavaScript

Contents
• Java Applets
• Introduction to XML
• Introduction to Web Servers and Servlets
• Database Access with Java
• A Brief Introduction to Java
• Introduction to ASP Web Servers and ASP Pages
• Database Access with ASP
• Web Color Design

Chapter 1 Fundamentals
1.1 A Brief Introduction to the Internet
1.2 The World Wide Web
1.3 Web Browsers
1.4 Web Servers
1.5 Uniform Resource Locators
1.6 Multipurpose Internet Mail Extensions
1.7 The Hypertext Transfer Protocol
1.8 The Web Programmer’s Toolbox

1.1.1 ORIGINS (1 of 3)
• The U.S. Department of Defense (DoD)
became interested in developing a new large-scale
computer network in the 1960s.
• The DoD’s Advanced Research Projects
Agency (ARPA) funded the construction of the
first such network, which connected about a
dozen ARPA-funded research laboratories and
universities.
• The first network was established at UCLA in
1969. The network was named ARPAnet.

1.1.1 ORIGINS (2 of 3)
• During the late 1970s and early 1980s
– BITNET (City University of New York)
• Electronic mail
• File transfer
– CSNET (University of Delaware, Purdue
University, University of Wisconsin, RAND
Corporation, and BBN)
• Electronic Mail
– Neither of them became a dominant national
network.

1.1.1 ORIGINS (3 of 3)
• A new national network, NSFnet, was created in 1986,
sponsored by the National Science Foundation (NSF).
NSFnet initially connected supercomputer centers at
five universities.
• By 1990, NSFnet had replaced ARPAnet for most
nonmilitary uses, and all sorts of organizations had
established nodes on this network. By 1992, NSFnet
connected more than one million computers around
the world.
• In 1995, a small part of NSFnet returned to being a
research network.
• The rest became known as the Internet, although this
term was used much earlier for both ARPAnet and
NSFnet.

1.1.2 WHAT THE INTERNET
IS
• The Internet is a huge collection of computers
connected in a communications network.
• Some of the devices connected to the
Internet are not computers at all.
• The innovation that allows all of these
diverse devices to communicate with
each other is a single low-level protocol,
the Transmission Control Protocol/Internet
Protocol (TCP/IP), used for all connections.

1.1.2 WHAT THE INTERNET
IS (Cont.)
• The Internet is primarily a network of
networks rather than a network of
computers.
• All devices connected to the Internet
must be uniquely identifiable.

1.1.3 INTERNET PROTOCOL
ADDRESSES
• For people, Internet devices are identified
by names.
• For computers, they are identified by
numeric addresses.
5031425
intPhonNumber
0xFF1234

1.1.3 INTERNET PROTOCOL
ADDRESSES (Cont.)
• The Internet Protocol (IP) address of a
machine connected to the Internet is a
unique 32-bit number.
XXX.XXX.XXX.XXX
where each XXX ∈ [0, 255] = [0, 2^8 − 1]
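
As a rough illustration of the 32-bit claim above, the following Python sketch packs the four dotted fields into a single integer and cross-checks the result against the standard `ipaddress` module. The function name `dotted_quad_to_int` and the sample address 192.0.2.1 are assumptions chosen for illustration; they do not come from the slides.

```python
import ipaddress


def dotted_quad_to_int(address: str) -> int:
    """Convert a dotted-decimal IPv4 address to its 32-bit integer value."""
    fields = [int(part) for part in address.split(".")]
    if len(fields) != 4 or any(not 0 <= f <= 255 for f in fields):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for field in fields:
        # Each of the four fields contributes 8 bits, for 32 bits in total.
        value = (value << 8) | field
    return value


example = "192.0.2.1"  # hypothetical address, used only as sample input
as_int = dotted_quad_to_int(example)
print(as_int)                                         # 3221225985
print(as_int == int(ipaddress.IPv4Address(example)))  # True
```

Because each field holds 8 bits, the largest address, 255.255.255.255, maps to 2^32 − 1.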

1.1.4 DOMAIN NAMES
• Because people have difficulty dealing with
and remembering numbers, machines on
the Internet also have textual names.
• These names begin with the name of the
host machine, followed by progressively
larger enclosing collections of machines,
called domains.
movies.comedy.marxbros.com
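
The naming scheme can be sketched in a few lines of Python: the first label is taken as the host, and each remaining suffix is one of the progressively larger enclosing domains. The helper name `enclosing_domains` is an assumption made for this example.

```python
def enclosing_domains(name: str) -> tuple[str, list[str]]:
    """Split a dotted name into its host label and its enclosing domains."""
    labels = name.split(".")
    host = labels[0]
    # Every suffix after the host label is a larger enclosing domain.
    domains = [".".join(labels[i:]) for i in range(1, len(labels))]
    return host, domains


host, domains = enclosing_domains("movies.comedy.marxbros.com")
print(host)     # movies
print(domains)  # ['comedy.marxbros.com', 'marxbros.com', 'com']
```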

1.1.4 DOMAIN NAMES
(Cont.)
• The host name and all of the domain names are
together called a fully qualified domain name.
• The fully qualified domain name of the destination
for a message must be converted to an IP
address before the message can be transmitted
on the Internet to the destination. These
conversions are done by machines called name
servers, which serve a collection of machines on
the Internet and are operated by organizations
that are responsible for the part of the Internet to
which those machines are connected.

1.1.4 DOMAIN NAMES
(Cont.)
• By the mid-1980s, a collection of different
protocols that run on top of TCP/IP had
been developed to support a variety of
uses of the Internet.
– Telnet
– FTP
– Usenet

1.2 The World Wide Web
1.2.1 ORIGINS
• In 1989, a small group of people led by Tim
Berners-Lee at CERN (formally the Conseil
Européen pour la Recherche Nucléaire, or the
European Laboratory for Particle Physics)
proposed a new protocol for the Internet and a
system of document access to use it.
• The intent of this new system, which the group
named the World Wide Web, was to allow
scientists around the world to use the Internet
to exchange documents describing their work.

1.2.1 ORIGINS
• The proposed new system was designed to
allow a user anywhere on the Internet to
search for and retrieve documents in
databases on any number of different
document servers.
• By late 1990, the basic ideas for the new
system had been fully developed and
implemented on a NeXT computer at CERN.
• In 1991, the system was ported to other
computer platforms and released to the rest of
the world.

1.2.1 ORIGINS
• For the form of its documents, the system used
hypertext, text with embedded links to text in
other locations to allow nonsequential
browsing of textual material. The idea of
hypertext had been developed earlier and had
appeared in Xerox's NoteCards and Apple's
HyperCard in the mid-1980s.
• From here on, we will refer to the World Wide
Web simply as "the Web." The unit of
information on the Web has been referred to
by several different names; among them, the
most common are pages, documents, and
resources.

1.2.2 WEB OR INTERNET?
• It is important to remember that the Internet and
the Web are not the same thing.
– The Internet is a collection of computers and other
devices connected by equipment that allows them to
communicate with each other.
– The Web is a collection of software and protocols that
have been installed on most, if not all, of the
computers on the Internet.
– The Internet was quite useful before the Web was
developed, and it is still useful without it.
– However, it is now the case that most users of the
Internet are Web users.

1.2.2 WEB OR INTERNET?
• In an abstract sense, the Web is merely a
vast collection of documents, some of
which are connected by links. These
documents are accessed by Web
browsers, introduced in Section 1.3, "Web
Browsers," and provided by Web servers,
introduced in Section 1.4, "Web Servers."

1.3 Web Browsers
• Documents provided by servers on the
Web are accessed through browsers,
which are programs.
• The first browsers were text-based:
they were not capable of
displaying any sort of graphic information,
nor did they have a graphical user
interface. This effectively constrained the
growth in the use of the Web.

1.3 Web Browsers
• In early 1993, this changed with the
release of Mosaic from the National
Center for Supercomputing Applications
(NCSA) at the University of Illinois.
• The first release of Mosaic ran on UNIX
systems using the X Window System.
• By late 1993, versions of Mosaic for
Apple Macintosh and Microsoft Windows
systems had been released.

1.3 Web Browsers
• Browsers are clients on the Web
because they initiate the conversation
with the server, which waits for a
message from a client before doing
anything.

1.3 Web Browsers
• Although the Web supports a variety of
protocols, the most common one is the
Hypertext Transfer Protocol (HTTP),
which directly supports Web documents.
HTTP provides a standard form of
communications between browsers and
Web servers.
• Section 1.7, "The Hypertext Transfer
Protocol," provides a more detailed
discussion of HTTP.

1.3 Web Browsers
• The most commonly used browsers are
– Microsoft Internet Explorer
– Netscape Navigator
– NCSA's Mosaic
– Opera Software's Opera

1.4 Web Servers
• Web servers are programs that provide
documents to browsers. Servers are
slave programs: They act only when
requests are made to them by browsers
running on other computers on the
Internet. In most cases, the action is to
find a document and send it to the
requesting browser.

1.4 Web Servers
• The documents provided by Web servers often are
complete and static.
• However, in some cases, the requested document
must be constructed when requested. Such
dynamically constructed documents are often built
by programs that are stored on the server.
• In some cases, the request from a browser is an
explicit request to run some program stored on the
server. Such programs typically generate documents
that are returned to the browser.
• Until recently, the active communications connection
between a browser and a Web server was
maintained only from the time a request was sent to
the server until the time the server completed the
transfer of the document to the browser.

1.5 Uniform Resource
Locators
• Uniform (or universal) resource locators
(URLs) are used to identify resources on
the Internet.
• There are many different kinds of
resources, which are identified by many
different kinds of URLs.

1.5.1 URL FORMATS
• All URLs have the same general format:
scheme:object-address
• Here, the scheme is often a communications protocol.
Common schemes include
– http
– ftp
– gopher
– telnet
– file
– mailto
– news
• Different schemes use object addresses that have
differing forms.

1.5.1 URL FORMATS
• Our main interest lies in the HTTP
protocol, which is used to request and
send Hypertext Markup Language
(HTML) documents. In the case of HTTP,
the form of the rest of the URL is:
//fully-qualified-domain-name/path-to-document

1.5.1 URL FORMATS
• Examples:
– http://ibm.com/websphere
– ftp://ftp.lotus.com
– mailto:[email protected]
– news://news.software.ibm.com
– http://localhost:8080
• General Format:
protocol://[user[:password]@]servername.domain[:port#]/[url-path]

1.5.1 URL FORMATS
• The host name is the name of the server
computer that stores the resource. The
host name can have a colon and a port
number attached. For the HTTP protocol,
the Web server listens on port 80 by default.
• Only when a server has been configured
to use some other port number is it
necessary to attach that port number to
the host name.
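
The pieces of a URL can be pulled apart with Python's standard `urllib.parse.urlsplit`. The sketch below is illustrative only: the helper `describe_url` and the `DEFAULT_PORTS` table are assumptions introduced here, and the table lists only the HTTP default of 80 mentioned above.

```python
from urllib.parse import urlsplit

# Assumed lookup table for this example; only HTTP's default port is listed.
DEFAULT_PORTS = {"http": 80}


def describe_url(url: str) -> dict:
    """Return the scheme, host, effective port, and path of a URL."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
    }


print(describe_url("http://ibm.com/websphere"))
print(describe_url("http://localhost:8080"))
```

For the first URL no port is attached, so the default applies; for the second, the explicit port 8080 is used.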

1.5.1 URL FORMATS
• URLs can never have embedded spaces.
• A collection of special characters also cannot
appear as ordinary data in a URL, including
– semicolons (;)
– colons (:)
– ampersands (&)
• To include a space or one of the disallowed
special characters in a URL, the character
must be coded as a percent sign (%) followed by
the two-digit hexadecimal ASCII code for the
character. For example, a space is coded as %20.
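
Percent coding can be tried out with `urllib.parse.quote` from the Python standard library. In this sketch the file name is made up for illustration, and `safe=""` is passed so that every reserved character is coded.

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing a space, a semicolon, and an ampersand.
raw_name = "store front;v1&2.html"

# safe="" asks quote() to percent-code every reserved character.
encoded = quote(raw_name, safe="")
print(encoded)           # store%20front%3Bv1%262.html
print(unquote(encoded))  # store front;v1&2.html
```

Each code is the character's ASCII value in hexadecimal: 20 for a space, 3B for a semicolon, and 26 for an ampersand.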

1.5.2 URL PATHS
• The path to the document for the HTTP protocol
is similar to a path to a file or directory in the file
system of an operating system: a sequence of
directory names and a file name, all separated
by whatever separator character the operating
system uses.
– http://www.gumboco.com/files/f99/storefront.html
– http://www.gumboco.com/storefront.html
– http://www.gumboco.com/departments/
– http://www.gumboco.com/

1.6 Multipurpose Internet Mail
Extensions
• A browser needs some way of
determining the forms of the documents
it receives. Without knowing the form of a
document received from the server, the
browser would be unable to display it.
The forms of these documents are
specified with the Multipurpose Internet
Mail Extensions (MIME), which was
originally developed so that information
in different forms could be sent by e-mail.

1.6.1 TYPE
SPECIFICATIONS
• MIME was developed to allow different
kinds of documents to be sent using
Internet mail. These could be various
kinds of text, video data, or sound data.
Because the Web has similar needs,
MIME was adopted as part of the HTTP
protocol as the way to specify document
types transmitted over the Web.
• MIME specifications have this form:
type/subtype

1.6.1 TYPE
SPECIFICATIONS
• The most common MIME types are text,
image, and video. The most common text
subtypes are plain and html. The most
common image subtypes are gif and jpeg.
The most common video subtypes are
mpeg and quicktime. A list of MIME
specifications is stored in the configuration
files of every Web server.
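
Python's standard `mimetypes` module keeps a similar table mapping file-name extensions to type/subtype pairs, which makes it a convenient way to see the specifications in action. The helper `document_form` and the file names are assumptions for this sketch; a real Web server consults its own configuration files.

```python
import mimetypes


def document_form(filename: str) -> tuple[str, str]:
    """Guess the MIME type and subtype of a file from its extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        raise ValueError(f"no MIME type known for {filename!r}")
    type_, subtype = mime_type.split("/", 1)
    return type_, subtype


# Hypothetical file names covering the common types named above.
for name in ("index.html", "notes.txt", "logo.gif", "clip.mpeg"):
    print(name, document_form(name))
```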

1.6.2 EXPERIMENTAL
DOCUMENT TYPES
• Many experimental subtypes are being
used. The name of an experimental
subtype begins with x-, as in
video/x-msvideo. Any user can add an
experimental subtype by having its name
added to the list of MIME specifications
stored in the user's Web server.

1.6.2 EXPERIMENTAL
DOCUMENT TYPES
• As you might expect, the user must
provide a program that the browser can
call when it needs to display the contents
of the database. These programs either
are external to the browser, in which case
they are called helper applications, or
are code modules that are inserted into
the browser, in which case they are called
plug-ins.

1.7 The Hypertext Transfer
Protocol
• When a browser is instructed by the user to
request a document from a server, the
following takes place from the point of view of
the browser: The browser opens a
communications connection with the indicated
Web server, sends the request, receives the
response document, and displays the
response document for the user. When the
Web server receives a request for a document,
it searches for the document. Assuming that
the document is found, the server returns the
document to the browser.

1.7 The Hypertext Transfer
Protocol
• All Web communications transactions use the
same protocol, the Hypertext Transfer Protocol
(HTTP). The current version of HTTP is 1.1. It
is formally defined as RFC 2616, which was
approved in June 1999.
• HTTP consists of two phases, the request and
the response. Each HTTP communication
(request or response) between a browser and
a Web server consists of two parts, a header
and a body.

1.7.1 THE REQUEST
PHASE
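• The general form of an HTTP request is as
follows:
HTTP method   Path part of the URL   HTTP version
Header fields
Blank line
Message body
• For example, the first line of a request for
http://www.gumboco.com/storefront.html is
GET /storefront.html HTTP/1.1
with the host name sent in a Host header field.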

1.7.1 THE REQUEST
PHASE
• Table 1.1 HTTP request methods
Method   Description
GET      Return the contents of the specified document
HEAD     Return the header information for the specified document
POST     Execute the specified document, using the enclosed data
PUT      Replace the specified document with the enclosed data
DELETE   Delete the specified document

1.7.1 THE REQUEST
PHASE
• GET is the most often used method.
• POST is probably the second most used
method.
• POST was originally designed for tasks
such as posting a news article to a
newsgroup. Its most common use is to
send form data to the server, along with a
request to execute a program on the
server that will process the form data.

1.7.1 THE REQUEST
PHASE
• Following the first line of an HTTP
communication is any number of header fields,
most of which are optional. The format of a
header field is the field name, followed by a
colon and the value of the field.
• The Accept field is often included in a request; it
specifies the browser's preference for the MIME
type of the requested document. More than one
Accept field can be specified.
Accept: text/plain
Accept: text/html
Accept: image/gif

1.7.1 THE REQUEST
PHASE
• A wildcard character, the asterisk (*), can
be used to specify that part of a MIME
type can be anything.
Accept: text/*
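
Wildcard matching of this kind takes only a few lines. The function `accepts` below is a hypothetical helper written for this example; it compares the type and subtype separately and treats an asterisk as matching anything. It ignores parameters such as quality values, which real Accept fields may carry.

```python
def accepts(pattern: str, mime_type: str) -> bool:
    """Report whether a MIME type matches an Accept pattern such as text/*."""
    pattern_type, pattern_subtype = pattern.split("/", 1)
    actual_type, actual_subtype = mime_type.split("/", 1)
    # An asterisk in either position matches any value in that position.
    return (pattern_type in ("*", actual_type)
            and pattern_subtype in ("*", actual_subtype))


print(accepts("text/*", "text/html"))  # True
print(accepts("text/*", "image/gif"))  # False
print(accepts("*/*", "image/gif"))     # True
```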

1.7.1 THE REQUEST
PHASE
• If the request method is POST, the
Content-length field must be included in
the request header. It specifies the
number of bytes in the body of the request
data.
• The header of a request must be followed
by a blank line, which is used to separate
the header from the data.
• Requests that use the GET, HEAD, and
DELETE methods do not have bodies.
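
To make the Content-length rule concrete, the sketch below codes some form data, counts its bytes, and places that count in the header ahead of the blank line. The helper `build_post`, the path `/cgi-bin/order.pl`, the host name, and the form fields are all made up for illustration.

```python
from urllib.parse import urlencode


def build_post(path: str, host: str, form: dict) -> bytes:
    """Build a POST request whose Content-Length matches its body."""
    body = urlencode(form).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # number of bytes in the body
        "\r\n"                              # blank line ends the header
    )
    return head.encode("ascii") + body


# Hypothetical form submission.
post = build_post("/cgi-bin/order.pl", "www.example.com",
                  {"name": "Ada", "qty": "2"})
print(post.decode("ascii"))
```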

1.7.2 THE RESPONSE
PHASE
• The general form of an HTTP response is
as follows:
Status line
Response header fields
Blank line
Response body

1.7.2 THE RESPONSE
PHASE
• The status line includes the HTTP version
used, a three-digit status code for the
response, and a short textual explanation
of the status code. For example, most
responses begin with this:
HTTP/1.1 200 OK
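
Splitting such a line into its three parts is straightforward. In this sketch, `parse_status_line` is a hypothetical helper; it splits on the first two spaces only, because the textual explanation may itself contain spaces.

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into HTTP version, status code, and explanation."""
    # maxsplit=2 keeps a multi-word explanation together in the last part.
    version, code, explanation = line.strip().split(" ", 2)
    return version, int(code), explanation


print(parse_status_line("HTTP/1.1 200 OK"))
# ('HTTP/1.1', 200, 'OK')
print(parse_status_line("HTTP/1.1 200 Document follows"))
# ('HTTP/1.1', 200, 'Document follows')
```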

1.7.2 THE RESPONSE
PHASE
• The status codes begin with 1, 2, 3, 4, or 5.
The general meanings of the five
categories specified by these first digits
are shown in Table 1.2.
First Digit   Category
1             Informational
2             Success
3             Redirection
4             Client error
5             Server error
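
Because the category depends only on the first digit, integer division by 100 is enough to look it up. The names `CATEGORIES` and `status_category` are assumptions made for this sketch.

```python
# First digit of the status code mapped to its category, as in Table 1.2.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_category(code: int) -> str:
    """Return the category of a three-digit HTTP status code."""
    return CATEGORIES[code // 100]


print(status_category(200))  # Success
print(status_category(404))  # Client error
```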

1.7.2 THE RESPONSE
PHASE
• The only essential field of the header is
Content-type, which tells the browser how
to display the response data. Other fields
that may be included in the header are
Date, Server, Content-length, and Last-modified.
• The response header must be followed by
a blank line, as is the case for request
headers. The response data follows the
blank line.
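
A browser-side view of this layout can be sketched by cutting a raw response at the first blank line and reading the header fields that precede it. The function `parse_response` and the sample response are assumptions built for this example; field names are lowercased here because HTTP treats them as case-insensitive.

```python
def parse_response(raw: bytes) -> tuple[str, dict, bytes]:
    """Split a raw response into status line, header fields, and body."""
    # The first blank line (CRLF CRLF) separates the header from the data.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *field_lines = head.decode("iso-8859-1").split("\r\n")
    fields = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return status_line, fields, body


# Hypothetical response assembled for the demonstration.
document = b"<HTML><BODY><H1>Very simple HTML document</H1></BODY></HTML>"
sample = b"\r\n".join([
    b"HTTP/1.1 200 OK",
    b"Content-Type: text/html",
    b"Content-Length: %d" % len(document),
    b"",
    document,
])

status, fields, body = parse_response(sample)
print(status)
print(fields["content-type"])
print(int(fields["content-length"]) == len(body))
```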

1.7.2 THE RESPONSE
PHASE
• In HTTP versions prior to 1.1, when a server
completed sending a response to the client, the
communications connection was closed.
However, the default operation of HTTP 1.1 is
that the connection is kept open for a time so
that a client can make several requests over a
short period of time without needing to
reestablish the communications connection with
the server. This leads to significant increases in
the efficiency of the Web.

Sample
• HTTP Request (1 of 4)
• http://www.ibm.com/simpleDoc.html
• What this says:
– Using the HTTP protocol
– Communicating with server "www.ibm.com"
– Get the document entitled "simpleDoc.html"

Sample
• HTTP Request (2 of 4)
• http://www.ibm.com/simpleDoc.html
• What this DOES from the browser
Generates a request and sends it to the Web
server:
– GET /simpleDoc.html HTTP/1.0
– Connection: Keep-Alive
– User-Agent: Mozilla/4.04 [en] (WinNT; U)
– Host: www.ibm.com:80
– Accept: image/gif, image/x-bitmap, image/png, */*
– Accept-Language: en
– Accept-Charset: iso-8859-1, *, utf-8

Sample
• HTTP Request (3 of 4)
• http://www.ibm.com/simpleDoc.html
• What the Web server DOES
– Looks to see if the file simpleDoc.html exists; if so, returns the following
to the browser:
• HTTP/1.1 200 Document follows
• Server: Domino-Go-Webserver/4.6
• Date: Thu, 16 May 2002 21:58:53 GMT
• Accept-Ranges: bytes
• Content-Type: text/html
• Content-Length: 68
• Last-Modified: Wed, 11 December 2001 22:09:51 GMT
• <HTML>
• <BODY>
• <H1>Very simple HTML document</H1>
• </BODY>
• </HTML>

Sample
• HTTP Request (4 of 4)
• Here is what you would see in the browser:

1.8 The Web Programmer’s
Toolbox
1.8.1 OVERVIEW OF HTML
• At the outset, it is important to realize that
HTML is not a programming language. It
cannot be used to describe computations. Its
purpose is to describe the general form and
layout of documents to be displayed by a
browser.
• The word markup comes from the publishing
world.
• TeX and LaTeX are older markup languages
for use with digital text.

1.8.1 OVERVIEW OF HTML
• An HTML document is a mixture of
content and controls. The controls are
specified by the tags of HTML.
• Most HTML tags consist of pairs of
syntactic markers that are used to
specify particular kinds of content.
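
The split between content and controls can be seen by running a small document through Python's standard `html.parser`. The class name `TagAndTextCollector` is an assumption for this sketch, and the sample document is a shortened form of the one used in the HTTP sample.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Record opening tags (controls) separately from text (content)."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            self.text.append(data.strip())


collector = TagAndTextCollector()
collector.feed("<html><body><h1>Very simple HTML document</h1></body></html>")
collector.close()
print(collector.tags)  # ['html', 'body', 'h1']
print(collector.text)  # ['Very simple HTML document']
```

Each opening tag here has a matching closing tag, which is the paired-marker pattern described above.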

1.8.2 TOOLS FOR CREATING
HTML DOCUMENTS
• WYSIWYG HTML editor
– Macromedia Dreamweaver
– Adobe PageMill
– Microsoft’s FrontPage

1.8.3 OVERVIEW OF PERL
• One of the approaches to adding a
computational ability to Web documents is to
store and execute a program that performs the
needed computation on the server. This is often
done using the Common Gateway Interface
(CGI), which is discussed in detail in Chapter 5,
"Using Perl for CGI Programming." Briefly, CGI
is a standard way in which a browser and a
server communicate to run a program on the
server and return the output of that program to
the browser.

• Common Gateway Interface (CGI) (1 of 2)
• One of the first technologies that companies used to
develop interactive Web applications.
• An HTML page gathers information from the user.
• Browsers use the "POST" HTTP request to send
data to the server.
• The server, in response to the HTTP "POST"
request, will run the CGI script to perform some
function.
• When the CGI script is done performing the function,
it will return some information to the user by
generating an HTML page.
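
The steps above can be sketched as a tiny CGI-style handler. The book uses Perl for CGI; this sketch uses Python only to keep the examples in one language. It assumes the usual CGI convention that the server passes the body length in the `CONTENT_LENGTH` environment variable and the form data on standard input. The function `run_cgi` and the form field `name` are hypothetical; a real script would read `os.environ` and `sys.stdin.buffer` and write its result to standard output.

```python
import io
from html import escape
from urllib.parse import parse_qs


def run_cgi(environ: dict, stdin: io.BufferedIOBase) -> str:
    """Read POSTed form data and return a header plus a generated HTML page."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(stdin.read(length).decode("ascii"))
    # Hypothetical form field; fall back to a default when it is missing.
    name = form.get("name", ["stranger"])[0]
    page = f"<html><body><h1>Hello, {escape(name)}</h1></body></html>"
    # A blank line separates the header from the generated document.
    return "Content-type: text/html\r\n\r\n" + page


# Simulate the server invoking the script for a POST request.
fake_environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "8"}
output = run_cgi(fake_environ, io.BytesIO(b"name=Ada"))
print(output)
```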

• CGI (2 of 2)
• The server can return a dynamically-built HTML
page (based on the function it is performing), or
it can redirect the browser to a different page.
• CGI scripts can be written in several languages.
– Perl and C++ are the most common
• Using CGI scripts allows a developer to perform
any task on a server, including but not limited to:
– Accessing operating system resources and functions
– Accessing/updating a database
– Performing calculations on supplied information
• The CGI script runs on the server machine,
launched by the Web server.

1.8.4 OVERVIEW OF
JAVASCRIPT
• JavaScript provides an alternative to the
use of CGI and Perl to include a
computational capability in Web
documents. It is less general than Perl, but
because it is focused on Web documents,
it is a powerful tool for Web programming.

1.8.4 OVERVIEW OF
JAVASCRIPT
• JavaScript "programs" are stored on the
server but are usually embedded in HTML
documents. These HTML documents are
downloaded when they are requested by
browsers. The JavaScript code in an
HTML document is interpreted by the
browser on the client.
• Perl is a server-side programming tool for
Web documents, whereas JavaScript is a
client-side programming tool for Web documents.

1.8.4 OVERVIEW OF
JAVASCRIPT
• The most important aspect of JavaScript is
its use in creating, accessing, and
modifying a document. JavaScript defines
an object hierarchy that matches a
hierarchical model of an HTML document.
Elements of an HTML document are
accessed through these objects, providing
the basis for dynamic documents.

1.8.5 OVERVIEW OF JAVA
• Java was designed and is still controlled
by Sun Microsystems. It was originally
developed to program household
appliances. As always, timing is
exquisitely important: Java was still in
development when Web usage began to
explode.

1.8.5 OVERVIEW OF JAVA
• What Is Java?
• Java is an object-oriented programming
language.
• Java is a platform that includes:
– Java Virtual Machine (JVM)
– Application Programming Interface (API)
• Common types of Java programs include:
– Applets
– Applications
– Servlets

1.8.5 OVERVIEW OF JAVA
• Write Once and Run Anywhere

1.8.5 OVERVIEW OF JAVA
• Java Is General Purpose

1.8.6 PLUG-INS AND
FILTERS
• Two different kinds of converters can be
used to create HTML documents. First are
plug-ins, which are programs that can be
integrated with a word processor. Plug-ins
add new capabilities to the word processor,
such as toolbar buttons and menu
elements that provide convenient ways to
insert HTML into the document being
created or edited. After such insertions,
the document is displayed using the HTML.

1.8.6 PLUG-INS AND
FILTERS
• A second kind of converter is a filter, which
converts an existing document in some form,
such as LaTeX or Microsoft Word, to HTML.
Filters are never part of the editor or word
processor that created the document. This is an
advantage because they can be platform-independent.
For example, a WordPerfect user
working on a Macintosh computer can provide
documents that can be later converted to HTML
using a filter running on a UNIX platform. The
disadvantage

1.8.6 PLUG-INS AND
FILTERS
• Neither plug-ins nor filters produce HTML
documents that, when displayed by browsers,
have an appearance identical to that produced
by the word processor.
• The advantages are that
– existing documents produced with word processors
can be easily converted to HTML,
– users can produce HTML documents using a word
processor with which they are familiar.
• This inevitably leads to version problems during
maintenance of the document. This is clearly a
disadvantage of using converters.

1.9 Summary
• Internet
• IP Address and Domain Names
• WWW
• Web browser
• HTTP and MIME
• HTML, CGI and JavaScript
• Java
• Plug-ins and filters

1.10 Review Question

1.11 Exercises
• Using Browser
• Initializing Web Server
