wget

GNU Wget 1.13.4 Manual

Table of Contents

  • Wget 1.13.4
  • 1 Overview
  • 2 Invoking
    • 2.1 URL Format
    • 2.2 Option Syntax
    • 2.3 Basic Startup Options
    • 2.4 Logging and Input File Options
    • 2.5 Download Options
    • 2.6 Directory Options
    • 2.7 HTTP Options
    • 2.8 HTTPS (SSL/TLS) Options
    • 2.9 FTP Options
    • 2.10 Recursive Retrieval Options
    • 2.11 Recursive Accept/Reject Options
    • 2.12 Exit Status
  • 3 Recursive Download
  • 4 Following Links
    • 4.1 Spanning Hosts
    • 4.2 Types of Files
    • 4.3 Directory-Based Limits
    • 4.4 Relative Links
    • 4.5 Following FTP Links
  • 5 Time-Stamping
    • 5.1 Time-Stamping Usage
    • 5.2 HTTP Time-Stamping Internals
    • 5.3 FTP Time-Stamping Internals
  • 6 Startup File
    • 6.1 Wgetrc Location
    • 6.2 Wgetrc Syntax
    • 6.3 Wgetrc Commands
    • 6.4 Sample Wgetrc
  • 7 Examples
    • 7.1 Simple Usage
    • 7.2 Advanced Usage
    • 7.3 Very Advanced Usage
  • 8 Various
    • 8.1 Proxies
    • 8.2 Distribution
    • 8.3 Web Site
    • 8.4 Mailing Lists
      • Primary List
      • Bug Notices List
      • Obsolete Lists
    • 8.5 Internet Relay Chat
    • 8.6 Reporting Bugs
    • 8.7 Portability
    • 8.8 Signals
  • 9 Appendices
    • 9.1 Robot Exclusion
    • 9.2 Security Considerations
    • 9.3 Contributors
  • Appendix A Copying this manual
    • A.1 GNU Free Documentation License
  • Concept Index



Wget 1.13.4

This file documents the GNU Wget utility for downloading network data.

Copyright © 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.



1 Overview

GNU Wget is a free utility for non-interactive download of files from the Web. It supports http, https, and ftp protocols, as well as retrieval through http proxies.

This chapter is a partial overview of Wget's features.

  • Wget is non-interactive, meaning that it can work in the background, while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget finish the work. By contrast, most web browsers require the user's constant presence, which can be a great hindrance when transferring a lot of data.
  • Wget can follow links in html, xhtml, and css pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as “recursive downloading.” While doing that, Wget respects the Robot Exclusion Standard (/robots.txt). Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing.
  • File name wildcard matching and recursive mirroring of directories are available when retrieving via ftp. Wget can read the time-stamp information given by both http and ftp servers, and store it locally. Thus Wget can see if the remote file has changed since the last retrieval, and automatically retrieve the new version if it has. This makes Wget suitable for mirroring of ftp sites, as well as home pages.
  • Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.
  • Wget supports proxy servers, which can lighten the network load, speed up retrieval and provide access behind firewalls. Wget uses passive ftp downloading by default, active ftp being an option.
  • Wget supports IP version 6, the next generation of IP. IPv6 is autodetected at compile-time, and can be disabled at either build or run time. Binaries built with IPv6 support work well in both IPv4-only and dual family environments.
  • Built-in features offer mechanisms to tune which links you wish to follow (see Following Links).
  • The progress of individual downloads is traced using a progress gauge. Interactive downloads are tracked using a “thermometer”-style gauge, whereas non-interactive ones are traced with dots, each dot representing a fixed amount of data received (1KB by default). Either gauge can be customized to your preferences.
  • Most of the features are fully configurable, either through command line options, or via the initialization file .wgetrc (see Startup File). Wget allows you to define global startup files (/usr/local/etc/wgetrc by default) for site settings. You can also specify the location of a startup file with the --config option.
  • Finally, GNU Wget is free software. This means that everyone may use it, redistribute it and/or modify it under the terms of the GNU General Public License, as published by the Free Software Foundation (see the file COPYING that came with GNU Wget, for details).



2 Invoking

By default, Wget is very simple to invoke. The basic syntax is:

     
     wget [option]... [URL]...
     

Wget will simply download all the urls specified on the command line. URL is a Uniform Resource Locator, as defined below.

However, you may wish to change some of the default parameters of Wget. You can do it two ways: permanently, adding the appropriate command to .wgetrc (see Startup File), or specifying it on the command line.



2.1 URL Format

URL is an acronym for Uniform Resource Locator. A uniform resource locator is a compact string representation for a resource available via the Internet. Wget recognizes the url syntax as per rfc1738. This is the most widely used form (square brackets denote optional parts):

     http://host[:port]/directory/file
     ftp://host[:port]/directory/file

You can also encode your username and password within a url:

     ftp://user:password@host/path
     http://user:password@host/path

Either user or password, or both, may be left out. If you leave out either the http username or password, no authentication will be sent. If you leave out the ftp username, ‘anonymous’ will be used. If you leave out the ftp password, your email address will be supplied as a default password.

Important Note: if you specify a password-containing url on the command line, the username and password will be plainly visible to all users on the system, by way of ps. On multi-user systems, this is a big security risk. To work around it, use wget -i - and feed the urls to Wget's standard input, each on a separate line, terminated by C-d.
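
For example, instead of typing the urls interactively, you can keep them in a file readable only by you and redirect it to Wget's standard input (the file name below is a placeholder):

     wget -i - < secret-urls.txt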

You can encode unsafe characters in a url as ‘%xy’, xy being the hexadecimal representation of the character's ascii value. Some common unsafe characters include ‘%’ (quoted as ‘%25’), ‘:’ (quoted as ‘%3A’), and ‘@’ (quoted as ‘%40’). Refer to rfc1738 for a comprehensive list of unsafe characters.

Wget also supports the type feature for ftp urls. By default, ftp documents are retrieved in the binary mode (type ‘i’), which means that they are downloaded unchanged. Another useful mode is the ‘a’ (ASCII) mode, which converts the line delimiters between the different operating systems, and is thus useful for text files. Here is an example:

     ftp://host/directory/file;type=a

Two alternative variants of url specification are also supported, because of historical (hysterical?) reasons and their widespread use.

ftp-only syntax (supported by NcFTP):

     host:/dir/file

http-only syntax (introduced by Netscape):

     host[:port]/dir/file

These two alternative forms are deprecated, and may cease being supported in the future.

If you do not understand the difference between these notations, or do not know which one to use, just use the plain ordinary format you use with your favorite browser, like Lynx or Netscape.



2.2 Option Syntax

Since Wget uses GNU getopt to process command-line arguments, every option has a long form along with the short one. Long options are more convenient to remember, but take time to type. You may freely mix different option styles, or specify options after the command-line arguments. Thus you may write:

     wget -r --tries=10 http://fly.srk.fer.hr/ -o log

The space between the option accepting an argument and the argument may be omitted. Instead of ‘-o log’ you can write ‘-olog’.

You may put several options that do not require arguments together,like:

     wget -drc URL

This is completely equivalent to:

     wget -d -r -c URL

Since the options can be specified after the arguments, you may terminate them with ‘--’. So the following will try to download the url ‘-x’, reporting failure to log:

     wget -o log -- -x

The options that accept comma-separated lists all respect the convention that specifying an empty list clears its value. This can be useful to clear the .wgetrc settings. For instance, if your .wgetrc sets exclude_directories to /cgi-bin, the following example will first reset it, and then set it to exclude /~nobody and /~somebody. You can also clear the lists in .wgetrc (see Wgetrc Syntax).

     wget -X '' -X /~nobody,/~somebody

Most options that do not accept arguments are boolean options, so named because their state can be captured with a yes-or-no (“boolean”) variable. For example, ‘--follow-ftp’ tells Wget to follow FTP links from HTML files and, on the other hand, ‘--no-glob’ tells it not to perform file globbing on FTP URLs. A boolean option is either affirmative or negative (beginning with ‘--no’). All such options share several properties.

Unless stated otherwise, it is assumed that the default behavior is the opposite of what the option accomplishes. For example, the documented existence of ‘--follow-ftp’ assumes that the default is to not follow FTP links from HTML pages.

Affirmative options can be negated by prepending the ‘--no-’ to the option name; negative options can be negated by omitting the ‘--no-’ prefix. This might seem superfluous—if the default for an affirmative option is to not do something, then why provide a way to explicitly turn it off? But the startup file may in fact change the default. For instance, using follow_ftp = on in .wgetrc makes Wget follow FTP links by default, and using ‘--no-follow-ftp’ is the only way to restore the factory default from the command line.
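
For instance, with follow_ftp = on in .wgetrc, the factory default can be restored for a single run like this (the url below is a placeholder):

     wget --no-follow-ftp -r http://example.com/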



2.3 Basic Startup Options

-V
--version
Display the version of Wget.
-h
--help
Print a help message describing all of Wget's command-line options.
-b
--background
Go to background immediately after startup. If no output file is specified via the ‘-o’, output is redirected to wget-log.


‘-e command’
‘--execute command’
Execute command as if it were a part of .wgetrc (see Startup File). A command thus invoked will be executed after the commands in .wgetrc, thus taking precedence over them. If you need to specify more than one wgetrc command, use multiple instances of ‘-e’.
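
For instance (robots and tries are ordinary wgetrc commands, see Wgetrc Commands; the url is a placeholder):

          wget -e robots=off -e tries=5 http://example.com/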



2.4 Logging and Input File Options

‘-o logfile’
‘--output-file=logfile’
Log all messages to logfile. The messages are normally reported to standard error.


‘-a logfile’
‘--append-output=logfile’
Append to logfile. This is the same as ‘-o’, only it appends to logfile instead of overwriting the old log file. If logfile does not exist, a new file is created.


-d
--debug
Turn on debug output, meaning various information important to the developers of Wget if it does not work properly. Your system administrator may have chosen to compile Wget without debug support, in which case ‘-d’ will not work. Please note that compiling with debug support is always safe—Wget compiled with the debug support will not print any debug info unless requested with ‘-d’. See Reporting Bugs, for more information on how to use ‘-d’ for sending bug reports.


-q
--quiet
Turn off Wget's output.


-v
--verbose
Turn on verbose output, with all the available data. The default output is verbose.
-nv
--no-verbose
Turn off verbose without being completely quiet (use ‘-q’ for that), which means that error messages and basic information still get printed.


‘-i file’
‘--input-file=file’
Read urls from a local or external file. If ‘-’ is specified as file, urls are read from the standard input. (Use ‘./-’ to read from a file literally named ‘-’.)

If this function is used, no urls need be present on the command line. If there are urls both on the command line and in an input file, those on the command line will be the first ones to be retrieved. If ‘--force-html’ is not specified, then file should consist of a series of URLs, one per line.
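
For example, assuming a plain-text file of urls, one per line (the file name is arbitrary):

          wget -i urls.txt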

However, if you specify ‘--force-html’, the document will be regarded as ‘html’. In that case you may have problems with relative links, which you can solve either by adding ‘<base href="url">’ to the documents or by specifying ‘--base=url’ on the command line.

If the file is an external one, the document will be automatically treated as ‘html’ if the Content-Type matches ‘text/html’. Furthermore, the file's location will be implicitly used as base href if none was specified.


-F
--force-html
When input is read from a file, force it to be treated as an html file. This enables you to retrieve relative links from existing html files on your local disk, by adding ‘<base href="url">’ to html, or using the ‘--base’ command-line option.


‘-B URL’
‘--base=URL’
Resolves relative links using URL as the point of reference, when reading links from an HTML file specified via the ‘-i’/‘--input-file’ option (together with ‘--force-html’, or when the input file was fetched remotely from a server describing it as html). This is equivalent to the presence of a BASE tag in the html input file, with URL as the value for the href attribute.

For instance, if you specify ‘http://foo/bar/a.html’ for URL, and Wget reads ‘../baz/b.html’ from the input file, it would be resolved to ‘http://foo/baz/b.html’.


‘--config=FILE’
Specify the location of a startup file you wish to use.



2.5 Download Options

‘--bind-address=ADDRESS’
When making client TCP/IP connections, bind to ADDRESS on the local machine. ADDRESS may be specified as a hostname or IP address. This option can be useful if your machine is bound to multiple IPs.


‘-t number’
‘--tries=number’
Set number of retries to number. Specify 0 or ‘inf’ for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like “connection refused” or “not found” (404), which are not retried.
‘-O file’
‘--output-document=file’
The documents will not be written to the appropriate files, but all will be concatenated together and written to file. If ‘-’ is used as file, documents will be printed to standard output, disabling link conversion. (Use ‘./-’ to print to a file literally named ‘-’.)

Use of ‘-O’ is not intended to mean simply “use the name file instead of the one in the URL;” rather, it is analogous to shell redirection: ‘wget -O file http://foo’ is intended to work like ‘wget -O - http://foo > file’; file will be truncated immediately, and all downloaded content will be written there.

For this reason, ‘-N’ (for timestamp-checking) is not supported in combination with ‘-O’: since file is always newly created, it will always have a very new timestamp. A warning will be issued if this combination is used.

Similarly, using ‘-r’ or ‘-p’ with ‘-O’ may not work as you expect: Wget won't just download the first file to file and then download the rest to their normal names: all downloaded content will be placed in file. This was disabled in version 1.11, but has been reinstated (with a warning) in 1.11.2, as there are some cases where this behavior can actually have some use.

Note that a combination with ‘-k’ is only permitted when downloading a single document, as in that case it will just convert all relative URIs to external ones; ‘-k’ makes no sense for multiple URIs when they're all being downloaded to a single file; ‘-k’ can be used only when the output is a regular file.


-nc
--no-clobber
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including ‘-nc’. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.

When running Wget without ‘-N’, ‘-nc’, ‘-r’, or ‘-p’, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named ‘file.1’. If that file is downloaded yet again, the third copy will be named ‘file.2’, and so on. (This is also the behavior with ‘-nd’, even if ‘-r’ or ‘-p’ are in effect.) When ‘-nc’ is specified, this behavior is suppressed, and Wget will refuse to download newer copies of ‘file’. Therefore, “no-clobber” is actually a misnomer in this mode—it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.

When running Wget with ‘-r’ or ‘-p’, but without ‘-N’, ‘-nd’, or ‘-nc’, re-downloading a file will result in the new copy simply overwriting the old. Adding ‘-nc’ will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

When running Wget with ‘-N’, with or without ‘-r’ or ‘-p’, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file (see Time-Stamping). ‘-nc’ may not be specified at the same time as ‘-N’.

Note that when ‘-nc’ is specified, files with the suffixes ‘.html’ or ‘.htm’ will be loaded from the local disk and parsed as if they had been retrieved from the Web.


-c
--continue
Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program. For instance:
          wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

If there is a file named ls-lR.Z in the current directory, Wget will assume that it is the first portion of the remote file, and will ask the server to continue the retrieval from an offset equal to the length of the local file.

Note that you don't need to specify this option if you just want the current invocation of Wget to retry downloading a file should the connection be lost midway through. This is the default behavior. ‘-c’ only affects resumption of downloads started prior to this invocation of Wget, and whose local files are still sitting around.

Without ‘-c’, the previous example would just download the remote file to ls-lR.Z.1, leaving the truncated ls-lR.Z file alone.

Beginning with Wget 1.7, if you use ‘-c’ on a non-empty file, and it turns out that the server does not support continued downloading, Wget will refuse to start the download from scratch, which would effectively ruin existing contents. If you really want the download to start from scratch, remove the file.

Also beginning with Wget 1.7, if you use ‘-c’ on a file which is of equal size as the one on the server, Wget will refuse to download the file and print an explanatory message. The same happens when the file is smaller on the server than locally (presumably because it was changed on the server since your last download attempt)—because “continuing” is not meaningful, no download occurs.

On the other side of the coin, while using ‘-c’, any file that's bigger on the server than locally will be considered an incomplete download and only (length(remote) - length(local)) bytes will be downloaded and tacked onto the end of the local file. This behavior can be desirable in certain cases—for instance, you can use ‘wget -c’ to download just the new portion that's been appended to a data collection or log file.
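
For example, assuming a remote log file that only ever grows by appending, and a server that supports the Range header (the url is a placeholder):

          wget -c http://example.com/logs/access.log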

However, if the file is bigger on the server because it's been changed, as opposed to just appended to, you'll end up with a garbled file. Wget has no way of verifying that the local file is really a valid prefix of the remote file. You need to be especially careful of this when using ‘-c’ in conjunction with ‘-r’, since every file will be considered as an “incomplete download” candidate.

Another instance where you'll get a garbled file if you try to use ‘-c’ is if you have a lame http proxy that inserts a “transfer interrupted” string into the local file. In the future a “rollback” option may be added to deal with this case.

Note that ‘-c’ only works with ftp servers and with http servers that support the Range header.


‘--progress=type’
Select the type of the progress indicator you wish to use. Legal indicators are “dot” and “bar”.

The “bar” indicator is used by default. It draws an ascii progress bar graphic (a.k.a. “thermometer” display) indicating the status of retrieval. If the output is not a TTY, the “dot” bar will be used by default.

Use ‘--progress=dot’ to switch to the “dot” display. It traces the retrieval by printing dots on the screen, each dot representing a fixed amount of downloaded data.

When using the dotted retrieval, you may also set the style by specifying the type as ‘dot:style’. Different styles assign different meaning to one dot. With the default style each dot represents 1K, there are ten dots in a cluster and 50 dots in a line. The binary style has a more “computer”-like orientation—8K dots, 16-dot clusters and 48 dots per line (which makes for 384K lines). The mega style is suitable for downloading very large files—each dot represents 64K retrieved, there are eight dots in a cluster, and 48 dots on each line (so each line contains 3M).

Note that you can set the default style using the progress command in .wgetrc. That setting may be overridden from the command line. The exception is that, when the output is not a TTY, the “dot” progress will be favored over “bar”. To force the bar output, use ‘--progress=bar:force’.
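
For example, to use the mega dot style for a very large download (the url is a placeholder):

          wget --progress=dot:mega http://example.com/big.iso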

-N
--timestamping
Turn on time-stamping. See Time-Stamping, for details.
--no-use-server-timestamps
Don't set the local file's timestamp by the one on the server.

By default, when a file is downloaded, its timestamps are set to match those from the remote file. This allows the use of ‘--timestamping’ on subsequent invocations of wget. However, it is sometimes useful to base the local file's timestamp on when it was actually downloaded; for that purpose, the ‘--no-use-server-timestamps’ option has been provided.


-S
--server-response
Print the headers sent by http servers and responses sent by ftp servers.


--spider
When invoked with this option, Wget will behave as a Web spider, which means that it will not download the pages, just check that they are there. For example, you can use Wget to check your bookmarks:
          wget --spider --force-html -i bookmarks.html

This feature needs much more work for Wget to get close to the functionality of real web spiders.


‘-T seconds’
‘--timeout=seconds’
Set the network timeout to seconds seconds. This is equivalent to specifying ‘--dns-timeout’, ‘--connect-timeout’, and ‘--read-timeout’, all at the same time.

When interacting with the network, Wget can check for timeout and abort the operation if it takes too long. This prevents anomalies like hanging reads and infinite connects. The only timeout enabled by default is a 900-second read timeout. Setting a timeout to 0 disables it altogether. Unless you know what you are doing, it is best not to change the default timeout settings.

All timeout-related options accept decimal values, as well as subsecond values. For example, ‘0.1’ seconds is a legal (though unwise) choice of timeout. Subsecond timeouts are useful for checking server response times or for testing network latency.
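
For example, to cap DNS lookup, connect, and read at five seconds each (the url is a placeholder):

          wget -T 5 http://example.com/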


‘--dns-timeout=seconds’
Set the DNS lookup timeout to seconds seconds. DNS lookups that don't complete within the specified time will fail. By default, there is no timeout on DNS lookups, other than that implemented by system libraries.


‘--connect-timeout=seconds’
Set the connect timeout to seconds seconds. TCP connections that take longer to establish will be aborted. By default, there is no connect timeout, other than that implemented by system libraries.


‘--read-timeout=seconds’
Set the read (and write) timeout to seconds seconds. The “time” of this timeout refers to idle time: if, at any point in the download, no data is received for more than the specified number of seconds, reading fails and the download is restarted. This option does not directly affect the duration of the entire download.

Of course, the remote server may choose to terminate the connection sooner than this option requires. The default read timeout is 900 seconds.


‘--limit-rate=amount’
Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the ‘k’ suffix, or megabytes with the ‘m’ suffix. For example, ‘--limit-rate=20k’ will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don't want Wget to consume the entire available bandwidth.

This option allows the use of decimal numbers, usually in conjunction with power suffixes; for example, ‘--limit-rate=2.5k’ is a legal value.

Note that Wget implements the limiting by sleeping the appropriate amount of time after a network read that took less time than specified by the rate. Eventually this strategy causes the TCP transfer to slow down to approximately the specified rate. However, it may take some time for this balance to be achieved, so don't be surprised if limiting the rate doesn't work well with very small files.
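
For example, to keep a long recursive job at roughly 500KB/s (the url is a placeholder):

          wget --limit-rate=500k -r http://example.com/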


‘-w seconds’
‘--wait=seconds’
Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the ‘m’ suffix, in hours using the ‘h’ suffix, or in days using the ‘d’ suffix.

Specifying a large value for this option is useful if the network or the destination host is down, so that Wget can wait long enough to reasonably expect the network error to be fixed before the retry. The waiting interval specified by this function is influenced by --random-wait, which see.
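
For example, to pause two minutes between retrievals while reading urls from a file (the file name is arbitrary):

          wget -w 2m -i urls.txt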


‘--waitretry=seconds’
If you don't want Wget to wait between every retrieval, but only between retries of failed downloads, you can use this option. Wget will use linear backoff, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify.

By default, Wget will assume a value of 10 seconds.


--random-wait
Some web sites may perform log analysis to identify retrieval programs such as Wget by looking for statistically significant similarities in the time between requests. This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait was specified using the ‘--wait’ option, in order to mask Wget's presence from such analysis.

A 2001 article in a publication devoted to development on a popular consumer platform provided code to perform this analysis on the fly. Its author suggested blocking at the class C address level to ensure automated retrieval programs were blocked despite changing DHCP-supplied addresses.

The ‘--random-wait’ option was inspired by this ill-advised recommendation to block many unrelated users from a web site due to the actions of one.

--no-proxy
Don't use proxies, even if the appropriate *_proxy environment variable is defined.

For more information about the use of proxies with Wget, see Proxies.


‘-Q quota’
‘--quota=quota’
Specify download quota for automatic retrievals. The value can be specified in bytes (default), kilobytes (with ‘k’ suffix), or megabytes (with ‘m’ suffix).

Note that quota will never affect downloading a single file. So if you specify ‘wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz’, all of the ls-lR.gz will be downloaded. The same goes even when several urls are specified on the command-line. However, quota is respected when retrieving either recursively, or from an input file. Thus you may safely type ‘wget -Q2m -i sites’—download will be aborted when the quota is exceeded.

Setting quota to 0 or to ‘inf’ unlimits the download quota.


--no-dns-cache
Turn off caching of DNS lookups. Normally, Wget remembers the IP addresses it looked up from DNS so it doesn't have to repeatedly contact the DNS server for the same (typically small) set of hosts it retrieves from. This cache exists in memory only; a new Wget run will contact DNS again.

However, it has been reported that in some situations it is not desirable to cache host names, even for the duration of a short-running application like Wget. With this option Wget issues a new DNS lookup (more precisely, a new call to gethostbyname or getaddrinfo) each time it makes a new connection. Please note that this option will not affect caching that might be performed by the resolving library or by an external caching layer, such as NSCD.

If you don't understand exactly what this option does, you probably won't need it.


‘--restrict-file-names=modes’
Change which characters found in remote URLs must be escaped during generation of local filenames. Characters that are restricted by this option are escaped, i.e. replaced with ‘%HH’, where ‘HH’ is the hexadecimal number that corresponds to the restricted character. This option may also be used to force all alphabetical cases to be either lower- or uppercase.

By default, Wget escapes the characters that are not valid or safe as part of file names on your operating system, as well as control characters that are typically unprintable. This option is useful for changing these defaults, perhaps because you are downloading to a non-native partition, or because you want to disable escaping of the control characters, or you want to further restrict characters to only those in the ascii range of values.

The modes are a comma-separated set of text values. The acceptable values are ‘unix’, ‘windows’, ‘nocontrol’, ‘ascii’, ‘lowercase’, and ‘uppercase’. The values ‘unix’ and ‘windows’ are mutually exclusive (one will override the other), as are ‘lowercase’ and ‘uppercase’. Those last are special cases, as they do not change the set of characters that would be escaped, but rather force local file paths to be converted either to lower- or uppercase.

When “unix” is specified, Wget escapes the character ‘/’ and the control characters in the ranges 0–31 and 128–159. This is the default on Unix-like operating systems.

When “windows” is given, Wget escapes the characters ‘\’, ‘|’, ‘/’, ‘:’, ‘?’, ‘"’, ‘*’, ‘<’, ‘>’, and the control characters in the ranges 0–31 and 128–159. In addition to this, Wget in Windows mode uses ‘+’ instead of ‘:’ to separate host and port in local file names, and uses ‘@’ instead of ‘?’ to separate the query portion of the file name from the rest. Therefore, a URL that would be saved as ‘www.xemacs.org:4300/search.pl?input=blah’ in Unix mode would be saved as ‘www.xemacs.org+4300/search.pl@input=blah’ in Windows mode. This mode is the default on Windows.

If you specify ‘nocontrol’, then the escaping of the control characters is also switched off. This option may make sense when you are downloading URLs whose names contain UTF-8 characters, on a system which can save and display filenames in UTF-8 (some possible byte values used in UTF-8 byte sequences fall in the range of values designated by Wget as “controls”).

The ‘ascii’ mode is used to specify that any bytes whose values are outside the range of ascii characters (that is, greater than 127) shall be escaped. This can be useful when saving filenames whose encoding does not match the one used locally.
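
For example, since the modes combine, you can escape non-ascii bytes and force lowercase local names in the same run (the url is a placeholder):

          wget -r --restrict-file-names=ascii,lowercase http://example.com/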

-4
--inet4-only
-6
--inet6-only
Force connecting to IPv4 or IPv6 addresses. With ‘--inet4-only’ or ‘-4’, Wget will only connect to IPv4 hosts, ignoring AAAA records in DNS, and refusing to connect to IPv6 addresses specified in URLs. Conversely, with ‘--inet6-only’ or ‘-6’, Wget will only connect to IPv6 hosts and ignore A records and IPv4 addresses.

Neither option should be needed normally. By default, an IPv6-aware Wget will use the address family specified by the host's DNS record. If the DNS responds with both IPv4 and IPv6 addresses, Wget will try them in sequence until it finds one it can connect to. (Also see the ‘--prefer-family’ option described below.)

These options can be used to deliberately force the use of IPv4 or IPv6 address families on dual family systems, usually to aid debugging or to deal with broken network configuration. Only one of ‘--inet6-only’ and ‘--inet4-only’ may be specified at the same time. Neither option is available in Wget compiled without IPv6 support.

‘--prefer-family=none/IPv4/IPv6’
When given a choice of several addresses, connect to the addresses with the specified address family first. The address order returned by DNS is used without change by default.

This avoids spurious errors and connect attempts when accessing hosts that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For example, ‘www.kame.net’ resolves to ‘2001:200:0:8002:203:47ff:fea5:3085’ and to ‘203.178.141.194’. When the preferred family is IPv4, the IPv4 address is used first; when the preferred family is IPv6, the IPv6 address is used first; if the specified value is none, the address order returned by DNS is used without change.

Unlike ‘-4’ and ‘-6’, this option doesn't inhibit access to any address family, it only changes the order in which the addresses are accessed. Also note that the reordering performed by this option is stable—it doesn't affect the order of addresses of the same family. That is, the relative order of all IPv4 addresses and of all IPv6 addresses remains intact in all cases.

--retry-connrefused
Consider “connection refused” a transient error and try again. Normally Wget gives up on a URL when it is unable to connect to the site because failure to connect is taken as a sign that the server is not running at all and that retries would not help. This option is for mirroring unreliable sites whose servers tend to disappear for short periods of time.


‘--user=user’
‘--password=password’
Specify the username user and password password for both ftp and http file retrieval. These parameters can be overridden using the ‘--ftp-user’ and ‘--ftp-password’ options for ftp connections and the ‘--http-user’ and ‘--http-password’ options for http connections.
--ask-password
Prompt for a password for each connection established. Cannot be specified when ‘--password’ is being used, because they are mutually exclusive.
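
For example, to supply the username on the command line but type the password at a prompt (user name and url are placeholders):

          wget --user=foo --ask-password http://example.com/private/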


--no-iri
Turn off internationalized URI (IRI) support. Use ‘--iri’ to turn it on. IRI support is activated by default.

You can set the default state of IRI support using the iri command in .wgetrc. That setting may be overridden from the command line.


‘--local-encoding=encoding’
Force Wget to use encoding as the default system encoding. That affects how Wget converts URLs specified as arguments from locale to utf-8 for IRI support.

Wget uses the function nl_langinfo() and then the CHARSET environment variable to get the locale. If it fails, ascii is used.

You can set the default local encoding using the local_encoding command in .wgetrc. That setting may be overridden from the command line.


‘--remote-encoding=encoding’
Force Wget to use encoding as the default remote server encoding. That affects how Wget converts URIs found in files from remote encoding to utf-8 during a recursive fetch. This option is only useful for IRI support, for the interpretation of non-ascii characters.

For HTTP, remote encoding can be found in the HTTP Content-Type header and in the HTML Content-Type http-equiv meta tag.

You can set the default encoding using the remoteencoding command in .wgetrc. That setting may be overridden from the command line.


--unlink
Force Wget to unlink file instead of clobbering existing file. This option is useful for downloading to a directory with hardlinks.



2.6 Directory Options

-nd
--no-directories
Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions ‘.n’).
-x
--force-directories
The opposite of ‘-nd’—create a hierarchy of directories, even if one would not have been created otherwise. E.g. ‘wget -x http://fly.srk.fer.hr/robots.txt’ will save the downloaded file to fly.srk.fer.hr/robots.txt.
-nH
--no-host-directories
Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
--protocol-directories
Use the protocol name as a directory component of local file names. For example, with this option, ‘wget -r http://host’ will save to ‘http/host/...’ rather than just to ‘host/...’.


‘--cut-dirs=number’
Ignore number directory components. This is useful for getting fine-grained control over the directory where recursive retrieval will be saved.

Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH’ option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs’ comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how the ‘--cut-dirs’ option works.

          No options        -> ftp.xemacs.org/pub/xemacs/
          -nH               -> pub/xemacs/
          -nH --cut-dirs=1  -> xemacs/
          -nH --cut-dirs=2  -> .
          
          --cut-dirs=1      -> ftp.xemacs.org/xemacs/
          ...

If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd’ and ‘-P’. However, unlike ‘-nd’, ‘--cut-dirs’ does not lose with subdirectories—for instance, with ‘-nH --cut-dirs=1’, a beta/ subdirectory will be placed to xemacs/beta, as one would expect.


‘-P prefix’
‘--directory-prefix=prefix’
Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is ‘.’ (the current directory).



2.7 HTTP Options

‘--default-page=name’
Use name as the default file name when it isn't known (i.e., for URLs that end in a slash), instead of index.html.


-E
--adjust-extension
If a file of type ‘application/xhtml+xml’ or ‘text/html’ is downloaded and the URL does not end with the regexp ‘\.[Hh][Tt][Mm][Ll]?’, this option will cause the suffix ‘.html’ to be appended to the local filename. This is useful, for instance, when you're mirroring a remote site that uses ‘.asp’ pages, but you want the mirrored pages to be viewable on your stock Apache server. Another good use for this is when you're downloading CGI-generated materials. A URL like ‘http://site.com/article.cgi?25’ will be saved as article.cgi?25.html.
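
For example (the url is quoted so the shell does not interpret the ‘?’):

          wget -E 'http://site.com/article.cgi?25'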

Note that filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget can't tell that the local X.html file corresponds to remote URL ‘X’ (since it doesn't yet know that the URL produces output of type ‘text/html’ or ‘application/xhtml+xml’).

As of version 1.12, Wget will also ensure that any downloaded files of type ‘text/css’ end in the suffix ‘.css’, and the option was renamed from ‘--html-extension’, to better reflect its new behavior. The old option name is still acceptable, but should now be considered deprecated.

At some point in the future, this option may well be expanded to include suffixes for other types of content, including content types that are not parsed by Wget.


‘--http-user=user’
‘--http-password=password’
Specify the username user and password password on an http server. According to the type of the challenge, Wget will encode them using either the basic (insecure), the digest, or the Windows NTLM authentication scheme.

Another way to specify username and password is in the url itself (see URL Format). Either method reveals your password to anyone who bothers to run ps. To prevent the passwords from being seen, store them in .wgetrc or .netrc, and make sure to protect those files from other users with chmod. If the passwords are really important, do not leave them lying in those files either—edit the files and delete them after Wget has started the download.


--no-http-keep-alive
Turn off the “keep-alive” feature for HTTP downloads. Normally, Wget asks the server to keep the connection open so that, when you download more than one document from the same server, they get transferred over the same TCP connection. This saves time and at the same time reduces the load on the server.

This option is useful when, for some reason, persistent (keep-alive) connections don't work for you, for example due to a server bug or due to the inability of server-side scripts to cope with the connections.


--no-cache
Disable server-side cache. In this case, Wget will send the remote server an appropriate directive (‘Pragma: no-cache’) to get the file from the remote service, rather than returning the cached version. This is especially useful for retrieving and flushing out-of-date documents on proxy servers.

Caching is allowed by default.


--no-cookies
Disable the use of cookies. Cookies are a mechanism for maintaining server-side state. The server sends the client a cookie using the Set-Cookie header, and the client responds with the same cookie upon further requests. Since cookies allow the server owners to keep track of visitors and for sites to exchange this information, some consider them a breach of privacy. The default is to use cookies; however, storing cookies is not on by default.


‘--load-cookies file’
Load cookies from file before the first HTTP retrieval. file is a textual file in the format originally used by Netscape's cookies.txt file.

You will typically use this option when mirroring sites that require that you be logged in to access some or all of their content. The login process typically works by the web server issuing an http cookie upon receiving and verifying your credentials. The cookie is then resent by the browser when accessing that part of the site, and so proves your identity.

Mirroring such a site requires Wget to send the same cookies your browser sends when communicating with the site. This is achieved by ‘--load-cookies’—simply point Wget to the location of the cookies.txt file, and it will send the same cookies your browser would send in the same situation. Different browsers keep textual cookie files in different locations:

Netscape 4.x.
The cookies are in ~/.netscape/cookies.txt.
Mozilla and Netscape 6.x.
Mozilla's cookie file is also named cookies.txt, located somewhere under ~/.mozilla, in the directory of your profile. The full path usually ends up looking somewhat like ~/.mozilla/default/some-weird-string/cookies.txt.
Internet Explorer.
You can produce a cookie file Wget can use by using the File menu, Import and Export, Export Cookies. This has been tested with Internet Explorer 5; it is not guaranteed to work with earlier versions.
Other browsers.
If you are using a different browser to create your cookies, ‘--load-cookies’ will only work if you can locate or produce a cookie file in the Netscape format that Wget expects.

If you cannot use ‘--load-cookies’, there might still be an alternative. If your browser supports a “cookie manager”, you can use it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the “official” cookie support:

          wget --no-cookies --header "Cookie: name=value"


‘--save-cookies file’
Save cookies to file before exiting. This will not save cookies that have expired or that have no expiry time (so-called “session cookies”), but also see ‘--keep-session-cookies’.


--keep-session-cookies
When specified, causes ‘--save-cookies’ to also save session cookies. Session cookies are normally not saved because they are meant to be kept in memory and forgotten when you exit the browser. Saving them is useful on sites that require you to log in or to visit the home page before you can access some pages. With this option, multiple Wget runs are considered a single browser session as far as the site is concerned.

Since the cookie file format does not normally carry session cookies, Wget marks them with an expiry timestamp of 0. Wget's ‘--load-cookies’ recognizes those as session cookies, but it might confuse other browsers. Also note that cookies so loaded will be treated as other session cookies, which means that if you want ‘--save-cookies’ to preserve them again, you must use ‘--keep-session-cookies’ again.
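
A typical pattern is to capture the session in one run and reuse it in the next (the urls and file name below are placeholders; see also the POST login example under ‘--post-data’):

          wget --save-cookies session.txt --keep-session-cookies \
               http://example.com/login
          wget --load-cookies session.txt \
               http://example.com/members/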


--ignore-length
Unfortunately, some http servers (cgi programs, to be more precise) send out bogus Content-Length headers, which makes Wget go wild, as it thinks not all the document was retrieved. You can spot this syndrome if Wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte.

With this option, Wget will ignore the Content-Length header—as if it never existed.


‘--header=header-line’
Send header-line along with the rest of the headers in each http request. The supplied header is sent as-is, which means it must contain name and value separated by colon, and must not contain newlines.

You may define more than one additional header by specifying ‘--header’ more than once.

          wget --header='Accept-Charset: iso-8859-2' \
               --header='Accept-Language: hr'        \
                 http://fly.srk.fer.hr/

Specification of an empty string as the header value will clear all previous user-defined headers.

As of Wget 1.10, this option can be used to override headers otherwise generated automatically. This example instructs Wget to connect to localhost, but to specify ‘foo.bar’ in the Host header:

          wget --header="Host: foo.bar" http://localhost/

In versions of Wget prior to 1.10 such use of ‘--header’ caused sending of duplicate headers.


‘--max-redirect=number’
Specifies the maximum number of redirections to follow for a resource. The default is 20, which is usually far more than necessary. However, on those occasions where you want to allow more (or fewer), this is the option to use.


‘--proxy-user=user’
‘--proxy-password=password’
Specify the username user and password password for authentication on a proxy server. Wget will encode them using the basic authentication scheme.

Security considerations similar to those with ‘--http-password’ pertain here as well.


‘--referer=url’
Include ‘Referer: url’ header in HTTP request. Useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers and only come out properly when Referer is set to one of the pages that point to them.


--save-headers
Save the headers sent by the http server to the file, preceding the actual contents, with an empty line as the separator.


‘-U agent-string’
‘--user-agent=agent-string’
Identify as agent-string to the http server.

The http protocol allows the clients to identify themselves using a User-Agent header field. This enables distinguishing the www software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as ‘Wget/version’, version being the current version number of Wget.

However, some sites have been known to impose the policy of tailoring the output according to the User-Agent-supplied information. While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the User-Agent line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.

Specifying an empty user agent with ‘--user-agent=""’ instructs Wget not to send the User-Agent header in http requests.
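
For example (the agent string and url are placeholders; the first form sets a custom identity, the second suppresses the header entirely):

          wget --user-agent="MyAgent/1.0" http://example.com/
          wget --user-agent="" http://example.com/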


‘--post-data=string’
‘--post-file=file’
Use POST as the method for all HTTP requests and send the specified data in the request body. ‘--post-data’ sends string as data, whereas ‘--post-file’ sends the contents of file. Other than that, they work in exactly the same way. In particular, they both expect content of the form key1=value1&key2=value2, with percent-encoding for special characters; the only difference is that one expects its content as a command-line parameter and the other accepts its content from a file. In particular, ‘--post-file’ is not for transmitting files as form attachments: those must appear as key=value data (with appropriate percent-coding) just like everything else. Wget does not currently support multipart/form-data for transmitting POST data; only application/x-www-form-urlencoded. Only one of ‘--post-data’ and ‘--post-file’ should be specified.

Please be aware that Wget needs to know the size of the POST data in advance. Therefore the argument to --post-file must be a regular file; specifying a FIFO or something like /dev/stdin won't work. It's not quite clear how to work around this limitation inherent in HTTP/1.0. Although HTTP/1.1 introduces chunked transfer that doesn't require knowing the request length in advance, a client can't use chunked unless it knows it's talking to an HTTP/1.1 server. And it can't know that until it receives a response, which in turn requires the request to have been completed – a chicken-and-egg problem.

Note: if Wget is redirected after the POST request is completed, it will not send the POST data to the redirected URL. This is because URLs that process POST often respond with a redirection to a regular page, which does not desire or accept POST. It is not completely clear that this behavior is optimal; if it doesn't work out, it might be changed in the future.

This example shows how to log in to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users:

          # Log in to the server.  This can be done only once.
          wget --save-cookies cookies.txt \
               --post-data 'user=foo&password=bar' \
               http://server.com/auth.php
          
          # Now grab the page or pages we care about.
          wget --load-cookies cookies.txt \
               -p http://server.com/interesting/article.php

If the server is using session cookies to track user authentication, the above will not work because ‘--save-cookies’ will not save them (and neither will browsers) and the cookies.txt file will be empty. In that case use ‘--keep-session-cookies’ along with ‘--save-cookies’ to force saving of session cookies.


--content-disposition
If this is set to on, experimental (not fully-functional) support for Content-Disposition headers is enabled. This can currently result in extra round-trips to the server for a HEAD request, and is known to suffer from a few bugs, which is why it is not currently enabled by default.

This option is useful for some file-downloading CGI programs that use Content-Disposition headers to describe what the name of a downloaded file should be.


--trust-server-names
If this is set to on, on a redirect the last component of the redirection URL will be used as the local file name. By default, the last component of the original URL is used.


--auth-no-challenge
If this option is given, Wget will send Basic HTTP authentication information (plaintext username and password) for all requests, just like Wget 1.10.2 and prior did by default.

Use of this option is not recommended, and is intended only to support some few obscure servers, which never send HTTP authentication challenges, but accept unsolicited auth info, say, in addition to form-based authentication.



2.8 HTTPS (SSL/TLS) Options

To support encrypted HTTP (HTTPS) downloads, Wget must be compiled with an external SSL library, currently OpenSSL. If Wget is compiled without SSL support, none of these options are available.

‘--secure-protocol=protocol’
Choose the secure protocol to be used. Legal values are ‘auto’, ‘SSLv2’, ‘SSLv3’, and ‘TLSv1’. If ‘auto’ is used, the SSL library is given the liberty of choosing the appropriate protocol automatically, which is achieved by sending an SSLv2 greeting and announcing support for SSLv3 and TLSv1. This is the default.

Specifying ‘SSLv2’, ‘SSLv3’, or ‘TLSv1’ forces the use of the corresponding protocol. This is useful when talking to old and buggy SSL server implementations that make it hard for OpenSSL to choose the correct protocol version. Fortunately, such servers are quite rare.


--no-check-certificate
Don't check the server certificate against the available certificate authorities. Also don't require the URL host name to match the common name presented by the certificate.

As of Wget 1.10, the default is to verify the server's certificate against the recognized certificate authorities, breaking the SSL handshake and aborting the download if the verification fails. Although this provides more secure downloads, it does break interoperability with some sites that worked with previous Wget versions, particularly those using self-signed, expired, or otherwise invalid certificates. This option forces an “insecure” mode of operation that turns the certificate verification errors into warnings and allows you to proceed.

If you encounter “certificate verification” errors or ones saying that “common name doesn't match requested host name”, you can use this option to bypass the verification and proceed with the download. Only use this option if you are otherwise convinced of the site's authenticity, or if you really don't care about the validity of its certificate. It is almost always a bad idea not to check the certificates when transmitting confidential or important data.


‘--certificate=file’
Use the client certificate stored in file. This is needed for servers that are configured to require certificates from the clients that connect to them. Normally a certificate is not required and this switch is optional.


‘--certificate-type=type’
Specify the type of the client certificate. Legal values are ‘PEM’ (assumed by default) and ‘DER’, also known as ‘ASN1’.
‘--private-key=file’
Read the private key from file. This allows you to provide the private key in a file separate from the certificate.
‘--private-key-type=type’
Specify the type of the private key. Accepted values are ‘PEM’ (the default) and ‘DER’.
‘--ca-certificate=file’
Use file as the file with the bundle of certificate authorities (“CA”) to verify the peers. The certificates must be in PEM format.

Without this option Wget looks for CA certificates at the system-specified locations, chosen at OpenSSL installation time.


‘--ca-directory=directory’
Specifies directory containing CA certificates in PEM format. Each file contains one CA certificate, and the file name is based on a hash value derived from the certificate. This is achieved by processing a certificate directory with the c_rehash utility supplied with OpenSSL. Using ‘--ca-directory’ is more efficient than ‘--ca-certificate’ when many certificates are installed because it allows Wget to fetch certificates on demand.
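
For example, a certificate directory can be prepared once with c_rehash and then reused across runs (the directory path and url below are placeholders):

          c_rehash /etc/custom-certs
          wget --ca-directory=/etc/custom-certs https://example.com/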

Without this option Wget looks for CA certificates at the system-specified locations, chosen at OpenSSL installation time.


‘--random-file=file’
Use file as the source of random data for seeding the pseudo-random number generator on systems without /dev/random.

On such systems the SSL library needs an external source of randomness to initialize. Randomness may be provided by EGD (see ‘--egd-file’ below) or read from an external source specified by the user. If this option is not specified, Wget looks for random data in $RANDFILE or, if that is unset, in $HOME/.rnd. If none of those are available, it is likely that SSL encryption will not be usable.

If you're getting the “Could not seed OpenSSL PRNG; disabling SSL.” error, you should provide random data using some of the methods described above.


‘--egd-file=file’
Use file as the EGD socket. EGD stands for Entropy Gathering Daemon, a user-space program that collects data from various unpredictable system sources and makes it available to other programs that might need it. Encryption software, such as the SSL library, needs sources of non-repeating randomness to seed the random number generator used to produce cryptographically strong keys.

OpenSSL allows the user to specify his own source of entropy using the RAND_FILE environment variable. If this variable is unset, or if the specified file does not produce enough randomness, OpenSSL will read random data from the EGD socket specified using this option.

If this option is not specified (and the equivalent startup command isnot used), EGD is never contacted. EGD is not needed on modern Unixsystems that support/dev/random.


Next:  Recursive Retrieval Options,Previous:  HTTPS (SSL/TLS) Options,Up:  Invoking

2.9 FTP Options

‘--ftp-user=user’
‘--ftp-password=password’
Specify the username user and password password on an ftp server. Without this, or the corresponding startup option, the password defaults to ‘-wget@’, normally used for anonymous FTP.

Another way to specify username and password is in the url itself (see URL Format). Either method reveals your password to anyone who bothers to run ps. To prevent the passwords from being seen, store them in .wgetrc or .netrc, and make sure to protect those files from other users with chmod. If the passwords are really important, do not leave them lying in those files either—edit the files and delete them after Wget has started the download.
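
As a sketch, with a hypothetical server and account, the two approaches look like this; the .netrc variant keeps the password off the command line:

          wget --ftp-user=daniel --ftp-password=sekrit \
               ftp://ftp.example.com/private/file.txt

or, equivalently, a line in ~/.netrc (remember to ‘chmod 600 ~/.netrc’):

          machine ftp.example.com login daniel password sekrit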


‘--no-remove-listing’
Don't remove the temporary .listing files generated by ftp retrievals. Normally, these files contain the raw directory listings received from ftp servers. Not removing them can be useful for debugging purposes, or when you want to be able to easily check on the contents of remote server directories (e.g. to verify that a mirror you're running is complete).

Note that even though Wget writes to a known filename for this file, this is not a security hole in the scenario of a user making .listing a symbolic link to /etc/passwd or something and asking root to run Wget in his or her directory. Depending on the options used, either Wget will refuse to write to .listing, making the globbing/recursion/time-stamping operation fail, or the symbolic link will be deleted and replaced with the actual .listing file, or the listing will be written to a .listing.number file.

Even so, root should never run Wget in a non-trusted user's directory. A user could do something as simple as linking index.html to /etc/passwd and asking root to run Wget with ‘-N’ or ‘-r’ so the file will be overwritten.


‘--no-glob’
Turn off ftp globbing. Globbing refers to the use of shell-like special characters (wildcards), like ‘*’, ‘?’, ‘[’ and ‘]’ to retrieve more than one file from the same directory at once, like:
          wget ftp://gnjilux.srk.fer.hr/*.msg

By default, globbing will be turned on if the url contains a globbing character. This option may be used to turn globbing on or off permanently.

You may have to quote the url to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix ftp servers (and the ones emulating Unix ls output).
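
For instance, quoting keeps the shell from expanding the pattern, while ‘--no-glob’ makes Wget treat the special characters literally (hypothetical server and file names):

          wget "ftp://ftp.example.com/pub/*.msg"
          wget --no-glob "ftp://ftp.example.com/pub/odd[1].msg"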


‘--no-passive-ftp’
Disable the use of the passive FTP transfer mode. Passive FTP mandates that the client connect to the server to establish the data connection rather than the other way around.

If the machine is connected to the Internet directly, both passive and active FTP should work equally well. Behind most firewall and NAT configurations passive FTP has a better chance of working. However, in some rare firewall configurations, active FTP actually works when passive FTP doesn't. If you suspect this to be the case, use this option, or set passive_ftp=off in your init file.


‘--retr-symlinks’
Usually, when retrieving ftp directories recursively and a symbolic link is encountered, the linked-to file is not downloaded. Instead, a matching symbolic link is created on the local filesystem. The pointed-to file will not be downloaded unless this recursive retrieval would have encountered it separately and downloaded it anyway.

When ‘--retr-symlinks’ is specified, however, symbolic links are traversed and the pointed-to files are retrieved. At this time, this option does not cause Wget to traverse symlinks to directories and recurse through them, but in the future it should be enhanced to do this.

Note that when retrieving a file (not a directory) because it was specified on the command-line, rather than because it was recursed to, this option has no effect. Symbolic links are always traversed in this case.


Next:  Recursive Accept/Reject Options,Previous:  FTP Options,Up:  Invoking

2.10 Recursive Retrieval Options

‘-r’
‘--recursive’
Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.
‘-l depth’
‘--level=depth’
Specify recursion maximum depth level depth (see Recursive Download).


‘--delete-after’
This option tells Wget to delete every single file it downloads, after having done so. It is useful for pre-fetching popular pages through a proxy, e.g.:
          wget -r -nd --delete-after http://whatever.com/~popular/page/

The ‘-r’ option is to retrieve recursively, and ‘-nd’ to not create directories.

Note that ‘--delete-after’ deletes files on the local machine. It does not issue the ‘DELE’ command to remote FTP sites, for instance. Also note that when ‘--delete-after’ is specified, ‘--convert-links’ is ignored, so ‘.orig’ files are simply not created in the first place.


‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-html content, etc.

Each link will be changed in one of the two ways:

  • The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ‘../bar/img.gif’. This kind of transformation works reliably for arbitrary combinations of directories.

  • The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.

Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet address rather than presenting a broken link. The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been downloaded. Because of that, the work done by ‘-k’ will be performed at the end of all the downloads.
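
A minimal sketch, with a hypothetical site: fetch two levels of a tree and rewrite all links for offline browsing once the downloads finish:

          wget -r -l 2 -k http://www.example.com/docs/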


‘-K’
‘--backup-converted’
When converting a file, back up the original version with a ‘.orig’ suffix. Affects the behavior of ‘-N’ (see HTTP Time-Stamping Internals).
‘-m’
‘--mirror’
Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps ftp directory listings. It is currently equivalent to ‘-r -N -l inf --no-remove-listing’.
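
The two invocations below are therefore interchangeable (hypothetical URL):

          wget -m http://www.example.com/
          wget -r -N -l inf --no-remove-listing http://www.example.com/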


‘-p’
‘--page-requisites’
This option causes Wget to download all the files that are necessary to properly display a given html page. This includes such things as inlined images, sounds, and referenced stylesheets.

Ordinarily, when downloading a single html page, any requisite documents that may be needed to display it properly are not downloaded. Using ‘-r’ together with ‘-l’ can help, but since Wget does not ordinarily distinguish between external and inlined documents, one is generally left with “leaf documents” that are missing their requisites.

For instance, say document 1.html contains an ‘<img>’ tag referencing 1.gif and an ‘<a>’ tag pointing to external document 2.html. Say that 2.html is similar but that its image is 2.gif and it links to 3.html. Say this continues up to some arbitrarily high number.

If one executes the command:

          wget -r -l 2 http://site/1.html

then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded. As you can see, 3.html is without its requisite 3.gif because Wget is simply counting the number of hops (up to 2) away from 1.html in order to determine where to stop the recursion. However, with this command:

          wget -r -l 2 -p http://site/1.html

all the above files and 3.html's requisite 3.gif will be downloaded. Similarly,

          wget -r -l 1 -p http://site/1.html

will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded. One might think that:

          wget -r -l 0 -p http://site/1.html

would download just 1.html and 1.gif, but unfortunately this is not the case, because ‘-l 0’ is equivalent to ‘-l inf’—that is, infinite recursion. To download a single html page (or a handful of them, all specified on the command-line or in a ‘-i’ url input file) and its (or their) requisites, simply leave off ‘-r’ and ‘-l’:

          wget -p http://site/1.html

Note that Wget will behave as if ‘-r’ had been specified, but only that single page and its requisites will be downloaded. Links from that page to external documents will not be followed. Actually, to download a single page and all its requisites (even if they exist on separate websites), and make sure the lot displays properly locally, this author likes to use a few options in addition to ‘-p’:

          wget -E -H -k -K -p http://site/document

To finish off this topic, it's worth knowing that Wget's idea of an external document link is any URL specified in an ‘<a>’ tag, an ‘<area>’ tag, or a ‘<link>’ tag other than ‘<link rel="stylesheet">’.


‘--strict-comments’
Turn on strict parsing of html comments. The default is to terminate comments at the first occurrence of ‘-->’.

According to specifications, html comments are expressed as sgml declarations. Declaration is special markup that begins with ‘<!’ and ends with ‘>’, such as ‘<!DOCTYPE ...>’, that may contain comments between a pair of ‘--’ delimiters. html comments are “empty declarations”, sgml declarations without any non-comment text. Therefore, ‘<!--foo-->’ is a valid comment, and so is ‘<!--one-- --two-->’, but ‘<!--1--2-->’ is not.

On the other hand, most html writers don't perceive comments as anything other than text delimited with ‘<!--’ and ‘-->’, which is not quite the same. For example, something like ‘<!------------>’ works as a valid comment as long as the number of dashes is a multiple of four (!). If not, the comment technically lasts until the next ‘--’, which may be at the other end of the document. Because of this, many popular browsers completely ignore the specification and implement what users have come to expect: comments delimited with ‘<!--’ and ‘-->’.

Until version 1.9, Wget interpreted comments strictly, which resulted in missing links in many web pages that displayed fine in browsers, but had the misfortune of containing non-compliant comments. Beginning with version 1.9, Wget has joined the ranks of clients that implement “naive” comments, terminating each comment at the first occurrence of ‘-->’.

If, for whatever reason, you want strict comment parsing, use thisoption to turn it on.


Next:  Exit Status,Previous:  Recursive Retrieval Options,Up:  Invoking

2.11 Recursive Accept/Reject Options

‘-A acclist’
‘--accept acclist’
‘-R rejlist’
‘--reject rejlist’
Specify comma-separated lists of file name suffixes or patterns to accept or reject (see Types of Files). Note that if any of the wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix.
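
For example, a recursive fetch that keeps images but skips thumbnails might look like this (hypothetical URL and patterns):

          wget -r -A "*.jpg,*.png" -R "thumb*" http://www.example.com/gallery/
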
‘-D domain-list’
‘--domains=domain-list’
Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on ‘-H’.
‘--exclude-domains domain-list’
Specify the domains that are not to be followed (see Spanning Hosts).


‘--follow-ftp’
Follow ftp links from html documents. Without this option, Wget will ignore all the ftp links.


‘--follow-tags=list’
Wget has an internal table of html tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, however, he or she should specify such tags in a comma-separated list with this option.
‘--ignore-tags=list’
This is the opposite of the ‘--follow-tags’ option. To skip certain html tags when recursively looking for documents to download, specify them in a comma-separated list.

In the past, this option was the best bet for downloading a single page and its requisites, using a command-line like:

          wget --ignore-tags=a,area -H -k -K -r http://site/document

However, the author of this option came across a page with tags like ‘<link rel="home" href="/">’ and came to the realization that specifying tags to ignore was not enough. One can't just tell Wget to ignore ‘<link>’, because then stylesheets will not be downloaded. Now the best bet for downloading a single page and its requisites is the dedicated ‘--page-requisites’ option.


‘--ignore-case’
Ignore case when matching files and directories. This influences the behavior of -R, -A, -I, and -X options, as well as globbing implemented when downloading from FTP sites. For example, with this option, ‘-A *.txt’ will match ‘file1.txt’, but also ‘file2.TXT’, ‘file3.TxT’, and so on.
‘-H’
‘--span-hosts’
Enable spanning across hosts when doing recursive retrieving (see Spanning Hosts).
‘-L’
‘--relative’
Follow relative links only. Useful for retrieving a specific home page without any distractions, not even those from the same hosts (see Relative Links).
‘-I list’
‘--include-directories=list’
Specify a comma-separated list of directories you wish to follow when downloading (see Directory-Based Limits). Elements of list may contain wildcards.
‘-X list’
‘--exclude-directories=list’
Specify a comma-separated list of directories you wish to exclude from download (see Directory-Based Limits). Elements of list may contain wildcards.
‘-np’
‘--no-parent’
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.


Previous:  Recursive Accept/Reject Options,Up:  Invoking

2.12 Exit Status

Wget may return one of several error codes if it encounters problems.

0
No problems occurred.
1
Generic error code.
2
Parse error—for instance, when parsing command-line options, the ‘.wgetrc’ or ‘.netrc’...
3
File I/O error.
4
Network failure.
5
SSL verification failure.
6
Username/password authentication failure.
7
Protocol errors.
8
Server issued an error response.

With the exceptions of 0 and 1, the lower-numbered exit codes take precedence over higher-numbered ones, when multiple types of errors are encountered.

In versions of Wget prior to 1.12, Wget's exit status tended to be unhelpful and inconsistent. Recursive downloads would virtually always return 0 (success), regardless of any issues encountered, and non-recursive fetches only returned the status corresponding to the most recently-attempted download.


Next:  Following Links,Previous:  Invoking,Up:  Top

3 Recursive Download

GNU Wget is capable of traversing parts of the Web (or a single http or ftp server), following links and directory structure. We refer to this as recursive retrieval, or recursion.

With http urls, Wget retrieves and parses the html or css from the given url, retrieving the files the document refers to, through markup like href or src, or css uri values specified using the ‘url()’ functional notation. If the freshly downloaded file is also of type text/html, application/xhtml+xml, or text/css, it will be parsed and followed further.

Recursive retrieval of http and html/css content is breadth-first. This means that Wget first downloads the requested document, then the documents linked from that document, then the documents linked by them, and so on. In other words, Wget first downloads the documents at depth 1, then those at depth 2, and so on until the specified maximum depth.

The maximum depth to which the retrieval may descend is specified with the ‘-l’ option. The default maximum depth is five layers.

When retrieving an ftp url recursively, Wget will retrieve all the data from the given directory tree (including the subdirectories up to the specified depth) on the remote server, creating its mirror image locally. ftp retrieval is also limited by the depth parameter. Unlike http recursion, ftp recursion is performed depth-first.

By default, Wget will create a local directory tree, corresponding tothe one found on the remote server.

Recursive retrieving can find a number of applications, the most important of which is mirroring. It is also useful for www presentations, and any other opportunities where slow network connections should be bypassed by storing the files locally.

You should be warned that recursive downloads can overload the remote servers. Because of that, many administrators frown upon them and may ban access from your site if they detect very fast downloads of big amounts of content. When downloading from Internet servers, consider using the ‘-w’ option to introduce a delay between accesses to the server. The download will take a while longer, but the server administrator will not be alarmed by your rudeness.

Of course, recursive download may cause problems on your machine. If left to run unchecked, it can easily fill up the disk. If downloading from local network, it can also take bandwidth on the system, as well as consume memory and CPU.

Try to specify the criteria that match the kind of download you are trying to achieve. If you want to download only one page, use ‘--page-requisites’ without any additional recursion. If you want to download things under one directory, use ‘-np’ to avoid downloading things from other directories. If you want to download all the files from one directory, use ‘-l 1’ to make sure the recursion depth never exceeds one. See Following Links, for more information about this.
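
As a sketch, assuming a hypothetical documentation tree, the following fetch stays within one directory, descends at most one level, and waits a second between requests:

     wget -r -l 1 -np -w 1 http://www.example.com/docs/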

Recursive retrieval should be used with care. Don't say you were notwarned.


Next:  Time-Stamping,Previous:  Recursive Download,Up:  Top

4 Following Links

When retrieving recursively, one does not wish to retrieve loads of unnecessary data. Most of the time the users bear in mind exactly what they want to download, and want Wget to follow only specific links.

For example, if you wish to download the music archive from ‘fly.srk.fer.hr’, you will not want to download all the home pages that happen to be referenced by an obscure part of the archive.

Wget possesses several mechanisms that allow you to fine-tune which links it will follow.


Next:  Types of Files,Previous:  Following Links,Up:  Following Links

4.1 Spanning Hosts

Wget's recursive retrieval normally refuses to visit hosts different than the one you specified on the command line. This is a reasonable default; without it, every retrieval would have the potential to turn your Wget into a small version of google.

However, visiting different hosts, or host spanning, is sometimes a useful option. Maybe the images are served from a different server. Maybe you're mirroring a site that consists of pages interlinked between three servers. Maybe the server has two equivalent names, and the html pages refer to both interchangeably.

Span to any host—‘-H’
The ‘-H’ option turns on host spanning, thus allowing Wget's recursive run to visit any host referenced by a link. Unless sufficient recursion-limiting criteria are applied, these foreign hosts will typically link to yet more hosts, and so on until Wget ends up sucking up much more data than you have intended.
Limit spanning to certain domains—‘-D’
The ‘-D’ option allows you to specify the domains that will be followed, thus limiting the recursion only to the hosts that belong to these domains. Obviously, this makes sense only in conjunction with ‘-H’. A typical example would be downloading the contents of ‘www.server.com’, but allowing downloads from ‘images.server.com’, etc.:
          wget -rH -Dserver.com http://www.server.com/

You can specify more than one address by separating them with a comma, e.g. ‘-Ddomain1.com,domain2.com’.

Keep download off certain domains—‘--exclude-domains’
If there are domains you want to exclude specifically, you can do it with ‘--exclude-domains’, which accepts the same type of arguments of ‘-D’, but will exclude all the listed domains. For example, if you want to download all the hosts from ‘foo.edu’ domain, with the exception of ‘sunsite.foo.edu’, you can do it like this:
          wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
              http://www.foo.edu/


Next:  Directory-Based Limits,Previous:  Spanning Hosts,Up:  Following Links

4.2 Types of Files

When downloading material from the web, you will often want to restrict the retrieval to only certain file types. For example, if you are interested in downloading gifs, you will not be overjoyed to get loads of PostScript documents, and vice versa.

Wget offers two options to deal with this problem. Each option description lists a short name, a long name, and the equivalent command in .wgetrc.

‘-A acclist’
‘--accept acclist’
‘accept = acclist’
The argument to ‘--accept’ option is a list of file suffixes or patterns that Wget will download during recursive retrieval. A suffix is the ending part of a file, and consists of “normal” letters, e.g. ‘gif’ or ‘.jpg’. A matching pattern contains shell-like wildcards, e.g. ‘books*’ or ‘zelazny*196[0-9]*’.

So, specifying ‘wget -A gif,jpg’ will make Wget download only the files ending with ‘gif’ or ‘jpg’, i.e. gifs and jpegs. On the other hand, ‘wget -A "zelazny*196[0-9]*"’ will download only files beginning with ‘zelazny’ and containing numbers from 1960 to 1969 anywhere within. Look up the manual of your shell for a description of how pattern matching works.

Of course, any number of suffixes and patterns can be combined into acomma-separated list, and given as an argument to ‘-A’.


‘-R rejlist’
‘--reject rejlist’
‘reject = rejlist’
The ‘--reject’ option works the same way as ‘--accept’, only its logic is the reverse; Wget will download all files except the ones matching the suffixes (or patterns) in the list.

So, if you want to download a whole page except for the cumbersome mpegs and .au files, you can use ‘wget -R mpg,mpeg,au’. Analogously, to download all files except the ones beginning with ‘bjork’, use ‘wget -R "bjork*"’. The quotes are to prevent expansion by the shell.

The ‘-A’ and ‘-R’ options may be combined to achieve even better fine-tuning of which files to retrieve. E.g. ‘wget -A "*zelazny*" -R .ps’ will download all the files having ‘zelazny’ as a part of their name, but not the PostScript files.
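
Spelled out as a complete, runnable command against a hypothetical archive, that combination becomes:

          wget -r -A "*zelazny*" -R .ps http://www.example.com/books/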

Note that these two options do not affect the downloading of html files (as determined by a ‘.htm’ or ‘.html’ filename suffix). This behavior may not be desirable for all users, and may be changed for future versions of Wget.

Note, too, that query strings (strings at the end of a URL beginning with a question mark (‘?’)) are not included as part of the filename for accept/reject rules, even though these will actually contribute to the name chosen for the local file. It is expected that a future version of Wget will provide an option to allow matching against query strings.

Finally, it's worth noting that the accept/reject lists are matched twice against downloaded files: once against the URL's filename portion, to determine if the file should be downloaded in the first place; then, after it has been accepted and successfully downloaded, the local file's name is also checked against the accept/reject lists to see if it should be removed. The rationale was that, since ‘.htm’ and ‘.html’ files are always downloaded regardless of accept/reject rules, they should be removed after being downloaded and scanned for links, if they did match the accept/reject lists. However, this can lead to unexpected results, since the local filenames can differ from the original URL filenames in the following ways, all of which can change whether an accept/reject rule matches:

  • If the local file already exists and ‘--no-directories’ was specified, a numeric suffix will be appended to the original name.
  • If ‘--adjust-extension’ was specified, the local filename might have ‘.html’ appended to it. If Wget is invoked with ‘-E -A.php’, a filename such as ‘index.php’ will be accepted, but upon download will be named ‘index.php.html’, which no longer matches, and so the file will be deleted.
  • Query strings do not contribute to URL matching, but are included in local filenames, and so do contribute to filename matching.

This behavior, too, is considered less-than-desirable, and may changein a future version of Wget.


Next:  Relative Links,Previous:  Types of Files,Up:  Following Links

4.3 Directory-Based Limits

Regardless of other link-following facilities, it is often useful to place the restriction of what files to retrieve based on the directories those files are placed in. There can be many reasons for this—the home pages may be organized in a reasonable directory structure; or some directories may contain useless information, e.g. /cgi-bin or /dev directories.

Wget offers three different options to deal with this requirement. Each option description lists a short name, a long name, and the equivalent command in .wgetrc.

‘-I list’
‘--include list’
‘include_directories = list’
‘-I’ option accepts a comma-separated list of directories included in the retrieval. Any other directories will simply be ignored. The directories are absolute paths.

So, if you wish to download from ‘http://host/people/bozo/’ following only links to bozo's colleagues in the /people directory and the bogus scripts in /cgi-bin, you can specify:

          wget -I /people,/cgi-bin http://host/people/bozo/


‘-X list’
‘--exclude list’
‘exclude_directories = list’
‘-X’ option is exactly the reverse of ‘-I’—this is a list of directories excluded from the download. E.g. if you do not want Wget to download things from /cgi-bin directory, specify ‘-X /cgi-bin’ on the command line.

The same as with ‘-A’/‘-R’, these two options can be combined to get a better fine-tuning of downloading subdirectories. E.g. if you want to load all the files from /pub hierarchy except for /pub/worthless, specify ‘-I/pub -X/pub/worthless’.
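
Written out in full against a hypothetical mirror, that combination becomes:

          wget -r -I/pub -X/pub/worthless ftp://ftp.example.com/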


‘-np’
‘--no-parent’
‘no_parent = on’
The simplest, and often very useful way of limiting directories is disallowing retrieval of the links that refer to the hierarchy above the beginning directory, i.e. disallowing ascent to the parent directory/directories.

The ‘--no-parent’ option (short ‘-np’) is useful in this case. Using it guarantees that you will never leave the existing hierarchy. Supposing you issue Wget with:

          wget -r --no-parent http://somehost/~luzer/my-archive/

You may rest assured that none of the references to /~his-girls-homepage/ or /~luzer/all-my-mpegs/ will be followed. Only the archive you are interested in will be downloaded. Essentially, ‘--no-parent’ is similar to ‘-I/~luzer/my-archive’, only it handles redirections in a more intelligent fashion.

Note that, for HTTP (and HTTPS), the trailing slash is very important to ‘--no-parent’. HTTP has no concept of a “directory”—Wget relies on you to indicate what's a directory and what isn't. In ‘http://foo/bar/’, Wget will consider ‘bar’ to be a directory, while in ‘http://foo/bar’ (no trailing slash), ‘bar’ will be considered a filename (so ‘--no-parent’ would be meaningless, as its parent is ‘/’).


Next:  FTP Links,Previous:  Directory-Based Limits,Up:  Following Links

4.4 Relative Links

When ‘-L’ is turned on, only the relative links are ever followed. Relative links are here defined as those that do not refer to the web server root. For example, these links are relative:

     <a href="foo.gif">
     <a href="foo/bar.gif">
     <a href="../foo/bar.gif">

These links are not relative:

     <a href="/foo.gif">
     <a href="/foo/bar.gif">
     <a href="http://www.server.com/foo/bar.gif">

Using this option guarantees that recursive retrieval will not span hosts, even without ‘-H’. In simple cases it also allows downloads to “just work” without having to convert links.

This option is probably not very useful and might be removed in a future release.


Previous:  Relative Links,Up:  Following Links

4.5 Following FTP Links

The rules for ftp are somewhat specific, as it is necessary for them to be. ftp links in html documents are often included for purposes of reference, and it is often inconvenient to download them by default.

To have ftp links followed from html documents, you need to specify the ‘--follow-ftp’ option. Having done that, ftp links will span hosts regardless of ‘-H’ setting. This is logical, as ftp links rarely point to the same host where the http server resides. For similar reasons, the ‘-L’ option has no effect on such downloads. On the other hand, domain acceptance (‘-D’) and suffix rules (‘-A’ and ‘-R’) apply normally.

Also note that followed links to ftp directories will not be retrieved recursively further.


Next:  Startup File,Previous:  Following Links,Up:  Top

5 Time-Stamping

One of the most important aspects of mirroring information from theInternet is updating your archives.

Downloading the whole archive again and again, just to replace a few changed files is expensive, both in terms of wasted bandwidth and money, and the time to do the update. This is why all the mirroring tools offer the option of incremental updating.

Such an updating mechanism means that the remote server is scanned in search of new files. Only those new files will be downloaded in the place of the old ones.

A file is considered new if one of these two conditions is met:

  1. A file of that name does not already exist locally.
  2. A file of that name does exist, but the remote file was modified more recently than the local file.

To implement this, the program needs to be aware of the time of last modification of both local and remote files. We call this information the time-stamp of a file.

The time-stamping in GNU Wget is turned on using the ‘--timestamping’ (‘-N’) option, or through the timestamping = on directive in .wgetrc. With this option, for each file it intends to download, Wget will check whether a local file of the same name exists. If it does, and the remote file is not newer, Wget will not download it.

If the local file does not exist, or the sizes of the files do not match, Wget will download the remote file no matter what the time-stamps say.


Next:  HTTP Time-Stamping Internals,Previous:  Time-Stamping,Up:  Time-Stamping

5.1 Time-Stamping Usage

The usage of time-stamping is simple. Say you would like to download a file so that it keeps its date of modification.

     wget -S http://www.gnu.ai.mit.edu/

A simple ls -l shows that the time stamp on the local file equals the state of the Last-Modified header, as returned by the server. As you can see, the time-stamping info is preserved locally, even without ‘-N’ (at least for http).

Several days later, you would like Wget to check if the remote file haschanged, and download it if it has.

     wget -N http://www.gnu.ai.mit.edu/

Wget will ask the server for the last-modified date. If the local filehas the same timestamp as the server, or a newer one, the remote filewill not be re-fetched. However, if the remote file is more recent,Wget will proceed to fetch it.

The same goes for ftp. For example:

     wget "ftp://ftp.ifi.uio.no/pub/emacs/gnus/*"

(The quotes around that URL are to prevent the shell from trying to interpret the ‘*’.)

After download, a local directory listing will show that the timestamps match those on the remote server. Reissuing the command with ‘-N’ will make Wget re-fetch only the files that have been modified since the last download.

If you wished to mirror the GNU archive every week, you would use a command like the following, weekly:

     wget --timestamping -r ftp://ftp.gnu.org/pub/gnu/

Note that time-stamping will only work for files for which the server gives a timestamp. For http, this depends on getting a Last-Modified header. For ftp, this depends on getting a directory listing with dates in a format that Wget can parse (see FTP Time-Stamping Internals).


Next:  FTP Time-Stamping Internals,Previous:  Time-Stamping Usage,Up:  Time-Stamping

5.2 HTTP Time-Stamping Internals

Time-stamping in http is implemented by checking the Last-Modified header. If you wish to retrieve the file foo.html through http, Wget will check whether foo.html exists locally. If it doesn't, foo.html will be retrieved unconditionally.

If the file does exist locally, Wget will first check its local time-stamp (similar to the way ls -l checks it), and then send a HEAD request to the remote server, demanding the information on the remote file.

The Last-Modified header is examined to find which file was modified more recently (which makes it “newer”). If the remote file is newer, it will be downloaded; if it is older, Wget will give up.

When ‘--backup-converted’ (‘-K’) is specified in conjunction with ‘-N’, server file ‘X’ is compared to local file ‘X.orig’, if extant, rather than being compared to local file ‘X’, which will always differ if it's been converted by ‘--convert-links’ (‘-k’).

Arguably, http time-stamping should be implemented using the If-Modified-Since request.


Previous:  HTTP Time-Stamping Internals,Up:  Time-Stamping

5.3 FTP Time-Stamping Internals

In theory, ftp time-stamping works much the same as http, only ftp has no headers—time-stamps must be ferreted out of directory listings.

If an ftp download is recursive or uses globbing, Wget will use the ftp LIST command to get a file listing for the directory containing the desired file(s). It will try to analyze the listing, treating it like Unix ls -l output, extracting the time-stamps. The rest is exactly the same as for http. Note that when retrieving individual files from an ftp server without using globbing or recursion, listing files will not be downloaded (and thus files will not be time-stamped) unless ‘-N’ is specified.

The assumption that every directory listing is a Unix-style listing may sound extremely constraining, but in practice it is not, as many non-Unix ftp servers use the Unixoid listing format because most (all?) of the clients understand it. Bear in mind that rfc959 defines no standard way to get a file list, let alone the time-stamps. We can only hope that a future standard will define this.

Another non-standard solution includes the use of the MDTM command that is supported by some ftp servers (including the popular wu-ftpd), which returns the exact time of the specified file. Wget may support this command in the future.


Next:  Examples,Previous:  Time-Stamping,Up:  Top

6 Startup File

Once you know how to change default settings of Wget through command line arguments, you may wish to make some of those settings permanent. You can do that in a convenient way by creating the Wget startup file—.wgetrc.

Besides .wgetrc being the “main” initialization file, it is convenient to have a special facility for storing passwords. Thus Wget reads and interprets the contents of $HOME/.netrc, if it finds it. You can find the .netrc format in your system manuals.

Wget reads .wgetrc upon startup, recognizing a limited set of commands.


Next:  Wgetrc Syntax,Previous:  Startup File,Up:  Startup File

6.1 Wgetrc Location

When initializing, Wget will look for a global startup file, /usr/local/etc/wgetrc by default (or some prefix other than /usr/local, if Wget was not installed there) and read commands from there, if it exists.

Then it will look for the user's file. If the environmental variable WGETRC is set, Wget will try to load that file. Failing that, no further attempts will be made.

If WGETRC is not set, Wget will try to load $HOME/.wgetrc.

The fact that user's settings are loaded after the system-wide ones means that in case of collision user's wgetrc overrides the system-wide wgetrc (in /usr/local/etc/wgetrc by default). Fascist admins, away!


Next:  Wgetrc Commands,Previous:  Wgetrc Location,Up:  Startup File

6.2 Wgetrc Syntax

The syntax of a wgetrc command is simple:

     variable = value

The variable will also be called command. Valid values are different for different commands.

The commands are case-insensitive and underscore-insensitive. Thus ‘DIr__PrefiX’ is the same as ‘dirprefix’. Empty lines, lines beginning with ‘#’ and lines containing white-space only are discarded.

Commands that expect a comma-separated list will clear the list on an empty command. So, if you wish to reset the rejection list specified in global wgetrc, you can do it with:

     reject =
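
Putting these rules together, a small hypothetical ~/.wgetrc might read:

     # retry a few times, keep timestamps, skip bulky media
     tries = 3
     timestamping = on
     reject = mpg,mpeg,au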


Next:  Sample Wgetrc,Previous:  Wgetrc Syntax,Up:  Startup File

6.3 Wgetrc Commands

The complete set of commands is listed below. Legal values are listed after the ‘=’. Simple Boolean values can be set or unset using ‘on’ and ‘off’ or ‘1’ and ‘0’.

Some commands take pseudo-arbitrary values. address values can be hostnames or dotted-quad IP addresses. n can be any positive integer, or ‘inf’ for infinity, where appropriate. string values can be any non-empty string.

Most of these commands have direct command-line equivalents. Also, any wgetrc command can be specified on the command line using the ‘--execute’ switch (see Basic Startup Options).

accept/reject = string
Same as ‘ -A’/‘ -R’ (see Types of Files).
add_hostdir = on/off
Enable/disable host-prefixed file names. ‘ -nH’ disables it.
ask_password = on/off
Prompt for a password for each connection established. Cannot be specified when ‘--password’ is being used, because they are mutually exclusive. Equivalent to ‘--ask-password’.
auth_no_challenge = on/off
If this option is given, Wget will send Basic HTTP authentication information (plaintext username and password) for all requests. See ‘--auth-no-challenge’.
background = on/off
Enable/disable going to background—the same as ‘ -b’ (whichenables it).
backup_converted = on/off
Enable/disable saving pre-converted files with the suffix‘ .orig’—the same as ‘ -K’ (which enables it).
base = string
Consider relative urls in input files (specified via the ‘input’ command or the ‘--input-file’/‘-i’ option, together with ‘force_html’ or ‘--force-html’) as being relative to string—the same as ‘--base=string’.
bind_address = address
Bind to address, like the ‘ --bind-address=address’.
ca_certificate = file
Set the certificate authority bundle file to file. The sameas ‘ --ca-certificate=file’.
ca_directory = directory
Set the directory used for certificate authorities. The same as‘ --ca-directory=directory’.
cache = on/off
When set to off, disallow server-caching. See the ‘ --no-cache’option.
certificate = file
Set the client certificate file name to file. The same as‘ --certificate=file’.
certificate_type = string
Specify the type of the client certificate, legal values being‘ PEM’ (the default) and ‘ DER’ (aka ASN1). The same as‘ --certificate-type=string’.
check_certificate = on/off
If this is set to off, the server certificate is not checked againstthe specified client authorities. The default is “on”. The same as‘ --check-certificate’.
connect_timeout = n
Set the connect timeout—the same as ‘ --connect-timeout’.
content_disposition = on/off
Turn on recognition of the (non-standard) ‘ Content-Disposition’HTTP header—if set to ‘ on’, the same as ‘ --content-disposition’.
trust_server_names = on/off
If set to on, use the last component of a redirection URL for the localfile name.
continue = on/off
If set to on, force continuation of preexistent partially retrievedfiles. See ‘ -c’ before setting it.
convert_links = on/off
Convert non-relative links locally. The same as ‘ -k’.
cookies = on/off
When set to off, disallow cookies. See the ‘ --cookies’ option.
cut_dirs = n
Ignore n remote directory components. Equivalent to‘ --cut-dirs=n’.
debug = on/off
Debug mode, same as ‘ -d’.
default_page = string
Default page name—the same as ‘ --default-page=string’.
delete_after = on/off
Delete after download—the same as ‘ --delete-after’.
dir_prefix = string
Top of directory tree—the same as ‘ -P string’.
dirstruct = on/off
Turning dirstruct on or off—the same as ‘ -x’ or ‘ -nd’,respectively.
dns_cache = on/off
Turn DNS caching on/off. Since DNS caching is on by default, thisoption is normally used to turn it off and is equivalent to‘ --no-dns-cache’.
dns_timeout = n
Set the DNS timeout—the same as ‘ --dns-timeout’.
domains = string
Same as ‘ -D’ (see Spanning Hosts).
dot_bytes = n
Specify the number of bytes “contained” in a dot, as seen throughout the retrieval (1024 by default). You can postfix the value with ‘k’ or ‘m’, representing kilobytes and megabytes, respectively. With dot settings you can tailor the dot retrieval to suit your needs, or you can use the predefined styles (see Download Options).
dot_spacing = n
Specify the number of dots in a single cluster (10 by default).
dots_in_line = n
Specify the number of dots that will be printed in each line throughoutthe retrieval (50 by default).
egd_file = file
Use file as the EGD socket file name. The same as ‘--egd-file=file’.
exclude_directories = string
Specify a comma-separated list of directories you wish to exclude from download—the same as ‘-X string’ (see Directory-Based Limits).
exclude_domains = string
Same as ‘ --exclude-domains=string’ (see Spanning Hosts).
follow_ftp = on/off
Follow ftp links from html documents—the same as‘ --follow-ftp’.
follow_tags = string
Only follow certain html tags when doing a recursive retrieval,just like ‘ --follow-tags=string’.
force_html = on/off
If set to on, force the input filename to be regarded as an htmldocument—the same as ‘ -F’.
ftp_password = string
Set your ftp password to string. Without this setting, thepassword defaults to ‘ -wget@’, which is a useful default foranonymous ftp access.

This command used to be named passwd prior to Wget 1.10.

ftp_proxy = string
Use string as ftp proxy, instead of the one specified inenvironment.
ftp_user = string
Set ftp user to string.

This command used to be named login prior to Wget 1.10.

glob = on/off
Turn globbing on/off—the same as ‘ --glob’ and ‘ --no-glob’.
header = string
Define a header for HTTP downloads, like using‘ --header=string’.
adjust_extension = on/off
Add a ‘ .html’ extension to ‘ text/html’ or‘ application/xhtml+xml’ files that lack one, or a ‘ .css’extension to ‘ text/css’ files that lack one, like‘ -E’. Previously named ‘ html_extension’ (still acceptable,but deprecated).
http_keep_alive = on/off
Turn the keep-alive feature on or off (defaults to on). Turning itoff is equivalent to ‘ --no-http-keep-alive’.
http_password = string
Set http password, equivalent to‘ --http-password=string’.
http_proxy = string
Use string as http proxy, instead of the one specified inenvironment.
http_user = string
Set http user to string, equivalent to‘ --http-user=string’.
https_proxy = string
Use string as https proxy, instead of the one specified inenvironment.
ignore_case = on/off
When set to on, match files and directories case insensitively; thesame as ‘ --ignore-case’.
ignore_length = on/off
When set to on, ignore Content-Length header; the same as‘ --ignore-length’.
ignore_tags = string
Ignore certain html tags when doing a recursive retrieval, like‘ --ignore-tags=string’.
include_directories = string
Specify a comma-separated list of directories you wish to follow when downloading—the same as ‘-I string’.
iri = on/off
When set to on, enable internationalized URI (IRI) support; the same as‘ --iri’.
inet4_only = on/off
Force connecting to IPv4 addresses, off by default. You can put this in the global init file to disable Wget's attempts to resolve and connect to IPv6 hosts. Available only if Wget was compiled with IPv6 support. The same as ‘--inet4-only’ or ‘-4’.
inet6_only = on/off
Force connecting to IPv6 addresses, off by default. Available only ifWget was compiled with IPv6 support. The same as ‘ --inet6-only’or ‘ -6’.
input = file
Read the urls from file, like ‘-i file’.
keep_session_cookies = on/off
When specified, causes ‘ save_cookies = on’ to also save sessioncookies. See ‘ --keep-session-cookies’.
limit_rate = rate
Limit the download speed to no more than rate bytes per second. The same as ‘ --limit-rate=rate’.
load_cookies = file
Load cookies from file. See ‘ --load-cookiesfile’.
local_encoding = encoding
Force Wget to use encoding as the default system encoding. See‘ --local-encoding’.
logfile = file
Set logfile to file, the same as ‘ -o file’.
max_redirect = number
Specifies the maximum number of redirections to follow for a resource. See ‘ --max-redirect=number’.
mirror = on/off
Turn mirroring on/off. The same as ‘ -m’.
netrc = on/off
Turn reading netrc on or off.
no_clobber = on/off
Same as ‘ -nc’.
no_parent = on/off
Disallow retrieving outside the directory hierarchy, like‘ --no-parent’ (see Directory-Based Limits).
no_proxy = string
Use string as the comma-separated list of domains to avoid inproxy loading, instead of the one specified in environment.
output_document = file
Set the output filename—the same as ‘ -O file’.
page_requisites = on/off
Download all ancillary documents necessary for a single html page todisplay properly—the same as ‘ -p’.
passive_ftp = on/off
Change setting of passive ftp, equivalent to the‘ --passive-ftp’ option.
password = string
Specify password string for both ftp and http file retrieval. This command can be overridden using the ‘ ftp_password’ and‘ http_password’ command for ftp and http respectively.
post_data = string
Use POST as the method for all HTTP requests and send string inthe request body. The same as ‘ --post-data=string’.
post_file = file
Use POST as the method for all HTTP requests and send the contents of file in the request body. The same as‘ --post-file=file’.
prefer_family = none/IPv4/IPv6
When given a choice of several addresses, connect to the addresses with specified address family first. The address order returned by DNS is used without change by default. The same as ‘--prefer-family’, which see for a detailed discussion of why this is useful.
private_key = file
Set the private key file to file. The same as‘ --private-key=file’.
private_key_type = string
Specify the type of the private key, legal values being ‘PEM’ (the default) and ‘DER’ (aka ASN1). The same as ‘--private-key-type=string’.
progress = string
Set the type of the progress indicator. Legal types are ‘ dot’and ‘ bar’. Equivalent to ‘ --progress=string’.
protocol_directories = on/off
When set, use the protocol name as a directory component of local filenames. The same as ‘ --protocol-directories’.
proxy_password = string
Set proxy authentication password to string, like‘ --proxy-password=string’.
proxy_user = string
Set proxy authentication user name to string, like‘ --proxy-user=string’.
quiet = on/off
Quiet mode—the same as ‘ -q’.
quota = quota
Specify the download quota, which is useful to put in the global wgetrc. When download quota is specified, Wget will stop retrieving after the download sum has become greater than quota. The quota can be specified in bytes (default), kbytes (‘k’ appended) or mbytes (‘m’ appended). Thus ‘quota = 5m’ will set the quota to 5 megabytes. Note that the user's startup file overrides system settings.
random_file = file
Use file as a source of randomness on systems lacking /dev/random.
random_wait = on/off
Turn random between-request wait times on or off. The same as‘ --random-wait’.
read_timeout = n
Set the read (and write) timeout—the same as‘ --read-timeout=n’.
reclevel = n
Recursion level (depth)—the same as ‘ -l n’.
recursive = on/off
Recursive on/off—the same as ‘ -r’.
referer = string
Set HTTP ‘Referer:’ header just like ‘--referer=string’. (Note that it was the folks who wrote the http spec who got the spelling of “referrer” wrong.)
relative_only = on/off
Follow only relative links—the same as ‘ -L’ (see Relative Links).
remote_encoding = encoding
Force Wget to use encoding as the default remote server encoding. See ‘ --remote-encoding’.
remove_listing = on/off
If set to on, remove ftp listings downloaded by Wget. Setting itto off is the same as ‘ --no-remove-listing’.
restrict_file_names = unix/windows
Restrict the file names generated by Wget from URLs. See‘ --restrict-file-names’ for a more detailed description.
retr_symlinks = on/off
When set to on, retrieve symbolic links as if they were plain files; thesame as ‘ --retr-symlinks’.
retry_connrefused = on/off
When set to on, consider “connection refused” a transienterror—the same as ‘ --retry-connrefused’.
robots = on/off
Specify whether the norobots convention is respected by Wget, “on” bydefault. This switch controls both the /robots.txt and the‘ nofollow’ aspect of the spec. See Robot Exclusion, for moredetails about this. Be sure you know what you are doing before turningthis off.
save_cookies = file
Save cookies to file. The same as ‘ --save-cookiesfile’.
save_headers = on/off
Same as ‘ --save-headers’.
secure_protocol = string
Choose the secure protocol to be used. Legal values are ‘ auto’(the default), ‘ SSLv2’, ‘ SSLv3’, and ‘ TLSv1’. The sameas ‘ --secure-protocol=string’.
server_response = on/off
Choose whether or not to print the http and ftp serverresponses—the same as ‘ -S’.
show_all_dns_entries = on/off
When a DNS name is resolved, show all the IP addresses, not just the first three.
span_hosts = on/off
Same as ‘ -H’.
spider = on/off
Same as ‘ --spider’.
strict_comments = on/off
Same as ‘ --strict-comments’.
timeout = n
Set all applicable timeout values to n, the same as ‘ -Tn’.
timestamping = on/off
Turn timestamping on/off. The same as ‘ -N’ (see Time-Stamping).
use_server_timestamps = on/off
If set to ‘off’, Wget won't set the local file's timestamp by the one on the server (same as ‘--no-use-server-timestamps’).
tries = n
Set number of retries per url—the same as ‘ -tn’.
use_proxy = on/off
When set to off, don't use proxy even when proxy-related environmentvariables are set. In that case it is the same as using‘ --no-proxy’.
user = string
Specify username string for both ftp and http file retrieval. This command can be overridden using the ‘ ftp_user’ and‘ http_user’ command for ftp and http respectively.
user_agent = string
User agent identification sent to the HTTP Server—the same as‘ --user-agent=string’.
verbose = on/off
Turn verbose on/off—the same as ‘ -v’/‘ -nv’.
wait = n
Wait n seconds between retrievals—the same as ‘ -wn’.
wait_retry = n
Wait up to n seconds between retries of failed retrievals only—the same as ‘--waitretry=n’. Note that this is turned on by default in the global wgetrc.


Previous:  Wgetrc Commands,Up:  Startup File

6.4 Sample Wgetrc

This is the sample initialization file, as given in the distribution. It is divided in two sections—one for global usage (suitable for global startup file), and one for local usage (suitable for $HOME/.wgetrc). Be careful about the things you change.

Note that almost all the lines are commented out. For a command to have any effect, you must remove the ‘#’ character at the beginning of its line.

     ###
     ### Sample Wget initialization file .wgetrc
     ###
     
     ## You can use this file to change the default behaviour of wget or to
     ## avoid having to type many many command-line options. This file does
     ## not contain a comprehensive list of commands -- look at the manual
     ## to find out what you can put into this file.
     ##
     ## Wget initialization file can reside in /usr/local/etc/wgetrc
     ## (global, for all users) or $HOME/.wgetrc (for a single user).
     ##
     ## To use the settings in this file, you will have to uncomment them,
     ## as well as change them, in most cases, as the values on the
     ## commented-out lines are the default values (e.g. "off").
     
     
     ##
     ## Global settings (useful for setting up in /usr/local/etc/wgetrc).
     ## Think well before you change them, since they may reduce wget's
     ## functionality, and make it behave contrary to the documentation:
     ##
     
     # You can set retrieve quota for beginners by specifying a value
     # optionally followed by 'K' (kilobytes) or 'M' (megabytes).  The
     # default quota is unlimited.
     #quota = inf
     
     # You can lower (or raise) the default number of retries when
     # downloading a file (default is 20).
     #tries = 20
     
     # Lowering the maximum depth of the recursive retrieval is handy to
     # prevent newbies from going too "deep" when they unwittingly start
     # the recursive retrieval.  The default is 5.
     #reclevel = 5
     
     # By default Wget uses "passive FTP" transfer where the client
     # initiates the data connection to the server rather than the other
     # way around.  That is required on systems behind NAT where the client
     # computer cannot be easily reached from the Internet.  However, some
     # firewalls software explicitly supports active FTP and in fact has
     # problems supporting passive transfer.  If you are in such
     # environment, use "passive_ftp = off" to revert to active FTP.
     #passive_ftp = off
     
     # The "wait" command below makes Wget wait between every connection.
     # If, instead, you want Wget to wait only between retries of failed
     # downloads, set waitretry to maximum number of seconds to wait (Wget
     # will use "linear backoff", waiting 1 second after the first failure
     # on a file, 2 seconds after the second failure, etc. up to this max).
     #waitretry = 10
     
     
     ##
     ## Local settings (for a user to set in his $HOME/.wgetrc).  It is
     ## *highly* undesirable to put these settings in the global file, since
     ## they are potentially dangerous to "normal" users.
     ##
     ## Even when setting up your own ~/.wgetrc, you should know what you
     ## are doing before doing so.
     ##
     
     # Set this to on to use timestamping by default:
     #timestamping = off
     
     # It is a good idea to make Wget send your email address in a `From:'
     # header with your request (so that server administrators can contact
     # you in case of errors).  Wget does *not* send `From:' by default.
     #header = From: Your Name <username@site.domain>
     
     # You can set up other headers, like Accept-Language.  Accept-Language
     # is *not* sent by default.
     #header = Accept-Language: en
     
     # You can set the default proxies for Wget to use for http, https, and ftp.
     # They will override the value in the environment.
     #https_proxy = http://proxy.yoyodyne.com:18023/
     #http_proxy = http://proxy.yoyodyne.com:18023/
     #ftp_proxy = http://proxy.yoyodyne.com:18023/
     
     # If you do not want to use proxy at all, set this to off.
     #use_proxy = on
     
     # You can customize the retrieval outlook.  Valid options are default,
     # binary, mega and micro.
     #dot_style = default
     
     # Setting this to off makes Wget not download /robots.txt.  Be sure to
     # know *exactly* what /robots.txt is and how it is used before changing
     # the default!
     #robots = on
     
     # It can be useful to make Wget wait between connections.  Set this to
     # the number of seconds you want Wget to wait.
     #wait = 0
     
     # You can force creating directory structure, even if a single is being
     # retrieved, by setting this to on.
     #dirstruct = off
     
     # You can turn on recursive retrieving by default (don't do this if
     # you are not sure you know what it means) by setting this to on.
     #recursive = off
     
     # To always back up file X as X.orig before converting its links (due
     # to -k / --convert-links / convert_links = on having been specified),
     # set this variable to on:
     #backup_converted = off
     
     # To have Wget follow FTP links from HTML files by default, set this
     # to on:
     #follow_ftp = off
     
     # To try ipv6 addresses first:
     #prefer-family = IPv6
     
     # Set default IRI support state
     #iri = off
     
     # Force the default system encoding
     #locale = UTF-8
     
     # Force the default remote server encoding
     #remoteencoding = UTF-8


Next:  Various,Previous:  Startup File,Up:  Top

7 Examples

The examples are divided into three sections loosely based on their complexity.


Next:  Advanced Usage,Previous:  Examples,Up:  Examples

7.1 Simple Usage

  • Say you want to download a url. Just type:
              wget http://fly.srk.fer.hr/
    
  • But what will happen if the connection is slow, and the file is lengthy? The connection will probably fail before the whole file is retrieved, more than once. In this case, Wget will try getting the file until it either gets the whole of it, or exceeds the default number of retries (this being 20). It is easy to change the number of tries to 45, to ensure that the whole file will arrive safely:
              wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
    
  • Now let's leave Wget to work in the background, and write its progress to log file log. It is tiring to type ‘--tries’, so we shall use ‘-t’.
              wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
    

    The ampersand at the end of the line makes sure that Wget works in the background. To unlimit the number of retries, use ‘-t inf’.

  • The usage of ftp is just as simple. Wget will take care of login and password.
              wget ftp://gnjilux.srk.fer.hr/welcome.msg
    
  • If you specify a directory, Wget will retrieve the directory listing, parse it and convert it to html. Try:
              wget ftp://ftp.gnu.org/pub/gnu/
              links index.html
    



7.2 Advanced Usage
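
  • Suppose the urls you want to download are stored in a file, one per line. Wget can read them with the ‘-i’ option; here is a minimal sketch (the file name urls.txt is only a placeholder):
              wget -i urls.txt -o log

    If you specify ‘-’ as the file name, the urls will be read from standard input.
  • To fetch a single page together with the images and stylesheets it needs, and convert its links for local viewing, something along these lines works (the url is a placeholder):
              wget -p --convert-links http://www.example.com/dir/page.html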



7.3 Very Advanced Usage

  • If you wish Wget to keep a mirror of a page (or ftp subdirectories), use ‘--mirror’ (‘-m’), which is the shorthand for ‘-r -l inf -N’. You can put Wget in the crontab file asking it to recheck a site each Sunday:
              crontab
              0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog
    
  • In addition to the above, you want the links to be converted for local viewing. But, after having read this manual, you know that link conversion doesn't play well with timestamping, so you also want Wget to back up the original html files before the conversion. Wget invocation would look like this:
              wget --mirror --convert-links --backup-converted  \
                   http://www.gnu.org/ -o /home/me/weeklog
    
  • But you've also noticed that local viewing doesn't work all that well when html files are saved under extensions other than ‘.html’, perhaps because they were served as index.cgi. So you'd like Wget to rename all the files served with content-type ‘text/html’ or ‘application/xhtml+xml’ to name.html.
              wget --mirror --convert-links --backup-converted \
                   --html-extension -o /home/me/weeklog        \
                   http://www.gnu.org/
    

    Or, with less typing:

              wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog
    



8 Various

This chapter contains all the stuff that could not fit anywhere else.



8.1 Proxies

Proxies are special-purpose http servers designed to transfer data from remote servers to local clients. One typical use of proxies is lightening network load for users behind a slow connection. This is achieved by channeling all http and ftp requests through the proxy which caches the transferred data. When a cached resource is requested again, the proxy will return the data from its cache. Another use for proxies is for companies that separate (for security reasons) their internal networks from the rest of the Internet. In order to obtain information from the Web, their users connect and retrieve remote data using an authorized proxy.

Wget supports proxies for both http and ftp retrievals. The standard way to specify proxy location, which Wget recognizes, is using the following environment variables:

http_proxy
https_proxy
If set, the http_proxy and https_proxy variables should contain the urls of the proxies for http and https connections respectively.
ftp_proxy
This variable should contain the url of the proxy for ftp connections. It is quite common that http_proxy and ftp_proxy are set to the same url.
no_proxy
This variable should contain a comma-separated list of domain extensions the proxy should not be used for. For instance, if the value of no_proxy is ‘.mit.edu’, the proxy will not be used to retrieve documents from MIT.

In addition to the environment variables, proxy location and settings may be specified from within Wget itself.

--no-proxy
proxy = on/off
This option and the corresponding command may be used to suppress the use of proxy, even if the appropriate environment variables are set.
‘http_proxy = URL’
‘https_proxy = URL’
‘ftp_proxy = URL’
‘no_proxy = string’
These startup file variables allow you to override the proxy settings specified by the environment.

Some proxy servers require authorization to enable you to use them. The authorization consists of username and password, which must be sent by Wget. As with http authorization, several authentication schemes exist. For proxy authorization only the Basic authentication scheme is currently implemented.

You may specify your username and password either through the proxy url or through the command-line options. Assuming that the company's proxy is located at ‘proxy.company.com’ at port 8001, a proxy url location containing authorization data might look like this:

     http://hniksic:[email protected]:8001/

Alternatively, you may use the ‘proxy-user’ and ‘proxy-password’ options, and the equivalent .wgetrc settings proxy_user and proxy_password to set the proxy username and password.
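
For example, in a Bourne-style shell you might set the environment and run an authenticated retrieval like this (a sketch; the host names, port and credentials are placeholders):

     export http_proxy=http://proxy.yoyodyne.com:18023/
     export ftp_proxy=$http_proxy
     export no_proxy=.mit.edu
     wget --proxy-user=jsmith --proxy-password=secret http://www.example.com/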



8.2 Distribution

Like all GNU utilities, the latest version of Wget can be found at the master GNU archive site ftp.gnu.org, and its mirrors. For example, Wget 1.13.4 can be found at ftp://ftp.gnu.org/pub/gnu/wget/wget-1.13.4.tar.gz.
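
Naturally, you can use Wget itself to retrieve it:

     wget ftp://ftp.gnu.org/pub/gnu/wget/wget-1.13.4.tar.gz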



8.3 Web Site

The official web site for GNU Wget is at http://www.gnu.org/software/wget/. However, most useful information resides at “The Wget Wgiki”, http://wget.addictivecode.org/.



8.4 Mailing Lists

Primary List

The primary mailing list for discussion, bug-reports, or questions about GNU Wget is [email protected]. To subscribe, send an email to [email protected], or visit http://lists.gnu.org/mailman/listinfo/bug-wget.

You do not need to subscribe to send a message to the list; however, please note that unsubscribed messages are moderated, and may take a while before they hit the list—usually around a day. If you want your message to show up immediately, please subscribe to the list before posting. Archives for the list may be found at http://lists.gnu.org/pipermail/bug-wget/.

An NNTP/Usenettish gateway is also available via Gmane. You can see the Gmane archives at http://news.gmane.org/gmane.comp.web.wget.general. Note that the Gmane archives conveniently include messages from both the current list, and the previous one. Messages also show up in the Gmane archives sooner than they do at lists.gnu.org.

Bug Notices List

Additionally, there is the [email protected] mailing list. This is a non-discussion list that receives bug report notifications from the bug-tracker. To subscribe to this list, send an email to [email protected], or visit http://addictivecode.org/mailman/listinfo/wget-notify.

Obsolete Lists

Previously, the mailing list [email protected] was used as the main discussion list, and another list, [email protected], was used for submitting and discussing patches to GNU Wget.

Messages from [email protected] are archived at

  • http://www.mail-archive.com/wget%40sunsite.dk/ and at
  • http://news.gmane.org/gmane.comp.web.wget.general (which also continues to archive the current list, [email protected]).

Messages from [email protected] are archived at

  • http://news.gmane.org/gmane.comp.web.wget.patches.



8.5 Internet Relay Chat

In addition to the mailing lists, we also have a support channel set up via IRC at irc.freenode.org, #wget. Come check it out!



8.6 Reporting Bugs

You are welcome to submit bug reports via the GNU Wget bug tracker (see http://wget.addictivecode.org/BugTracker).

Before actually submitting a bug report, please try to follow a few simple guidelines.

  1. Please try to ascertain that the behavior you see really is a bug. If Wget crashes, it's a bug. If Wget does not behave as documented, it's a bug. If things work strangely, but you are not sure about the way they are supposed to work, it might well be a bug, but you might want to double-check the documentation and the mailing lists (see Mailing Lists).
  2. Try to repeat the bug in as simple circumstances as possible. E.g. if Wget crashes while downloading ‘wget -rl0 -kKE -t5 --no-proxy http://yoyodyne.com -o /tmp/log’, you should try to see if the crash is repeatable, and if it will occur with a simpler set of options. You might even try to start the download at the page where the crash occurred to see if that page somehow triggered the crash.

    Also, while I will probably be interested to know the contents of your .wgetrc file, just dumping it into the debug message is probably a bad idea. Instead, you should first try to see if the bug repeats with .wgetrc moved out of the way. Only if it turns out that .wgetrc settings affect the bug, mail me the relevant parts of the file.

  3. Please start Wget with the ‘-d’ option and send us the resulting output (or relevant parts thereof); see the example after this list. If Wget was compiled without debug support, recompile it—it is much easier to trace bugs with debug support on.

    Note: please make sure to remove any potentially sensitive information from the debug log before sending it to the bug address. The -d option won't go out of its way to collect sensitive information, but the log will contain a fairly complete transcript of Wget's communication with the server, which may include passwords and pieces of downloaded data. Since the bug address is publicly archived, you may assume that all bug reports are visible to the public.

  4. If Wget has crashed, try to run it in a debugger, e.g. gdb `which wget` core and type where to get the backtrace. This may not work if the system administrator has disabled core files, but it is safe to try.
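
As an illustration for the third point, a debug transcript can be captured like this (the url is a placeholder); strip passwords and other sensitive data from the log before posting it:

     wget -d -o wget-debug.log http://www.example.com/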



8.7 Portability

Like all GNU software, Wget works on the GNU system. However, since it uses GNU Autoconf for building and configuring, and mostly avoids using “special” features of any particular Unix, it should compile (and work) on all common Unix flavors.

Various Wget versions have been compiled and tested under many kinds of Unix systems, including GNU/Linux, Solaris, SunOS 4.x, Mac OS X, OSF (aka Digital Unix or Tru64), Ultrix, *BSD, IRIX, AIX, and others. Some of those systems are no longer in widespread use and may not be able to support recent versions of Wget. If Wget fails to compile on your system, we would like to know about it.

Thanks to kind contributors, this version of Wget compiles and works on 32-bit Microsoft Windows platforms. It has been compiled successfully using MS Visual C++ 6.0, Watcom, Borland C, and GCC compilers. Naturally, it lacks some features available on Unix, but it should work as a substitute for people stuck with Windows. Note that Windows-specific portions of Wget are not guaranteed to be supported in the future, although this has been the case in practice for many years now. All questions and problems in Windows usage should be reported to the Wget mailing list at [email protected] where the volunteers who maintain the Windows-related features might look at them.

Support for building on MS-DOS via DJGPP has been contributed by Gisle Vanem; a port to VMS is maintained by Steven Schweda, and is available at http://antinode.org/.



8.8 Signals

Since the purpose of Wget is background work, it catches the hangup signal (SIGHUP) and ignores it. If the output was on standard output, it will be redirected to a file named wget-log. Otherwise, SIGHUP is ignored. This is convenient when you wish to redirect the output of Wget after having started it.

     $ wget http://www.gnus.org/dist/gnus.tar.gz &
     ...
     $ kill -HUP %%
     SIGHUP received, redirecting output to `wget-log'.

Other than that, Wget will not try to interfere with signals in any way. C-c, kill -TERM and kill -KILL should kill it alike.



9 Appendices

This chapter contains some references I consider useful.



9.1 Robot Exclusion

It is extremely easy to make Wget wander aimlessly around a web site, sucking all the available data in the process. ‘wget -r site’, and you're set. Great? Not for the server admin.

As long as Wget is only retrieving static pages, and doing it at a reasonable rate (see the ‘--wait’ option), there's not much of a problem. The trouble is that Wget can't tell the difference between the smallest static page and the most demanding CGI. A site I know has a section handled by a CGI Perl script that converts Info files to html on the fly. The script is slow, but works well enough for human users viewing an occasional Info file. However, when someone's recursive Wget download stumbles upon the index page that links to all the Info files through the script, the system is brought to its knees without providing anything useful to the user (this task of converting Info files could be done locally, and access to Info documentation for all installed GNU software on a system is available from the info command).

To avoid this kind of accident, as well as to preserve privacy for documents that need to be protected from well-behaved robots, the concept of robot exclusion was invented. The idea is that the server administrators and document authors can specify which portions of the site they wish to protect from robots and those they will permit access to.

The most popular mechanism, and the de facto standard supported by all the major robots, is the “Robots Exclusion Standard” (RES) written by Martijn Koster et al. in 1994. It specifies the format of a text file containing directives that instruct the robots which URL paths to avoid. To be found by the robots, the specifications must be placed in /robots.txt in the server root, which the robots are expected to download and parse.
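
For illustration, a minimal /robots.txt that asks all robots to stay away from a server's CGI scripts might look like this (the path is an arbitrary example):

     User-agent: *
     Disallow: /cgi-bin/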

Although Wget is not a web robot in the strictest sense of the word, it can download large parts of the site without the user's intervention to download an individual page. Because of that, Wget honors RES when downloading recursively. For instance, when you issue:

     wget -r http://www.server.com/

First the index of ‘www.server.com’ will be downloaded. If Wget finds that it wants to download more documents from that server, it will request ‘http://www.server.com/robots.txt’ and, if found, use it for further downloads. robots.txt is loaded only once per server.

Until version 1.8, Wget supported the first version of the standard, written by Martijn Koster in 1994 and available at http://www.robotstxt.org/wc/norobots.html. As of version 1.8, Wget has supported the additional directives specified in the internet draft titled “A Method for Web Robots Control”. The draft, which has, as far as I know, never made it to an rfc, is available at http://www.robotstxt.org/wc/norobots-rfc.txt.

This manual no longer includes the text of the Robot Exclusion Standard.

The second, lesser-known mechanism enables the author of an individual document to specify whether they want the links from the file to be followed by a robot. This is achieved using the META tag, like this:

     <meta name="robots" content="nofollow">

This is explained in some detail at http://www.robotstxt.org/wc/meta-user.html. Wget supports this method of robot exclusion in addition to the usual /robots.txt exclusion.

If you know what you are doing and really really wish to turn off the robot exclusion, set the robots variable to ‘off’ in your .wgetrc. You can achieve the same effect from the command line using the -e switch, e.g. ‘wget -e robots=off url...’.
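
That is, the following line in your .wgetrc turns the exclusion off:

     robots = off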



9.2 Security Considerations

When using Wget, you must be aware that it sends unencrypted passwords through the network, which may present a security problem. Here are the main issues, and some solutions.

  1. The passwords on the command line are visible using ps. The best way around it is to use wget -i - and feed the urls to Wget's standard input, each on a separate line, terminated by C-d; see the example after this list. Another workaround is to use .netrc to store passwords; however, storing unencrypted passwords is also considered a security risk.
  2. Using the insecure basic authentication scheme, unencrypted passwords are transmitted through the network routers and gateways.
  3. The ftp passwords are also in no way encrypted. There is no good solution for this at the moment.
  4. Although the “normal” output of Wget tries to hide the passwords, debugging logs show them, in all forms. This problem is avoided by being careful when you send debug logs (yes, even when you send them to me).
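
For example, the following keeps the password out of the ps listing by passing the url on standard input (user, password, host and file name are placeholders):

     echo 'ftp://user:[email protected]/file' | wget -i -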



9.3 Contributors

GNU Wget was written by Hrvoje Niksic <[email protected]>.

However, the development of Wget could never have gone as far as it has, were it not for the help of many people, either with bug reports, feature proposals, patches, or letters saying “Thanks!”.

Special thanks go to the following people (in no particular order):

  • Dan Harkless—contributed a lot of code and documentation of extremely high quality, as well as the --page-requisites and related options. He was the principal maintainer for some time and released Wget 1.6.
  • Ian Abbott—contributed bug fixes, Windows-related fixes, and provided a prototype implementation of the breadth-first recursive download. Co-maintained Wget during the 1.8 release cycle.
  • The dotsrc.org crew, in particular Karsten Thygesen—donated system resources such as the mailing list, web space, ftp space, and version control repositories, along with a lot of time to make these actually work. Christian Reiniger was of invaluable help with setting up Subversion.
  • Heiko Herold—provided high-quality Windows builds and contributed bug and build reports for many years.
  • Shawn McHorse—bug reports and patches.
  • Kaveh R. Ghazi—on-the-fly ansi2knr-ization. Lots of portability fixes.
  • Gordon Matzigkeit—.netrc support.
  • Zlatko Calusic, Tomislav Vujec and Drazen Kacar—feature suggestions and “philosophical” discussions.
  • Darko Budor—initial port to Windows.
  • Antonio Rosella—help and suggestions, plus the initial Italian translation.
  • Tomislav Petrovic, Mario Mikocevic—many bug reports and suggestions.
  • Francois Pinard—many thorough bug reports and discussions.
  • Karl Eichwalder—lots of help with internationalization, Makefile layout and many other things.
  • Junio Hamano—donated support for Opie and http Digest authentication.
  • Mauro Tortonesi—improved IPv6 support, adding support for dual family systems. Refactored and enhanced FTP IPv6 code. Maintained GNU Wget from 2004–2007.
  • Christopher G. Lewis—maintenance of the Windows version of GNU Wget.
  • Gisle Vanem—many helpful patches and improvements, especially for Windows and MS-DOS support.
  • Ralf Wildenhues—contributed patches to convert Wget to use Automake as part of its build process, and various bugfixes.
  • Steven Schubiger—many helpful patches, bugfixes and improvements. Notably, conversion of Wget to use the Gnulib quotes and quoteargs modules, and the addition of password prompts at the console, via the Gnulib getpasswd-gnu module.
  • Ted Mielczarek—donated support for CSS.
  • Saint Xavier—Support for IRIs (RFC 3987).
  • People who provided donations for development—including Brian Gough.

The following people have provided patches, bug/build reports, useful suggestions, beta testing services, fan mail and all the other things that make maintenance so much fun:

Tim Adam, Adrian Aichner, Martin Baehr, Dieter Baron, Roger Beeman, Dan Berger, T. Bharath, Christian Biere, Paul Bludov, Daniel Bodea, Mark Boyns, John Burden, Julien Buty, Wanderlei Cavassin, Gilles Cedoc, Tim Charron, Noel Cragg, Kristijan Conkas, John Daily, Andreas Damm, Ahmon Dancy, Andrew Davison, Bertrand Demiddelaer, Alexander Dergachev, Andrew Deryabin, Ulrich Drepper, Marc Duponcheel, Damir Dzeko, Alan Eldridge, Hans-Andreas Engel, Aleksandar Erkalovic, Andy Eskilsson, Joao Ferreira, Christian Fraenkel, David Fritz, Mike Frysinger, Charles C. Fu, FUJISHIMA Satsuki, Masashi Fujita, Howard Gayle, Marcel Gerrits, Lemble Gregory, Hans Grobler, Alain Guibert, Mathieu Guillaume, Aaron Hawley, Jochen Hein, Karl Heuer, Madhusudan Hosaagrahara, HIROSE Masaaki, Ulf Harnhammar, Gregor Hoffleit, Erik Magnus Hulthen, Richard Huveneers, Jonas Jensen, Larry Jones, Simon Josefsson, Mario Juric, Hack Kampbjorn, Const Kaplinsky, Goran Kezunovic, Igor Khristophorov, Robert Kleine, KOJIMA Haime, Fila Kolodny, Alexander Kourakos, Martin Kraemer, Sami Krank, Jay Krell, Simos KSenitellis, Christian Lackas, Hrvoje Lacko, Daniel S. Lewart, Nicolas Lichtmeier, Dave Love, Alexander V. Lukyanov, Thomas Lussnig, Andre Majorel, Aurelien Marchand, Matthew J. Mellon, Jordan Mendelson, Ted Mielczarek, Robert Millan, Lin Zhe Min, Jan Minar, Tim Mooney, Keith Moore, Adam D. Moss, Simon Munton, Charlie Negyesi, R. K. Owen, Jim Paris, Kenny Parnell, Leonid Petrov, Simone Piunno, Andrew Pollock, Steve Pothier, Jan Prikryl, Marin Purgar, Csaba Raduly, Keith Refson, Bill Richardson, Tyler Riddle, Tobias Ringstrom, Jochen Roderburg, Juan Jose Rodriguez, Maciej W. Rozycki, Edward J. Sabol, Heinz Salzmann, Robert Schmidt, Nicolas Schodet, Benno Schulenberg, Andreas Schwab, Steven M. Schweda, Chris Seawood, Pranab Shenoy, Dennis Smit, Toomas Soome, Tage Stabell-Kulo, Philip Stadermann, Daniel Stenberg, Sven Sternberger, Markus Strasser, John Summerfield, Szakacsits Szabolcs, Mike Thomas, Philipp Thomas, Mauro Tortonesi, Dave Turner, Gisle Vanem, Rabin Vincent, Russell Vincent, Zeljko Vrba, Charles G Waldman, Douglas E. Wegscheid, Ralf Wildenhues, Joshua David Williams, Benjamin Wolsey, Saint Xavier, YAMAZAKI Makoto, Jasmin Zainul, Bojan Zdrnja, Kristijan Zimmer, Xin Zou.

Apologies to all who I accidentally left out, and many thanks to all the subscribers of the Wget mailing list.



Appendix A Copying this manual



A.1 GNU Free Documentation License

Version 1.3, 3 November 2008
     Copyright © 2000, 2001, 2002, 2007, 2008, 2009, 2010, 2011
     Free Software Foundation, Inc.
     http://fsf.org/
     
     Everyone is permitted to copy and distribute verbatim copies
     of this license document, but changing it is not allowed.
  1. PREAMBLE

    The purpose of this License is to make a manual, textbook, or other functional and useful document free in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

    This License is a kind of “copyleft”, which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

    We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

  2. APPLICABILITY AND DEFINITIONS

    This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The “Document”, below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as “you”. You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

    A “Modified Version” of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

    A “Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

    The “Invariant Sections” are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

    The “Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

    A “Transparent” copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not “Transparent” is called “Opaque”.

    Examples of suitable formats for Transparent copies include plain ascii without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

    The “Title Page” means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, “Title Page” means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

    The “publisher” means any person or entity that distributes copies of the Document to the public.

    A section “Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To “Preserve the Title” of such a section when you modify the Document means that it remains a section “Entitled XYZ” according to this definition.

    The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

  3. VERBATIM COPYING

    You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

    You may also lend copies, under the same conditions stated above, and you may publicly display copies.

  4. COPYING IN QUANTITY

    If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

    If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

    If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

    It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

  5. MODIFICATIONS

    You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

    1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
    2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
    3. State on the Title page the name of the publisher of the Modified Version, as the publisher.
    4. Preserve all the copyright notices of the Document.
    5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
    6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
    7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
    8. Include an unaltered copy of this License.
    9. Preserve the section Entitled “History”, Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled “History” in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
    10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the “History” section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
    11. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
    12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
    13. Delete any section Entitled “Endorsements”. Such a section may not be included in the Modified Version.
    14. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in title with any Invariant Section.
    15. Preserve any Warranty Disclaimers.

    If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

    You may add a section Entitled “Endorsements”, provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

    You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

    The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

  6. COMBINING DOCUMENTS

    You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

    The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

    In the combination, you must combine any sections Entitled “History” in the various original documents, forming one section Entitled “History”; likewise combine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete all sections Entitled “Endorsements.”

  7. COLLECTIONS OF DOCUMENTS

    You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

    You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

  8. AGGREGATION WITH INDEPENDENT WORKS

    A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an “aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

    If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

  9. TRANSLATION

    Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

    If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “History”, the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

  10. TERMINATION

    You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.

    However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

    Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

    Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.

  11. FUTURE REVISIONS OF THIS LICENSE

    The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

    Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License “or any later version” applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Document.

  12. RELICENSING

    “Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means any set of copyrightable works thus published on the MMC site.

    “CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

    “Incorporate” means to publish or republish a Document, in whole or in part, as part of another Document.

    An MMC is “eligible for relicensing” if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

    The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

       Copyright (C)  year  your name.
       Permission is granted to copy, distribute and/or modify this document
       under the terms of the GNU Free Documentation License, Version 1.3
       or any later version published by the Free Software Foundation;
       with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
       Texts.  A copy of the license is included in the section entitled ``GNU
       Free Documentation License''.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the “with...Texts.” line with this:

         with the Invariant Sections being list their titles, with
         the Front-Cover Texts being list, and with the Back-Cover Texts
         being list.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.



Concept Index

  • #wget: Internet Relay Chat
  • .css extension: HTTP Options
  • .html extension: HTTP Options
  • .listing files, removing: FTP Options
  • .netrc: Startup File
  • .wgetrc: Startup File
  • accept directories: Directory-Based Limits
  • accept suffixes: Types of Files
  • accept wildcards: Types of Files
  • append to log: Logging and Input File Options
  • arguments: Invoking
  • authentication: HTTP Options
  • authentication: Download Options
  • backing up converted files: Recursive Retrieval Options
  • bandwidth, limit: Download Options
  • base for relative links in input file: Logging and Input File Options
  • bind address: Download Options
  • bug reports: Reporting Bugs
  • bugs: Reporting Bugs
  • cache: HTTP Options
  • caching of DNS lookups: Download Options
  • case fold: Recursive Accept/Reject Options
  • client IP address: Download Options
  • clobbering, file: Download Options
  • command line: Invoking
  • comments, html: Recursive Retrieval Options
  • connect timeout: Download Options
  • Content-Disposition: HTTP Options
  • Content-Length, ignore: HTTP Options
  • continue retrieval: Download Options
  • contributors: Contributors
  • conversion of links: Recursive Retrieval Options
  • cookies: HTTP Options
  • cookies, loading: HTTP Options
  • cookies, saving: HTTP Options
  • cookies, session: HTTP Options
  • cut directories: Directory Options
  • debug: Logging and Input File Options
  • default page name: HTTP Options
  • delete after retrieval: Recursive Retrieval Options
  • directories: Directory-Based Limits
  • directories, exclude: Directory-Based Limits
  • directories, include: Directory-Based Limits
  • directory limits: Directory-Based Limits
  • directory prefix: Directory Options
  • DNS cache: Download Options
  • DNS timeout: Download Options
  • dot style: Download Options
  • downloading multiple times: Download Options
  • EGD: HTTPS (SSL/TLS) Options
  • entropy, specifying source of: HTTPS (SSL/TLS) Options
  • examples: Examples
  • exclude directories: Directory-Based Limits
  • execute wgetrc command: Basic Startup Options
  • FDL, GNU Free Documentation License: GNU Free Documentation License
  • features: Overview
  • file names, restrict: Download Options
  • filling proxy cache: Recursive Retrieval Options
  • follow FTP links: Recursive Accept/Reject Options
  • following ftp links: FTP Links
  • following links: Following Links
  • force html: Logging and Input File Options
  • ftp authentication: FTP Options
  • ftp password: FTP Options
  • ftp time-stamping: FTP Time-Stamping Internals
  • ftp user: FTP Options
  • globbing, toggle: FTP Options
  • hangup: Signals
  • header, add: HTTP Options
  • hosts, spanning: Spanning Hosts
  • html comments: Recursive Retrieval Options
  • http password: HTTP Options
  • http referer: HTTP Options
  • http time-stamping: HTTP Time-Stamping Internals
  • http user: HTTP Options
  • idn support: Download Options
  • ignore case: Recursive Accept/Reject Options
  • ignore length: HTTP Options
  • include directories: Directory-Based Limits
  • incomplete downloads: Download Options
  • incremental updating: Time-Stamping
  • index.html: HTTP Options
  • input-file: Logging and Input File Options
  • Internet Relay Chat: Internet Relay Chat
  • invoking: Invoking
  • IP address, client: Download Options
  • IPv6: Download Options
  • IRC: Internet Relay Chat
  • iri support: Download Options
  • Keep-Alive, turning off: HTTP Options
  • latest version: Distribution
  • limit bandwidth: Download Options
  • link conversion: Recursive Retrieval Options
  • links: Following Links
  • list: Mailing Lists
  • loading cookies: HTTP Options
  • local encoding: Download Options
  • location of wgetrc: Wgetrc Location
  • log file: Logging and Input File Options
  • mailing list: Mailing Lists
  • mirroring: Very Advanced Usage
  • no parent: Directory-Based Limits
  • no-clobber: Download Options
  • nohup: Invoking
  • number of retries: Download Options
  • operating systems: Portability
  • option syntax: Option Syntax
  • output file: Logging and Input File Options
  • overview: Overview
  • page requisites: Recursive Retrieval Options
  • passive ftp: FTP Options
  • password: Download Options
  • pause: Download Options
  • Persistent Connections, disabling: HTTP Options
  • portability: Portability
  • POST: HTTP Options
  • progress indicator: Download Options
  • proxies: Proxies
  • proxy: HTTP Options
  • proxy: Download Options
  • proxy authentication: HTTP Options
  • proxy filling: Recursive Retrieval Options
  • proxy password: HTTP Options
  • proxy user: HTTP Options
  • quiet: Logging and Input File Options
  • quota: Download Options
  • random wait: Download Options
  • randomness, specifying source of: HTTPS (SSL/TLS) Options
  • rate, limit: Download Options
  • read timeout: Download Options
  • recursion: Recursive Download
  • recursive download: Recursive Download
  • redirect: HTTP Options
  • redirecting output: Advanced Usage
  • referer, http: HTTP Options
  • reject directories: Directory-Based Limits
  • reject suffixes: Types of Files
  • reject wildcards: Types of Files
  • relative links: Relative Links
  • remote encoding: Download Options
  • reporting bugs: Reporting Bugs
  • required images, downloading: Recursive Retrieval Options
  • resume download: Download Options
  • retries: Download Options
  • retries, waiting between: Download Options
  • retrieving: Recursive Download
  • robot exclusion: Robot Exclusion
  • robots.txt: Robot Exclusion
  • sample wgetrc: Sample Wgetrc
  • saving cookies: HTTP Options
  • security: Security Considerations
  • server maintenance: Robot Exclusion
  • server response, print: Download Options
  • server response, save: HTTP Options
  • session cookies: HTTP Options
  • signal handling: Signals
  • spanning hosts: Spanning Hosts
  • specify config: Logging and Input File Options
  • spider: Download Options
  • SSL: HTTPS (SSL/TLS) Options
  • SSL certificate: HTTPS (SSL/TLS) Options
  • SSL certificate authority:HTTPS (SSL/TLS) Options
  • SSL certificate type, specify:HTTPS (SSL/TLS) Options
  • SSL certificate, check: HTTPS (SSL/TLS) Options
  • SSL protocol, choose: HTTPS (SSL/TLS) Options
  • startup: Startup File
  • startup file: Startup File
  • suffixes, accept: Types of Files
  • suffixes, reject: Types of Files
  • symbolic links, retrieving: FTP Options
  • syntax of options: Option Syntax
  • syntax of wgetrc: Wgetrc Syntax
  • tag-based recursive pruning: Recursive Accept/Reject Options
  • time-stamping: Time-Stamping
  • time-stamping usage: Time-Stamping Usage
  • timeout: Download Options
  • timeout, connect: Download Options
  • timeout, DNS: Download Options
  • timeout, read: Download Options
  • timestamping: Time-Stamping
  • tries: Download Options
  • Trust server names: HTTP Options
  • types of files: Types of Files
  • unlink: Download Options
  • updating the archives: Time-Stamping
  • URL: URL Format
  • URL syntax: URL Format
  • usage, time-stamping: Time-Stamping Usage
  • user: Download Options
  • user-agent: HTTP Options
  • various: Various
  • verbose: Logging and Input File Options
  • wait: Download Options
  • wait, random: Download Options
  • waiting between retries: Download Options
  • web site: Web Site
  • Wget as spider: Download Options
  • wgetrc: Startup File
  • wgetrc commands: Wgetrc Commands
  • wgetrc location: Wgetrc Location
  • wgetrc syntax: Wgetrc Syntax
  • wildcards, accept: Types of Files
  • wildcards, reject: Types of Files
  • Windows file names: Download Options


Footnotes

[1] If you have a .netrc file in your home directory, the password will also be searched for there.

[2] As an additional check, Wget will look at the Content-Length header, and compare the sizes; if they are not the same, the remote file will be downloaded no matter what the time-stamp says.

