This file documents the GNU Wget utility for downloading network data.

Copyright © 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

GNU Wget is a free utility for non-interactive download of files from the Web. It supports http, https, and ftp protocols, as well as retrieval through http proxies.
This chapter is a partial overview of Wget's features.
By default, Wget is very simple to invoke. The basic syntax is:
wget [option]... [URL]...
Wget will simply download all the URLs specified on the command line. URL is a Uniform Resource Locator, as defined below.

However, you may wish to change some of the default parameters of Wget. You can do it two ways: permanently, adding the appropriate command to .wgetrc (see Startup File), or specifying it on the command line.

URL is an acronym for Uniform Resource Locator. A uniform resource locator is a compact string representation for a resource available via the Internet. Wget recognizes the URL syntax as per RFC 1738. This is the most widely used form (square brackets denote optional parts):
http://host[:port]/directory/file
ftp://host[:port]/directory/file

You can also encode your username and password within a URL:

ftp://user:password@host/path
http://user:password@host/path
Either user or password, or both, may be left out. If you leave out either the http username or password, no authentication will be sent. If you leave out the ftp username, ‘anonymous’ will be used. If you leave out the ftp password, your email address will be supplied as a default password.

Important Note: if you specify a password-containing URL on the command line, the username and password will be plainly visible to all users on the system, by way of ps. On multi-user systems, this is a big security risk. To work around it, use wget -i - and feed the URLs to Wget's standard input, each on a separate line, terminated by C-d.
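For example, a minimal sketch of doing this from a shell script (example.com is a placeholder host) is to feed the URL through a here-document, which keeps the password off the process list:

wget -i - <<'EOF'
ftp://user:password@example.com/path
EOF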
You can encode unsafe characters in a URL as ‘%xy’, xy being the hexadecimal representation of the character's ASCII value. Some common unsafe characters include ‘%’ (quoted as ‘%25’), ‘:’ (quoted as ‘%3A’), and ‘@’ (quoted as ‘%40’). Refer to RFC 1738 for a comprehensive list of unsafe characters.

Wget also supports the type feature for FTP URLs. By default, FTP documents are retrieved in the binary mode (type ‘i’), which means that they are downloaded unchanged. Another useful mode is the ‘a’ (ASCII) mode, which converts the line delimiters between the different operating systems, and is thus useful for text files. Here is an example:
ftp://host/directory/file;type=a
Two alternative variants of URL specification are also supported, because of historical (hysterical?) reasons and their widespread use.

ftp-only syntax (supported by NcFTP):
host:/dir/file
http-only syntax (introduced by Netscape):
host[:port]/dir/file
These two alternative forms are deprecated, and may cease being supported in the future.

If you do not understand the difference between these notations, or do not know which one to use, just use the plain ordinary format you use with your favorite browser, like Lynx or Netscape.

Since Wget uses GNU getopt to process command-line arguments, every option has a long form along with the short one. Long options are more convenient to remember, but take time to type. You may freely mix different option styles, or specify options after the command-line arguments. Thus you may write:
wget -r --tries=10 http://fly.srk.fer.hr/ -o log
The space between the option accepting an argument and the argument may be omitted. Instead of ‘-o log’ you can write ‘-olog’.

You may put several options that do not require arguments together, like:
wget -drc URL
This is completely equivalent to:
wget -d -r -c URL
Since the options can be specified after the arguments, you may terminate them with ‘--’. So the following will try to download URL ‘-x’, reporting failure to log:
wget -o log -- -x
The options that accept comma-separated lists all respect the convention that specifying an empty list clears its value. This can be useful to clear the .wgetrc settings. For instance, if your .wgetrc sets exclude_directories to /cgi-bin, the following example will first reset it, and then set it to exclude /~nobody and /~somebody. You can also clear the lists in .wgetrc (see Wgetrc Syntax).
wget -X '' -X /~nobody,/~somebody
Most options that do not accept arguments are boolean options, so named because their state can be captured with a yes-or-no (“boolean”) variable. For example, ‘--follow-ftp’ tells Wget to follow FTP links from HTML files and, on the other hand, ‘--no-glob’ tells it not to perform file globbing on FTP URLs. A boolean option is either affirmative or negative (beginning with ‘--no’). All such options share several properties.

Unless stated otherwise, it is assumed that the default behavior is the opposite of what the option accomplishes. For example, the documented existence of ‘--follow-ftp’ assumes that the default is to not follow FTP links from HTML pages.

Affirmative options can be negated by prepending the ‘--no-’ to the option name; negative options can be negated by omitting the ‘--no-’ prefix. This might seem superfluous—if the default for an affirmative option is to not do something, then why provide a way to explicitly turn it off? But the startup file may in fact change the default. For instance, using follow_ftp = on in .wgetrc makes Wget follow FTP links by default, and using ‘--no-follow-ftp’ is the only way to restore the factory default from the command line.
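For instance, a minimal sketch of this interplay (example.com is a placeholder host): with the line

follow_ftp = on

in .wgetrc, a later invocation such as

wget --no-follow-ftp -r http://example.com/

restores the factory default for that run only.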
If this function is used, no URLs need be present on the command line. If there are URLs both on the command line and in an input file, those on the command lines will be the first ones to be retrieved. If ‘--force-html’ is not specified, then file should consist of a series of URLs, one per line.

However, if you specify ‘--force-html’, the document will be regarded as ‘html’. In that case you may have problems with relative links, which you can solve either by adding <base href="url"> to the documents or by specifying ‘--base=url’ on the command line.

If the file is an external one, the document will be automatically treated as ‘html’ if the Content-Type matches ‘text/html’. Furthermore, the file's location will be implicitly used as base href if none was specified.
Specifying ‘--base=url’ is equivalent to adding <base href="url"> to the HTML input file, or to the presence of a BASE tag in that file, with URL as the value for the href attribute.
For instance, if you specify ‘http://foo/bar/a.html’ for URL, and Wget reads ‘../baz/b.html’ from the input file, it would be resolved to ‘http://foo/baz/b.html’.
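As a sketch of the same idea on the command line, using the bookmarks.html file and the base URL from the examples in this section:

wget --force-html -i bookmarks.html --base=http://foo/bar/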
Use of ‘-O’ is not intended to mean simply “use the name file instead of the one in the URL;” rather, it is analogous to shell redirection: ‘wget -O file http://foo’ is intended to work like ‘wget -O - http://foo > file’; file will be truncated immediately, and all downloaded content will be written there.

For this reason, ‘-N’ (for timestamp-checking) is not supported in combination with ‘-O’: since file is always newly created, it will always have a very new timestamp. A warning will be issued if this combination is used.

Similarly, using ‘-r’ or ‘-p’ with ‘-O’ may not work as you expect: Wget won't just download the first file to file and then download the rest to their normal names: all downloaded content will be placed in file. This was disabled in version 1.11, but has been reinstated (with a warning) in 1.11.2, as there are some cases where this behavior can actually have some use.

Note that a combination with ‘-k’ is only permitted when downloading a single document, as in that case it will just convert all relative URIs to external ones; ‘-k’ makes no sense for multiple URIs when they're all being downloaded to a single file; ‘-k’ can be used only when the output is a regular file.
When running Wget without ‘-N’, ‘-nc’, ‘-r’, or ‘-p’, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named ‘file.1’. If that file is downloaded yet again, the third copy will be named ‘file.2’, and so on. (This is also the behavior with ‘-nd’, even if ‘-r’ or ‘-p’ are in effect.) When ‘-nc’ is specified, this behavior is suppressed, and Wget will refuse to download newer copies of ‘file’. Therefore, “no-clobber” is actually a misnomer in this mode—it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.

When running Wget with ‘-r’ or ‘-p’, but without ‘-N’, ‘-nd’, or ‘-nc’, re-downloading a file will result in the new copy simply overwriting the old. Adding ‘-nc’ will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

When running Wget with ‘-N’, with or without ‘-r’ or ‘-p’, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file (see Time-Stamping). ‘-nc’ may not be specified at the same time as ‘-N’.

Note that when ‘-nc’ is specified, files with the suffixes ‘.html’ or ‘.htm’ will be loaded from the local disk and parsed as if they had been retrieved from the Web.
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
If there is a file named ls-lR.Z in the current directory, Wget will assume that it is the first portion of the remote file, and will ask the server to continue the retrieval from an offset equal to the length of the local file.

Note that you don't need to specify this option if you just want the current invocation of Wget to retry downloading a file should the connection be lost midway through. This is the default behavior. ‘-c’ only affects resumption of downloads started prior to this invocation of Wget, and whose local files are still sitting around.

Without ‘-c’, the previous example would just download the remote file to ls-lR.Z.1, leaving the truncated ls-lR.Z file alone.

Beginning with Wget 1.7, if you use ‘-c’ on a non-empty file, and it turns out that the server does not support continued downloading, Wget will refuse to start the download from scratch, which would effectively ruin existing contents. If you really want the download to start from scratch, remove the file.

Also beginning with Wget 1.7, if you use ‘-c’ on a file which is of equal size as the one on the server, Wget will refuse to download the file and print an explanatory message. The same happens when the file is smaller on the server than locally (presumably because it was changed on the server since your last download attempt)—because “continuing” is not meaningful, no download occurs.
On the other side of the coin, while using ‘-c’, any file that's bigger on the server than locally will be considered an incomplete download and only (length(remote) - length(local)) bytes will be downloaded and tacked onto the end of the local file. This behavior can be desirable in certain cases—for instance, you can use ‘wget -c’ to download just the new portion that's been appended to a data collection or log file.

However, if the file is bigger on the server because it's been changed, as opposed to just appended to, you'll end up with a garbled file. Wget has no way of verifying that the local file is really a valid prefix of the remote file. You need to be especially careful of this when using ‘-c’ in conjunction with ‘-r’, since every file will be considered as an "incomplete download" candidate.

Another instance where you'll get a garbled file if you try to use ‘-c’ is if you have a lame http proxy that inserts a “transfer interrupted” string into the local file. In the future a “rollback” option may be added to deal with this case.

Note that ‘-c’ only works with ftp servers and with http servers that support the Range header.

The “bar” indicator is used by default. It draws an ASCII progress bar graphics (a.k.a. “thermometer” display) indicating the status of retrieval. If the output is not a TTY, the “dot” bar will be used by default.
Use ‘--progress=dot’ to switch to the “dot” display. It traces the retrieval by printing dots on the screen, each dot representing a fixed amount of downloaded data.

When using the dotted retrieval, you may also set the style by specifying the type as ‘dot:style’. Different styles assign different meaning to one dot. With the default style each dot represents 1K, there are ten dots in a cluster and 50 dots in a line. The binary style has a more “computer”-like orientation—8K dots, 16-dots clusters and 48 dots per line (which makes for 384K lines). The mega style is suitable for downloading very large files—each dot represents 64K retrieved, there are eight dots in a cluster, and 48 dots on each line (so each line contains 3M).

Note that you can set the default style using the progress command in .wgetrc. That setting may be overridden from the command line. The exception is that, when the output is not a TTY, the “dot” progress will be favored over “bar”. To force the bar output, use ‘--progress=bar:force’.
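For example, to pick the mega dot style for a large download (the URL is reused from the earlier example):

wget --progress=dot:mega http://fly.srk.fer.hr/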
By default, when a file is downloaded, its timestamps are set to match those from the remote file. This allows the use of ‘--timestamping’ on subsequent invocations of wget. However, it is sometimes useful to base the local file's timestamp on when it was actually downloaded; for that purpose, the ‘--no-use-server-timestamps’ option has been provided.
wget --spider --force-html -i bookmarks.html
This feature needs much more work for Wget to get close to the functionality of real web spiders.

When interacting with the network, Wget can check for timeout and abort the operation if it takes too long. This prevents anomalies like hanging reads and infinite connects. The only timeout enabled by default is a 900-second read timeout. Setting a timeout to 0 disables it altogether. Unless you know what you are doing, it is best not to change the default timeout settings.

All timeout-related options accept decimal values, as well as subsecond values. For example, ‘0.1’ seconds is a legal (though unwise) choice of timeout. Subsecond timeouts are useful for checking server response times or for testing network latency.

Of course, the remote server may choose to terminate the connection sooner than this option requires. The default read timeout is 900 seconds.
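A minimal sketch, assuming example.com as a placeholder host, that shortens the read timeout and limits retries:

wget --read-timeout=60 --tries=3 http://example.com/file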
This option allows the use of decimal numbers, usually in conjunction with power suffixes; for example, ‘--limit-rate=2.5k’ is a legal value.

Note that Wget implements the limiting by sleeping the appropriate amount of time after a network read that took less time than specified by the rate. Eventually this strategy causes the TCP transfer to slow down to approximately the specified rate. However, it may take some time for this balance to be achieved, so don't be surprised if limiting the rate doesn't work well with very small files.
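For example, to cap the transfer rate at roughly 200 kilobytes per second (the URL is a placeholder):

wget --limit-rate=200k http://example.com/big-file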
The time may be specified in seconds, in minutes using the ‘m’ suffix, in hours using the ‘h’ suffix, or in days using the ‘d’ suffix.

Specifying a large value for this option is useful if the network or the destination host is down, so that Wget can wait long enough to reasonably expect the network error to be fixed before the retry. The waiting interval specified by this function is influenced by --random-wait, which see.
By default, Wget will assume a value of 10 seconds.
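A polite recursive download might therefore combine these options like this (example.com is a placeholder host):

wget -r --wait=2 --random-wait --waitretry=10 http://example.com/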
A 2001 article in a publication devoted to development on a popular consumer platform provided code to perform this analysis on the fly. Its author suggested blocking at the class C address level to ensure automated retrieval programs were blocked despite changing DHCP-supplied addresses.

The ‘--random-wait’ option was inspired by this ill-advised recommendation to block many unrelated users from a web site due to the actions of one.
This turns off the use of proxies, even if the appropriate *_proxy environment variable is defined.

For more information about the use of proxies with Wget, see Proxies.
Note that quota will never affect downloading a single file. So if you specify ‘wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz’, all of the ls-lR.gz will be downloaded. The same goes even when several URLs are specified on the command-line. However, quota is respected when retrieving either recursively, or from an input file. Thus you may safely type ‘wget -Q2m -i sites’—download will be aborted when the quota is exceeded.
Setting quota to 0 or to ‘inf’ unlimits the download quota.
However, it has been reported that in some situations it is not desirable to cache host names, even for the duration of a short-running application like Wget. With this option Wget issues a new DNS lookup (more precisely, a new call to gethostbyname or getaddrinfo) each time it makes a new connection. Please note that this option will not affect caching that might be performed by the resolving library or by an external caching layer, such as NSCD.

If you don't understand exactly what this option does, you probably won't need it.
By default, Wget escapes the characters that are not valid or safe as part of file names on your operating system, as well as control characters that are typically unprintable. This option is useful for changing these defaults, perhaps because you are downloading to a non-native partition, or because you want to disable escaping of the control characters, or you want to further restrict characters to only those in the ASCII range of values.

The modes are a comma-separated set of text values. The acceptable values are ‘unix’, ‘windows’, ‘nocontrol’, ‘ascii’, ‘lowercase’, and ‘uppercase’. The values ‘unix’ and ‘windows’ are mutually exclusive (one will override the other), as are ‘lowercase’ and ‘uppercase’. Those last are special cases, as they do not change the set of characters that would be escaped, but rather force local file paths to be converted either to lower- or uppercase.

When “unix” is specified, Wget escapes the character ‘/’ and the control characters in the ranges 0–31 and 128–159. This is the default on Unix-like operating systems.

When “windows” is given, Wget escapes the characters ‘\’, ‘|’, ‘/’, ‘:’, ‘?’, ‘"’, ‘*’, ‘<’, ‘>’, and the control characters in the ranges 0–31 and 128–159. In addition to this, Wget in Windows mode uses ‘+’ instead of ‘:’ to separate host and port in local file names, and uses ‘@’ instead of ‘?’ to separate the query portion of the file name from the rest. Therefore, a URL that would be saved as ‘www.xemacs.org:4300/search.pl?input=blah’ in Unix mode would be saved as ‘www.xemacs.org+4300/search.pl@input=blah’ in Windows mode. This mode is the default on Windows.

If you specify ‘nocontrol’, then the escaping of the control characters is also switched off. This option may make sense when you are downloading URLs whose names contain UTF-8 characters, on a system which can save and display filenames in UTF-8 (some possible byte values used in UTF-8 byte sequences fall in the range of values designated by Wget as “controls”).

The ‘ascii’ mode is used to specify that any bytes whose values are outside the range of ASCII characters (that is, greater than 127) shall be escaped. This can be useful when saving filenames whose encoding does not match the one used locally.
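For instance, to produce Windows-safe, all-lowercase local file names on any platform (the URL is a placeholder):

wget --restrict-file-names=windows,lowercase http://example.com/Some/Path/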
Neither option should be needed normally. By default, an IPv6-aware Wget will use the address family specified by the host's DNS record. If the DNS responds with both IPv4 and IPv6 addresses, Wget will try them in sequence until it finds one it can connect to. (Also see the --prefer-family option described below.)
These options can be used to deliberately force the use of IPv4 or IPv6 address families on dual family systems, usually to aid debugging or to deal with broken network configuration. Only one of ‘--inet6-only’ and ‘--inet4-only’ may be specified at the same time. Neither option is available in Wget compiled without IPv6 support.
This avoids spurious errors and connect attempts when accessing hosts that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For example, ‘www.kame.net’ resolves to ‘2001:200:0:8002:203:47ff:fea5:3085’ and to ‘203.178.141.194’. When the preferred family is IPv4, the IPv4 address is used first; when the preferred family is IPv6, the IPv6 address is used first; if the specified value is none, the address order returned by DNS is used without change.

Unlike ‘-4’ and ‘-6’, this option doesn't inhibit access to any address family, it only changes the order in which the addresses are accessed. Also note that the reordering performed by this option is stable—it doesn't affect the order of addresses of the same family. That is, the relative order of all IPv4 addresses and of all IPv6 addresses remains intact in all cases.
You can set the default state of IRI support using the iri command in .wgetrc. That setting may be overridden from the command line.
Wget uses the function nl_langinfo() and then the CHARSET environment variable to get the locale. If it fails, ASCII is used.
You can set the default local encoding using the local_encoding command in .wgetrc. That setting may be overridden from the command line.

For HTTP, the remote encoding can be found in the HTTP Content-Type header and in the HTML Content-Type http-equiv meta tag.

You can set the default encoding using the remoteencoding command in .wgetrc. That setting may be overridden from the command line.
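A hypothetical combined invocation, declaring both the local and the remote encoding explicitly (the URL is a placeholder):

wget --iri --local-encoding=UTF-8 --remote-encoding=iso-8859-2 http://example.com/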
Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH’ option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs’ comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how the ‘--cut-dirs’ option works.
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .

--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd’ and ‘-P’. However, unlike ‘-nd’, ‘--cut-dirs’ does not lose with subdirectories—for instance, with ‘-nH --cut-dirs=1’, a beta/ subdirectory will be placed to xemacs/beta, as one would expect.

Note that filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget can't tell that the local X.html file corresponds to remote URL ‘X’ (since it doesn't yet know that the URL produces output of type ‘text/html’ or ‘application/xhtml+xml’).

As of version 1.12, Wget will also ensure that any downloaded files of type ‘text/css’ end in the suffix ‘.css’, and the option was renamed from ‘--html-extension’, to better reflect its new behavior. The old option name is still acceptable, but should now be considered deprecated.

At some point in the future, this option may well be expanded to include suffixes for other types of content, including content types that are not parsed by Wget.
Wget will encode them using either the basic (insecure), the digest, or the Windows NTLM authentication scheme.
Another way to specify username and password is in the URL itself (see URL Format). Either method reveals your password to anyone who bothers to run ps. To prevent the passwords from being seen, store them in .wgetrc or .netrc, and make sure to protect those files from other users with chmod. If the passwords are really important, do not leave them lying in those files either—edit the files and delete them after Wget has started the download.
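A minimal sketch of HTTP authentication on the command line (credentials and host are placeholders):

wget --http-user=foo --http-password=bar http://example.com/private/index.html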
This option is useful when, for some reason, persistent (keep-alive) connections don't work for you, for example due to a server bug or due to the inability of server-side scripts to cope with the connections.
Caching is allowed by default.
The server sends the client a cookie using the Set-Cookie header, and the client responds with the same cookie upon further requests. Since cookies allow the server owners to keep track of visitors and for sites to exchange this information, some consider them a breach of privacy. The default is to use cookies; however, storing cookies is not on by default.
You will typically use this option when mirroring sites that require that you be logged in to access some or all of their content. The login process typically works by the web server issuing an http cookie upon receiving and verifying your credentials. The cookie is then resent by the browser when accessing that part of the site, and so proves your identity.

Mirroring such a site requires Wget to send the same cookies your browser sends when communicating with the site. This is achieved by ‘--load-cookies’—simply point Wget to the location of the cookies.txt file, and it will send the same cookies your browser would send in the same situation. Different browsers keep textual cookie files in different locations:

If you cannot use ‘--load-cookies’, there might still be an alternative. If your browser supports a “cookie manager”, you can use it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the “official” cookie support:
wget --no-cookies --header "Cookie: name=value"
Since the cookie file format does not normally carry session cookies, Wget marks them with an expiry timestamp of 0. Wget's ‘--load-cookies’ recognizes those as session cookies, but it might confuse other browsers. Also note that cookies so loaded will be treated as other session cookies, which means that if you want ‘--save-cookies’ to preserve them again, you must use ‘--keep-session-cookies’ again.
Some http servers (CGI programs, to be more precise) send out bogus Content-Length headers, which makes Wget go wild, as it thinks not all the document was retrieved. You can spot this syndrome if Wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte.

With this option, Wget will ignore the Content-Length header—as if it never existed.
You may define more than one additional header by specifying ‘--header’ more than once.

wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr' \
     http://fly.srk.fer.hr/

Specification of an empty string as the header value will clear all previous user-defined headers.

As of Wget 1.10, this option can be used to override headers otherwise generated automatically. This example instructs Wget to connect to localhost, but to specify ‘foo.bar’ in the Host header:
wget --header="Host: foo.bar" http://localhost/
In versions of Wget prior to 1.10 such use of ‘--header’ caused sending of duplicate headers.
Wget will encode them using the basic authentication scheme.

Security considerations similar to those with ‘--http-password’ pertain here as well.
The http protocol allows the clients to identify themselves using a User-Agent header field. This enables distinguishing the www software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as ‘Wget/version’, version being the current version number of Wget.

However, some sites have been known to impose the policy of tailoring the output according to the User-Agent-supplied information. While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the User-Agent line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.

Specifying empty user agent with ‘--user-agent=""’ instructs Wget not to send the User-Agent header in http requests.
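For example, to present a browser-like identity (the string and URL shown are only illustrative):

wget --user-agent="Mozilla/5.0 (compatible)" http://example.com/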
Both ‘--post-data’ and ‘--post-file’ expect content of the form key1=value1&key2=value2, with percent-encoding for special characters; the only difference is that one expects its content as a command-line parameter and the other accepts its content from a file. In particular, ‘--post-file’ is not for transmitting files as form attachments: those must appear as key=value data (with appropriate percent-coding) just like everything else. Wget does not currently support multipart/form-data for transmitting POST data; only application/x-www-form-urlencoded. Only one of ‘--post-data’ and ‘--post-file’ should be specified.
Please be aware that Wget needs to know the size of the POST data in advance. Therefore the argument to --post-file must be a regular file; specifying a FIFO or something like /dev/stdin won't work. It's not quite clear how to work around this limitation inherent in HTTP/1.0. Although HTTP/1.1 introduces chunked transfer that doesn't require knowing the request length in advance, a client can't use chunked unless it knows it's talking to an HTTP/1.1 server. And it can't know that until it receives a response, which in turn requires the request to have been completed – a chicken-and-egg problem.

Note: if Wget is redirected after the POST request is completed, it will not send the POST data to the redirected URL. This is because URLs that process POST often respond with a redirection to a regular page, which does not desire or accept POST. It is not completely clear that this behavior is optimal; if it doesn't work out, it might be changed in the future.

This example shows how to log to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users:
# Log in to the server.  This can be done only once.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php
If the server is using session cookies to track user authentication, the above will not work because ‘--save-cookies’ will not save them (and neither will browsers) and the cookies.txt file will be empty. In that case use ‘--keep-session-cookies’ along with ‘--save-cookies’ to force saving of session cookies.
If this is set to on, experimental (not fully-functional) support for Content-Disposition headers is enabled. This can currently result in extra round-trips to the server for a HEAD request, and is known to suffer from a few bugs, which is why it is not currently enabled by default.

This option is useful for some file-downloading CGI programs that use Content-Disposition headers to describe what the name of a downloaded file should be.
Use of this option is not recommended, and is intended only to support some few obscure servers, which never send HTTP authentication challenges, but accept unsolicited auth info, say, in addition to form-based authentication.

To support encrypted HTTP (HTTPS) downloads, Wget must be compiled with an external SSL library, currently OpenSSL. If Wget is compiled without SSL support, none of these options are available.

Specifying ‘SSLv2’, ‘SSLv3’, or ‘TLSv1’ forces the use of the corresponding protocol. This is useful when talking to old and buggy SSL server implementations that make it hard for OpenSSL to choose the correct protocol version. Fortunately, such servers are quite rare.

As of Wget 1.10, the default is to verify the server's certificate against the recognized certificate authorities, breaking the SSL handshake and aborting the download if the verification fails. Although this provides more secure downloads, it does break interoperability with some sites that worked with previous Wget versions, particularly those using self-signed, expired, or otherwise invalid certificates. This option forces an “insecure” mode of operation that turns the certificate verification errors into warnings and allows you to proceed.

If you encounter “certificate verification” errors or ones saying that “common name doesn't match requested host name”, you can use this option to bypass the verification and proceed with the download. Only use this option if you are otherwise convinced of the site's authenticity, or if you really don't care about the validity of its certificate. It is almost always a bad idea not to check the certificates when transmitting confidential or important data.
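For example, to proceed despite a self-signed certificate you have verified by other means (the URL is a placeholder):

wget --no-check-certificate https://example.com/file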
Without this option Wget looks for CA certificates at the system-specified locations, chosen at OpenSSL installation time.
The directory is prepared by processing it with the c_rehash utility supplied with OpenSSL. Using ‘--ca-directory’ is more efficient than ‘--ca-certificate’ when many certificates are installed because it allows Wget to fetch certificates on demand.
Without this option Wget looks for CA certificates at the system-specified locations, chosen at OpenSSL installation time.
On such systems the SSL library needs an external source of randomness to initialize. Randomness may be provided by EGD (see ‘--egd-file’ below) or read from an external source specified by the user. If this option is not specified, Wget looks for random data in $RANDFILE or, if that is unset, in $HOME/.rnd. If none of those are available, it is likely that SSL encryption will not be usable.

If you're getting the “Could not seed OpenSSL PRNG; disabling SSL.” error, you should provide random data using some of the methods described above.

OpenSSL allows the user to specify his own source of entropy using the RAND_FILE environment variable. If this variable is unset, or if the specified file does not produce enough randomness, OpenSSL will read random data from the EGD socket specified using this option.

If this option is not specified (and the equivalent startup command is not used), EGD is never contacted. EGD is not needed on modern Unix systems that support /dev/random.
Another way to specify username and password is in the URL itself (see URL Format). Either method reveals your password to anyone who bothers to run ps. To prevent the passwords from being seen, store them in .wgetrc or .netrc, and make sure to protect those files from other users with chmod. If the passwords are really important, do not leave them lying in those files either—edit the files and delete them after Wget has started the download.

Note that even though Wget writes to a known filename for this file, this is not a security hole in the scenario of a user making .listing a symbolic link to /etc/passwd or something and asking root to run Wget in his or her directory. Depending on the options used, either Wget will refuse to write to .listing, making the globbing/recursion/time-stamping operation fail, or the symbolic link will be deleted and replaced with the actual .listing file, or the listing will be written to a .listing.number file.

Even though this situation isn't a problem, though, root should never run Wget in a non-trusted user's directory. A user could do something as simple as linking index.html to /etc/passwd and asking root to run Wget with ‘-N’ or ‘-r’ so the file will be overwritten.
wget ftp://gnjilux.srk.fer.hr/*.msg
By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently.

You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix ftp servers (and the ones emulating Unix ls output).

If the machine is connected to the Internet directly, both passive and active FTP should work equally well. Behind most firewall and NAT configurations passive FTP has a better chance of working. However, in some rare firewall configurations, active FTP actually works when passive FTP doesn't. If you suspect this to be the case, use this option, or set passive_ftp=off in your init file.

When ‘--retr-symlinks’ is specified, however, symbolic links are traversed and the pointed-to files are retrieved. At this time, this option does not cause Wget to traverse symlinks to directories and recurse through them, but in the future it should be enhanced to do this.

Note that when retrieving a file (not a directory) because it was specified on the command-line, rather than because it was recursed to, this option has no effect. Symbolic links are always traversed in this case.
wget -r -nd --delete-after http://whatever.com/~popular/page/
The ‘-r’ option is to retrieve recursively, and ‘-nd’ to not create directories.

Note that ‘--delete-after’ deletes files on the local machine. It does not issue the ‘DELE’ command to remote FTP sites, for instance. Also note that when ‘--delete-after’ is specified, ‘--convert-links’ is ignored, so ‘.orig’ files are simply not created in the first place.
Each link will be changed in one of the two ways:
Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ‘../bar/img.gif’. This kind of transformation works reliably for arbitrary combinations of directories.

Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.

Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet address rather than presenting a broken link. The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been downloaded. Because of that, the work done by ‘-k’ will be performed at the end of all the downloads.

Ordinarily, when downloading a single html page, any requisite documents that may be needed to display it properly are not downloaded. Using ‘-r’ together with ‘-l’ can help, but since Wget does not ordinarily distinguish between external and inlined documents, one is generally left with “leaf documents” that are missing their requisites.
For instance, say document 1.html contains an <IMG> tag referencing 1.gif and an <A> tag pointing to external document 2.html. Say that 2.html is similar but that its image is 2.gif and it links to 3.html. Say this continues up to some arbitrarily high number.
If one executes the command:
wget -r -l 2 http://site/1.html
then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded. As you can see, 3.html is without its requisite 3.gif because Wget is simply counting the number of hops (up to 2) away from 1.html in order to determine where to stop the recursion. However, with this command:
wget -r -l 2 -p http://site/1.html
all the above files and 3.html's requisite 3.gif will be downloaded. Similarly,
wget -r -l 1 -p http://site/1.html
will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded. One might think that:
wget -r -l 0 -p http://site/1.html
would download just 1.html and 1.gif, but unfortunately this is not the case, because ‘-l 0’ is equivalent to ‘-l inf’—that is, infinite recursion. To download a single html page (or a handful of them, all specified on the command-line or in a ‘-i’ URL input file) and its (or their) requisites, simply leave off ‘-r’ and ‘-l’:
wget -p http://site/1.html
Note that Wget will behave as if ‘-r’ had been specified, but only that single page and its requisites will be downloaded. Links from that page to external documents will not be followed. Actually, to download a single page and all its requisites (even if they exist on separate websites), and make sure the lot displays properly locally, this author likes to use a few options in addition to ‘-p’:
wget -E -H -k -K -p http://site/document
To finish off this topic, it's worth knowing that Wget's idea of an external document link is any URL specified in an <A> tag, an <AREA> tag, or a <LINK> tag other than <LINK REL="stylesheet">.
According to specifications, html comments are expressed as sgml declarations. Declaration is special markup that begins with ‘<!’ and ends with ‘>’, such as ‘<!DOCTYPE ...>’, that may contain comments between a pair of ‘--’ delimiters. html comments are “empty declarations”, sgml declarations without any non-comment text. Therefore, ‘<!--foo-->’ is a valid comment, and so is ‘<!--one-- --two-->’, but ‘<!--1--2-->’ is not.

On the other hand, most html writers don't perceive comments as anything other than text delimited with ‘<!--’ and ‘-->’, which is not quite the same. For example, something like ‘<!------------>’ works as a valid comment as long as the number of dashes is a multiple of four (!). If not, the comment technically lasts until the next ‘--’, which may be at the other end of the document. Because of this, many popular browsers completely ignore the specification and implement what users have come to expect: comments delimited with ‘<!--’ and ‘-->’.
Until version 1.9, Wget interpreted comments strictly, which resulted in missing links in many web pages that displayed fine in browsers, but had the misfortune of containing non-compliant comments. Beginning with version 1.9, Wget has joined the ranks of clients that implement “naive” comments, terminating each comment at the first occurrence of ‘-->’.
If, for whatever reason, you want strict comment parsing, use this option to turn it on.

In the past, this option was the best bet for downloading a single page and its requisites, using a command-line like:
wget --ignore-tags=a,area -H -k -K -r http://site/document
However, the author of this option came across a page with tags like <LINK REL="home" HREF="/"> and came to the realization that specifying tags to ignore was not enough. One can't just tell Wget to ignore <LINK>, because then stylesheets will not be downloaded. Now the best bet for downloading a single page and its requisites is the dedicated ‘--page-requisites’ option.
Wget may return one of several error codes if it encounters problems.
With the exceptions of 0 and 1, the lower-numbered exit codes take precedence over higher-numbered ones, when multiple types of errors are encountered.

In versions of Wget prior to 1.12, Wget's exit status tended to be unhelpful and inconsistent. Recursive downloads would virtually always return 0 (success), regardless of any issues encountered, and non-recursive fetches only returned the status corresponding to the most recently-attempted download.
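A sketch of how a shell script might react to the exit status (the URL is a placeholder):

wget -q http://example.com/file || echo "download failed with exit status $?"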
GNU Wget is capable of traversing parts of the Web (or a single http or ftp server), following links and directory structure. We refer to this as recursive retrieval, or recursion.
With http URLs, Wget retrieves and parses the html or css from the given URL, retrieving the files the document refers to, through markup like href or src, or css URI values specified using the ‘url()’ functional notation. If the freshly downloaded file is also of type text/html, application/xhtml+xml, or text/css, it will be parsed and followed further.

Recursive retrieval of http and html/css content is breadth-first. This means that Wget first downloads the requested document, then the documents linked from that document, then the documents linked by them, and so on. In other words, Wget first downloads the documents at depth 1, then those at depth 2, and so on until the specified maximum depth.

The maximum depth to which the retrieval may descend is specified with the ‘-l’ option. The default maximum depth is five layers.
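For example, to recurse no more than three levels deep (the URL is a placeholder):

wget -r -l 3 http://example.com/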
When retrieving an ftp URL recursively, Wget will retrieve all the data from the given directory tree (including the subdirectories up to the specified depth) on the remote server, creating its mirror image locally. ftp retrieval is also limited by the depth parameter. Unlike http recursion, ftp recursion is performed depth-first.

By default, Wget will create a local directory tree, corresponding to the one found on the remote server.
Recursive retrieving can find a number of applications, the most important of which is mirroring. It is also useful for www presentations, and any other opportunities where slow network connections should be bypassed by storing the files locally.

You should be warned that recursive downloads can overload the remote servers. Because of that, many administrators frown upon them and may ban access from your site if they detect very fast downloads of big amounts of content. When downloading from Internet servers, consider using the ‘-w’ option to introduce a delay between accesses to the server. The download will take a while longer, but the server administrator will not be alarmed by your rudeness.

Of course, recursive download may cause problems on your machine. If left to run unchecked, it can easily fill up the disk. If downloading from local network, it can also take bandwidth on the system, as well as consume memory and CPU.

Try to specify the criteria that match the kind of download you are trying to achieve. If you want to download only one page, use ‘--page-requisites’ without any additional recursion. If you want to download things under one directory, use ‘-np’ to avoid downloading things from other directories. If you want to download all the files from one directory, use ‘-l 1’ to make sure the recursion depth never exceeds one. See Following Links, for more information about this.
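Putting these criteria together, a cautious single-directory download might look like this (host and suffixes are placeholders):

wget -r -l 1 -np -A pdf,ps http://example.com/docs/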
Recursive retrieval should be used with care. Don't say you were not warned.
When retrieving recursively, one does not wish to retrieve loads of unnecessary data. Most of the time the users bear in mind exactly what they want to download, and want Wget to follow only specific links.

For example, if you wish to download the music archive from ‘fly.srk.fer.hr’, you will not want to download all the home pages that happen to be referenced by an obscure part of the archive.
Wget possesses several mechanisms that allow you to fine-tune which links it will follow.
Wget's recursive retrieval normally refuses to visit hosts different than the one you specified on the command line. This is a reasonable default; without it, every retrieval would have the potential to turn your Wget into a small version of google.

However, visiting different hosts, or host spanning, is sometimes a useful option. Maybe the images are served from a different server. Maybe you're mirroring a site that consists of pages interlinked between three servers. Maybe the server has two equivalent names, and the html pages refer to both interchangeably.
wget -rH -Dserver.com http://www.server.com/
You can specify more than one address by separating them with a comma, e.g. ‘-Ddomain1.com,domain2.com’.

wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
     http://www.foo.edu/

When downloading material from the web, you will often want to restrict the retrieval to only certain file types. For example, if you are interested in downloading gifs, you will not be overjoyed to get loads of PostScript documents, and vice versa.

Wget offers two options to deal with this problem. Each option description lists a short name, a long name, and the equivalent command in .wgetrc.

So, specifying ‘wget -A gif,jpg’ will make Wget download only the files ending with ‘gif’ or ‘jpg’, i.e. gifs and jpegs. On the other hand, ‘wget -A "zelazny*196[0-9]*"’ will download only files beginning with ‘zelazny’ and containing numbers from 1960 to 1969 anywhere within. Look up the manual of your shell for a description of how pattern matching works.

Of course, any number of suffixes and patterns can be combined into a comma-separated list, and given as an argument to ‘-A’.

So, if you want to download a whole page except for the cumbersome mpegs and .au files, you can use ‘wget -R mpg,mpeg,au’. Analogously, to download all files except the ones beginning with ‘bjork’, use ‘wget -R "bjork*"’. The quotes are to prevent expansion by the shell.

The ‘-A’ and ‘-R’ options may be combined to achieve even better fine-tuning of which files to retrieve. E.g. ‘wget -A "*zelazny*" -R .ps’ will download all the files having ‘zelazny’ as a part of their name, but not the PostScript files.
Note that these two options do not affect the downloading of html files (as determined by a ‘.htm’ or ‘.html’ filename suffix). This behavior may not be desirable for all users, and may be changed for future versions of Wget.
Note, too, that query strings (strings at the end of a URL beginning with a question mark (‘?’)) are not included as part of the filename for accept/reject rules, even though these will actually contribute to the name chosen for the local file. It is expected that a future version of Wget will provide an option to allow matching against query strings.

Finally, it's worth noting that the accept/reject lists are matched twice against downloaded files: once against the URL's filename portion, to determine if the file should be downloaded in the first place; then, after it has been accepted and successfully downloaded, the local file's name is also checked against the accept/reject lists to see if it should be removed. The rationale was that, since ‘.htm’ and ‘.html’ files are always downloaded regardless of accept/reject rules, they should be removed after being downloaded and scanned for links, if they did match the accept/reject lists. However, this can lead to unexpected results, since the local filenames can differ from the original URL filenames in the following ways, all of which can change whether an accept/reject rule matches:

This behavior, too, is considered less-than-desirable, and may change in a future version of Wget.

Regardless of other link-following facilities, it is often useful to place the restriction of what files to retrieve based on the directories those files are placed in. There can be many reasons for this—the home pages may be organized in a reasonable directory structure; or some directories may contain useless information, e.g. /cgi-bin or /dev directories.

Wget offers three different options to deal with this requirement. Each option description lists a short name, a long name, and the equivalent command in .wgetrc.

So, if you wish to download from ‘http://host/people/bozo/’ following only links to bozo's colleagues in the /people directory and the bogus scripts in /cgi-bin, you can specify:
wget -I /people,/cgi-bin http://host/people/bozo/
The same as with ‘-A’/‘-R’, these two options can be combined to get a better fine-tuning of downloading subdirectories. E.g. if you want to load all the files from the /pub hierarchy except for /pub/worthless, specify ‘-I/pub -X/pub/worthless’.
The ‘--no-parent’ option (short ‘-np’) is useful in this case. Using it guarantees that you will never leave the existing hierarchy. Supposing you issue Wget with:
wget -r --no-parent http://somehost/~luzer/my-archive/
You may rest assured that none of the references to /~his-girls-homepage/ or /~luzer/all-my-mpegs/ will be followed. Only the archive you are interested in will be downloaded. Essentially, ‘--no-parent’ is similar to ‘-I/~luzer/my-archive’, only it handles redirections in a more intelligent fashion.

Note that, for HTTP (and HTTPS), the trailing slash is very important to ‘--no-parent’. HTTP has no concept of a “directory”—Wget relies on you to indicate what's a directory and what isn't. In ‘http://foo/bar/’, Wget will consider ‘bar’ to be a directory, while in ‘http://foo/bar’ (no trailing slash), ‘bar’ will be considered a filename (so ‘--no-parent’ would be meaningless, as its parent is ‘/’).
When ‘-L’ is turned on, only the relative links are ever followed. Relative links are here defined as those that do not refer to the web server root. For example, these links are relative:
These links are not relative:
Using this option guarantees that recursive retrieval will not span hosts, even without ‘-H’. In simple cases it also allows downloads to “just work” without having to convert links.

This option is probably not very useful and might be removed in a future release.

The rules for ftp are somewhat specific, as it is necessary for them to be. ftp links in html documents are often included for purposes of reference, and it is often inconvenient to download them by default.
To have ftp links followed from html documents, you need to specify the ‘--follow-ftp’ option. Having done that, ftp links will span hosts regardless of the ‘-H’ setting. This is logical, as ftp links rarely point to the same host where the http server resides. For similar reasons, the ‘-L’ option has no effect on such downloads. On the other hand, domain acceptance (‘-D’) and suffix rules (‘-A’ and ‘-R’) apply normally.
Also note that followed links to ftp directories will not be retrieved recursively further.

One of the most important aspects of mirroring information from the Internet is updating your archives.

Downloading the whole archive again and again, just to replace a few changed files is expensive, both in terms of wasted bandwidth and money, and the time to do the update. This is why all the mirroring tools offer the option of incremental updating.

Such an updating mechanism means that the remote server is scanned in search of new files. Only those new files will be downloaded in the place of the old ones.
A file is considered new if one of these two conditions is met:
To implement this, the program needs to be aware of the time of last modification of both local and remote files. We call this information the time-stamp of a file.

The time-stamping in GNU Wget is turned on using the ‘--timestamping’ (‘-N’) option, or through the timestamping = on directive in .wgetrc. With this option, for each file it intends to download, Wget will check whether a local file of the same name exists. If it does, and the remote file is not newer, Wget will not download it.

If the local file does not exist, or the sizes of the files do not match, Wget will download the remote file no matter what the time-stamps say.

The usage of time-stamping is simple. Say you would like to download a file so that it keeps its date of modification.
wget -S http://www.gnu.ai.mit.edu/
A simple ls -l shows that the time stamp on the local file equals the state of the Last-Modified header, as returned by the server. As you can see, the time-stamping info is preserved locally, even without ‘-N’ (at least for http).

Several days later, you would like Wget to check if the remote file has changed, and download it if it has.
wget -N http://www.gnu.ai.mit.edu/
Wget will ask the server for the last-modified date. If the local file has the same timestamp as the server, or a newer one, the remote file will not be re-fetched. However, if the remote file is more recent, Wget will proceed to fetch it.
The same goes for ftp. For example:
wget "ftp://ftp.ifi.uio.no/pub/emacs/gnus/*"
(The quotes around that URL are to prevent the shell from trying to interpret the ‘*’.)

After download, a local directory listing will show that the timestamps match those on the remote server. Reissuing the command with ‘-N’ will make Wget re-fetch only the files that have been modified since the last download.

If you wished to mirror the GNU archive every week, you would use a command like the following, weekly:
wget --timestamping -r ftp://ftp.gnu.org/pub/gnu/
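Assuming a reasonably recent Wget, the ‘--mirror’ (‘-m’) shorthand, which turns on recursion and time-stamping, sets infinite depth and keeps FTP directory listings, achieves much the same:

wget -m ftp://ftp.gnu.org/pub/gnu/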
Note that time-stamping will only work for files for which the server gives a timestamp. For http, this depends on getting a Last-Modified header. For ftp, this depends on getting a directory listing with dates in a format that Wget can parse (see FTP Time-Stamping Internals).
Time-stamping in http is implemented by checking the Last-Modified header. If you wish to retrieve the file foo.html through http, Wget will check whether foo.html exists locally. If it doesn't, foo.html will be retrieved unconditionally.
If the file does exist locally, Wget will first check its local time-stamp (similar to the way ls -l checks it), and then send a HEAD request to the remote server, demanding the information on the remote file.

The Last-Modified header is examined to find which file was modified more recently (which makes it “newer”). If the remote file is newer, it will be downloaded; if it is older, Wget will give up.

When ‘--backup-converted’ (‘-K’) is specified in conjunction with ‘-N’, server file ‘X’ is compared to local file ‘X.orig’, if extant, rather than being compared to local file ‘X’, which will always differ if it's been converted by ‘--convert-links’ (‘-k’).
Arguably, http time-stamping should be implemented using the If-Modified-Since request.

In theory, ftp time-stamping works much the same as http, only ftp has no headers—time-stamps must be ferreted out of directory listings.

If an ftp download is recursive or uses globbing, Wget will use the ftp LIST command to get a file listing for the directory containing the desired file(s). It will try to analyze the listing, treating it like Unix ls -l output, extracting the time-stamps. The rest is exactly the same as for http. Note that when retrieving individual files from an ftp server without using globbing or recursion, listing files will not be downloaded (and thus files will not be time-stamped) unless ‘-N’ is specified.

Assumption that every directory listing is a Unix-style listing may sound extremely constraining, but in practice it is not, as many non-Unix ftp servers use the Unixoid listing format because most (all?) of the clients understand it. Bear in mind that RFC 959 defines no standard way to get a file list, let alone the time-stamps. We can only hope that a future standard will define this.
Another non-standard solution includes the use of the MDTM command that is supported by some ftp servers (including the popular wu-ftpd), which returns the exact time of the specified file. Wget may support this command in the future.

Once you know how to change default settings of Wget through command line arguments, you may wish to make some of those settings permanent. You can do that in a convenient way by creating the Wget startup file—.wgetrc.
Besides .wgetrc being the “main” initialization file, it is convenient to have a special facility for storing passwords. Thus Wget reads and interprets the contents of $HOME/.netrc, if it finds it. You can find the .netrc format described in your system manuals.
Wget reads .wgetrc upon startup, recognizing a limited set of commands.
When initializing, Wget will look for a global startup file, /usr/local/etc/wgetrc by default (or some prefix other than /usr/local, if Wget was not installed there) and read commands from there, if it exists.
Then it will look for the user's file. If the environment variable WGETRC is set, Wget will try to load that file. Failing that, no further attempts will be made.

If WGETRC is not set, Wget will try to load $HOME/.wgetrc.
The fact that the user's settings are loaded after the system-wide ones means that in case of collision the user's wgetrc overrides the system-wide wgetrc (in /usr/local/etc/wgetrc by default). Fascist admins, away!
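As a small sketch of that precedence, suppose the administrator sets a conservative retry count in the global file and you prefer a higher one; both values here are only illustrations, but tries is a real wgetrc command (the same as ‘--tries’):

# in /usr/local/etc/wgetrc (system-wide)
tries = 5

# in $HOME/.wgetrc, read later, so it wins
tries = 50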
The syntax of a wgetrc command is simple:
variable = value
The variable will also be called command. Valid values are different for different commands.
The commands are case-insensitive and underscore-insensitive. Thus ‘DIr__PrefiX’ is the same as ‘dirprefix’. Empty lines, lines beginning with ‘#’ and lines containing white-space only are discarded.
Commands that expect a comma-separated list will clear the list on an empty command. So, if you wish to reset the rejection list specified in the global wgetrc, you can do it with:
reject =
The complete set of commands is listed below. Legal values are listed after the ‘=’. Simple Boolean values can be set or unset using ‘on’ and ‘off’ or ‘1’ and ‘0’.
Some commands take pseudo-arbitrary values. address values can be hostnames or dotted-quad IP addresses. n can be any positive integer, or ‘inf’ for infinity, where appropriate. string values can be any non-empty string.
Most of these commands have direct command-line equivalents. Also, any wgetrc command can be specified on the command line using the ‘--execute’ switch (see Basic Startup Options).
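For example, any wgetrc setting can be supplied for a single run without editing any file; the URL here is hypothetical, while tries and wait are ordinary wgetrc commands:

wget -e tries=10 -e wait=2 http://www.example.com/archive/

Each ‘-e’ argument is parsed exactly like a line of .wgetrc, after the startup files have been read, so it takes precedence over them.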
ftp_password = string
Set your ftp password. This command used to be named passwd prior to Wget 1.10.

ftp_user = string
Set your ftp username. This command used to be named login prior to Wget 1.10.

ignore_length = on/off
When set to on, ignore the Content-Length header; the same as ‘--ignore-length’.
This is the sample initialization file, as given in the distribution. It is divided in two sections—one for global usage (suitable for the global startup file), and one for local usage (suitable for $HOME/.wgetrc). Be careful about the things you change.
Note that almost all the lines are commented out. For a command to have any effect, you must remove the ‘#’ character at the beginning of its line.
###
### Sample Wget initialization file .wgetrc
###

## You can use this file to change the default behaviour of wget or to
## avoid having to type many many command-line options. This file does
## not contain a comprehensive list of commands -- look at the manual
## to find out what you can put into this file.
##
## Wget initialization file can reside in /usr/local/etc/wgetrc
## (global, for all users) or $HOME/.wgetrc (for a single user).
##
## To use the settings in this file, you will have to uncomment them,
## as well as change them, in most cases, as the values on the
## commented-out lines are the default values (e.g. "off").

##
## Global settings (useful for setting up in /usr/local/etc/wgetrc).
## Think well before you change them, since they may reduce wget's
## functionality, and make it behave contrary to the documentation:
##

# You can set retrieve quota for beginners by specifying a value
# optionally followed by 'K' (kilobytes) or 'M' (megabytes). The
# default quota is unlimited.
#quota = inf

# You can lower (or raise) the default number of retries when
# downloading a file (default is 20).
#tries = 20

# Lowering the maximum depth of the recursive retrieval is handy to
# prevent newbies from going too "deep" when they unwittingly start
# the recursive retrieval. The default is 5.
#reclevel = 5

# By default Wget uses "passive FTP" transfer where the client
# initiates the data connection to the server rather than the other
# way around. That is required on systems behind NAT where the client
# computer cannot be easily reached from the Internet. However, some
# firewalls software explicitly supports active FTP and in fact has
# problems supporting passive transfer. If you are in such
# environment, use "passive_ftp = off" to revert to active FTP.
#passive_ftp = off

# The "wait" command below makes Wget wait between every connection.
# If, instead, you want Wget to wait only between retries of failed
# downloads, set waitretry to maximum number of seconds to wait (Wget
# will use "linear backoff", waiting 1 second after the first failure
# on a file, 2 seconds after the second failure, etc. up to this max).
#waitretry = 10

##
## Local settings (for a user to set in his $HOME/.wgetrc). It is
## *highly* undesirable to put these settings in the global file, since
## they are potentially dangerous to "normal" users.
##
## Even when setting up your own ~/.wgetrc, you should know what you
## are doing before doing so.
##

# Set this to on to use timestamping by default:
#timestamping = off

# It is a good idea to make Wget send your email address in a `From:'
# header with your request (so that server administrators can contact
# you in case of errors). Wget does *not* send `From:' by default.
#header = From: Your Name

# You can set up other headers, like Accept-Language. Accept-Language
# is *not* sent by default.
#header = Accept-Language: en

# You can set the default proxies for Wget to use for http, https, and ftp.
# They will override the value in the environment.
#https_proxy = http://proxy.yoyodyne.com:18023/
#http_proxy = http://proxy.yoyodyne.com:18023/
#ftp_proxy = http://proxy.yoyodyne.com:18023/

# If you do not want to use proxy at all, set this to off.
#use_proxy = on

# You can customize the retrieval outlook. Valid options are default,
# binary, mega and micro.
#dot_style = default

# Setting this to off makes Wget not download /robots.txt. Be sure to
# know *exactly* what /robots.txt is and how it is used before changing
# the default!
#robots = on

# It can be useful to make Wget wait between connections. Set this to
# the number of seconds you want Wget to wait.
#wait = 0

# You can force creating directory structure, even if a single is being
# retrieved, by setting this to on.
#dirstruct = off

# You can turn on recursive retrieving by default (don't do this if
# you are not sure you know what it means) by setting this to on.
#recursive = off

# To always back up file X as X.orig before converting its links (due
# to -k / --convert-links / convert_links = on having been specified),
# set this variable to on:
#backup_converted = off

# To have Wget follow FTP links from HTML files by default, set this
# to on:
#follow_ftp = off

# To try ipv6 addresses first:
#prefer-family = IPv6

# Set default IRI support state
#iri = off

# Force the default system encoding
#locale = UTF-8

# Force the default remote server encoding
#remoteencoding = UTF-8
The examples are divided into three sections loosely based on their complexity.
wget http://fly.srk.fer.hr/
wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
The ampersand at the end of the line makes sure that Wget works in the background. To unlimit the number of retries, use ‘-t inf’.
wget ftp://gnjilux.srk.fer.hr/welcome.msg
wget ftp://ftp.gnu.org/pub/gnu/
links index.html
wget -i file
If you specify ‘-’ as file name, the urls will be read from standard input.
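This makes it easy to produce the list of urls with another program and pipe it straight into Wget; the file name here is hypothetical:

cat urls.txt | wget -i -

Each line arriving on standard input is treated as one url, exactly as with a regular file argument to ‘-i’.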
wget -r http://www.gnu.org/ -o gnulog
wget --convert-links -r http://www.gnu.org/ -o gnulog
wget -p --convert-links http://www.server.com/dir/page.html
The html page will be saved to www.server.com/dir/page.html, and the images, stylesheets, etc., somewhere under www.server.com/, depending on where they were on the remote server.
wget -p --convert-links -nH -nd -Pdownload \
     http://www.server.com/dir/page.html
wget -S http://www.lycos.com/
wget --save-headers http://www.lycos.com/
more index.html
wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
More verbose, but the effect is the same. ‘-r -l1’ means to retrieve recursively (see Recursive Download), with maximum depth of 1. ‘--no-parent’ means that references to the parent directory are ignored (see Directory-Based Limits), and ‘-A.gif’ means to download only the gif files. ‘-A "*.gif"’ would have worked too.
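The accept list may also name several comma-separated suffixes or patterns at once; for example, to fetch gif, jpeg and png images from the same (hypothetical) directory:

wget -r -l1 --no-parent -A "gif,jpg,png" http://www.server.com/dir/

‘-A’ (‘--accept’) and its counterpart ‘-R’ (‘--reject’) both take comma-separated lists of file name suffixes or patterns.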
wget -nc -r http://www.gnu.org/
wget ftp://hniksic:[email protected]/.emacs
Note, however, that this usage is not advisable on multi-user systems because it reveals your password to anyone who looks at the output of ps.
wget -O - http://jagor.srce.hr/ http://www.srce.hr/
You can also combine the two options and make pipelines to retrieve the documents from remote hotlists:
wget -O - http://cool.list.com/ | wget --force-html -i -
crontab
0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog
wget --mirror --convert-links --backup-converted \
     http://www.gnu.org/ -o /home/me/weeklog
wget --mirror --convert-links --backup-converted \
     --html-extension -o /home/me/weeklog \
     http://www.gnu.org/
Or, with less typing:
wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog
This chapter contains all the stuff that could not fit anywhere else.
Proxies are special-purpose http servers designed to transfer data from remote servers to local clients. One typical use of proxies is lightening network load for users behind a slow connection. This is achieved by channeling all http and ftp requests through the proxy, which caches the transferred data. When a cached resource is requested again, the proxy will return the data from its cache. Another use for proxies is for companies that separate (for security reasons) their internal networks from the rest of the Internet. In order to obtain information from the Web, their users connect and retrieve remote data using an authorized proxy.
Wget supports proxies for both http and ftp retrievals. The standard way to specify proxy location, which Wget recognizes, is using the following environment variables:
http_proxy
https_proxy
If set, the http_proxy and https_proxy variables should contain the urls of the proxies for http and https connections respectively.

ftp_proxy
This variable should contain the url of the proxy for ftp connections. It is quite common that http_proxy and ftp_proxy are set to the same url.

no_proxy
This variable should contain a comma-separated list of domain extensions proxy should not be used for. For instance, if the value of no_proxy is ‘.mit.edu’, proxy will not be used to retrieve documents from MIT.
In addition to the environment variables, proxy location and settings may be specified from within Wget itself.
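A minimal sketch of both approaches, reusing the placeholder proxy host from the sample .wgetrc above and a hypothetical target URL:

# pick up the proxy from the environment
export http_proxy=http://proxy.yoyodyne.com:18023/
wget http://www.example.com/

# or set/override it from within Wget itself, per invocation
wget -e http_proxy=http://proxy.yoyodyne.com:18023/ http://www.example.com/
wget -e use_proxy=off http://www.example.com/

use_proxy, http_proxy, https_proxy and ftp_proxy are all valid .wgetrc commands as well, so the same settings can be made permanent in the startup file.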
Some proxy servers require authorization to enable you to use them. The authorization consists of username and password, which must be sent by Wget. As with http authorization, several authentication schemes exist. For proxy authorization only the Basic authentication scheme is currently implemented.
You may specify your username and password either through the proxy url or through the command-line options. Assuming that the company's proxy is located at ‘proxy.company.com’ at port 8001, a proxy url location containing authorization data might look like this:
http://hniksic:[email protected]:8001/
Alternatively, you may use the ‘proxy-user’ and ‘proxy-password’ options, and the equivalent .wgetrc settings proxy_user and proxy_password to set the proxy username and password.
Like all GNU utilities, the latest version of Wget can be found at the master GNU archive site ftp.gnu.org, and its mirrors. For example, Wget 1.13.4 can be found at ftp://ftp.gnu.org/pub/gnu/wget/wget-1.13.4.tar.gz.
The official web site for GNU Wget is at http://www.gnu.org/software/wget/. However, most useful information resides at “The Wget Wgiki”, http://wget.addictivecode.org/.
The primary mailing list for discussion, bug reports, or questions about GNU Wget is [email protected]. To subscribe, send an email to [email protected], or visit http://lists.gnu.org/mailman/listinfo/bug-wget.
You do not need to subscribe to send a message to the list; however, please note that unsubscribed messages are moderated, and may take a while before they hit the list—usually around a day. If you want your message to show up immediately, please subscribe to the list before posting. Archives for the list may be found at http://lists.gnu.org/pipermail/bug-wget/.
An NNTP/Usenettish gateway is also available via Gmane. You can see the Gmane archives at http://news.gmane.org/gmane.comp.web.wget.general. Note that the Gmane archives conveniently include messages from both the current list, and the previous one. Messages also show up in the Gmane archives sooner than they do at lists.gnu.org.
Additionally, there is the [email protected] mailing list. This is a non-discussion list that receives bug report notifications from the bug tracker. To subscribe to this list, send an email to [email protected], or visit http://addictivecode.org/mailman/listinfo/wget-notify.
Previously, the mailing list [email protected] was used as the main discussion list, and another list, [email protected], was used for submitting and discussing patches to GNU Wget.
Messages from [email protected] are archived at
Messages from [email protected] are archived at
In addition to the mailing lists, we also have a support channel set up via IRC at irc.freenode.org, #wget. Come check it out!
You are welcome to submit bug reports via the GNU Wget bug tracker (see http://wget.addictivecode.org/BugTracker).
Before actually submitting a bug report, please try to follow a few simple guidelines.
Also, while I will probably be interested to know the contents of your .wgetrc file, just dumping it into the debug message is probably a bad idea. Instead, you should first try to see if the bug repeats with .wgetrc moved out of the way. Only if it turns out that .wgetrc settings affect the bug, mail me the relevant parts of the file.
Note: please make sure to remove any potentially sensitive information from the debug log before sending it to the bug address. The -d won't go out of its way to collect sensitive information, but the log will contain a fairly complete transcript of Wget's communication with the server, which may include passwords and pieces of downloaded data. Since the bug address is publicly archived, you may assume that all bug reports are visible to the public.
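A convenient way to produce such a transcript is to send the debug output to a file and look it over before posting; the URL and file name here are hypothetical:

wget -d -o wget-debug.log http://www.example.com/problem-page.html

‘-d’ turns on debug output, and ‘-o’ writes all of Wget's messages to the named log file instead of the terminal.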
If Wget has crashed, try to run it in a debugger, e.g. gdb `which wget` core and type where to get the backtrace. This may not work if the system administrator has disabled core files, but it is safe to try.

Like all GNU software, Wget works on the GNU system. However, since it uses GNU Autoconf for building and configuring, and mostly avoids using “special” features of any particular Unix, it should compile (and work) on all common Unix flavors.
Various Wget versions have been compiled and tested under many kinds of Unix systems, including GNU/Linux, Solaris, SunOS 4.x, Mac OS X, OSF (aka Digital Unix or Tru64), Ultrix, *BSD, IRIX, AIX, and others. Some of those systems are no longer in widespread use and may not be able to support recent versions of Wget. If Wget fails to compile on your system, we would like to know about it.
Thanks to kind contributors, this version of Wget compiles and works on 32-bit Microsoft Windows platforms. It has been compiled successfully using MS Visual C++ 6.0, Watcom, Borland C, and GCC compilers. Naturally, it is crippled of some features available on Unix, but it should work as a substitute for people stuck with Windows. Note that Windows-specific portions of Wget are not guaranteed to be supported in the future, although this has been the case in practice for many years now. All questions and problems in Windows usage should be reported to the Wget mailing list at [email protected] where the volunteers who maintain the Windows-related features might look at them.
Support for building on MS-DOS via DJGPP has been contributed by Gisle Vanem; a port to VMS is maintained by Steven Schweda, and is available at http://antinode.org/.
Since the purpose of Wget is background work, it catches the hangup signal (SIGHUP) and ignores it. If the output was on standard output, it will be redirected to a file named wget-log. Otherwise, SIGHUP is ignored. This is convenient when you wish to redirect the output of Wget after having started it.
$ wget http://www.gnus.org/dist/gnus.tar.gz &
...
$ kill -HUP %%
SIGHUP received, redirecting output to `wget-log'.
Other than that, Wget will not try to interfere with signals in any way. C-c, kill -TERM and kill -KILL should kill it alike.
This chapter contains some references I consider useful.
It is extremely easy to make Wget wander aimlessly around a web site, sucking up all the available data in the process. ‘wget -r site’, and you're set. Great? Not for the server admin.
As long as Wget is only retrieving static pages, and doing it at a reasonable rate (see the ‘--wait’ option), there's not much of a problem. The trouble is that Wget can't tell the difference between the smallest static page and the most demanding CGI. A site I know has a section handled by a CGI Perl script that converts Info files to html on the fly. The script is slow, but works well enough for human users viewing an occasional Info file. However, when someone's recursive Wget download stumbles upon the index page that links to all the Info files through the script, the system is brought to its knees without providing anything useful to the user (This task of converting Info files could be done locally and access to Info documentation for all installed GNU software on a system is available from the info command).
To avoid this kind of accident, as well as to preserve privacy for documents that need to be protected from well-behaved robots, the concept of robot exclusion was invented. The idea is that the server administrators and document authors can specify which portions of the site they wish to protect from robots and those to which they will permit access.
The most popular mechanism, and the de facto standard supported by all the major robots, is the “Robots Exclusion Standard” (RES) written by Martijn Koster et al. in 1994. It specifies the format of a text file containing directives that instruct the robots which URL paths to avoid. To be found by the robots, the specifications must be placed in /robots.txt in the server root, which the robots are expected to download and parse.
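For illustration, a small /robots.txt in the RES format might look like this (the paths are made up):

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

A recursive Wget run against such a server will skip every URL under /cgi-bin/ and /private/.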
Although Wget is not a web robot in the strictest sense of the word, it can download large parts of the site without the user's intervention to download an individual page. Because of that, Wget honors RES when downloading recursively. For instance, when you issue:
wget -r http://www.server.com/
First the index of ‘www.server.com’ will be downloaded. If Wget finds that it wants to download more documents from that server, it will request ‘http://www.server.com/robots.txt’ and, if found, use it for further downloads. robots.txt is loaded only once per server.
Until version 1.8, Wget supported the first version of the standard, written by Martijn Koster in 1994 and available at http://www.robotstxt.org/wc/norobots.html. As of version 1.8, Wget has supported the additional directives specified in the internet draft ‘<draft-koster-robots-00.txt>’ titled “A Method for Web Robots Control”.
This manual no longer includes the text of the Robot Exclusion Standard.
The second, less known mechanism enables the author of an individual document to specify whether they want the links from the file to be followed by a robot. This is achieved using the META tag, like this:

<meta name="robots" content="nofollow">
This is explained in some detail at http://www.robotstxt.org/wc/meta-user.html. Wget supports this method of robot exclusion in addition to the usual /robots.txt exclusion.
If you know what you are doing and really really wish to turn off the robot exclusion, set the robots variable to ‘off’ in your .wgetrc. You can achieve the same effect from the command line using the -e switch, e.g. ‘wget -e robots=off url...’.
When using Wget, you must be aware that it sends unencrypted passwords through the network, which may present a security problem. Here are the main issues, and some solutions.
The passwords on the command line are visible using ps. The best way around it is to use wget -i - and feed the urls to Wget's standard input, each on a separate line, terminated by C-d. Another workaround is to use .netrc to store passwords; however, storing unencrypted passwords is also considered a security risk.

GNU Wget was written by Hrvoje Niksic [email protected].
However, the development of Wget could never have gone as far as it has, were it not for the help of many people, either with bug reports, feature proposals, patches, or letters saying “Thanks!”.
Special thanks goes to the following people (no particular order):
Their contributions include the --page-requisites and related options (that contributor was also the principal maintainer for some time and released Wget 1.6), ansi2knr-ization, lots of portability fixes, and Digest authentication.

The following people have provided patches, bug/build reports, useful suggestions, beta testing services, fan mail and all the other things that make maintenance so much fun:
Tim Adam, Adrian Aichner, Martin Baehr, Dieter Baron, Roger Beeman, Dan Berger, T. Bharath, Christian Biere, Paul Bludov, Daniel Bodea, Mark Boyns, John Burden, Julien Buty, Wanderlei Cavassin, Gilles Cedoc, Tim Charron, Noel Cragg, Kristijan Conkas, John Daily, Andreas Damm, Ahmon Dancy, Andrew Davison, Bertrand Demiddelaer, Alexander Dergachev, Andrew Deryabin, Ulrich Drepper, Marc Duponcheel, Damir Dzeko, Alan Eldridge, Hans-Andreas Engel, Aleksandar Erkalovic, Andy Eskilsson, Joao Ferreira, Christian Fraenkel, David Fritz, Mike Frysinger, Charles C. Fu, FUJISHIMA Satsuki, Masashi Fujita, Howard Gayle, Marcel Gerrits, Lemble Gregory, Hans Grobler, Alain Guibert, Mathieu Guillaume, Aaron Hawley, Jochen Hein, Karl Heuer, Madhusudan Hosaagrahara, HIROSE Masaaki, Ulf Harnhammar, Gregor Hoffleit, Erik Magnus Hulthen, Richard Huveneers, Jonas Jensen, Larry Jones, Simon Josefsson, Mario Juric, Hack Kampbjorn, Const Kaplinsky, Goran Kezunovic, Igor Khristophorov, Robert Kleine, KOJIMA Haime, Fila Kolodny, Alexander Kourakos, Martin Kraemer, Sami Krank, Jay Krell, Simos KSenitellis, Christian Lackas, Hrvoje Lacko, Daniel S. Lewart, Nicolas Lichtmeier, Dave Love, Alexander V. Lukyanov, Thomas Lussnig, Andre Majorel, Aurelien Marchand, Matthew J. Mellon, Jordan Mendelson, Ted Mielczarek, Robert Millan, Lin Zhe Min, Jan Minar, Tim Mooney, Keith Moore, Adam D. Moss, Simon Munton, Charlie Negyesi, R. K. Owen, Jim Paris, Kenny Parnell, Leonid Petrov, Simone Piunno, Andrew Pollock, Steve Pothier, Jan Prikryl, Marin Purgar, Csaba Raduly, Keith Refson, Bill Richardson, Tyler Riddle, Tobias Ringstrom, Jochen Roderburg, Juan Jose Rodriguez, Maciej W. Rozycki, Edward J. Sabol, Heinz Salzmann, Robert Schmidt, Nicolas Schodet, Benno Schulenberg, Andreas Schwab, Steven M. Schweda, Chris Seawood, Pranab Shenoy, Dennis Smit, Toomas Soome, Tage Stabell-Kulo, Philip Stadermann, Daniel Stenberg, Sven Sternberger, Markus Strasser, John Summerfield, Szakacsits Szabolcs, Mike Thomas, Philipp Thomas, Mauro Tortonesi, Dave Turner, Gisle Vanem, Rabin Vincent, Russell Vincent, Zeljko Vrba, Charles G Waldman, Douglas E. Wegscheid, Ralf Wildenhues, Joshua David Williams, Benjamin Wolsey, Saint Xavier, YAMAZAKI Makoto, Jasmin Zainul, Bojan Zdrnja, Kristijan Zimmer, Xin Zou.
Apologies to all who I accidentally left out, and many thanks to all the subscribers of the Wget mailing list.
Copyright © 2000, 2001, 2002, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc. http://fsf.org/ Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
The purpose of this License is to make a manual, textbook, or otherfunctional and useful documentfree in the sense of freedom: toassure everyone the effective freedom to copy and redistribute it,with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a wayto get credit for their work, while not being considered responsiblefor modifications made by others.
This License is a kind of “copyleft”, which means that derivativeworks of the document must themselves be free in the same sense. Itcomplements the GNU General Public License, which is a copyleftlicense designed for free software.
We have designed this License in order to use it for manuals for freesoftware, because free software needs free documentation: a freeprogram should come with manuals providing the same freedoms that thesoftware does. But this License is not limited to software manuals;it can be used for any textual work, regardless of subject matter orwhether it is published as a printed book. We recommend this Licenseprincipally for works whose purpose is instruction or reference.
This License applies to any manual or other work, in any medium, thatcontains a notice placed by the copyright holder saying it can bedistributed under the terms of this License. Such a notice grants aworld-wide, royalty-free license, unlimited in duration, to use thatwork under the conditions stated herein. The “Document”, below,refers to any such manual or work. Any member of the public is alicensee, and is addressed as “you”. You accept the license if youcopy, modify or distribute the work in a way requiring permissionunder copyright law.
A “Modified Version” of the Document means any work containing theDocument or a portion of it, either copied verbatim, or withmodifications and/or translated into another language.
A “Secondary Section” is a named appendix or a front-matter sectionof the Document that deals exclusively with the relationship of thepublishers or authors of the Document to the Document's overallsubject (or to related matters) and contains nothing that could falldirectly within that overall subject. (Thus, if the Document is inpart a textbook of mathematics, a Secondary Section may not explainany mathematics.) The relationship could be a matter of historicalconnection with the subject or with related matters, or of legal,commercial, philosophical, ethical or political position regardingthem.
The “Invariant Sections” are certain Secondary Sections whose titlesare designated, as being those of Invariant Sections, in the noticethat says that the Document is released under this License. If asection does not fit the above definition of Secondary then it is notallowed to be designated as Invariant. The Document may contain zeroInvariant Sections. If the Document does not identify any InvariantSections then there are none.
The “Cover Texts” are certain short passages of text that are listed,as Front-Cover Texts or Back-Cover Texts, in the notice that says thatthe Document is released under this License. A Front-Cover Text maybe at most 5 words, and a Back-Cover Text may be at most 25 words.
A “Transparent” copy of the Document means a machine-readable copy,represented in a format whose specification is available to thegeneral public, that is suitable for revising the documentstraightforwardly with generic text editors or (for images composed ofpixels) generic paint programs or (for drawings) some widely availabledrawing editor, and that is suitable for input to text formatters orfor automatic translation to a variety of formats suitable for inputto text formatters. A copy made in an otherwise Transparent fileformat whose markup, or absence of markup, has been arranged to thwartor discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amountof text. A copy that is not “Transparent” is called “Opaque”.
Examples of suitable formats for Transparent copies include plainascii without markup, Texinfo input format, LaTeX inputformat,SGML or XML using a publicly availableDTD, and standard-conforming simpleHTML,PostScript or PDF designed for human modification. Examplesof transparent image formats includePNG, XCF andJPG. Opaque formats include proprietary formats that can beread and edited only by proprietary word processors,SGML orXML for which the DTD and/or processing tools arenot generally available, and the machine-generatedHTML,PostScript or PDF produced by some word processors foroutput purposes only.
The “Title Page” means, for a printed book, the title page itself,plus such following pages as are needed to hold, legibly, the materialthis License requires to appear in the title page. For works informats which do not have any title page as such, “Title Page” meansthe text near the most prominent appearance of the work's title,preceding the beginning of the body of the text.
The “publisher” means any person or entity that distributes copiesof the Document to the public.
A section “Entitled XYZ” means a named subunit of the Document whosetitle either is precisely XYZ or contains XYZ in parentheses followingtext that translates XYZ in another language. (Here XYZ stands for aspecific section name mentioned below, such as “Acknowledgements”,“Dedications”, “Endorsements”, or “History”.) To “Preserve the Title”of such a section when you modify the Document means that it remains asection “Entitled XYZ” according to this definition.
The Document may include Warranty Disclaimers next to the notice whichstates that this License applies to the Document. These WarrantyDisclaimers are considered to be included by reference in thisLicense, but only as regards disclaiming warranties: any otherimplication that these Warranty Disclaimers may have is void and hasno effect on the meaning of this License.
You may copy and distribute the Document in any medium, eithercommercially or noncommercially, provided that this License, thecopyright notices, and the license notice saying this License appliesto the Document are reproduced in all copies, and that you add no otherconditions whatsoever to those of this License. You may not usetechnical measures to obstruct or control the reading or furthercopying of the copies you make or distribute. However, you may acceptcompensation in exchange for copies. If you distribute a large enoughnumber of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, andyou may publicly display copies.
If you publish printed copies (or copies in media that commonly haveprinted covers) of the Document, numbering more than 100, and theDocument's license notice requires Cover Texts, you must enclose thecopies in covers that carry, clearly and legibly, all these CoverTexts: Front-Cover Texts on the front cover, and Back-Cover Texts onthe back cover. Both covers must also clearly and legibly identifyyou as the publisher of these copies. The front cover must presentthe full title with all words of the title equally prominent andvisible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preservethe title of the Document and satisfy these conditions, can be treatedas verbatim copying in other respects.
If the required texts for either cover are too voluminous to fitlegibly, you should put the first ones listed (as many as fitreasonably) on the actual cover, and continue the rest onto adjacentpages.
If you publish or distribute Opaque copies of the Document numberingmore than 100, you must either include a machine-readable Transparentcopy along with each Opaque copy, or state in or with each Opaque copya computer-network location from which the general network-usingpublic has access to download using public-standard network protocolsa complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps,when you begin distribution of Opaque copies in quantity, to ensurethat this Transparent copy will remain thus accessible at the statedlocation until at least one year after the last time you distribute anOpaque copy (directly or through your agents or retailers) of thatedition to the public.
It is requested, but not required, that you contact the authors of theDocument well before redistributing any large number of copies, to givethem a chance to provide you with an updated version of the Document.
You may copy and distribute a Modified Version of the Document underthe conditions of sections 2 and 3 above, provided that you releasethe Modified Version under precisely this License, with the ModifiedVersion filling the role of the Document, thus licensing distributionand modification of the Modified Version to whoever possesses a copyof it. In addition, you must do these things in the Modified Version:
If the Modified Version includes new front-matter sections orappendices that qualify as Secondary Sections and contain no materialcopied from the Document, you may at your option designate some or allof these sections as invariant. To do this, add their titles to thelist of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section Entitled “Endorsements”, provided it containsnothing but endorsements of your Modified Version by variousparties—for example, statements of peer review or that the text hasbeen approved by an organization as the authoritative definition of astandard.
You may add a passage of up to five words as a Front-Cover Text, and apassage of up to 25 words as a Back-Cover Text, to the end of the listof Cover Texts in the Modified Version. Only one passage ofFront-Cover Text and one of Back-Cover Text may be added by (orthrough arrangements made by) any one entity. If the Document alreadyincludes a cover text for the same cover, previously added by you orby arrangement made by the same entity you are acting on behalf of,you may not add another; but you may replace the old one, on explicitpermission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this Licensegive permission to use their names for publicity for or to assert orimply endorsement of any Modified Version.
You may combine the Document with other documents released under thisLicense, under the terms defined in section 4 above for modifiedversions, provided that you include in the combination all of theInvariant Sections of all of the original documents, unmodified, andlist them all as Invariant Sections of your combined work in itslicense notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, andmultiple identical Invariant Sections may be replaced with a singlecopy. If there are multiple Invariant Sections with the same name butdifferent contents, make the title of each such section unique byadding at the end of it, in parentheses, the name of the originalauthor or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list ofInvariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled “History”in the various original documents, forming one section Entitled“History”; likewise combine any sections Entitled “Acknowledgements”,and any sections Entitled “Dedications”. You must delete allsections Entitled “Endorsements.”
You may make a collection consisting of the Document and other documentsreleased under this License, and replace the individual copies of thisLicense in the various documents with a single copy that is included inthe collection, provided that you follow the rules of this License forverbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distributeit individually under this License, provided you insert a copy of thisLicense into the extracted document, and follow this License in allother respects regarding verbatim copying of that document.
A compilation of the Document or its derivatives with other separateand independent documents or works, in or on a volume of a storage ordistribution medium, is called an “aggregate” if the copyrightresulting from the compilation is not used to limit the legal rightsof the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does notapply to the other works in the aggregate which are not themselvesderivative works of the Document.
If the Cover Text requirement of section 3 is applicable to thesecopies of the Document, then if the Document is less than one half ofthe entire aggregate, the Document's Cover Texts may be placed oncovers that bracket the Document within the aggregate, or theelectronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the wholeaggregate.
Translation is considered a kind of modification, so you maydistribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires specialpermission from their copyright holders, but you may includetranslations of some or all Invariant Sections in addition to theoriginal versions of these Invariant Sections. You may include atranslation of this License, and all the license notices in theDocument, and any Warranty Disclaimers, provided that you also includethe original English version of this License and the original versionsof those notices and disclaimers. In case of a disagreement betweenthe translation and the original version of this License or a noticeor disclaimer, the original version will prevail.
If a section in the Document is Entitled “Acknowledgements”,“Dedications”, or “History”, the requirement (section 4) to Preserveits Title (section 1) will typically require changing the actualtitle.
You may not copy, modify, sublicense, or distribute the Documentexcept as expressly provided under this License. Any attemptotherwise to copy, modify, sublicense, or distribute it is void, andwill automatically terminate your rights under this License.
However, if you cease all violation of this License, then your licensefrom a particular copyright holder is reinstated (a) provisionally,unless and until the copyright holder explicitly and finallyterminates your license, and (b) permanently, if the copyright holderfails to notify you of the violation by some reasonable means prior to60 days after the cessation.
Moreover, your license from a particular copyright holder isreinstated permanently if the copyright holder notifies you of theviolation by some reasonable means, this is the first time you havereceived notice of violation of this License (for any work) from thatcopyright holder, and you cure the violation prior to 30 days afteryour receipt of the notice.
Termination of your rights under this section does not terminate thelicenses of parties who have received copies or rights from you underthis License. If your rights have been terminated and not permanentlyreinstated, receipt of a copy of some or all of the same material doesnot give you any rights to use it.
The Free Software Foundation may publish new, revised versionsof the GNU Free Documentation License from time to time. Such newversions will be similar in spirit to the present version, but maydiffer in detail to address new problems or concerns. Seehttp://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of thisLicense “or any later version” applies to it, you have the option offollowing the terms and conditions either of that specified version orof any later version that has been published (not as a draft) by theFree Software Foundation. If the Document does not specify a versionnumber of this License, you may choose any version ever published (notas a draft) by the Free Software Foundation. If the Documentspecifies that a proxy can decide which future versions of thisLicense can be used, that proxy's public statement of acceptance of aversion permanently authorizes you to choose that version for theDocument.
“Massive Multiauthor Collaboration Site” (or “MMC Site”) means anyWorld Wide Web server that publishes copyrightable works and alsoprovides prominent facilities for anybody to edit those works. Apublic wiki that anybody can edit is an example of such a server. A“Massive Multiauthor Collaboration” (or “MMC”) contained in thesite means any set of copyrightable works thus published on the MMCsite.
“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0license published by Creative Commons Corporation, a not-for-profitcorporation with a principal place of business in San Francisco,California, as well as future copyleft versions of that licensepublished by that same organization.
“Incorporate” means to publish or republish a Document, in whole orin part, as part of another Document.
An MMC is “eligible for relicensing” if it is licensed under thisLicense, and if all works that were first published under this Licensesomewhere other than this MMC, and subsequently incorporated in wholeor in part into the MMC, (1) had no cover texts or invariant sections,and (2) were thus incorporated prior to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the siteunder CC-BY-SA on the same site at any time before August 1, 2009,provided the MMC is eligible for relicensing.
To use this License in a document you have written, include a copy ofthe License in the document and put the following copyright andlicense notices just after the title page:
Copyright (C) year your name. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled ``GNU Free Documentation License''.
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,replace the “with...Texts.” line with this:
with the Invariant Sections being list their titles, with the Front-Cover Texts being list, and with the Back-Cover Texts being list.
If you have Invariant Sections without Cover Texts, or some othercombination of the three, merge those two alternatives to suit thesituation.
If your document contains nontrivial examples of program code, werecommend releasing these examples in parallel under your choice offree software license, such as the GNU General Public License,to permit their use in free software.
[1] If you have a .netrc file in your home directory, the password will also be searched for there.
[2] As an additional check, Wget will look at the Content-Length header, and compare the sizes; if they are not the same, the remote file will be downloaded no matter what the time-stamp says.