Wget


wget

Recursively download site into current folder (great for browsed folders)

wget -r -np -nd [URL]       # -r recursive, -np no parent, -nd no directory creation

wget:

wget -SO- [URL]             # -S show server headers, -O- write body to stdout
wget -SO /dev/null [URL]    # show headers only, discard body to /dev/null
wget [URL] -O [OUTFILE]     # overwrite file
wget [URL] -P [PATH]        # save to path, no clobber
wget [URL] -N               # timestamp - only download if newer, and clobber
wget [URL] -r               # recursively download everything, and clobber
wget [URL] -r -nd           # recursively download into current folder (no dirs), no clobber
wget [URL] -r -l [DEPTH]    # levels to download (default is 5)
wget [URL] -r -k            # convert links for local viewing
wget [URL] -p               # download the page plus everything needed to display it (images, CSS, etc.); no further recursion
wget [URL] -r -L            # follow only relative URLs (helps keep on same host)
wget [URL] -r -np           # never ascend into parent directory
wget -e robots=off --wait 1 [URL]  # ignore robots.txt and wait a second between downloads
wget [URL] -m               # mirror website
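
These flags combine freely; a rough sketch (with [URL] as a placeholder) for a polite, offline-viewable copy of one directory:

wget [URL] -r -np -l 3 -k -p -e robots=off --wait 1   # recurse 3 levels, stay below URL, fix links, grab page requisites, pause between requests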

Downloading an Entire Web Site with wget | Linux Journal - http://www.linuxjournal.com/content/downloading-entire-web-site-wget

$ wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
         www.website.org/tutorials/html/

The options are:

   --recursive: download the entire Web site.
   --domains website.org: don't follow links outside website.org.
   --no-parent: don't follow links outside the directory tutorials/html/.
   --page-requisites: get all the elements that compose the page (images, CSS and so on).
   --html-extension: save files with the .html extension.
   --convert-links: convert links so that they work locally, off-line.
   --restrict-file-names=windows: modify filenames so that they will work in Windows as well.
   --no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).

It would be a VERY good idea to add the following to your command so you don't kill the server you are trying to download from:

--wait=9 --limit-rate=10K
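
Put together with the Linux Journal command above, a gentler version might look like this (same example site, just with the wait and rate limit added):

$ wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
     --wait=9 \
     --limit-rate=10K \
         www.website.org/tutorials/html/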


The Ultimate Wget Download Guide With 15 Awesome Examples - http://www.thegeekstuff.com/2009/09/the-ultimate-wget-download-guide-with-15-awesome-examples/

Site Download

wget -r -l1 --no-parent -A.gif http://www.locationwheretogetthefilefrom.com/dir/

-r -l1 means to retrieve recursively, with maximum depth of 1. 
--no-parent means that references to the parent directory are ignored. 
-A.gif means to download only the GIF files. (-A "*.gif" would also have worked, as a wildcard pattern.)
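
-A also accepts a comma-separated list, so several file types can be pulled in one pass; a sketch using the same placeholder directory:

wget -r -l1 --no-parent -A gif,jpg,png http://www.locationwheretogetthefilefrom.com/dir/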

Recursively Download FTP Site

Download FTP site to 99 levels

wget -r --level=99 ftp://myusername:mypassword@ftp.yoursite.com/
# -r, --recursive            turn on recursive retrieving
# -l depth, --level=depth    specify the maximum recursion depth (the default maximum depth is 5)
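
If you'd rather not embed the password in the URL itself, wget also accepts the FTP credentials as options; a sketch with the same placeholder account:

wget -r --level=99 --ftp-user=myusername --ftp-password=mypassword ftp://ftp.yoursite.com/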

Mirror site (infinite levels)

wget -m ftp://myusername:mypassword@ftp.yoursite.com/
# -m turns on mirroring, i.e. recursion and time-stamping, sets infinite recursion depth, and keeps FTP directory listings

If you download a second time, use the 'no clobber' option to keep from downloading the same files:

-nc, --no-clobber
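
A minimal sketch of that second pass using plain recursion (wget refuses -nc together with the time-stamping that -m/-N turns on, so drop -m here):

wget -r -l inf -nc ftp://myusername:mypassword@ftp.yoursite.com/   # skip files already present locally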


Recursively Download MP3s

Download Zelda Reorchestrated MP3s:

wget -e robots=off --wait 1 -r -l1 -H --no-parent -nd -A .mp3 http://www.zreomusic.com/listen

Download all music files off of a website using wget:

wget -r -l1 -H -nd -A mp3 -e robots=off http://example/url
This will download all files of the type specified after "-A" from a website. Here is a breakdown of the options:
-r turns on recursion and downloads all links on the page
-l1 goes only one level of links into the page (this is really important when using -r)
-H spans hosts, meaning it will also download from links that point to a different domain
-nd means put all the downloads in the current directory instead of recreating the directory structure of the path
-A mp3 filters to only download links that are mp3s (this can be a comma-separated list of file formats to grab multiple types; see the example after this list)
-e robots=off tells wget to ignore the robots.txt file, which sites use to keep tools like wget from hammering them, so use it considerately
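
For instance, grabbing a few audio formats at once (ogg and flac added purely as an illustration; same placeholder URL):

wget -r -l1 -H -nd -A mp3,ogg,flac -e robots=off --wait 1 http://example/url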

Reference:

  • Download all music files off of a website using wget | commandlinefu.com [1]
