Wget
wget is a command-line utility for Unix-like systems that fetches documents from the web. It is installed by default on most Linux systems, but not on Mac OS X, where you can install it with Fink. For people who are intimidated by the command line, there is SimpleWget, though I haven't tried it myself.
1 Examples
1.1 Downloading a single file
This will download the DisProt database as a fasta file:
wget http://www.disprot.org/data/version_3.5/disprot_fasta_v3.5.txt
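As a minimal sketch of the same download inside a script: wget names the local file after the last path component of the URL, so the script can work out that name ahead of time. The function name fetch_disprot and the variable names are hypothetical, and whether -c (continue a partial download) actually helps depends on the server supporting resumed transfers, which is an assumption here.

```shell
#!/bin/sh
# URL taken from the example above.
url="http://www.disprot.org/data/version_3.5/disprot_fasta_v3.5.txt"

# wget saves to the last path component of the URL by default,
# so basename gives the expected local file name.
outfile=$(basename "$url")

# Hypothetical helper: download the file, resuming a partial
# download if one exists (-c). Assumes the server allows resuming.
fetch_disprot() {
    wget -c "$url"
}
```

Calling fetch_disprot from the script then leaves the result in "$outfile" in the current directory.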
1.2 Filtering by file type
1.2.1 Downloading all PDFs on a page
Let's say you have a page listing a bunch of PDFs. You need them all, but you don't want to manually download them individually! The -A option lets you filter by file suffix or pattern:
wget -r -l 1 -nd -A '*.[Pp][Dd][Ff]' [URL]
-A '*.[Pp][Dd][Ff]' specifies that only filenames ending in .pdf (in any combination of uppercase and lowercase) will be downloaded. Because the value contains wildcard characters, wget treats it as a pattern matched against the whole filename rather than as a suffix, hence the leading *. The quotes keep the shell from expanding the pattern itself before wget sees it. The -r option tells wget to follow the links it finds on the webpage. The -l 1 option tells it to go only one level deep: it follows the links on the webpage you give it, but not the links on the pages it downloads. -nd stands for "no directories" and makes wget download all of the PDF files to the current directory, instead of creating a directory named after the domain and putting them there.
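To get a feel for which filenames a case-insensitive PDF pattern accepts, the same glob can be tried out in the shell before running wget. This is only a sketch: is_pdf and fetch_pdfs are hypothetical helper names, and the shell's case matching is assumed to behave like wget's own pattern matching for a simple glob such as '*.[Pp][Dd][Ff]'.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the filename matches the glob
# *.[Pp][Dd][Ff], i.e. ends in .pdf in any mix of upper and lower case.
is_pdf() {
    case "$1" in
        *.[Pp][Dd][Ff]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper around the command from the text.
# $1 is the page that links to the PDFs. The pattern is quoted so
# the shell passes it to wget unchanged.
fetch_pdfs() {
    wget -r -l 1 -nd -A '*.[Pp][Dd][Ff]' "$1"
}
```

For example, is_pdf report.PDF succeeds while is_pdf index.html fails, which mirrors the files the filter is meant to keep and skip.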
1.2.2 Downloading 5-Species MLAGAN Alignments
The 5-Species MLAGAN Alignments are in tarballs:
http://www.brl.bcm.tmc.edu/csa/data/hg16/alignments/chr1.Mml.aligns.mfa.tgz
They can all be downloaded into your current directory with this command, which works much like the PDF example above:
wget -r -l 1 -nd -A aligns.mfa.tgz http://www.brl.bcm.tmc.edu/csa/index.rhtml
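Since the alignments arrive as tarballs, a likely next step is unpacking them. The following is a rough sketch, not part of the original recipe: extract_alignments is a hypothetical function name, and it assumes the downloaded files keep the .aligns.mfa.tgz suffix and are gzip-compressed tar archives.

```shell
#!/bin/sh
# Hypothetical helper: unpack every *.aligns.mfa.tgz archive found
# in the given directory (default: current directory) into that
# same directory.
extract_alignments() {
    dir=${1:-.}
    for archive in "$dir"/*.aligns.mfa.tgz; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$archive" ] || continue
        tar -xzf "$archive" -C "$dir"
    done
}
```

Running extract_alignments with no argument after the wget command above would unpack whatever tarballs landed in the current directory.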
You can read more about filtering in the wget manual.