Download all files on a website Perl script
From Mike A. Leonetti
Apparently there is a Firefox plugin called DownThemAll! (http://www.downthemall.net/) that will do the same thing (and possibly better). But when it's Saturday night and I need to get something done quickly, the first thing I think of usually isn't "Let's search the Internet for a solution!" It usually ends up being "Let's script it!"
So I do reinvent the wheel. Repeatedly.
To clean up the URLs I use the Perl module HTML::Entities. Sometimes URLs have nasty HTML entities like &amp; embedded in them.
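For example, HTML::Entities' decode_entities() turns an escaped href back into a URL wget can fetch (the sample href below is made up for illustration; the script's actual variable names may differ):

```perl
use strict;
use warnings;
use HTML::Entities;

# A URL as it appears in the page source, with & escaped as &amp;
my $href = 'download.php?file=music.rar&amp;id=42';

# decode_entities() converts &amp; back into a literal &
my $url = decode_entities($href);
print "$url\n";    # download.php?file=music.rar&id=42
```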
The script can be downloaded here: http://www.mikealeonetti.com/files/reapfiles
Usage: ./reapfiles -f extensions site1 [site2] [site3] ...
    -h    Print this message
    -f    File extensions to search for (comma-separated list)
    -v    Causes wget output to be displayed
reapfiles -f rar http://thissite.com
Would download all RAR files to the current directory from http://thissite.com.
reapfiles -f zip,iso http://distrosite.com
Would download both ZIP and ISO files to the current directory from http://distrosite.com.
How it works
The script detects files in <a> tags. For example:
<a href="filename.rar">RAR file</a>
is found with the regex pattern and downloaded with wget. The script then checks whether the downloaded file is gzipped and gunzips it if necessary. Because only <a> tags are matched, the script will not detect image tags and download images. It also attempts to clean up improper characters that sometimes appear on sites and makes sure the URLs are well formed.
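The matching step above can be sketched roughly like this; this is a minimal stand-in for the script's actual pattern (the sample HTML, variable names, and the exact regex are assumptions for illustration), but it shows why <a> tags with a wanted extension are caught while <img> tags are ignored:

```perl
use strict;
use warnings;

# A hypothetical page body; the real script fetches the page first.
my $html = '<p><a href="disc1.iso">ISO</a> <img src="shot.png"> '
         . '<a href="notes.txt">notes</a></p>';

# Extensions passed via -f, e.g. "iso,rar"
my @exts   = ('iso', 'rar');
my $ext_re = join '|', map { quotemeta } @exts;

# Pull the href out of each <a> tag whose target ends in a wanted
# extension; <img> tags never match the leading "<a", so images
# are skipped automatically.
my @files;
while ( $html =~ /<a\s[^>]*href\s*=\s*"([^"]+\.(?:$ext_re))"/gi ) {
    push @files, $1;
}

print "$_\n" for @files;    # disc1.iso
```

Each captured href would then be handed to wget for the actual download.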