Download all files on a website perl script

From Mike A. Leonetti


Apparently there is a Firefox extension called DownThemAll! (http://www.downthemall.net/) that will do the same thing (but possibly better). But when it's Saturday night and I need to do something quickly, the first thing I think of isn't always "Let's search the Internet for a solution!" It usually ends up being "Let's script it!"

So I do reinvent the wheel. Repeatedly.


Requirements

To clean up the URLs I use the Perl module HTML::Entities. URLs copied out of page markup sometimes have nasty entity-encoded characters like &amp; in them instead of the literal characters.
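As a quick sketch of what that decoding does (the variable names and sample URL here are illustrative, not taken from the script):

```perl
use strict;
use warnings;
use HTML::Entities;

# A URL as it appears in page source, with &amp; in place of a literal &
my $raw = 'http://thissite.com/get.php?file=a.rar&amp;mirror=2';

# decode_entities() turns HTML entities back into plain characters
my $url = decode_entities($raw);

print "$url\n";  # http://thissite.com/get.php?file=a.rar&mirror=2
```

Without this step, wget would be handed the literal string "&amp;" and request the wrong URL.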

Download

The script can be downloaded here: http://www.mikealeonetti.com/files/reapfiles

Usage

Usage: ./reapfiles -f extensions site1 [site2] [site3] ...
  -h    Print this message
  -f    File extensions to search for (comma separated list)
  -v    Causes wget output to be displayed

For example

reapfiles -f rar http://thissite.com

Would download all RAR files to the current directory from http://thissite.com.

Or

reapfiles -f zip,iso http://distrosite.com

Would download both ZIP and ISO files to the current directory from http://distrosite.com.

How it works

The script detects files in <a> (anchor) tags. For example:

<a href="filename.rar">RAR file</a>

gets found with the regex pattern and downloaded with wget. The script then checks whether the file needs to be gunzipped and unzips it if necessary. Because only anchor tags are scanned, the script will not detect image tags and download images. It also attempts to clean up improper characters that are sometimes found on sites and makes sure the URLs are well formed.
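The script itself isn't reproduced here, but the core extraction step might look something like this (the regex, extension list, and variable names are illustrative, not the script's actual code):

```perl
use strict;
use warnings;

# Extensions the user asked for, e.g. from -f rar,zip
my @exts   = qw(rar zip);
my $ext_re = join '|', map quotemeta, @exts;

my $html = <<'HTML';
<p><a href="filename.rar">RAR file</a></p>
<img src="photo.jpg">
<a href="archive.zip">ZIP</a>
HTML

# Match only href attributes inside <a> tags whose target ends in a
# wanted extension; <img src="..."> is never touched.
my @files;
while ( $html =~ /<a\s[^>]*href\s*=\s*["']([^"']+\.(?:$ext_re))["']/gi ) {
    push @files, $1;
}

print "$_\n" for @files;   # filename.rar, archive.zip
# each match would then be fetched with something like:
#   system( 'wget', '-q', $url );
```

A regex is fine for a quick-and-dirty scrape like this; a stricter tool would use a real HTML parser, but that defeats the "Saturday night script" spirit.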

Feedback

For any bugs or comments, use the Discussion page or contact me. I also take feature requests, albeit for a very small script.

Updates

Update 04/09/2011

Fixed a bug where the script would not like relative links. Changed the engine to be more like FileStalker's. Added support for multiple file extensions.
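For reference, a minimal way to resolve relative links in Perl is the URI module's new_abs, which resolves an href against the page it was found on. This is a sketch of the general technique, not necessarily the exact approach the script takes:

```perl
use strict;
use warnings;
use URI;

# The page the links were scraped from (illustrative URL)
my $base = 'http://thissite.com/downloads/index.html';

# Relative hrefs get resolved against $base; absolute ones pass through
for my $href ( 'files/a.rar', '../b.rar', 'http://other.com/c.rar' ) {
    print URI->new_abs( $href, $base ), "\n";
}
# http://thissite.com/downloads/files/a.rar
# http://thissite.com/b.rar
# http://other.com/c.rar
```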

