An Aggressive Web Directory Busting Tool



This article discusses two things: first, the concept of directory busting and how to do it aggressively; and second, the implementation of our custom-built directory-busting tool, which we have made available here on GitHub. This tool was crafted to perform the aggressive form of brute-force directory busting discussed below.

Reconnaissance

The first step in directory busting is to perform reconnaissance (i.e., web crawling) on the target site. This is necessary because we first need a list of discovered URLs on the target, from which attack URLs can be generated by altering the file paths of those discovered URLs. For example, let us suppose that our crawler discovered the following URLs:

https://www.example.com/
https://www.example.com/nowhere/
https://www.example.com/nowhere/still_nowhere/

A directory-busting tool can then parse these URLs and generate a list of URLs to attack. Such a list might look like this:

https://www.example.com/secret/
https://www.example.com/nowhere/secret/
https://www.example.com/nowhere/still_nowhere/secret/

Obviously, if we simply brute-forced the base URL, that wouldn't give us nearly as much coverage as brute-forcing every subdirectory we can find by crawling.
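To make this concrete, here is a minimal sketch of how attack URLs might be generated from the crawler's findings. The function names and structure here are our illustration, not necessarily how the actual tool is organized:

from urllib.parse import urlparse, urlunparse

def directory_prefixes(discovered_urls):
    """Collect every directory prefix seen in the discovered URLs."""
    prefixes = set()
    for url in discovered_urls:
        parsed = urlparse(url)
        parts = parsed.path.rstrip("/").split("/")
        # Walk the path upward, keeping each intermediate directory.
        for i in range(len(parts)):
            path = "/".join(parts[:i + 1]) + "/"
            prefixes.add(urlunparse((parsed.scheme, parsed.netloc, path, "", "", "")))
    return prefixes

def attack_urls(discovered_urls, guesses):
    """Append every guessed directory name to every known prefix."""
    return [prefix + guess + "/"
            for prefix in directory_prefixes(discovered_urls)
            for guess in guesses]

For instance, attack_urls(["https://www.example.com/nowhere/still_nowhere/"], ["secret"]) yields exactly the three ".../secret/" URLs shown above.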

For crawling, our aforementioned directory-busting script uses the output of Jick, a web-crawler that we made and discussed in another article.

After running Jick:

python jick.py --urls https://www.example.com/ --href --iframe --get --post --output crawler_output.txt

We can then use Jick's output file as input to our directory-busting script, which will read the URLs discovered by Jick and use them to generate the "attacked" URLs:

python dir_buster.py --target-file crawler_output.txt

Aggression

As mentioned earlier, it is important to be aggressive when brute-force guessing URLs. For this reason, the aforementioned directory-busting script has the following aggressive features:

Firstly, it reads from several plain-text wordlists, each containing one string per line, which represent the strings to guess in URLs. The first of these files contains directory names.

The directories listed in this file will be iterated through, and the program will try each string in three different variations: (1) the entire string lowercase, (2) the entire string uppercase, and (3) the first letter capitalized and every letter after it lowercase. For example, if it guesses the string "passwords", it will guess all of the following strings:

passwords
PASSWORDS
Passwords

It does this in case the web developer used a slightly different capitalization convention when naming files and directories.
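In code, those three variants might be produced like this (a sketch; the tool's internals may differ):

def case_variations(word):
    """Return the three capitalization variants of a guess:
    all-lowercase, ALL-UPPERCASE, and Capitalized."""
    return {word.lower(), word.upper(), word.capitalize()}

# case_variations("passwords") -> {"passwords", "PASSWORDS", "Passwords"}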

Secondly, there is one file of filenames to guess. Just like the directory names, these filenames will be varied according to the three aforementioned capitalization schemes.

Lastly, there is another file of filenames to guess. However, this list contains only filenames that will not be modified - i.e., their capitalization will not be altered to test different capitalization schemes. This is for files such as .htaccess, which will almost always follow the same capitalization. (Because who types their .htaccess file as ".HTACCESS"?)

And although it isn't quite the same kind of thing as the previously-mentioned three files, the script also includes another file for trying different file extensions. This way, each filename guess will also be tried with every extension in this list.
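Roughly speaking, combining the filename and extension lists amounts to a Cartesian product, as in this sketch (building on the hypothetical case_variations helper above):

from itertools import product

def filename_guesses(filenames, extensions):
    """Pair every capitalization variant of every filename with every extension."""
    guesses = set()
    for name, ext in product(filenames, extensions):
        for variant in case_variations(name):
            guesses.add(variant + ext)
    return guesses

# filename_guesses(["backup"], [".zip", ".sql"])
# -> {"backup.zip", "BACKUP.zip", "Backup.zip", "backup.sql", ...}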

Conducting the attack

The wordlists included in the directory-buster's directory represent some of the most obvious things to try for directory busting. However, running the script will take some time: even a set of short wordlists (i.e., directory names, filenames, file extensions, and Jick's output file of URLs discovered on the target) can quickly multiply into tens of thousands of URLs to guess. So grab yourself a Dr. Pepper and sit back while the directory-buster does its thing. Then, when (or if) a sensitive URL is discovered to have been left publicly accessible on the web, you will have successfully humiliated the developers of that website.
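If you are curious what "doing its thing" looks like under the hood, the core loop amounts to something like the following sketch, written here with the requests library; the real script may differ in threading, error handling, and which response codes it treats as hits:

import requests

def probe(candidate_urls, timeout=5):
    """Request each candidate URL and report the ones that appear to exist."""
    hits = []
    for url in candidate_urls:
        try:
            resp = requests.get(url, timeout=timeout, allow_redirects=False)
        except requests.RequestException:
            continue  # connection error or timeout; move on
        if resp.status_code != 404:  # anything but "not found" is interesting
            print(f"[{resp.status_code}] {url}")
            hits.append(url)
    return hits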