Recon - Application mapping

The second step in the process of attacking a web application is gathering and examining some key information about it to gain a better understanding of what you are up against.

The mapping exercise begins by enumerating the application’s content and functionality in order to understand what the application does and how it behaves.

Manual browsing + passive spidering

Browse the entire application in the normal way with BurpSuite active, visiting every link and URL, submitting every form, and proceeding through all multistep functions to completion.

If the application uses authentication, and you have or can create a login account, use this to access the authenticated functionality.

Comments review

Review comments in HTML source code:

<!--
//
/*
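
Comment markers can also be searched for from the command line, for instance (a minimal sketch, the URL is a placeholder):

# Fetches a page and greps for lines containing HTML / JavaScript comment markers.
curl -s <URL> | grep -nE '<!--|//|/\*'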

Robots.txt

The files below may be used by the web application to give search engines information about accessible and disallowed URIs:

/robots.txt
/sitemap.xml

# File created by the macOS Finder application for every folder, which may contain the names of the files in the folder
/.DS_Store
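
These files can be retrieved directly, for instance with curl (a minimal sketch, the URL is a placeholder):

# Retrieves the robots.txt and sitemap.xml files of the target.
curl -s <URL>/robots.txt
curl -s <URL>/sitemap.xml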

JS & Cookies

Browse with JavaScript enabled and disabled, and with cookies enabled and disabled.
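
The responses received with and without the session cookie can for instance be compared from the command line (a minimal sketch, the URL and cookie name / value are placeholders):

# Diffs the response obtained without cookies against the one obtained with the session cookie set.
diff <(curl -s <URL>) <(curl -s -b '<COOKIE_NAME>=<COOKIE_VALUE>' <URL>)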

User-Agent

Change the User-Agent header to identify differences in behavior (for example, the application may have a mobile version). The User Agent Switcher Firefox addon allows for quickly changing the browser's User-Agent string.

Agents

# Browser
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0

# Mobile
Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
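
The responses served for different User-Agent strings can also be compared from the command line (a minimal sketch, the URL and output file names are placeholders):

# Retrieves the same page with a desktop and a mobile User-Agent and diffs the two responses.
curl -s -A 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0' -o desktop.html <URL>
curl -s -A 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1' -o mobile.html <URL>
diff desktop.html mobile.html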

Debug parameters

Choose one or more functionalities where hidden debug parameters may be implemented. Use the Cluster bomb attack type in Burp Intruder with the following common debug parameter names and common values (such as true, yes, on, and 1):

debug
test
hide
hidden
source

For POST requests, supply the parameter in both the URL query string and the request body.
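
The same Cluster bomb style attack can also be sketched with ffuf (the wordlist file names are hypothetical placeholders and the 404 filter should be adjusted against a baseline response):

# PARAM iterates over the debug parameter names above, VAL over common values (true, yes, on, 1).
ffuf -mode clusterbomb -w debug_params.txt:PARAM -w debug_values.txt:VAL -u '<URL>?PARAM=VAL' -fc 404

# For POST requests, the parameter can additionally be supplied in the request body.
ffuf -mode clusterbomb -w debug_params.txt:PARAM -w debug_values.txt:VAL -u '<URL>' -X POST -d 'PARAM=VAL' -fc 404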

Burp Passive Spidering

Review the site map generated by the passive spidering, and identify any content or functionality that you have not walked through using your browser.

OSINT

Search engines dorks

Google hacking, or Google dorking, is a technique that uses the Google search engine to enumerate the resources indexed by Google in order to map the application and retrieve potentially sensitive information.

The following Google search queries can be used to retrieve potentially sensitive information about the application:

# Returns every resource within the target site that Google has a reference to:
site:<URL>

# Returns all the pages on other websites and applications that contain a link to the target:
link:<URL>

# Returns all the pages containing the expression specified referenced by Google:
site:<URL> config
site:<URL> login
site:<URL> password
site:<URL> backup

# Returns all pages with the given extensions
site:<URL> ext:xml | ext:conf | ext:cnf | ext:reg | ext:inf | ext:rdp | ext:cfg | ext:txt | ext:ora | ext:ini
site:<URL> ext:doc | ext:docx | ext:odt | ext:pdf | ext:rtf | ext:sxw | ext:psw | ext:ppt | ext:pptx | ext:pps | ext:csv

# Returns pages with SQL errors
site:<URL> intext:"sql syntax near" | intext:"syntax error has occurred" | intext:"incorrect syntax near" | intext:"unexpected end of SQL command" | intext:"Warning: mysql_connect()" | intext:"Warning: mysql_query()" | intext:"Warning: pg_connect()"

# PHPINFO
site:<URL> ext:php intitle:phpinfo "published by the PHP Group"

The Google Hacking Database, hosted on exploit-db https://www.exploit-db.com/google-hacking-database, references known Google search queries that can be used to conduct Google dorking.

For each query, it is advised to browse to the last page of the search results and select "Repeat the search with the omitted results included".

Accounts & emails scraping

Open sources such as Google, Bing, LinkedIn, Twitter, etc. can be used to harvest accounts and email addresses associated with a domain.

These accounts and email addresses may subsequently be used as usernames in brute force attacks. The tool below automates this scraping:

theHarvester.py -d <target_domain.com> -b all -l 400

Fingerprinting

Determine the technologies in use on the Web Application (CMS, etc.).

Manual Fingerprinting

Look for:

  • Verbose HTTP headers disclosing version numbers

Server
X-powered-by
X-Generator
...

Google any unknown / non-standard headers to discover which technology may have issued them. Load balancers often use non-standard or misspelled headers. See Server Exposure. The response headers can also be reviewed with curl, as shown in the sketch after the cookies table below.

  • Default error pages

  • Known patterns in HTML source code / URI:

      Technology  | Patterns
      ------------|-----------------------
      WordPress   | 'Powered by WordPress'
                  | /wp-login.php
                  | /wp-admin/
                  | ...
      SharePoint  | /_layouts/*
      Drupal      | /node/*
                  | /CHANGELOG.TXT
                  | /INSTALL.txt
                  | /MAINTAINERS.txt
                  | /LICENSE.txt
                  | ...
      OWA         | /owa/
  • Known Cookies:

      Technology  | Cookie
      ------------|-------------------
      Java        | JSESSIONID
      IIS server  | ASPSESSIONID
      ASP.NET     | ASP.NET_SessionId
      Cold Fusion | CFID/CFTOKEN
      PHP         | PHPSESSID
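
The response headers and cookies above can for instance be reviewed with curl (a minimal sketch, the URL is a placeholder):

# Dumps the response headers of a HEAD request.
curl -s -I <URL>

# Dumps the response headers of a GET request while discarding the response body.
curl -s -D - -o /dev/null <URL>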

Automated Fingerprinting

The whatweb Ruby script can be used to automate the fingerprinting process.

whatweb -a 3 <URL>

Active spidering & URL bruteforcing

Actively spider the application using all of the already enumerated content as a starting point.

Burp Active Spider

Burp Spider is a module that automatically parses HTML source for URLs and requests them, effectively crawling the web application for openly accessible content.

The authentication forms should be completed whenever possible.

[Target] Site map -> right click <target> -> Spider this host

Burp Content Discovery

Burp Content Discovery uses various techniques to discover content, such as spidering and intelligent URI brute forcing with wordlists adapted to the context.

[Target] Site map -> right click <target> -> [Engagement tools] Discover content

URL bruteforcing

Use the application root, and any other already enumerated paths deemed fit, as a starting point.

File extension

Determine the file extensions to use for the brute force (no extension + the extensions of the identified language / technology).
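
With ffuf for instance, extensions can be appended in addition to the bare wordlist entries (a minimal sketch, the extensions listed are illustrative and should match the identified technology):

ffuf -ic -e .php,.txt,.bak -w <WORDLIST> -u <URL>/FUZZ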

Wordlists

Adapt the word list for the application context.

Example: if all resource names start with a capital letter, the wordlist used in the brute force should be capitalized (see the sed sketch after the wordlists below). Check for default content associated with the technologies found.

# Default URI for various CMS
/Discovery/Web_Content/*

# Wordlists of 200k+ and 1.2M+ entries created by the DirBuster team through internet crawling.
# https://github.com/Qazeer/zap-extensions/tree/master/addOns/directorylistv2_3/src/main/zapHomeFiles/fuzzers/dirbuster
# Lowercase versions: https://github.com/Qazeer/zap-extensions/tree/master/addOns/directorylistv2_3_lc/src/main/zapHomeFiles/fuzzers/dirbuster
directory-list-2.3-medium.txt
directory-list-2.3-big.txt
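
Following the capitalization example above, a capitalized copy of a wordlist can for instance be generated with GNU sed (a minimal sketch, file names are placeholders):

# Uppercases the first character of every wordlist entry.
sed 's/.*/\u&/' directory-list-2.3-medium.txt > Directory-list-2.3-medium_capitalized.txt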

Recursive

A first brute force should be conducted without recursively brute forcing the discovered sub directories. If the web application has its own root path, two brute force runs should be conducted (one on the default / root and one on the main application root).

Some interesting sub directories should then be picked for further brute force enumeration.
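
For instance, a second brute force pass against a discovered sub directory could look as follows (a minimal sketch using ffuf, introduced below, with placeholder paths):

ffuf -ic -w <WORDLIST> -u <URL>/<DISCOVERED_DIRECTORY>/FUZZ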

Tools

The following tools can be used to brute force URIs:

# Recommended: ffuf.
# -ic: Ignore wordlist comments (default: false)
# -e Comma separated list of extensions. Extends FUZZ keyword.
ffuf -ic [-e <EXTENSIONS_LIST>] -w <WORDLIST> -u <URL>/FUZZ

# Executes ffuf in the background using nohup, over the URLs in the specified file, using interlace.
nohup interlace -timeout 7200 -threads <1 | THREADS> -c 'ffuf -r -noninteractive -ignore-body -ac -ic -w <WORDLIST> -o <OUTPUT_DIRECTORY>/ffuf-_cleantarget_.txt -u _target_/FUZZ' -tL <URL_LIST_FILE> &

# GUI
DirBuster
BurpSuite Intruder

# As it is written in Go, standalone gobuster binaries can be compiled for both Linux and Windows.
# -a <USER_AGENT_STRING>: sets the User-Agent string, which defaults to "gobuster/3.1.0". Example: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36".
# -k: skips SSL / TLS certificate verification.
# -r: follow redirects.
# -t <THREADS>: number of concurrent threads, defaults to 10.
# -d: look for backup files of found files.
# -x <EXT | EXT1, ..., EXTN>: file extension(s) that will be appended to file names.
# -s <STATUS_CODE | STATUS_CODE1, ..., STATUS_CODEN>: status code to include in output. Defaults to "200,204,301,302,307,400,401,403".
# -b <STATUS_CODE | STATUS_CODE1, ..., STATUS_CODEN>: status code to exclude from output. Will override included status if set. Example: "400,403,404,500".
gobuster dir -k -r -d -t <20 | THREADS> -o <OUT_FILE> -w <WORDLIST> -u <TARGET>
gobuster dir -k -r -d -t <20 | THREADS> -x <EXT | EXT1, ..., EXTN> -o <OUT_FILE> -w <WORDLIST> -u <TARGET>

wfuzz -t 20 -z file,<WORDLIST> <URL>/FUZZ

# Starts 5 instances of wfuzz iterating over the URLs specified in the given file. Each wfuzz process runs with 40 directory brute force threads.
cat <URL_LIST_FILE> | xargs -i --max-procs=5 /usr/bin/bash -c "wfuzz -t 40 --sc 200,301 -f <OUTPUT_DIRECTORY>/{}_status_200_301.txt -z file,<WORDLIST> {}/FUZZ"

dirb

Parameters fuzzing

The wfuzz tool can be used to fuzz GET and POST requests to find accepted parameters. The SecLists burp-parameter-names.txt wordlist contains more than 2,000 frequent parameter names.

A filter on the response HTTP status code, or on the number of lines, words, or characters in the response, can be added using --hc, --hl, --hw, or --hh respectively.

wfuzz -w <WORDLIST> '<URL>?FUZZ=test'

wfuzz --hh <CHAR_NUMBER> -w <WORDLIST> '<URL>?FUZZ=test'
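
POST parameters can be fuzzed in a similar way by placing the FUZZ keyword in the request body (a minimal sketch, the parameter value is arbitrary):

wfuzz --hh <CHAR_NUMBER> -w <WORDLIST> -d 'FUZZ=test' <URL>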
