Recon - Application mapping
The second step in the process of attacking a web application is gathering and examining some key information about it to gain a better understanding of what you are up against.
The mapping exercise begins by enumerating the application’s content and functionality in order to understand what the application does and how it behaves.
Manual browsing + passive spidering
Browse the entire application in the normal way with BurpSuite active (proxying the traffic so Burp records the content passively), visiting every link and URL, submitting every form, and proceeding through all multistep functions to completion.
If the application uses authentication, and you have or can create a login account, use this to access the authenticated functionality.
Comments review
Review comments in HTML source code:
<!--
//
/*
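# Hedged sketch (not from the original notes): grep a downloaded page for the comment markers above.
curl -sk <URL> | grep -E '<!--|//|/\*'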
Robots.txt
The files below may be used by the web application to give search engines information about accessible and disallowed URIs:
/robots.txt
/sitemap.xml
# File created by the macOS Finder in every browsed folder; it may contain the names of the files in the folder
/.DS_Store
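# Hedged sketch (not from the original notes): retrieve the files above for review with curl.
curl -sk <URL>/robots.txt
curl -sk <URL>/sitemap.xml
curl -sk -o DS_Store <URL>/.DS_Store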
JS & Cookies
Browse with JavaScript enabled and disabled, and with cookies enabled and disabled.
User-Agent
Change the User-Agent header to identify differences in behavior (for example, the application may have a mobile version). The User Agent Switcher Firefox add-on allows quickly changing the browser's user agent string.
Agents
# Browser
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0
# Mobile
Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
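# Hedged sketch (not from the original notes): request the same page with the mobile User-Agent above to compare responses (curl -A sets the User-Agent header).
curl -sk -A 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1' <URL>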
Debug parameters
Choose one or more functions where hidden debug parameters may be implemented. Use the Cluster bomb attack type in Burp Intruder with the following common debug parameter names and common values (such as true, yes, on, and 1):
debug
test
hide
hidden
source
For POST requests, supply the parameter in both the URL query string and the request body.
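Outside of Burp, the same cluster bomb style combination can be approximated with wfuzz; the command below is a hedged sketch whose payload lists simply mirror the names and values above.
# FUZZ iterates over parameter names, FUZ2Z over values; wfuzz's default iterator covers every name/value combination.
wfuzz -z list,debug-test-hide-hidden-source -z list,true-yes-on-1 '<URL>?FUZZ=FUZ2Z'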
Burp Passive Spidering
Review the site map generated by the passive spidering, and identify any content or functionality that you have not walked through using your browser.
OSINT
Search engine dorks
Google hacking, or Google dorking, is a technique that uses the Google search engine to enumerate the resources indexed by Google in order to map the application and retrieve potentially sensitive information.
The following Google search queries can be used to retrieve potentially sensitive information about the application:
# Returns every resource within the target site that Google has a reference to:
site:<URL>
# Returns all the pages on other websites and applications that contain a link to the target:
link:<URL>
# Returns all the pages containing the expression specified referenced by Google:
site:<URL> config
site:<URL> login
site:<URL> password
site:<URL> backup
# Returns all pages with the given extensions
site:<URL> ext:xml | ext:conf | ext:cnf | ext:reg | ext:inf | ext:rdp | ext:cfg | ext:txt | ext:ora | ext:ini
site:<URL> ext:doc | ext:docx | ext:odt | ext:pdf | ext:rtf | ext:sxw | ext:psw | ext:ppt | ext:pptx | ext:pps | ext:csv
# Returns pages with SQL errors
site:<URL> intext:"sql syntax near" | intext:"syntax error has occurred" | intext:"incorrect syntax near" | intext:"unexpected end of SQL command" | intext:"Warning: mysql_connect()" | intext:"Warning: mysql_query()" | intext:"Warning: pg_connect()"
# PHPINFO
site:<URL> ext:php intitle:phpinfo "published by the PHP Group"
The Google Hacking Database, hosted on exploit-db (https://www.exploit-db.com/google-hacking-database), references known Google search queries that can be used to conduct Google dorking.
For each query, it is advised to browse to the last page of the search results and select "Repeat the search with the omitted results included".
Accounts & emails scraping
Open resources such as Google, Bing, LinkedIn, Twitter, etc. can be used to harvest accounts and emails associated with a domain.
These credentials may subsequently be used to conduct brute force attacks. The tools below automate this scraping:
theHarvester.py -d <target_domain.com> -b all -l 400
Fingerprinting
Determine the technologies in use on the Web Application (CMS, etc.).
Manual Fingerprinting
Look for:
Verbose HTTP headers disclosing version numbers
Server
X-powered-by
X-Generator
...
Google any unknown / non-standard headers to discover which technology may have issued them. Load balancers usually use non-standard and misspelled headers. See Server Exposure.
Default error pages
Known patterns in HTML source code / URI:
| CMS       | Patterns |
|-----------|----------|
| WordPress | 'Powered by WordPress', /wp-login.php, /wp-admin/, ... |
| Joomla    | /_layouts/* |
| Drupal    | /node/*, /CHANGELOG.TXT, /INSTALL.txt, /MAINTAINERS.txt, /LICENSE.txt, ... |
| OWA       | /OWA/ |
Known Cookies:
| Technology  | Cookie |
|-------------|--------|
| Java        | JSESSIONID |
| IIS server  | ASPSESSIONID |
| ASP.NET     | ASP.NET_SessionId |
| Cold Fusion | CFID/CFTOKEN |
| PHP         | PHPSESSID |
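The headers and cookies above can be checked quickly from the raw response; the curl / grep usage below is a hedged sketch, not part of the original notes.
# Dump the response headers (Server, X-Powered-By, Set-Cookie, etc.).
curl -skI <URL>
# Highlight only the session cookie names.
curl -skI <URL> | grep -i 'set-cookie'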
Automated Fingerprinting
The whatweb Ruby script can be used to automate the fingerprinting process.
whatweb -a 3 <URL>
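# Hedged sketch (not from the original notes): keep a verbose log of the whatweb results for later review.
whatweb -a 3 --log-verbose=<OUT_FILE> <URL>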
Active spidering & URL bruteforcing
Actively spider the application using all of the already enumerated content as a starting point.
Burp Active Spider
Burp Spider is a module that automatically parses HTML source for URLs and requests them, effectively crawling the web application for openly accessible content.
Authentication forms should be completed whenever possible.
[Target] Site map -> right click <target> -> Spider this host
Burp Content Discovery
Burp's Content Discovery engagement tool uses various techniques to discover content, such as spidering and intelligent URI brute forcing with wordlists adapted to the context.
[Target] Site map -> right click <target> -> [Engagement tools] Discover content
URL bruteforcing
Use the application root and any other already enumerated path deemed fit as a starting point.
File extension
Determine the file extensions to use for the brute force (no extension + the application's language extension).
Wordlists
Adapt the wordlist to the application context.
Example: if all resource names start with a capital letter, the wordlist used in the brute force should be capitalized (a sed one-liner for this is sketched after the wordlists below). Check for default content associated with the technologies found.
# Default URI for various CMS
/Discovery/Web_Content/*
# Wordlist of 200k+ and 1.2M+ entries created by the DirBuster Team through internet crawling.
# https://github.com/Qazeer/zap-extensions/tree/master/addOns/directorylistv2_3/src/main/zapHomeFiles/fuzzers/dirbuster
# Lowercase versions: https://github.com/Qazeer/zap-extensions/tree/master/addOns/directorylistv2_3_lc/src/main/zapHomeFiles/fuzzers/dirbuster
directory-list-2.3-medium.txt
directory-list-2.3-big.txt
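# Hedged sketch (not from the original notes): capitalize an existing wordlist to match an application whose resource names start with an uppercase letter (GNU sed \U replacement).
sed 's/^./\U&/' <WORDLIST> > <CAPITALIZED_WORDLIST>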
Recursive
A first brute force should be conducted without recursively brute forcing the discovered subdirectories. In case the web application root is defined, two brute force runs should be conducted (one on the default / root and one on the main application root).
Some interesting subdirectories should then be picked for further brute force enumeration.
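ffuf can perform this recursive pass directly; the command below is a hedged sketch (the recursion flags are ffuf options, and the depth value is an illustrative choice).
# Recurse one level into discovered subdirectories.
ffuf -ic -recursion -recursion-depth 1 -w <WORDLIST> -u <URL>/FUZZ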
Tools
The following tools can be used to brute force URIs:
# Recommended: ffuf.
# -ic: Ignore wordlist comments (default: false)
# -e Comma separated list of extensions. Extends FUZZ keyword.
ffuf -ic [-e <EXTENSIONS_LIST>] -w <WORDLIST> -u <URL>/FUZZ
# Executes ffuf in the background using nohup, over the URLs in the specified file, using interlace.
nohup interlace -timeout 7200 -threads <1 | THREADS> -c 'ffuf -r -noninteractive -ignore-body -ac -ic -w <WORDLIST> -o <OUTPUT_DIRECTORY>/ffuf-_cleantarget_.txt -u _target_/FUZZ' -tL <URL_LIST_FILE> &
# GUI
DirBuster
BurpSuite Intruder
# As being written in Go, standalone gobuster binaries can be compiled for both Linux and Windows.
# -a <USER_AGENT_STRING>: sets the User-Agent string, which defaults to "gobuster/3.1.0". Example: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36".
# -k: skips SSL / TLS certificate verification.
# -r: follow redirects.
# -t <THREADS>: number of concurrent threads, default to 10 threads.
# -d: look for backup files of found files.
# -x <EXT | EXT1, ..., EXTN>: file extension(s) that will be appended to file names.
# -s <STATUS_CODE | STATUS_CODE1, ..., STATUS_CODEN>: status code to include in output. Defaults to "200,204,301,302,307,400,401,403".
# -b <STATUS_CODE | STATUS_CODE1, ..., STATUS_CODEN>: status code to exclude from output. Will override included status if set. Example: "400,403,404,500".
gobuster dir -k -r -d -t <20 | THREADS> -o <OUT_FILE> -w <WORDLIST> -u <TARGET>
gobuster dir -k -r -d -t <20 | THREADS> -x <EXT | EXT1, ..., EXTN> -o <OUT_FILE> -w <WORDLIST> -u <TARGET>
wfuzz -t 20 -z file,<WORDLIST> <URL>/FUZZ
# Starts 5 wfuzz instances in parallel, iterating over the URLs specified in the given file. Each wfuzz process runs with 40 directory brute force threads.
cat <URL_LIST_FILE> | xargs -i --max-procs=5 /usr/bin/bash -c "wfuzz -t 40 --sc 200,301 -f <OUTPUT_DIRECTORY>/{}_status_200_301.txt -z file,<WORDLIST> {}/FUZZ"
dirb
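# Hedged sketch (not from the original notes): basic dirb invocation (URL then wordlist as positional arguments).
dirb <URL> <WORDLIST>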
Parameters fuzzing
The wfuzz tool can be used to fuzz GET and POST requests to find accepted parameters. The SecLists burp-parameter-names.txt wordlist contains more than 2,000 frequent parameter names.
A filter on the response HTTP status code, or on the number of lines, words, or characters in the response, can be added using --hc / --hl / --hw / --hh <code | lines | words | chars>.
wfuzz -w <WORDLIST> '<URL>?FUZZ=test'
wfuzz --hh <CHAR_NUMBER> -w <WORDLIST> '<URL>?FUZZ=test'
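# Hedged sketch (not from the original notes): fuzz POST body parameters instead of the query string (-d sets the request body; the "test" value is an arbitrary baseline).
wfuzz --hh <CHAR_NUMBER> -w <WORDLIST> -d 'FUZZ=test' '<URL>'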