The first step in a targeted attack – or a penetration test or red team activity – is gathering intelligence on the target. While there are ways and means to do this covertly, intelligence gathering usually starts with scraping information from public sources, collectively known as open source intelligence or OSINT. There is such a wealth of legally collectible OSINT available now thanks to social media and the prevalence of online activities that this may be all that is required to give an attacker everything they need to successfully profile an organization or individual.
In this post, we’ll get you up to speed on what OSINT is all about and how you can learn to use OSINT tools to better understand your own digital footprint.
If you’ve heard the name but are wondering what it means, OSINT stands for open source intelligence, which refers to any information that can legally be gathered from free, public sources about an individual or organization. In practice, that tends to mean information found on the internet, but technically any public information falls into the category of OSINT whether it’s books or reports in a public library, articles in a newspaper or statements in a press release.
OSINT also covers information found in other types of media. Though we typically think of it as being text-based, information in images, videos, webinars, public speeches and conferences all falls under the term.
By gathering publicly available sources of information about a particular target, an attacker – or friendly penetration tester – can profile a potential victim to better understand its characteristics and to narrow down the search area for possible vulnerabilities. Without actively engaging the target, the attacker can use the intelligence produced to build a threat model and develop a plan of attack. Targeted cyber attacks, like military attacks, begin with reconnaissance, and the first stage of digital reconnaissance is passively acquiring intelligence without alerting the target.
Gathering OSINT on yourself or your business is also a great way to understand what information you are gifting potential attackers. Once you are aware of what kind of intel can be gathered about you from public sources, you can use this to help you or your security team develop better defensive strategies. What vulnerabilities does your public information expose? What can an attacker learn that they might leverage in a social engineering or phishing attack?
Gathering information from a vast range of sources is a time-consuming job, but there are many tools to make intelligence gathering simpler. While you may have heard of tools like Shodan and port scanners like Nmap and Zenmap, the full range of tools is vast. Fortunately, security researchers themselves have begun to document the tools available.
A great place to start is the OSINT Framework put together by Justin Nordine. The framework provides links to a large collection of resources for a huge variety of tasks from harvesting email addresses to searching social media or the dark web.
In many articles on OSINT tools you’ll see reference to one or two packages included in the Kali Linux penetration testing distribution, such as theHarvester or Maltego, but for a complete overview of the OSINT tools available for Kali, check out the Kali Tools listing page, which gives both a rundown of the tools and examples of how to use each of them.
Among the many useful tools you’ll find here for open source intelligence gathering are researcher favorites like Nmap and Recon-ng. The Nmap tool allows you to specify an IP address, say, and determine what hosts are available, what services those hosts offer, the operating systems they run, what firewalls are in use and many other details.
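To make the idea of "what services those hosts offer" concrete, here is a minimal Python sketch of the simplest check a port scanner performs: trying to open a TCP connection to each port. This is not how Nmap itself is implemented, and the function name open_tcp_ports is our own invention for illustration. Only point it at hosts you own or are authorized to test.

```python
import socket


def open_tcp_ports(host, ports, timeout=0.5):
    """Return the ports in `ports` that accept a TCP connection on `host`.

    Hypothetical helper for illustration; a real scanner such as Nmap
    does far more (service and OS fingerprinting, other scan types).
    """
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise,
            # so a refused or timed-out connection does not raise.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found
```

For example, open_tcp_ports("127.0.0.1", [22, 80, 443]) returns whichever of those ports are listening on your own machine.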
Recon-ng is a tool written in Python by Tim Tomes for web reconnaissance. You can use it to do things like enumerate the subdomains for a given domain, but there are dozens of modules that allow you to hook into things like the Shodan internet search engine, GitHub, Jigsaw, VirusTotal and others, once you add the appropriate API keys. Modules are categorized in groups such as Recon, Reporting and Discovery modules.
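As a rough illustration of what enumerating subdomains involves, the sketch below tries to resolve a list of candidate names through DNS and keeps the ones that answer. It is not Recon-ng code; the function name, the candidate list and the injectable resolver parameter are all assumptions made for the example.

```python
import socket


def resolve_subdomains(domain, candidates, resolver=socket.gethostbyname):
    """Map each candidate subdomain that resolves to its IPv4 address.

    `resolver` is injectable so the function can be exercised without
    touching the network; by default it performs a real DNS lookup.
    """
    found = {}
    for label in candidates:
        hostname = f"{label}.{domain}"
        try:
            found[hostname] = resolver(hostname)
        except OSError:
            # socket.gaierror (name not found) is a subclass of OSError.
            continue
    return found
```

Calling resolve_subdomains("example.com", ["www", "mail", "vpn"]) would return a dictionary containing only the names that currently resolve. A domain with wildcard DNS will appear to resolve every candidate, so treat the results with care.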
Among the most obvious tools for use in intelligence gathering are, of course, web search engines like Google, Bing and so on. In fact, there are dozens of search engines, and some may return better results than others for a particular kind of query. The problem is, then, how can you query these many engines in an efficient way?
A great tool that solves this problem and makes web queries more effective is Searx. Searx is a metasearch engine which allows you to anonymously and simultaneously collect results from more than 70 search services. Searx is free and you can even host your own instance for ultimate privacy. Users are neither tracked nor profiled, and cookies are disabled by default. Searx can also be used over Tor for online anonymity.
Many public instances of Searx are also available for those who either don’t want or don’t need to host their own instance. See the Searx wiki for a listing.
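If you host your own instance, you can also query it from a script. The sketch below assumes the instance exposes the Searx search API at /search with format=json enabled (many public instances turn this off) and that each result carries title and url fields. The instance URL and helper names are placeholders of our own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_url(instance_url, query):
    """Build a JSON search URL for a Searx instance (assumed API shape)."""
    params = urlencode({"q": query, "format": "json"})
    return instance_url.rstrip("/") + "/search?" + params


def extract_hits(payload):
    """Pull (title, url) pairs out of a decoded JSON response."""
    return [(r.get("title", ""), r.get("url", ""))
            for r in payload.get("results", [])]


def search(instance_url, query, timeout=10):
    """Query the instance and return a list of (title, url) tuples."""
    request = Request(build_search_url(instance_url, query),
                      headers={"User-Agent": "osint-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return extract_hits(json.load(response))
```

With an instance of your own at, say, http://localhost:8888, calling search("http://localhost:8888", "osint tools") would return the aggregated hits as plain tuples you can filter or store.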
There are many people working on new tools for OSINT all the time, and a great way to keep up with them and just about anything else in the cybersecurity world is, of course, to follow people on Twitter. Keeping track of things on Twitter, though, can be difficult. Fortunately, there’s an OSINT tool for that, too, called Twint.
Twint is a Twitter scraping tool written in Python that makes it easy to anonymously gather and hunt for information on Twitter without signing up to the Twitter service itself or using an API key as you would have to do with a tool like Recon-ng. With Twint, there’s no authentication or API needed at all. Just install the tool and start hunting. You can search by user, geolocation and time range, among other possibilities. Those are just some of Twint’s options, but many others are available, too.
So how can you use Twint to help you keep up with developments in OSINT? Well, that’s easy and is a great example of Twint in action. As Twint allows you to specify a --since option to only pull tweets from a certain date onwards, you could combine that with Twint’s search option (-s) to scrape new tweets tagged with #OSINT on a daily basis. You could automate that script and feed the results into a database to view at your convenience by using Twint’s --database option, which saves to SQLite format.
twint -s '#osint' --since 2019-07-17
Looks like there have been 58 #OSINT tweets so far today!
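One way to automate the daily scrape is a small wrapper that builds the command for the current date and hands it to the twint command-line tool. The sketch below uses only the -s, --since and --database options mentioned above; the function names and the osint.db filename are our own, and it assumes twint is installed and on your PATH.

```python
import subprocess
from datetime import date


def build_twint_command(term, since, database=None):
    """Return the twint argument list for a search from `since` onwards."""
    command = ["twint", "-s", term, "--since", since.isoformat()]
    if database:
        # Store results in a SQLite file instead of only printing them.
        command += ["--database", database]
    return command


def run_daily_scrape(term="#osint", database="osint.db"):
    """Run today's scrape; raises CalledProcessError if twint fails."""
    subprocess.run(build_twint_command(term, date.today(), database),
                   check=True)
```

Scheduling run_daily_scrape() with cron or a similar scheduler would then add each day's tweets to the same database file.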
Another great tool you can use to collect public information is Metagoofil. This tool uses the Google search engine to retrieve public PDFs, Word documents, PowerPoint and Excel files from a given domain. It can then autonomously extract metadata from these documents to produce a report listing information like usernames, software versions, servers and machine names.
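To see why document metadata is worth checking, consider that modern Office files (.docx, .xlsx, .pptx) are ZIP archives that conventionally store author details in docProps/core.xml. The sketch below reads two of those fields using only the Python standard library. It is not Metagoofil's code, and the function name and returned keys are our own choices for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office file.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_office_metadata(path_or_file):
    """Return creator and last-modified-by names found in an Office file.

    Accepts a filesystem path or a binary file-like object. Assumes the
    conventional docProps/core.xml location; raises KeyError if absent.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    fields = {"creator": "dc:creator", "last_modified_by": "cp:lastModifiedBy"}
    metadata = {}
    for key, tag in fields.items():
        node = root.find(tag, NAMESPACES)
        if node is not None and node.text:
            metadata[key] = node.text
    return metadata
```

Running this over the documents your own organization publishes is a quick way to check whether internal usernames are leaking in file properties.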
In this post, we’ve covered the basic idea of OSINT and why it’s useful. We’ve looked at a couple of great places where you can discover many OSINT tools to help you with virtually any kind of information gathering you need to do, and we’ve also given you a taste of a few individual tools and shown how they can be put to work.
For anyone involved in cybersecurity, understanding how to collect open source intelligence is a vital skill. Whether you’re defending an enterprise network or testing it for weaknesses, the more you understand about its digital footprint the better able you are to see it from an attacker’s point of view. Armed with that knowledge, you can then go on to develop better defensive strategies.