
Data Leakage Detection Best Practices

ReliaQuest 22 September 2021

Hunting for Exposed Data

The concept of protecting sensitive data from leaving the network is not new: data loss prevention (DLP) tools, watermarking, and document marking schemes all exist to help combat this risk.

Unfortunately, the threat landscape requires new strategies to accompany these existing approaches. It’s no longer enough to try to prevent this data from being exposed; security teams are demanding ways to go out and hunt for it themselves.

Much of this work is urgent. Imagine you are informed of a breach, or a ransomware actor threatens to leak your data: your security team needs quick access to dark web sources to verify whether the data has actually been exposed.

Data leakage detection concerns far more than the dark web, however. Our Solutions Guide breaks this exposure down into four areas, each involving different types of data that can pose a risk:

1. Search engines

Teams looking to monitor for exposed files will need to get familiar with Google hacking: targeted search queries that use advanced operators to yield specific results. There are thousands of possible searches you can perform. Here are some examples (approximate result counts at the time of writing):

site:s3.amazonaws.com (5.37m results)
filetype:pdf "private and confidential" (430k results)
intitle:"admin login" (281k results)

Don’t forget to monitor your own website: some of the most sensitive documents are exposed on companies’ own websites.

site:company.com AND "private and confidential" filetype:pdf
- There are some great resources to help, such as Exploit-DB’s Google Hacking Database: https://www.exploit-db.com/google-hacking-database.
- For a more digestible set of Google hacks, check out “25 Killer Combos for Google’s Site: Operator” (6 with “inurl”).
- Finally, do not rely on Google alone. Bing, Yandex, and Yahoo return slightly different results for the same queries. Google is often the most extensive, but results vary between engines.
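Running the same query across several engines is easy to script. The Python sketch below builds dork strings for a domain you own and turns each one into a search URL for Google, Bing, Yandex, and Yahoo. It assumes each engine's public search URL format. The helper names (`build_dorks`, `search_urls`) are hypothetical. The sketch only builds links for manual review; it does not fetch or scrape any results.

```python
from urllib.parse import urlencode

# Assumed public search endpoints and the query parameter each engine uses.
ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yandex": ("https://yandex.com/search/", "text"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
}


def build_dorks(domain, phrases, filetypes):
    """Combine a site: restriction with every quoted phrase / file type pair."""
    return [
        f'site:{domain} "{phrase}" filetype:{ft}'
        for phrase in phrases
        for ft in filetypes
    ]


def search_urls(dork):
    """Return one URL-encoded search link per engine for the same dork."""
    return {
        name: f"{base}?{urlencode({param: dork})}"
        for name, (base, param) in ENGINES.items()
    }


if __name__ == "__main__":
    # "company.com" mirrors the placeholder domain used in the text above.
    for dork in build_dorks("company.com", ["private and confidential"], ["pdf", "docx"]):
        print(dork)
        for engine, url in search_urls(dork).items():
            print(f"  {engine}: {url}")
```

Opening the same dork on every engine makes the differences between their results easy to compare side by side.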

2. Hidden directories and misconfigured online file stores

Misconfigured online file stores and file transfer services are commonplace. S3 buckets, FTP servers, rsync, SMB shares, and website directory indexes are all frequently misconfigured and left publicly accessible. Digital Shadows (now ReliaQuest) has detected more than 60 billion files exposed across these locations, and a startling number of them are sensitive documents. For organizations, the most significant risk comes from employees and contractors who back up and store files on their personal file stores, unaware that they are broadcasting sensitive business information to the internet.
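For S3, a quick self-check is whether a bucket returns an object listing to an unauthenticated request. The following Python sketch assumes a virtual-hosted-style bucket URL and the standard S3 ListBucketResult XML format. The extension watch-list and the helper names are hypothetical. It handles only the first page of results; a real audit would also follow pagination and check object-level permissions. It is intended for your organization's own buckets.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Hypothetical watch-list of extensions that often indicate backups, exports, or keys.
SENSITIVE_EXTENSIONS = (".sql", ".bak", ".csv", ".xlsx", ".pem", ".env")


def fetch_listing(bucket: str, timeout: float = 10.0) -> bytes:
    """Send an unauthenticated GET to the bucket root.

    A successful response means anyone can list the bucket; urllib raises
    HTTPError (typically 403 AccessDenied) when public listing is blocked.
    """
    url = f"https://{bucket}.s3.amazonaws.com/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def parse_keys(xml_data) -> list:
    """Extract object keys from a ListBucketResult body (str or bytes).

    Only the first page is handled: S3 returns at most 1,000 keys per
    response and sets IsTruncated when more remain.
    """
    root = ET.fromstring(xml_data)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


def flag_sensitive(keys) -> list:
    """Keep keys whose extension is on the watch-list (case-insensitive)."""
    return [k for k in keys if k.lower().endswith(SENSITIVE_EXTENSIONS)]
```

For example, `flag_sensitive(parse_keys(fetch_listing("your-bucket-name")))` lists the watch-list files that anyone on the internet can see. Here "your-bucket-name" is a placeholder.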

3. Code repositories

Some of the most sensitive data exposure comes from technical data that leaks through code repositories and code-sharing sites. This is often due to a simple mistake: a software engineer believes they are committing to a private repository, but its settings have been changed to public. A great deal of information can be garnered by running Google hacks against GitHub (e.g., site:github.com “oauth”), but there are several other sources to consider, such as GitLab, Bitbucket, and Stack Exchange. Beyond Google hacking, you may also want to add code search sites such as https://searchcode.com/ to your OSINT toolkit.
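Alongside searching public code sites, teams often scan their own repositories for the token formats attackers look for. This Python sketch uses a few widely published patterns: AWS access key IDs, GitHub token prefixes, and PEM private key headers. The pattern list and the helper names are illustrative assumptions, not a complete ruleset. The sketch checks only the working tree, not Git history.

```python
import re
from pathlib import Path

# Widely published token formats; illustrative, not a complete ruleset.
SECRET_PATTERNS = {
    # AWS access key IDs: AKIA (long-term) or ASIA (temporary) + 16 characters.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # GitHub tokens with ghp_, gho_, ghu_, ghs_, or ghr_ prefixes.
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    # PEM-encoded private key headers.
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list:
    """Return (line number, pattern name, matched text) for every hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings


def scan_tree(root: str):
    """Yield (path, line, pattern, match) for readable text files under root.

    The .git directory is skipped, so only the current working tree is
    checked; secrets deleted in later commits still live in history.
    """
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, name, value in scan_text(text):
            yield str(path), lineno, name, value
```

Any hit should be treated as compromised and rotated. Removing the token from the latest commit is not enough if the repository was ever public.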

4. Dark web

Dark web sources are trickier to collect from but can provide vital intelligence on data loss. They include criminal forums, marketplaces, messaging apps, and ransomware data leak sites. (This is not the place to dive into the nuances of dark web monitoring; check out our Dark Web Monitoring Solutions Guide to learn more.) How much dark web coverage you need largely depends on what type of data you are looking for: to detect exposed customer accounts, credentials, and payment card details, you will need to cover dark web sources.
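When a dump surfaces on a forum or leak site, the first triage question is usually whether it contains your accounts or valid card numbers. Here is a minimal Python sketch. It assumes a plain-text dump with "email:password" style lines, using ":", ";" or "|" as separators. A Luhn checksum filters random digit runs out of the card candidates. The helper names are hypothetical. The helpers keep only account names and the last four card digits, never passwords or full numbers. Real dump formats vary widely, so treat the patterns as a starting point.

```python
import re

# "user@domain<sep>password" with ":", ";" or "|" as the separator.
COMBO_LINE = re.compile(r"^([A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+))[:;|](.+)$")
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def triage_dump(lines, domain: str):
    """Return (account names at `domain` or its subdomains, last four digits of Luhn-valid cards).

    Passwords and full card numbers are deliberately not retained.
    """
    domain = domain.lower()
    accounts, cards = [], []
    for raw in lines:
        line = raw.strip()
        combo = COMBO_LINE.match(line)
        if combo:
            host = combo.group(2).lower()
            if host == domain or host.endswith("." + domain):
                accounts.append(combo.group(1).lower())
            continue  # credential line handled; skip card scanning
        for candidate in CARD_CANDIDATE.finditer(line):
            if luhn_valid(candidate.group(0)):
                cards.append(re.sub(r"\D", "", candidate.group(0))[-4:])
    return accounts, cards
```

The subdomain check matches "mail.company.com" but not look-alike domains such as "evilcompany.com".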

For each of these areas, free tools are available to start searching for exposed data; the guide lists more than 15 tools and resources for detecting and analyzing it.

Attackers Leveraging Exposed Data

Exposed data forms a vital component of an attacker’s reconnaissance effort. Think, for example, how useful a leaked penetration test report or network schematic would be to an attacker.

Many attacker techniques rely on the availability of this data. MITRE ATT&CK documents a number of techniques relevant to exposed data that are regularly observed as part of attacker campaigns.

There are plenty of real-world examples. LockBit, a ransomware group, used credentials stolen in a previous breach to gain access to a new target.

Cybercriminals have been targeting the GitHub accounts of software engineers who may have exposed sensitive access keys. Extortion actors such as thedarkoverlord and ShinyHunters fit this pattern: researchers have observed them targeting OAuth credentials that provide access to cloud infrastructure.

Exposure of access keys and secrets is worryingly common: our most recent research detected more than 800,000 exposed keys, 38% of them for cloud services and 43% for databases.

Beyond the Breach

Although exposed data is a treasure chest for attacker reconnaissance, some exposed data poses a risk in its own right.

In 2018, one actor discovered a publicly accessible manual for the Reaper drone on a misconfigured FTP (File Transfer Protocol) server. The same actor went on to sell that manual on a dark web marketplace.
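A common FTP misconfiguration is a server that accepts anonymous logins. Here is a minimal Python sketch for checking whether one of your own FTP servers does. It uses the standard library's `ftplib`, where `login()` called with no arguments logs in as "anonymous". The helper name and the injectable `ftp_factory` parameter are assumptions of this sketch. The factory exists so the check can be exercised without a live server.

```python
import ftplib


def check_anonymous_ftp(host, ftp_factory=ftplib.FTP, timeout=10.0):
    """Return the starting-directory listing if `host` accepts anonymous FTP.

    Returns None when the host is unreachable or rejects anonymous login,
    and [] when login succeeds but listing is denied or empty.
    `ftp_factory` is a hypothetical injection point for testing.
    """
    try:
        ftp = ftp_factory(host, timeout=timeout)  # connects on construction
    except ftplib.all_errors:
        return None  # unreachable or connection refused
    try:
        try:
            ftp.login()  # no arguments: ftplib logs in as user "anonymous"
        except ftplib.error_perm:
            return None  # anonymous login rejected, the desired state
        try:
            return ftp.nlst()  # names visible to anyone on the internet
        except ftplib.error_perm:
            return []  # logged in, but listing denied or directory empty
    finally:
        try:
            ftp.quit()  # polite QUIT; fall back to closing the socket
        except ftplib.all_errors:
            ftp.close()
```

For example, `check_anonymous_ftp("ftp.example.com")` returns a list of file names when the placeholder host allows anonymous access.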

Personal employee and customer information can also be exposed. SearchLight (now ReliaQuest’s GreyMatter Digital Risk Protection) has unearthed many spreadsheets containing customer PII exposed via misconfigured file stores. Left undetected and unmitigated, this type of breach can lead to compliance violations and accompanying fines.
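When a spreadsheet turns up in an exposed file store, a useful first pass is to find which columns hold PII. This Python sketch assumes the sheet has been exported to CSV. It uses two illustrative patterns: email addresses and the US Social Security number format. The pattern set and the helper name are hypothetical; production PII discovery needs far broader, locale-aware rules.

```python
import csv
import io
import re

# Illustrative patterns only; real PII discovery needs locale-aware rules.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def pii_columns(csv_text: str) -> dict:
    """Map each column header to the set of PII types seen in its values."""
    found = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # missing cells (None) or overflow fields (list)
            for label, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    found.setdefault(column, set()).add(label)
    return found
```

Reporting by column rather than by cell helps size the impact quickly without copying the personal data itself into another system.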

Free Tools and Best Practices

To see how SearchLight (now ReliaQuest’s GreyMatter Digital Risk Protection) alerts on exposed data, take a free spin around Test Drive and see the types of alerts you could expect. Alternatively, check out our datasheets on detecting exposed documents and access keys.

If you’re interested in getting your hands on the free tools and best practices, download your free Data Leakage Detection Solutions Guide.