Our recent report, “Too Much Information”, found over 1.5 billion files exposed across a host of services, including Amazon S3 buckets, rsync, SMB, FTP, NAS drives, and misconfigured websites.


We love data, and we need ways to store, share, and transfer this data to other individuals and parties. There is a range of services used to do this, and one that has gained popularity over the last few years is cloud storage, specifically Amazon Simple Storage Service (S3) buckets. Unfortunately, many administrators misconfigure these S3 buckets, rendering the contents publicly accessible. Barely a month goes by without another open S3 bucket being discovered. Who remembers the data of 198 million voters being exposed last year?

However, S3 buckets are not alone. In our research, they accounted for only seven percent (7%) of the exposed files we found. Many other services that are used to store, share, or transfer data are also frequently misconfigured:

  • File Transfer Protocol (A network protocol used to transfer computer files);
  • rsync (A way of transferring and synchronizing files);
  • Server Message Block (A network file sharing protocol);
  • Network-attached storage devices (Devices often used to back up home computers).

Combined, these services expose over 1.5 billion files, with SMB, rsync and FTP accounting for 33, 28, and 26 percent respectively.
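To put those shares in absolute terms, the short Python sketch below converts the percentages quoted above into approximate file counts. It assumes a total of exactly 1.5 billion files, which is a lower bound since the report says "over 1.5 billion", so the results are rough floor estimates rather than figures taken from the report. The "other services" remainder is likewise an inference from the quoted percentages, not a number the report states.

```python
# Rough file counts per service, ASSUMING a total of exactly 1.5 billion
# files (the text says "over 1.5 billion", so these are lower bounds).
TOTAL_FILES = 1_500_000_000

# Percentage shares quoted in the text.
SHARES_PERCENT = {"SMB": 33, "rsync": 28, "FTP": 26, "Amazon S3": 7}


def estimate_counts(total, shares):
    """Return approximate file counts per service using integer arithmetic."""
    return {name: total * pct // 100 for name, pct in shares.items()}


counts = estimate_counts(TOTAL_FILES, SHARES_PERCENT)

# Whatever is left over is attributed to the remaining services
# (an inference from the quoted shares, not a reported figure).
remainder_percent = 100 - sum(SHARES_PERCENT.values())

for name, count in counts.items():
    print(f"{name}: ~{count:,} files")
print(f"Other services: ~{remainder_percent}% of files")
```

On these assumptions, SMB alone accounts for several hundred million files, and even the seven percent attributed to S3 is on the order of a hundred million.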


What’s the damage?

The amount of exposed data is staggering. Over twelve petabytes (12,000 terabytes) of data is exposed. For context, this is over four thousand times larger than the “Panama Papers” leak (2.6 terabytes). It’s also twelve thousand times larger than the Deep Root exposure of 198 million voters in 2017. Almost all countries are affected, but the United States experienced the most exposure, with 239,607,590 files.
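The comparisons above are simple ratios, and the Python sketch below reproduces two of them as a sanity check. It uses only figures quoted in the text, with two labeled assumptions: 1 petabyte is taken as 1,000 terabytes (as the text does), and the total file count is taken as exactly 1.5 billion, so the United States share is an approximation rather than a reported number.

```python
# Figures quoted in the text, expressed in terabytes.
EXPOSED_TB = 12_000          # "over twelve petabytes", with 1 PB = 1,000 TB
PANAMA_PAPERS_TB = 2.6

# How many times larger the exposed data is than the Panama Papers leak.
ratio = EXPOSED_TB / PANAMA_PAPERS_TB

# Approximate share of exposed files located in the United States,
# ASSUMING a total of exactly 1.5 billion files (the text says "over").
US_FILES = 239_607_590
TOTAL_FILES = 1_500_000_000
us_share_percent = US_FILES / TOTAL_FILES * 100

print(f"Exposed data vs. Panama Papers: about {ratio:,.0f}x")
print(f"United States share of files: about {us_share_percent:.1f}%")
```

The ratio comes out comfortably above four thousand, which matches the comparison in the text, and the United States accounts for roughly a sixth of the exposed files under the stated assumption.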


Figure 1: Geographical distribution of exposed data


Types of Exposed Data

It’s not just the volume but the sensitivity of the data that is a major cause for concern. There were a number of instances of high-severity exposure of personal information, intellectual property, and security assessments.

There is an incredible amount of personal data exposed, including payroll, tax return, and healthcare information. If we consider how much personal data is already exposed elsewhere (the news that the data of 87 million Facebook users may have been harvested is a good example), this adds significantly to an already rich trove of data, providing more and more information that could be used for malicious purposes such as social engineering and fraud. Furthermore, with GDPR fast approaching, there are clear regulatory concerns for organizations surrounding the protection of personal data, particularly if employees and contractors are copying and archiving work files using cloud storage and NAS solutions.


Figure 2: Types of publicly-available personal information


Our report also highlights numerous cases of intellectual property exposed through these services. In one instance, a technology company providing Electronic Medical Records software had its copyright application and full source code publicly available. In another instance, an energy company had sensitive details and diagrams of its patent-pending technology exposed. Loss of intellectual property can have considerable financial and reputational impacts.


Figure 3: Types of publicly-available intellectual property


Finally, there was a worrying number of security assessments made available. These include thousands of penetration tests, network diagrams, and security audits. We found a series of security documents belonging to a leading European supplier of electronic identification services used within the banking industry. These files contained in-depth security assessments, source code testing results, and vulnerability scanning reports that revealed details of insecure servers. These infrastructure reports exposed server locations and hosting IPs, missing software patches, port information, CVE numbers, and vulnerability descriptions that may allow an attacker to modify data, inject malicious code, or perform man-in-the-middle attacks. This type of information is a goldmine for attackers targeting organizations: an attacker will typically spend weeks, if not a couple of months, performing reconnaissance on their targets to glean exactly this type of information.


Figure 4: Types of publicly-available security assessments


Download a copy of our report to learn more about the types of sensitive data these services are exposing, and how you can help to reduce this problem.


