Originally published May 2019

2.3 billion is a massive number. It’s hard even to wrap your head around; what do I have 2.3 billion of? Video games? No. Books? No. Dollars? …


Certainly not. What about files coming from various file stores on the internet? Not me, personally, but currently 2.3 billion files are being made publicly available by misconfigured and non-secured technologies used to store this data such as Amazon S3 buckets, Server Message Block (SMB), File Transfer Protocol (FTP) and rsync servers, as well as network-attached storage drives. This is an issue that Digital Shadows’ Photon Research Team initially brought to light in 2018 with our Too Much Information report, which detailed the discovery of 1.5 billion files. Well, one year and one massive data privacy regulation (GDPR) later, we’re back for the sequel: Too Much Information: The Sequel, to be exact.


Data Leakage Update: Summary of Our Key Findings

One full year has passed since Digital Shadows’ Photon Research Team looked at the data exposure landscape among online file storage technologies like Server Message Block (SMB) file shares, rsync servers, and Amazon Simple Storage Service (S3) buckets. There are now 750 million more files exposed than we reported last year; not all of them are blatantly sensitive, but there is plenty of gold in these mountains. Several developments over the past year have affected the data exposure landscape, both positively and negatively, prompting Photon to re-examine which files are still exposed and how the landscape has changed.

Some of our key findings:

  • Overall, we detected 2.3 billion files exposed across SMB-enabled file shares, misconfigured network-attached storage (NAS) devices, File Transfer Protocol (FTP) and rsync servers, and Amazon S3 buckets.
  • The United States held onto its most-data-exposed title (more than 326 million files), while France and Japan lead their regions, with 151 million and 77 million files exposed, respectively.
  • As it did last year, the SMB protocol exposed the most data among the technologies we analyzed. FTP and rsync servers claimed 20 percent and 16 percent of the exposure detected, respectively.
  • Threat actors are actively attempting to exploit this exposure. We discovered that over 17 million files across these online file repositories, which are often used for backing up data, had been encrypted by ransomware, 2 million of them linked to “NamPoHyu”, a variant of the “MegaLocker” ransomware.
  • Amazon’s Block Public Access feature, introduced in November 2018, has reduced the exposure of S3 buckets to a barely detectable level. Having found 16 million files coming from S3 buckets in October 2018, we’re now seeing fewer than 2,000 such exposed files.
  • There are already two success stories since the General Data Protection Regulation (GDPR) took effect in the European Union (EU): Luxembourg and the Netherlands have reduced their overall exposure and have national laws in place to implement GDPR. So far, they are the only two EU member states whose exposure has fallen; France, which has the greatest exposure among EU member states, has yet to fully align with the GDPR at a national level.
  • The problem of inadvertent data exposure is not an impossible one to solve. We outline several technical mitigation steps you can take to use these file storage technologies safely and efficiently. And as with anything information security related, educating technology users is another vital step.


Raiders of the lost data arks

As of May 16, 2019, there are over 2.3 billion files exposed across online file stores like Amazon S3 buckets, SMB-enabled file shares, and NAS drives. Let that number sink in (we’ll wait) …2.3 billion. To give some perspective, that’s the amount in dollars that the latest Marvel movie spectacle Avengers: Endgame raked in at the box office within three short weeks of its release. We promise there are no spoilers in this report.

In Photon’s research for our Too Much Information report published last year, we spotted just over 1.5 billion files, thinking that alone was incredible. Using the same proprietary research technology, we’ve now detected an additional 750 million files. Last year the United States had the highest amount of exposure across online file repositories and, well, not much has changed in that respect. Our aim isn’t to place blame on any one country for the massive amount of file exposure, but there’s a healthy spread of light criticism to go around. While the United States left over 326 million files exposed, France accounted for 151 million of its own files, the United Kingdom claimed 98 million, and Japan was responsible for 77 million. As we said last year, this is a global problem that is getting even more out of hand. Countries on the European continent collectively exposed the highest number of files, accounting for over 1 billion (ahem…GDPR), with another 400 million coming from the Asia-Pacific region and 590 million from across the Americas.




SMB File Shares: SMBody stop me!


Figure 1: Geographic distribution across most of the file storage technologies analyzed.


Across those 2.3 billion files, nearly 50 percent (okay, 46 percent, to be specific) were exposed through SMB file shares: 1.071 billion files, an increase of 547.6 million files. As the data in Figure 2 shows, SMB takes a much bigger “chunk of the pie” than any other source. Although rsync and FTP accounted for less of the overall exposure, rsync was the only source whose exposure actually decreased since last year, falling by 53.7 million files. FTP-hosted files increased by over 54 million, essentially cancelling out rsync’s decline.

Amazon S3 buckets, WebIndexes, and NAS drives also increased their overall exposure in the past year, by 79.7 million, 103 million, and 19.9 million files, respectively. But one of the main points from last year still holds true, as the pie chart in Figure 2 shows: even though Amazon S3 buckets received the lion’s share of bad press and media coverage, there’s much more to the story than those buckets alone.
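As a quick sanity check on the SMB figures, the following Python sketch derives SMB’s share of the total and the SMB count implied for last year. It assumes the rounded 2.3 billion total, so the computed share is approximate; the variable and function names are illustrative only.

```python
# Figures quoted in the report, in millions of files.
TOTAL_M = 2_300          # "2.3 billion files" overall (rounded)
SMB_M = 1_071            # files exposed via SMB file shares
SMB_INCREASE_M = 547.6   # year-over-year increase for SMB


def smb_breakdown(total_m: float, smb_m: float, increase_m: float) -> dict:
    """Derive SMB's share of all exposure and its implied prior-year count."""
    previous_m = smb_m - increase_m
    return {
        "share_pct": 100 * smb_m / total_m,
        "previous_m": previous_m,
        "growth_factor": smb_m / previous_m,
    }


result = smb_breakdown(TOTAL_M, SMB_M, SMB_INCREASE_M)
print(f"SMB share of total: {result['share_pct']:.1f}%")
print(f"Implied prior-year SMB count: {result['previous_m']:.1f} million")
print(f"Growth factor: {result['growth_factor']:.2f}x")
```

With these inputs it prints a share of about 46.6 percent, an implied prior-year count of 523.4 million files, and a growth factor of about 2.05, consistent with SMB exposure roughly doubling.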


The fact that SMB-enabled file share exposure roughly doubled in the past year is troubling for a couple of reasons. Reason #1: We’re not entirely sure why, although some developments could be potential indicators. In June 2018 Amazon AWS Storage Gateway added SMB support, giving file-based applications developed for Microsoft Windows an easy way to store and access objects in Amazon S3. In November 2018 threat intelligence published by Akamai stated that threat actors were intentionally opening SMB ports 139 and 445 for malicious purposes. Could this be blamed for the uptick? Reason #2 will come later.
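For administrators who want to confirm that their own hosts are not offering SMB to the internet, a minimal Python sketch like this one tests whether TCP ports 139 and 445 accept connections. It only attempts a plain TCP connection and does not speak the SMB protocol, so treat it as a first-pass check. Run it only against systems you are authorized to test, ideally from outside your own network. The function name is hypothetical.

```python
import socket

# SMB over NetBIOS uses TCP 139; direct-hosted SMB uses TCP 445.
SMB_PORTS = (139, 445)


def reachable_ports(host: str, ports=SMB_PORTS, timeout: float = 2.0) -> dict:
    """Return {port: True/False} for whether a TCP connection succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, or unreachable: treat as not exposed.
            results[port] = False
    return results
```

For example, `reachable_ports("203.0.113.10")` checks a placeholder documentation address; a host that exposes neither port should come back as `{139: False, 445: False}`.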

Overall, apart from rsync, every source type we assessed, including FTP servers, misconfigured websites (WebIndex), and NAS drives, increased its exposure count from last year. The issue of accidental exposure has never been more real. Photon set out to detail not only the kinds of data floating around in the Internet ether but also why exposing that data is an issue.
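The report does not define WebIndex in detail. Assuming it refers to web servers with automatic directory listing enabled, the following sketch flags pages whose title matches the default listing templates of Apache and nginx (“Index of /”) and Python’s http.server (“Directory listing for /”). These title patterns are a heuristic assumption rather than an exhaustive detector, and `urlopen` raises `HTTPError` for responses such as 403 or 404. The function names are hypothetical.

```python
import re
import urllib.request

# Titles produced by common auto-indexing defaults (heuristic assumption).
_LISTING_TITLE = re.compile(
    r"<title>\s*(Index of /|Directory listing for /)", re.IGNORECASE
)


def looks_like_directory_listing(html: str) -> bool:
    """Heuristic: does this HTML look like an auto-generated directory index?"""
    return bool(_LISTING_TITLE.search(html))


def check_url(url: str, timeout: float = 5.0) -> bool:
    """Fetch a URL you own and report whether it serves a directory listing."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # The title appears near the top, so the first 64 KiB is plenty.
        body = response.read(65536).decode("utf-8", errors="replace")
    return looks_like_directory_listing(body)
```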



Figure 2: Overall file storage exposures by percentage and number.


Ransomware targeting exposed SMB



Ah, ransomware…the never-ending extortion trend. The troubles and woes that ransomware can cause to businesses and individuals are well documented, starting with attackers encrypting files and holding them for “ransom”, and promising a decryption key once the victim pays the attacker, typically in Bitcoin. The standard mitigation the security community advises for those fearing a ransomware attack is to back up your files so that, in case you’re infected, you can quickly revert to saved copies and avoid downtime or payments to the attackers. But what if your backups have been encrypted by either the same variant that locked you out of your system or another variant entirely?


Figure 3: Detected files encrypted by the NamPoHyu ransomware over time.


We detected millions of ransomware-encrypted files across the various file stores that are often used to back up systems: 17,141,587, to be exact. One variant in particular caught our eye: NamPoHyu. Discovered in April 2019 as an update to the MegaLocker variant, it targets systems a little differently than traditional ransomware, by going after vulnerable Samba servers. (Side note: Samba is an open-source implementation of the SMB protocol that runs on Unix-like systems and enables file sharing with Windows systems.) We couldn’t find any numbers to suggest how widespread this ransomware may be, so we found some ourselves: over 2 million files have been encrypted with the .nampohyu file extension, beginning around the first week of April 2019. Backing up data is not the only solution to the problem of ransomware: secure those backups, too.
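One way to act on that advice is to record a hash manifest of each backup, keep it somewhere the backed-up systems cannot write to, and compare against it before restoring. The Python sketch below is one such check under that assumption. A file renamed with a ransomware extension shows up as both missing and unexpected, and a file encrypted in place shows up as changed. The function names are hypothetical, and this complements, rather than replaces, offline or immutable backups.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path, relative to root, to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_to_manifest(root: Path, manifest: dict) -> dict:
    """Report files that changed or vanished, and files that newly appeared."""
    current = build_manifest(root)
    return {
        "changed_or_missing": sorted(
            name for name, digest in manifest.items() if current.get(name) != digest
        ),
        "unexpected": sorted(name for name in current if name not in manifest),
    }
```

To keep the manifest out of reach of ransomware on the backed-up hosts, you might serialize it with `json.dumps` and store it offline or in write-once storage.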

SamSam: Life after death



As a bonus for Digital Shadows fans: you’ll recall that in our previous paper A Tale of Epic Extortions, the Photon team detailed how the “SamSam” ransomware operators gained access to victims’ networks and held sensitive data hostage. As our analysis of the ransomware data continued, one file extension, .otherinformation, seemed familiar. It is, in fact, a known extension added to files that have been encrypted by the SamSam ransomware. As we combed through the data using other known extensions associated with SamSam, we detected more SamSam infections that hadn’t been reported publicly. In one instance, a small university in California was infected in January 2017; another victim was a digital marketing firm for the automotive industry. Although this discovery is somewhat moot, given the November 2018 United States Department of Justice indictments of the two alleged Iranian perpetrators of the SamSam attacks, it’s an exciting finding regardless.
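Extension matching of this kind is easy to reproduce on your own file stores. The following sketch counts files under a directory whose final extension matches a known ransomware extension. Only .nampohyu and .otherinformation come from this report, so the mapping is an assumption you would extend from a threat-intelligence source you trust. Ransomware families often use several extensions, and a match is an indicator to investigate, not proof of infection. The function name is hypothetical.

```python
from collections import Counter
from pathlib import Path

# Extensions named in this report; extend from trusted threat intelligence.
KNOWN_EXTENSIONS = {
    ".nampohyu": "NamPoHyu (MegaLocker variant)",
    ".otherinformation": "SamSam",
}


def count_ransomware_files(root: Path, known=KNOWN_EXTENSIONS) -> Counter:
    """Count files under root whose final suffix matches a known extension."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Compare case-insensitively: ".NAMPOHYU" should still match.
            family = known.get(path.suffix.lower())
            if family:
                counts[family] += 1
    return counts
```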




Healthcare Data Exposure: Healthcare(less)

Healthcare data collected by hospitals, such as patient diagnoses, medical images, and operational data, should be some of the most closely guarded information, end of story. Stolen finances can be reclaimed and passwords to accounts can be changed, but profoundly personal information about your health can’t be rehabilitated with a few keystrokes. Unfortunately, there are about 4.7 million medical-related files exposed online through the file repositories we analyzed. Most of them are DICOM¹ (.dcm) medical imaging files: about 4.4 million, double the number we saw last year (2.2 million).

As with all of the cases we discuss in this paper, not every one of the exposed files is going to contain something sensitive. However, the sheer amount of information exposed illustrates the extent to which individuals’ privacy, and regulations like HIPAA² in the United States, are being violated. DICOM files were the most abundant type found, but there were others: the remaining 300,000 or so included Health Level Seven (HL7) files and files in X12, the HIPAA healthcare transaction format.
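Counting by file extension alone misses medical files that have been renamed. As an illustration, the sketch below classifies a file by its leading bytes instead, using widely documented markers: DICOM Part 10 files carry the magic bytes DICM after a 128-byte preamble, HL7 version 2 messages begin with an MSH segment, and X12 interchanges begin with an ISA segment. This is a hypothetical helper for auditing your own shares, not necessarily how Photon identified these files, and the HL7 and X12 checks are simple prefix heuristics.

```python
from pathlib import Path
from typing import Optional


def sniff_healthcare_format(header: bytes) -> Optional[str]:
    """Guess a healthcare file format from its leading bytes (heuristic)."""
    # DICOM Part 10: a 128-byte preamble followed by the magic bytes "DICM".
    if len(header) >= 132 and header[128:132] == b"DICM":
        return "DICOM"
    text = header.lstrip()
    # HL7 v2 messages open with the MSH (message header) segment.
    if text.startswith(b"MSH"):
        return "HL7 v2"
    # X12 interchanges open with the ISA (interchange control header) segment.
    if text.startswith(b"ISA"):
        return "X12"
    return None


def sniff_file(path: Path) -> Optional[str]:
    """Read just enough of a file to classify it."""
    with path.open("rb") as fh:
        return sniff_healthcare_format(fh.read(512))
```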


1. Digital Imaging and Communications in Medicine
2. Health Insurance Portability and Accountability Act of 1996

Identity theft on a silver platter



Photon discovered an open FTP server that belonged to an unwitting individual and contained everything an attacker would need to conduct identity theft: job applications, personal pictures, passport scans, and bank statements, all completely open for the world to see. Even though businesses are often the loudest voices regarding financial crimes and are likely responsible for a lot of the data currently exposed, this instance highlights the profoundly personal side of the issue. If an attacker wanted to gain access to this individual’s bank account, they’d need to perform only minor social engineering of the victim’s bank, as all the information they would need is entirely accessible. This could be devastating to them or their family if they don’t catch the fraud in time or are simply unaware that it’s happening. Applying for loans, directly stealing money from a bank account, or selling the information online are just a few ways that criminals could monetize this exposure.


Third-Party Risk



There’s been a boatload of discussion about third-party and supply-chain risk these days, and it’s easy to see why. Companies rely on third parties to grow their businesses as their digital footprints expand online. In its annual report Data Risk in the Third-Party Ecosystem, Ponemon Institute surveyed over 1,000 security practitioners in the United States and United Kingdom and found that 59 percent of their organizations had been hit by a data breach caused by a third party.

Take IT management, for example. A construction company that manages building projects day to day may not have the expertise to handle its technology, or to operate securely enough to protect its employees and business from the threats that exist. So, being the responsible individuals they are, the managers hire an IT management consulting firm to support their employees and help the business scale to its clients’ demands. But what if that consulting firm isn’t securing the documents it creates for the construction company? The construction company could easily become a victim in the same way that the clients of a small IT consulting company in the United Kingdom did recently: more than 212,000 files were exposed by that third party, revealing not just the consulting firm’s own information but details of its clients, too. Perhaps most frightening were the password lists we found for various clients of the firm, stored in plaintext. Furthermore, we spotted two instances in which the password lists included the passcode to an individual’s cell phone.

Amazon S3 Exposure Increased, But There’s Still Hope


Figure 4: Number of files being exposed by Amazon S3 buckets over the past year.


Now let’s get into some of the good news. There’s a lot of data out there, but many brilliant people have been working hard to cap the steady outflows. In November 2018 Amazon introduced Amazon S3 Block Public Access, which lets users block public access to their S3 buckets at the account or bucket level. Amazon S3 users can now easily apply these blocks across their entire account, which is great! Data privacy FTW.

The data Photon gathered suggests this S3 feature has made a noticeable difference in the data exposure landscape. The massive spike seen in Figure 4 occurred just before the Block Public Access rollout, and since November there has been a significant decrease in the number of publicly accessible S3 buckets. That large, pre-feature spike represents just over 16 million S3 files. And now? The exposure hardly even registers on the same chart. On May 16, 2019 we detected 1,895 open buckets, a far cry from the state that S3 buckets were in just a year or two ago. Nice work, everyone.
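For bucket owners, the Block Public Access settings can also be applied programmatically. This minimal sketch assumes the AWS SDK for Python (boto3) and takes an S3 client as a parameter. `put_public_access_block` and `get_public_access_block` are boto3 S3 client methods; the helper names and bucket name are hypothetical.

```python
# The four settings that together make up S3 Block Public Access.
FULL_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def enable_block_public_access(s3_client, bucket: str) -> None:
    """Turn on all four Block Public Access settings for one bucket."""
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=dict(FULL_BLOCK),
    )


def is_fully_blocked(s3_client, bucket: str) -> bool:
    """True if every Block Public Access flag is set on the bucket.

    With boto3, a bucket that has no configuration at all raises a
    ClientError (NoSuchPublicAccessBlockConfiguration) instead.
    """
    response = s3_client.get_public_access_block(Bucket=bucket)
    config = response["PublicAccessBlockConfiguration"]
    return all(config.get(flag) is True for flag in FULL_BLOCK)
```

In practice you would call it as `enable_block_public_access(boto3.client("s3"), "example-bucket")`. Passing the client in keeps the helpers easy to test. To apply the same four settings account-wide, boto3’s `s3control` client offers a `put_public_access_block` call that takes an `AccountId` instead of a bucket name.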

Earlier in 2018, Microsoft also took a bold initiative and stopped installing SMBv1 by default in Microsoft Windows 10 and Windows Server. However, it’s hard to tell what the full impact of this has been. For starters, Photon’s data collection covers all versions of SMB and doesn’t distinguish between v1, v2, and v3. Additionally, far too many businesses are still running older versions of Microsoft Windows for various reasons. We’re not blaming any of you (it’s okay to run older operating systems), but this is the kind of update you miss out on when you do. Kudos to Microsoft for removing SMBv1 from the default installation. (Baby steps.)


GDPR Considerations: Compliance or defiance?

One of the biggest questions we wanted to answer in this follow-up paper was: How has the enactment of the EU’s GDPR affected the data exposure landscape? GDPR enforcement came into effect at the end of May 2018, so we shouldn’t see anything in our data coming from an EU member state, right? At least, nothing sensitive? Of course, GDPR is much more complicated than that; the short answer is that we have seen some progress but are still a long way off.

There has been so, so much discussion about the effects of GDPR on companies and how everyone processes data for EU citizens. Now that we are a year into enforcement, are we seeing less exposure of this data? Unfortunately not: we detected 883 million exposed files across EU member states. Within the United Kingdom, specifically, 43.5 million more files have been exposed in the year since GDPR came into effect, and 262 million more among all of the EU member states. Out of those 28 members, only two experienced a drop in the number of files exposed: Luxembourg and the Netherlands. Why were they the only two? And does this somehow show that regulations and compliance don’t work to combat this issue?


Data Leakage in Luxembourg and the Netherlands

Luxembourgian (yes, that’s how you say it) data exposure has decreased by 28 percent since the GDPR landed, as Figure 5 shows. There also seems to have been a fairly significant downward trend beginning between late August and mid-September 2018. It’s worth drilling into the details here to see if there’s a cause.



Figure 5: Number of Luxembourg-based files being exposed over time.


Two data privacy laws were adopted in Luxembourg on August 1, 2018 and came into effect a few weeks later, on August 20. The laws were designed to bring Luxembourg, an EU member state, in line with the GDPR. They repealed the previous data protection law from 2002 and outlined the general data protection framework that the country would operate under. They also established the National Data Protection Commission, outlining that body’s composition and granting it the powers required to align with the GDPR. Now, we’re not saying this has erased Luxembourg’s data exposure, but it does appear to be having an effect. Luxembourgers (also correct) appear to be “straightening their ties” as they work toward compliance with the GDPR. Good work.



Figure 6: Number of Netherlands-based files being exposed over time.


Post-GDPR enactment, the Netherlands has decreased its exposure by 8 percent, which is a solid effort. Policy makers were ahead of the curve: starting the legislative process for the Dutch GDPR Implementation Act in December 2017; publishing the law on May 22, 2018; and beginning to apply it on May 25, 2018. The data suggests that, at least for this year, the Dutch are trending in the right direction. Right on time!

Now let’s look beyond the only two EU countries with decreased overall exposure, to the other end of the spectrum: France. Of all the EU member states, France has the greatest file exposure (the second greatest worldwide), sitting pretty at over 151.6 million files. In June 2018 France enacted a law that updated its data protection law from 1978, but it didn’t bring the country sufficiently up to speed with the GDPR. In December 2018 France adopted Order No. 2018-1125, going into effect on June 1, 2019, which is essentially a set of laws and updates relating to data protection that would more firmly align France with the GDPR. We’re not saying that laws or regulations are the end-all, be-all solutions to data exposure; obviously, there’s much more to it than that. However, these three examples do suggest there’s at least some correlation. Here’s hoping that ongoing updates in the GDPR landscape put a patch on at least some of those leaky buckets.


How to Reduce Data Leakage and Data Exposure For Your Business

With great data comes great responsibility (to secure it). Accidental data exposure is a problem that won’t just go away. What’s become obvious over the past year, since our first paper on the subject, is that taking action can bring down the body count when it comes to files stored online. From a practical point of view, what can you do? First of all, forget that you turned a blind eye over the last 12 months; to err is human. Below are some ideas that can help reduce some of the data exposure we’ve witnessed and maintain a safe distance away from any newspaper headlines. But remember, also, that behind every great information security plan is a foundation of educated users acting on their best behavior.
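As one concrete example of such a check, the following sketch tests whether an FTP server accepts anonymous logins, one of the simplest ways an FTP server ends up publicly readable. It uses Python’s standard `ftplib`, where `login()` with no arguments logs in as “anonymous”. The function name and the `ftp_factory` parameter are hypothetical; the parameter exists only so the function can be tested without a network. Run it only against servers you own or are authorized to assess. Connection failures raise `OSError` rather than returning False.

```python
import ftplib


def allows_anonymous_ftp(host: str, timeout: float = 10.0,
                         ftp_factory=ftplib.FTP) -> bool:
    """Return True if the FTP server at host accepts an anonymous login."""
    try:
        with ftp_factory(host, timeout=timeout) as ftp:
            ftp.login()  # no arguments: user "anonymous"
            return True
    except ftplib.error_perm:
        # A 5xx reply such as "530 Login incorrect": anonymous access refused.
        return False
```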



