Whether it was the Mirai botnet and Dyn or the “Cloudbleed” revelations, content delivery networks (CDNs) have been in the news recently. Research by Swisscom and Digital Shadows (now ReliaQuest) found over 100 million web pages and files exposed on CDNs, with many sensitive pdf, ppt and xls files publicly available online. The risks don’t stop here; if improperly configured, CDNs can be used to bypass age restrictions and registration requirements.
What is a CDN?
To start off, let’s level set on what a CDN does. A CDN is a system of distributed servers that delivers webpages and other web content to a user based on the geographic location of the user, the origin of the webpage and the content delivery server. This means that users can access content much more quickly, and websites become less susceptible to denial of service attacks. Given that over 52% of the Alexa top 1,000 websites use a CDN, you might not realize how often you are browsing CDN-delivered content.
Figure 1: Diagram of a CDN. Source: gtmetrix.com
To assess the amount of content exposed by CDNs and the subsequent risk:
- We first enumerated as many Content Delivery Networks as possible and identified the most deployed CDNs. In total we identified 293 CDNs, many of which can be found here https://raw.githubusercontent.com/WPO-Foundation/webpagetest/master/agent/wpthook/cdn.h.
- Searches for these domains were completed across Google, Yandex and Bing to identify the search engine with the most coverage. Google had the highest yield, returning the most results for over 50 percent of the CDN providers.
- Further searches were performed to assess the number of files of each type and the sensitivity of these documents.
- Finally, more manual analysis was applied to understand the implications of the content of these documents.
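To make the search step more concrete, here is a minimal sketch of how such queries could be generated. It assumes the widely supported `site:` and `filetype:` search operators; the hostnames, the `build_queries` helper and the marking phrase are hypothetical illustrations, not the exact queries used in this research.

```python
from itertools import product

# Hypothetical CDN hostnames, used only for illustration.
CDN_DOMAINS = ["cdn.example.com", "static.example.net"]

# File types of interest, taken from the document types discussed in this post.
FILE_TYPES = ["pdf", "ppt", "xls", "csv"]


def build_queries(domains, file_types, marking=None):
    """Return one search query string per (domain, file type) pair.

    If `marking` is given, it is appended as a quoted phrase so that only
    documents containing that exact marking are matched.
    """
    queries = []
    for domain, file_type in product(domains, file_types):
        query = f"site:{domain} filetype:{file_type}"
        if marking:
            query += f' "{marking}"'
        queries.append(query)
    return queries


if __name__ == "__main__":
    for q in build_queries(CDN_DOMAINS, FILE_TYPES, marking="not for public distribution"):
        print(q)
```

Organizations can run the same kind of query against their own CDN hostnames to see what a search engine has already indexed.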
Over 100 Million Indexed Pages Leave Organizations Exposed
In total, searches indicated that there were 103,944,919 indexed web pages and web content across the CDN domains we assessed. Of these, nearly 15 million CDN delivered web pages had pdfs on them. Many of these were benign, but over 22,000 were sensitively marked and not for public distribution.
Some of the findings were enlightening. There was no shortage of intellectual property across pdfs and ppts, with designs, financial information, plans and pricing models and even reports about nuclear generating stations (Figure 2) all readily available.
Figure 2: Nuclear Generating Station
This could produce a gold mine for competitive intelligence, espionage and phishing. No hacking is necessary – the content is already out there.
The publicly available spreadsheets (xls and csv files) were worrisome as well. Examples of the types of data discovered included:
- Sensitively marked patient health testing data
- A mobile app development competition database with exposed visa numbers, dates of birth, gender and occupation
- Membership details of clubs with names, home addresses, emails and telephone numbers (See Figure 3)
Figure 3: Spreadsheet
CDNs can be used to bypass protection mechanisms
Security mechanisms are put in place so that a website’s content is protected. However, in some instances, CDNs can be used to bypass these restrictions.
Take YouTube’s age restrictions, for example. Navigating directly to the video itself will force users to log in and verify their age (Figure 4). By searching for the video through a CDN, users can bypass this control on age restriction.
Figure 4: Age restriction on YouTube.com
Figure 5: Bypassing YouTube’s age restriction via a CDN
Secondly, we identified ways to bypass registration requirements for content. One online education platform charges between $99 and $995 a year, and for this fee users can access a wide range of course materials. Users who choose to access these resources through the website’s CDN, however, pay nothing.
Why it matters
It is no surprise that there is sensitive information available through search engines; there are many instances of data exposed through an organization’s supply chain. As demonstrated by the previous examples, the impacts of these external digital risks include:
- Loss of revenue
- Reputation damage
- Compliance issues
Adversaries can reap the rewards of these CDN issues by directing and tailoring their searches to these domains.
What can be done
Let us be clear – most files and pages available through CDNs are perfectly benign. However, a subset of these can leave organizations exposed. Considering the upcoming EU GDPR, it is important that organizations understand where their data exists online. The fact that CDNs duplicate this information can pose a risk for organizations. In various cases that we identified, it was actually the CDN that was exposing the data without the organization’s consent. There are several things organizations can do to secure their data and to identify and mitigate the risks associated with their digital shadows found on CDNs:
- Use URL signing and appropriate TTLs on URLs that you share. URL signing allows you to protect your files from unauthorized access with a key. CDN77 provides good advice at https://client.cdn77.com/support/knowledgebase/cdn-resource/how-do-i-set-up-signed-urls.
- Have a defined document marking system, whether that is through Digital Rights Management (DRM) or a defined template system in MS Office. This will allow you to more readily identify which documents should or should not be available online.
- Ensure that your sensitive information is not being indexed by search engines. Most CDNs will offer guides on how to unindex pages. Hubspot, for example, provides good advice on how to use noindex and nofollow HTML meta tags.
- Set up Google Alerts to monitor for the risks associated with CDNs. Understand that it isn’t always you that will be exposing these documents; often it is third parties.
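To illustrate the idea behind URL signing with a TTL, here is a minimal sketch that uses an HMAC-SHA256 signature and an expiry timestamp. This is a generic scheme written for illustration, not the token format of CDN77 or any other provider; the function names, the `expires` and `sig` parameter names and the example URL are all assumptions. In practice, follow your CDN’s own documentation for the exact format.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(path, expires, secret):
    """Compute an HMAC-SHA256 over the URL path and expiry time (hypothetical format)."""
    message = f"{path}?expires={expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(url, secret, ttl_seconds, now=None):
    """Return `url` with an expiry timestamp and signature appended as query parameters."""
    now = int(time.time()) if now is None else int(now)
    expires = now + ttl_seconds
    parts = urlsplit(url)
    sig = _signature(parts.path, expires, secret)
    separator = "&" if parts.query else "?"
    return f"{url}{separator}{urlencode({'expires': expires, 'sig': sig})}"


def verify_url(signed_url, secret, now=None):
    """Return True only if the signature matches and the URL has not expired."""
    now = int(time.time()) if now is None else int(now)
    parts = urlsplit(signed_url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    if now > expires:
        return False  # the TTL has elapsed
    expected = _signature(parts.path, expires, secret)
    # Constant-time comparison, so timing does not leak how much of the signature matched.
    return hmac.compare_digest(sig.encode(), expected.encode())


if __name__ == "__main__":
    secret = b"replace-with-a-long-random-key"  # placeholder key for illustration
    link = sign_url("https://cdn.example.com/files/report.pdf", secret, ttl_seconds=300)
    print(link, verify_url(link, secret))
```

With a scheme like this, a link that is shared or indexed stops working once the TTL has passed, and a link to a different file cannot be produced without the key.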