November 30, 2023
Here at Digital Shadows (now ReliaQuest) we collect and store data from all across the web in different languages and formats. This post covers some of the challenges you are likely to face when handling data in different languages, and how to deal with them. Most of our code is in Java, so the examples here are all written in Java.
Not every document or web page declares its encoding (for now we’ll ignore those that lie about it). So you’re left with either following a default (Microsoft Word documents, for example, default to UCS-2) or taking a guess. Let’s assume we have found a small file on the internet and we want to read its content. We read it into a byte array, but we have no idea how the file was encoded. Let’s assume the content is UTF-8 and try to decode it.
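A minimal sketch of that attempt might look like the following. The four bytes are a hypothetical stand-in for the file's content (the bytes used in the original example are not reproduced here), and the class and method names are ours.

```java
import java.nio.charset.StandardCharsets;

class NaiveDecode {
    // Hypothetical stand-in for the bytes we read from the file.
    static final byte[] CONTENT = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

    // Decode on the assumption that the bytes are UTF-8.
    // The String constructor never throws on bad input: anything it
    // cannot decode is replaced with U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeAsUtf8(CONTENT));
    }
}
```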
The output we get is:
This is probably not what the content of the document is supposed to look like. Given that we guessed the encoding, we can assume we got it wrong. Perhaps we should save our string and look at it later when we have a little more information. Let’s save our string (or write it to the console in this case).
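As a rough sketch, saving the string amounts to encoding it back to bytes, so we can compare what goes in with what comes out. The input bytes are again a hypothetical stand-in and the names are ours.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripCheck {
    // Decode as UTF-8 and encode straight back, as a save-and-reload would.
    static byte[] roundTripUtf8(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical input that is not valid UTF-8.
        byte[] original = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        byte[] saved = roundTripUtf8(original);
        System.out.println("in:  " + Arrays.toString(original));
        System.out.println("out: " + Arrays.toString(saved));
        System.out.println("same bytes? " + Arrays.equals(original, saved));
    }
}
```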
Hold on, what’s this?
This isn’t even the same byte array we passed in. This default behaviour does not work for us: losing the original data is even worse than not being able to interpret it. What we really want is to report some kind of error when we meet content we can’t handle, rather than quietly corrupting it. Thankfully, Java can take care of that for us.
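One way to get that behaviour, sketched here on the assumption that a `CharsetDecoder` configured with `CodingErrorAction.REPORT` is acceptable for your use case, is to decode through the `java.nio.charset` API instead of the `String` constructor. The helper name `decodeStrict` is ours.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    // Decode the bytes, throwing instead of substituting U+FFFD
    // when the input is not valid in the given charset.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        // Hypothetical input that is not valid UTF-8.
        byte[] content = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        try {
            System.out.println(decodeStrict(content, StandardCharsets.UTF_8));
        } catch (CharacterCodingException e) {
            System.out.println("Could not decode as UTF-8: " + e);
        }
    }
}
```

The caller still has the untouched byte array when the exception is thrown, so nothing has been lost.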
Now when trying to decode our bytes we see an exception instead of a quietly corrupted string.
Lesson – use the right functions for decoding data and don’t trust that it will be in the encoding you expect.
So we have some bytes, and we have successfully avoided mangling them into an invalid string. Now what? We could try again with another likely encoding, but this doesn’t always make sense. We could keep the byte array and store that, but then our system needs to handle storing two different types of data: raw bytes and strings. Wouldn’t it be better if we could store it as a string and still not destroy anything in it?
There are two obvious ways to handle this: encode the bytes in Base64, or use an encoding that won’t mangle our input. Here ISO_8859_1 comes to the rescue, because it maps every possible byte value to exactly one character.
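Both options can be sketched in a few lines. The class and method names are ours, and the sample bytes are a hypothetical stand-in for content we could not decode.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class LosslessStorage {
    // ISO-8859-1 maps each byte 0x00..0xFF to the code point with the
    // same value, so the conversion is reversible for any input.
    static String toLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static byte[] fromLatin1(String stored) {
        return stored.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Base64 is the alternative: plain ASCII, but about a third larger.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String stored) {
        return Base64.getDecoder().decode(stored);
    }

    public static void main(String[] args) {
        byte[] content = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4}; // hypothetical
        String stored = toLatin1(content);
        System.out.println("stored as: " + stored);
        System.out.println("recovered: " + Arrays.equals(content, fromLatin1(stored)));
    }
}
```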
Our output from this is a string with one Latin-1 character per input byte.
This is not the string that the bytes are supposed to represent, but it has the advantage that we haven’t lost anything. If we ever find the right encoding to use, we can encode the string back to the source byte array and decode it correctly.
Lesson – if you aren’t sure what something is you don’t need to throw it away. You can store it and come back when you do.
Here at Digital Shadows (now ReliaQuest) we use several different technologies for storing and processing our data, depending on where it came from and what we want to do with it. Care must be taken to ensure that these systems store this data safely and correctly. Take MySQL, for example: we are going to be consuming data from websites all over the world, so we want to store it in UTF-8.
A quick Google search leads us to a well-meaning Stack Overflow answer that tells you how to set the relevant configuration options to utf8. Problem solved?
However, a closer read of the MySQL documentation will tell you that the utf8 character set uses at most three bytes per character, so it cannot store characters outside the Basic Multilingual Plane, such as emoji.
To correctly store 4 byte characters in MySQL you must use utf8mb4 as your encoding.
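To test that edge case you need input that actually contains 4 byte characters. As a small sketch (the helper name `needsUtf8mb4` is ours, not a MySQL or JDBC API), the characters that take four bytes in UTF-8 are exactly the ones Java calls supplementary code points:

```java
import java.nio.charset.StandardCharsets;

class FourByteCheck {
    // True if the text contains a code point above U+FFFF, which
    // UTF-8 encodes as four bytes.
    static boolean needsUtf8mb4(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String bmpOnly = "h\u00e9llo \u4e2d\u6587";           // all within the BMP
        String emoji = new String(Character.toChars(0x1F600)); // outside the BMP
        System.out.println(needsUtf8mb4(bmpOnly));
        System.out.println(needsUtf8mb4(emoji));
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length + " bytes in UTF-8");
    }
}
```

A string for which this returns true makes a useful test value when checking that a table really round-trips everything you intend to store.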
Lesson – read the documentation carefully when choosing how to configure your databases and test the edge cases to make sure it works how you think it does.
Sadly, the RFCs for domains and URIs don’t specify the encoding. However, the World Wide Web Consortium recommends that UTF-8 be used.
This has generally been taken to mean that all percent-encoded URLs should be in UTF-8 and that everyone will build their websites to decode them as UTF-8. If you’re building your own website, this is great advice to follow. If you want to know whether you have seen a URL before in its decoded form, you can’t rely on ‘recommends’.
Trying to decode the following will, in most URI decoders, result in an error.
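The original example URI is not reproduced here, so as a stand-in take the hypothetical percent-encoded segment `%D6%D0%CE%C4`. The sketch below is our own strict decoder (not a standard library one): it turns the escapes back into raw bytes and then refuses to interpret them as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictPercentDecode {
    // Turn %XX escapes into raw bytes. Characters outside escapes are
    // assumed to be ASCII and are copied through as single bytes.
    static byte[] percentDecodeToBytes(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c != '%') {
                out.write(c);
                continue;
            }
            if (i + 3 > encoded.length()) {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
            int hi = Character.digit(encoded.charAt(i + 1), 16);
            int lo = Character.digit(encoded.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Bad escape at index " + i);
            }
            out.write((hi << 4) | lo);
            i += 2; // skip the two hex digits we just consumed
        }
        return out.toByteArray();
    }

    // Interpret the bytes in the given charset, failing on invalid input.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] raw = percentDecodeToBytes("%D6%D0%CE%C4"); // hypothetical segment
        try {
            System.out.println(decodeStrict(raw, StandardCharsets.UTF_8));
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```

Keeping the two steps separate means the raw bytes are still available when the UTF-8 interpretation fails.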
However, with a bit more context about where the URI came from, we can see that it belongs to a Chinese company. So, for the final big reveal: the URI encoding actually represents the same byte array we were using earlier. This extra information about the location of the website is what we needed to help us identify the right encoding. We try decoding the byte array with GBK, an encoding for simplified Chinese characters.
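A sketch of that final step, using hypothetical bytes rather than the ones from the original example, and assuming the JDK in use ships the GBK charset (standard builds do, but it is not one of the charsets every Java runtime is required to support):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class GbkDecode {
    // Strictly decode the bytes using a charset looked up by name.
    static String decodeStrict(byte[] bytes, String charsetName) throws CharacterCodingException {
        return Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Hypothetical bytes: the GBK encoding of U+4E2D U+6587 ("Chinese language").
        byte[] content = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        String text = decodeStrict(content, "GBK");
        System.out.println(text);
    }
}
```

A successful strict decode is evidence for the guess rather than proof, since some byte sequences are valid in more than one encoding.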
Lesson – when building your own web services you should follow the specification’s recommendations. When consuming others’ data you can’t assume they have.
To support collecting and analysing data in different languages, you have to work to the data, not to a standard or recommendation. Not every website or document will be valid, in UTF-8, or even complete. This data is still important: we need to handle it correctly, detect when we can’t, fall back to safe practices, and ensure that when we are done, we store it correctly.
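Put together, the approach might be sketched as follows. This illustrates the idea rather than our production code; the `Decoded` holder and the candidate list are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

class SafeIngest {
    // Result holder: the text, the charset used to produce it, and
    // whether that charset decoded the input without errors.
    static final class Decoded {
        final String text;
        final Charset charset;
        final boolean decodedCleanly;

        Decoded(String text, Charset charset, boolean decodedCleanly) {
            this.text = text;
            this.charset = charset;
            this.decodedCleanly = decodedCleanly;
        }
    }

    // Try each candidate strictly, in order. If none fits, fall back to
    // ISO-8859-1 so the original bytes can still be recovered later.
    static Decoded decode(byte[] bytes, List<Charset> candidates) {
        for (Charset candidate : candidates) {
            try {
                String text = candidate.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(bytes))
                        .toString();
                return new Decoded(text, candidate, true);
            } catch (CharacterCodingException e) {
                // This candidate cannot represent the input; try the next one.
            }
        }
        String preserved = new String(bytes, StandardCharsets.ISO_8859_1);
        return new Decoded(preserved, StandardCharsets.ISO_8859_1, false);
    }

    public static void main(String[] args) {
        byte[] content = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4}; // hypothetical
        Decoded result = decode(content, List.of(StandardCharsets.UTF_8));
        System.out.println(result.charset + " decodedCleanly=" + result.decodedCleanly);
    }
}
```

Storing the charset alongside the text records how to get back to the original bytes if a better guess comes along.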
At Digital Shadows (now ReliaQuest) we’re always on the lookout for the very best technical talent. If solving the hardest challenges and working in a fast-paced environment appeals to you, head on over to our careers page to find out more.