Data Security and Compliance in Cloud-Native Business Intelligence

This article is part of our series Selecting the Right Visualization Tool with Confidence.

The risk of data breaches is huge, and it is one of the main reasons companies are slow to adopt cloud computing. Facebook is in the news today because over half a billion profiles were leaked, including personal information! Google had to pay over $55,000,000 in GDPR fines in 2020.

If the big companies cannot get it right, why take the risk at all? Because companies that do not become data-driven lose out to the ones that do. There is no sustainable alternative. Forrester research shows that organizations with data-driven insights are 140% more likely to create sustainable competitive advantage and take tremendous market share from traditional organizations.

In this article, my goal is to give you an overall understanding of what you should pay attention to and how to mitigate the risks. I am no lawyer, so you should make decisions based on advice from qualified legal, accounting, and security experts. 

Why use a cloud business intelligence tool?

This gets into a larger strategic question of digital transformation and whether to use cloud computing or infrastructure at all. In spite of all the benefits of using cloud infrastructure and SaaS tools, such as lower total cost of ownership (TCO), agility, and scalability, the perception of higher security risk has been an impediment to cloud adoption.

Ultimately, each organization has to make this decision on its own. Any risk-reward decision is greatly affected by how well you can mitigate or minimize the risks involved. Some organizations take a risk avoidance approach instead, which in my opinion costs more in lost benefits than the risk it avoids.

When it comes to using a cloud-based business intelligence tool, you will see much shorter onboarding time, and your own maintenance costs drop to nearly zero because the vendor runs and patches the tool. So time to value is shorter, and if the tool’s pricing is in line with the market, TCO will be lower.

How is my data safe in an online system?

Data can be just as safe, if not safer, in an online tool than in a tool you manage internally. Seriously. In today’s information worker world, people connect into your network from home, coffee shops, mobile phones, etc. They are likely already connecting to your internal systems over a VPN connection, and to several SaaS products such as Salesforce or Jira. 

The top reason for data breaches is old, unpatched systems. When the security community identifies vulnerabilities in operating systems or network devices, it publishes them in a public list known as Common Vulnerabilities and Exposures (CVE). This allows everyone to act quickly and with maximum information to resolve the vulnerabilities.

However, there is a catch. IT departments must actively manage servers, operating systems, and networks to apply these updates so that they are no longer vulnerable. Although every IT department claims to follow best practices, the number of security breaches due to unpatched systems suggests otherwise. Now compare that to a product company whose livelihood depends on keeping its SaaS tool patched and up to date. Incentives and human nature are in the vendor’s favor compared to your company’s internal systems.

The second reason for data breaches is social engineering: tricking people into using weak passwords, sharing too much information, or providing an opening. Recent years have seen a 9-fold increase in these types of attacks because of how easy and effective they are. These attacks can be simple or complicated, but attackers often combine bits of information from various sources to triangulate on a successful attack. Again, everyone assumes they would not be taken in, but the security research says otherwise. Using a cloud tool instead of an on-premises tool does not impact this risk one way or the other.

You will likely run into more arguments against using an online business intelligence tool. With each, play devil’s advocate until you get past the platitudes and really understand how much of a risk each one might be.

How to evaluate a cloud BI tool’s security

Where is your data?

This is critical. How much of your data goes to the SaaS vendor’s servers? Two methods exist, with one carrying more risk than the other. In the first method, your raw data is brought into the vendor’s servers, where it is transformed or modeled into an analytical data model. In the second method, only aggregated data is brought into the vendor’s servers. The second is much lower risk, and most newer vendors take this approach.

This is a foundational architecture decision by the vendor, and not one they will be able to change for you. Often a vendor will tout the first method as an important feature: bring your data into our servers, and we can provide valuable data modeling to make your team more efficient, and so on. However, in the modern cloud data architecture, this is not a must-have feature. Tools like dbt are much better suited for this transformation, allowing your BI tool to focus on presenting the data, not transforming it.
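To make the difference between the two methods concrete, here is a minimal sketch in Python. The rows, field names, and the aggregate_by_region helper are all hypothetical and only for illustration; in practice this step would more likely live in your warehouse or a tool like dbt. The point is simply that the summary leaving your environment no longer contains customer-level fields.

```python
from collections import defaultdict

# Hypothetical raw rows as they might sit in your own data warehouse.
RAW_ORDERS = [
    {"customer_email": "ana@example.com", "region": "West", "amount": 120.0},
    {"customer_email": "bo@example.com", "region": "West", "amount": 80.0},
    {"customer_email": "cy@example.com", "region": "East", "amount": 50.0},
]


def aggregate_by_region(rows):
    """Summarize orders per region, dropping every customer-level field."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in rows:
        bucket = totals[row["region"]]
        bucket["orders"] += 1
        bucket["revenue"] += row["amount"]
    return dict(totals)


# Only this summary would be exposed to the BI tool in the second method.
SUMMARY = aggregate_by_region(RAW_ORDERS)
```

In the first method, the equivalent of RAW_ORDERS, emails included, would be copied onto the vendor's servers. In the second, only something like SUMMARY is.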

A second question to ask about the location of your data is where the BI vendor’s servers are located. Most will be running on infrastructure provided by one of the big three cloud infrastructure providers (AWS, Azure, GCP), but not always. Each of these providers has regions globally. Depending on your industry, you may be required to guard against your data being “exported,” which simply means being transmitted or stored outside of your country. This leads to a line of questioning with the BI tool vendor about where their servers are located, and how they protect against data accidentally flowing through networks it should not.
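Once a vendor tells you which regions they run in, checking that list against your own residency policy is simple enough to automate. The sketch below assumes a policy of "United States only" and AWS-style region names such as us-east-1. Both the policy and the regions_outside_policy helper are illustrative assumptions, not something any vendor provides.

```python
# Assumption for illustration: data must stay in regions whose names
# start with "us-" (AWS-style naming). Adjust to your own policy.
ALLOWED_REGION_PREFIXES = ("us-",)


def regions_outside_policy(vendor_regions, allowed_prefixes=ALLOWED_REGION_PREFIXES):
    """Return the vendor regions that fall outside the allowed prefixes."""
    allowed = tuple(allowed_prefixes)
    return sorted(r for r in vendor_regions if not r.startswith(allowed))


# Example: a vendor reports that it stores or processes data in these regions.
violations = regions_outside_policy(["us-east-1", "eu-west-1", "us-west-2"])
```

A non-empty result is not automatically a deal breaker, but it is a prompt to ask the vendor whether your data can be pinned to specific regions.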

Areas of exposure

When using a SaaS BI product, three technical vectors are the main considerations. When evaluating a tool, focusing on these three areas of security will be of most benefit:

  1. The HTTP connection. This is the network opening that allows the user to connect from the browser. Does the vendor use TLS/SSL for all connections?
  2. The database connection. This is the network opening that allows the SaaS tool to connect to your data warehouse (which could also be in the cloud, or might be an on-premises database).
  3. Embedding. In an embedded visualization situation, this allows your application to embed dashboards from the vendor tool, and it is important to review how that access is secured.
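For the first item, you do not have to take the vendor's word for it. The following sketch uses only Python's standard library to build a client that refuses anything older than TLS 1.2 and then reports which protocol version a host actually negotiates. The function names are my own, and the host you pass in would be the vendor's application domain. Treat it as a quick spot check, not a substitute for a proper security review.

```python
import socket
import ssl


def strict_tls_context():
    """Build a client context that verifies certificates and requires TLS 1.2+."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls_version(host, port=443, timeout=5.0):
    """Connect to host and return the negotiated protocol, e.g. 'TLSv1.3'.

    Raises ssl.SSLError if the certificate is invalid or the server
    only offers protocol versions older than TLS 1.2.
    """
    ctx = strict_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling negotiated_tls_version with the vendor's hostname needs network access, so run it from a machine that can reach the tool. The same kind of check applies to the database connection in the second item, using whatever TLS options your database driver exposes.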

Information security policies

You should ask to review these, and have a technology architect or security expert familiar with your business call out any potential issues. Some of these will be technical in nature, such as encryption at rest and encryption in transit. But the real focus of these policies is on people and process: things like password requirements; approvals; audits and reviews; and procedures and communications in case of a breach.

Mitigating and Minimizing Risk

Attestations and risk levels

Depending on your industry, you will have various regulations around data security. Some of the more widely recognized are HIPAA and GDPR. By using vendors who have participated in attestations or audits, you defer to experts and push much of the cost of risk mitigation onto the service provider. Here is how it works: consider a situation where a cloud vendor wants to process Personally Identifiable Information (PII) on behalf of its clients. Regulations state that these procedures must be audited. If each of the vendor’s clients must pass through an audit, that could mean hundreds of audits on the vendor, and each client must pay for their own audit. An attestation allows for the vendor to be audited a single time, and the auditor provides an attestation to each of the clients. This is efficient, cost effective, and is the standard in the audit world. 

SOC 2 is widely recognized as the standard for SaaS vendors. Developed by the AICPA, it is a robust framework that ensures a minimum level of compliance around data security controls. You should ask about this, as well as any industry-specific regulations and attestations that the vendor may have in place. Most of the companies we work with are not large enterprises, so working with one of the Big 4 accounting firms does not make sense for them. Linford & Co is the leading provider of attestations for SaaS companies and is a group I trust with these kinds of needs.

An up-and-coming risk mitigation strategy is to automatically monitor the risk exposure of your vendors. One example is the risk network curated by CyberGRX. Not every vendor you are considering will be a part of this network. But if they are, using the information to reduce your third-party risk is an easy way to gain more comfort in using a particular vendor.

Alternatives to naive cloud architecture

In some cases, a hybrid approach between on-premises and cloud is available. This is especially applicable in systems with a lot of moving components, such as a data platform. Datateer is designed for the security-conscious customer, with high levels of segregation to ensure there is no “cross-pollination” of data, and that data never leaves your control. This is not the mainstream approach, which is to have your data flow onto vendors’ servers. The mainstream approach makes things much easier for the vendor to process, but increases risk substantially.

One of the pillars in our stack is Prefect, which has pioneered this hybrid approach. This approach is more difficult for cloud-based BI tools to achieve. But as mentioned earlier in this article, if they have designed for it, they can prevent your raw data from flowing anywhere unnecessarily.

Part of this hybrid approach could mean hosting your own business intelligence tool on cloud infrastructure. This will guarantee that none of your data flows onto a cloud vendor’s systems, but it is quite a bit more maintenance. And it exposes you to the problem mentioned earlier of old, unpatched security vulnerabilities. Surprisingly, few options exist in this vein. Superset is a young but great option for internal analytics. And if you get going and realize managing your own solution is too much to take on, the project creators provide a commercially hosted option at Preset.

Insurance

This should be a no-brainer, but it is often overlooked. You and the vendor you choose should both have a Cyber Liability Policy including Data Breach Coverage. Regardless of whether you use a cloud vendor or an on-premises solution, and regardless of how good the information security policies and attestations make things seem, breaches are likely to occur. It is almost common knowledge that data breaches are a “not if, but when” situation. That doesn’t absolve any of us from our due diligence, but it certainly calls for protecting against the situation.

Summary

The benefits of using cloud-based business intelligence tools outweigh the risks. With a focus on mitigating and minimizing the risk, you can enjoy those benefits while protecting your business from the downsides.

In this article, we talked about key risks to be aware of and ways to evaluate BI tools in light of those risks. We also discussed ways to mitigate and minimize the risk of trusting a third party with your data.

Many vendors pay attention to all this and can help you understand the security posture of their products. You can also take advantage of Datateer’s free strategy sessions to talk through these risks and help make decisions.

Ultimately, the benefits will outweigh the risks for most, including you!