Data Lake vs Data Warehouse vs Data Mart

Getting your head around data storage can feel like trying to pick the right tool out of a packed toolbox. You've got data lakes, data warehouses, and data marts. 

Sure, they might sound like they do the same job—like storing all that crucial data your business keeps churning out. But, believe it or not, picking the right one can make a massive difference in how you use that data to make smarter decisions.

You don’t use a hammer for everything. Each of these tools has its speciality. And knowing which is which? That's what we're here to figure out together.

So, if you've ever scratched your head thinking, "What the heck's the difference?" you're in good company. We're about to break it down, nice and easy, starting with a quick look at what sets them apart. It's not just about finding a place to stash your data—it's about making that data work for you.

Quick Take: What Is the Difference Between a Data Lake, Data Warehouse, and Data Mart?

  • Data Lake: A vast storage pool for all types of data (structured, semi-structured, unstructured) in their native format. Ideal for flexibility and scalability.
  • Data Warehouse: A structured repository of filtered, processed data ready for analysis. Best for query-intensive reporting and data analytics.
  • Data Mart: A subset of a data warehouse, tailored for the specific needs of individual departments or business units.

What is a Data Lake? The Ultimate Data Reservoir

Data Lake time! So, what is this exactly? Think of a data lake as a massive, digital storage pool where you can dump literally all kinds of data—structured, semi-structured, unstructured, you name it. It's like the Wild West of data storage; everything goes, from detailed customer information to social media posts.

Primary Purpose of a Data Lake

Imagine having a vast expanse where you can store every type of data your business encounters—emails, social media interactions, transaction records, and more—in their native format. That's the essence of a data lake. It's designed to be a catch-all, holding a wide variety of data types, both structured and unstructured, at scale. The beauty of a data lake lies in its flexibility and scalability, accommodating the explosive growth of data in today's digital world.

The primary purpose here is not just to store data but to keep it in its raw form until it's needed. This approach offers flexibility for data scientists and analysts, who can dive in to explore, experiment, and uncover new insights without the constraints of predefined schemas or structures. The intended audience is more technical, and the intended use cases are more exploratory.

Data Lake Architecture: Designed for Flexibility

The architecture of a data lake is fundamentally different from traditional data storage solutions. It's built on technologies that allow for the storage of vast amounts of data in various formats. This setup includes powerful metadata tagging capabilities, ensuring that despite the lake's vast size, you can quickly find and access the data you need.

A well-designed data lake supports multiple data ingestion methods, including batch processing and real-time streaming, making it incredibly versatile. Whether it's immediate insights from live data or deep analyses of historical data, the architecture of a data lake is all about enabling access to data in its most flexible form.
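To make the ideas of metadata tagging and multiple ingestion methods concrete, here is a minimal sketch using only the Python standard library. It assumes a local temporary folder stands in for the lake and a plain list stands in for a metadata catalog; the helper names `ingest` and `find` and the tag keys are hypothetical, chosen only for illustration, and not any product's API.

```python
import json
import tempfile
from pathlib import Path


def ingest(lake_dir: Path, catalog: list, name: str, payload: bytes, tags: dict) -> Path:
    """Store the payload unchanged and record descriptive metadata in the catalog."""
    path = lake_dir / name
    path.write_bytes(payload)  # raw bytes, no schema applied
    catalog.append({"path": str(path), "size_bytes": len(payload), **tags})
    return path


def find(catalog: list, **wanted) -> list:
    """Return catalog entries whose tags match every requested key/value."""
    return [entry for entry in catalog
            if all(entry.get(key) == value for key, value in wanted.items())]


lake_dir = Path(tempfile.mkdtemp())  # stand-in for lake storage (assumption)
catalog = []                         # stand-in for a metadata catalog (assumption)

# Batch ingestion: a whole file of records arrives at once.
batch = b"order_id,amount\n1,19.99\n2,5.00\n"
ingest(lake_dir, catalog, "orders_2024_01.csv", batch,
       {"source": "orders", "mode": "batch", "format": "csv"})

# Streaming ingestion: events arrive one at a time and are stored as they come.
events = [{"user": "a", "action": "click"}, {"user": "b", "action": "view"}]
for i, event in enumerate(events):
    ingest(lake_dir, catalog, f"event_{i}.json", json.dumps(event).encode(),
           {"source": "web", "mode": "stream", "format": "json"})

# The tags make it possible to find data later without opening every file.
streamed = find(catalog, mode="stream")
print(len(catalog), "objects in the lake,", len(streamed), "from streaming")
```

The point of the sketch is that nothing is reshaped on the way in; the catalog entries are what keep a large lake searchable.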

What is a Data Lake vs Data Warehouse? The Flexibility Factor

When we pit data lake vs data warehouse, the key difference is flexibility versus structure. Data lakes allow you to store all your data without worrying about organizing it upfront. This "store now, figure out how to use it later" approach is perfect for businesses that want to capture every piece of data but may not yet know how they'll analyze it.

Imagine you’re at a growing business, overflowing with data from customer interactions, sales, and social media. Here's where the choice gets real: opt for a data lake if you're still figuring out the gold mines in this data deluge. It’s like keeping all your childhood toys in a giant box—someday, you’ll find valuable ones worth revisiting. On the flip side, if you're a retailer with a clear need to analyze sales trends and customer behavior, a data warehouse offers the structured space you need, kind of like a well-organized closet where everything has its place, ready for analysis.

Data warehouses, in contrast, require data to be structured and organized before it can be stored. This means you need to have a clear understanding of how you plan to use the data, making data warehouses ideal for scenarios where the analysis needs are well-defined and consistent.
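The contrast between "store now, figure it out later" and "structure before storing" can be sketched in a few lines. This is only an illustration under stated assumptions: a Python list plays the role of the lake, an in-memory SQLite database plays the role of the warehouse, and the sample records and the `sales` table are made up.

```python
import json
import sqlite3

raw_events = [
    '{"customer": "Ada", "amount": 30.0, "channel": "web"}',
    '{"customer": "Bo", "amount": "12.50"}',         # amount arrived as text
    '{"customer": "Cy", "note": "called support"}',  # no amount at all
]

# Data lake style: keep every record exactly as it arrived.
lake = list(raw_events)


def total_amount(records):
    """Interpret raw records only when a question is asked (schema on read)."""
    total = 0.0
    for line in records:
        record = json.loads(line)
        if "amount" in record:
            total += float(record["amount"])
    return total


# Data warehouse style: define the structure first (schema on write).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)"
)

rejected = []
for line in raw_events:
    record = json.loads(line)
    try:
        row = (record["customer"], float(record["amount"]))
    except KeyError:
        rejected.append(record)  # does not fit the agreed structure
        continue
    warehouse.execute("INSERT INTO sales (customer, amount) VALUES (?, ?)", row)

lake_total = total_amount(lake)
warehouse_total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(len(lake), "raw records kept;", len(rejected), "rejected by the warehouse")
```

Both paths reach the same total here, but the lake still holds the support note that the warehouse's structure had no place for, while the warehouse answers the sales question with a single query.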

Data Mart vs Data Lake: Keeping Options Open

Comparing data lake vs data mart highlights the distinction between vast storage capabilities and targeted, department-specific insights. While data marts provide streamlined access to data for specific business functions, data lakes offer a broader canvas, inviting exploration and discovery across the entirety of an organization's data.

This open-ended approach of data lakes is particularly valuable in environments where innovation and flexibility are paramount. It allows businesses to adapt quickly to new data sources and types, fostering an agile data culture.

Enterprise Data Lakes: Scaling with Your Business

For businesses dealing with large-scale data challenges, enterprise data lakes offer a solution that grows with your needs. These platforms are designed to handle the complexity and volume of data typical for large organizations, providing robust, secure, and efficient data storage options.

Enterprise data lakes stand out by offering advanced features such as machine learning capabilities and sophisticated data governance tools, ensuring that as your data grows, your ability to manage and leverage it effectively grows too.


What is a Data Warehouse? The Organized Library of Data

Think of a data warehouse as your super-organized, highly efficient digital library. It's where you keep all your structured data—sales records, customer interactions, transaction histories—neatly categorized and easy to find. The primary purpose here? To make retrieving and analyzing this data a breeze for reporting, decision-making, and getting those valuable insights.

What is the Primary Purpose of a Data Warehouse?

Imagine walking into a library where every book is meticulously organized, labeled, and easy to find. That's your data warehouse in the digital world. It's designed for structured data—things like numbers and texts in tables—that's been cleaned and processed for easy querying. Businesses use data warehouses to keep their historical data in one place, making it simpler to analyze trends, generate reports, and make informed decisions.

Data warehouses aren't just about storage; they're about speed and efficiency. They organize data around the queries you expect to run, with predefined schemas and, in many products, storage layouts designed for analytical reads, which makes it faster to access the information you need. This setup is perfect for businesses that rely on regular reporting and data analysis to guide their strategies.

Data Mart vs Data Warehouse: Diving Deeper

Here's where it gets a bit more nuanced. A data warehouse is the comprehensive collection of an organization's historical data, aimed at supporting decision-making across the board. Data marts, on the other hand, are like the specialized sections within this vast library, dedicated to specific business lines or departments.

What are the primary differences between a Data Warehouse and a Data Mart?

The difference between a data warehouse and a data mart can be likened to shopping at a superstore vs. a specialty shop. Data marts offer the convenience of having just the relevant data for a specific team's needs, making it easier and quicker for them to get insights without sifting through the entire data warehouse.

Difference Between Data Lake and Data Warehouse: Choosing Between the Two

In the context of data warehouse vs data lake, the main thing to remember is the type of data you're dealing with and the flexibility you need. Data warehouses excel with structured data and provide powerful insights through complex queries and analyses. They're your best bet when you know what questions you want to ask of your data.

Data lakes, with their ability to store unstructured data (like text, images, and videos), offer a broader playground for data exploration. They're ideal when you're collecting vast amounts of data in different formats and want to keep your options open for how you might use it in the future.

What is an Enterprise Data Warehouse? 

For larger organizations or those with particularly complex data needs, enterprise data warehouse solutions are the way to go. These systems are designed to handle vast volumes of data across different departments, ensuring data consistency and reliability. They can be crucial for businesses that depend on large-scale data analysis to inform their strategies, offering advanced features like data mining and predictive analytics.


What is a Data Mart? The Specialized Data Boutique

Moving on to data marts, these are the go-to for department-specific insights. They're like those boutique stores that specialize in one type of product, offering a curated selection that’s exactly what you’re looking for.

The Niche Focus of Data Marts

Data marts serve a specialized function, focusing on the specific needs of individual departments or business units within an organization. Whether it's the marketing team looking to analyze campaign performance or the finance department monitoring budget allocations, data marts provide a tailored view of the data that matters most to them.

This specialization means data marts can be optimized for faster queries and analyses, as they contain less data and are more closely aligned with the specific tools and applications used by their intended users. It's like having a dedicated workspace that's set up just the way you like it, with everything you need within arm's reach.

Data Mart Architecture: Streamlined for Insight

The architecture of a data mart is intentionally straightforward and efficient. By focusing on a smaller subset of data, data marts allow for quicker access and simpler data models. This setup supports rapid reporting and analysis, enabling departments to make agile, informed decisions.

Furthermore, data mart architecture often includes pre-calculated measures and aggregated data, which speeds up analysis even more. This design consideration ensures that users can access insights quickly, without the need for extensive data processing or manipulation.
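As a rough illustration of pre-calculated measures, the sketch below builds a small marketing mart from a warehouse table using Python's built-in `sqlite3` module. The table names (`warehouse_sales`, `marketing_mart`), columns, and figures are hypothetical and exist only for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# A tiny stand-in for the warehouse: one row per sale, all departments mixed.
db.execute(
    "CREATE TABLE warehouse_sales ("
    "sale_date TEXT, department TEXT, campaign TEXT, revenue REAL)"
)
db.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("2024-01-05", "marketing", "spring_email", 100.0),
        ("2024-01-09", "marketing", "spring_email", 50.0),
        ("2024-01-12", "marketing", "search_ads", 75.0),
        ("2024-01-15", "finance", None, 500.0),
    ],
)

# The mart keeps only marketing rows and stores the aggregates up front,
# so reports read a few summary rows instead of scanning every sale.
db.execute(
    """
    CREATE TABLE marketing_mart AS
    SELECT campaign,
           COUNT(*)     AS sale_count,
           SUM(revenue) AS total_revenue
    FROM warehouse_sales
    WHERE department = 'marketing'
    GROUP BY campaign
    """
)

rows = db.execute(
    "SELECT campaign, sale_count, total_revenue "
    "FROM marketing_mart ORDER BY campaign"
).fetchall()
for campaign, sale_count, total_revenue in rows:
    print(campaign, sale_count, total_revenue)
```

In a real setup the mart would be refreshed on a schedule as new warehouse data arrives; that refresh step is left out here to keep the sketch short.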

Integrating Data Marts with Larger Data Strategies

Data marts play a crucial role in a broader data strategy, acting as accessible endpoints for complex data systems. They allow organizations to decentralize their data analysis efforts, enabling departments to operate independently while still aligning with the overall data strategy.

Integrating data marts with data lakes and data warehouses provides a balanced approach to data management, where flexibility and exploration in a data lake complement the structured and fast-access environment of data warehouses and data marts. This integrated approach ensures that organizations can cater to both broad data exploration initiatives and specific, targeted analysis needs.


Choosing the Right Solution: Lake, Warehouse, or Mart?

Deciding between a data lake, data warehouse, and data mart can feel like standing at a crossroads. Each path leads to a different destination, suited for varying business needs and data strategies. Let's break down how to choose the right path for your data journey.

Understanding Your Data Needs

First things first, understanding the type of data you have and what you want to do with it is crucial. If your business generates a vast amount of both structured and unstructured data and you wish to keep all options open for analysis, a data lake might be your best bet. It's like having a giant canvas where you can later decide which part of the picture you want to paint.

On the other hand, if your data is primarily structured and you're focused on specific, query-intensive reporting and analytics, a data warehouse offers the structured environment you need. It’s perfect when you know exactly what questions you’re asking of your data.

For targeted insights relevant to specific departments or business functions, data marts provide that focused lens. They are the go-to when the need is for quick, easy access to data that supports department-specific decision-making.

Considering Scalability and Flexibility

Scalability is another key factor. Enterprise data lakes and data warehouse solutions are designed to scale with your business, handling increasing volumes of data without sacrificing performance. If you anticipate rapid growth or a significant expansion in the types of data you will collect, these solutions can provide the robust framework necessary to support that growth.

Flexibility, especially in data format and structure, leans heavily towards data lakes. They allow you to store data as is, without needing upfront structuring, offering flexibility for data scientists and analysts to explore data in its raw form.

Integration Capabilities

Think about how your chosen solution will integrate with existing systems and workflows. Enterprise data lakes, data warehouse services, and data marts each offer different integration capabilities. A seamless integration means less disruption to existing processes and a smoother transition to using your new data storage solution.

Cost Considerations

Budget is always a factor. Initial setup and ongoing operational costs can vary widely between data lakes, data warehouses, and data marts. Consider not only the upfront investment but also the long-term value each solution brings to your business. Sometimes, the more cost-intensive option upfront can lead to greater savings and efficiencies down the line.

RELATED ARTICLE: How Much Do Data Analytics Service Cost?

Make It a Combo!

Some companies benefit from a combination of more than one of these. At Datateer, we have a data architecture that we use for all of our clients. 

Data Architecture flowchart

First, all data goes into what we call “raw” data, which is a lightweight data lake. For clients that need to explore data in its raw form, as it was when it left the operational system, this raw data in the data lake gives them a place to do so.

The data lake feeds the warehouse. Here we combine and transform data into a defined, curated structure. This is ideal for answering questions that come up repeatedly, e.g. “How much revenue did we have last month by region and product line?”

Data marts are specialized views tailored for narrower audiences. They are especially useful when a data warehouse grows larger, or when it has a lot of general information not as useful for answering questions narrower in scope. 
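The three layers described above can be sketched end to end with the Python standard library. This is a simplified illustration, not Datateer's actual implementation: the raw records, the `orders` table, the `west_region_mart` view, and the fixed month "2024-03" are all assumptions made for the example.

```python
import json
import sqlite3

# 1. Raw layer: records kept as they left the operational system.
raw_orders = [
    '{"id": 1, "date": "2024-03-04", "region": "west", "product_line": "widgets", "amount": "120.00"}',
    '{"id": 2, "date": "2024-03-18", "region": "west", "product_line": "widgets", "amount": "80.00"}',
    '{"id": 3, "date": "2024-03-20", "region": "east", "product_line": "gadgets", "amount": "200.00"}',
    '{"id": 4, "date": "2024-04-02", "region": "east", "product_line": "gadgets", "amount": "999.00"}',
]

db = sqlite3.connect(":memory:")

# 2. Warehouse layer: raw data transformed into a defined, curated structure.
db.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, order_date TEXT, region TEXT, "
    "product_line TEXT, amount REAL)"
)
for line in raw_orders:
    order = json.loads(line)
    db.execute(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
        (order["id"], order["date"], order["region"],
         order["product_line"], float(order["amount"])),
    )

# 3. Mart layer: a narrower view for one audience (here, the west region team).
db.execute(
    """
    CREATE VIEW west_region_mart AS
    SELECT order_date, product_line, amount
    FROM orders
    WHERE region = 'west'
    """
)


def revenue_by_region_and_line(month: str) -> list:
    """Answer the recurring question: revenue for a month by region and product line."""
    return db.execute(
        """
        SELECT region, product_line, SUM(amount)
        FROM orders
        WHERE substr(order_date, 1, 7) = ?
        GROUP BY region, product_line
        ORDER BY region, product_line
        """,
        (month,),
    ).fetchall()


march = revenue_by_region_and_line("2024-03")
west_rows = db.execute("SELECT COUNT(*) FROM west_region_mart").fetchone()[0]
print(march, west_rows)
```

The raw list stays available for exploration, the warehouse table answers the repeated revenue question, and the view gives one team only the slice it cares about.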

Quick Take: How Do You Choose Between a Data Lake and a Data Warehouse?

  • Assess Your Data Types: Data lakes are suited for a mixture of structured and unstructured data, while data warehouses are ideal for structured data.

  • Consider Your Analytical Needs: If uncertain about future analytics needs, opt for a data lake. For established analytical processes, choose a data warehouse.

  • Evaluate Flexibility vs. Structure: Data lakes offer flexibility without the need for data structuring. Data warehouses require structured data but provide faster, more efficient querying capabilities.

Summary: Empowering Your Data Strategy with Data Lakes, Data Warehouses, and Data Marts

Navigating the world of data lakes, data warehouses, and data marts can initially seem daunting. Yet, understanding these tools is essential in today’s data-driven landscape. Each serves a unique purpose, catering to different needs within an organization, and choosing the right one can significantly empower your data strategy.

Differences and use cases of data lake, data warehouse, and data mart
Chart comparing use cases, purposes, benefits of data lake vs data warehouse vs data mart

Data lakes offer flexibility and scalability, making them ideal for businesses that deal with a wide variety of data types and need the room to explore and innovate. Data warehouses bring structure and efficiency, perfect for those who need quick, reliable access to organized data for analysis and reporting. Meanwhile, data marts provide targeted insights, serving the specific needs of individual departments with precision.

The decision between a data lake vs data warehouse, or including a data mart, boils down to understanding your data needs, considering scalability, integration capabilities, and of course, budget. With the right approach, businesses can leverage these solutions to not only manage their data more effectively but also gain critical insights that drive strategic decisions.

RELATED ARTICLE: What is Managed Analytics? A Guide to Managed Analytics Services

Remember, it’s not just about storing data. It’s about unlocking its potential to inform, innovate, and guide your business to new heights. Whether you’re exploring enterprise data lakes, data warehouse solutions, or data marts, the key is to align your choice with your business objectives and data strategy.


What Does a Data Analytics Consultant Do? How to Hire One

A data analytics consultant organizes and analyzes a business’s data to turn the data into an asset useful for making decisions, creating operational visibility, and answering questions.

Data is one of the most valuable assets for any business. Understanding the role of a data analytics consultant is crucial for any organization looking to leverage data effectively. These professionals are central to transforming complex data into actionable insights, combining data engineering skills with analytical expertise.

Before going too much further, get the companion checklist to help apply what we cover in this article.

Free Checklist and Template Evaluate and Hire Data Consultants

Armed with the knowledge of what exactly a data analytics consultant does and the skills they provide, we will then discuss how to find and hire the right analytics consultant.

 


What Does a Data Analytics Consultant Do?

A data analytics consultant serves as both a constructor of data frameworks and an interpreter of data insights. They possess a unique blend of technical skills in data engineering – such as building and maintaining data systems – and analytical prowess in extracting meaningful insights from complex datasets. 

Their role involves creating and managing the infrastructure required for data collection, processing, and storage. This includes designing data models, developing algorithms for data analysis, and creating visualizations to communicate findings clearly.

Data analytics consultants play a vital role in enabling businesses to make informed decisions. They provide the expertise needed to navigate data, ensuring an organization's data strategy aligns with its business objectives.

A data analytics consultant is not just an analyst but a comprehensive data expert. They are instrumental in building a data-driven culture within an organization, ensuring that data is not just available but also accessible and actionable for decision-making. Although some consultants specialize in specific skills, others strive to provide a blend of all necessary skills.


The Diverse Specialties and Skills in Data Analytics Consulting

Data analytics consulting isn't a one-size-fits-all profession. It spans a wide range of specializations, each tailored to different aspects of business and data needs. Understanding these specializations is key when looking for the right consultant for your business.

Consider these four dimensions to understand how an analytics consultant might specialize and be a good fit for your needs.

Business Function or Industry

Analytics consultants that specialize in your specific need bring more than technical expertise to an engagement. Some even focus exclusively on specific industries or business functions. Some examples include: 

  1. Web and Digital Analytics Consultants: In the digital realm, these consultants analyze web traffic and user engagement to improve online presence and digital marketing strategies. They're crucial for businesses looking to optimize their online platforms.
  2. Financial Analytics Consultant: These consultants make sense of accounting and financial data, and tie that data to operational data to help form a complete financial picture.
  3. Product Analytics Consultant: These consultants help product leaders use data to inform strategic product decisions, optimize customer experience, and improve key usage metrics.
  4. Marketing Analytics Consultant: They specialize in analyzing marketing data to measure campaign effectiveness, understand consumer behavior, and optimize marketing strategies for better ROI.
  5. People Analytics Consultant (HR Analytics): These consultants apply data analysis to human resources, helping businesses optimize recruitment, track employee performance, and improve organizational culture.
  6. E-commerce Analytics Consultant: For businesses in the e-commerce space, these consultants analyze customer behavior, market trends, and sales data to enhance the online shopping experience and boost sales.

Client Size and Geography

Many analytics consultants focus on serving customers in their home city or state, or focus on clients of a certain size. Although they may lack the industry depth of consultants focused on a single industry, this is not necessarily a bad thing. In fact, they may be used to serving organizations that aren’t yet mature around data analytics, which describes most organizations. These types of specialists often bring best practices that work for companies of a certain size or in a specific geographic area.

Technical Specializations

Many data analytics consultants specialize in a certain practice within the broader data analytics umbrella. This can be useful for unusual situations like large or complex data, or when an organization's needs grow enough to justify a team of data analytics experts.

  1. Data Engineering: These individuals focus on getting data out of the source systems, organizing it, automating processes, and creating data models that are easy to use and perform well. 
  2. Data Analysis: Analyst consultants use data to fulfill business requirements, or even shape requirements from ambiguous or general needs. They analyze data to understand it and make sense of it for reporting or answering questions.
  3. Predictive Analytics: Sometimes labeled “machine learning,” these analytics consultants use statistical programming libraries to extrapolate forward projections and predictions based on available data. 
  4. Artificial Intelligence: Human nature predicts that many people will begin labeling themselves as AI analytics consultants. And many businesses will get caught up in the hype–be sure AI is the specialty you need before pursuing a data analytics consultant specializing in this. An AI consultant will be able to apply AI tools and products but may be lacking in more foundational capabilities. 

Product Focus

Another typical way to specialize is by product. This can be useful if your company already has invested in specific technology products. Data and analytics consultants that specialize in a product, tool, or framework will bring best practices built up from previous engagements. 

These products fall into three basic categories:

  1. Data Warehouse: This is a database designed for data analytics and the types of queries and operations needed. Examples include Snowflake, BigQuery, and Redshift. See our Data Warehouse Services.
  2. Reporting or Exploration: These tools are the “last mile” and typically are the only thing that end users see. These products visualize data, provide reports and dashboards, and provide various levels of exploration capabilities. Some examples include Sigma Computing, Tableau, Astrato, and Luzmo. There are dozens of these products on the market.
  3. Data Ingestion: Data ingestion is extracting data from operational systems, APIs, databases, and other sources into a single location (the data warehouse) for analysis. Specialty tools and frameworks include Fivetran, Rivery, Portable, Meltano, and Integrate. See our Data Integration & Extraction (ETL, ELT) Platform feature.

How to Evaluate and Hire a Data Analytics Consultant

Now that you have an understanding of the landscape of analytics consultants, let’s look at how you can evaluate them to select the right one. Then we’ll describe the typical process of an engagement. 

Evaluating a Data Analytics Consultant

Here is a checklist you can use to thoroughly evaluate any data analytics consultant and confirm a good fit.

  1. Define Your Business Needs: Before you start looking for a consultant, have a clear understanding of what you need. Don’t overcomplicate this, but do write it down so you consistently communicate it. Are you looking for insights into customer behavior, improving operational efficiency, or predictive analytics for future planning? Knowing your objectives will guide you in finding a consultant with the right expertise.
  2. Look for Relevant Experience and Specialization: Use the framework from the section above to decide whether any specializations are important to you. Check their past projects and client testimonials to gauge their expertise and success in their professed specialties.
  3. Assess Technical and Analytical Skills: Ensure the consultant has a strong foundation in data engineering and analytical skills. This can be very difficult because you are hiring expertise you do not have. Some consultants can provide examples of prior work or portfolios. Sometimes a third-party consultant will be willing to perform a technical assessment on your behalf. And often product vendors know who is who in their network of consulting partners.
  4. Consider Communication and Problem-Solving Abilities: A good consultant should not only be technically proficient but also able to communicate complex data insights in a clear and understandable manner. Look for someone who is a good listener, can ask insightful questions, and is adept at solving complex problems.
  5. Discuss and Understand Their Methodology: Each consultant may have a different approach to data analytics. In truth, many do not have one at all–watch out for this. If they assume they will just follow whatever process you typically use, that is a red flag. Discuss their methodology to ensure it aligns with your expectations and business goals. Understanding their process will help you gauge how they will handle your data and the insights they will provide.
  6. Review Their Portfolio and Case Studies: A consultant's portfolio and case studies can provide valuable insights into their work style and the kind of results they deliver. Look for case studies or examples that are similar to your business situation.
  7. Set Clear Expectations and Deliverables: Be clear about what you expect in terms of deliverables, timelines, and communication. This will help set a clear path for the consultancy and avoid any misunderstandings later on.
  8. Discuss Costs and ROI: Cost is an appropriate and expected part of the first conversation. The consultant should be willing to give you a “ballpark” idea of what drives cost and how you should start to plan. Understand their fee structure and discuss the expected return on investment.
  9. Plan for Long-term Engagement: Consider how the consultant can be a part of your long-term data strategy. Data analytics is not a one-time activity but an ongoing process, and having a reliable consultant can be a valuable asset for your business’s growth.

Hiring a Data Analytics Consultant the Right Way

Once you’ve identified the right consultant, engaging with them is typically straightforward, but there are a few things not to overlook.

Understand exactly what drives cost and how you will be charged. This is often an hourly rate, but not always. Some are deliverable-based or project-based. With hourly rates, ensure you understand how hours will be reported and tracked against milestones and deliverables. 

Datateer offers a Managed Analytics service with pricing that scales up and down by data asset under management.

Understand their information security policy, especially where your data will reside, who will have access to it, and what the data analytics consultant is allowed to do with your data. Don’t assume anything here, and make sure to get it in writing. See Datateer’s Information Security Policy as an example (you are welcome to reference this or use it as a boilerplate).

Understand ownership of deliverables–and data. Understand what happens if the data analytics consultant underperforms or does not deliver. This is often not nefarious but happens more than most in the industry care to admit. Data is complex, and it often happens that the fees start adding up faster than the deliverables arrive. (If you’d like to see Datateer’s Master Services Agreement or Subcontractor Agreement, reach out and we can share).

With a clear master agreement in hand, your analytics consultant can create a 1-page Statement of Work (“SOW”) that defines the deliverables and price. Referencing the master agreement, the SOW can stay short and sweet, but still be legally strong. 

Establish communication and reporting processes, and a way to have touchpoint meetings where you adjust the engagement parameters. With these, everyone knows how to communicate about things that aren’t working and need adjustment. 

Conclusion

Selecting the right data analytics consultant is a strategic step in leveraging your business data effectively. These experts bring a blend of data engineering and analytical skills, essential for transforming data into actionable insights. The key lies in identifying a consultant whose expertise aligns with your specific business needs and goals.


In your search, focus on their technical proficiency, industry experience, and problem-solving approach. A consultant’s ability to clearly communicate complex data insights is as crucial as their technical skills. Remember, a successful engagement involves not just the right skill set but also a strong alignment with your business's values and objectives.

Ultimately, the right data analytics consultant can be a valuable partner, propelling your business toward data-driven decision-making and growth. Make this choice thoughtfully, and you’ll set your business on a path to harnessing the full power of your data.



Discovering the Analyst Experience and Impact on Data Democratization


This article is part of our series Selecting the Right Visualization Tool with Confidence.


Data democratization is definitely a buzzword. Like digital transformation and moving to the cloud, it is ambiguous and can mean different things to different companies. But like many buzzwords, it contains a nugget of truth. Data democratization is truly a shift. When choosing a cloud business intelligence product, it is important that the vendor’s perspective aligns with your own. Otherwise you will end up with that familiar situation that the product does not quite “fit” with your organization.

What is data democratization?

Some years ago, I was part of a successful data warehouse initiative at a large financial services institution. Overall the project had gone very smoothly, and we were able to produce analytics that did a good job answering questions for the business. However, I became aware of an interesting pattern. We began to receive more and more requests to produce answers to questions, rather than the business users answering their own questions. Folks were willing to wait in the queue for days rather than attempt to answer their own questions with the tooling provided. It was just easier for them to ask the data team. 

This experience illustrates the essence of data democratization. Although we had produced a good data warehouse, we had failed to create tooling that enabled everyone to participate in answering their own questions. Here is a deeper explanation from Forbes.

The rise of the Data Analyst

No matter how many slick marketing campaigns tell you their product makes data easy, working with data is hard. Quanthub describes the skyrocketing demand for data analysts, happening because they are the bridge between data and knowledge.

(P.S. The term “data science” is all the rage, and you may be tempted to go hire a data scientist. But data analysis is where most answers come from and is much more generally applicable.)

The ideal state is that anyone in your business can answer their own questions. However, the current reality is that many users are not comfortable working with data and operate more like consumers of reports and analyses. But there is a major shift happening right now: more people can function as analysts, one reason being that new tools are providing new capabilities to support this.

Here are fundamental roles on a modern data team, to highlight the change in the data analyst role:

  • Data Architect. Defines the overall system and how all the pieces fit together. Defines the data model in the warehouse.
  • Data Engineer. Writes code to move data from sources to the warehouse, combine data from multiple sources, and transform data so it fits the warehouse model.
  • Data Operations. Monitors and maintains the live system.
  • Data Analyst. The connection between the business and the data. Analyzes the data to produce answers to business questions.
  • Data Scientist. Applies statistical analysis to data to test hypotheses and create predictive models.

The major shift in the data analyst role is that analysts are often no longer part of the data team. And this has nothing to do with a title or formal responsibility. More often, they are part of business departments, and they happen to have an interest in the data and the ability to understand it. You may have encountered the person on the marketing team, for example, who is a spreadsheet maven and always seems to have a spreadsheet available. Or the person on the finance team who produces charts and graphs in presentations that show just how the company is doing.

Many companies are embracing this analyst-first reality and looking for tools that can support this mode of operating. 

Buzzwords versus reality

The buzzword folks promote a vision where everyone is working with data. The reality is that the analysts have become the true workhorses, and are the bridge to everyone in your organization successfully using data.

Adopting tools and processes that fit this reality will be of most benefit to your company. From our research, many vendors do recognize this reality and take varying approaches to supporting the data analyst.

Various approaches to the analyst experience

One benefit of a crowded BI market is that it forces innovation. Below are several approaches to the analyst experience.

Separate the query from analysis

A shortcoming of earlier tools (and I am talking about only a few years ago, not decades) is that they assumed all users knew SQL. Most analysts come from a background of using spreadsheets to analyze data. Thus, converting the data warehouse tables into datasets that feel more like spreadsheets enables more people to use the data.

Sigma Computing is a company that has embraced this concept. Not only do they separate the data set generation from the analysis, they are all in on spreadsheet analysis. In evaluating their product, I found this intuitive and right in line with an analyst-first approach.

Holistics also pushes the approach of providing modeling separate from analysis. This allows the more technical data team to define datasets, opening up the analysis to a broader group of people. 

Panintelligence is an embed-first tool that follows a similar approach of defining a model that it then uses to drive GUI-based creation of visualizations.

Each of these uses the information in the model to generate queries that leverage the data warehouse infrastructure. Some tools actually process the queries on the BI tool’s infrastructure, which can be a data compliance risk. Be sure to ask.
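To make the separation concrete, here is a minimal sketch, not any vendor's implementation. It uses Python's built-in sqlite3 module as a stand-in for a data warehouse. The table names, columns, and the `total_by` helper are all invented for illustration. The data team defines one flat dataset, and the analyst only groups that dataset by a column, with the query still running in the warehouse.

```python
import sqlite3

# Hypothetical warehouse tables; names and columns are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

    -- Modeling step (data team): one flat, spreadsheet-like dataset.
    CREATE VIEW order_dataset AS
    SELECT o.id AS order_id, c.name AS customer, c.region, o.amount
    FROM orders o JOIN customers c ON c.id = o.customer_id;
""")


def total_by(column):
    """Analysis step (analyst): total the amount, grouped by one dataset column."""
    allowed = {"customer", "region"}
    if column not in allowed:
        raise ValueError(f"unknown column: {column}")
    sql = (
        f"SELECT {column}, SUM(amount) FROM order_dataset "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return conn.execute(sql).fetchall()


totals_by_region = total_by("region")
totals_by_customer = total_by("customer")
```

The analyst never sees the join between the two tables. That knowledge lives in the dataset definition, which is the part the more technical team owns.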

Query builders

Giving people who do not know SQL the ability to generate queries (rather than write them directly) has been a long time coming. The idea is not new, but only in the last few years have vendors been able to do it well. Like bumpers in bowling, this approach allows more people to use the database without ending up in the gutter.

Trevor.io leads with their query builder functionality, and they say this is where most of their users spend their time.

Chartio had really begun pushing this with their Visual SQL feature, before their acquisition.

Although this is a great approach and an improvement over requiring all users to know SQL, there is often a tradeoff. The easier a tool makes this for the user, the less flexible it becomes in the types of queries and analyses that are available. When evaluating tools that take this approach, be sure to understand how well they support direct SQL access, and what the tipping point is where getting into SQL is necessary.
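The tradeoff is easier to see in a toy example. The sketch below is a hypothetical query builder, not how any of the tools above work internally. The `CATALOG`, `QuerySpec`, and `build_query` names are invented for illustration. The user picks from a fixed menu of tables, columns, and aggregates, and the builder turns those choices into parameterized SQL.

```python
from dataclasses import dataclass, field

# Hypothetical catalog of tables and columns a user may pick from in the UI.
CATALOG = {"orders": {"region", "status", "amount"}}
AGGREGATES = {"sum", "avg", "count", "min", "max"}


@dataclass
class QuerySpec:
    table: str
    group_by: str
    aggregate: str
    measure: str
    filters: dict = field(default_factory=dict)  # column -> required value


def build_query(spec):
    """Turn point-and-click choices into parameterized SQL."""
    columns = CATALOG.get(spec.table)
    if columns is None:
        raise ValueError(f"unknown table: {spec.table}")
    for column in (spec.group_by, spec.measure, *spec.filters):
        if column not in columns:
            raise ValueError(f"unknown column: {column}")
    if spec.aggregate not in AGGREGATES:
        raise ValueError(f"unsupported aggregate: {spec.aggregate}")

    sql = (
        f"SELECT {spec.group_by}, {spec.aggregate.upper()}({spec.measure}) "
        f"FROM {spec.table}"
    )
    params = []
    if spec.filters:
        clauses = []
        for column, value in spec.filters.items():
            clauses.append(f"{column} = ?")  # values are bound, never inlined
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" GROUP BY {spec.group_by}"
    return sql, params


sql, params = build_query(
    QuerySpec("orders", group_by="region", aggregate="sum", measure="amount",
              filters={"status": "paid"})
)
```

Everything the builder accepts is safe and easy, but anything outside the menu (a join, a window function, a median) raises an error here. In a real tool, that is the tipping point where the user has to drop into SQL.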

Direct SQL

This is an earlier approach that requires analysts to know SQL to make queries to the database. This is still powerful, especially if the analysts in your organization do know SQL well. This approach allows for a lighter-weight tool that can be quicker to deploy and use. It does, however, come with another tradeoff: metrics and analyses tend to become inconsistent over time as each analyst writes their own version of a query.

Metabase is a tool we have enjoyed using. They do have some query builder type of features, but we found that we quickly jumped into SQL in most cases.

Redash is very lightweight and essentially just a visualization tool on top of a SQL database. For someone who knows SQL, Redash is easy to get into and start using.

We were a customer of Mode before moving to Chartio. Because Mode takes a direct SQL approach, it is well suited for larger data teams that need to collaborate closely. But it is intimidating for analysts with a less technical background.

Preset takes direct SQL to its logical conclusion, with SQL in everything–queries, metrics, filters, formatters, etc. It is very powerful, but obviously requires some SQL expertise.

Summary

When evaluating business intelligence tools, recognizing the approach they take to the analyst experience is crucial. No matter how good your data pipelines, data warehouse, and other pieces of your platform are, the analyst is going to make your data initiative successful. This article discussed how to understand the data analyst in the overall process, as well as various approaches vendors take to providing a good analyst experience.

data strategy, data visualization

Data Security and Compliance in Cloud-Native Business Intelligence


This article is part of our series Selecting the Right Visualization Tool with Confidence.


The risk of data breaches is huge, and is one of the main reasons companies are slow to adopt cloud computing. Facebook is in the news today for over half a billion profiles being leaked, including personal information! Google had to pay over $55,000,000 in GDPR fines in 2020.

If the big companies cannot get it right, why take the risk at all? Companies that do not become a data-driven business lose out to the ones that do. There is no sustainable alternative. Forrester research shows that organizations with data-driven insights are 140% more likely to create sustainable competitive advantage, and take tremendous market share from traditional organizations. 

In this article, my goal is to give you an overall understanding of what you should pay attention to and how to mitigate the risks. I am no lawyer, so you should make decisions based on advice from qualified legal, accounting, and security experts. 

Why use a cloud business intelligence tool?

This gets into a larger strategic question of digital transformation and whether to use cloud computing or infrastructure at all. In spite of all the benefits of using cloud infrastructure and SaaS tools, such as lower total cost of ownership (TCO), agility, and scalability, perception of higher security risk has been an impediment to cloud adoption. 

Ultimately, each organization has to make this decision on its own. Strategy in any risk-reward decision is greatly affected by how you mitigate or minimize the risks involved. Some organizations take a risk avoidance approach instead, which in my opinion costs them more than the risk they are avoiding.

When it comes to using a cloud-based business intelligence tool, you will see much shorter onboarding time, and your maintenance costs are zero. So time to value is shorter, and if the tool’s pricing is in line with the market, TCO will be lower.

How is my data safe in an online system?

Data can be just as safe, if not safer, in an online tool than in a tool you manage internally. Seriously. In today’s information worker world, people connect into your network from home, coffee shops, mobile phones, etc. They are likely already connecting to your internal systems over a VPN connection, and to several SaaS products such as Salesforce or Jira. 

The top reason for data breaches is old, unpatched systems. When the security community identifies vulnerabilities in operating systems or network devices, they share these vulnerabilities in lists known as Common Vulnerabilities and Exposures (CVEs). This allows everyone to act quickly and with maximum information to resolve the vulnerabilities.

However, there is a catch. IT departments must actively manage servers, operating systems, and networks to apply these updates, so that they are no longer vulnerable. Although every IT department claims they are following best practices, the number of security breaches due to unpatched systems says otherwise. Now, compare that to a product company, where every bit of their livelihood depends on keeping their SaaS tool patched and up to date. The incentives and human nature are in their favor compared to your company’s internal systems.

The second reason for data breaches is social engineering–tricking people into using weak passwords, sharing too much information, or providing an opening. Recent years have seen a 9-fold increase in these types of attacks, because of how easy and effective they are. These can be simple or complicated, but attackers often combine bits of information from various sources to triangulate on a successful attack. Again, everyone assumes they would not be taken in, but the security research says otherwise. Using a cloud tool vs an on-premises tool does not impact this risk one way or the other.

You will likely run into more arguments against using an online business intelligence tool. With each, play devil’s advocate until you get past the platitudes and really understand how much of a risk each one might be.

How to evaluate a cloud BI tool’s security

Where is your data?

This is critical. How much of your data goes to the SaaS vendor’s servers? Two methods exist, with one carrying more risk than the other. In the first method, your raw data is brought into the vendors’ servers, where it is transformed or modeled into an analytical data model. In the second method, only aggregated data is brought into the vendors’ servers. The second is much lower risk, and most newer vendors take this approach.
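The difference between the two methods can be sketched in a few lines. This is an illustration only, using Python's built-in sqlite3 module as a stand-in for your data warehouse. The `purchases` table, its values, and the `contains_email` helper are all invented. The point is what crosses the boundary to the vendor in each case.

```python
import sqlite3

# Stand-in for your data warehouse; table and values are invented for illustration.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE purchases (email TEXT, plan TEXT, amount REAL);
    INSERT INTO purchases VALUES
        ('ann@example.com', 'pro', 40.0),
        ('bob@example.com', 'pro', 60.0),
        ('cy@example.com', 'basic', 10.0);
""")

# Method 1: every raw row, including personal data, leaves the warehouse.
raw_rows = warehouse.execute(
    "SELECT email, plan, amount FROM purchases"
).fetchall()

# Method 2: the warehouse does the aggregation; only summary rows leave.
aggregated_rows = warehouse.execute(
    "SELECT plan, COUNT(*), SUM(amount) FROM purchases GROUP BY plan ORDER BY plan"
).fetchall()


def contains_email(rows):
    """Crude check for whether any value in a result looks like an email address."""
    return any(
        isinstance(value, str) and "@" in value for row in rows for value in row
    )
```

In this toy case the raw result carries every email address, while the aggregated result carries only plan names, counts, and totals. Aggregation is not a guarantee of anonymity (very small groups can still identify people), so treat this as a way to frame the question for a vendor rather than a compliance test.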

This is a foundational architecture decision by the vendor, and not one they will be able to change for you. Often a vendor will tout this as an important feature–bring your data into our servers, and we can provide valuable data modeling to make your team more efficient (etc, etc). However, in the modern cloud data architecture, this is not a must-have feature. Tools like dbt are much more suited for this transformation, allowing your BI tool to focus on presenting the data, not transforming it.

A second question to ask about the location of your data is where the BI vendor’s servers are located. Most will be running on infrastructure provided by one of the big three cloud infrastructure providers (AWS, Azure, GCP), but not always. Each of these providers has regions globally. Depending on your industry, you may be required to guard against your data being “exported,” simply meaning that it cannot be transmitted or stored outside of your country. This leads to a line of questioning with the BI tool vendor about where their servers are located, and how they protect against data accidentally flowing through networks it should not.

Areas of exposure

When using a SaaS BI product, three technical vectors are the main consideration. When evaluating a tool, focusing on these three areas of security will be of most benefit: 

  1. The HTTP connection. This is the network opening that allows the user to connect from the browser. Do they use TLS/SSL for all connections? 
  2. The database connection. This is the network opening that allows the SaaS tool to connect to your data warehouse (which also could be on the cloud, or might be an on-premises database)
  3. Embedding. In an embedded visualization situation, this allows your application to embed dashboards from the vendor tool, and is important to review
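For the first item, a quick check is possible with Python's standard library alone. The sketch below is one possible way to check it and does not replace a security review. The endpoint URLs are hypothetical placeholders, and `uses_https` and `negotiated_tls_version` are helper names invented here. The second function needs network access, so it is defined but not called.

```python
import socket
import ssl
from urllib.parse import urlparse


def uses_https(url):
    """Return True when the URL's scheme is https."""
    return urlparse(url).scheme == "https"


def negotiated_tls_version(host, port=443, timeout=5.0):
    """Connect to host and return the negotiated protocol, e.g. 'TLSv1.3'.

    The default context verifies the certificate chain and hostname, so an
    invalid certificate raises ssl.SSLCertVerificationError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


# Hypothetical vendor endpoints collected during an evaluation.
endpoints = [
    "https://bi.example.com/login",
    "http://bi.example.com/embed/123",
]
insecure = [url for url in endpoints if not uses_https(url)]
```

Any URL that lands in `insecure` is worth raising with the vendor, especially embed URLs, which are easy to overlook. Calling `negotiated_tls_version` with the vendor's real hostname shows which protocol version the server actually negotiates.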

Information security policies

You should ask to review these, and have a technology architect or security expert familiar with your business call out any potential issues. Some of these will be technical in nature–encryption at rest, encryption in motion, etc. But the real focus is on people and policy: things like password requirements; approvals; audits and reviews; and procedures and communications in case of a breach.

Mitigating and Minimizing Risk

Attestations and risk levels

Depending on your industry, you will have various regulations around data security. Some of the more widely recognized are HIPAA and GDPR. By using vendors who have participated in attestations or audits, you defer to experts and push much of the cost of risk mitigation onto the service provider. Here is how it works: consider a situation where a cloud vendor wants to process Personally Identifiable Information (PII) on behalf of its clients. Regulations state that these procedures must be audited. If each of the vendor’s clients must pass through an audit, that could mean hundreds of audits on the vendor, and each client must pay for their own audit. An attestation allows for the vendor to be audited a single time, and the auditor provides an attestation to each of the clients. This is efficient, cost effective, and is the standard in the audit world. 

SOC 2 is widely recognized as the standard for SaaS vendors. Developed by the AICPA, it is a robust framework that ensures a minimum level of compliance around data security controls. You should ask about this, as well as any industry-specific regulations and attestations that the vendor may have in place. Most of the companies we work with are not large enterprises, so working with one of the Big 4 accounting firms does not make sense for them. Linford & Co is the leading provider of attestations for SaaS companies and is a group I trust with these kinds of needs.

An up-and-coming risk mitigation strategy is to automatically monitor the risk exposure of your vendors. One example is the risk network curated by Cyber GRX. Not every vendor you are considering will be a part of this network. But if they are, using the information to reduce your third-party risk is an easy way to gain more comfort in using a particular vendor.

Alternatives to naive cloud architecture

In some cases, a hybrid approach between on-premises and cloud is available. This is especially applicable in systems with a lot of moving components, such as a data platform. Datateer is designed for the security-conscious customer, with high levels of segregation to ensure there is no “cross-pollination” of data, and that data never leaves your control. This is not the mainstream approach, which is to have your data flow onto vendors’ servers. The mainstream approach makes things much easier for the vendor to process, but increases risk substantially.

One of the pillars in our stack is Prefect, which has pioneered this hybrid approach. This approach is more difficult for cloud-based BI tools to achieve. But as mentioned earlier in this article, if they have designed for it, they can prevent your raw data from flowing anywhere unnecessarily.

Part of this hybrid approach could mean hosting your own business intelligence tool on cloud infrastructure. This will guarantee that none of your data flows onto a cloud vendor’s systems, but it is quite a bit more maintenance. And it exposes you to the problem mentioned earlier of old, unpatched security vulnerabilities. Surprisingly, few options exist in this vein. Superset is a young but great option for internal analytics. And if you get going and realize managing your own solution is too much to take on, the project creators provide a commercially hosted option at Preset.

Insurance

This should be a no-brainer, but it is often overlooked. You and the vendor you choose should both have a Cyber Liability Policy including Data Breach Coverage. Regardless of whether you use a cloud vendor or an on-premises solution, and regardless of how good the information security policies and attestations make things seem, breaches are likely to occur. It is almost common knowledge that data breaches are a “not if, but when” situation. That doesn’t absolve any of us of our due diligence, but it certainly calls for protecting against the situation.

Summary

The benefits of using cloud-based business intelligence tools outweigh the risks. With a focus on mitigating and minimizing the risk, you can enjoy those benefits while protecting your business from the downsides.

In this article, we talked about key risks to be aware of and ways to evaluate BI tools in light of those risks. We also discussed ways to mitigate and minimize the risk of trusting a third party with your data.

Many vendors pay attention to all this and can help you understand the security posture of their products. You can also take advantage of Datateer’s free strategy sessions to talk through these risks and help make decisions.

Ultimately, the benefits will outweigh the risks for most, including you!

data strategy, data visualization

Selecting the Right Visualization Tool with Confidence


This article is part of our series Selecting the Right Visualization Tool with Confidence.


Evaluating business intelligence tools is exhausting.

First of all, there are a ton of them on the market. The high-profile acquisitions of Tableau, Looker, Chartio, and Qlik must be inspiring to entrepreneurs who want to have similar exits. The field is crowded. Even after years of discovering vendors and evaluating their products, we still discover new products regularly. 

Producing a basic tool–a UI on top of a visualization library–must be fairly easy, proven by how many are on the market right now. But building a solid product and company requires more than that. And so many nuances exist that can stop a report developer or designer in their tracks, or cause workarounds. How could you ever cover all those situations in your evaluation?

Every vendor’s message starts to sound the same, and is really some variation of how easy their tool will make the whole effort. Although 80% of the effort in data analytics goes to data engineering and modeling, some BI tool vendors will tell you all of that can somehow magically go away.

From my experience, the BI tool is the tip of the spear or the top of the iceberg. It provides the visual culmination of all the work and analysis a company has put into their data platform–the presentation layer intended to produce something clean and useful. 

Really, more attention should be paid to getting the data, analysis, and metrics right. However, because the charts and visuals are what most users will interact with, the BI tool you choose is critical.

This article is the first in a series that will share our direct experience, the experience of our customers, and contributions from the community in the Chartio migration research project.

Below we lay out the overall approach we have developed to compare apples to apples. And in the coming weeks we will have articles diving even deeper into aspects of a BI tool evaluation we have found to be important:

  • Data Security & Compliance
  • Visualization Capabilities & Dashboarding
  • Self-Service Analysis and “Data Democratization”
  • The Support Experience
  • End User Experience (including Embedding)
  • Pricing and Budgeting
  • Performance
  • The Intangibles
  • Miscellaneous and Doing Too Much

A visual scoring system

In a crowded market you need some sort of way to compare things objectively. If you have ever had the experience of buying or renting a place to live, you have probably experienced house hunting fatigue. You look at so many homes, they all start to blur together. You can’t remember whether the kitchen you liked was a part of the first home you saw, or the second. Did you really like that 2nd house, or are you just tired when you see the 8th and want to make a decision?

That is the feeling we had when trying to make sure we were evaluating all the options. 

To combat this, many people make a list of features or other aspects of their evaluation, and then create a scoring or rating system. We started that way, too. But numbers in a spreadsheet only go so far.

Another thing we learned was this decision was not a one-time event. Each time we learned something new about any given tool, we found ourselves revisiting the spreadsheet to re-evaluate. We needed something that we could return to time and again, without wasting time rehashing things already discussed or re-orienting to a bunch of numbers.

We are evaluating a visual tool, so why not do something visual? By giving each feature two ratings–importance and score–we were able to create a simple visual experience. 

Here is a sample.

At a glance, we can:

  • see the overall value of one tool against a field of competitors. 
  • focus in on a single feature and see how each tool compares. 
  • immediately identify any missing critical features or holes in our analysis. 

Check out our Product Evaluation Matrix. Feel free to make a copy to get your own analysis started.
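If you want to see the arithmetic behind a matrix like this, here is a minimal sketch. It is not the formula used in our spreadsheet. The features, the 1 to 3 importance scale, the 0 to 5 score scale, and the tool names are all assumptions made up for the example.

```python
# Hypothetical features, weights, and scores; replace with your own evaluation.
# importance: 1 (nice to have) to 3 (critical); score: 0 (missing) to 5 (excellent).
IMPORTANCE = {"embedding": 3, "sql_access": 2, "scheduled_reports": 1}
CRITICAL = 3

SCORES = {
    "Tool A": {"embedding": 5, "sql_access": 3, "scheduled_reports": 2},
    "Tool B": {"embedding": 0, "sql_access": 5, "scheduled_reports": 5},
    "Tool C": {"embedding": 4, "sql_access": 4},  # scheduled_reports not yet scored
}


def weighted_total(scores):
    """Sum of importance x score over the features that have been scored."""
    return sum(IMPORTANCE[feature] * score for feature, score in scores.items())


def missing_critical(scores):
    """Critical features the tool scored zero on."""
    return sorted(
        feature
        for feature, weight in IMPORTANCE.items()
        if weight == CRITICAL and scores.get(feature) == 0
    )


def unscored(scores):
    """Features with no score yet, i.e. holes in the analysis."""
    return sorted(feature for feature in IMPORTANCE if feature not in scores)


report = {
    tool: {
        "total": weighted_total(scores),
        "missing_critical": missing_critical(scores),
        "unscored": unscored(scores),
    }
    for tool, scores in SCORES.items()
}
```

With these made-up numbers, Tool A totals 23, Tool B totals 15 and is flagged for a missing critical feature, and Tool C totals 20 with one feature still unscored. Those are the same three views listed above: overall value, a per-feature comparison, and the holes.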

Not everything can be critical

Regardless of how you score or evaluate, each vendor is trying with all their might to be different from the others. So we end up with many features and variations of features, many of which are appealing.

But not everything can be of critical importance to your company. In a perfect world, you could enumerate the features you want, and someone would give you an order form and a price tag, and you are off to the races. But in an imperfect world like ours, you have to choose a tool that most closely matches your needs and desires.

It is the tendency of all us humans to overdo it and assume too many things are critical. At the extreme, if everything is equally weighted in your decision, you will end up with the most average product on the market, rather than the one that fits you best.

Recognize emotion and play the game

Buying decisions are emotional. Most of the decision is subconscious. According to the best research we have, emotion is what really drives purchasing behavior and decision making in general. Experienced salespeople know this, hence the saying, “sell the sizzle, not the steak.”

95% of thought, emotion, and learning occur in the unconscious mind–that is, without our awareness

Gerald Zaltman, How Customers Think

We use objectivity and logic to talk ourselves into things. Recognize this about yourself and the dynamics of the evaluation and decision process. 

Immediately after a good demo, you will have an affinity for the product that might stick. This is why every product company is willing to spend a lot of time demonstrating their product. Or, if the sales engineer had a bad day, it might cause you to see the product in a less favorable light.

To solve this problem:

  • Understand the sales process, and roll with it: first they qualify your company, budget, and timing; they push a demo; they provide statistics or other sales collateral based on your specific concerns; they give a free trial; they push for a decision; you negotiate terms and sign the 12- or 36-month contract. If you aren’t ready to move to the next stage, tell them. If they understand where they stand in your evaluation, often they will offer things to help you move forward, such as a longer trial period.
  • Eliminate early. If you limit your critical criteria and push those items in the early-stage calls, you can avoid spending time in product demos or trials for products that do not fit.
  • Pace yourself. Recognize that your deadline should drive the number of products that progress to your later stages. Give yourself time to evaluate, but set a deadline to force decision and action.
  • Use the evaluation spreadsheet. The visualization is intended to jar us visually–green good, red bad–and wake us up if we are being sucked in emotionally to a tool that objectively doesn’t hold up.
  • Review regularly. Have a regular review of your criteria and ratings with a decision committee. At least involve a trusted advisor or just a second set of eyes so that you can see things clearly and objectively. 

What happens if you make a bad decision?

I highlight this point specifically because we made the wrong choice prior to discovering Chartio, and had to live with the consequences. We also found that a completely exhaustive review of every single aspect of every single product that might just be the right one is totally impossible. Maybe you have that kind of time, and if so then use it. But if you are like us, you need to make a decision on somewhat incomplete information.

To mitigate the risk of a bad decision:

  • Extended trials. If you have the time to invest in deeper proof-of-concept exercises, many vendors will extend their trial period if they know they are in the running in a small field of competitors. Remember the sales stages and play the game.
  • Avoid doing too much in the tool. It is easy to forget that these feature-rich tools should be focused on presenting data. Doing too much data modeling or manipulation in the BI tool creates lock-in. Keeping these other responsibilities in your data warehouse prevents you from becoming overly dependent on the presentation tool. Focus on your critical criteria, and be selective about what makes something critical.
  • Remember the implementation cost. Costs of standing up a new BI tool are usually about the same as the cost of the tool itself. So plan to spend 2x the price tag before you see any return on investment from the tool.
  • Have backup plans. If the tool does not deliver as advertised, what then? Often in business there are less optimal ways to accomplish something that you can use as a plan B. For example, we often embed custom visualizations. Our plan B was modifying our custom visualization server to overcome some limitations of a product we chose. Our costs increased, but it wasn’t the end of the world.
data strategy

Three ways to monetize data — but you only need one

The article by Barbara Wixom and Jeanne Ross was important to my learning about data strategy. They identified the three ways to monetize data.

When someone realizes they are sitting on data that could be an asset, it is like an explosion of ideas and possibilities. I love seeing an executive go through this transformation of thought. They see literally dozens of opportunities to create value using the data their company owns or has access to use.

The three ways to monetize data are:

Sell it

If you are gathering a lot of data specific to your niche, there are probably people who would like to analyze it. The stock exchanges are a recognizable example. They have transactional information about buys, sells, options, etc. A single transaction is quite boring, but in aggregate they become powerful. This is proven by just how commonplace reporting on financial news is, and how ubiquitous stock price charts are.

Your niche is not as globally interesting as that, but it is probably even more interesting to your customers, partners, and industry analysts.

This is a natural place for brainstorming to go, and an idea that comes up almost every time. It seems so easy on the surface. Unfortunately, it is the least accessible. It requires a completely different business model than the one you are already in–including different buyers, different sales processes, and different support and delivery mechanisms.

Internal efficiency

Managers who have accurate metrics of their operations and a convenient way to evaluate what is actually happening on the ground make much better decisions than those who do not. They know what should be happening, and having data to indicate what is actually happening empowers them to manage effectively. Without that data, they manage to the squeaky wheel, by what they can directly observe, or by gut feel.

It’s not hard to see how this could make things more efficient and reduce waste and costs. This is also the most familiar territory, and business intelligence and its host of products and vendors have made this commonplace.

This is a great place to start. The downside is that it isn’t directly monetizing data. Its ROI is usually in incremental improvements to processes already in place. So there is a limit to the cost savings you could expect to see–only so much juice you can squeeze from that orange.

Customer-facing analytics

To me, this is the most exciting and most accessible way to monetize data. Companies that get this right have reported being able to close more deals, grow key accounts, and improve retention.

Customer-facing analytics can come in many forms. Providing a dashboard within your web app or web site is nice and straightforward. This can inform the customer about the use of your services, or even benchmark their use against the rest of your customers in aggregate.
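As an illustration of the benchmarking idea, the sketch below compares one customer's usage with everyone else's in aggregate. The account names, the usage numbers, and the `benchmark` helper are invented for the example. A real dashboard would pull these from your warehouse.

```python
from statistics import median

# Hypothetical monthly usage per customer account (for example, API calls).
usage = {"acme": 120, "globex": 300, "initech": 80, "umbrella": 200, "hooli": 150}


def benchmark(customer, usage):
    """Compare one customer with all other customers in aggregate."""
    own = usage[customer]
    others = [value for name, value in usage.items() if name != customer]
    below = sum(1 for value in others if value < own)
    return {
        "usage": own,
        "peer_median": median(others),
        "percent_of_peers_below": round(100 * below / len(others)),
    }


acme_benchmark = benchmark("acme", usage)
```

The result contains only the customer's own number and aggregate figures about peers (a median and a percentage), never another customer's name or individual usage. That is the property to preserve when you expose benchmarks to customers.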

You could produce simple reports for sales teams that enable them to have more meaningful conversations with their buyers and even get access to strategic roles in their target accounts.

One thing I caution against is immediately charging a fee for your first foray into customer-facing analytics. Although this is easy to stick in a spreadsheet and show how much money you’ll make, customers rarely respond by pulling out their credit card for a feature they didn’t ask for. A better strategy is to differentiate your core offerings and use that differentiation in the sales process.

I like that these three categories cover pretty much any monetization strategy you can come up with. You know your customers and you know your market. Each of these monetization strategies has been proven to create value. If you want to impact top-line revenue, start by adding customer-facing analytics into your core product. If you want to focus more on efficiency, start with internal analytics.
