You could talk for hours about what makes companies data-driven. Fortunately, just as a single picture is worth a thousand words, a good analogy can make the point quickly. A useful analogy in this case is that of a cyborg.
A cyborg - a fictional or hypothetical person whose physical abilities are extended beyond normal human limitations by mechanical elements built into the body.
source: Oxford Languages
A cyborg uses technology to gain capabilities that exceed human limitations. Modern companies use digital processes with the same goal in mind, so similar thinking can be applied here.
Naturally, people and computers are best suited to performing different types of activities. We are good at applying common sense, strategic thinking and creativity. But we are no match for computers when it comes to executing multiple repetitive tasks, handling large amounts of data and processing thousands of operations per second.
One way of thinking about data-driven companies is that they efficiently use technology, by which we mean measurement and data, to enhance human capabilities. For example, instead of running marketing campaigns based on gut feeling, they collect unbiased data about historical promotions, measure their impact on sales results and make decisions with more predictable results.
Making sure that technology is used where it makes the most sense allows these companies to:
These improvements allow companies to be more efficient, having a positive effect on reducing costs and increasing revenues. But achieving this state requires at least a few prerequisites.
If you are thinking about making your company more data-driven, four enablers will help you along the way: a roadmap of data initiatives, modern data technology, analytical use cases and data literacy.
Because building more data-driven processes is impossible without a plan, we will start by explaining how to build the roadmap of data initiatives.
Before you jump into the process of implementing tools, delivering use cases and upskilling your employees, it’s usually a good idea to take a step back and assess your situation. The goal of this step is to understand where your company currently is and where you want to take it. Such an exercise helps you to identify the biggest data opportunities and areas for improvement. Armed with this knowledge, you are in the best position to define your goals for the following months that are both ambitious and realistic to achieve.
To define a roadmap you can follow this three-step process:
We have described the end-to-end process of building a data roadmap in the following blog posts: “Data-driven fast-track” and “Is my company data-driven?”. You can also watch the recording of our webinar “Data-driven fast-track: introduction to data-drivennes”.
The first enabler helps you to make sure that you have a plan for your data-driven journey. The second one will give you a strong boost throughout the entire process.
Thanks to rich cloud offerings (e.g. GCP, AWS, Azure) and open source technologies, advanced data solutions are more available than ever. Today it is possible to move from little or no data infrastructure to a complete, modern data stack in just a few months (see: “How we built a Modern Data Platform in 4 months”). A modern data stack will help you address all of your core data needs (from ingestion to reporting and machine learning). It will also increase your ability to use data for making most decisions in your company. This is made possible through three key pillars of the modern data platform concept:
One of the key pillars of modern data infrastructure is minimizing the friction between your users and the data through simplicity. That’s why modern data platforms embrace the role of SQL. SQL is probably the only tool common to all data users (engineers, analysts, data scientists, and even some business users) that is also relatively easy to learn.
The spirit of empowerment with technology allows for reimagining the role of analysts and creating analytics engineers. They are enabled to work autonomously (without the support of data engineers) not only on the reporting, but on most data tasks, from ingestion all the way up. Having a team of self-sufficient analytics engineers, relying primarily on SQL and other low-code tools, helps to:
Despite relying on an interface that is simple for the users, modern data technologies don't compromise on enforcing the best engineering practices in your data teams. With tools like dbt and great_expectations, analytics engineers need minimal effort to test the data, build the documentation and use version control. Finally, modern tools also allow data governance to become a first-class citizen instead of an afterthought, by offering pragmatic ways of addressing key challenges in this area.
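To make the idea of testing the data more concrete, here is a minimal sketch in plain Python. It does not use the dbt or great_expectations APIs. It only mirrors the spirit of two common checks (a column has no missing values, a column has no duplicates). The helper names, the `CheckResult` type and the sample `orders` rows are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    failing_rows: int


def check_not_null(rows, column):
    """Count rows where `column` is missing or None."""
    failing = sum(1 for row in rows if row.get(column) is None)
    return CheckResult(f"not_null:{column}", failing == 0, failing)


def check_unique(rows, column):
    """Count rows whose `column` value already appeared in an earlier row."""
    seen = set()
    failing = 0
    for row in rows:
        value = row.get(column)
        if value in seen:
            failing += 1
        seen.add(value)
    return CheckResult(f"unique:{column}", failing == 0, failing)


# Hypothetical sample data, invented for illustration only.
orders = [
    {"order_id": 1, "customer_id": "a", "amount": 120.0},
    {"order_id": 2, "customer_id": "b", "amount": 75.5},
    {"order_id": 2, "customer_id": None, "amount": 10.0},
]

results = [
    check_not_null(orders, "customer_id"),
    check_unique(orders, "order_id"),
]

for result in results:
    status = "PASS" if result.passed else "FAIL"
    print(f"{status} {result.name} (failing rows: {result.failing_rows})")
```

In practice you would declare such checks in the tool of your choice rather than hand-write them. The point of the sketch is only that each test is a small, repeatable rule that can live in version control next to the transformation it protects.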
Companies that are further down the data-driven path will also benefit from tools designed to efficiently prototype and deploy Machine Learning models. Through modern data technology, they get access to a flexible experimentation environment. Deployment can also turn from a separate, significant effort for every model, into a repeatable, seamless experience.
Finally, modern data technology brings the data closer to its users than ever. It's not only that analytics engineers can be more self-sufficient, but also that business users can gain the capability to more freely explore data on their own. Tools like Looker, or Looker Studio (formerly known as Google Data Studio) embrace the concept of self-service. They allow non-technical users to interact with the data through an intuitive, drag and drop interface. This fosters data democratization throughout the company by delivering the data to the fingertips of the end users, making it readily available.
Technology can be a game changer for all data processes, noticeable for all of your data users (from engineers to business stakeholders). But even the best tools by themselves are not enough to deliver business value.
To notice tangible results from your investment, you need to use data to solve relevant business problems. Building a list of analytical use cases can help you to narrow your focus and enable prioritization for delivery. When we refer to analytical use cases, we have in mind particularly data initiatives focused on delivering business value.
One example of such an initiative, especially relevant for these current times of high uncertainty, can be a churn model. Acquiring new customers is usually more expensive than keeping existing ones. Predicting which customers may want to stop using your products allows you to proactively encourage them to stay (e.g. through special offers and discounts). Such an initiative can help you protect the existing revenue stream.
Another example, this time oriented at increasing sales, would be using data to build a comprehensive understanding of your customers. A rich customer dataset can be used in multiple ways, such as identifying segments of clients with similar interests in order to recommend products that are more relevant to them.
When developing a roadmap, it may seem difficult to know where to start. Fortunately, there is a systematic way of figuring this out. It’s inspired by how movie studios test different ideas for stories without having to create an entire movie (which would be extremely expensive). It relies on the notion of divergent and convergent thinking, which was later brought to business through design thinking.
In the phase of divergent thinking you try to come up with as many ideas for use cases as possible. You write down all of them, even if they seem a bit crazy. At this stage you can simply ask your team members about their ideas, or boost their creativity by organizing workshops.
During these workshops you can help participants to look at your company from different perspectives, by sketching flows of your key processes, or trying to build a “pseudo” mathematical formula to break down your revenue. By zooming into the individual components of your company, you will help your team to look at the problem from different angles and generate a plethora of ideas. Eventually you come to the point where a new problem arises. Let’s say that you have written down fifty ideas. How do you select the best of them?
One observation that can help you is that with so many ideas, it doesn’t really matter if a given idea is 49th, or 50th in terms of attractiveness. You can proceed with the first phase of prioritization by grouping your ideas into three categories:
Now that you know which ideas are the most promising, you can prepare a more detailed evaluation of them, looking at criteria such as expected impact, effort, risk and time required for delivery. This will help you select the use cases with the best trade-off between value, risk and resources.
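One simple way to carry out such an evaluation is a scoring sheet. The sketch below is only an illustration. The 1 to 5 scales, the weights in `score` and every number assigned to the candidates are assumptions made up for this example, and the "Dynamic pricing engine" use case is hypothetical. Adjust all of them to your own context.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int  # expected business impact, 1 (low) to 5 (high)
    effort: int  # delivery effort, 1 (low) to 5 (high)
    risk: int    # delivery risk, 1 (low) to 5 (high)
    months: int  # estimated time required for delivery


def score(use_case, max_months=12):
    """Higher is better: impact counts for, effort, risk and time count against.

    The weights are arbitrary assumptions, not a recommended standard.
    """
    # Scale delivery time to the same 0-5 range as the other criteria.
    time_penalty = 5 * min(use_case.months, max_months) / max_months
    return 2 * use_case.impact - use_case.effort - use_case.risk - time_penalty


# Hypothetical estimates, invented for illustration only.
candidates = [
    UseCase("Churn model", impact=5, effort=3, risk=2, months=3),
    UseCase("Customer segmentation", impact=4, effort=2, risk=1, months=2),
    UseCase("Dynamic pricing engine", impact=5, effort=5, risk=5, months=12),
]

ranked = sorted(candidates, key=score, reverse=True)

for use_case in ranked:
    print(f"{score(use_case):6.2f}  {use_case.name}")
```

The exact formula matters less than applying the same criteria to every idea, so that the discussion shifts from opinions to explicit, comparable estimates.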
Ideas from the high-risk, high-reward category require a different approach. Because there is a lot of uncertainty around them, you want to reduce the associated risk. You can achieve this by building simple, end to end prototypes. In the best case scenario, this will allow you to verify the viability of a given solution. If not, you will at least learn something more about the problem.
The last thing to take into account is that the whole effort would be in vain if your users couldn’t use or understand your products. The next enabler allows you to prevent this from happening.
Through modern data technology and analytical use cases, you can ensure broad access to data across the entire company. However, this isn't enough to make sure that your business stakeholders will be able to use your data products. To empower data democratization, you need to make data skills widely available at your company. This doesn’t mean that everyone has to become a statistician. It means that most employees should build at least basic data literacy skills.
data literacy - individual’s ability to read, understand, and utilize data in different ways
source: Harvard Business School
Data literacy is about acquiring data skills to enable all people (not only data teams) to use data for making better decisions. These skills are:
One way of building these skills is through education. Another way to further promote using them is by building habits of using data on a daily basis. You can achieve this by setting up simple but effective processes. One example can be setting up regular (e.g. weekly) sessions to discuss current performance using metrics. This helps to build an understanding of the causal relationships between your actions and their impact on your business.
Another example can be referring to data when making important decisions, for example by asking follow up questions, like:
By asking such questions on a regular basis, leaders can help decision makers get used to reaching for data when solving important business problems.
As you can see, being a data-literate company does not mean that everyone at the company knows how to code. In the end, common sense enriched with data already gives you an edge over common sense on its own. Both are important for coping in complex times such as ours.
The times we are living in are nothing like the many relatively peaceful years we were used to. In the face of a global recession, it would be prudent to focus on generating the highest possible output with limited resources. In short, the paradigm of doing more with less is extremely relevant. That is one of the reasons why we decided to design a pragmatic process to help any company take the first steps on their data-driven journey. The goal is to build your capabilities as quickly as possible and start harnessing the value of your data.
If you are interested in finding out what this means for your company in practice, you can start by filling in a data-driven survey by following this link. After doing so, you will receive a summary report with recommendations for your next steps from one of our experts. We would also be happy to help you understand these findings by organizing a knowledge-sharing session with one of our experts.