5 main data-related trends to be covered at Big Data Tech Warsaw 2021. Part I.
A year is long enough for new trends to emerge and for technologies to gain traction. The Big Data landscape changes ever faster thanks to innovation, competition, and the use of technologies that have now become critical to almost all companies on this planet. Let’s read about the 5 current trends that will be described in detail by selected presentations at the upcoming edition of Big Data Tech Warsaw 2021 (February 25-26th).
MLOps becomes mainstream
ML/AI becomes ubiquitous in our daily life
Data Quality and Data Observability become easier
Larger clouds over the Big Data landscape
Best practices for managing Big Data teams and projects emerge
Trend 1. MLOps becomes mainstream
The challenge of building machine learning systems, especially scalable ones, was described by Google in a 2015 research paper ("Hidden Technical Debt in Machine Learning Systems"). At that time, many companies were already in the process of creating large-scale ML systems. Significantly, however, few had a dedicated platform or tools that would support the end-to-end life-cycle of their ML models and the daily work of their ML teams.
Presentations on Machine Learning Operations (MLOps) at the Big Data Technology Warsaw Summit
Last year we had a number of very interesting MLOps-related presentations at BDTWS 2020, given by speakers from companies such as Spotify, Disney+, and Synerise. They were part of last year's Data Science & ML track.
Interest in the topic is growing constantly, so this year at BDTWS we have prepared a special track called MLOps. Below are examples of Machine Learning Operations presentations that can be seen at the Big Data Technology Warsaw Summit 2021:
Keven (Qi) Wang will talk about the MLOps journey at H&M on the public cloud. In his speech he will present their entire MLOps stack, which has been adopted by multiple product teams managing 100s of models across the entire H&M value chain. It enables data scientists to develop models in a highly interactive environment and enables engineers to manage large-scale model training and model serving pipelines with full traceability.
Maciej Pieńkosz from Sotrender, a company whose main task is to analyze huge amounts of data coming from social media, will talk about their ML use-cases and the GCP components they use (e.g. AI Platform Notebooks, AI Platform Training, Cloud Run, GitLab CI/CD). His presentation will cover the full lifecycle of the ML model - from experimentation, through deployment and training, to model monitoring.
It's hard to operate in the IT industry (especially within Big Data projects implemented on open-source technologies) and not know the Australian company called Atlassian. Jiamei Du will talk about how her company uses A/B experiments to build better products. Part of her story will focus on the MLOps tools and infrastructure that make their A/B experiments as efficient as possible.
One cannot fail to mention the members of GetInData, who will present their experiences in building portable and reusable ML platforms in various environments (cloud, hybrid, on-premise) using a mix of open-source and cloud-based technologies for various customers. They will share their experience and best practices that come from multiple production implementations.
NoMagic robots improve iteratively and continuously thanks to the software 2.0 improvement cycle supported by an in-house data engine. Watch this short video below to see what type of robots they teach using ML/AI.
These are only a few highlights; you will definitely learn more about Machine Learning Operations at Big Data Tech Warsaw 2021.
Trend 2. ML/AI becomes ubiquitous in our daily life
Adopting Machine Learning, Data Science, and AI algorithms and techniques has always required a lot of work, skills, and time. When done successfully, however, it brings excellent results. One of the favorite examples to mention is Discover Weekly, implemented by Spotify, the world-renowned Swedish company. Below, you can see slides created by my ex-colleagues at Spotify. On those slides, they describe how Discover Weekly came to be, highlighting technical challenges, data-driven development, and the ML models used to power their recommendations engine. It was a complex process that was not done overnight. Integrating all the necessary (open-source) technologies, building a scalable architecture, implementing smart algorithms, and monitoring it all was undoubtedly a big undertaking, at least five years ago.
Today, building dedicated ML platforms and using MLOps toolkits can significantly increase a company's productivity. Very often, companies also switch to the public cloud, which lets them take advantage of ready-to-use libraries and hardware and, as a consequence, makes their job easier. As a result, they can experiment with, train, and deploy new models faster and more cheaply.
Clearly, more and more ML models appear in our daily life these days.
Machine Learning/Artificial Intelligence-related presentations you can watch at Big Data Tech Warsaw
During the BDTWS 2021 conference, you can count on many presentations that (a) describe use-cases, algorithms, and techniques which show how Machine Learning and Artificial Intelligence solve real-world business problems and (b) share their lessons learned from working with ML, Data Science, and advanced analytics. Let’s highlight a few interesting examples:
Mikio Braun (ex-Zalando) will talk about the lessons he learned from building large-scale production recommender systems. Among other things, he will explain how to bridge the gap from the raw mathematical models and algorithms to robust and scalable software systems. The talk will explore the union of theory and practice.
Boxun Zhang (ex-Spotify, currently at Unity) will talk about similar issues in his presentation, although he will focus on the aspect related to real-time and large-scale Machine Learning systems. Boxun will also share several generalizable lessons that make ML systems performant from an ML perspective and scalable from an engineering perspective.
It's also hard not to mention GetInData members, who will present their experiences from a year-long journey of developing the big data analytics platform of Kcell (a large Kazakh telecom) and building data-driven solutions on top of it that help to reduce costs, improve the quality of services, and understand users' needs better.
Machine Learning is often used for prediction, forecasting, and anomaly detection. At BDTWS 2021 we will be able to hear the story of a near real-time ML model built by Ericsson. It is used for predicting telecom system degradation and outages based on historical fault & performance data. This model helps the operations team to conduct proactive monitoring, thanks to which the number of hours that support engineers spend on solving issues has dropped by 30 to 40%. It also improved the UX in pre-paid calls and increased customer retention. Peltarion (a Swedish company that specializes in AI) will describe their state-of-the-art weather forecasting AI service. Sotrender (a Polish company that analyses data from social media) will explain how they use ML to predict and monitor the effectiveness of campaigns conducted on the Facebook platform.
At Big Data Technology Warsaw Summit 2021 there will also be presentations on the use of data, science and technology to generate insights for search and recommendation systems in an e-commerce platform (Etsy), to build content personalization systems in e-commerce (eBay), run A/B experiments for growth (Atlassian), analyze geophysical data from ground-penetrating radars using deep-learning techniques (SGPR.TECH), and more.
Trend 3. Data Quality and Data Observability become easier
For data-driven companies, data quality and observability have always been important, even back when tools like Hadoop and Hive were first open-sourced. At the same time, they were always problematic due to the lack of simple-to-use and feature-rich technologies (especially open-source ones). For this reason, many companies haven’t addressed these problems correctly.
Recently, however, the status quo has changed, and new tools have emerged that make data quality and data observability significantly easier. These include Apache Atlas, Amundsen from Lyft, Dataportal from Airbnb (see a picture below), DataHub from LinkedIn, Data Catalog from Google, and Deequ from Amazon, to name a few. These tools are often integrated with one another: check how Amundsen can work together with Feast for machine learning discovery or with Atlas for data discovery.
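To make the idea of a data quality check more concrete, here is a minimal sketch in plain Python. It does not use the API of any of the tools named above; the `Check` class, the `run_checks` function, and the sample rows are all hypothetical and only illustrate the kind of completeness and range constraints that such tools let you declare and monitor.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Check:
    """A hypothetical data quality rule applied to one column."""
    name: str
    column: str
    predicate: Callable[[Any], bool]


def run_checks(rows: Iterable[Dict[str, Any]], checks: List[Check]) -> Dict[str, float]:
    """Return, for each check, the fraction of rows that pass it."""
    rows = list(rows)
    report = {}
    for check in checks:
        passed = sum(1 for row in rows if check.predicate(row.get(check.column)))
        # An empty data set trivially passes every check.
        report[check.name] = passed / len(rows) if rows else 1.0
    return report


# Made-up sample data: one missing country and one negative amount.
rows = [
    {"user_id": 1, "country": "PL", "amount": 10.0},
    {"user_id": 2, "country": None, "amount": 5.5},
    {"user_id": 3, "country": "SE", "amount": -1.0},
    {"user_id": 4, "country": "PL", "amount": 7.25},
]

checks = [
    Check("country_is_complete", "country", lambda v: v is not None),
    Check("amount_is_non_negative", "amount", lambda v: v is not None and v >= 0),
]

report = run_checks(rows, checks)
for name, ratio in report.items():
    print(f"{name}: {ratio:.0%} of rows pass")
```

In a real pipeline, a report like this would typically be computed on every run and tracked over time, so that a sudden drop in a pass ratio can be flagged as an anomaly.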
Data Quality and Data Observability at BDTWS 2021
There will be a presentation on a new open-source technology called Marquez that can be used for data lineage and observability. This new tool can help to understand how data flows through a company’s systems. This makes it possible to show the dependencies between the individual teams that consume and produce data, and makes it easier to audit data pipelines.
While ensuring data quality is important even for small data sets, Criteo representatives will tell how they addressed data quality challenges on their 120+ PB data lake and thousands of jobs. Their journey began two years ago, and they will now share with us the data and thoughts they have collected. The picture below shows data lake anomaly detection at Criteo.
The presentation from OLX will concern a pragmatic approach to data quality. It will focus on a review of existing frameworks and approaches to data quality. Besides this, it will cover the principles behind adapting these approaches and designing data quality systems at OLX.
That’s not all, as there will also be presentations about building testable data pipelines at Target and about a tool called Diftong from Klarna for validating big data workflows.
These are the first three trends in Big Data that will be strongly represented in presentations at the BDTWS 2021 conference. But that's not all: go to the next post to learn about the remaining two trends and the presentations related to them!
21 January 2021