Centrifuge: Data quality in Spark without the costs!

Jonathan Winandy

Abstract

Data quality is a growing concern in Big Data, as more and more bugs stem from poor-quality data. However, data quality efforts tend to come second, and often too late. In this talk, we will apply algebraic abstraction to the composition of data pipelines, resulting in inlined, unified and performant data quality checks. We will see how these techniques can be used to find different classes of bugs in pipelines and make “same-day delivery” possible in production-critical projects.

Bio

Jonathan is a passionate Data Engineer, herding those big cats that sip at “Datalakes” at dawn. He cofounded a couple of companies related to Data (in healthcare/BI) and contributes to the Scala community in France.
