List: Data Engineering / Pipelines | Curated by David Bender

Mar 7, 2023

40 stories

Data Engineering / Pipelines
In TDS Archive by Cassie Kozyrkov
The Obscure Art of Data Design
Battling an embarrassing new alchemy for the digital era
Nov 22, 2022
16
In TDS Archive by Madison Schott
What is dbt?
Your guide to analytics engineering and the tool that created it
Jul 13, 2021
5
In The Prefect Blog by Khuyen Tran
Orchestrate Your Data Science Project with Prefect 2.0
Make Your Data Science Pipeline Resilient Against Failures
Jun 29, 2022
1
In Tinyclues Vision by Mike Aidane
4 Design Principles for Robust Data Pipelines
Design Principals for traditional Software Engineering quickly fail when working with large and diverse sets of data — a new way of…
Mar 11, 2022
2
In Dev Genius by Ashish MJ
Kafka with Python
This article aims to outline the core concepts of Apache Kafka and write simple producer and consumer programs using python.
Jan 23, 2022
2
In TDS Archive by Emma Rizzi
Deploying Prefect Server with AWS ECS and Docker Storage
How to orchestrate and automate workflows with Prefect running on ECS Fargate with a private Docker registry
Aug 24, 2021
1
In Dev Genius by Haq Nawaz
Python ETL Pipeline: The Incremental data load Techniques
The incremental data load approach in ETL (Extract, Transform and Load) is the ideal design pattern. In this process, we identify and…
Mar 25, 2022
6
Oladokun Joseph
A step-by-step guide to building a simple data pipeline with Airplane
There are many layers to data engineering but the most important job of a data engineer is to build data pipelines that will make quality…
Mar 27, 2022
1
In Better Programming by Samhita Alla
5 Open-Source Tools That Can Help You Build ML Pipelines With Ease
All production-friendly
Feb 11, 2022
1
In SeattleDataGuy By SeattleDataGuy by Ben Rogojan
Starburst Data Raised $100M — But What Is It?
Looking At A Company Trying To Make Presto(now Trino, easier)
Aug 28, 2021
1
In ITNEXT by Tobias Wissmueller
Event-Driven Architectures with Kafka and Python
Everything You Need to Get Started
Oct 22, 2021
2
In TDS Archive by James Briggs
SQL on The Cloud With Python
A straightforward guide to SQL on Google Cloud and Python
Sep 4, 2020
7
In Elucidata by Sahil Rai
How to Build Highly Effective ETL Pipelines
A quick look at building inexpensive yet scalable ETL pipelines
Feb 21, 2022
4
Kestra
Introducing Kestra, infinitely scalable open source orchestration and scheduling platform.
Today, our team is proud to announce a first public release of Kestra, an open-source platform to orchestrate & schedule any kinds of…
Feb 2, 2022
11
In Geek Culture by Madison Schott
An Analytic Engineer’s Honest Review of Airbyte
How to ingest Mailchimp data into Snowflake
Jan 14, 2022
2
Anna Geller
How to Use Prefect and Monte Carlo to Achieve More Reliable Data Pipelines
Introducing Monte Carlo data lineage tasks in Prefect
Feb 15, 2022
2
In TDS Archive by Krasnov Vitaliy
Building Python Microservices with Apache Kafka: All Gain, No Pain
Engineers often use Apache Kafka in their everyday work. The major tasks that Kafka performs are: read messages, process messages, write…
Nov 10, 2021
3
In cisco-fpie by Mirko Raca
Great (data) expectations — automatic data quality validation
When was the last time you spoke to your data?
Feb 11, 2022
2
In Geek Culture by Andrea Capuano
Airflow for non-batch, non-scheduled workloads
On December 2020 Apache released Airflow 2.0, introducing a lot of new interesting changes. The one that I find more appealing is the 17x…
Jun 29, 2021
Anna Geller
How to Make Your Data Pipelines More Dynamic Using Parameters in Prefect
How to pass runtime-specific parameter values to your data pipelines
Jan 25, 2022
