Ten Pillars of Data Engineering

Aaron Ginder
11 min read · Jan 3, 2023

Due to the overwhelming success of “Top 5 Things You Need to Know as a Data Engineer” (which is worth checking out if you haven’t already), here are the 10 pillars that make a successful data engineering team.

10 Pillars of Data Engineering. Aaron Ginder (2021)

1. Re-usable Frameworks

https://content.altexsoft.com/media/2020/08/big-data-frameworks-classified-by-data-analysis-ty.png

Frameworks are the foundation of any effective data engineering system and come in many forms. By definition, a framework should be applicable across the whole data engineering practice. It can be open source, such as the pre-existing Apache Beam templates, or an in-house pattern, such as customised Flex Template Beam pipelines.

At what point should someone create a bespoke framework rather than use an existing one? Well, it depends. Weigh the value of a customised framework against the cost of building and maintaining it. If the framework costs more than it returns, why bother going to the effort of creating and maintaining it?
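To make that trade-off concrete, here is a rough sketch of the sum in Python. The function name, its parameters and every number in it are made up for illustration; real estimates would come from your own team.

```python
def framework_net_value(build_days, upkeep_days_per_year,
                        days_saved_per_pipeline, pipelines_per_year, years):
    """Return engineer-days saved minus engineer-days spent (illustrative only)."""
    # Cost: one-off build effort plus ongoing maintenance.
    total_cost = build_days + upkeep_days_per_year * years
    # Value: time saved on every pipeline that reuses the framework.
    total_saving = days_saved_per_pipeline * pipelines_per_year * years
    return total_saving - total_cost


# Hypothetical team shipping 3 pipelines a year: the framework never pays back.
few_pipelines = framework_net_value(30, 10, 2, 3, years=2)

# Same framework, 20 pipelines a year: the investment comes out ahead.
many_pipelines = framework_net_value(30, 10, 2, 20, years=2)
```

With these made-up inputs the first case comes out negative and the second positive, which is the whole point: the same framework can be a poor or a good investment depending on how often it is reused.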

Common frameworks include:

  • Programmatic libraries: building on top of open-source libraries such as Dask DataFrames, or writing custom Airflow operators
  • Data pipeline templates: Apache Beam templates, common Airflow DAGs
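As a minimal sketch of what a re-usable pipeline template could look like, the example below fixes a read, transform, write shape once and lets each pipeline plug in its own steps. It is plain Python rather than the Apache Beam or Airflow API, and `PipelineTemplate`, `lowercase_keys`, `add_source` and the sample records are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Record = dict


@dataclass(frozen=True)
class PipelineTemplate:
    """Hypothetical in-house template with a fixed read -> transform -> write shape."""
    name: str
    read: Callable[[], Iterable[Record]]
    transforms: List[Callable[[Record], Record]]
    write: Callable[[List[Record]], None]

    def run(self) -> int:
        rows = []
        for record in self.read():
            # Apply the shared transform steps in order.
            for step in self.transforms:
                record = step(record)
            rows.append(record)
        self.write(rows)
        return len(rows)


def lowercase_keys(record: Record) -> Record:
    """Reusable step: normalise column names."""
    return {key.lower(): value for key, value in record.items()}


def add_source(source: str) -> Callable[[Record], Record]:
    """Reusable step factory: tag each record with where it came from."""
    def _step(record: Record) -> Record:
        return {**record, "source": source}
    return _step


# One concrete pipeline built from the template (in-memory stand-ins for I/O).
sink: List[Record] = []
orders = PipelineTemplate(
    name="orders",
    read=lambda: [{"ID": 1, "Amount": 9.5}, {"ID": 2, "Amount": 3.0}],
    transforms=[lowercase_keys, add_source("orders_api")],
    write=sink.extend,
)
rows_written = orders.run()
```

A second pipeline would reuse `lowercase_keys` and `add_source` unchanged and only swap the `read` and `write` callables, which is where a template like this starts to earn its keep.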


Aaron Ginder

An enthusiastic technologist looking to share my passion for cloud computing, programming and software engineering.