Why your company should start using dbt… with an example
What is dbt?
Created by Fishtown Analytics, data build tool (dbt) is one of the most talked-about new tools in data engineering. Its success comes from how simple it makes building a data pipeline using standard SQL and YAML configuration. dbt works out the pipeline's DAG for you, so you never have to manage that complexity yourself (amazing, I know!).
Many offerings from cloud providers make it easy to ingest data. dbt focuses on the transform step of ETL, providing a command line interface (or a cloud environment) for dynamically building your SQL pipelines.
More info about dbt and concepts can be found here.
Why use dbt?
There are many reasons you may want to choose dbt as your orchestration tool…
- Easy to use: anyone with a SQL background (analytics developers, data engineers and data scientists) can build pipelines
- Built-in Jinja functionality to allow dynamic templating of SQL scripts with functions (either installed as a package or defined yourself) at runtime
- Automatically generated documentation that can be deployed on a web server for all to access
- You no longer need to define your task dependencies — dbt handles this for you
- A growing community, with a large forum for asking & answering questions
- dbt can integrate easily with cloud providers, including GCP, AWS & Azure
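To make the "dbt handles dependencies for you" point concrete, here is a minimal Python sketch of the idea: scan each model's SQL for `{{ ref('...') }}` calls and derive a run order from them. This is only an illustration, not how dbt is implemented internally (dbt renders the Jinja properly rather than using a regex), and the model names and SQL below are hypothetical.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models (name -> SQL text); these are illustrative, not from a real project.
MODELS = {
    "stg_population": "select * from raw.population",
    "stg_gdp": "select * from raw.gdp",
    "gdp_per_person": (
        "select g.country, g.year, g.gdp / p.population as gdp_per_person "
        "from {{ ref('stg_gdp') }} g "
        "join {{ ref('stg_population') }} p using (country, year)"
    ),
}

# Simplified matcher for {{ ref('model_name') }}; an assumption made for this sketch.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def find_refs(sql):
    """Return the set of model names referenced via ref() in a SQL string."""
    return set(REF_PATTERN.findall(sql))


def build_order(models):
    """Map each model to its upstream models, then sort so upstream models run first."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
print(order)  # the two staging models come first, in either order
```

The point of the sketch is that nobody wrote "run `gdp_per_person` after the staging models" anywhere: the order falls out of the `ref()` calls inside the SQL itself.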
Example dbt Pipeline on GCP
If you are as excited as I am about dbt, keep reading. I will go through how to create a high-level SQL data pipeline which can be deployed to your cloud provider.
To get you started, here is a list of useful dbt commands…
Running the Country GDP Growth Pipeline…
Rather than walking through each step in text, I have created a short video showing how to execute the GDP growth pipeline. The purpose of this pipeline is to combine population statistics with GDP per country over time and calculate the average GDP per person for a given year.
I must admit, it’s not the most insightful data pipeline, but it is a simple way to illustrate why you should use dbt!
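If you want a feel for the calculation without watching the video, here is a rough Python sketch of the core transformation, using an in-memory SQLite database to run the kind of SQL a dbt model might contain. The table names, column names and numbers are all made up for illustration, and it reads "GDP per person" as GDP divided by population for each country and year; the actual pipeline may differ.

```python
import sqlite3


def gdp_per_person(population_rows, gdp_rows):
    """Join GDP and population on (country, year) and divide GDP by population."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical source tables; names and columns are assumptions for this sketch.
    conn.execute("create table population (country text, year integer, population real)")
    conn.execute("create table gdp (country text, year integer, gdp real)")
    conn.executemany("insert into population values (?, ?, ?)", population_rows)
    conn.executemany("insert into gdp values (?, ?, ?)", gdp_rows)

    # In dbt this select would live in a model file, with ref() calls instead of table names.
    query = """
        select g.country, g.year, g.gdp / p.population as gdp_per_person
        from gdp as g
        join population as p
          on p.country = g.country and p.year = g.year
        order by g.country, g.year
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


# Made-up toy numbers, chosen only so the arithmetic is easy to follow.
population = [("A", 2020, 100.0), ("B", 2020, 50.0)]
gdp = [("A", 2020, 1000.0), ("B", 2020, 2000.0)]

result = gdp_per_person(population, gdp)
print(result)  # [('A', 2020, 10.0), ('B', 2020, 40.0)]
```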
Here is the DAG describing what the pipeline does…
Feel free to reach out with any questions!