After learning DBT some years ago, I've constantly wondered why we aren't unit-testing our DBT models.
So after some research, I decided to write an article about how to properly unit-test DBT and Postgres.
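A minimal sketch of the general idea behind unit-testing a SQL model, not necessarily the approach from the article: load a small fixture, run the model's SQL against it, and compare with the expected rows. It assumes Python's built-in `sqlite3` as a stand-in for Postgres, and the `orders` table and model SQL are hypothetical.

```python
import sqlite3

# Hypothetical model: in a real dbt project this would be the compiled SQL of a
# model, with refs resolved to concrete table names. Here it reads `orders`.
MODEL_SQL = """
select customer_id, sum(amount) as total_amount
from orders
where status = 'completed'
group by customer_id
order by customer_id
"""


def run_model_on_fixture(model_sql, fixture_rows):
    """Load fixture rows into an in-memory database and return the model's output."""
    conn = sqlite3.connect(":memory:")  # stand-in for a throwaway Postgres schema
    try:
        conn.execute("create table orders (customer_id integer, amount real, status text)")
        conn.executemany("insert into orders values (?, ?, ?)", fixture_rows)
        return conn.execute(model_sql).fetchall()
    finally:
        conn.close()


def test_completed_orders_are_summed_per_customer():
    fixture = [
        (1, 10.0, "completed"),
        (1, 5.0, "completed"),
        (1, 99.0, "cancelled"),  # must be excluded by the model's filter
        (2, 7.5, "completed"),
    ]
    expected = [(1, 15.0), (2, 7.5)]
    assert run_model_on_fixture(MODEL_SQL, fixture) == expected


if __name__ == "__main__":
    test_completed_orders_are_summed_per_customer()
    print("ok")
```

Against Postgres, the same pattern would create the fixture tables in a disposable schema before running the compiled model.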
https://t.co/ncsHNOuBfW
#dbt #analytics #analyticsengineering
@Jra_tech @thebmbennett @pdrmnvd Define complexity.
The solution is just Airflow in K8s with CI/CD pipelines.
I believe a lot of companies already use K8s as part of their infrastructure, so it's just reusing some stuff in a really simple way, since the Airflow Helm chart is quite simple to deploy and integrate with.
@thebmbennett Happy to inspire you :) I hope the article helped your team achieve that success.
Btw, I'd be interested in hearing about your transition experience :)
Did you include any other tools?
@fred_irodrigues @code @ApacheAirflow We run all of them, but it should be possible to extract from the merge diff which models were updated.
We do something similar to run SQLFluff only on the updated models.
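A sketch of extracting the updated models from a merge diff, assuming a standard layout where models live under `models/` and each model's name is its `.sql` file stem. The helper names are hypothetical; `git diff --name-only`, `dbt run --select` and `sqlfluff lint` are the real commands being assembled.

```python
import subprocess
from pathlib import PurePosixPath


def git_changed_files(base, head="HEAD"):
    """List files added, copied, modified or renamed between two git revisions."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", base, head],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def changed_model_files(changed_files, models_dir="models"):
    """Keep only .sql files located under the dbt models directory."""
    root = PurePosixPath(models_dir).parts
    selected = set()
    for name in changed_files:
        path = PurePosixPath(name)
        if path.suffix == ".sql" and path.parts[:len(root)] == root:
            selected.add(name)
    return sorted(selected)


def build_commands(model_files):
    """Build a dbt run and an SQLFluff lint command limited to the changed models."""
    if not model_files:
        return []
    model_names = [PurePosixPath(f).stem for f in model_files]  # dbt model name = file stem
    return [
        ["dbt", "run", "--select", *model_names],
        ["sqlfluff", "lint", *model_files],
    ]
```

In CI this could be fed with something like `git_changed_files("HEAD~1")` on a merge commit. Changes to macros or YAML files are ignored here; dbt's own `state:modified` selector (with `--state` pointing at production artifacts) is an alternative that also accounts for those.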
We just published an article about how we successfully moved away from DBT Cloud to DBT Core + @code + @ApacheAirflow.
https://t.co/ovdrxEDABF
Some context on what triggered our move, from my previous tweet (https://t.co/cOkjDVZlCM)
So, after @getdbt announced it was doubling its subscription price from $50/seat to $100/seat, we decided to move everything to DBT Core + @ApacheAirflow + VS Code (@code).
DBT is an excellent open-source project supported by both the community and the company, but this move is abusive.
@anna__geller @startdataeng It could be because of Airflow, yes. But also because I prefer to keep only orchestration logic in the DAGs and all the transformation logic in its domain repos/Docker images. That way Airflow stays agnostic about which libraries or dark magic you're using in the tasks.
@anna__geller @startdataeng I'd think more in terms of a setup where orchestration and the job to be done are completely decoupled. I try as much as possible to run K8sPodOperators instead of PythonOperators.
@AndyRitting @floydophone @getdbt @ApacheAirflow @code @dagster Why only the changed models? Are you defining all your DBT models as views?
Otherwise, you'll only be updating the new models and not refreshing the existing ones with the new incoming data.
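A small sketch of the reasoning above: a view reads fresh data at query time, so it only needs re-running when its SQL changes, while tables and incremental models must be rebuilt on every run to pick up new rows. The `materializations` mapping is hypothetical; in practice it could be read from dbt's `manifest.json`.

```python
def models_to_refresh(materializations, changed):
    """Return the models that must run after a merge.

    materializations: hypothetical mapping of model name -> dbt materialization
    ("view", "table", "incremental", "ephemeral", ...).
    changed: names of models whose SQL changed in the merge.
    """
    # Tables and incremental models hold stored data that goes stale without a run.
    # Views query their sources live; ephemeral models are inlined as CTEs.
    stale_data = {
        name for name, mat in materializations.items()
        if mat not in ("view", "ephemeral")
    }
    # Any changed model (view or not) also has to be re-deployed.
    return sorted(stale_data | (set(changed) & set(materializations)))
```

So running "only the changed models" is equivalent to a full refresh only when every unchanged model is a view or ephemeral.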
You can always use DBT Core locally and then just one seat in DBT Cloud as the orchestrator.
@fred_irodrigues @getdbt @ApacheAirflow @code Our company is already on GitLab, so we use GitLab Runners in our K8s cluster.
Also, I have to admit that GitHub was much better at CI/CD and Slack integrations than GitLab is right now. But that could give us material for another thread 🧵
@rahulj51 @getdbt @ApacheAirflow @code The Airflow DAG is scheduled to run daily. However, when we merge code to the main branch, we want to speed things up and run all the new code straight away. So we trigger the DAG ourselves, making sure Prod matches main without waiting for the next scheduled run.
@rahulj51 @getdbt @ApacheAirflow @code GitLab CI/CD builds a Docker image with all the code, then calls the Airflow API to trigger a DAG whose K8sPodOperator executes the required DBT command.
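A minimal sketch of the "call the Airflow API" step, assuming Airflow 2's stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`) with the basic-auth backend enabled. The DAG id, credentials and the `dbt_command` conf key are hypothetical; the DAG would have to read that key from `dag_run.conf` and pass it to the pod.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_trigger_request(base_url, dag_id, conf, username, password):
    """Build a POST request asking Airflow 2's stable REST API to create a DAG run."""
    dag = urllib.parse.quote(dag_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def trigger_dag(request, timeout=30):
    """Send the request and return the created DAG run as a dict.

    Raises urllib.error.HTTPError on a non-2xx response, which fails the CI job.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A CI job would run something like `trigger_dag(build_trigger_request(AIRFLOW_URL, "dbt_daily", {"dbt_command": "dbt run"}, user, password))` after the image is pushed, with the URL and credentials taken from CI variables. Newer Airflow major versions expose a different API path and auth scheme, so this is tied to 2.x.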
@rahulj51 @getdbt @ApacheAirflow @code Nothing; after switching to VS Code with the DBT plugins, we had everything covered. We also publish the DBT catalog to S3 so we can browse it and share it with the rest of the team.
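A sketch of publishing the catalog, assuming it is the static docs site that `dbt docs generate` writes to `target/` (`index.html`, `manifest.json`, `catalog.json`). The uploader is injected and is expected to behave like boto3's `S3.Client.upload_file(Filename, Bucket, Key, ExtraArgs=...)`; the function name, bucket and prefix are hypothetical.

```python
import mimetypes
from pathlib import Path

# Files produced by `dbt docs generate`; index.html loads the two JSON files.
DOCS_FILES = ("index.html", "manifest.json", "catalog.json")


def publish_dbt_docs(target_dir, bucket, prefix, upload_file):
    """Upload the generated dbt docs site and return the S3 keys written."""
    target = Path(target_dir)
    clean_prefix = prefix.strip("/")
    keys = []
    for name in DOCS_FILES:
        path = target / name
        if not path.is_file():
            raise FileNotFoundError(f"{path} not found; run `dbt docs generate` first")
        key = f"{clean_prefix}/{name}" if clean_prefix else name
        # Set Content-Type so browsers render index.html instead of downloading it.
        content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
        upload_file(str(path), bucket, key, ExtraArgs={"ContentType": content_type})
        keys.append(key)
    return keys
```

With boto3 installed this could be called as `publish_dbt_docs("target", "my-docs-bucket", "dbt-docs", boto3.client("s3").upload_file)`; injecting the uploader keeps the function testable without AWS credentials.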