Describe the feature
We are looking to import/link jobs from multiple projects/environments into a single jobs.yml file without defining duplicates. We hoped that the --templated-fields flag on the import-jobs command would recognize jobs with duplicate names across the projects/environments passed to the command, but it still creates 3 entries with 3 unique linked IDs, even when the rest of the job definition is identical. It would be great if this could be supported.
Here is some information on how our current project structure looks:
jaffle_shop_dev
- project_id: xxx
- environment: dev
jaffle_shop_qa
- project_id: yyy
- environment: qa
jaffle_shop_prod
- project_id: zzz
- environment: prod
While there are 3 separate dbt projects, they are all linked to 1 repository that developers work out of and are inherently linked by being a "jaffle_shop" project. Historically in our dbt instance, developers would create a job via the dbt Cloud UI in the dev project, and then replicate the job into their qa project and prod project once deployment and promotion required it. So for example, "job1" would get created in dev, "job1" would be created in qa after testing in dev, then "job1" would finally be created in prod.
Currently, when running dbt-jobs-as-code import-jobs --account-id 000 --project-id xxx --environment-id 111 --project-id yyy --environment-id 222 --project-id zzz --environment-id 333 --include-linked-id >> jobs.yml, we get output like this:
# jobs.yml
import_1:
  linked_id: 000001
  account_id: 000
  project_id: xxx
  environment_id: 111
  dbt_version:
  name: job1
  settings:
    threads: 4
    target_name: default
  execution:
    timeout_seconds: 0
  deferring_job_definition_id:
  deferring_environment_id:
  run_generate_sources: false
  execute_steps:
    - dbt build --selector mymodel
  generate_docs: false
  schedule:
    cron: 9 */12 * * 0,1,2,3,4,5,6
  triggers:
    github_webhook: false
    git_provider_webhook: false
    schedule: false
    on_merge: false
  description: ''
  run_compare_changes: false
  compare_changes_flags: --select state:modified
  job_type: other
  triggers_on_draft_pr: false
  job_completion_trigger_condition:
  custom_environment_variables: []
import_2:
  linked_id: 000002
  account_id: 000
  project_id: yyy
  environment_id: 222
  dbt_version:
  name: job1
  settings:
    threads: 4
    target_name: default
  execution:
    timeout_seconds: 0
  deferring_job_definition_id:
  deferring_environment_id:
  run_generate_sources: false
  execute_steps:
    - dbt build --selector mymodel
  generate_docs: false
  schedule:
    cron: 9 */12 * * 0,1,2,3,4,5,6
  triggers:
    github_webhook: false
    git_provider_webhook: false
    schedule: false
    on_merge: false
  description: ''
  run_compare_changes: false
  compare_changes_flags: --select state:modified
  job_type: other
  triggers_on_draft_pr: false
  job_completion_trigger_condition:
  custom_environment_variables: []
import_3:
  linked_id: 000003
  account_id: 000
  project_id: zzz
  environment_id: 333
  dbt_version:
  name: job1
  settings:
    threads: 4
    target_name: default
  execution:
    timeout_seconds: 0
  deferring_job_definition_id:
  deferring_environment_id:
  run_generate_sources: false
  execute_steps:
    - dbt build --selector mymodel
  generate_docs: false
  schedule:
    cron: 9 */12 * * 0,1,2,3,4,5,6
  triggers:
    github_webhook: false
    git_provider_webhook: false
    schedule: false
    on_merge: false
  description: ''
  run_compare_changes: false
  compare_changes_flags: --select state:modified
  job_type: other
  triggers_on_draft_pr: false
  job_completion_trigger_condition:
  custom_environment_variables: []
Even with the --templated-fields flag, import-jobs simply replaces the hardcoded project_id and environment_id in each job with {{ project_id }} and {{ environment_id }} respectively, and still produces 3 job definitions.
If using import-jobs with --templated-fields, what we would ideally like to see would be:
# vars_dev.yml
project_id: xxx
environment_id: 111
# vars_qa.yml
project_id: yyy
environment_id: 222
# vars_prod.yml
project_id: zzz
environment_id: 333
# templated_fields.yml
project_id: "{{ project_id }}"
environment_id: "{{ environment_id }}"
Running dbt-jobs-as-code import-jobs --account-id 000 --project-id xxx --environment-id 111 --project-id yyy --environment-id 222 --project-id zzz --environment-id 333 --include-linked-id >> jobs.yml would then output the following:
# jobs.yml
import_1:
  linked_id:
    - 000001
    - 000002
    - 000003
  account_id: 000
  project_id: {{ project_id }}
  environment_id: {{ environment_id }}
  dbt_version:
  name: job1
  settings:
    threads: 4
    target_name: default
  execution:
    timeout_seconds: 0
  deferring_job_definition_id:
  deferring_environment_id:
  run_generate_sources: false
  execute_steps:
    - dbt build --selector mymodel
  generate_docs: false
  schedule:
    cron: 9 */12 * * 0,1,2,3,4,5,6
  triggers:
    github_webhook: false
    git_provider_webhook: false
    schedule: false
    on_merge: false
  description: ''
  run_compare_changes: false
  compare_changes_flags: --select state:modified
  job_type: other
  triggers_on_draft_pr: false
  job_completion_trigger_condition:
  custom_environment_variables: []
Having a list of linked_id values is just a proposed idea. I'm not sure if this is currently possible, but I wanted to illustrate how the jobs file could be simplified while still keeping a unique linked ID for each imported job.
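To make the proposed deduplication concrete, here is a rough sketch in Python of how identical jobs could be collapsed after import. This is not how dbt-jobs-as-code works today: `merge_duplicate_jobs` and `ENV_SPECIFIC_FIELDS` are hypothetical names, the jobs are assumed to be already parsed into plain dicts, and the sample data is trimmed to a few fields for brevity.

```python
import json

# Assumption: these are the only fields allowed to differ between
# otherwise-identical jobs in different projects/environments.
ENV_SPECIFIC_FIELDS = ("linked_id", "project_id", "environment_id")


def merge_duplicate_jobs(jobs, templated_fields):
    """Collapse jobs that are identical apart from ENV_SPECIFIC_FIELDS.

    jobs: dict of identifier -> job definition (dict), e.g. parsed from jobs.yml
    templated_fields: dict of field name -> template string
    Hypothetical helper, not part of dbt-jobs-as-code.
    """
    merged = {}  # fingerprint of shared fields -> (identifier, merged job)
    for identifier, job in jobs.items():
        shared = {k: v for k, v in job.items() if k not in ENV_SPECIFIC_FIELDS}
        fingerprint = json.dumps(shared, sort_keys=True)
        if fingerprint not in merged:
            # The first job seen keeps its identifier; IDs become a list.
            merged[fingerprint] = (
                identifier,
                {"linked_id": [], **templated_fields, **shared},
            )
        merged[fingerprint][1]["linked_id"].append(job["linked_id"])
    return {identifier: job for identifier, job in merged.values()}


# Trimmed-down version of the three imported jobs shown earlier.
steps = ["dbt build --selector mymodel"]
imported = {
    "import_1": {"linked_id": 1, "account_id": 0, "project_id": "xxx",
                 "environment_id": 111, "name": "job1", "execute_steps": steps},
    "import_2": {"linked_id": 2, "account_id": 0, "project_id": "yyy",
                 "environment_id": 222, "name": "job1", "execute_steps": steps},
    "import_3": {"linked_id": 3, "account_id": 0, "project_id": "zzz",
                 "environment_id": 333, "name": "job1", "execute_steps": steps},
}
templated = {
    "project_id": "{{ project_id }}",
    "environment_id": "{{ environment_id }}",
}
result = merge_duplicate_jobs(imported, templated)
```

Here `result` contains a single `import_1` entry whose `linked_id` is a list of all three IDs. Comparing the full remaining definition rather than just the job name means two jobs that share a name but differ in steps or schedule would stay separate, which seems like the safer default.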
Describe alternatives you've considered
We've considered creating separate import files for each environment, with project_id/environment_id hardcoded in each file, plus a single jobs.yml for jobs created in code going forward, which would use the vars files for templated field injection. This feels somewhat clunky, since jobs would then be managed in different files depending on whether or not they were imported. The file structure would look something like this:
dbt/
  jobs/
    import/
      dev_jobs.yml
      qa_jobs.yml
      prd_jobs.yml
    jobs.yml
Who will this benefit?
dbt users that manage multiple projects in one repository, or dbt instances that use multiple projects with a single environment (instead of 1 project with multiple environments).
Are you interested in contributing this feature?
Sure, happy to help. I would just need to know whether this can be supported and what the best approach would be before beginning.