[Feature] Reduce duplicate job definitions when using import-jobs/link across multiple projects and environments #155

@justbldwn

Describe the feature
We want to import/link jobs from multiple projects/environments into a single jobs.yml file without duplicate definitions. We hoped that the --templated-fields flag on the import-jobs command would recognize jobs that share a name across the projects/environments passed to the command. Instead, it still creates one entry per imported job (3 in our case), each with its own linked ID, even when the rest of the job definition is identical. It would be great if this could be supported.

Here is some information on how our current project structure looks:

jaffle_shop_dev

  • project_id: xxx
  • environment: dev
    • environment_id: 111

jaffle_shop_qa

  • project_id: yyy
  • environment: qa
    • environment_id: 222

jaffle_shop_prod

  • project_id: zzz
  • environment: prod
    • environment_id: 333

While there are 3 separate dbt projects, they are all linked to 1 repository that developers work out of and are inherently linked by being a "jaffle_shop" project. Historically in our dbt instance, developers would create a job via the dbt Cloud UI in the dev project, and then replicate the job into their qa project and prod project once deployment and promotion required it. So for example, "job1" would get created in dev, "job1" would be created in qa after testing in dev, then "job1" would finally be created in prod.

Currently, when running dbt-jobs-as-code import-jobs --account-id 000 --project-id xxx --environment-id 111 --project-id yyy --environment-id 222 --project-id zzz --environment-id 333 --include-linked-id >> jobs.yml, we get output like this:

# jobs.yml
  import_1:
    linked_id: 000001
    account_id: 000
    project_id: xxx
    environment_id: 111
    dbt_version:
    name: job1
    settings:
      threads: 4
      target_name: default
    execution:
      timeout_seconds: 0
    deferring_job_definition_id:
    deferring_environment_id:
    run_generate_sources: false
    execute_steps:
      - dbt build --selector mymodel
    generate_docs: false
    schedule:
      cron: 9 */12 * * 0,1,2,3,4,5,6
    triggers:
      github_webhook: false
      git_provider_webhook: false
      schedule: false
      on_merge: false
    description: ''
    run_compare_changes: false
    compare_changes_flags: --select state:modified
    job_type: other
    triggers_on_draft_pr: false
    job_completion_trigger_condition:
    custom_environment_variables: []
  import_2:
    linked_id: 000002
    account_id: 000
    project_id: yyy
    environment_id: 222
    dbt_version:
    name: job1
    settings:
      threads: 4
      target_name: default
    execution:
      timeout_seconds: 0
    deferring_job_definition_id:
    deferring_environment_id:
    run_generate_sources: false
    execute_steps:
      - dbt build --selector mymodel
    generate_docs: false
    schedule:
      cron: 9 */12 * * 0,1,2,3,4,5,6
    triggers:
      github_webhook: false
      git_provider_webhook: false
      schedule: false
      on_merge: false
    description: ''
    run_compare_changes: false
    compare_changes_flags: --select state:modified
    job_type: other
    triggers_on_draft_pr: false
    job_completion_trigger_condition:
    custom_environment_variables: []
  import_3:
    linked_id: 000003
    account_id: 000
    project_id: zzz
    environment_id: 333
    dbt_version:
    name: job1
    settings:
      threads: 4
      target_name: default
    execution:
      timeout_seconds: 0
    deferring_job_definition_id:
    deferring_environment_id:
    run_generate_sources: false
    execute_steps:
      - dbt build --selector mymodel
    generate_docs: false
    schedule:
      cron: 9 */12 * * 0,1,2,3,4,5,6
    triggers:
      github_webhook: false
      git_provider_webhook: false
      schedule: false
      on_merge: false
    description: ''
    run_compare_changes: false
    compare_changes_flags: --select state:modified
    job_type: other
    triggers_on_draft_pr: false
    job_completion_trigger_condition:
    custom_environment_variables: []

Even with the --templated-fields flag, import-jobs simply replaces the hardcoded project_id and environment_id in each job with {{ project_id }} and {{ environment_id }} respectively, and still produces 3 job definitions.
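To make the duplication concrete, here is a minimal Python sketch that reports which top-level keys actually differ between the imported definitions. `differing_keys` is a hypothetical helper written for this issue, not part of dbt-jobs-as-code, and the job dictionaries are trimmed versions of the output above.

```python
# Hypothetical helper for illustration only; not part of dbt-jobs-as-code.
def differing_keys(definitions):
    """Return the sorted top-level keys whose values differ between definitions."""
    all_keys = set().union(*(d.keys() for d in definitions))
    return sorted(
        key
        for key in all_keys
        if any(d.get(key) != definitions[0].get(key) for d in definitions[1:])
    )


# Fields shared by all three imported jobs (trimmed from the jobs.yml above).
shared = {
    "name": "job1",
    "settings": {"threads": 4, "target_name": "default"},
    "execute_steps": ["dbt build --selector mymodel"],
}

imported = [
    {**shared, "linked_id": "000001", "project_id": "xxx", "environment_id": 111},
    {**shared, "linked_id": "000002", "project_id": "yyy", "environment_id": 222},
    {**shared, "linked_id": "000003", "project_id": "zzz", "environment_id": 333},
]

# Same jobs after project_id and environment_id are replaced by templates.
templated = [
    {**job, "project_id": "{{ project_id }}", "environment_id": "{{ environment_id }}"}
    for job in imported
]

print(differing_keys(imported))   # ['environment_id', 'linked_id', 'project_id']
print(differing_keys(templated))  # ['linked_id']
```

Once the two fields are templated, linked_id is the only thing keeping the three definitions apart.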

If using import-jobs with --templated-fields, we would ideally like to see the following setup:

# vars_dev.yml
project_id: xxx
environment_id: 111
# vars_qa.yml
project_id: yyy
environment_id: 222
# vars_prod.yml
project_id: zzz
environment_id: 333
# templated_fields.yml
project_id: "{{ project_id }}"
environment_id: "{{ environment_id }}"
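As a rough illustration of how one set of templated fields could resolve differently per environment, here is a small Python sketch using plain regex substitution. It is a stand-in for illustration only and does not reflect how dbt-jobs-as-code renders templates internally; `render` and `vars_by_env` are hypothetical names.

```python
import re

# Matches placeholders such as "{{ project_id }}".
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(value, variables):
    """Replace every {{ name }} placeholder in a string with variables[name]."""
    return PLACEHOLDER.sub(lambda match: str(variables[match.group(1)]), value)


# Contents of templated_fields.yml above.
templated_fields = {
    "project_id": "{{ project_id }}",
    "environment_id": "{{ environment_id }}",
}

# Contents of vars_dev.yml, vars_qa.yml and vars_prod.yml above.
vars_by_env = {
    "dev": {"project_id": "xxx", "environment_id": 111},
    "qa": {"project_id": "yyy", "environment_id": 222},
    "prod": {"project_id": "zzz", "environment_id": 333},
}

rendered = {
    env: {field: render(template, variables) for field, template in templated_fields.items()}
    for env, variables in vars_by_env.items()
}

print(rendered["qa"])  # {'project_id': 'yyy', 'environment_id': '222'}
```

Note that this simple substitution returns strings, so numeric IDs come back as text.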

Running dbt-jobs-as-code import-jobs --account-id 000 --project-id xxx --environment-id 111 --project-id yyy --environment-id 222 --project-id zzz --environment-id 333 --include-linked-id >> jobs.yml would then output the following:

# jobs.yml
  import_1:
    linked_id:
    - 000001
    - 000002
    - 000003
    account_id: 000
    project_id: {{ project_id }}
    environment_id: {{ environment_id }}
    dbt_version:
    name: job1
    settings:
      threads: 4
      target_name: default
    execution:
      timeout_seconds: 0
    deferring_job_definition_id:
    deferring_environment_id:
    run_generate_sources: false
    execute_steps:
      - dbt build --selector mymodel
    generate_docs: false
    schedule:
      cron: 9 */12 * * 0,1,2,3,4,5,6
    triggers:
      github_webhook: false
      git_provider_webhook: false
      schedule: false
      on_merge: false
    description: ''
    run_compare_changes: false
    compare_changes_flags: --select state:modified
    job_type: other
    triggers_on_draft_pr: false
    job_completion_trigger_condition:
    custom_environment_variables: []

Having a list of linked_id values is just a proposed idea, and I'm not sure whether it is currently possible. I only wanted to illustrate how the jobs file could be simplified while still keeping a unique linked ID for each imported job.
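The merge proposed above could be sketched roughly as follows. This is only an illustration of the idea, assuming jobs count as duplicates when every field except linked_id, project_id and environment_id matches; `merge_duplicates` and `ENV_FIELDS` are hypothetical names, not existing dbt-jobs-as-code code.

```python
import json

# Fields expected to differ between environments (an assumption of this sketch).
ENV_FIELDS = ("linked_id", "project_id", "environment_id")


def merge_duplicates(definitions):
    """Merge job definitions that match on everything except ENV_FIELDS.

    Returns one definition per group, with linked_id collected into a list
    and project_id/environment_id replaced by template placeholders.
    """
    groups = {}
    for job in definitions:
        rest = {key: value for key, value in job.items() if key not in ENV_FIELDS}
        # A canonical JSON dump works as a fingerprint for nested settings.
        fingerprint = json.dumps(rest, sort_keys=True)
        merged = groups.setdefault(
            fingerprint,
            {
                "linked_id": [],
                "project_id": "{{ project_id }}",
                "environment_id": "{{ environment_id }}",
                **rest,
            },
        )
        merged["linked_id"].append(job["linked_id"])
    return list(groups.values())


job1 = {"name": "job1", "execute_steps": ["dbt build --selector mymodel"]}
job2 = {"name": "job2", "execute_steps": ["dbt test"]}

imported_jobs = [
    {**job1, "linked_id": "000001", "project_id": "xxx", "environment_id": 111},
    {**job1, "linked_id": "000002", "project_id": "yyy", "environment_id": 222},
    {**job1, "linked_id": "000003", "project_id": "zzz", "environment_id": 333},
    {**job2, "linked_id": "000004", "project_id": "xxx", "environment_id": 111},
]

merged_jobs = merge_duplicates(imported_jobs)
print(len(merged_jobs))             # 2
print(merged_jobs[0]["linked_id"])  # ['000001', '000002', '000003']
```

One open question this sketch does not answer: a flat list does not record which linked ID belongs to which environment, so a mapping keyed by environment might be needed instead.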

Describe alternatives you've considered
We've considered creating a separate import file for each environment, with the project_id/environment_id hardcoded in each file, plus a single jobs.yml for any jobs created in code going forward, which would use the vars files for templated field injection. This feels somewhat clunky, since jobs would then be managed in different files depending on whether they were imported or not. The file structure would look something like this:

dbt/
  jobs/
    import/
      dev_jobs.yml
      qa_jobs.yml
      prd_jobs.yml
    jobs.yml

Who will this benefit?
dbt users that manage multiple projects in one repository, or dbt instances that use multiple projects with a single environment (instead of 1 project with multiple environments).

Are you interested in contributing this feature?
Sure, happy to help. Before beginning, I would just need to know whether this is something that can be supported and what the best approach for supporting it would be.

Labels: triage (Issue needs to be triaged by the maintainer team.)
