[Feature] Reduce duplicate job definitions when using import-jobs/link across multiple projects and environments

**Describe the feature**
We would like to import/link jobs from multiple projects/environments into a single `jobs.yml` file without defining the same job more than once. We hoped that the `--templated-fields` flag on the `import-jobs` command would recognize jobs that share a name across the projects/environments passed to the command. Instead, it still creates 3 entries with 3 unique linked IDs, even when the rest of the job definition is identical. It would be great if this could be supported.

Here is how our projects are currently structured:

**jaffle_shop_dev**
- **project_id:** xxx
- **environment:** dev
  - **environment_id:** 111

**jaffle_shop_qa**
- **project_id:** yyy
- **environment:** qa
  - **environment_id:** 222

**jaffle_shop_prod**
- **project_id:** zzz
- **environment:** prod
  - **environment_id:** 333

Although these are 3 separate dbt projects, they all point to 1 repository that developers work out of, and they are inherently related as "jaffle_shop" projects. Historically in our dbt instance, developers would create a job via the dbt Cloud UI in the dev project, and then replicate the job in the qa project and the prod project once deployment and promotion required it. For example, "job1" would be created in dev, then in qa after testing in dev, and finally in prod.

Currently, when running `dbt-jobs-as-code import-jobs --account-id 000 --project-id xxx --environment-id 111 --project-id yyy --environment-id 222 --project-id zzz --environment-id 333 --include-linked-id >> jobs.yml`, we'll receive an output of something like this:

```yaml
# jobs.yml
  import_1:
    linked_id: 000001
    account_id: 000
    project_id: xxx
    environment_id: 111
    dbt_version:
    name: job1
    settings:
      threads: 4
      target_name: default
    execution:
      timeout_seconds: 0
    deferring_job_definition_id:
    deferring_environment_id:
    run_generate_sources: false
    execute_steps:
      - dbt build --selector mymodel
    generate_docs: false
    schedule:
      cron: 9 */12 * * 0,1,2,3,4,5,6
    triggers:
      github_webhook: false
      git_provider_webhook: false
      schedule: false
      on_merge: false
    description: ''
    run_compare_changes: false
    compare_changes_flags: --select state:modified
    job_type: other
    triggers_on_draft_pr: false
    job_completion_trigger_condition:
    custom_environment_variables: []
  import_2:
    linked_id: 000002
    account_id: 000
    project_id: yyy
    environment_id: 222
    dbt_version:
    name: job1
    settings:
      threads: 4
      target_name: default
    execution:
      timeout_seconds: 0
    deferring_job_definition_id:
    deferring_environment_id:
    run_generate_sources: false
    execute_steps:
      - dbt build --selector mymodel
    generate_docs: false
    schedule:
      cron: 9 */12 * * 0,1,2,3,4,5,6
    triggers:
      github_webhook: false
      git_provider_webhook: false
      schedule: false
      on_merge: false
    description: ''
    run_compare_changes: false
    compare_changes_flags: --select state:modified
    job_type: other
    triggers_on_draft_pr: false
    job_completion_trigger_condition:
    custom_environment_variables: []
  import_3:
    linked_id: 000003
    account_id: 000
    project_id: zzz
    environment_id: 333
    dbt_version:
    name: job1
    settings:
      threads: 4
      target_name: default
    execution:
      timeout_seconds: 0
    deferring_job_definition_id:
    deferring_environment_id:
    run_generate_sources: false
    execute_steps:
      - dbt build --selector mymodel
    generate_docs: false
    schedule:
      cron: 9 */12 * * 0,1,2,3,4,5,6
    triggers:
      github_webhook: false
      git_provider_webhook: false
      schedule: false
      on_merge: false
    description: ''
    run_compare_changes: false
    compare_changes_flags: --select state:modified
    job_type: other
    triggers_on_draft_pr: false
    job_completion_trigger_condition:
    custom_environment_variables: []
```

Even with the `--templated-fields` flag, `import-jobs` simply replaces the hardcoded `project_id` and `environment_id` in each job with `{{ project_id }}` and `{{ environment_id }}` respectively, so we still end up with 3 job definitions.
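
To make the duplication concrete, here is a minimal Python sketch that reports which keys actually differ between the three imported entries. The dictionaries are abbreviated copies of the YAML above, and `differing_keys` is a hypothetical helper written for this illustration, not part of dbt-jobs-as-code.

```python
# Abbreviated copies of the three imported entries; only a few of the
# shared fields are reproduced to keep the sketch short.
shared = {
    "account_id": "000",
    "name": "job1",
    "execute_steps": ["dbt build --selector mymodel"],
    "schedule": {"cron": "9 */12 * * 0,1,2,3,4,5,6"},
}
imported = {
    "import_1": {**shared, "linked_id": "000001", "project_id": "xxx", "environment_id": "111"},
    "import_2": {**shared, "linked_id": "000002", "project_id": "yyy", "environment_id": "222"},
    "import_3": {**shared, "linked_id": "000003", "project_id": "zzz", "environment_id": "333"},
}


def differing_keys(jobs):
    """Return the sorted keys whose values are not identical across all jobs."""
    definitions = list(jobs.values())
    all_keys = set().union(*(d.keys() for d in definitions))
    first = definitions[0]
    return sorted(
        key
        for key in all_keys
        if any(d.get(key) != first.get(key) for d in definitions[1:])
    )


print(differing_keys(imported))
```

Only `environment_id`, `linked_id`, and `project_id` vary; every other field is repeated verbatim in each entry.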

When using `import-jobs` with `--templated-fields`, this is what we would ideally like to see:
```yaml
# vars_dev.yml
project_id: xxx
environment_id: 111
```

```yaml
# vars_qa.yml
project_id: yyy
environment_id: 222
```

```yaml
# vars_prod.yml
project_id: zzz
environment_id: 333
```

```yaml
# templated_fields.yml
project_id: "{{ project_id }}"
environment_id: "{{ environment_id }}"
```

With those files in place, running `dbt-jobs-as-code import-jobs --account-id 000 --project-id xxx --environment-id 111 --project-id yyy --environment-id 222 --project-id zzz --environment-id 333 --include-linked-id >> jobs.yml` would then output the following:

```yaml
# jobs.yml
  import_1:
    linked_id:
    - 000001
    - 000002
    - 000003
    account_id: 000
    project_id: {{ project_id }}
    environment_id: {{ environment_id }}
    dbt_version:
    name: job1
    settings:
      threads: 4
      target_name: default
    execution:
      timeout_seconds: 0
    deferring_job_definition_id:
    deferring_environment_id:
    run_generate_sources: false
    execute_steps:
      - dbt build --selector mymodel
    generate_docs: false
    schedule:
      cron: 9 */12 * * 0,1,2,3,4,5,6
    triggers:
      github_webhook: false
      git_provider_webhook: false
      schedule: false
      on_merge: false
    description: ''
    run_compare_changes: false
    compare_changes_flags: --select state:modified
    job_type: other
    triggers_on_draft_pr: false
    job_completion_trigger_condition:
    custom_environment_variables: []
```

Having a list of `linked_id` values is just a proposed idea. I'm not sure whether this is currently possible; I just wanted to illustrate a way to simplify the jobs file while still keeping a unique linked ID for each imported job.
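
As a rough illustration of the merge being requested, the following Python sketch groups imported jobs whose definitions match once the environment-specific fields are set aside, and collects their linked IDs into a list. This is only a sketch under assumptions: `merge_duplicates` and `ENV_FIELDS` are hypothetical names, the job dictionaries are abbreviated, and nothing here reflects how dbt-jobs-as-code is actually implemented.

```python
import json

# Fields assumed to be environment-specific; everything else must match
# exactly for two imported jobs to count as the same definition.
ENV_FIELDS = ("project_id", "environment_id")


def merge_duplicates(jobs):
    """Collapse jobs that are identical apart from ENV_FIELDS and linked_id.

    Returns one entry per distinct definition, with the environment fields
    replaced by template placeholders and linked_id turned into a list.
    """
    merged = {}
    for job in jobs.values():
        core = {
            key: value
            for key, value in job.items()
            if key not in ENV_FIELDS and key != "linked_id"
        }
        # A stable string form of the remaining fields acts as the grouping key.
        fingerprint = json.dumps(core, sort_keys=True)
        if fingerprint not in merged:
            entry = dict(core)
            for field in ENV_FIELDS:
                entry[field] = "{{ " + field + " }}"
            entry["linked_id"] = []
            merged[fingerprint] = entry
        merged[fingerprint]["linked_id"].append(job["linked_id"])
    # Re-key the result using the import_N naming seen in the output above.
    return {f"import_{i}": entry for i, entry in enumerate(merged.values(), start=1)}


shared = {
    "account_id": "000",
    "name": "job1",
    "execute_steps": ["dbt build --selector mymodel"],
}
jobs = {
    "import_1": {**shared, "linked_id": "000001", "project_id": "xxx", "environment_id": "111"},
    "import_2": {**shared, "linked_id": "000002", "project_id": "yyy", "environment_id": "222"},
    "import_3": {**shared, "linked_id": "000003", "project_id": "zzz", "environment_id": "333"},
}

result = merge_duplicates(jobs)
print(json.dumps(result, indent=2))
```

Matching on the full definition rather than on the job name alone is a deliberate choice in this sketch: two jobs named "job1" whose steps or schedules have drifted apart between environments would stay as separate entries instead of being merged silently.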

**Describe alternatives you've considered**
We've considered creating a separate import file for each environment, each hardcoding its own `project_id`/`environment_id`, plus a single `jobs.yml` for any jobs created in code going forward, which would use the vars files for templated field injection. This feels somewhat clunky, though, because jobs would then be managed in different files depending on whether or not they were imported. The file structure would look something like this:

```
dbt/
  jobs/
    import/
      dev_jobs.yml
      qa_jobs.yml
      prd_jobs.yml
    jobs.yml   
```

**Who will this benefit?**
dbt users who manage multiple projects in one repository, or dbt instances that use multiple projects with a single environment each (instead of 1 project with multiple environments).

**Are you interested in contributing this feature?**
Sure, happy to help. Before beginning, I would just need to know whether this is something that can be supported and what the best approach for supporting it would be.

[Feature] Reduce duplicate job definitions when using import-jobs/link across multiple projects and environments #155
