
Revamp EKS Fargate documentation #20144


Open: wants to merge 5 commits into base: master


@JacksonDavenport (Contributor) commented Apr 24, 2025

What does this PR do?

TEEP-601

This PR grew a bit more than expected and is effectively a big rewrite of this doc.

  • Move many of the instructions to prerequisite steps, especially those common to the Operator, Helm, and manual setups
  • Centralize and clarify much of the contextual information at the top
  • Clarify the use of a Secret for API keys and tokens; all manual setups now use the Secret reference instead of a raw API key
  • Overhaul the recommended RBAC setup to:
    • Clarify/stress how this is used in the pods
    • Provide examples on how to configure it per namespace
    • Provide examples on how to configure it for multiple pre-existing service accounts
    • Avoid name clashes with the Helm/Operator installs (ClusterRole/ClusterRoleBinding: datadog-agent -> datadog-agent-fargate)
    • Provide a troubleshooting kubectl auth command to validate
  • General cleanup on Operator/Helm install methods, especially for samples, results, and header structure
  • General cleanup on manual install methods
  • Update integrations to Autodiscovery V2 syntax
  • Fix some inaccuracies for what is/isn't collected
  • Fix a lot of broken and orphaned links
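
As a rough illustration of the RBAC and Secret bullets above (not copied from the PR diff), the renamed cluster-wide objects and a Secret-backed API key might look like the sketch below. The rule list, the `my-app` namespace, the `my-app-sa` service account, and the `datadog-secret` / `api-key` Secret names are assumptions chosen for illustration only.

```yaml
# Sketch only: the rules and all names other than datadog-agent-fargate
# are illustrative assumptions, not the exact manifests from this PR.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # Renamed from datadog-agent to avoid clashing with Helm/Operator installs.
  name: datadog-agent-fargate
rules:
  - apiGroups: [""]
    resources: ["nodes", "namespaces", "endpoints"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["nodes/metrics", "nodes/spec", "nodes/stats", "nodes/proxy", "nodes/pods", "nodes/healthz"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-agent-fargate
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent-fargate
subjects:
  # Add one entry per service account whose pods run the Agent sidecar.
  - kind: ServiceAccount
    name: my-app-sa      # hypothetical pre-existing service account
    namespace: my-app    # hypothetical namespace
---
# Fragment of the Agent sidecar container spec: the API key is read
# from a Secret instead of being written inline.
env:
  - name: DD_API_KEY
    valueFrom:
      secretKeyRef:
        name: datadog-secret   # hypothetical Secret name
        key: api-key           # hypothetical key within that Secret
```

One way to confirm the binding applies to a given service account is a check such as `kubectl auth can-i list nodes --as=system:serviceaccount:my-app:my-app-sa`; the exact troubleshooting command the PR documents may differ.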

Motivation

An accumulation of different customer and TSE experiences with this doc. It was originally just for RBAC and keys, but turned into a full revamp.

This PR does NOT cover how APM Auto Instrumentation works with EKS Fargate. That is a whole other topic that probably deserves its own guide page.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@domalessi
Contributor

Created DOCS-10710 to track documentation team review

@cswatt added the "editorial review" label (Waiting on a more in-depth review from a docs team editor) Apr 24, 2025
```

**Note**: Don't forget to replace `<YOUR_DATADOG_API_KEY>` with the [Datadog API key from your organization][14].
**Note**: Add `DD_TAGS` to append additional space separated `<KEY>:<VALUE>` tags. The `DD_CLUSTER_NAME` environment variable will set your `kube_cluster_name` tag.
Contributor


Do we still want to mention that DD_CLUSTER_NAME only works on v7.34+? If you think this isn't common and there won't be support cases around it, then I think it's fine. I defer to your experience on this.

Contributor Author


I think this should be fine; I haven't seen anyone use anything in the 7.3X range in a while. That was released in March 2022, so everyone should hopefully be above it, especially those setting up new EKS Fargate environments.

**Notes**:

- Don't forget to replace `<YOUR_DATADOG_API_KEY>` with the [Datadog API key from your organization][14].
- Container metrics are not available in Fargate because the `cgroups` volume from the host can't be mounted into the Agent. The [Live Containers][17] view reports 0 for CPU and Memory.
Contributor


Do we no longer want to mention that traditional container metrics are not available in Fargate?

Contributor Author

@JacksonDavenport JacksonDavenport Apr 24, 2025


I think that this line is just wrong. I can get standard kubernetes.cpu and kubernetes.memory metrics, generic container.cpu and container.memory metrics, and Live Containers by default, plus Live Process metrics once enabled.

That looks like a note we may have added way back when we first released EKS Fargate support and before we just started going to the kubelet for everything here. #6045

Left a sample in that ticket: https://datadoghq.atlassian.net/browse/TEEP-601

Contributor


Interesting, I started playing around with a Fargate cluster and was also seeing container.cpu and container.memory metrics. Thanks for catching this inaccurate note.

## Running the Agent as a side-car
- image: datadog/agent
  name: datadog-agent
  ## Enabling port 8125 for DogStatsD metric collection
Contributor


cc: @wdhif Do you think we want to update our documentation for DogStatsD configuration to use volumes, mounts, and UDS?

- name: DD_PROCESS_CONFIG_PROCESS_COLLECTION_ENABLED
  value: "true"
# (...)
```

## Log collection
Contributor


Future note:

Soon we'll have logging directly via the Agent DataDog/datadog-agent#35190

Contributor Author


Oh, that's really good; EKS Fargate log collection has been a bit of a pain point. This PR itself doesn't really change the logs setup, so a future PR for APM/Auto-Instrumentation (probably a separate guide page) would be good, as well as one to add these instructions to this doc and, IMO, move the current strategy to a separate page to keep this one trim, assuming that Agent-based collection is just all-around superior.


4 participants