feat: add missing Spark import/export support for metrics #233

Merged
merged 7 commits into linux-system-roles:main from the spark-support branch on May 6, 2025

natoscott
Collaborator

@natoscott natoscott commented Apr 17, 2025

Feature: The role can gather metrics from Apache Spark and send metrics information into Spark. There are two new boolean parameters:

  • metrics_into_spark: false

Boolean flag allowing metric values to be exported into Spark.

  • metrics_from_spark: false

Boolean flag allowing metrics from Spark to be made available.

Reason: Users of Apache Spark may want to get metrics into and out of Spark. pcp has had support for Spark for a while.

Result: Users can get metrics into and out of Apache Spark.
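
A minimal playbook sketch of how the two new flags might be enabled. The variable names come from the description above; the role name `fedora.linux_system_roles.metrics` is assumed from the collection path that appears in the CI logs, and the `spark_nodes` host group is a hypothetical placeholder.

```yaml
# Sketch only: enable both Spark directions on a hypothetical host group.
- name: Configure Spark metrics with the metrics role
  hosts: spark_nodes                      # hypothetical inventory group
  vars:
    metrics_into_spark: true              # export metric values into Spark
    metrics_from_spark: true              # make metrics from Spark available
  roles:
    - fedora.linux_system_roles.metrics   # assumed public role name
```

Both flags default to `false`, so a playbook that sets neither should leave the Spark integration off.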

Resolves Red Hat issue RHEL-17564

Signed-off-by: Nathan Scott <[email protected]>
@natoscott natoscott requested a review from richm as a code owner April 17, 2025 08:17
@richm
Collaborator

richm commented Apr 22, 2025

[citest]

@richm
Collaborator

richm commented Apr 23, 2025

@natoscott el7 test fails:

TASK [fedora.linux_system_roles.private_metrics_subrole_spark : Install needed Spark metrics packages] ***
task path: /tmp/collections-Z9d/ansible_collections/fedora/linux_system_roles/roles/private_metrics_subrole_spark/tasks/main.yml:41
Tuesday 22 April 2025  20:10:28 -0400 (0:00:00.034)       0:00:03.177 ********* 
fatal: [managed-node2]: FAILED! => {
    "changed": false, 
    "rc": 126, 
    "results": [
        "No package matching 'pcp-pmda-openmetrics' found available, installed or updated"
    ]
}

MSG:

No package matching 'pcp-pmda-openmetrics' found available, installed or updated

is spark supported on el7? perhaps the package name is different?

@natoscott
Collaborator Author

@richm thanks Rich, I'll get back to this when some other fires are put out ... there's no pcp-pmda-openmetrics on earlier RHEL releases, simplest approach here will be to add some distro version checks I expect.
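
For illustration, a distro version check of the kind mentioned here could look roughly like the task below. This is a sketch, not the fix that was eventually merged: the only confirmed fact is that `pcp-pmda-openmetrics` is missing on EL7, so the `< 8` threshold is an assumption, and the task is a simplified stand-in for the real one in `tasks/main.yml`.

```yaml
# Hypothetical guard: skip the OpenMetrics PMDA package where it is not shipped.
- name: Install needed Spark metrics packages
  ansible.builtin.package:
    name: pcp-pmda-openmetrics
    state: present
  # Assumption: only RedHat-family releases older than major version 8 lack
  # the package. Fedora also reports os_family RedHat, but its major version
  # numbers are well above 8, so it still passes this condition.
  when: >-
    not (ansible_facts['os_family'] == 'RedHat'
         and ansible_facts['distribution_major_version'] | int < 8)
```

Any later task that relies on the OpenMetrics PMDA would need the same condition, otherwise the role would skip the install but still try to configure the agent.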

@richm
Collaborator

richm commented May 2, 2025

> @richm thanks Rich, I'll get back to this when some other fires are put out ... there's no pcp-pmda-openmetrics on earlier RHEL releases, simplest approach here will be to add some distro version checks I expect.

performancecopilot/ansible-pcp#81

richm and others added 6 commits May 5, 2025 08:13
tox-lsr 3.6.0 will guarantee order of qemu test execution, which should
help make tests reproducible and help debug test failures.

Improve qemu test logging - this will help debug the qemu test
failures.

Signed-off-by: Rich Megginson <[email protected]>
These tests are problematic in github qemu tests, and that
functionality (scsi, anyway) in the testing farm integration
tests.

Yes, we should have a way to provide tags on a per-role basis . . .

Signed-off-by: Rich Megginson <[email protected]>
Bumps [sclorg/testing-farm-as-github-action](https://github.com/sclorg/testing-farm-as-github-action) from 3 to 4.
- [Release notes](https://github.com/sclorg/testing-farm-as-github-action/releases)
- [Commits](sclorg/testing-farm-as-github-action@v3...v4)

---
updated-dependencies:
- dependency-name: sclorg/testing-farm-as-github-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
…from 3921d49..dfb6a7d

dfb6a7d Merge branch 'richm-no-openmetrics-el7'
35b3553 fix: no pcp-pmda-openmetrics on EL7

git-subtree-dir: vendor/github.com/performancecopilot/ansible-pcp
git-subtree-split: dfb6a7df166cc68372da44145854e5c830fb252e
Signed-off-by: Rich Megginson <[email protected]>
@richm
Collaborator

richm commented May 5, 2025

[citest]

@richm
Collaborator

richm commented May 5, 2025

@natoscott any idea about the fedora 42 failure?

  TASK [Check if OpenMetrics PMDA has Spark metrics registered] ******************
  task path: /home/runner/work/metrics/metrics/tests/check_from_spark.yml:3
  Monday 05 May 2025  22:25:25 +0000 (0:00:00.022)       0:00:29.790 ************ 
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (10 retries left).
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (9 retries left).
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (8 retries left).
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (7 retries left).
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (6 retries left).
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (5 retries left).
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (4 retries left).
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (3 retries left).
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (2 retries left).
  FAILED - RETRYING: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (1 retries left).
  fatal: [/home/runner/.cache/linux-system-roles/fedora-42.qcow2]: FAILED! => {
      "attempts": 10,
      "changed": false,
      "cmd": [
          "pmprobe",
          "-I",
          "openmetrics.control.status"
      ],
      "delta": "0:00:00.007049",
      "end": "2025-05-05 22:25:39.459070",
      "rc": 0,
      "start": "2025-05-05 22:25:39.452021"
  }
  
  STDOUT:
  
  openmetrics.control.status -12357 Unknown metric name

@natoscott
Collaborator Author

@richm no, it doesn't fail like this locally so a bit of a mystery. I guess we'll need to capture /var/log/pcp/pmcd/{pmcd,openmetrics}.log contents to discover further details.
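
One way to capture those logs from the test itself is a `block`/`rescue` wrapper around the probe, sketched below. The `pmprobe` command and the two log paths are taken from this thread; the `until` condition, the one-second delay, and the `__probe`/`__logs` variable names are assumptions, since the real contents of `check_from_spark.yml` are not shown here.

```yaml
# Sketch: dump the PMCD and OpenMetrics PMDA logs when the probe keeps failing.
- name: Check if OpenMetrics PMDA has Spark metrics registered
  block:
    - name: Probe the OpenMetrics control metric
      ansible.builtin.command: pmprobe -I openmetrics.control.status
      register: __probe                  # hypothetical variable name
      changed_when: false
      # Assumed success test: pmprobe exits 0 even for an unknown metric,
      # so the return code alone cannot be used here.
      until: "'Unknown metric name' not in __probe.stdout"
      retries: 10
      delay: 1
  rescue:
    - name: Read the PMCD and OpenMetrics PMDA logs
      ansible.builtin.command:
        argv:
          - cat
          - "{{ item }}"
      loop:
        - /var/log/pcp/pmcd/pmcd.log
        - /var/log/pcp/pmcd/openmetrics.log
      register: __logs                   # hypothetical variable name
      changed_when: false
      failed_when: false                 # a missing log should not mask the real error

    - name: Show the captured log contents
      ansible.builtin.debug:
        msg: "{{ __logs.results | map(attribute='stdout_lines') | list }}"

    - name: Fail after collecting the logs
      ansible.builtin.fail:
        msg: OpenMetrics PMDA did not register openmetrics.control.status
```

The final `fail` task keeps the test red after the logs are printed, so the extra diagnostics do not hide the original failure.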

@richm
Collaborator

richm commented May 6, 2025

> @richm no, it doesn't fail like this locally so a bit of a mystery. I guess we'll need to capture /var/log/pcp/pmcd/{pmcd,openmetrics}.log contents to discover further details.

I am able to reproduce locally:

TASK [Check if OpenMetrics PMDA has Spark metrics registered] ******************
task path: /home/rmeggins/linux-system-roles/metrics/tests/check_from_spark.yml:3
Tuesday 06 May 2025  06:44:56 -0600 (0:00:00.027)       0:00:59.234 ***********
FAILED - RETRYING: [/home/rmeggins/.cache/linux-system-roles/fedora-42.qcow2]: Check if OpenMetrics PMDA has Spark metrics registered (10 retries left).
...
fatal: [/home/rmeggins/.cache/linux-system-roles/fedora-42.qcow2]: FAILED! => {
    "attempts": 10,
    "changed": false,
    "cmd": [
        "pmprobe",
        "-I",
        "openmetrics.control.status"
    ],
    "delta": "0:00:00.004852",
    "end": "2025-05-06 12:45:09.684370",
    "rc": 0,
    "start": "2025-05-06 12:45:09.679518"
}

STDOUT:

openmetrics.control.status -12357 Unknown metric name

I see this in the journal:

May 06 12:44:55 localhost pmcd[12062]: [Tue May  6 12:44:55] pmdaopenmetrics(12062) Error: cannot read /var/lib/pcp/pmdas/openmetrics/config.d/spark.url: [Errno 13] Permission denied: '/var/lib/pcp/pmdas/openmetrics/config.d/spark.url'
...
May 06 12:44:55 localhost pmcd[12139]: [Tue May  6 12:44:55] pmdaopenmetrics(12139) Error: cannot read /var/lib/pcp/pmdas/openmetrics/config.d/spark.url: [Errno 13] Permission denied: '/var/lib/pcp/pmdas/openmetrics/config.d/spark.url'
...
May 06 12:44:55 localhost audit[12167]: AVC avc:  denied  { open } for  pid=12167 comm="rc" path="/var/tmp/pmlogger_rc.PJ6ntRcBG/pcp.env.path" dev="vda4" ino=10074 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:tmp_t:s0 tclass=file permissive=0
May 06 12:44:55 localhost audit[12167]: AVC avc:  denied  { open } for  pid=12167 comm="rc" path="/var/tmp/pmlogger_rc.PJ6ntRcBG/pcp.env.path" dev="vda4" ino=10074 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:tmp_t:s0 tclass=file permissive=0
May 06 12:44:55 localhost rc[12167]: /etc/pcp.env: line 150: /var/tmp/pmlogger_rc.PJ6ntRcBG/pcp.env.path: Permission denied
May 06 12:44:55 localhost audit[12167]: AVC avc:  denied  { open } for  pid=12167 comm="rc" path="/var/tmp/pmlogger_rc.PJ6ntRcBG/pcp.env.path" dev="vda4" ino=10074 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:tmp_t:s0 tclass=file permissive=0
May 06 12:44:55 localhost audit[12167]: AVC avc:  denied  { open } for  pid=12167 comm="rc" path="/var/tmp/pmlogger_rc.PJ6ntRcBG/pcp.env.path" dev="vda4" ino=10074 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:tmp_t:s0 tclass=file permissive=0
...

so I'm assuming it is selinux related
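
A few diagnostic tasks along these lines might help separate an SELinux denial from a plain file-mode problem. This is a sketch under assumptions: it presumes `ausearch` (from the audit package) is installed on the managed node, and the `__avc`/`__ctx` variable names are hypothetical. Only the `spark.url` path comes from the journal output above.

```yaml
# Sketch: gather SELinux evidence for the "Permission denied" errors.
- name: List recent SELinux AVC denials
  ansible.builtin.command: ausearch -m AVC -ts recent
  become: true
  register: __avc                        # hypothetical variable name
  changed_when: false
  failed_when: false                     # ausearch exits non-zero when nothing matches

- name: Show mode, owner and SELinux context of the Spark URL file
  ansible.builtin.command: ls -lZ /var/lib/pcp/pmdas/openmetrics/config.d/spark.url
  become: true
  register: __ctx                        # hypothetical variable name
  changed_when: false

- name: Print the collected diagnostics
  ansible.builtin.debug:
    msg:
      - "{{ __avc.stdout_lines }}"
      - "{{ __ctx.stdout }}"
```

Because `ls -lZ` prints both the file mode and the security context, the output would show whether `Errno 13` comes from ordinary permissions or from labeling.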

@richm
Collaborator

richm commented May 6, 2025

> so I'm assuming it is selinux related

yep - disabling selinux makes the test pass

@richm
Collaborator

richm commented May 6, 2025

@natoscott let's merge this PR, and work on the selinux policy separately

@richm richm changed the title from "fix: add missing Spark import/export support for metrics" to "feat: add missing Spark import/export support for metrics" on May 6, 2025
@richm richm merged commit d3d2fb5 into linux-system-roles:main May 6, 2025
20 of 21 checks passed
@natoscott
Collaborator Author

@richm OK, great - thanks for testing & merging Rich - I'll dig into that selinux failure.

@natoscott natoscott deleted the spark-support branch May 7, 2025 23:31