Monitoring #7

Merged
phuchoang2603 merged 4 commits into main from monitoring
Jun 11, 2025

Conversation

@phuchoang2603
Owner

No description provided.

Copilot AI review requested due to automatic review settings June 11, 2025 17:15

Copilot AI left a comment

Pull Request Overview

This PR removes outdated monitoring configurations and Docker Compose definitions, introduces new configuration files for Tempo, Prometheus, Loki, and Grafana provisioning, and updates the application's tracing and deployment settings. Key changes include:

  • Removal of legacy configuration files and dashboard definitions.
  • Addition of new configuration files for Tempo, Prometheus, Loki, and Grafana datasources/dashboards.
  • Updates in the application's tracing setup and minor runtime tweaks in the Dockerfile and CI workflow.

Reviewed Changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| deployments/docker/config/loki-config.yml | Removed legacy Loki configuration. |
| deployments/docker/config/grafana/provisioning/datasources/datasources.yml | Removed old Grafana datasource configuration. |
| deployments/docker/config/grafana/dashboards/hpp.json | Removed legacy dashboard for House Price Prediction API. |
| deployments/docker/config/alertmanager.yml | Removed Alertmanager configuration. |
| deployments/docker/compose.yaml | Removed legacy Docker Compose services definitions. |
| config/tempo-config.yaml | Added new Tempo configuration for tracing support. |
| config/prometheus/prometheus.yml | Added new Prometheus configuration. |
| config/loki-config.yml | Added updated Loki configuration. |
| config/grafana/provisioning/datasources/datasources.yml | Added updated Grafana datasource provisioning config. |
| config/grafana/provisioning/dashboards/prometheus-dashboard.yml | Updated dashboard provider name and settings. |
| config/config.alloy | Added Alloy configuration for endpoints, metrics, and tracing. |
| app/utils/tracing_config.py | Updated tracing config to export traces to Grafana Alloy via OTLP. |
| app/main.py | Added tracing instrumentation and enriched log and span metadata. |
| Dockerfile | Updated CMD with added flag for uvicorn logging. |
| .github/workflows/release.yml | Updated Dockerfile path for the release workflow. |

Comments suppressed due to low confidence (1)

app/main.py:92

  • The 'traceable' decorator is used in this file but not imported. Please import it from 'app/utils/tracing_config.py' to ensure proper tracing functionality.
@traceable
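
The suppressed comment above flags `traceable` as used but not imported. The PR page does not show how that decorator is defined, so what follows is only a minimal, stdlib-only sketch of what a span-wrapping decorator of this kind might look like. The names `RECORDED_SPANS` and `predict` are hypothetical, and the in-memory list stands in for whatever exporter the real `app/utils/tracing_config.py` uses (presumably an OpenTelemetry span sent over OTLP, per the file summary).

```python
import functools
import time

# Hypothetical in-memory sink; a stand-in for a real span exporter.
RECORDED_SPANS = []


def traceable(func):
    """Wrap func so every call records a span-like dict (name, duration, status)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            # Mark the span as failed, then let the error propagate unchanged.
            status = "error"
            raise
        finally:
            RECORDED_SPANS.append(
                {
                    "name": func.__name__,
                    "duration_s": time.perf_counter() - start,
                    "status": status,
                }
            )

    return wrapper


@traceable
def predict(amount):
    # Hypothetical handler body, present only to exercise the decorator.
    return amount > 1000
```

If the decorator really does live in `app/utils/tracing_config.py`, as the comment suggests, the fix in `app/main.py` would likely be a single import of `traceable` from that module; that location is taken from the review comment and is not verified here.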

@phuchoang2603 phuchoang2603 merged commit 1c73ebf into main Jun 11, 2025
@phuchoang2603 phuchoang2603 deleted the monitoring branch June 11, 2025 17:19
phuchoang2603 added a commit that referenced this pull request Jun 15, 2025
* restructure

* replace node-exporter, promtail with alloy

* deploy tempo for tracing

* ci: update Helm values.yaml and Chart.yaml to ghcr.io/phuchoang2603/realtime-credit-card-fraud-detection:v1.0.0

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
phuchoang2603 added a commit that referenced this pull request Jun 17, 2025
* init commit

* add notebooks and data

* Colab (#1)

* add link to google colab

* update structure

* update jupyter-lab-docker

* Refactor notebook structure (#2)

* reconstruct structure folder

* update report on chapter

* update reports document (#3)

* reconstruct structure folder

* update report on chapter

* update report

* export decision tree model and notebook (#4)

* fix grammar on notebooks (#5)

* export decision tree model and notebook

* fix grammar

* Fast api app (#6)

* first commit on python fast api app

* deploy test-client

* first success test

however, need to implement custom rules and additional features to be
considered

* feat: add precondition rules to block suspicious customer and terminal

* success docker deploy of both test and api

* Monitoring (#7)

* restructure

* replace node-exporter, promtail with alloy

* deploy tempo for tracing

* ci: update Helm values.yaml and Chart.yaml to ghcr.io/phuchoang2603/realtime-credit-card-fraud-detection:v1.0.0

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Unit test (#8)

* add unit test requirements

* turn off tracing when running test

* update workflow

* K8s monitoring (#9)

* fix environment variable

* update helm chart

* bump version to 1.0.1

* ci: update Helm values.yaml and Chart.yaml to ghcr.io/phuchoang2603/realtime-credit-card-fraud-detection:v1.0.1

* add traefik and simplify helm chart

* add alloy argo cd test

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Monitoring configuration (#10)

* add logging stack: alloy, loki, grafana

* minor fix before argo-cd app-of-apps

* Argo cd (#11)

* argo-cd app of apps

* fix alloy mount docker container

* fix loki dns service to use rke2

* fix grafana pvc

* revert to use pvc

* add traefik and cert manager argocd app of apps

* fix cert-manager metrics expose

* update repo

* update traefik load balancer ip

* add prometheus chart

* add tempo and finalize, i guess

* wrong tempo path

* try to remove loki resources config

* pls works

* move ingress route into unified location

* in the end, still came back to kube-prometheus-stack, but only for grafana and prometheus

* fix spacing

* fix loki

* move ingressroute config to another place

* fix typo

* fix typo #2

* bump traefik version

* why does this keep happening to me

* really angry

* adapt url

* reduce workload by disabling prometheus operator

* reorder file structure

* remove foreground cascading deletion

* try alloy receiver port 12345

* add alloy ingress route

* update url

* use metric from prometheus instead

* fix typo

* disable tls for tempo

* try to disable tls

* change cluster Ip to load balancer

* revert back to default to ensure security

* change target revision to main

* bump to 1.0.2

* update document for deployment (#12)

* rename to argo apps (#13)

* fix workflow (#14)

* fix workflow

* bump to 1.0.3

* ci: update Helm values.yaml and Chart.yaml to ghcr.io/phuchoang2603/realtime-credit-card-fraud-detection:v1.0.3

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Cloud (#15)

* add finalizer

* change manifest apps to proxmox

* add gke terraform

* separate proxmox and cloud

* update docs

* update value

* update acme tls server

* fix acme server

* drop tls verify on test client

* bump to v1.0.4

* ci: update Helm values.yaml and Chart.yaml to ghcr.io/phuchoang2603/realtime-credit-card-fraud-detection:v1.0.4

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Documents (#16)

* change namespace

* add README

* change target revision temporarily

* update docs

* change to main

* update README

* bump to 1.0.5

* format Dockerfile

* fix alloy endpoint

* refactor: separate prod and dev requirements.txt

* refactor: move fixture into separate file

* simulate await api calling for pre_condition_checks

* ci: add step check code coverage

* rename helm-chart to helm-charts

* add helm chart for traefik and cert-manager, group into api-gateway namespace

* docs: change name and graph

* now using lifespan event, simpler test client too

* ci: experiment with removing uv pip install

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>