Commit e24afaa

add release note for v1.6 (#178)
1 parent c25887f commit e24afaa

1 file changed

Lines changed: 58 additions & 0 deletions

@@ -0,0 +1,58 @@
---
slug: release-ltp-v1.6
title: Releasing Lucia Training Platform v1.6
author: Lucia Training Platform Team
tags: [ltp, announcement, release]
---

We are pleased to announce the official release of **Lucia Training Platform v1.6.0**!

## Lucia Training Platform v1.6.0 Release Notes

This release focuses on security hardening, Docker image optimization, infrastructure upgrades, and bug fixes across the platform.

## Platform Features & Bug Fixes
- Upgraded webportal to Node.js 24 and removed the separate webportal-dind service; webportal now runs directly without Docker-in-Docker, which simplifies deployment and reduces image size
- Fixed error handling on the job-detail page for permission-denied errors; the page now shows a clear message instead of loading indefinitely
- Fixed job YAML and output log display issues on the webportal
- Added support for tagging different types of GPUs
- Skipped validation job submission for CPU nodes
- Made Prometheus retention size configurable per service to prevent disk-full issues
- Added a tool to preserve application tokens when revoking all tokens
- Removed the abnormal-detector cronjob when the service is stopped
- Fixed an exception thrown when no name exists in the filter
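
To illustrate the token-preservation item above, here is a minimal Python sketch of revoking "all" tokens while keeping application tokens. The `Token` record, its `is_application` flag, and `split_tokens_for_revocation` are hypothetical names introduced for this example; the actual tool and token schema in the platform may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token record; the real schema may differ."""
    value: str
    user: str
    is_application: bool  # assumed flag marking application tokens


def split_tokens_for_revocation(tokens):
    """Return (to_revoke, to_keep), keeping every application token."""
    to_revoke = [t for t in tokens if not t.is_application]
    to_keep = [t for t in tokens if t.is_application]
    return to_revoke, to_keep


if __name__ == "__main__":
    sample = [
        Token("tok-login", "alice", False),
        Token("tok-ci", "alice", True),
    ]
    revoke, keep = split_tokens_for_revocation(sample)
    print("revoke:", [t.value for t in revoke])
    print("keep:", [t.value for t in keep])
```

Separating the selection step from the actual revocation call makes it easy to review which tokens would be affected before anything is deleted.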

## Docker Image Optimization
- Reduced Docker image sizes for cluster-local-storage, copilot-chat, dashboard-data-backup, utilization-reporter, abnormal-detector, cert-expiration-checker, cluster-utilization, reverse-proxy, and model-proxy
- Upgraded metrics-cleaner base image from Python 3.7 to 3.12-slim
- Cleaned up job-exporter Docker image

## Infrastructure & Networking
- Updated Cilium from 1.18.6 to 1.18.9
- Updated Go version to 1.25 across all Go-based components
- Added homebrew (self-built) builds of the kube-scheduler and Grafana container images
- Downgraded kube-scheduler version to match the service Kubernetes version
- Added IPoIB subnet route in init.sh to fix InfiniBand TCP connectivity on NetworkManager-managed nodes
- Fixed a DNS problem for cluster-local-storage
- Fixed the missing zlib 1.3.1 issue for pylon
- Added Managed Identity support for build scripts
- Made imagePullSecrets conditional to eliminate FailedToRetrieveImagePullSecret warnings
- Removed secret deployment for image pull in favor of ACR credentials
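
As a rough illustration of the IPoIB route item above, the following Python sketch derives the subnet of an IPoIB interface address and builds the corresponding `ip route replace` command. The actual change lives in init.sh and its exact commands are not shown in these notes; the function name, the `ib0` device name, and the sample address are assumptions made for this example.

```python
import ipaddress


def ipoib_route_command(interface_cidr, device="ib0"):
    """Build an `ip route replace` command for the subnet of an IPoIB interface.

    interface_cidr is the interface address with prefix length, e.g. "10.10.3.17/16".
    The device name "ib0" is an assumed default, not taken from init.sh.
    """
    network = ipaddress.ip_interface(interface_cidr).network
    return ["ip", "route", "replace", str(network), "dev", device]


if __name__ == "__main__":
    # Prints: ip route replace 10.10.0.0/16 dev ib0
    print(" ".join(ipoib_route_command("10.10.3.17/16")))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting, should the sketch be adapted for real use.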

## Alert Manager & Node Management
- Fixed KeyError when alert-parser processes validating nodes with no alerts
- Downgraded hardware issues without Azure FaultCode to triaged_unknown to avoid breaking the OFR pipeline
- Prevented node-recycler from submitting duplicate OFR tickets for the same node
- Skipped classification for cordoned nodes with empty NodeId to prevent the OFR pipeline from stalling
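
The KeyError item above is a common pattern: a node is looked up in an alert mapping that has no entry for it. A minimal Python sketch of the defensive lookup follows; `alerts_for_nodes` and the shape of `alerts_by_node` are hypothetical and not taken from the alert-parser source.

```python
def alerts_for_nodes(validating_nodes, alerts_by_node):
    """Map each validating node to its alerts, using an empty list when none exist.

    Indexing alerts_by_node[node] directly would raise KeyError for a node
    that has no alerts; dict.get with a default avoids that.
    """
    return {node: alerts_by_node.get(node, []) for node in validating_nodes}


if __name__ == "__main__":
    alerts = {"node-1": ["GpuEccError"]}  # node-2 has no alerts recorded
    print(alerts_for_nodes(["node-1", "node-2"], alerts))
```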

## Security
- Updated Go toolchain and packages across all Go-based services
- Updated Node.js packages for rest-server, alert-handler, job-status-change-notification, database-controller, and webportal
- Updated Python packages for copilot-chat
- Fixed S360 vulnerabilities across 13 container images, in packages including openssl, axios, follow-redirects, lodash, nodemailer, and minimatch
- Updated go-ntlmssp to 0.1.1 for reverse-proxy
- Updated k8s-rdma-shared-dev-plugin to work with the latest gRPC package

## CI/CD
- Updated CI workflow to exclude dev-box from changed-service detection
- Changed cleanup to remove all existing StatefulSets in the system instead of only the config-defined ones
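
The dev-box filtering item above can be sketched as follows. This is an illustrative Python version only: the real logic lives in the CI workflow, and the `src/<service>/...` layout, the `changed_services` function, and the `EXCLUDED_SERVICES` set are assumptions made for this example.

```python
from pathlib import PurePosixPath

# Services that should never trigger a rebuild (assumed exclusion list).
EXCLUDED_SERVICES = {"dev-box"}


def changed_services(changed_files, root="src"):
    """Return the sorted service names touched by changed_files, minus exclusions.

    Assumes each service lives under <root>/<service>/...; other paths are ignored.
    """
    services = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) >= 3 and parts[0] == root:
            services.add(parts[1])
    return sorted(services - EXCLUDED_SERVICES)


if __name__ == "__main__":
    files = [
        "src/dev-box/build/Dockerfile",
        "src/webportal/package.json",
        "README.md",
    ]
    print(changed_services(files))  # prints ['webportal']
```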
