Katalyst-colocation-orm can be installed on enhanced-k8s cluster but katalyst-colocation cannot be installed #617
Description
What happened?
I followed the "Colocate your application using Katalyst" guide to install Katalyst.
It says to install katalyst-colocation if you use KubeWharf enhanced Kubernetes, and katalyst-colocation-orm if you use vanilla Kubernetes.
My nodes were set up by following "Install Kubewharf enhanced-k8s", so they run enhanced k8s, but only katalyst-colocation-orm installs successfully; katalyst-colocation does not.
If I install katalyst-colocation, katalyst-colocation-agent reports the following error:
I0610 13:10:27.641756 1 state_checkpoint.go:121] "[cpu_plugin] State checkpoint: restored state from checkpoint"
I0610 13:10:27.641777 1 util.go:68] [katalyst-core/pkg/agent/qrm-plugins/cpu/util.GetCoresReservedForSystem] get reservedQuantityInt: 0 from ReservedCPUCores configuration
I0610 13:10:27.641787 1 util.go:77] [katalyst-core/pkg/agent/qrm-plugins/cpu/util.GetCoresReservedForSystem] take reservedCPUs: by reservedCPUsNum: 0
I0610 13:10:27.641832 1 policy.go:950] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).cleanPools] there is no pool to delete
I0610 13:10:27.641842 1 policy.go:964] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).initReservePool] initReservePool reserve:
I0610 13:10:27.641859 1 state_mem.go:109] "[cpu_plugin] updated cpu plugin pod entries" podUID="reserve" containerName="" allocationInfo="{\"pod_uid\":\"reserve\",\"owner_pool_name\":\"reserve\",\"allocation_result\":\"\",\"original_allocation_result\":\"\",\"topology_aware_assignments\":{},\"original_topology_aware_assignments\":{},\"init_timestamp\":\"\",\"labels\":null,\"annotations\":null,\"qosLevel\":\"\"}"
I0610 13:10:27.644274 1 policy.go:1039] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).initReclaimPool] exist initial reclaim: 0-9
I0610 13:10:27.644300 1 agent.go:102] needToRun "qrm_cpu_plugin"
I0610 13:10:27.644308 1 agent.go:91] initializing "qrm_io_plugin"
I0610 13:10:27.644320 1 agent.go:102] needToRun "qrm_io_plugin"
I0610 13:10:27.644325 1 agent.go:91] initializing "qrm_network_plugin"
W0610 13:10:27.644335 1 util.go:122] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.filterNICsByAvailability] nic: eno1 doesn't have IP address
I0610 13:10:27.644344 1 util.go:302] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.getReservedBandwidth] reservedBanwidth: 0, nicCount: 1, policy: first,
I0610 13:10:27.644361 1 state_net.go:47] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.NewNetworkPluginState] initializing new network plugin in-memory state store"
I0610 13:10:27.644372 1 util.go:37] [GenerateMachineState: katalyst-core/pkg/agent/qrm-plugins/network/state.GenerateMachineState] NIC wlp2s0's speed: -1, capacity: [0/0], reservation: 0
I0610 13:10:27.644511 1 util.go:37] [GenerateMachineState: katalyst-core/pkg/agent/qrm-plugins/network/state.GenerateMachineState] NIC wlp2s0's speed: -1, capacity: [0/0], reservation: 0
I0610 13:10:27.644531 1 state_net.go:121] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*networkPluginState).SetMachineState] updated network plugin machine state" NICMap="{\"wlp2s0\":{\"egress_state\":{\"Capacity\":0,\"SysReservation\":0,\"Reservation\":0,\"Allocatable\":0,\"Allocated\":0,\"Free\":0},\"ingress_state\":{\"Capacity\":0,\"SysReservation\":0,\"Reservation\":0,\"Allocatable\":0,\"Allocated\":0,\"Free\":0},\"pod_entries\":{}}}"
I0610 13:10:27.644543 1 state_net.go:145] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*networkPluginState).SetPodEntries] updated network plugin pod resource entries" podEntries="{}"
I0610 13:10:27.644555 1 state_checkpoint.go:136] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*stateCheckpoint).restoreState] state checkpoint: restored state from checkpoint"
I0610 13:10:27.644572 1 policy.go:177] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.(*StaticPolicy).ApplyConfig] apply configs, qosLevelToNetClassMap: map[dedicated_cores:0 reclaimed_cores:0 shared_cores:0 system_cores:0], podLevelNetClassAnnoKey: katalyst.kubewharf.io/net_class_id, podLevelNetAttributesAnnoKeys: []
I0610 13:10:27.644581 1 agent.go:102] needToRun "qrm_network_plugin"
I0610 13:10:27.644588 1 agent.go:91] initializing "periodical-handler-manager"
I0610 13:10:27.644593 1 agent.go:102] needToRun "periodical-handler-manager"
I0610 13:10:27.644600 1 agent.go:91] initializing "katalyst-agent-orm"
I0610 13:10:27.644631 1 manager.go:86] "Creating topology manager with policy per scope" topologyPolicyName=""
E0610 13:10:27.644640 1 manager.go:129] unknown policy: ""
E0610 13:10:27.644647 1 agent.go:94] Error initializing "katalyst-agent-orm"
I0610 13:10:27.644662 1 file.go:257] [GetUniqueLock] release lock successfully
I0610 13:10:28.396105 1 file.go:90] fsNotify watcher notify "/var/lib/kubelet/resource-plugins/kubelet_qrm_checkpoint": CREATE
I0610 13:10:28.396155 1 topology_adapter.go:281] qrm state file changed, notify to update topology status
I0610 13:10:28.396166 1 kubeletplugin.go:177] send topology change notification to plugin kubelet-reporter-plugin
run command error: failed to init ORM: unknown policy: ""
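The failure is easy to miss among the info-level lines above. As a minimal sketch for isolating it, assuming the agent output has been saved to a file (the file name `agent.log` and the helper name `klog_errors` are hypothetical), the lines can be filtered by severity: klog prefixes each line with a severity letter (I, W, E or F) followed by the month and day. The final `run command error` line is not klog-formatted, so this filter does not match it.

```shell
# Hypothetical helper: keep only klog error (E) and fatal (F) lines from stdin.
# A klog line starts with the severity letter followed by MMDD, e.g. "E0610 ".
klog_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Usage (agent.log is an assumed file name for the saved agent output):
#   klog_errors < agent.log
```

Applied to the log above, this should leave only the two `E0610` lines, both of which point at the empty topology policy name passed while initializing "katalyst-agent-orm".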
Only the katalyst-agent pods are failing:
root@debian-node-1:~# kubectl get pods -n katalyst-system
NAME                                                       READY   STATUS             RESTARTS      AGE
katalyst-colocation-katalyst-agent-f5glx                   0/1     CrashLoopBackOff   4 (36s ago)   2m32s
katalyst-colocation-katalyst-agent-jzgft                   0/1     CrashLoopBackOff   4 (52s ago)   2m32s
katalyst-colocation-katalyst-controller-59b5c89cd6-jcn9m   1/1     Running            0             2m32s
katalyst-colocation-katalyst-controller-59b5c89cd6-vpjvq   1/1     Running            0             2m32s
katalyst-colocation-katalyst-metric-85c47ff4bf-nl9sf       1/1     Running            0             2m32s
katalyst-colocation-katalyst-scheduler-77cdd9d66f-8mszz    1/1     Running            0             2m32s
katalyst-colocation-katalyst-scheduler-77cdd9d66f-c27qc    1/1     Running            0             2m32s
katalyst-colocation-katalyst-webhook-5f6ccc7cb-ngz2x       1/1     Running            0             2m32s
katalyst-colocation-katalyst-webhook-5f6ccc7cb-vrnzs       1/1     Running            0             2m32s
However, installing katalyst-colocation-orm on KubeWharf enhanced Kubernetes works fine (the agent pods reach Running status).
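For anyone reproducing this: a pod in CrashLoopBackOff may be between restarts when you look at it, so its current log can be empty. A minimal sketch for fetching the log of the last crashed container instance, assuming the katalyst-system namespace used above (the helper name `previous_agent_log` is hypothetical; `--previous` is a standard `kubectl logs` flag):

```shell
# Hypothetical helper: print the log of the previous (terminated) container
# instance of a pod in the katalyst-system namespace.
previous_agent_log() {
  kubectl logs -n katalyst-system "$1" --previous
}

# Usage, with a pod name taken from the listing above:
#   previous_agent_log katalyst-colocation-katalyst-agent-f5glx
```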
What did you expect to happen?
Installing katalyst-colocation on KubeWharf enhanced Kubernetes should work, with the agent pods reaching Running status.
How can we reproduce it (as minimally and precisely as possible)?
Install katalyst-colocation with Helm on a cluster running KubeWharf enhanced Kubernetes:
helm install katalyst-colocation -n katalyst-system --create-namespace kubewharf/katalyst-colocation
Software version
No response