Katalyst-colocation-orm can be installed on enhanced-k8s cluster but katalyst-colocation cannot be installed #617

@ozline

What happened?

I followed the guide "Colocate your application using Katalyst" to install Katalyst.

The guide says to install katalyst-colocation on KubeWharf enhanced Kubernetes, and katalyst-colocation-orm on vanilla Kubernetes.

My node was set up by following "Install KubeWharf enhanced-k8s", but only katalyst-colocation-orm installs successfully; katalyst-colocation does not.

When I install katalyst-colocation, katalyst-colocation-agent reports the following error:

I0610 13:10:27.641756       1 state_checkpoint.go:121] "[cpu_plugin] State checkpoint: restored state from checkpoint"
I0610 13:10:27.641777       1 util.go:68] [katalyst-core/pkg/agent/qrm-plugins/cpu/util.GetCoresReservedForSystem] get reservedQuantityInt: 0 from ReservedCPUCores configuration
I0610 13:10:27.641787       1 util.go:77] [katalyst-core/pkg/agent/qrm-plugins/cpu/util.GetCoresReservedForSystem] take reservedCPUs:  by reservedCPUsNum: 0
I0610 13:10:27.641832       1 policy.go:950] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).cleanPools] there is no pool to delete
I0610 13:10:27.641842       1 policy.go:964] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).initReservePool] initReservePool reserve:
I0610 13:10:27.641859       1 state_mem.go:109] "[cpu_plugin] updated cpu plugin pod entries" podUID="reserve" containerName="" allocationInfo="{\"pod_uid\":\"reserve\",\"owner_pool_name\":\"reserve\",\"allocation_result\":\"\",\"original_allocation_result\":\"\",\"topology_aware_assignments\":{},\"original_topology_aware_assignments\":{},\"init_timestamp\":\"\",\"labels\":null,\"annotations\":null,\"qosLevel\":\"\"}"
I0610 13:10:27.644274       1 policy.go:1039] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).initReclaimPool] exist initial reclaim: 0-9
I0610 13:10:27.644300       1 agent.go:102] needToRun "qrm_cpu_plugin"
I0610 13:10:27.644308       1 agent.go:91] initializing "qrm_io_plugin"
I0610 13:10:27.644320       1 agent.go:102] needToRun "qrm_io_plugin"
I0610 13:10:27.644325       1 agent.go:91] initializing "qrm_network_plugin"
W0610 13:10:27.644335       1 util.go:122] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.filterNICsByAvailability] nic: eno1 doesn't have IP address
I0610 13:10:27.644344       1 util.go:302] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.getReservedBandwidth] reservedBanwidth: 0, nicCount: 1, policy: first,
I0610 13:10:27.644361       1 state_net.go:47] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.NewNetworkPluginState] initializing new network plugin in-memory state store"
I0610 13:10:27.644372       1 util.go:37] [GenerateMachineState: katalyst-core/pkg/agent/qrm-plugins/network/state.GenerateMachineState] NIC wlp2s0's speed: -1, capacity: [0/0], reservation: 0
I0610 13:10:27.644511       1 util.go:37] [GenerateMachineState: katalyst-core/pkg/agent/qrm-plugins/network/state.GenerateMachineState] NIC wlp2s0's speed: -1, capacity: [0/0], reservation: 0
I0610 13:10:27.644531       1 state_net.go:121] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*networkPluginState).SetMachineState] updated network plugin machine state" NICMap="{\"wlp2s0\":{\"egress_state\":{\"Capacity\":0,\"SysReservation\":0,\"Reservation\":0,\"Allocatable\":0,\"Allocated\":0,\"Free\":0},\"ingress_state\":{\"Capacity\":0,\"SysReservation\":0,\"Reservation\":0,\"Allocatable\":0,\"Allocated\":0,\"Free\":0},\"pod_entries\":{}}}"
I0610 13:10:27.644543       1 state_net.go:145] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*networkPluginState).SetPodEntries] updated network plugin pod resource entries" podEntries="{}"
I0610 13:10:27.644555       1 state_checkpoint.go:136] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*stateCheckpoint).restoreState] state checkpoint: restored state from checkpoint"
I0610 13:10:27.644572       1 policy.go:177] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.(*StaticPolicy).ApplyConfig] apply configs, qosLevelToNetClassMap: map[dedicated_cores:0 reclaimed_cores:0 shared_cores:0 system_cores:0], podLevelNetClassAnnoKey: katalyst.kubewharf.io/net_class_id, podLevelNetAttributesAnnoKeys: []
I0610 13:10:27.644581       1 agent.go:102] needToRun "qrm_network_plugin"
I0610 13:10:27.644588       1 agent.go:91] initializing "periodical-handler-manager"
I0610 13:10:27.644593       1 agent.go:102] needToRun "periodical-handler-manager"
I0610 13:10:27.644600       1 agent.go:91] initializing "katalyst-agent-orm"
I0610 13:10:27.644631       1 manager.go:86] "Creating topology manager with policy per scope" topologyPolicyName=""
E0610 13:10:27.644640       1 manager.go:129] unknown policy: ""
E0610 13:10:27.644647       1 agent.go:94] Error initializing "katalyst-agent-orm"
I0610 13:10:27.644662       1 file.go:257] [GetUniqueLock] release lock successfully
I0610 13:10:28.396105       1 file.go:90] fsNotify watcher notify "/var/lib/kubelet/resource-plugins/kubelet_qrm_checkpoint": CREATE
I0610 13:10:28.396155       1 topology_adapter.go:281] qrm state file changed, notify to update topology status
I0610 13:10:28.396166       1 kubeletplugin.go:177] send topology change notification to plugin kubelet-reporter-plugin
run command error: failed to init ORM: unknown policy: ""
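The relevant failure is easy to miss among the info-level lines in the log above. As a minimal sketch, assuming the agent writes standard klog-style headers (a severity letter followed by `mmdd hh:mm:ss.uuuuuu`), the error lines can be isolated with a small filter. The function name `klog_errors` is a hypothetical helper, not part of Katalyst.

```shell
# Print only error/fatal klog lines (severity prefix E or F) read from stdin,
# plus the final "run command error:" line that the agent prints before exiting.
klog_errors() {
  grep -E '^([EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6} |run command error:)'
}

# Possible usage against a crashed agent pod (substitute your own pod name):
#   kubectl logs -n katalyst-system <agent-pod-name> --previous | klog_errors
```

Applied to the log above, this would leave the two `E0610` lines from manager.go and agent.go and the final `run command error` line, all of which point at the empty topology policy name passed to "katalyst-agent-orm".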

Only the katalyst-agent pods are failing; the other components run normally:

root@debian-node-1:~# kubectl get pods -n katalyst-system
NAME                                                       READY   STATUS             RESTARTS      AGE
katalyst-colocation-katalyst-agent-f5glx                   0/1     CrashLoopBackOff   4 (36s ago)   2m32s
katalyst-colocation-katalyst-agent-jzgft                   0/1     CrashLoopBackOff   4 (52s ago)   2m32s
katalyst-colocation-katalyst-controller-59b5c89cd6-jcn9m   1/1     Running            0             2m32s
katalyst-colocation-katalyst-controller-59b5c89cd6-vpjvq   1/1     Running            0             2m32s
katalyst-colocation-katalyst-metric-85c47ff4bf-nl9sf       1/1     Running            0             2m32s
katalyst-colocation-katalyst-scheduler-77cdd9d66f-8mszz    1/1     Running            0             2m32s
katalyst-colocation-katalyst-scheduler-77cdd9d66f-c27qc    1/1     Running            0             2m32s
katalyst-colocation-katalyst-webhook-5f6ccc7cb-ngz2x       1/1     Running            0             2m32s
katalyst-colocation-katalyst-webhook-5f6ccc7cb-vrnzs       1/1     Running            0             2m32s

However, installing katalyst-colocation-orm on KubeWharf enhanced Kubernetes works fine (the agent pods are in Running status).

What did you expect to happen?

Installing katalyst-colocation on KubeWharf enhanced Kubernetes should work.

How can we reproduce it (as minimally and precisely as possible)?

Install katalyst-colocation with Helm after installing KubeWharf enhanced Kubernetes:

helm install katalyst-colocation -n katalyst-system --create-namespace kubewharf/katalyst-colocation

Software version

No response
