Add MCP Tools for OVS layer#13
Conversation
cmd/ovnk-mcp-server/main.go
Outdated
| k8sMcpServer, err := kubernetesmcp.NewMCPServer(serverCfg.Kubernetes) | ||
| clientSet, err := client.NewOVNKMCPServerClientSet(serverCfg.Kubernetes) |
There was a problem hiding this comment.
@pperiyasamy When I worked on the k8s layer, the idea was to provide only the necessary tools to the other layers and not expose the entire k8s client. The tools that are added in the k8s mcp package should be enough for the other layers like OVS, OVN, etc. to interact with the k8s cluster.
There was a problem hiding this comment.
I think the tools in pkg/kubernetes/mcp/pods.go, specifically ExecPod, should be enough for this layer
There was a problem hiding this comment.
ok @arkadeepsen , i didn't know this. good idea. changed the ovs implementation accordingly.
pkg/kubernetes/client/client.go
Outdated
| // OVNKMCPServerClientSet is a client set for the OVN Kubernetes MCP server. | ||
| type OVNKMCPServerClientSet struct { | ||
| clientSet kubernetes.Interface | ||
| ClientSet kubernetes.Interface |
There was a problem hiding this comment.
As I mentioned earlier, the k8s client is not exported because the other layers can simply use the ExecPod tool.
| } | ||
|
|
||
| // AddTools registers OVS tools with the MCP server | ||
| func (s *MCPServer) AddTools(server *mcp.Server) { |
There was a problem hiding this comment.
There are other tools also mentioned in the OKEP for the OVS layer. I think those are also needed to be added as tools.
There was a problem hiding this comment.
it has relevant ovs tools now except retis collect because ovnkube-node pod doesn't have retis installed with it, can we rely on any other pod to implement this ? or are we ok to ignore this for now ?
There was a problem hiding this comment.
We can skip retis for now. As the collect might need an extended period of running it which we are not currently supporting in pod exec or node debug. Also this tool might be a better fit for external tools rather than OVS layer. WDYT?
There was a problem hiding this comment.
yes, right, let's do it separately for retis.
pkg/ovs/mcp/mcp.go
Outdated
| if len(ovsInstance.Containers) > 0 { | ||
| // Get list of bridges | ||
| bridgeNames, err := s.runOVSCommand(ctx, pod.Name, in.Namespace, | ||
| ovsInstance.Containers[0], []string{"ovs-vsctl", "list-br"}) |
There was a problem hiding this comment.
Is it always expected that whichever container gets chosen here, will be able to execute the command? If so, a comment here explaining it would be good.
There was a problem hiding this comment.
container dependency is not needed, removed it.
pkg/ovs/mcp/mcp.go
Outdated
| } | ||
|
|
||
| // ListOVSInstances lists running OVS instances. | ||
| func (s *MCPServer) ListOVSInstances(ctx context.Context, req *mcp.CallToolRequest, |
There was a problem hiding this comment.
Do you think we can simply provide tools for ovs-vsctl list-br, ovs-vsctl list-ports, ovs-vsctl list-ifaces, and ovs-ofctl dump-flows? The LLM can call these tools indivudally and would be able to get the results.
The OKEP actually talks about providing tools for running the various list, show and dump commands for the OVS layer. Here we are doing more than that.
There was a problem hiding this comment.
ok, you are correct. ovs layer needn't not have complex logic, let LLM correlate it accordingly. removed ListOVSInstances tool now from the implementation.
|
Hey @pperiyasamy, I have added a few comments. PTAL. |
8f64961 to
496eedc
Compare
arkadeepsen
left a comment
There was a problem hiding this comment.
The list of tools looks good for now. We may have to revisit them for adding support for other read-only commands if they are needed for debugging.
pkg/ovs/mcp/mcp.go
Outdated
| func NewMCPServer(k8sMcpServer *kubernetesmcp.MCPServer) (*MCPServer, error) { | ||
| return &MCPServer{ | ||
| k8sMcpServer: k8sMcpServer, | ||
| }, nil | ||
| } |
There was a problem hiding this comment.
Here error returned is always nil. Do we need the error as return type?
| } | ||
|
|
||
| // AddTools registers OVS tools with the MCP server | ||
| func (s *MCPServer) AddTools(server *mcp.Server) { |
There was a problem hiding this comment.
We can skip retis for now. As the collect might need an extended period of running it which we are not currently supporting in pod exec or node debug. Also this tool might be a better fit for external tools rather than OVS layer. WDYT?
pkg/ovs/mcp/mcp.go
Outdated
| Runs 'ovs-vsctl list-br' command and returns the names of all configured bridges. | ||
|
|
||
| Parameters: | ||
| - namespace (optional): Kubernetes namespace of the OVS pod |
There was a problem hiding this comment.
This might not be correct as namespace is needed for running the exec pod command.
There was a problem hiding this comment.
This applies to all the tools.
There was a problem hiding this comment.
right, changed it.
| // validateBridgeName validates that a bridge name is safe and non-empty. | ||
| // Bridge names should only contain alphanumeric characters, hyphens, and underscores. | ||
| func validateBridgeName(bridge string) error { | ||
| if bridge == "" { | ||
| return fmt.Errorf("bridge name cannot be empty") | ||
| } | ||
|
|
||
| // OVS bridge names typically follow naming conventions: alphanumeric, hyphens, underscores | ||
| validBridgeName := regexp.MustCompile(`^[a-zA-Z0-9_-]+$`) | ||
| if !validBridgeName.MatchString(bridge) { | ||
| return fmt.Errorf("invalid bridge name %q: must contain only alphanumeric characters, hyphens, and underscores", bridge) | ||
| } | ||
|
|
||
| return nil | ||
| } | ||
|
|
||
| // validateFlowSpec validates that a flow specification is safe and non-empty. | ||
| func validateFlowSpec(flow string) error { | ||
| if flow == "" { | ||
| return fmt.Errorf("flow specification cannot be empty") | ||
| } | ||
|
|
||
| // Check for potentially dangerous characters that shouldn't appear in flow specs | ||
| // Flow specs should contain: alphanumeric, commas, equals, colons, periods, slashes, parentheses, brackets | ||
| // We explicitly block: semicolons, pipes, backticks, dollar signs, and other shell metacharacters | ||
| dangerousChars := regexp.MustCompile(`[;&|$` + "`" + `<>\\]`) | ||
| if dangerousChars.MatchString(flow) { | ||
| return fmt.Errorf("invalid flow specification: contains potentially dangerous characters") | ||
| } | ||
|
|
||
| return nil | ||
| } |
There was a problem hiding this comment.
Nice validations to prevent command injection.
| } | ||
|
|
||
| // Limit to MaxLines if specified | ||
| flows = limitLines(flows, in.MaxLines) |
There was a problem hiding this comment.
Shold we validate MaxLines is not negative? Since providing zero allows the option to get all the lines from the command execution, does it make sense to not allow negative values at all?
There was a problem hiding this comment.
This applies to the other tools with MaxLines option.
There was a problem hiding this comment.
sure, now restricting output with only 100 lines when in.MaxLines <= 0, i think that make sense (as same as sos commands).
| if len(in.AdditionalParams) > 0 { | ||
| cmd = append(cmd, in.AdditionalParams...) | ||
| } |
There was a problem hiding this comment.
Here the AdditionalParams are not validated which could result in command injection.
|
@arghosh93 will also review this PR |
|
|
||
| func (s *MCPServer) runCommand(ctx context.Context, req *mcp.CallToolRequest, namespacedName k8stypes.NamespacedNameParams, | ||
| commands []string) ([]string, error) { | ||
| _, result, err := s.k8sMcpServer.ExecPod(ctx, req, k8stypes.ExecPodParams{NamespacedNameParams: namespacedName, Command: commands}) |
There was a problem hiding this comment.
The container name is not taken as a parameter for the different ovs commands. Will it always be able to get the result from the 1st container in the pod?
There was a problem hiding this comment.
yes, container name is not passed from here, seems like mcp pod client ensures it always run on the 1st container https://github.com/ovn-kubernetes/ovn-kubernetes-mcp/blob/main/pkg/kubernetes/client/pods.go#L51.
There was a problem hiding this comment.
That's correct that the k8s layer is always choosing the 1st container, but the question is that will it always work? What I mean is that will any of the container be able to provide the response or should container name be an optional parameter for the LLM to provide?
There was a problem hiding this comment.
ah! ok, we could run ovs commands on any container for ovnkube-node pod, so i don't see a need for taking container name as an optional parameter for now.
This commit introduces new MCP tools for inspecting and debugging Open vSwitch (OVS) in ovn-kubernetes clusters: - ovs-list-br: List all OVS bridges on a pod - ovs-list-ports: List all ports attached to an OVS bridge - ovs-list-ifaces: List all interfaces attached to an OVS bridge - ovs-vsctl-show: Display comprehensive OVS configuration overview - ovs-ofctl-dump-flows: Dump and analyze OpenFlow rules from OVS bridges with optional filtering by pattern - ovs-appctl-dump-conntrack: Inspect connection tracking (conntrack) entries from the OVS datapath to debug stateful firewall rules and NAT - ovs-appctl-ofproto-trace: Simulate packet processing through the OpenFlow pipeline to troubleshoot flow rules and forwarding decisions Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
|
|
||
| mcp.AddTool(server, | ||
| &mcp.Tool{ | ||
| Name: "ovs-list-ports", |
There was a problem hiding this comment.
I think we are adding too many tools. We may end up having too many tools if we create separate tools for all kind of list operations.
In my opinion we can have 3 sets of tools. one each for ovs-vsctl, ovs-ofctl and ovs-appctl. Then we can pass separate commands to list or dump separate ovs entities. In this way, adding more capabilities would be easier in the future.
There was a problem hiding this comment.
these tools are implemented as per design: https://github.com/ovn-kubernetes/ovn-kubernetes/blob/master/docs/okeps/okep-5494-ovn-kubernetes-mcp-server.md#example-workflow-for-an-end-user. I think it doesn't harm anything, but if there is a need for minimizing these commands, we can revisit it later.
| Runs 'ovs-vsctl list-br' command and returns the names of all configured bridges. | ||
|
|
||
| Parameters: | ||
| - namespace: Kubernetes namespace of the OVS pod |
There was a problem hiding this comment.
I think instead of taking namespace and name of the POD, we can take name of the node as parameter. Then we can exec to the OVN POD running on that node and extract OVS data.
relying on name and namespace might need use of another tool in future to get name of the ovs POD for a certain node.
There was a problem hiding this comment.
This knowledge can be simply built within LLM Model itself and able to to run ovs command on the desired pod. so not seeing a necessity for taking node name as parameter.
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if result.Stderr != "" { |
There was a problem hiding this comment.
any kind of warnings may also be considered as error returned by the command. I think filtering warnings is a good idea.
There was a problem hiding this comment.
do you mean warning messages from result.Stderr ? can you explain it a bit more ?
There was a problem hiding this comment.
I hit an issue while working on a PR for kernel layer where CLI tool was returning a harmless warning along with stdout. But since the warning was being received by stderr, the whole function execution was failing and the LLM was not returning any data.
There was a problem hiding this comment.
ok @arghosh93 , i saw those handling with kernel layer, but i don't see that is happening with ovs commands which runs on ovnkube-node pod.
| - name: Name of the pod running OVS | ||
| - bridge: Name of the OVS bridge (e.g., "br-int") | ||
| - flow: Flow specification describing the packet to trace (e.g., "in_port=1,ip,nw_src=10.244.0.5,nw_dst=10.96.0.1") | ||
| - filter (optional): Regex pattern to filter trace output lines |
There was a problem hiding this comment.
can we expect user to provide a fully working regex? or instead we should take import, src/dst IP and port as parameter and make a regex to filter going forward?
There was a problem hiding this comment.
currently it takes regex string as input and throws an error if it is invalid, i hope LLM itself can help to build regex string.
| ) | ||
|
|
||
| // BridgeResult contains the list of OVS bridges found on a node. | ||
| type BridgeResult struct { |
There was a problem hiding this comment.
Most of the results are sets of string data. Can't we use a single type for all similar kinds of results?
There was a problem hiding this comment.
though it has list of strings in the result, having them in separate fields provide API clarity, semantic JSON field names etc.
tssurya
left a comment
There was a problem hiding this comment.
has two lgtms
merging even if I didn't review it myself.
This PR introduces four new MCP tools for inspecting and debugging OVS in ovn-kubernetes clusters:
with optional filtering by pattern
entries from the OVS datapath to debug stateful firewall rules and NAT
OpenFlow pipeline to troubleshoot flow rules and forwarding decisions
These tools enable AI-assisted debugging of OVS networking issues without requiring direct cluster access.
Signed-off-by: Periyasamy Palanisamy pepalani@redhat.com
Co-Authored-By: Claude noreply@anthropic.com
Example output: