Skip to content

Add sosreport MCP tools#2

Merged
tssurya merged 1 commit intoovn-kubernetes:mainfrom
kyrtapz:sosreport_mcp
Dec 4, 2025
Merged

Add sosreport MCP tools#2
tssurya merged 1 commit intoovn-kubernetes:mainfrom
kyrtapz:sosreport_mcp

Conversation

@kyrtapz
Copy link
Contributor

@kyrtapz kyrtapz commented Oct 29, 2025

Implements MCP tools for analyzing sosreports:

  • sos-list-plugins: Lists enabled sosreport plugins with their command counts
  • sos-list-commands: Returns all commands executed by a specific plugin
  • sos-search-commands: Searches for commands matching a pattern across all plugins
  • sos-get-command: Retrieves command output from sosreport with optional regex filtering
  • sos-search-pod-logs: Searches Kubernetes pod container logs with pattern matching

Assisted-By: Claude Sonnet 4.5
Signed-off-by: Patryk Diak pdiak@redhat.com

Example using claude code when asked about ovs drops:

...
● ovn-kubernetes - sos-get-command (MCP)(sosreport_path: "sosreport-ip-10-0-1-69-2025-10-29-mvjmpmc", filepath: "sos_commands/openvswitch/ovs-ofctl_dump-flows_br-ex", grep_pattern: "drop", max_lines: 200)
  ⎿ {                                                                                                                                                                                                                                                                                                                                                                                                                             
      "output": " cookie=0x304, duration=303.023s, table=0, n_packets=0, n_bytes=0, idle_age=303, priority=700,icmp,in_port=2,nw_dst=172.20.0.1,icmp_type=3,icmp_code=4 actions=drop\n cookie=0x304, duration=303.023s, table=0, n_packets=0, n_bytes=0, idle_age=303, priority=700,icmp,in_port=2,nw_dst=10.0.1.39,icmp_type=3,icmp_code=4 actions=drop\n cookie=0xdeff105, duration=304.192s, table=0, n_packets=0, n_bytes=0, i
    dle_age=304, priority=115,ip,in_port=2,nw_dst=172.30.0.0/16 actions=drop\n cookie=0xdeff105, duration=304.191s, table=0, n_packets=0, n_bytes=0, idle_age=304, priority=103,ip,in_port=2,nw_src=10.128.0.0/14 actions=drop\n cookie=0xdeff105, duration=304.190s, table=0, n_packets=0, n_bytes=0, idle_age=304, priority=9,in_port=2 actions=drop\n"
    }


● Yes, there are several drop flows with traffic hitting them:

  br-int (Integration Bridge):

  Most significant drops:
  - Table 43: 747 packets dropped (1,495,492 bytes) - default drop for unmatched traffic
  - Table 44: 288 packets dropped (786,822 bytes) - IP traffic from 02:95:90:03:29:85 (10.0.1.39)
  - Table 44: 250 packets dropped (34,637 bytes) - hairpin traffic (metadata=0x2, same source/dest)
  - Table 9: 193 packets dropped (15,186 bytes) - traffic marked for drop (reg0=0x8000, metadata=0x1)
  - Table 0: 2 packets dropped (176 bytes) - default drop for unmatched incoming traffic

  Security/anti-spoofing drops (no traffic):
  - ICMP packet-too-big and fragmentation-needed messages
  - Invalid source IPs (0.0.0.0/8, 127.0.0.0/8, 224.0.0.0/4, etc.)
  - TTL=0/1 packets
  - Multicast with wrong TTL
  - Broadcast MAC addresses
  - Router solicitation/advertisement ICMPv6

  br-ex (External Bridge):

  Active drops:
  - Priority 700: ICMP fragmentation needed for 172.20.0.1 and 10.0.1.39
  - Priority 115: Service network traffic (172.30.0.0/16) on port 2
  - Priority 103: Pod network traffic (10.128.0.0/14) on port 2
  - Priority 9: Default drop for port 2

  The largest drop count is in br-int table 43 with nearly 1.5MB of traffic being dropped.

Based on #1

@FrostGod
Copy link

FrostGod commented Oct 29, 2025

@kyrtapz , can we please wait for the first MR to be merged and try to get it done ASAP? its little confusing to review this.
and maybe lets also create open dashboard to see what tools each person is trying to add.
so that work won't be duplicated.

@kyrtapz
Copy link
Contributor Author

kyrtapz commented Oct 30, 2025

@kyrtapz , can we please wait for the first MR to be merged and try to get it done ASAP? its little confusing to review this.

Hey @FrostGod!, there is no intent of getting this one in before #1 merges. I am not sure why this PR is confusing but you can see I am the author of the last commit.

maybe lets also create open dashboard to see what tools each person is trying to add. so that work won't be duplicated.

That is a good idea I think @arkadeepsen was looking into it, Aurko can you confirm?

@arkadeepsen
Copy link
Contributor

That is a good idea I think @arkadeepsen was looking into it, Aurko can you confirm?

I have created all the issues that we are working on and assigned them to the respective individuals who are working on them.

@FrostGod
Copy link

FrostGod commented Oct 30, 2025

Hey @FrostGod!, there is no intent of getting this one in before #1 merges. I am not sure why this PR is confusing but you can see I am the author of the last commit.

cool, it was confusing in the sense, what all functionalities are in progress..., and how does the overall picture look like, what all layers is the MCP suppose to do,
but now with tasks been created i get the rough image

@arkadeepsen
Copy link
Contributor

@pperiyasamy will review the PR.

Copy link
Contributor

@pperiyasamy pperiyasamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried out sosreport tooling from claude, it works great. some comments.

}

// listPlugins returns a list of enabled plugins with their command counts
func listPlugins(sosreportPath string) (types.ListPluginsResult, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice plugin aggregation for the collected logs from sos report's manifest.json file !
what about the remaining logs collected outside of sos_commands directory.
For example need to analyze ovs-vswitchd and ovsdb-server.log logs from /var/log/openvswitch directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a rason to do that as they are already there as part of sos_commands: sos_commands/openvswitch/:

journalctl_--no-pager_--unit_openvswitch
journalctl_--no-pager_--unit_openvswitch-ipsec
journalctl_--no-pager_--unit_openvswitch-nonetwork
journalctl_--no-pager_--unit_ovs-configuration
journalctl_--no-pager_--unit_ovsdb-server
journalctl_--no-pager_--unit_ovs-vswitchd


var result types.SearchCommandsResult
if maxResults == 0 {
maxResults = 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any specific reason for restricting it with maxResults ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I want to have a reasonable limit so AI doesn't eat too much tokens, there is still an option to modify it if needed.

// Remove the prefix if exists
logPath = strings.TrimPrefix(logPath, "host/")

fullPath := filepath.Join(sosreportPath, logPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may also have rotated log files in .gz format, shouldn't that taken care of ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, I will try to look into that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


// Report contains the sosreport details
type Report struct {
Plugins map[string]PluginDetail `json:"plugins"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ok to ignore other parameters from the sosreport ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they will be skipped.

})
}

result.TotalCommands = totalCommands
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for my understanding: what is the reason for exposing CommandCount and TotalCommands via MCP tooling ? is that really needed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it hurts, it can help the model to narrow down the search criteria if there are too many results.

Copy link
Contributor

@arkadeepsen arkadeepsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following are some comments from my first pass of the PR. I'll do at least one more pass.

Comment on lines 42 to 49
// Always add sosreport tools (they work offline with sosreport path parameter)
sosreportServer, err := sosreportmcp.NewMCPServer()
if err != nil {
log.Fatalf("Failed to create sosreport MCP server: %v", err)
}
log.Println("Adding sosreport tools to OVN-K MCP server")
sosreportServer.AddTools(ovnkMcpServer)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add the sosreport MCP server to the offline mode. When the MCP server is deployed in-cluster, we would still be adding all the sosreport tools, which ideally shouldn't happen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok to make it offline-mode only initially but I can imagine a scenario where a singular MCP would be used for analyzing the sosreport and interacting with a cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be achieved by supporting another value (say both) for the mode flag. When this mode is chosen, both live-cluster and offline mode tools are added to the MCP server.

Comment on lines 45 to 67
file, err := os.Open(fullPath)
if err != nil {
return "", err
}

matches, err := readWithLimit(file, searchPattern, maxResults-totalMatches)
if err != nil {
file.Close()
return "", err
}

if len(matches) > 0 {
result.WriteString(fmt.Sprintf("\n=== %s ===\n", logPath))
result.WriteString(matches)
totalMatches += len(strings.Split(matches, "\n"))

if totalMatches >= maxResults {
result.WriteString(fmt.Sprintf("\n... (search truncated at %d results)\n", maxResults))
file.Close()
return result.String(), nil
}
}
file.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to create a separate function (or even a local function) so that the file.Close() can be added to defer? This would help in better maintainability of the code for future changes.

- sosreport_path (required): Path to extracted sosreport directory
- filepath (required): Relative filepath from sos-list-commands or sos-search-commands
- pattern (optional): Regex pattern to filter output lines
- max_lines (optional): Maximum lines to return (default: 1000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"plugins": [
{"name": "crio", "command_count": 15},
{"name": "openvswitch", "command_count": 187},
{"name": "neworkmanager", "command_count": 2}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{"name": "neworkmanager", "command_count": 2}
{"name": "networkmanager", "command_count": 2}

nit

Comment on lines 14 to 16
func NewMCPServer() (*MCPServer, error) {
return &MCPServer{}, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the error as a return type since only nil is being returned here? Do you feel we might need this later on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the error.

}

if maxResults == 0 {
maxResults = 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can create const for default max results here.

}

// AddTools registers sosreport tools with the MCP server
func (s *MCPServer) AddTools(server *mcp.Server) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add tool(s) for interacting with the artifacts from var/lib/ovn-ic/etc directory where OVN DBs are located along with the libovsdb log files.

Copy link
Contributor Author

@kyrtapz kyrtapz Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var/lib/ovn-ic/etc is openshift specific and to interact with the DBs we would have to introduce a DB client.
While I think this might be useful in the future I don't think it should be done as part of this PR, especially considering we might get significant overlap with must-gather mcp.
ovs logs are already available through journal commands.

return "", err
}

fullPath := filepath.Join(sosreportPath, relativeFilepath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relative path should be validated so that it doesn't contain patterns like ../, ../../, etc. for traversing file path outside of the sosreport path.

}

for _, cmd := range plugin.Commands {
result.Commands = append(result.Commands, types.CommandSummary{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every sosreport tools return the output with content and structuredContent which are the same contents. is this specific design? for example

{
  "content": [
    {
      "type": "text",
      "text": "{\"output\":\" cookie=0x304, duration=13656.724s, table=0, n_packets=0, n_bytes=0, idle_age=13656, priority=700,icmp,in_port=2,nw_dst=10.0.0.4,icmp_type=3,icmp_code=4 actions=drop\\n
...
    }
  ],
  "structuredContent": {
    "output": " cookie=0x304, duration=13656.724s, table=0, n_packets=0, n_bytes=0, idle_age=13656, priority=700,icmp,in_port=2,nw_dst=10.0.0.4,icmp_type=3,icmp_code=4 actions=drop\n
...
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great observation!
Is this specific to the sosreport though? I don't think I am doing anything different and I would just brush it as mcp-sdk quirk.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, it's not specific to sosreport. It should be mcp-sdk designation^^

}
}
if totalMatches == 0 {
return fmt.Sprintf("No matches found for pattern: %s\n", pattern), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we support case-insensitive fuzzy search?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The search is case sensitive but the model can easily make it insensitive with a proper regex filter.

@kyrtapz kyrtapz force-pushed the sosreport_mcp branch 2 times, most recently from 1a9e505 to 988eff5 Compare November 27, 2025 16:24
Copy link
Contributor

@arkadeepsen arkadeepsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the below comments mostly looks good.

Should we add a section in the README file to document how to run the MCP server in offline mode?

Comment on lines 52 to 277
func TestListPlugins(t *testing.T) {
tests := []struct {
name string
path string
wantError bool
wantPluginCount int
wantTotalCmds int
}{
{
name: "valid sosreport",
path: sosreportTestData,
wantError: false,
wantPluginCount: 3, // openvswitch, networking, container_log
wantTotalCmds: 3, // 2 from openvswitch, 1 from networking, 0 from container_log
},
{
name: "non-existent path",
path: "testdata/non-existent",
wantError: true,
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result, err := listPlugins(tt.path)
if tt.wantError {
if err == nil {
t.Errorf("listPlugins() expected error but got nil")
}
return
}
if err != nil {
t.Errorf("listPlugins() unexpected error = %v", err)
return
}

if len(result.Plugins) != tt.wantPluginCount {
t.Errorf("listPlugins() got %d plugins, want %d", len(result.Plugins), tt.wantPluginCount)
}

if result.TotalCommands != tt.wantTotalCmds {
t.Errorf("listPlugins() got %d total commands, want %d", result.TotalCommands, tt.wantTotalCmds)
}

// Verify plugin names
pluginNames := make(map[string]bool)
for _, p := range result.Plugins {
pluginNames[p.Name] = true
}
expectedPlugins := []string{"openvswitch", "networking", "container_log"}
for _, expected := range expectedPlugins {
if !pluginNames[expected] {
t.Errorf("listPlugins() missing expected plugin %q", expected)
}
}
})
}
}

func TestListCommands(t *testing.T) {
tests := []struct {
name string
path string
pluginName string
wantError bool
errorMsg string
wantCmdCount int
}{
{
name: "openvswitch plugin",
path: sosreportTestData,
pluginName: "openvswitch",
wantError: false,
wantCmdCount: 2,
},
{
name: "networking plugin",
path: sosreportTestData,
pluginName: "networking",
wantError: false,
wantCmdCount: 1,
},
{
name: "container_log plugin with no commands",
path: sosreportTestData,
pluginName: "container_log",
wantError: false,
wantCmdCount: 0,
},
{
name: "non-existent plugin",
path: sosreportTestData,
pluginName: "non-existent",
wantError: true,
errorMsg: "plugin \"non-existent\" not found",
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result, err := listCommands(tt.path, tt.pluginName)
if tt.wantError {
if err == nil {
t.Errorf("listCommands() expected error but got nil")
}
return
}
if err != nil {
t.Errorf("listCommands() unexpected error = %v", err)
return
}

if result.Plugin != tt.pluginName {
t.Errorf("listCommands() got plugin %q, want %q", result.Plugin, tt.pluginName)
}

if result.CommandCount != tt.wantCmdCount {
t.Errorf("listCommands() got %d commands, want %d", result.CommandCount, tt.wantCmdCount)
}

if len(result.Commands) != tt.wantCmdCount {
t.Errorf("listCommands() got %d command entries, want %d", len(result.Commands), tt.wantCmdCount)
}

// For plugins with commands, verify structure
if tt.wantCmdCount > 0 {
for _, cmd := range result.Commands {
if cmd.Exec == "" {
t.Errorf("listCommands() command has empty Exec field")
}
if cmd.Filepath == "" {
t.Errorf("listCommands() command has empty Filepath field")
}
}
}
})
}
}

func TestSearchCommands(t *testing.T) {
tests := []struct {
name string
path string
pattern string
maxResults int
wantError bool
wantMinMatches int
wantMaxMatches int
}{
{
name: "search for ovs commands",
path: sosreportTestData,
pattern: "ovs",
maxResults: 100,
wantError: false,
wantMinMatches: 2, // At least the 2 ovs commands
wantMaxMatches: 100,
},
{
name: "search for ip command",
path: sosreportTestData,
pattern: "ip.*show",
maxResults: 100,
wantError: false,
wantMinMatches: 1, // ip addr show
wantMaxMatches: 100,
},
{
name: "search with no matches",
path: sosreportTestData,
pattern: "nonexistent-pattern-xyz",
maxResults: 100,
wantError: false,
wantMinMatches: 0,
wantMaxMatches: 0,
},
{
name: "invalid regex pattern",
path: sosreportTestData,
pattern: "[invalid(",
maxResults: 100,
wantError: true,
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result, err := searchCommands(tt.path, tt.pattern, tt.maxResults)
if tt.wantError {
if err == nil {
t.Errorf("searchCommands() expected error but got nil")
}
return
}
if err != nil {
t.Errorf("searchCommands() unexpected error = %v", err)
return
}

if result.Total < tt.wantMinMatches {
t.Errorf("searchCommands() got %d matches, want at least %d", result.Total, tt.wantMinMatches)
}

if result.Total > tt.wantMaxMatches {
t.Errorf("searchCommands() got %d matches, want at most %d", result.Total, tt.wantMaxMatches)
}

if len(result.Matches) != result.Total {
t.Errorf("searchCommands() matches slice length %d doesn't match Total %d", len(result.Matches), result.Total)
}

// Verify match structure
for _, match := range result.Matches {
if match.Plugin == "" {
t.Errorf("searchCommands() match has empty Plugin field")
}
if match.Exec == "" {
t.Errorf("searchCommands() match has empty Exec field")
}
if match.Filepath == "" {
t.Errorf("searchCommands() match has empty Filepath field")
}
}
})
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests can be moved to commands_test.go file.

return "", err
}

if maxResults == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we validate maxResults so that negative values are not passed? The behavior of this function can differ from readWithLimit function because of how the limits are being checked:

https://github.com/ovn-kubernetes/ovn-kubernetes-mcp/pull/2/files#diff-96788fc8eaf4aeb1ec4e58457dfb8718493b392ed34cc9b39730bb4d75e694fcR58

https://github.com/ovn-kubernetes/ovn-kubernetes-mcp/pull/2/files#diff-68350f60f448c5e5211a8963dd46037e3ca299e582aa1bf4d725232503a53b1bR62

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the check to default to 100 if <= 0

}
}

if maxLines == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should maxLines in getCommandOutput function be validated for negative values? Related to https://github.com/ovn-kubernetes/ovn-kubernetes-mcp/pull/2/files#r2571904444

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the model will try to use negative values and I modified the defaulting, we should be good.

Implements MCP tools for analyzing sosreports:
  - sos-list-plugins: Lists enabled sosreport plugins with their command counts
  - sos-list-commands: Returns all commands executed by a specific plugin
  - sos-search-commands: Searches for commands matching a pattern across all plugins
  - sos-get-command: Retrieves command output from sosreport with optional regex filtering
  - sos-search-pod-logs: Searches Kubernetes pod container logs with pattern matching

Assisted-By: Claude Sonnet 4.5
Signed-off-by: Patryk Diak <pdiak@redhat.com>
Copy link
Contributor

@arkadeepsen arkadeepsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@pperiyasamy pperiyasamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tssurya tssurya merged commit 7a97731 into ovn-kubernetes:main Dec 4, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants