@ChuanFF (Contributor) commented Sep 24, 2025

Description

Currently, the Kubernetes service discovery implementation in APISIX incorrectly handles EndpointSlices, treating each individual slice as the complete set of endpoints for a service. This contradicts the official Kubernetes documentation.

According to the Kubernetes EndpointSlice documentation:

"For a given service there may be multiple EndpointSlice objects which must be joined to produce the full set of endpoints; you can find all of the slices for a given service by listing EndpointSlices in the service's namespace whose kubernetes.io/service-name label contains the service's name."

The correct behavior should be to combine all EndpointSlices sharing the same kubernetes.io/service-name label to form the complete endpoint set for a service.
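The joining rule quoted above can be sketched in plain Lua. This is only an illustration, not the code in this PR: `collect_addresses` is a hypothetical helper, the slice tables mimic the decoded EndpointSlice JSON (`metadata.labels`, `endpoints[].addresses`), and deduplicating addresses across slices is an assumption of the sketch.

```lua
-- Hypothetical helper (not part of APISIX): join every EndpointSlice that
-- belongs to one service into a single, sorted list of unique addresses.
local SERVICE_NAME_LABEL = "kubernetes.io/service-name"

local function collect_addresses(slices, namespace, service_name)
    local addresses = {}
    local seen = {}
    for _, slice in ipairs(slices) do
        local meta = slice.metadata or {}
        local labels = meta.labels or {}
        -- Only slices in the service's namespace carrying its label count.
        if meta.namespace == namespace
           and labels[SERVICE_NAME_LABEL] == service_name then
            for _, endpoint in ipairs(slice.endpoints or {}) do
                for _, address in ipairs(endpoint.addresses or {}) do
                    -- Assumption: the same address may appear in more than
                    -- one slice, so keep only the first occurrence.
                    if not seen[address] then
                        seen[address] = true
                        addresses[#addresses + 1] = address
                    end
                end
            end
        end
    end
    table.sort(addresses)
    return addresses
end
```

Calling `collect_addresses(slices, "default", "web")` returns the addresses of all slices labeled for `web`, regardless of how many slices the service is split into.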

Solution Approach

1. Add EndpointSlices Cache

  • Implemented endpoint_slices_cache to store all EndpointSlices with the structure:
    endpoint_slices_cache["k8s_id/namespace/k8s_service_name:port_name"] = {slice1, slice2, ...}
  • Any change to an EndpointSlice (add, modify, or delete) updates this cache
  • The cache is then used to assemble the complete node list for each service, and that list is written to shared memory
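A minimal sketch of how such a cache could behave, in plain Lua. It is not the implementation in this PR: keying the inner table by slice name, the helper names `update_slice`, `delete_slice`, and `assemble_nodes`, and the `host`/`port` node shape are all assumptions made for illustration, and the shared-memory write is left out.

```lua
-- Assumed layout: endpoint_slices_cache[service_key][slice_name] = node list,
-- where service_key looks like "k8s_id/namespace/k8s_service_name:port_name".
local endpoint_slices_cache = {}

-- Store (or overwrite) the nodes contributed by one slice.
local function update_slice(service_key, slice_name, nodes)
    local slices = endpoint_slices_cache[service_key]
    if not slices then
        slices = {}
        endpoint_slices_cache[service_key] = slices
    end
    slices[slice_name] = nodes
end

-- Drop one slice; remove the service entry once its last slice is gone.
local function delete_slice(service_key, slice_name)
    local slices = endpoint_slices_cache[service_key]
    if not slices then
        return
    end
    slices[slice_name] = nil
    if next(slices) == nil then
        endpoint_slices_cache[service_key] = nil
    end
end

-- Merge the nodes of every cached slice into one list, sorted by host
-- and then port so the result is stable across rebuilds.
local function assemble_nodes(service_key)
    local nodes = {}
    for _, slice_nodes in pairs(endpoint_slices_cache[service_key] or {}) do
        for _, node in ipairs(slice_nodes) do
            nodes[#nodes + 1] = node
        end
    end
    table.sort(nodes, function(a, b)
        if a.host ~= b.host then
            return a.host < b.host
        end
        return a.port < b.port
    end)
    return nodes
end
```

With this shape, an event for a single slice only touches that slice's entry, and the full node list for the service is rebuilt from whatever slices remain in the cache.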

2. Code Refactoring

  • Renamed variables to better align with Kubernetes EndpointSlices data structure terminology
  • Improved code readability and maintainability

3. Bug Fixes

  • Fixed incorrect handling of addresses field: The addresses field contains a list of IP strings, not objects with .ip attributes
  • Fixed node sorting logic: the loop over endpoint_buffer iterated one level too deep, so the extra nesting was removed:
    -- Before (incorrect):
    for _, ports in pairs(endpoint_buffer) do
        for _, nodes in pairs(ports) do
            core.table.sort(nodes, sort_nodes_cmp)
        end
    end
    
    -- After (correct):
    for _, nodes in pairs(endpoint_buffer) do
        core.table.sort(nodes, sort_nodes_cmp)
    end
    endpoint_buffer maps port names to node lists, so a single level of iteration is enough.
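Both fixes can be seen together in a small standalone sketch. It uses plain `table.sort` in place of `core.table.sort` so it runs outside APISIX; `build_endpoint_buffer`, the fallback key for unnamed ports, the `weight = 1` default, and this version of `sort_nodes_cmp` are hypothetical stand-ins rather than the PR's code.

```lua
-- Stand-in comparator (assumption): order nodes by host, then by port.
local function sort_nodes_cmp(left, right)
    if left.host ~= right.host then
        return left.host < right.host
    end
    return left.port < right.port
end

-- Build endpoint_buffer[port_name] = node list for one EndpointSlice.
local function build_endpoint_buffer(slice)
    local endpoint_buffer = {}
    for _, port in ipairs(slice.ports or {}) do
        local nodes = {}
        for _, endpoint in ipairs(slice.endpoints or {}) do
            for _, address in ipairs(endpoint.addresses or {}) do
                -- Each address is already an IP string; there is no
                -- address.ip field to read.
                nodes[#nodes + 1] = {
                    host = address,
                    port = port.port,
                    weight = 1, -- assumed default weight
                }
            end
        end
        -- Assumption: fall back to the port number when the port is unnamed.
        endpoint_buffer[port.name or tostring(port.port)] = nodes
    end

    -- Values are node lists, so one loop level is all that is needed.
    for _, nodes in pairs(endpoint_buffer) do
        table.sort(nodes, sort_nodes_cmp)
    end
    return endpoint_buffer
end
```

Iterating a second level, as the old loop did, would hand individual node tables to the sort function instead of node lists.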

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@dosubot (bot) added the labels size:XL (this PR changes 500-999 lines, ignoring generated files) and bug (something isn't working) on Sep 24, 2025.