Change pathz telemetry to be per policy rather than per path #1282

Open
wants to merge 3 commits into
base: master

Conversation

marcushines
Contributor

[Note: Please fill out the following template for your pull request. lines
tagged with "Note" can be removed from the template.]

[Note: Before this PR can be reviewed please agree to the CLA covering this
repo. Please also review the contribution guide -
https://github.com/openconfig/public/blob/master/doc/contributions-guide.md]

Change Scope

  • Refactor the pathz YANG telemetry to be per policy rather than per path.
    This has been requested by multiple vendors, since per-path telemetry is not needed and causes very large memory and state management overhead.

  • this is not backwards compatible but currently these counters should be in production

Platform Implementations

@OpenConfigBot

OpenConfigBot commented Apr 16, 2025

No major YANG version changes in commit 85df6a2

@marcushines marcushines force-pushed the hines branch 4 times, most recently from 68d114a to 696db95 Compare April 16, 2025 03:01
Contributor

@robshakir robshakir left a comment

module: openconfig-gnsi-pathz

  augment /oc-sys:system:
    +--ro gnmi-pathz-policies
       +--ro policies
          +--ro policy* [instance]
             +--ro instance    -> ../state/instance
             +--ro state
                +--ro instance?     enumeration
                +--ro version?      version
                +--ro created-on?   created-on
  augment /oc-sys:system/oc-sys-grpc:grpc-servers/oc-sys-grpc:grpc-server/oc-sys-grpc:state:
    +--ro gnmi-pathz-policy-version?      version
    +--ro gnmi-pathz-policy-created-on?   created-on
  augment /oc-sys:system/oc-sys-grpc:grpc-servers/oc-sys-grpc:grpc-server:
    +--ro gnmi-pathz-policy-counters
       +--ro paths
          +--ro path* [name]
             +--ro name     -> ../state/name
             +--ro state
                +--ro name?     string
                +--ro reads
                |  +--ro access-rejects?       oc-yang:counter64
                |  +--ro last-access-reject?   oc-types:timeticks64
                |  +--ro access-accepts?       oc-yang:counter64
                |  +--ro last-access-accept?   oc-types:timeticks64
                +--ro writes
                   +--ro access-rejects?       oc-yang:counter64
                   +--ro last-access-reject?   oc-types:timeticks64
                   +--ro access-accepts?       oc-yang:counter64
                   +--ro last-access-accept?   oc-types:timeticks64

LGTM, couple of editorial suggestions.

@marcushines marcushines requested a review from robshakir April 16, 2025 15:54
@dplore dplore moved this to Ready to discuss in OC Operator Review Apr 16, 2025
@earies
Contributor

earies commented Apr 16, 2025

For reference, the updated proposed tree:

module: openconfig-gnsi-pathz

  augment /oc-sys:system:
    +--ro gnmi-pathz-policies
       +--ro policies
          +--ro policy* [instance]
             +--ro instance    -> ../state/instance
             +--ro state
                +--ro instance?     enumeration
                +--ro version?      version
                +--ro created-on?   created-on
  augment /oc-sys:system/oc-sys-grpc:grpc-servers/oc-sys-grpc:grpc-server/oc-sys-grpc:state:
    +--ro gnmi-pathz-policy-version?      version
    +--ro gnmi-pathz-policy-created-on?   created-on
  augment /oc-sys:system/oc-sys-grpc:grpc-servers/oc-sys-grpc:grpc-server:
    +--ro policies
       +--ro policy* [name]
          +--ro name     -> ../state/name
          +--ro state
             +--ro name?     string
             +--ro reads
             |  +--ro access-rejects?       oc-yang:counter64
             |  +--ro last-access-reject?   oc-types:timeticks64
             |  +--ro access-accepts?       oc-yang:counter64
             |  +--ro last-access-accept?   oc-types:timeticks64
             +--ro writes
                +--ro access-rejects?       oc-yang:counter64
                +--ro last-access-reject?   oc-types:timeticks64
                +--ro access-accepts?       oc-yang:counter64
                +--ro last-access-accept?   oc-types:timeticks64

Some comments to consider:

  1. I would like to solicit thoughts around the last comment in the potential gRPC server subtree refactoring here: Refactor gRPC service modeling #1096 (comment)
  2. Re: the comment above, I would foresee such services possibly being split into a few subtrees, classified by top-level containers that align directly with the service intent, e.g.
     a. Global gNSI-related items located under a path such as: /system/gnsi or /system/aaa/gnsi
     b. Per-NBI state/stats under its respective service endpoint: /system/grpc-servers/grpc-server/services/gnsi
  3. In this model we have a list of policy instances with version and created-on, which is good (I'd probably suggest an anchor point such as the above instead), but we also have augments to grpc-server state with version and created-on. Is this not global to the element rather than "per server"? In theory you could have multiple listening servers for gNSI services, but the policy is global tmk
  4. If we consider audit trails, we now have a list of policies with r/w accept/reject counters associated with each policy. I believe this is incomplete, as what you really want to know is, given a policy, which users had hits against which rule, e.g. something more like
  +--ro gnsi
     +--ro pathz
        +--ro policies
           +--ro policy* [instance]
              +--ro instance      enumeration
              +--ro version?      version
              +--ro created-on?   created-on
              +--ro users
                 +--ro user* [name]
                    +--ro name     string
                    +--ro rules
                       +--ro rule* [id]
                          +--ro id          string
                          +--ro counters
                             +--ro matches?   oc-yang:counter64

@marcushines
Contributor Author

gnsi pathz only applies to gnmi... what other server do you think it would relate to?
as far as 4. we don't care about which users are doing what - that is what acctz is for

@earies
Contributor

earies commented Apr 16, 2025

gnsi pathz only applies to gnmi... what other server do you think it would relate to?

  augment /oc-sys:system/oc-sys-grpc:grpc-servers/oc-sys-grpc:grpc-server/oc-sys-grpc:state:
    +--ro gnmi-pathz-policy-version?      version
    +--ro gnmi-pathz-policy-created-on?   created-on

So the above should only be visible under a grpc-server that has the gNMI service, correct? Such a scenario imo would fall under something like (2.b) above, where service state could be bound via when clauses to the services under a given server.

But I was more getting at the fact that version and created-on are applied globally to the element and not per grpc-server instance. If one wanted to, say, create 2 listening grpc-servers, 1 in the global RIB and 1 in management, they would share the same global policy, so shouldn't the applied policy be outside these servers and better suited under something like /system/... ?

as far as 4. we don't care about which users are doing what - that is what acctz is for

ok, if we scratch the users out of this, should we not pair up the nomenclature to the proto (e.g. id vs. name) and count "matches" and "last-match timestamp" against a given AuthorizationRule rather? The "accept" and "reject" counters seem unnecessary here since the action in the rule already specifies this
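
To illustrate the per-rule idea above, here is a minimal sketch that counts only "matches" per rule id and derives accept/reject totals from each rule's action. The `Rule` class, the rule ids `r1`/`r2`, and the `"permit"`/`"deny"` strings are hypothetical names for illustration; they are not taken from the pathz proto or from this PR.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    id: str      # hypothetical rule identifier
    action: str  # "permit" or "deny" (illustrative values)


# Hypothetical two-rule policy.
rules = {r.id: r for r in (Rule("r1", "permit"), Rule("r2", "deny"))}

# One counter per rule id; no separate accept/reject counters are stored.
matches = Counter()

for hit in ("r1", "r1", "r2"):  # three evaluated requests
    matches[hit] += 1


def total(action: str) -> int:
    """Sum the match counters of every rule with the given action."""
    return sum(n for rid, n in matches.items() if rules[rid].action == action)


accepts = total("permit")
rejects = total("deny")
```

Under this layout the accept and reject totals are a derived view, which is the redundancy the comment points at.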

@LimeHat

LimeHat commented Apr 17, 2025

+1 to the per policy per server vs per policy question.

gnsi pathz only applies to gnmi... what other server do you think it would relate to?

Each node can have multiple gnmi-servers (for instance, when an operator has multiple mgmt vrfs, inband, oob, etc). This is quite common.

}
container reads {
description
"The counter were collected while
performing a read operation on the
schema path.";
performing read operations on the

@LimeHat LimeHat Apr 17, 2025

what, exactly, is a read operation on the policy?

Is it a single RPC allowed/denied by the policy (regardless of a number of paths authorized, e.g. a GetRequest with 10 allowed paths will be counted as +1?)

Member

Suggested description:

"The number of gnmi.Get and gnmi.Subscribe RPCs that match the policy and were accepted or rejected."

Contributor

By "accepted/rejected" do you mean "permitted/denied by the policy"?
It's not really that binary: a gNMI read operation can be a mix of permitted and denied paths.
For example, if the policy is "permit read /a and deny read /a/b", and /a/b exists in YANG, then a request to read /a will be partially permitted, partially denied.
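
The partial-permit case can be sketched as follows. This assumes the most specific (longest) matching rule wins and that an unmatched path is denied; both are assumptions made for illustration, as are the leaf names `/a/x`, `/a/y` and `/a/b/c`.

```python
# Hypothetical policy from the example: permit read /a, deny read /a/b.
RULES = [("/a", "permit"), ("/a/b", "deny")]


def covers(rule_path: str, path: str) -> bool:
    """True when rule_path equals path or is an ancestor of it."""
    return path == rule_path or path.startswith(rule_path + "/")


def decide(path: str) -> str:
    """Action of the longest matching rule; deny when nothing matches."""
    candidates = [(len(p), action) for p, action in RULES if covers(p, path)]
    if not candidates:
        return "deny"
    return max(candidates, key=lambda c: c[0])[1]


# Hypothetical leaves that exist below /a in the schema.
leaves = ["/a/x", "/a/y", "/a/b/c"]

# A single read of /a expands to all leaves, each decided on its own.
outcome = {leaf: decide(leaf) for leaf in leaves}
```

One request therefore yields two permitted leaves and one denied leaf, so a single accept-or-reject counter increment cannot describe it without a stated convention.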

uses counters;
}
container writes {
description
"The counter were collected while
performing a write operation on the
schema path.";
performing write operations on the

Same question as for the reads: please specify exactly what should be counted here (number of Set requests, number of individual operations in each request, etc.)

@rnhaddad

+1 to the per policy per server vs per policy question.

I think the current model assumes that an instance of the service (e.g. pathz) is per grpc-server. In some cases this may make sense, but in others it does not. For example: pathz is supposed to rotate the policy for the node and not for the server it is tied to, regardless of how many grpc-server instances are configured (note that one may still configure gNMI across multiple servers for different reasons - the model allows it).

@@ -109,62 +115,44 @@ module openconfig-gnsi-pathz {
description
"A collection of counters collected by the gNSI.pathz module.";

container gnmi-pathz-policy-counters {
container policies {

@charanjith-anet charanjith-anet Apr 25, 2025

This should be called 'pathz' so that the yang tree looks as follows:

/system/grpc-servers/grpc-server
    +--ro pathz
       +--ro policy* [name]
          +--ro name     -> ../state/name
          +--ro state

Member

@dplore dplore Apr 25, 2025

Building on these suggestions:

Lists need to follow the OC style guide (lists/list), and I strongly recommend using a counters container for the counter leaves, following the precedent in most of the other OC models.

  augment /oc-sys:system/oc-sys-grpc:grpc-servers/oc-sys-grpc:grpc-server:
    +--ro pathz
       +--ro policies
          +--ro policy* [name]
             +--ro name     -> ../state/name
             +--ro state
                +--ro name?                string
                +--ro last-read-reject?    oc-types:timeticks64
                +--ro last-read-accept?    oc-types:timeticks64
                +--ro last-write-reject?   oc-types:timeticks64
                +--ro last-write-accept?   oc-types:timeticks64
                +--ro counters
                   +--ro read-rejects?    oc-yang:counter64
                   +--ro read-accepts?    oc-yang:counter64
                   +--ro write-rejects?   oc-yang:counter64
                   +--ro write-accepts?   oc-yang:counter64

Contributor

See previous comments - this should not be "per gRPC server instance" so I believe the anchor point is incorrect first and foremost.

The gNSI suite should be looked at holistically; gRPC APIs are one way to interact with it (there could very well be others, either native or via other OC methods).

But wherever the anchor point ends up - +1 to the last comment in order to follow the encapsulation of lists inside containers

Member

I see, your suggestion could perhaps be stated as reserving the /system/grpc-servers/grpc-server subtree to paths which are generic across any/all grpc servers (which means it will only have things like a connection list and packet/byte counters).

The anchor for pathz can be at the system level since pathz is a "system-wide" entity.

  augment /oc-sys:system/:
    +--ro pathz
       +--ro policies
          +--ro policy* [name]
             +--ro name     -> ../state/name
             +--ro state
                +--ro name?                string
                +--ro last-read-reject?    oc-types:timeticks64
                +--ro last-read-accept?    oc-types:timeticks64
                +--ro last-write-reject?   oc-types:timeticks64
                +--ro last-write-accept?   oc-types:timeticks64
                +--ro counters
                   +--ro read-rejects?    oc-yang:counter64
                   +--ro read-accepts?    oc-yang:counter64
                   +--ro write-rejects?   oc-yang:counter64
                   +--ro write-accepts?   oc-yang:counter64


@LimeHat LimeHat Apr 26, 2025

@earies

so I believe the anchor point is incorrect first and foremost.

It is not necessarily incorrect, but there's a mismatch between the PR description and the anchor point.

You can apply the same policy to multiple servers, but still implement per-server counters, especially if (as @dplore suggested) the purpose of this PR is to count number of accepted/denied RPCs (which hasn't been confirmed yet).

After changes in openconfig/gnsi/pull/219, and combined with the idea of doing per-RPC counters, we are essentially in the authz-like territory which already implements per-server counters
https://github.com/openconfig/public/blob/master/release/models/gnsi/openconfig-gnsi-authz.yang

Which kind of raises the question: do we really need all these separate counters? We might as well just do combined per-grpc-server statistics for both authz and pathz rejects.

@brianneville
Contributor

Nice to see the move away from xpaths :)

I have a few initial thoughts on this proposal:

  1. On the notion of per-policy things.
    The pathz spec doesn't deal with multiple policies.
    From the point of view of the spec, there is only one active policy on the system (unfinalized rotations can introduce the sandbox policy, but it doesn't make sense to track counters for the sandboxed policy, as this is not actually doing enforcement).
    The decision of "what happens to a policy after it has been rotated away and is no longer active" (i.e. is it stored on the device somewhere? is it deleted/overwritten?) is implementation-specific, not defined in the spec.
    In either case however, there's no reason why a non-active policy should ever sporadically become useful again, as the device can only have a new policy imposed via the Rotate RPC - it cannot be instructed to pick up a policy file from disk.
    Is the idea of this PR that the YANG tree should be keeping a record of each distinct policy that was ever rotated onto the system and then track counters for each one?

    In that case we would only update the list entry corresponding to the most recent policy, and we would also have a bunch of stale entries in the list corresponding to the previous policies.
    I don't know if I can see the value in keeping stale list entries around. For example, if you're interested in seeing information about historical accesses to the system, then accounting logs (acctz) would be a much more useful source of information, as those logs would include the gNMI path requested, timestamp, and information about the client.

    To me it would make more sense if we only tracked counters for the active policy instance (i.e. a container instead of a list), and then as soon as a rotation occurred we just reset the counters to 0 and start over.

  2. how should policies be named/identified?
    Using the version/created-on timestamp to identify it is not a good option imo, as it's not guaranteed that the version/created-on are updated every time the policy is changed (the gNSI client could set the force_overwrite field to forcibly update a policy without changing the version: https://github.com/openconfig/gnsi/blob/fef60085e9600881689196b52cff5dd62aa1a4c1/pathz/pathz.proto#L128 )
    If we were to use the version/created-on to identify a policy, then we would either:

    i). need to reset the counters associated with a policy once a Rotate RPC completes, so that they reflect the policy changing (this aligns more with what I was saying in point 1 above)
    or
    ii). need to accept that the counters for a particular policy may be an accumulation of any counters associated with past policies that have now been force-overwritten.

    The other option would be that the server delegates a name to the policy (e.g. monotonically incrementing version number) - and then either the gNMI client or the YANG schema would need to find some way to associate that with version/created-on information.
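
A minimal sketch of the "single active policy, reset on rotation" idea from points 1 and 2(i) above. The `ActivePolicy` class and its `rotate` method are hypothetical and only model the counter lifecycle; the counter names follow the tree proposed earlier in this thread. The reset is unconditional because, per point 2, a forced overwrite may leave version/created-on unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Counters:
    read_accepts: int = 0
    read_rejects: int = 0
    write_accepts: int = 0
    write_rejects: int = 0


@dataclass
class ActivePolicy:
    """State for the one active policy: a container rather than a list."""
    version: str
    created_on: int
    counters: Counters = field(default_factory=Counters)

    def rotate(self, version: str, created_on: int) -> None:
        # Reset on every completed rotation, even if the version is
        # unchanged, so counters never mix two different policies.
        self.version = version
        self.created_on = created_on
        self.counters = Counters()


policy = ActivePolicy(version="v1", created_on=1000)
policy.counters.read_accepts += 3

# Hypothetical forced overwrite: same version and timestamp, new content.
policy.rotate(version="v1", created_on=1000)
```

With this shape there are no stale list entries to manage, at the cost of losing counter history across rotations.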

@LimeHat

LimeHat commented Apr 26, 2025

Is the idea of this PR that the YANG tree should be keeping a record of each distinct policy that was ever rotated onto the system and then track counters for each one?

Considering that the state is ephemeral, that doesn't seem particularly useful either.

I'd vote for simplification (count things only for one active policy; TBD with or without per-server granularity)

@jsterne

jsterne commented Apr 28, 2025

Isn't the 'read-rejects' counter a potential security issue? It feels like it could be used as a probe to discover information about the configuration or state that the reader isn't supposed to know. E.g. try reading a particular list to see if any entries are in it (reading some ancestor container), or trying a specific key.
In other access control mechanisms I've seen, items that the user doesn't have permission to see are just silently filtered out.

@jsterne

jsterne commented Apr 28, 2025

There is a mention of Subscribe RPC above. But is access control actually applied at subscription time or is it conceptually applied on the notification data stream? For example: if a user has permission to access interface "foo" but not interface "bar", but interface "bar" doesn't exist yet, and the user subscribes to "all interfaces" (which is only foo at the moment), what happens when interface "bar" is later added to the config? So I'm not sure that a counter for 'rejects' should even apply to Subscribe?

@LimeHat

LimeHat commented Apr 28, 2025

But is access control actually applied at subscription time or is it conceptually applied on the notification data stream?

both, depending on the policy and the subscription content.

see the recent discussion in openconfig/gnsi#219

if a user has permission to access interface "foo" but not interface "bar", but interface "bar" doesn't exist yet, and the user subscribes to "all interfaces" (which is only foo at the moment), what happens when interface "bar" is later added to the config? So I'm not sure that a counter for 'rejects' should even apply to Subscribe?

after the latest changes, this subscription will be rejected (regardless of the presence of interface "bar"). [assuming you don't have a blanket /interfaces allow in your policy, only /interfaces/interface[name=foo] is allowed]
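
A simplified sketch of the behaviour described above, assuming a subscription is admitted only when some permit rule is equal to, or an ancestor of, the subscribed path. This is an illustration of the statement in this comment, not a reproduction of the rules in openconfig/gnsi#219; the function names are hypothetical.

```python
# Paths are tuples of path elements; the only permit rule is the one
# from the example: /interfaces/interface[name=foo].
PERMITTED = [("interfaces", "interface[name=foo]")]


def is_prefix(prefix: tuple, path: tuple) -> bool:
    """True when prefix equals path or is an ancestor of it."""
    return path[:len(prefix)] == prefix


def subscription_allowed(path: tuple) -> bool:
    """Admit the subscription only if a permit rule covers the whole path."""
    return any(is_prefix(rule, path) for rule in PERMITTED)


# Subscribing to all interfaces is not covered by the narrower rule.
all_interfaces = subscription_allowed(("interfaces",))

# Subscribing below the permitted list entry is covered.
foo_state = subscription_allowed(("interfaces", "interface[name=foo]", "state"))
```

In this sketch the decision depends only on the policy and the subscribed path, so it is the same whether or not interface "bar" exists.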

@dplore
Member

dplore commented May 13, 2025

Reviewed in the May 13, 2025 OC Operators meeting. Moving to "waiting for author" status so @marcushines can respond to the comments above. One question: is the goal to count accepted/rejected RPCs or paths? (i.e., how should an RPC with 10 paths, of which 9 are accepted and 1 is rejected, be counted?)
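
To make the open question concrete, the sketch below counts the same request under two possible conventions. Neither convention is confirmed by the PR; the rule "an RPC counts as rejected if any path is denied" is an assumption chosen for illustration.

```python
# One RPC carrying 10 paths: 9 permitted, 1 denied.
decisions = ["permit"] * 9 + ["deny"]

# Convention A: count every path individually.
per_path = {
    "accepts": decisions.count("permit"),
    "rejects": decisions.count("deny"),
}

# Convention B: count the RPC once, rejected if any path is denied
# (assumed rule, not taken from the PR).
all_permitted = all(d == "permit" for d in decisions)
per_rpc = {
    "accepts": int(all_permitted),
    "rejects": int(not all_permitted),
}
```

Convention A reports 9 accepts and 1 reject, while convention B reports 0 accepts and 1 reject for the same request, so the leaf descriptions need to state which one is intended.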

@dplore dplore moved this from Ready to discuss to Waiting for author in OC Operator Review May 13, 2025
Projects
Status: Waiting for author
10 participants