Add SCEP certificate enrollment and 802.1x port-based network access control (PNAC) #5691
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -604,6 +604,102 @@ There are two levels of errors: | |
| - A particular management port could not be used to reach the controller. In that case | ||
| the `ErrorInfo` for the particular `DevicePort` is set to indicate the error and timestamp. | ||
|
|
||
| ## Port-Based Network Access Control (802.1X) and SCEP Certificate Enrollment | ||
|
|
||
| EVE supports IEEE 802.1X Port-Based Network Access Control (PNAC), which lets network switches | ||
| restrict port-level access until the device authenticates with a valid certificate. | ||
| 802.1X operates at Layer 2 of the network stack: a switch port starts in an unauthorized state | ||
| and grants full network access only after the connected device (the supplicant) successfully | ||
| authenticates against an authentication server (typically a RADIUS server) via the switch | ||
| (the authenticator). | ||
|
|
||
| To obtain the certificate required for authentication, EVE implements SCEP (Simple Certificate | ||
| Enrollment Protocol), a protocol designed for automated certificate enrollment from | ||
| a Certificate Authority (CA). SCEP allows a device to generate a key pair, submit | ||
| a Certificate Signing Request (CSR) to a SCEP server, and receive a signed certificate | ||
| in return. | ||
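The key-pair and CSR step described above can be sketched with the Go standard library alone. This is a minimal illustration, not the code from this PR: the function names `buildCSR` and `parseCSRSubject`, the ECDSA P-256 key type, and the `example-device` common name are all assumptions chosen for the example. In a real SCEP exchange the CSR is additionally wrapped in a PKCS#7 message before being sent to the server, which this sketch does not cover.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
)

// buildCSR generates a fresh ECDSA P-256 key pair and returns a PEM-encoded
// PKCS#10 certificate signing request for the given common name.
// Key type and subject are illustrative assumptions, not values taken from EVE.
func buildCSR(commonName string) ([]byte, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.CertificateRequest{
		Subject:            pkix.Name{CommonName: commonName},
		SignatureAlgorithm: x509.ECDSAWithSHA256,
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, nil, err
	}
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	return csrPEM, key, nil
}

// parseCSRSubject decodes a PEM-encoded CSR, verifies its self-signature,
// and returns the subject common name.
func parseCSRSubject(csrPEM []byte) (string, error) {
	block, _ := pem.Decode(csrPEM)
	if block == nil {
		return "", fmt.Errorf("no PEM block found")
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return "", err
	}
	if err := csr.CheckSignature(); err != nil {
		return "", err
	}
	return csr.Subject.CommonName, nil
}

func main() {
	csrPEM, _, err := buildCSR("example-device")
	if err != nil {
		fmt.Println("CSR generation failed:", err)
		return
	}
	cn, err := parseCSRSubject(csrPEM)
	if err != nil {
		fmt.Println("CSR parsing failed:", err)
		return
	}
	fmt.Println("CSR subject CN:", cn)
}
```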
|
|
||
| The 802.1X supplicant is implemented using [wpa_supplicant](https://w1.fi/wpa_supplicant/) | ||
| with EAP-TLS as the authentication method. The SCEP client is implemented using | ||
| the [github.com/smallstep/scep](https://github.com/smallstep/scep) Go library. | ||
|
|
||
| ### Bootstrapping workflow | ||
|
|
||
| The full workflow from an unauthenticated device to an authenticated network port is: | ||
|
Member
Will this be the default workflow for every port, or just for the ones we marked as PNAC-required?
Contributor
Author
Just for those with PNAC enabled.
Member
Do we notify if a "non-PNAC enabled" port has no connectivity because it is most likely in a PNAC-enabled network?
Contributor
Author
Detection of 802.1X is not included in this PR and may be added as a future enhancement, as it was not required for the current scope.
Member
Makes sense, I was just poking around the use case, thanks. |
||
|
|
||
| 1. **DHCP with vendor class identification**: The device sends a DHCP request that includes | ||
| a Vendor Class Identifier (DHCP Option 60) set to `LFEDGE-EVE`. This identifies the device | ||
| as running EVE OS to the network infrastructure. | ||
|
|
||
| 2. **Non-authenticated VLAN access**: The network switch places the port into a non-authenticated | ||
| (bootstrap) VLAN. Because the switch detects the EVE OS vendor class identifier, it allows | ||
| the device to reach the controller and fetch the network configuration including the SCEP | ||
| enrollment profile. This step is critical for bootstrapping: the device needs connectivity | ||
| to obtain the certificate it will later use for authentication. | ||
|
|
||
| 3. **SCEP certificate enrollment**: The device follows the SCEP profile received from | ||
| the controller to enroll a certificate. It can communicate with the SCEP server in one | ||
| of two ways: | ||
| - **Directly**: The device contacts the SCEP server URL specified in the profile. | ||
| - **Via controller proxy**: The device routes SCEP requests through a controller-provided | ||
| SCEP proxy (essentially an HTTP proxy), which is useful when the SCEP server is not | ||
| directly reachable from the bootstrap VLAN. | ||
|
|
||
| 4. **802.1X port authentication**: Once the certificate is enrolled, the device uses it | ||
| to authenticate the port via 802.1X EAP-TLS. Upon successful authentication, the switch | ||
| moves the port to the authenticated VLAN, granting full network access. | ||
|
Comment on lines +648 to +650
Contributor
Does this mean that the port will be part of the unauthenticated VLAN while 802.1X is doing multiple round trips to authenticate? Does DHCP get triggered `pnac.dhcp.reacquire.delay` after the 802.1X exchange succeeds? If the port is connected to no VLAN while 802.1X is in progress, there will be fewer concerns about needing to delay the DHCP request.
Contributor
Author
First we need an IP address from the unauthenticated VLAN to access the cloud, get the config, and enroll a certificate from a SCEP server.
Contributor
Author
I rewrote this and replaced the delay interval, which was quite fragile, with |
||
|
|
||
| 5. **DHCP reacquisition**: After the port authentication state changes, the device retries | ||
| the DHCP request with exponential backoff (2s, 4s, 8s, ...) to obtain an IP address from | ||
| the authenticated VLAN. Retries continue until the IP subnet changes (indicating the VLAN | ||
| transition completed) or the configured maximum number of retries | ||
| ([`pnac.dhcp.reacquire.max.retries`](CONFIG-PROPERTIES.md), default 4) is reached. | ||
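The retry schedule in step 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in this PR: `reacquireDelays`, `reacquire`, and `noWait` are hypothetical names, the wait is assumed to happen before each attempt, and the `subnetChanged` callback stands in for whatever check detects that the IP subnet changed.

```go
package main

import (
	"fmt"
	"time"
)

// reacquireDelays returns the wait before each DHCP reacquire attempt:
// 2s, 4s, 8s, ... doubling every time, for at most maxRetries attempts.
func reacquireDelays(maxRetries int) []time.Duration {
	if maxRetries < 0 {
		maxRetries = 0
	}
	delays := make([]time.Duration, 0, maxRetries)
	delay := 2 * time.Second
	for i := 0; i < maxRetries; i++ {
		delays = append(delays, delay)
		delay *= 2
	}
	return delays
}

// reacquire waits, retries, and stops as soon as subnetChanged reports true
// or the retry budget is exhausted. The wait function is injected so that
// callers (and tests) can replace real sleeping.
func reacquire(maxRetries int, wait func(time.Duration), subnetChanged func() bool) (attempts int, ok bool) {
	for _, d := range reacquireDelays(maxRetries) {
		wait(d)
		attempts++
		if subnetChanged() {
			return attempts, true
		}
	}
	return attempts, false
}

// noWait skips sleeping; used here only to keep the example fast.
func noWait(time.Duration) {}

func main() {
	// With the default of 4 retries the schedule is 2s, 4s, 8s, 16s (30s in total).
	fmt.Println(reacquireDelays(4))

	calls := 0
	attempts, ok := reacquire(4, noWait, func() bool {
		calls++
		return calls == 3 // pretend the subnet changes on the third attempt
	})
	fmt.Println(attempts, ok)
}
```

Setting the retry count to 0 yields an empty schedule, which matches the documented way of disabling reacquisition.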
|
|
||
| ### Configuration | ||
|
|
||
| PNAC and SCEP are configured through the controller using the device API: | ||
|
|
||
| - **SCEP profiles** are defined in `EdgeDevConfig.ScepProfiles` and specify the SCEP server URL, | ||
| whether to use the controller proxy, a challenge password (encrypted), trusted CA certificates, | ||
| and CSR parameters (subject DN, SANs, key type, hash algorithm, renewal period). | ||
|
|
||
| - **PNAC configurations** are defined in `EdgeDevConfig.Pnacs`, each referencing a network adapter | ||
| and a SCEP profile by logical name. They specify the EAP method (currently EAP-TLS) and | ||
| an optional EAP identity. If no EAP identity is configured, EVE derives the identity from | ||
| the enrolled certificate, preferring the subject common name (CN), or the SAN URI if CN is absent. | ||
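The identity fallback described above can be sketched like this. It is an illustration only: `deriveEAPIdentity` and `exampleCert` are hypothetical names, and picking the first SAN URI when several are present is an assumption, since the text does not say which one is used.

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"net/url"
)

// deriveEAPIdentity returns the configured identity if one is set; otherwise
// it falls back to the certificate subject CN, then to a SAN URI.
// Using the first SAN URI is an assumption made for this sketch.
func deriveEAPIdentity(configured string, cert *x509.Certificate) (string, error) {
	if configured != "" {
		return configured, nil
	}
	if cert == nil {
		return "", fmt.Errorf("no enrolled certificate available")
	}
	if cn := cert.Subject.CommonName; cn != "" {
		return cn, nil
	}
	if len(cert.URIs) > 0 && cert.URIs[0] != nil {
		return cert.URIs[0].String(), nil
	}
	return "", fmt.Errorf("certificate has neither a subject CN nor a SAN URI")
}

// exampleCert builds an in-memory (unsigned) certificate structure that
// carries only the fields the sketch inspects.
func exampleCert(cn, sanURI string) *x509.Certificate {
	cert := &x509.Certificate{Subject: pkix.Name{CommonName: cn}}
	if sanURI != "" {
		if u, err := url.Parse(sanURI); err == nil {
			cert.URIs = append(cert.URIs, u)
		}
	}
	return cert
}

func main() {
	id, err := deriveEAPIdentity("", exampleCert("", "urn:example:device-1"))
	fmt.Println(id, err)
}
```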
|
|
||
| Relevant [configuration properties](CONFIG-PROPERTIES.md): | ||
|
|
||
| | Property | Default | Description | | ||
| |---|---|---| | ||
| | `scep.retry.interval` | 300s (5 min) | Interval between retry attempts for failed or pending SCEP enrollments | | ||
| | `pnac.dhcp.reacquire.max.retries` | 4 | Max DHCP reacquire retries (with exponential backoff) after 802.1X authentication state change. Set to 0 to disable | | ||
| | `dhcp.enable.vendorclassid` | true | Enables sending DHCP Vendor Class Identifier (Option 60) as `LFEDGE-EVE` | | ||
|
|
||
| ### Certificate lifecycle | ||
|
|
||
| The enrolled certificate is stored on the device along with its private key (kept in the vault | ||
|
Contributor
Does "the vault" mean /persist/vault? Can the private key be needed immediately after a reboot to (re)authenticate over 802.1X?
Contributor
Author
Yes. Port authentication is needed for application connectivity. |
||
| for protection). EVE monitors the certificate's validity and automatically initiates renewal | ||
| when the configured percentage of the certificate's lifetime has elapsed | ||
| (controlled by `RenewPeriodPercent` in the CSR profile). If the SCEP server or CSR profile | ||
| configuration changes, EVE will re-enroll the certificate against the new parameters. | ||
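The renewal trigger can be expressed as a small calculation over the certificate validity window. The sketch below is an assumption-laden illustration, not the PR's code: `renewAt`, `needsRenewal`, and `day` are hypothetical names, and the percentage is interpreted exactly as the paragraph states, as the share of the lifetime that must elapse before renewal starts.

```go
package main

import (
	"fmt"
	"time"
)

// renewAt returns the instant at which renewPercent of the certificate
// lifetime (notBefore..notAfter) has elapsed. Values outside 0..100 are clamped.
func renewAt(notBefore, notAfter time.Time, renewPercent int) time.Time {
	if renewPercent < 0 {
		renewPercent = 0
	}
	if renewPercent > 100 {
		renewPercent = 100
	}
	lifetime := notAfter.Sub(notBefore)
	// Divide before multiplying so multi-year lifetimes cannot overflow int64.
	return notBefore.Add(lifetime / 100 * time.Duration(renewPercent))
}

// needsRenewal reports whether renewal should start at time now.
func needsRenewal(now, notBefore, notAfter time.Time, renewPercent int) bool {
	return !now.Before(renewAt(notBefore, notAfter, renewPercent))
}

// day returns a fixed reference date plus n days; used only by the example.
func day(n int) time.Time {
	base := time.Date(2024, time.January, 1, 0, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(n) * 24 * time.Hour)
}

func main() {
	// A 100-day certificate with an 80 percent setting renews from day 80 onward.
	fmt.Println(renewAt(day(0), day(100), 80).Equal(day(80)))
	fmt.Println(needsRenewal(day(79), day(0), day(100), 80))
	fmt.Println(needsRenewal(day(80), day(0), day(100), 80))
}
```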
|
|
||
| ### Status and metrics reporting | ||
|
|
||
| EVE publishes the following information to the controller: | ||
|
|
||
| - **PNAC status** (per-port): Whether 802.1X is enabled, the current supplicant state | ||
| (e.g. connecting, authenticating, authenticated, failed), the timestamp of the last | ||
| successful authentication, and any authentication errors. | ||
|
|
||
| - **Enrolled certificate status**: Details of the installed certificate including subject, | ||
| issuer, SANs, validity period, SHA-256 fingerprint, key type, and current certificate status | ||
| (e.g. valid, expired, pending enrollment). | ||
|
|
||
| - **PNAC metrics** (per-port): EAPOL frame counters including frames received/transmitted, | ||
| EAPOL-Start and EAPOL-Logoff frames, EAP-Request/Response frames, and counts of invalid | ||
| or malformed frames. | ||
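For reference, a SHA-256 certificate fingerprint like the one listed above is conventionally the digest of the DER-encoded certificate. A minimal sketch follows; `fingerprintSHA256` is a hypothetical helper, and the lowercase hex output format is an assumption, since the text does not say how EVE renders the value.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprintSHA256 returns the lowercase hex SHA-256 digest of the given
// bytes, which for a certificate would be its DER encoding
// (for example x509.Certificate.Raw).
func fingerprintSHA256(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Placeholder input; a real caller would pass the certificate's DER bytes.
	fmt.Println(fingerprintSHA256([]byte("abc")))
}
```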
|
|
||
| ## Air-Gap Mode | ||
|
|
||
| Air-Gap mode allows a device to operate without connectivity to the main controller, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,6 +13,7 @@ import ( | |
| "strings" | ||
| "time" | ||
|
|
||
| eveinfo "github.com/lf-edge/eve-api/go/info" | ||
| "github.com/lf-edge/eve/pkg/pillar/agentbase" | ||
| "github.com/lf-edge/eve/pkg/pillar/agentlog" | ||
| "github.com/lf-edge/eve/pkg/pillar/base" | ||
|
|
@@ -80,6 +81,8 @@ type nim struct { | |
| subNetworkInstanceConfig pubsub.Subscription | ||
| subEdgeNodeClusterStatus pubsub.Subscription | ||
| subKubeUserServices pubsub.Subscription | ||
| subVaultStatus pubsub.Subscription | ||
| subEnrolledCertStatus pubsub.Subscription | ||
|
|
||
| // Publications | ||
| pubDummyDevicePortConfig pubsub.Publication // For logging | ||
|
|
@@ -91,10 +94,13 @@ type nim struct { | |
| pubCipherMetrics pubsub.Publication | ||
| pubCachedResolvedIPs pubsub.Publication | ||
| pubWwanConfig pubsub.Publication | ||
| pubPNACMetrics pubsub.Publication | ||
|
|
||
| // Metrics | ||
| agentMetrics *controllerconn.AgentMetrics | ||
| cipherMetrics *cipher.AgentMetrics | ||
| agentMetrics *controllerconn.AgentMetrics | ||
| cipherMetrics *cipher.AgentMetrics | ||
| metricInterval uint32 // In seconds | ||
|
Member
Technically it's a limitation, but I don't think anyone would set up metrics collection less frequently than once every 136 years :)
Contributor
Author
Max allowed value for |
||
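The "136 years" figure in the comment above follows from storing the interval as a `uint32` number of seconds. A quick check, assuming a 365.25-day year (`maxIntervalYears` is a hypothetical helper written for this note):

```go
package main

import (
	"fmt"
	"math"
)

// maxIntervalYears converts the largest uint32 value, read as seconds,
// into years of 365.25 days.
func maxIntervalYears() float64 {
	const secondsPerYear = 365.25 * 24 * 60 * 60
	return float64(math.MaxUint32) / secondsPerYear
}

func main() {
	fmt.Printf("%.1f years\n", maxIntervalYears())
}
```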
| publishTicker *flextimer.FlexTickerHandle | ||
|
|
||
| // Configuration | ||
| globalConfig types.ConfigItemValueMap | ||
|
|
@@ -219,11 +225,12 @@ func (n *nim) run(ctx context.Context) (err error) { | |
| stillRunning := time.NewTicker(stillRunTime) | ||
| n.PubSub.StillRunning(agentName, warningTime, errorTime) | ||
|
|
||
| // Publish metrics for zedagent every 10 seconds | ||
| interval := 10 * time.Second | ||
| // Publish network metrics | ||
| interval := time.Duration(n.metricInterval) * time.Second | ||
| max := float64(interval) | ||
| min := max * 0.3 | ||
| publishTimer := flextimer.NewRangeTicker(time.Duration(min), time.Duration(max)) | ||
| publishTicker := flextimer.NewRangeTicker(time.Duration(min), time.Duration(max)) | ||
| n.publishTicker = &publishTicker | ||
|
|
||
| // Periodically resolve the controller hostname to keep its DNS entry cached, | ||
| // reducing the need for DNS lookups on every controller API request. | ||
|
|
@@ -243,6 +250,8 @@ func (n *nim) run(ctx context.Context) (err error) { | |
| n.subWwanStatus, | ||
| n.subNetworkInstanceConfig, | ||
| n.subKubeUserServices, | ||
| n.subVaultStatus, | ||
| n.subEnrolledCertStatus, | ||
| } | ||
| for _, sub := range inactiveSubs { | ||
| if err = sub.Activate(); err != nil { | ||
|
|
@@ -292,8 +301,16 @@ func (n *nim) run(ctx context.Context) (err error) { | |
| case change := <-n.subKubeUserServices.MsgChan(): | ||
| n.subKubeUserServices.ProcessChange(change) | ||
|
|
||
| case <-publishTimer.C: | ||
| case change := <-n.subVaultStatus.MsgChan(): | ||
| n.subVaultStatus.ProcessChange(change) | ||
|
|
||
| case change := <-n.subEnrolledCertStatus.MsgChan(): | ||
| n.subEnrolledCertStatus.ProcessChange(change) | ||
| n.handleEnrolledCertUpdate() | ||
|
|
||
| case <-publishTicker.C: | ||
| start := time.Now() | ||
| n.publishPNACMetrics() | ||
| err = n.cipherMetrics.Publish(n.Log, n.pubCipherMetrics, "global") | ||
| if err != nil { | ||
| n.Log.Error(err) | ||
|
|
@@ -408,6 +425,14 @@ func (n *nim) initPublications() (err error) { | |
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| n.pubPNACMetrics, err = n.PubSub.NewPublication(pubsub.PublicationOptions{ | ||
| AgentName: agentName, | ||
| TopicType: types.PNACMetricsList{}, | ||
| }) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
|
|
@@ -613,6 +638,27 @@ func (n *nim) initSubscriptions() (err error) { | |
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| n.subVaultStatus, err = n.PubSub.NewSubscription(pubsub.SubscriptionOptions{ | ||
| AgentName: "vaultmgr", | ||
| MyAgentName: agentName, | ||
| TopicImpl: types.VaultStatus{}, | ||
| Activate: false, | ||
| CreateHandler: n.handleVaultStatusCreate, | ||
| ModifyHandler: n.handleVaultStatusModify, | ||
| WarningTime: warningTime, | ||
| ErrorTime: errorTime, | ||
| }) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| n.subEnrolledCertStatus, err = n.PubSub.NewSubscription(pubsub.SubscriptionOptions{ | ||
| AgentName: "scepclient", | ||
| MyAgentName: agentName, | ||
| TopicImpl: types.EnrolledCertificateStatus{}, | ||
| Activate: false, | ||
| Persistent: true, | ||
| WarningTime: warningTime, | ||
| ErrorTime: errorTime, | ||
| }) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
|
|
@@ -661,6 +707,17 @@ func (n *nim) applyGlobalConfig(gcp *types.ConfigItemValueMap) { | |
| timeout := gcp.GlobalValueInt(types.NetworkTestTimeout) | ||
| n.connTester.TestTimeout = time.Second * time.Duration(timeout) | ||
| n.connTester.DiagRemoteEndpoints = types.GetDiagRemoteEndpointURLs(n.Log, gcp) | ||
| metricInterval := gcp.GlobalValueInt(types.MetricInterval) | ||
| if metricInterval != 0 && n.metricInterval != metricInterval { | ||
| if n.publishTicker != nil { | ||
| interval := time.Duration(metricInterval) * time.Second | ||
| maxTime := float64(interval) | ||
| minTime := maxTime * 0.3 | ||
| n.publishTicker.UpdateRangeTicker( | ||
| time.Duration(minTime), time.Duration(maxTime)) | ||
| } | ||
| n.metricInterval = metricInterval | ||
| } | ||
| n.gcInitialized = true | ||
| } | ||
|
|
||
|
|
@@ -850,6 +907,31 @@ func (n *nim) handleKubeUserServicesDelete(_ interface{}, _ string, _ interface{ | |
| n.dpcManager.UpdateKubeUserServices(types.KubeUserServices{}) | ||
| } | ||
|
|
||
| func (n *nim) handleVaultStatusCreate(_ interface{}, key string, statusArg interface{}) { | ||
| n.handleVaultStatusImpl(key, statusArg) | ||
| } | ||
|
|
||
| func (n *nim) handleVaultStatusModify(_ interface{}, key string, statusArg, _ interface{}) { | ||
| n.handleVaultStatusImpl(key, statusArg) | ||
| } | ||
|
|
||
| func (n *nim) handleVaultStatusImpl(_ string, statusArg interface{}) { | ||
| status := statusArg.(types.VaultStatus) | ||
| vaultIsReady := status.Name == types.DefaultVaultName && | ||
| status.ConversionComplete && | ||
| status.Status != eveinfo.DataSecAtRestStatus_DATASEC_AT_REST_ERROR | ||
| n.dpcManager.UpdateVaultReadiness(vaultIsReady) | ||
| } | ||
|
|
||
| func (n *nim) handleEnrolledCertUpdate() { | ||
| var enrolledCerts []types.EnrolledCertificateStatus | ||
| for _, item := range n.subEnrolledCertStatus.GetAll() { | ||
| certStatus := item.(types.EnrolledCertificateStatus) | ||
| enrolledCerts = append(enrolledCerts, certStatus) | ||
| } | ||
| n.dpcManager.UpdateEnrolledCerts(enrolledCerts) | ||
| } | ||
|
|
||
| func (n *nim) listPublishedDPCs(directory string) (dpcFilePaths []string) { | ||
| locations, err := os.ReadDir(directory) | ||
| if err != nil { | ||
|
|
@@ -962,3 +1044,34 @@ func (n *nim) ingestDevicePortConfigFile(oldDirname string, newDirname string, n | |
| filename, err) | ||
| } | ||
| } | ||
|
|
||
| func (n *nim) publishPNACMetrics() { | ||
| var pnacMetricsList types.PNACMetricsList | ||
| dnsObj, err := n.pubDeviceNetworkStatus.Get("global") | ||
| if err != nil { | ||
| return | ||
| } | ||
| dns, ok := dnsObj.(types.DeviceNetworkStatus) | ||
| if !ok { | ||
| return | ||
| } | ||
| for _, port := range dns.Ports { | ||
| if port.IfName == "" || !port.PNAC.Enabled { | ||
| continue | ||
| } | ||
| ifIndex, exists, err := n.networkMonitor.GetInterfaceIndex(port.IfName) | ||
| if !exists || err != nil { | ||
| continue | ||
| } | ||
| metrics, err := n.networkMonitor.GetPNACMetrics(ifIndex) | ||
| if err != nil { | ||
| n.Log.Error(err) | ||
| } else { | ||
| pnacMetricsList.Ports = append(pnacMetricsList.Ports, metrics) | ||
| } | ||
| } | ||
| err = n.pubPNACMetrics.Publish(pnacMetricsList.Key(), pnacMetricsList) | ||
| if err != nil { | ||
| n.Log.Error(err) | ||
| } | ||
| } | ||
When you get PENDING will you wait for 5 minutes by default before checking the status? That seems like a long time.
PENDING is returned when certificate enrollment requires manual approval from an administrator. Given this, it’s reasonable to expect the process to take at least a few minutes.
Note: the SCEP RFC does not define a recommended polling interval, thus I made it configurable.
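To make the polling behavior under discussion concrete, here is a minimal sketch of retrying an enrollment that stays pending (or fails) at a fixed interval. It is not the scepclient code from this PR: the `enrollStatus` type, `pollEnrollment`, `noWait`, and the `maxAttempts` bound are hypothetical, and only the 300-second default comes from the `scep.retry.interval` property documented in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// enrollStatus is a simplified stand-in for the outcome of a SCEP request.
type enrollStatus int

const (
	statusSuccess enrollStatus = iota
	statusPending              // e.g. waiting for manual approval by an administrator
	statusFailure
)

// defaultRetryInterval mirrors the documented scep.retry.interval default.
const defaultRetryInterval = 300 * time.Second

var errNotEnrolled = errors.New("enrollment did not complete within the attempt budget")

// pollEnrollment calls check until it reports success, waiting interval
// between attempts. Both pending and failed outcomes are retried.
// maxAttempts is a hypothetical bound added to keep the sketch finite.
func pollEnrollment(check func() enrollStatus, interval time.Duration,
	maxAttempts int, wait func(time.Duration)) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if check() == statusSuccess {
			return attempt, nil
		}
		if attempt < maxAttempts {
			wait(interval)
		}
	}
	return maxAttempts, errNotEnrolled
}

// noWait skips sleeping; used here only to keep the example fast.
func noWait(time.Duration) {}

func main() {
	calls := 0
	attempts, err := pollEnrollment(func() enrollStatus {
		calls++
		if calls < 3 {
			return statusPending // approval still outstanding
		}
		return statusSuccess
	}, defaultRetryInterval, 5, noWait)
	fmt.Println(attempts, err)
}
```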