However, for host endpoints, Calico is more lenient; it only polices
traffic to/from interfaces that it's been explicitly told about. Traffic
to/from other interfaces is left alone.

As of Calico v2.1.0, Calico applies host endpoint security policy both to
traffic that is terminated locally and to traffic that is forwarded between
host endpoints. Previously, policy was only applied to traffic that was
terminated locally. This change allows Calico to be used to secure a NAT
gateway or router: Calico supports selector-based policy as normal when
running on a gateway or router, allowing for rich, dynamic security policy
based on the labels attached to your workloads.

> **NOTE**
Policy for host endpoints can be marked as 'doNotTrack'. This means that rules
in that policy should be applied before any data plane connection tracking, and
that packets allowed by these rules should not be tracked.

Untracked policy is designed for allowing untracked connections to a server
process running directly on a host - where by 'directly' we mean _not_ in a
pod/VM/container workload. A typical scenario for using 'doNotTrack' policy
would be a server, running directly on a host, that accepts a very high rate of
short-lived connections, such as `memcached`. On Linux, if those connections
are tracked, the conntrack table can fill up and then Linux may drop packets
for further connection attempts, meaning that those newer connections will
fail. If you are using Calico to secure that server's host, you can avoid this
problem by defining a policy that allows access to the server's ports and is
marked as 'doNotTrack'.

Since there is no connection tracking for a 'doNotTrack' policy, it is
important that the policy's ingress and egress rules are specified
symmetrically. For example, for a server on port 999, the policy must include
an ingress rule allowing access *to* port 999 and an egress rule allowing
outbound traffic *from* port 999. (Whereas for a connection tracked policy, it
is usually enough to specify the ingress rule only, and then connection
tracking will automatically allow the return path.)
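
As a concrete sketch of what such symmetric rules could look like - using the
same v1 resource format as the examples later in this document, and with the
policy name, the `has(host-endpoint)` selector, and port 999 all purely
illustrative - note that the egress rule matches on _source_ port 999, so that
the server's reply packets are allowed out even though there is no conntrack
state to do that automatically:

```
apiVersion: v1
kind: policy
metadata:
  name: allow-port-999-untracked
spec:
  doNotTrack: true
  order: 10
  ingress:
  - action: allow
    protocol: tcp
    destination:
      ports: [999]
  egress:
  - action: allow
    protocol: tcp
    source:
      ports: [999]
  selector: has(host-endpoint)
```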

Because of how untracked policy is implemented, untracked ingress rules apply
to all incoming traffic through a host endpoint - regardless of where that
traffic is going - but untracked egress rules only apply to traffic that is
sent from the host itself (not from a local workload) out of that host
endpoint.

## Pre-DNAT policy

Policy for host endpoints can be marked as 'preDNAT'. This means that rules in
that policy should be applied before any DNAT (Destination Network Address
Translation), which is useful when it is more convenient to specify Calico
policy in terms of a packet's original destination IP address and port than in
terms of its destination IP address and port after it has been DNAT'd.

An example is securing access to Kubernetes NodePorts from outside the cluster.
Traffic from outside is addressed to any node's IP address, on a known
NodePort, and Kubernetes (kube-proxy) then DNATs that to the IP address of one
of the pods that provides the corresponding Service, and the relevant port
number on that pod (which is usually different from the NodePort).

As NodePorts are the externally advertised way of connecting to Services (and a
NodePort uniquely identifies a Service, whereas an internal port number may
not), it makes sense to express Calico policy to expose or secure particular
Services in terms of the corresponding NodePorts. But that is only possible if
the Calico policy is applied before DNAT changes the NodePort to something
else - and hence this kind of policy needs 'preDNAT' set to true.

In addition to being applied before any DNAT, the enforcement of pre-DNAT
policy differs from that of normal host endpoint policy in three key details,
reflecting that it is designed for policing incoming traffic from outside the
cluster:

1. Pre-DNAT policy may only have ingress rules, not egress. (When incoming
   traffic is allowed by the ingress rules, standard connection tracking is
   sufficient to allow the return path traffic.)

2. Pre-DNAT policy is enforced for all traffic arriving through a host
   endpoint, regardless of where that traffic is going, and - in particular -
   even if that traffic is routed to a local workload on the same host.
   (Whereas normal host endpoint policy is skipped for traffic going to a
   local workload.)

3. There is no 'default drop' semantic for pre-DNAT policy (as there is for
   normal host endpoint policy). In other words, if a host endpoint is defined
   but has no pre-DNAT policies that explicitly allow or deny a particular
   incoming packet, that packet is allowed to continue on its way, and will
   then be accepted or dropped according to workload policy (if it is going to
   a local workload) or to normal host endpoint policy (if not).
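
The three points above can be seen in the overall shape of a pre-DNAT policy.
As a minimal sketch, in the same v1 resource format as the examples later in
this document - where the policy name, the NodePort number 30080, and the
`has(host-endpoint)` selector are illustrative assumptions - a pre-DNAT policy
sets `preDNAT: true` and defines ingress rules only:

```
apiVersion: v1
kind: policy
metadata:
  name: sketch-prednat-allow-node-port
spec:
  preDNAT: true
  order: 100
  ingress:
  - action: allow
    protocol: tcp
    destination:
      ports: [30080]
  selector: has(host-endpoint)
```

Because the policy is enforced before kube-proxy's DNAT, port 30080 here is
the NodePort as seen from outside the cluster, not the backing pod's port.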

## When do host endpoint policies apply?

As stated above, normal host endpoint policies apply to traffic that arrives on
and/or is sent to a host interface, except if that traffic comes from or is
destined for a workload on the same host; but the rules for applying untracked
and pre-DNAT policies are different in some cases. Here we present and
summarize all of those rules together, for all possible flows and all types of
host endpoint policy.

For packets that arrive on a host interface and are destined for a local
workload - i.e. a locally-hosted pod, container or VM:

- Pre-DNAT policies apply.

- Normal policies do not apply - by design, because Calico enforces the
  destination workload's ingress policy in this case.

- Untracked policies technically do apply, but never have any net positive
  effect for such flows.

> **NOTE**
>
> To be precise, untracked policy for the incoming host interface may apply
> in the forwards direction, and if so it will have the effect of forwarding
> the packet to the workload without any connection tracking. But then, in
> the reverse direction, there will be no conntrack state for the return
> packets to match, and there is no application of any egress rules that may
> be defined by the untracked policy - so unless the workload's policy
> specifically allows the relevant source IP, the return packet will be
> dropped. That is the same overall result as if there were no untracked
> policy at all, so in practice it is as if untracked policies do not apply
> to this flow.

For packets that arrive on a host interface and are destined for a local
server process in the host namespace:

- Untracked, pre-DNAT and normal policies all apply.

- If a packet is explicitly allowed by untracked policy, it skips over any
  pre-DNAT and normal policy.

- If a packet is explicitly allowed by pre-DNAT policy, it skips over any
  normal policy.

For packets that arrive on a host interface (A) and are forwarded out of the
same or another host interface (B):

- Untracked policies apply, for both host interfaces A and B, but only the
  ingress rules that are defined in those policies. The forwards direction is
  governed by the ingress rules of untracked policies that apply to interface
  A, and the reverse direction is governed by the ingress rules of untracked
  policies that apply to interface B, so those rules should be defined
  symmetrically.

- Pre-DNAT policies apply, specifically the ingress rules of the pre-DNAT
  policies that apply to interface A. (The reverse direction is allowed by
  conntrack state.)

- Normal policies apply, specifically the ingress rules of the normal policies
  that apply to interface A, and the egress rules of the normal policies that
  apply to interface B. (The reverse direction is allowed by conntrack state.)

- If a packet is explicitly allowed by untracked policy, it skips over any
  pre-DNAT and normal policy.

- If a packet is explicitly allowed by pre-DNAT policy, it skips over any
  normal policy.

For packets that are sent from a local server process (in the host namespace)
out of a host interface:

- Untracked policies apply, specifically the egress rules of the untracked
  policies that apply to the host interface.

- Normal policies apply, specifically the egress rules of the normal policies
  that apply to that host interface.

- Pre-DNAT policies do not apply.

For packets that are sent from a local workload out of a host interface:

- No host endpoint policies apply.

## Pre-DNAT policy: a worked example

Imagine a Kubernetes cluster whose administrator wants to secure it as much as
possible against incoming traffic from outside the cluster. Let's suppose that:

- The cluster provides various useful Services that are exposed as Kubernetes
  NodePorts - i.e. as well-known TCP port numbers that appear to be available
  on any node in the cluster.

- Most of those Services, however, should not be accessed from outside the
  cluster via _any_ node, but instead via a LoadBalancer IP that is routable
  from outside the cluster and maps to one of just a few 'ingress' nodes. (The
  LoadBalancer IP is a virtual IP that, at any given time, gets routed somehow
  to one of those 'ingress' nodes.)

- For a few Services, on the other hand, there is no LoadBalancer IP set up, so
  those Services should be accessible from outside the cluster through their
  NodePorts on any node.

- All other incoming traffic from outside the cluster should be disallowed.

![]({{site.baseurl}}/images/bare-metal-example.png)

For each Service in the first set, we want to allow traffic from outside the
cluster that is addressed to `<service-load-balancer-ip>:<service-port>`, but
only when it enters the cluster through one of the 'ingress' nodes. For each
Service in the second set, we want to allow traffic from outside the cluster
that is addressed to `<node-ip>:<service-node-port>`, via any node.

We can do this by applying Calico pre-DNAT policy to the external interfaces of
each cluster node. We use pre-DNAT policy, rather than normal host endpoint
policy, for two reasons:

1. Normal host endpoint policy is not enforced for incoming traffic to a local
   pod, whereas pre-DNAT policy is enforced for _all_ incoming traffic. Here
   we want to police all incoming traffic from outside the cluster, regardless
   of its destination, so pre-DNAT is the right choice.

2. We want to express our policy in terms of the external port numbers
   `<service-port>` and `<service-node-port>`. The kube-proxy on the ingress
   node will use DNATs to change those port numbers (and IP addresses) to
   those of one of the pods that back the relevant Service. Our policy
   therefore needs to be enforced _before_ those DNATs, and of course that is
   exactly what pre-DNAT policy is for.

Let's begin with the policy to disallow incoming traffic by default. Every
outward interface of each node, by which traffic from outside could possibly
enter the cluster, must be defined as a Calico host endpoint; for example, for
`eth0` on `node1`:

```
apiVersion: v1
kind: hostEndpoint
metadata:
  name: node1-eth0
  node: node1
  labels:
    host-endpoint: ingress
spec:
  interfaceName: eth0
```

The nodes that are allowed as load balancer ingress nodes should have an
additional label to indicate that, let's say `load-balancer-ingress: true`.

Then we can deny all incoming traffic through those interfaces, unless it is
from a source IP that is known to be within the cluster. (Note: we are
assuming that the same interfaces can also be used for traffic that is
forwarded from other nodes or pods in the cluster - as would be the case for
nodes with only one external interface.)

```
apiVersion: v1
kind: policy
metadata:
  name: disallow-incoming
spec:
  preDNAT: true
  order: 100
  ingress:
  - action: deny
    source:
      notNets: [<pod-cidr>, <cluster-internal-node-cidr>, ...]
  selector: host-endpoint=='ingress'
```

Now, to allow traffic through the load balancer ingress nodes to
`<service-load-balancer-ip>:<service-port>` (for each load-balanced Service):

```
apiVersion: v1
kind: policy
metadata:
  name: allow-load-balancer-service-1
spec:
  preDNAT: true
  order: 90
  ingress:
  - action: allow
    destination:
      nets: [<service-load-balancer-ip>]
      ports: [<service-port>]
  selector: load-balancer-ingress=='true'
```

And for traffic to NodePorts - for each non-load-balanced Service - via any
node:

```
apiVersion: v1
kind: policy
metadata:
  name: allow-node-port-service-1
spec:
  preDNAT: true
  order: 90
  ingress:
  - action: allow
    destination:
      ports: [<node-port>]
  selector: host-endpoint=='ingress'
```

And that completes the example. It's worth re-emphasizing, though, two key
points about the application of pre-DNAT policy that make this work, especially
as pre-DNAT policy differs on these points from normal host endpoint policy.

Firstly, there is no 'default drop' semantic for pre-DNAT policy, like there
_is_ for normal policy. So, if policies are defined such that _some_ pre-DNAT
policies apply to a host endpoint, but none of those policies matches a
particular incoming packet, that packet is allowed to continue on its way.
(Whereas if there are normal policies that apply to a host endpoint, and none
of those policies matches a packet, that packet will be dropped.)

For the example here, that means that we can specify some pre-DNAT policy,
applying to all of the cluster's external interfaces, without having to
enumerate and explicitly _allow_ all of the internal flows that may also go
through those interfaces. It's also why the second point works.

Namely: if traffic comes in through a host endpoint and is routed to a local
workload, any host endpoint pre-DNAT policy is enforced as well as the ingress
policy for that workload - whereas normal host endpoint policy is skipped in
that scenario. (Normal host endpoint policy is 'trumped' by workload policy,
for packets going to a local workload.)

For the example here, that means that the last pre-DNAT policy above does not
accidentally expose workloads that happen to use the same `<node-port>`, or
that provide the backing for `<node-port>`, unless those workloads' own policy
allows that.