IPTables Reference
In order to route TCP traffic in a pod to and from the proxy, an init
container
is used to set up iptables
rules at the start of an injected pod’s
lifecycle.
At first, linkerd-init
will create two chains in the nat
table:
PROXY_INIT_REDIRECT
, and PROXY_INIT_OUTPUT
. These chains are used to route
inbound and outbound packets through the proxy. Each chain has a set of rules
attached to it, these rules are traversed by a packet in order.
Inbound connections
When a packet arrives in a pod, it will typically be processed by the
PREROUTING
chain, a default chain attached to the nat
table. The sidecar
container will create a new chain to process inbound packets, called
PROXY_INIT_REDIRECT
. The sidecar container creates a rule
(install-proxy-init-prerouting
) to send packets from the PREROUTING
chain
to our redirect chain. This is the first rule traversed by an inbound packet.
The redirect chain will be configured with two more rules:
ignore-port
: will ignore processing packets whose destination ports are included in theskip-inbound-ports
install option.proxy-init-redirect-all
: will redirect all incoming TCP packets through the proxy, on port4143
.
Based on these two rules, there are two possible paths that an inbound packet can take, both of which are outlined below.
The packet will arrive on the PREROUTING
chain and will be immediately routed
to the redirect chain. If its destination port matches any of the inbound ports
to skip, then it will be forwarded directly to the application process,
bypassing the proxy. The list of destination ports to check against can be
configured when installing Linkerd. If the
packet does not match any of the ports in the list, it will be redirected
through the proxy. Redirection is done by changing the incoming packet’s
destination header, the target port will be replaced with 4143
, which is the
proxy’s inbound port. The proxy will process the packet and produce a new one
that will be forwarded to the service; it will be able to get the original
target (IP:PORT) of the inbound packet by using a special socket option
SO_ORIGINAL_DST
. The new packet
will be routed through the OUTPUT
chain, from there it will be sent to the
application. The OUTPUT
chain rules are covered in more detail below.
Outbound connections
When a packet leaves a pod, it will first traverse the OUTPUT
chain, the
first default chain an outgoing packet traverses in the nat
table. To
redirect outgoing packets through the outbound side of the proxy, the sidecar
container will again create a new chain. The first outgoing rule is similar to
the inbound counterpart: any packet that traverses the OUTPUT
chain should be
forwarded to our PROXY_INIT_OUTPUT
chain to be processed.
The output redirect chain is slightly harder to understand but follows the same logical flow as the inbound redirect chain, in total there are 4 rules configured:
ignore-proxy-uid
: any packets owned by the proxy (whose user id is2102
), will skip processing and return to the previous (OUTPUT
) chain. From there, it will be sent on the outbound network interface (either to the application, in the case of an inbound packet, or outside of the pod, for an outbound packet).ignore-loopback
: if the packet is sent over the loopback interface (lo
), it will skip processing and return to the previous chain. From here, the packet will be sent to the destination, much like the first rule in the chain.ignore-port
: will ignore processing packets whose destination ports are included in theskip-outbound-ports
install option.redirect-all-outgoing
: the last rule in the chain, it will redirect all outgoing TCP packets to port4140
, the proxy’s outbound port. If a packet has made it this far, it is guaranteed its destination is not local (i.elo
) and it has not been produced by the proxy. This means the packet has been produced by the service, so it should be forwarded to its destination by the proxy.
A packet produced by the service will first hit the OUTPUT
chain; from here,
it will be sent to our own output chain for processing. The first rule it
encounters in PROXY_INIT_OUTPUT
will be ignore-proxy-uid
. Since the packet
was generated by the service, this rule will be skipped. If the packet’s
destination is not a port bound on localhost (e.g 127.0.0.1:80
), then it will
skip the second rule as well. The third rule, ignore-port
will be matched if
the packet’s destination port is in the outbound ports to skip list, in this
case, it will be sent out on the network interface, bypassing the proxy. If the
rule is not matched, then the packet will reach the final rule in the chain
redirect-all-outgoing
– as the name implies, it will be sent to the proxy to
be processed, on its outbound port 4140
. Much like in the inbound case, the
routing happens at the nat
level, the packet’s header will be re-written to
target the outbound port. The proxy will process the packet and then forward it
to its destination. The new packet will take the same path through the OUTPUT
chain, however, it will stop at the first rule, since it was produced by the
proxy.
The substantiated explanation applies to a packet whose destination is another
service, outside of the pod. In practice, an application can also send traffic
locally. As such, there are two other possible scenarios that we will explore:
when a service talks to itself (by sending traffic over localhost or by using
its own endpoint address), and when a service talks to itself through a
clusterIP
target. Both scenarios are somehow related, but the path a packet
takes differs.
A service may send requests to itself. It can also target another container in the pod. This scenario would typically apply when:
- The destination is the pod (or endpoint) IP address.
- The destination is a port bound on localhost (regardless of which container it belongs to).
When the application targets itself through its pod’s IP (or loopback address), the packets will traverse the two output chains. The first rule will be skipped, since the owner is the application, and not the proxy. Once the second rule is matched, the packets will return to the first output chain, from here, they’ll be sent directly to the service.
A service may send requests to itself using its clusterIP. In such cases, it is not guaranteed that the destination will be local. The packet follows an unusual path, as depicted in the diagram below.
When the packet first traverses the output chains, it will follow the same path
an outbound packet would normally take. In such a scenario, the packet’s
destination will be an address that is not considered to be local by the
kernel– it is, after all, a virtual IP. The proxy will process the packet, at
a connection level, connections to a clusterIP
will be load balanced between
endpoints. Chances are that the endpoint selected will be the pod itself,
packets will therefore never leave the pod; the destination will be resolved to
the podIP. The packets produced by the proxy will traverse the output chain and
stop at the first rule, then they will be forwarded to the service. This
constitutes an edge case because at this point, the packet has been processed
by the proxy, unlike the scenario previously discussed where it skips it
altogether. For this reason, at a connection level, the proxy will not mTLS
or opportunistically upgrade the connection to HTTP/2 when the endpoint is
local to the pod. In practice, this is treated as if the destination was
loopback, with the exception that the packet is forwarded through the proxy,
instead of being forwarded from the service directly to itself.
Rules table
For reference, you can find the actual commands used to create the rules below. Alternatively, if you want to inspect the iptables rules created for a pod, you can retrieve them through the following command:
$ kubectl -n <namesppace> logs <pod-name> linkerd-init
# where <pod-name> is the name of the pod
# you want to see the iptables rules for
Inbound
# | name | iptables rule | description |
---|---|---|---|
1 | redirect-common-chain | iptables -t nat -N PROXY_INIT_REDIRECT | creates a new iptables chain to add inbound redirect rules to; the chain is attached to the nat table |
2 | ignore-port | iptables -t nat -A PROXY_INIT_REDIRECT -p tcp --match multiport --dports <ports> -j RETURN | configures iptables to ignore the redirect chain for packets whose dst ports are included in the --skip-inbound-ports config option |
3 | proxy-init-redirect-all | iptables -t nat -A PROXY_INIT_REDIRECT -p tcp -j REDIRECT --to-port 4143 | configures iptables to redirect all incoming TCP packets to port 4143 , the proxy’s inbound port |
4 | install-proxy-init-prerouting | iptables -t nat -A PREROUTING -j PROXY_INIT_REDIRECT | the last inbound rule configures the PREROUTING chain (first chain a packet traverses inbound) to send packets to the redirect chain for processing |
Outbound
# | name | iptables rule | description |
---|---|---|---|
1 | redirect-common-chain | iptables -t nat -N PROXY_INIT_OUTPUT | creates a new iptables chain to add outbound redirect rules to, also attached to the nat table |
2 | ignore-proxy-uid | iptables -t nat -A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -j RETURN | when a packet is owned by the proxy (--uid-owner 2102 ), skip processing and return to the previous (OUTPUT ) chain |
3 | ignore-loopback | iptables -t nat -A PROXY_INIT_OUTPUT -o lo -j RETURN | when a packet is sent over the loopback interface (lo ), skip processing and return to the previous chain |
4 | ignore-port | iptables -t nat -A PROXY_INIT_OUTPUT -p tcp --match multiport --dports <ports> -j RETURN | configures iptables to ignore the redirect output chain for packets whose dst ports are included in the --skip-outbound-ports config option |
5 | redirect-all-outgoing | iptables -t nat -A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140 | configures iptables to redirect all outgoing TCP packets to port 4140 , the proxy’s outbound port |
6 | install-proxy-init-output | iptables -t nat -A OUTPUT -j PROXY_INIT_OUTPUT | the last outbound rule configures the OUTPUT chain (second before last chain a packet traverses outbound) to send packets to the redirect output chain for processing |