# Architecture
kty is a standalone SSH server that provides much of the standard SSH
functionality as if a Kubernetes cluster were a single host. This allows
`ssh me@my-cluster` to provide a shell in a running pod, tunnel between
resources without a VPN, and copy files from the cluster. By leveraging OpenID
and RBAC, you can provide access to your clusters from anywhere using `ssh`, a
tool that comes included in almost every environment out there.
Like standard SSH servers, kty delegates authentication and authorization to other systems. Identity is verified via OpenID providers or via SSH public keys that have been added to the cluster as custom resources. It is possible to bring your own provider, but a default one is included that supports GitHub and Google authentication out of the box.
Once authenticated, each user's session has a Kubernetes client attached to it
that impersonates the user; the server itself has almost no permissions of its
own. Based on the permissions granted to the user via RBAC, the server acts on
the user's behalf, doing whatever the SSH session asks of it. This is
effectively what `kubectl` does with authentication plugins, just without the
plugin or the `kubectl` binary involved.
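As a rough illustration (not kty's actual code), here is a minimal sketch of user impersonation using Go's client-go; the function name, user, and groups are placeholders:

```go
package impersonate

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newImpersonatedClient builds a clientset whose requests carry
// Impersonate-User/Impersonate-Group headers, so RBAC is evaluated against
// the SSH user rather than the server's own (nearly empty) identity.
func newImpersonatedClient(user string, groups []string) (*kubernetes.Clientset, error) {
	cfg, err := rest.InClusterConfig() // the server's own service account
	if err != nil {
		return nil, err
	}
	cfg.Impersonate = rest.ImpersonationConfig{
		UserName: user,   // e.g. the email claim from the OpenID token
		Groups:   groups, // e.g. groups bound via (Cluster)RoleBindings
	}
	return kubernetes.NewForConfig(cfg)
}
```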
## How it works
### Shell Access
After a session has been established, the SSH client can issue a PTY request. When it does:

- The client requests a new channel.
- The PTY request is sent to the server.
- A k8s client is created that impersonates the user.
- An IO loop and a UI loop are started as background tasks.
- The user receives a TUI dashboard that renders the output of the k8s client.

Note: log streaming works similarly. A stream is set up and its output is appended to a buffer on each loop of the renderer.
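For that log streaming note, a hedged sketch (client-go, not kty's implementation) of following a pod's logs and appending lines to a buffer that a renderer could drain on each tick:

```go
package logs

import (
	"bufio"
	"context"
	"sync"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// followLogs streams pod/logs and appends each line to a shared buffer;
// a UI loop can read and clear the buffer on every render pass.
func followLogs(ctx context.Context, cs *kubernetes.Clientset, ns, pod string,
	mu *sync.Mutex, buf *[]string) error {

	stream, err := cs.CoreV1().Pods(ns).
		GetLogs(pod, &corev1.PodLogOptions{Follow: true}).
		Stream(ctx)
	if err != nil {
		return err
	}
	defer stream.Close()

	scanner := bufio.NewScanner(stream)
	for scanner.Scan() {
		mu.Lock()
		*buf = append(*buf, scanner.Text())
		mu.Unlock()
	}
	return scanner.Err()
}
```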
Once the dashboard has started up, a user can go to the Shell tab. This:

- Issues a `pod/exec` request in the client.
- Pauses drawing the UI in its current state.
- Switches the UI to raw mode. Instead of drawing the UI, it streams directly from the pod's `exec` input/output.
- On destruction of the stream, resumes drawing the UI.
Client functionality used:
- A `Reflector` to list pods and get all of their state.
- `pod/logs` to stream logs from the pod.
- `pod/exec` to exec `/bin/bash` in a container and return a shell. This also uses the `tty` functionality to allow for direct access instead of a simple stream.
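A hedged sketch of what the `pod/exec` + `tty` call could look like with client-go's `remotecommand` package; the function name and stream wiring are illustrative, and in kty the streams would come from the SSH channel rather than local readers/writers:

```go
package shell

import (
	"context"
	"io"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// execShell runs /bin/bash in the given container with a TTY attached,
// copying bytes between the provided streams and the remote process.
func execShell(ctx context.Context, cfg *rest.Config, cs *kubernetes.Clientset,
	ns, pod, container string, in io.Reader, out io.Writer) error {

	req := cs.CoreV1().RESTClient().Post().
		Resource("pods").Namespace(ns).Name(pod).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: container,
			Command:   []string{"/bin/bash"},
			Stdin:     true,
			Stdout:    true,
			TTY:       true, // with a TTY, stderr is merged into stdout
		}, scheme.ParameterCodec)

	exec, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
	if err != nil {
		return err
	}
	return exec.StreamWithContext(ctx, remotecommand.StreamOptions{
		Stdin:  in,
		Stdout: out,
		Tty:    true,
	})
}
```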
### Ingress Tunneling
Once a session has been established:
- The client starts listening on the configured port on localhost.
- When a new connection to that port is received, it opens a `direct-tcpip` channel to the server.
- The incoming hostname is mapped from a k8s resource string (`<resource>/<namespace>/<name>`) to something that can be connected to. In the case of nodes and pods, this is the IP address associated with the resource itself. For services, it is the service's DNS name (see the sketch at the end of this section).
- A connection from the server to the resource is established.
- A background task is started that copies data between the client's channel and the stream between the server and the resource (see the sketch after this list).
- When EOF is sent on either the source stream or the destination stream, the background task is canceled and the channel is closed.
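As a rough illustration of the last two steps, a minimal Go sketch of the copy loop; `channel` and `upstream` are stand-ins for the SSH channel and the server-to-resource connection:

```go
package tunnel

import (
	"context"
	"io"
)

// proxy copies bytes in both directions until either side hits EOF (or any
// other error), then tears down both ends so the other copy unblocks.
func proxy(ctx context.Context, channel, upstream io.ReadWriteCloser) {
	ctx, cancel := context.WithCancel(ctx)
	go func() { io.Copy(upstream, channel); cancel() }() // client -> resource
	go func() { io.Copy(channel, upstream); cancel() }() // resource -> client

	<-ctx.Done()
	channel.Close()
	upstream.Close()
}
```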
Notes:
- The client creates a new channel on each incoming TCP connection.
- Usually, a client sends a PTY request and then asks for new tcpip channels. Because of this behavior, tunnel status is shown in the dashboard, and errors are reported on a per-session basis.
- The server assumes that it has access to the requested resources. If not, the connection will fail and the user will receive an error.
- This does not use `pod/port-forward` like `kubectl` does. It proxies directly from the server to the resource.
- This does not rely on `proxy` support in the API server. That is restricted to HTTP/HTTPS and doesn't allow raw TCP.
Client functionality used:
- `get` for the resource in question (node, pod).
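And a hedged sketch of the `get`-based mapping from `<resource>/<namespace>/<name>` to a dialable address, written with client-go rather than kty's own code (the function and its argument layout are invented for illustration):

```go
package tunnel

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// resolve maps a resource string such as "pods/default/foo" to a host that
// the server can connect to directly.
func resolve(ctx context.Context, cs *kubernetes.Clientset, resource, ns, name string) (string, error) {
	switch resource {
	case "pods", "pod", "po":
		pod, err := cs.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return "", err
		}
		return pod.Status.PodIP, nil
	case "nodes", "node", "no":
		node, err := cs.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return "", err
		}
		for _, addr := range node.Status.Addresses {
			if addr.Type == corev1.NodeInternalIP {
				return addr.Address, nil
			}
		}
		return "", fmt.Errorf("node %s has no InternalIP", name)
	case "services", "service", "svc":
		// No lookup needed: rely on the service's cluster DNS name.
		return fmt.Sprintf("%s.%s.svc", name, ns), nil
	default:
		return "", fmt.Errorf("unsupported resource %q", resource)
	}
}
```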
### Egress Tunneling
Once a session has been established:
- The client issues a `tcpip-forward` request.
- The client optionally sends a `pty` request.
- A background task is started that listens on a random port on the server.
- The service in the connection string (`-R default/foo:8080:localhost:8080`) is patched (or created) so that it has no selector.
- An EndpointSlice is created with:
  - An address pointing at the server's IP address. This comes from `local-ip-address` when run within a cluster and from the `serve` config otherwise.
  - A `TargetRef` that is the server. On cluster, this is built from the kubeconfig's default namespace and the hostname of the pod.
- Incoming connections open a `forwarded-tcpip` channel to the client.
- A background task is started that copies data between the source (something on the cluster) and the destination (the localhost).
- When EOF is sent on either the source stream or the destination stream, the background task is canceled and the SSH channel is closed. This does not terminate the session; that is still running and could be handling other things.
Notes:
- There's a new channel created to the client on every incoming connection from the cluster. This works because SSH sessions are assumed to be multiplexed and bidirectional.
- The service is patched on the assumption that the user issuing the request is allowed to override it if desired. It is entirely possible, however, that an important service gets overwritten.
- The service and EndpointSlice created are not garbage collected. OwnerReferences cannot be cross-namespace, so it becomes difficult to know what is unused and what isn't.
Client functionality used:
- `patch` for services and EndpointSlices.
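A hedged sketch of the EndpointSlice side of this with client-go (names, namespace, and port are placeholders; the real flow also patches the Service to drop its selector):

```go
package egress

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/utils/ptr"
)

// createSlice points the selector-less Service "svc" in "ns" at the SSH
// server itself, so in-cluster traffic to the Service reaches the server and
// can be forwarded back down the SSH session.
func createSlice(ctx context.Context, cs *kubernetes.Clientset,
	ns, svc, serverIP string, port int32, serverNS, serverPod string) error {

	slice := &discoveryv1.EndpointSlice{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: svc + "-",
			Namespace:    ns,
			// This label associates the slice with the Service.
			Labels: map[string]string{"kubernetes.io/service-name": svc},
		},
		AddressType: discoveryv1.AddressTypeIPv4,
		Endpoints: []discoveryv1.Endpoint{{
			// The server's IP: local-ip-address on cluster, serve config otherwise.
			Addresses: []string{serverIP},
			// TargetRef pointing back at the server's pod.
			TargetRef: &corev1.ObjectReference{
				Kind:      "Pod",
				Namespace: serverNS,
				Name:      serverPod,
			},
		}},
		Ports: []discoveryv1.EndpointPort{{Port: ptr.To(port)}},
	}
	_, err := cs.DiscoveryV1().EndpointSlices(ns).Create(ctx, slice, metav1.CreateOptions{})
	return err
}
```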
### SCP / SFTP
Once a session has been established:
- The client requests a new channel.
- A subsystem request for `sftp` is issued.
- The channel is handed off to a background task which handles the SFTP protocol.
At this point, what happens depends on the incoming request. For a simple
`scp me@my-cluster:/default/foo/bar/etc/hosts /tmp/hosts`:
- An `fstat` request is sent to verify that the file exists.
- The `default` namespace is fetched.
- The `foo` pod is fetched.
- The `bar` container is verified to exist.
- A `pod/exec` is started in the `default/foo` pod for the `bar` container that runs `ls -l /etc/hosts`.
- The result of this is parsed into something that can be sent back to the client.
- The client issues an `open` request.
- The client issues a `read` request.
- Another `pod/exec` is started, this time with `cat /etc/hosts`. The content of this is streamed back to the client.
- The client finally issues a `close` request and an EOF.
- The server closes the connection.
In addition to stat and read requests, SFTP allows for browsing entire file
trees. This is handled at the namespace/pod level via list requests for those
resources (e.g. listing namespaces for the root). Inside the container,
`pod/exec` and `ls` are used again on a per-directory basis.
Notes:
- This seems a little ridiculous, and it is. This is almost how `kubectl cp` works! Instead of `ls` and `cat`, it uses `tar` and hopes that it works.
Client functionality used:
- `list` for namespaces and pods.
- `get` for namespaces and pods.
- `pod/exec` for files inside the container.
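The `list` calls behind directory browsing could be sketched like this with client-go (function names and the string-only return type are illustrative; the real SFTP layer would also need file attributes):

```go
package sftp

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// readdirRoot lists namespaces, presented as the top-level directories
// (e.g. "/default").
func readdirRoot(ctx context.Context, cs *kubernetes.Clientset) ([]string, error) {
	nss, err := cs.CoreV1().Namespaces().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(nss.Items))
	for _, ns := range nss.Items {
		names = append(names, ns.Name)
	}
	return names, nil
}

// readdirNamespace lists pods, presented as directories under a namespace
// (e.g. "/default/foo"); below that level, pod/exec and ls take over.
func readdirNamespace(ctx context.Context, cs *kubernetes.Clientset, namespace string) ([]string, error) {
	pods, err := cs.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(pods.Items))
	for _, pod := range pods.Items {
		names = append(names, pod.Name)
	}
	return names, nil
}
```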
## Design Decisions
- Instead of having a separate `User` CRD to track the user, we rely on k8s
  users/groups, which are subjects on (Cluster)RoleBindings. Identity is
  extracted from the OpenID tokens via claims (email by default) and mapped to
  those k8s concepts. The `Key` resource maps the values from a previous token
  to the SSH key used during the original authentication attempt. This key
  expires when the token itself would have, and it can be created manually with
  any desired expiration.
- The minimum viable access is `list` for `pods` across all namespaces.
  Understanding what subset of a cluster users can see is a PITA. This is
  because k8s cannot filter `list` requests down to the subset a user is
  authorized to see. When combined with the fact that `SelfSubjectRulesReview`
  only works on a per-namespace basis, it becomes extremely expensive to
  understand what an individual user can see across the entire cluster. This
  will be updated in the future, but requires namespaces to be available via UI
  elements.
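For context on that last point, here is a sketch of the per-namespace `SelfSubjectRulesReview` check using client-go (the function is illustrative, not kty's code); building a cluster-wide picture means repeating it for every namespace:

```go
package access

import (
	"context"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// rulesFor returns the resource rules the current (impersonated) user has in
// a single namespace. Answering "what can this user see anywhere?" requires
// one of these calls per namespace, which is why the minimum viable access is
// instead `list` on pods across all namespaces.
func rulesFor(ctx context.Context, cs *kubernetes.Clientset, namespace string) ([]authorizationv1.ResourceRule, error) {
	review := &authorizationv1.SelfSubjectRulesReview{
		Spec: authorizationv1.SelfSubjectRulesReviewSpec{Namespace: namespace},
	}
	resp, err := cs.AuthorizationV1().SelfSubjectRulesReviews().Create(ctx, review, metav1.CreateOptions{})
	if err != nil {
		return nil, err
	}
	return resp.Status.ResourceRules, nil
}
```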