Description
Recently we proposed pcap-release as an easy way for CF application developers and landscape operators to capture network traffic for their apps and/or their BOSH VMs. See issue cloudfoundry/cf-deployment#980 for a more detailed description of pcap-release.
For the use case of capturing traffic from CF apps, we would need to implement some features in the cloud-controller and would like to get your feedback on our proposed solution.
The following diagram shows how we're planning to capture app network traffic via the pcap-agent on the app-container, which is then sent via the pcap-api to the cf-CLI on the client machine:
Our proposed solution would work similarly to the cf app-ssh process:
- cf-CLI plugin that implements commands to enable and perform tcpdumps on specific apps/app instances, with a possibility to pass on a packet filter as a parameter (e.g. for a specific source address) (see app-ssh commands)
- pcap-api (analogous to ssh-proxy for app-ssh) acts as endpoint for cf-CLI and passes the requests on to the pcap-agent on the app-containers. pcap-api is also responsible for user authentication.
- pcap-agent (analogous to diego-sshd for app-ssh) runs on the container and acts as a wrapper to libpcap to capture network traffic
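To make the division of responsibilities more concrete, here is a minimal sketch of the kind of request the cf-CLI plugin might send to pcap-api. The class name `CaptureRequest`, the field names (`app_id`, `instance_indices`, `filter`) and the JSON encoding are assumptions made for illustration only; they are not taken from pcap-release.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CaptureRequest:
    """Hypothetical payload sent from the cf-CLI plugin to pcap-api."""

    app_id: str  # GUID of the CF app to capture from (assumed identifier)
    instance_indices: List[int] = field(default_factory=list)  # empty list = all instances
    filter: str = ""  # optional packet filter, e.g. "src host 10.0.0.5"

    def validate(self) -> None:
        # Reject obviously malformed requests before they leave the client.
        if not self.app_id:
            raise ValueError("app_id must not be empty")
        if any(i < 0 for i in self.instance_indices):
            raise ValueError("instance indices must be non-negative")

    def to_json(self) -> str:
        # Serialize with sorted keys so the output is deterministic.
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


request = CaptureRequest(
    app_id="my-app-guid",
    instance_indices=[0, 2],
    filter="src host 10.0.0.5",
)
payload = request.to_json()
```

In this sketch pcap-api would authenticate the user, look up the addressed app instances and forward the filter to the pcap-agent on each matching container.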
The only difference from app-ssh with regard to the cloud-controller implementation is that pcap-agent requires root permissions on the container to be able to access network traffic data, whereas diego-sshd runs as user vcap.
We have already successfully executed a spike/PoC where we modified cloud-controller and diego-release on one of our dev-landscapes to globally enable pcap-agent, i.e. to run the agent on every app-container in the landscape:
- The pcap-agent binary is built and packaged into the buildpack_app_lifecycle by diego-release (alongside diego-sshd), which is then extracted on every app-container
- We created a new action and port mappings for pcap-agent in the cloud-controller
(More details on the changes to diego-release here: cloudfoundry/diego-release#703)
With these modifications we were able to capture a tcpdump on an app-container via the pcap-agent from any landscape-internal VM.
The biggest limitation of this spike (with regard to cloud-controller) was that we didn't implement an "app-feature-flag" similar to allow_ssh in the CC.
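As a rough illustration of what such a flag could look like, the sketch below gates capture requests on a per-app setting that defaults to off. The names `allow_pcap`, `global_enabled` and `capture_permitted` are hypothetical and do not exist in the cloud-controller; the landscape-wide switch is included only as an assumption.

```python
from dataclasses import dataclass


@dataclass
class App:
    guid: str
    allow_pcap: bool = False  # hypothetical per-app flag, modeled on allow_ssh


def capture_permitted(app: App, global_enabled: bool = True) -> bool:
    """Return True only if capturing is enabled both globally and for this app.

    global_enabled stands in for an assumed landscape-wide switch.
    """
    return global_enabled and app.allow_pcap


opted_in = App(guid="app-a", allow_pcap=True)
default_app = App(guid="app-b")  # flag not set, so capture stays disabled
```

Defaulting the flag to off means an app has to opt in explicitly before any traffic can be captured.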
Before we move further, we would like to get your feedback, especially for the following questions:
- Do you see any roadblocks or complexities we might have missed?
- Is the container-root permission for pcap-agent acceptable or is it an issue?
- While app-ssh permissions can be granted on both space and app-level, we were considering having only permissions on app-level for simplicity. Do you think this would be enough to satisfy legal requirements like GDPR?
- Do we need/Do you see value in a global cf feature flag for pcap-release?
  - Is it even possible to switch on during runtime if the pcap-api needs to be deployed?