Container Security Observability In Cloud Environments

David Levitsky
May 11, 2021 · 8 min read

As organizations look to accelerate their delivery cycles, container adoption has skyrocketed in recent years. Cloud providers have made container deployment as easy as a single click. However, that ease of deployment and those fast-paced delivery cycles come with the challenge of securing these workloads.

How do you develop security monitoring for your container workloads in the cloud? Does the design change if your containers are running in a serverless environment? In this post, we will outline different strategies for approaching container security observability in traditional Container-as-a-Service (CaaS) environments, as well as serverless CaaS environments.

Container security can be broken down into two main categories: static observability and runtime observability (also known as dynamic). Let’s walk through how we got here and how we can secure our container workloads. A comprehensive security strategy includes many more components such as least-privilege IAM access and locked down networking, but for the sake of brevity we will focus only on a container and its components.

Background

Before we had containers, we had virtual machines. The approach to monitoring these hosts is pretty straightforward — take your favorite security agent, drop it on a box, and ship your metrics off to your SIEM solution. A popular open source agent is osquery, but there are many others, such as OSSEC, Sysdig, and more. Here’s a simplified example of what that could look like:

Traditional VM Monitoring

How do these agents collect information about your virtual machine and monitor it for potentially malicious events? They take advantage of the Linux Audit Framework. This framework audits a variety of events occurring on your machine, such as file access, directory access, and system calls. Palantir has a nice post on the audit framework for some light reading, but the key takeaway is that you can configure the framework to monitor sensitive operations, such as specific system calls or writes to critical files, and hook into this stream of events. If a malicious process tries to spawn a reverse shell by opening a port, your security agent can identify this action and fire an alert. More recent solutions like Falco leverage eBPF rather than the audit framework, but we’ll touch on this in a future post.
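To make the "configure the framework" step concrete, here is a minimal sketch of what audit rules look like in auditd syntax. The file path and the rule keys (`process_exec`, `passwd_changes`) are hypothetical examples; actually loading rules requires root (e.g. via `auditctl -R`), so this sketch only writes the rules out and inspects them:

```shell
# Sketch of Linux Audit Framework rules (auditd rule syntax).
# File path and key names are illustrative; loading them for real
# requires root, e.g. `auditctl -R /tmp/demo-audit.rules`.
cat > /tmp/demo-audit.rules <<'EOF'
# Record every process execution (64-bit execve syscall)
-a always,exit -F arch=b64 -S execve -k process_exec
# Watch /etc/passwd for writes and attribute changes
-w /etc/passwd -p wa -k passwd_changes
EOF

# Count the active (non-comment) rules we just wrote
grep -c '^-' /tmp/demo-audit.rules
```

An agent subscribing to the resulting event stream would then see a record tagged `process_exec` every time a new process is spawned — which is exactly the hook needed to catch something like a reverse shell.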

Monitoring Containers — With Host Access

In the example above, we see a 1:1 mapping of security agent to a host. However, with container workloads, we now have multiple applications running on the same host. Clearly, the model above needs to change, right?

The answer is yes, but it depends on how much. We will discuss monitoring containers in two different kinds of cloud environments — one where you have access to the host, and one where you don’t (serverless).

Runtime Observability

In the scenario where we have access to the host, the approach is very similar to what we saw with virtual machines. Take the EC2 launch type on AWS EKS or AWS ECS. AWS handles the container orchestration for you, but you are still responsible for scaling, monitoring, and patching your EC2 instances. This means you have full access to the host and can install your agent of choice on the box.

Why is having host access important? Containers run in their own namespaces, a kernel abstraction that partitions resources to provide isolation. To view all processes on the machine, we need privileged (root) access so we can look across all namespaces and get a complete view of what’s happening. For a deep dive on container isolation, I highly recommend reading Chapter 4 of Liz Rice’s Container Security.
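As a quick illustration of namespaces on any Linux host: every process exposes its namespace memberships as links under /proc/&lt;pid&gt;/ns, and two processes share a namespace exactly when those links resolve to the same inode. A host-level agent with root access can walk /proc and see processes across all namespaces; a process inside a container cannot.

```shell
# List the namespace types the current process belongs to
ls /proc/self/ns/

# Resolve the PID namespace handle; the output has the form
# pid:[<inode>], where the inode number identifies the namespace
readlink /proc/self/ns/pid
```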

Now we know we have privileged access, and we know we need to get an agent on each host. There are two approaches to accomplish this:

  • Golden Images
  • DaemonSet (Kubernetes construct)

Golden Image

In the golden image approach, we install all security and logging agents that should be running and create a machine image (AMI in the AWS world). We can then use this image as the base image for nodes running our container workloads — the security agents are already installed and will run at the host level on each node.

From the bottom up.

Because the security agents run in a privileged context and have access to all namespaces, they can monitor both existing and future container workloads spun up on the machine.

DaemonSet

Per the Kubernetes documentation, a DaemonSet ensures that all nodes run a copy of a pod. As new nodes are added to the cluster, Kubernetes will ensure that your specified pod is running on the node as well. Hence, you can add your security agent of choice and mark it as a DaemonSet, and Kubernetes will ensure it’s present on every node in your cluster.

There are several things to look out for with this approach. First, note that the DaemonSet will need to run in privileged mode. Second, your agent is now running inside a container and therefore has a virtualized view of the filesystem. You will need to volume-mount the relevant directories from the host into your container for the agent to do its job; otherwise, you will be limited to seeing only what’s happening inside your container’s namespaces. This is outside the scope of this post, but you can find a deeper dive in this blog post.
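Putting those caveats together, a minimal sketch of such a DaemonSet might look like the manifest below. The agent name, image, and mount path are hypothetical, and a real agent will document exactly which host paths and privileges it needs:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: security-agent          # hypothetical agent name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: security-agent
  template:
    metadata:
      labels:
        app: security-agent
    spec:
      hostPID: true                        # see processes in all PID namespaces
      containers:
      - name: agent
        image: example.com/security-agent:latest   # hypothetical image
        securityContext:
          privileged: true                 # required for full host visibility
        volumeMounts:
        - name: host-root
          mountPath: /host                 # host filesystem visible to the agent
          readOnly: true
      volumes:
      - name: host-root
        hostPath:
          path: /
```

Kubernetes will then schedule one copy of this pod onto every node, including nodes added to the cluster later.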

Static Observability

So far we’ve only discussed runtime observability and figuring out what’s happening with our container at runtime. However, any container security strategy should start with static observability.

With traditional VM-based application deployments, workloads typically fetch a whole suite of dependencies and libraries at runtime. This makes it difficult to statically analyze the entire workload before deployment, since the application is free to fetch whatever it needs once it’s running.

Containers, however, operate under a different construct — they are supposed to be immutable infrastructure. This means that whatever is packaged inside your container should be the exact same artifact that is deployed, and lends itself very nicely to static analysis. As developers create new images and push them to a repository, they should be scanned for vulnerabilities. A popular open source tool is Clair, and there are numerous commercial solutions as well. Unscanned images, or images with reported vulnerabilities, can be prevented from ever being deployed using something like OPA. This is a much stronger security control, as it is preventative rather than reactive, and we can catch and prevent vulnerabilities before they ever hit a production environment.
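As a sketch of that preventative control (the registry name and package layout here are hypothetical, and a real deployment would wire this into an admission controller such as OPA Gatekeeper), a Rego policy that rejects pods whose images do not come from an approved, scanned registry could look like:

```rego
package kubernetes.admission

# Deny any Pod whose container images are not pulled from our
# (hypothetical) registry of scanned, approved images.
deny[msg] {
    input.request.kind.kind == "Pod"
    image := input.request.object.spec.containers[_].image
    not startswith(image, "registry.example.com/scanned/")
    msg := sprintf("image %v is not from the approved scanned registry", [image])
}
```

Because the check runs at admission time, a vulnerable or unscanned image is stopped before a single container starts, rather than detected after the fact.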

Container scanning as part of CI/CD flow.

Monitoring Containers — Serverless

Runtime Observability

Now things start to get a little tricky. In a serverless environment, by design, we no longer have access to the host. Many Linux capabilities are denied to provide a stronger guarantee of security, and DaemonSets are not allowed, so our approaches above won’t work. There are several approaches we can try for implementing security observability for our serverless workloads:

  • Image Layering
  • Sidecars
  • Using a Cloud Provider Solution

Image Layering

In this approach, we embed our agent directly into the application’s container by adding a new layer to its image. You can see an example Dockerfile provided by Aqua here; the result is an image with the security agent baked in as an additional layer.

The agent’s scope will be limited to the current container, and it will only be able to obtain a limited set of metrics, as the underlying infrastructure is abstracted away. This approach has additional drawbacks: it violates the principle that a container should contain only its own libraries and dependencies, and it requires custom modifications to each application’s Dockerfile. For a few applications this might be doable, but if you are a governance team in a large organization, enforcement might be impractical.
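To illustrate the layering idea (all image names and paths below are hypothetical, and an agent vendor will document its own install steps), the approach amounts to a Dockerfile along these lines:

```dockerfile
# Start from the application's own image (hypothetical name)
FROM example.com/my-app:1.0

# Copy the security agent in from the vendor's image (hypothetical),
# adding it as an extra layer on top of the application
COPY --from=example.com/security-agent:latest /opt/agent /opt/agent

# A wrapper script (hypothetical) starts the agent, then the app
ENTRYPOINT ["/opt/agent/start-with-agent.sh"]
```

Every application team would need to maintain a variation of this wrapper, which is precisely the enforcement burden mentioned above.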

Sidecars

In this approach, we create a sidecar that runs alongside the main application. Here, we run into similar drawbacks as above, with the added overhead of consuming extra CPU + memory due to the need for another container.

Additional sidecar container running in the same pod as the main application.

Using Cloud Provider Solution

One could make a very good argument that since the host machine is owned and controlled by the cloud provider, the provider should be responsible for exposing security observability for customer consumption. AWS and GCP have both taken a shot at this.

In 2020, AWS announced support for the CAP_SYS_PTRACE capability in AWS Fargate, their serverless container orchestration offering. This capability gives a process the ability to introspect and control other processes, allowing syscall arguments to be captured. Sysdig has a really interesting writeup on how they extended their runtime security tool Falco to leverage this new capability and observe Fargate containers in ways that were not possible before.
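In an ECS task definition, opting into that capability is a matter of listing it under `linuxParameters.capabilities.add`, as in the fragment below (the container name and image are hypothetical):

```json
{
  "containerDefinitions": [
    {
      "name": "app",
      "image": "example.com/my-app:1.0",
      "linuxParameters": {
        "capabilities": {
          "add": ["SYS_PTRACE"]
        }
      }
    }
  ]
}
```

A tracing sidecar in the same task can then attach to the application process and observe its system calls, which is the mechanism the Sysdig writeup builds on.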

GCP took a different approach with the release of Container Threat Detection, a managed service to monitor the state of container images, evaluate changes, and detect a predetermined list of runtime attacks. Users simply need to enable this feature and Google does all the hard work of monitoring images at runtime. A really cool feature, which ties into the premise of immutability of container workloads, is the ability to detect and alert on new binaries executed and libraries loaded that were not part of the original image (and may therefore indicate suspicious activity).

It will be interesting to see what AWS and GCP continue to roll out in this space to help customers get access to data they need. It’s a little ironic that a serverless product was rolled out to reduce operational toil, only for customers to ask for some layers to be peeled back for their custom use cases.

Static Observability

Nothing changes from the previous section on static observability in serverless environments — if anything, due to the difficulty of runtime instrumentation in these environments, static observability becomes even more important as the main security control for your workloads.

Summary

In this post, we walked through the changes in endpoint security monitoring, from the world of traditional virtual machines to the container-based workloads prevalent today. We discussed approaches to both static and runtime observability, as well as the nuances between a traditional and serverless CaaS environment. Each approach could merit its own article, but we focused on keeping it high-level to provide as much context on the problem as possible without too many distractions.

Thanks for reading!


David Levitsky

Security Engineer. Passionate about all things related to cloud platforms. Editor of Simply CloudSec, a blog on cloud security.