Samar Sidharth Jan 20, 2021

Falco Performance Testing

Special Thanks to Leonardo Grasso for assisting me

Agenda

This document shares the experience of performance testing Falco deployed with the Helm chart on a Kubernetes cluster, explains the steps followed, and establishes a relation between the resources (CPU and memory) required by Falco and the number of syscalls per second it can handle.

Assumptions

The assumptions for the performance testing are as below:

- Known events are syscalls that match a Falco rule and trigger an alert.
- Unknown events are syscalls that Falco discards because they do not meet any condition in the Falco rules.
- The load of unknown events should be kept to a minimum during the performance testing, i.e. other activity on the host should be restricted as far as possible.
- The performance testing is done with the benchmark feature of the Falco event-generator tool.
- The event-generator benchmark generates syscall events only, so there are no events of type k8s-audit during this benchmarking exercise.
- Please keep in mind that not all actions can be used for benchmarking, since some of them take too long to generate a high number of EPS. For example, k8saudit actions are not supposed to work, since those actions need some time to create Kubernetes resources. Also, some syscall actions sleep for a while (like syscall.ReadSensitiveFileUntrusted) and thus cannot be used.
- EPS (Events Per Second) is the rate of events generated by event-generator in a single round. An event does not correspond to a single syscall; it is a combination of multiple syscalls. For example, if the action targets a rule that is triggered when a file is created under /bin, the action will probably use 3 syscalls:
  1. One to check the directory
  2. One to create the file under /bin
  3. One to delete the file

  However, the event-generator counts only 1 event per action:
```
INFO statistics  cpu="38.1%" lost="2%" res_mem="38 MB" throughput="4371.6 EPS" virt_mem="1.1 GB"
```
- Falco receives a lot of events, but not all of them trigger a rule, so the drop stats produced by the Falco process refer to the raw number of syscalls received (Falco cannot know whether the lost syscalls would have matched a rule, since it lost them). The event-generator, on the other hand, knows exactly which events it has generated, so it can say for sure: I sent 100 and received 50 back, thus 50% are lost.
- No custom rules were considered during performance testing; only the default Falco rules are triggered.
- The benchmarking feature was tested with only three types of actions: ChangeThreadNamespace|ReadSensitiveFileUntrusted|WriteBelowBinaryDir
- To calculate the number of syscalls supported at a particular resource setup, the event-generator round before the round in which drops appeared in the Falco logs was used.
Falco Fun Facts:
Here are some facts about Falco that might come to your mind during performance testing.
- Falco reads just one event at a time from the buffer, processes it and then discards it
- Falco receives a lot of events but not all of those trigger a rule
- Falco is designed to have low memory consumption, and the memory is not strictly related to the EPS, since events are processed one by one (once an event is processed, it is discarded and the memory is freed)

Steps

Setup

Falco was deployed with the -s option and --stats-interval set to 1 second (1000 ms) in order to capture the total number of syscalls handled by Falco per second.

```
 -s <stats_file>               If specified, append statistics related to Falco's reading/processing of events
                               to this file (only useful in live mode).
 --stats-interval <msec>       When using -s <stats_file>, write statistics every <msec> ms.
                               This uses signals, so don't recommend intervals below 200 ms.
                               Defaults to 5000 (5 seconds).
```

Sample Configuration:

```
# Changes in Falco daemonset
spec:
  containers:
  - args:
    - /usr/bin/falco
    - -s
    - /var/log/falco.txt
    - --stats-interval
    - "1000"
    - --cri
    - /run/containerd/containerd.sock
    - -K
    - /var/run/secrets/kubernetes.io/serviceaccount/token
    - -k
    - https://$(KUBERNETES_SERVICE_HOST)
    - -pk
```

Enable gRPC and relax the rate limiter in the Falco configuration by making the changes below in values.yaml:

```
  grpc:
    enabled: true
    threadiness: 0

    # gRPC unix socket with no authentication
    unixSocketPath: "unix:///run/falco/falco.sock"

    # gRPC over the network (mTLS) / required when unixSocketPath is empty
    listenPort: 5060
    privateKey: "/etc/falco/certs/server.key"
    certChain: "/etc/falco/certs/server.crt"
    rootCerts: "/etc/falco/certs/ca.crt"

  # gRPC output service.
  # By default it is off.
  # By enabling this all the output events will be kept in memory until you read them with a gRPC client.
  # Make sure to have a consumer for them or leave this disabled.
  grpcOutput:
    enabled: true

  # A throttling mechanism implemented as a token bucket limits the
  # rate of Falco notifications. This throttling is controlled by the following configuration
  # options:
  #  - rate: the number of tokens (i.e. right to send a notification)
  #    gained per second. Defaults to 1.
  #  - max_burst: the maximum number of tokens outstanding. Defaults to 1000.
  #
  # With these defaults, Falco could send up to 1000 notifications after
  # an initial quiet period, and then up to 1 notification per second
  # afterward. It would gain the full burst back after 1000 seconds of
  # no activity.
  outputs:
    rate: "1000000000"
    maxBurst: "1000000000"
```
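To see why the default throttling would cap a benchmark, the token-bucket behaviour described in the configuration comments can be simulated. The following is a minimal Python sketch, not Falco's implementation: the class and function names, the assumption that the bucket starts full, and the load of 5000 notifications per second for 3 seconds are all illustrative.

```python
class TokenBucket:
    """Illustrative token bucket; not Falco's actual implementation."""

    def __init__(self, rate, max_burst):
        self.rate = rate            # tokens gained per second
        self.max_burst = max_burst  # maximum number of tokens outstanding
        self.tokens = max_burst     # assumption: the bucket starts full
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now):
        """Return True if a notification may be sent at time `now`."""
        gained = (now - self.last) * self.rate
        self.tokens = min(self.max_burst, self.tokens + gained)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def count_sent(rate, max_burst, per_second=5000, seconds=3):
    """Offer `per_second` notifications per second and count how many pass."""
    bucket = TokenBucket(rate, max_burst)
    total = per_second * seconds
    return sum(bucket.allow(i / per_second) for i in range(total))


if __name__ == "__main__":
    print("defaults:", count_sent(rate=1, max_burst=1000), "of 15000 sent")
    print("relaxed: ", count_sent(rate=10**9, max_burst=10**9), "of 15000 sent")
```

With the default values only the initial burst plus roughly one notification per second gets through, so the event-generator would see most of its events as lost even though Falco processed them. With the relaxed values every notification passes.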

Download the event-generator binary from here. Extract the binary and use the command event-generator list to list the sample events this tool can generate.

Run event-generator with the bench command as shown below:

```
event-generator bench "ChangeThreadNamespace|ReadSensitiveFileUntrusted|WriteBelowBinaryDir" --loop --pid $(ps -ef | awk '$8=="falco" {print $2}')
```

event-generator bench

Benchmark for Falco

Synopsis

Benchmark a running Falco instance.

This command generates a high number of Event Per Second (EPS), to test the events throughput allowed by Falco. The number of EPS is controlled by the "--sleep" option: reduce the sleeping duration to increase the EPS. If the "--loop" option is set, the sleeping duration is halved on each round. The "--pid" option can be used to monitor the Falco process.

N.B.:
- the Falco gRPC Output must be enabled to use this command
- "outputs.rate" and "outputs.max_burst" values within the Falco configuration must be increased, otherwise EPS will be rate-limited by the throttling mechanism
- since not all actions can be used for benchmarking, only those actions matching the given regular expression are used

One common way to use this command is as follows:

event-generator bench "ChangeThreadNamespace|ReadSensitiveFileUntrusted" --loop --sleep 10ms --pid $(pidof -s falco)

Warning: This command might alter your system. For example, some actions modify files and directories below /bin, /etc, /dev, etc. Make sure you fully understand what is the purpose of this tool before running any action.

event-generator bench [regexp] [flags]

Options

      --all                            Run all actions, including those disabled by default
      --as string                      Username to impersonate for the operation
      --as-group stringArray           Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
      --cache-dir string               Default HTTP cache directory (default "$HOME/.kube/http-cache")
      --certificate-authority string   Path to a cert file for the certificate authority
      --client-certificate string      Path to a client certificate file for TLS
      --client-key string              Path to a client key file for TLS
      --cluster string                 The name of the kubeconfig cluster to use
      --context string                 The name of the kubeconfig context to use
      --grpc-ca string                 CA root file path for connecting to a Falco gRPC server (default "/etc/falco/certs/ca.crt")
      --grpc-cert string               Cert file path for connecting to a Falco gRPC server (default "/etc/falco/certs/client.crt")
      --grpc-hostname string           Hostname for connecting to a Falco gRPC server (default "localhost")
      --grpc-key string                Key file path for connecting to a Falco gRPC server (default "/etc/falco/certs/client.key")
      --grpc-port uint16               Port for connecting to a Falco gRPC server (default 5060)
      --grpc-unix-socket string        Unix socket path for connecting to a Falco gRPC server (default "unix:///run/falco/falco.sock")
  -h, --help                           help for bench
      --humanize                       Humanize values when printing statistics (default true)
      --insecure-skip-tls-verify       If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
      --kubeconfig string              Path to the kubeconfig file to use for CLI requests.
      --loop                           Run in a loop
      --match-server-version           Require server version to match client version
  -n, --namespace string               If present, the namespace scope for this CLI request (default "default")
      --pid int                        A process PID to monitor while benchmarking (e.g. the falco process)
      --polling-interval duration      Duration of gRPC APIs polling timeout (default 100ms)
      --request-timeout string         The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests. (default "0")
      --round-duration duration        Duration of a benchmark round (default 5s)
  -s, --server string                  The address and port of the Kubernetes API server
      --sleep duration                 The length of time to wait before running an action. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means no sleep. (default 100ms)
      --token string                   Bearer token for authentication to the API server
      --user string                    The name of the kubeconfig user to use

Options inherited from parent commands

  -c, --config string      Config file path (default $HOME/.falco-event-generator.yaml if exists)
      --logformat string   available formats: "text" or "json" (default "text")
  -l, --loglevel string    Log level (default "info")

Execution

Monitor the number of syscalls per second on the servers where Falco is deployed. In the sample output below, the values in the "cur" section are cumulative totals, whereas the values in the "delta" section cover only the interval since the previous sample.

Falco syscall logs:

```
{"sample": 2961, "cur": {"events": 46226792, "drops": 86992, "preemptions": 0}, "delta": {"events": 10700, "drops": 0, "preemptions": 0}, "drop_pct": 0},
{"sample": 2962, "cur": {"events": 46323843, "drops": 86992, "preemptions": 0}, "delta": {"events": 97051, "drops": 0, "preemptions": 0}, "drop_pct": 0},
{"sample": 2963, "cur": {"events": 46696561, "drops": 86992, "preemptions": 0}, "delta": {"events": 372718, "drops": 0, "preemptions": 0}, "drop_pct": 0},
{"sample": 2964, "cur": {"events": 47069599, "drops": 86992, "preemptions": 0}, "delta": {"events": 373038, "drops": 0, "preemptions": 0}, "drop_pct": 0},
{"sample": 2965, "cur": {"events": 47419658, "drops": 86992, "preemptions": 0}, "delta": {"events": 350059, "drops": 0, "preemptions": 0}, "drop_pct": 0},
{"sample": 2966, "cur": {"events": 47784238, "drops": 86992, "preemptions": 0}, "delta": {"events": 364580, "drops": 0, "preemptions": 0}, "drop_pct": 0},
{"sample": 2967, "cur": {"events": 48134675, "drops": 102975, "preemptions": 0}, "delta": {"events": 350437, "drops": 15983, "preemptions": 0}, "drop_pct": 4.56088},
{"sample": 2968, "cur": {"events": 48311955, "drops": 131484, "preemptions": 0}, "delta": {"events": 177280, "drops": 28509, "preemptions": 0}, "drop_pct": 16.0813},
{"sample": 2969, "cur": {"events": 48323039, "drops": 131484, "preemptions": 0}, "delta": {"events": 11084, "drops": 0, "preemptions": 0}, "drop_pct": 0},
{"sample": 2970, "cur": {"events": 48333847, "drops": 131484, "preemptions": 0}, "delta": {"events": 10808, "drops": 0, "preemptions": 0}, "drop_pct": 0},
{"sample": 2971, "cur": {"events": 48342737, "drops": 131484, "preemptions": 0}, "delta": {"events": 8890, "drops": 0, "preemptions": 0}, "drop_pct": 0}
```
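The relation between the "cur" and "delta" sections can be checked by subtracting consecutive cumulative values. The Python sketch below does this for three of the samples shown above. The helper names are made up for illustration, and the drop_pct formula (delta drops divided by delta events) is inferred from these numbers rather than taken from Falco's source.

```python
import json

# Three consecutive samples copied from the stats output shown above.
RAW = """
{"sample": 2966, "cur": {"events": 47784238, "drops": 86992, "preemptions": 0}, "delta": {"events": 364580, "drops": 0, "preemptions": 0}, "drop_pct": 0},
{"sample": 2967, "cur": {"events": 48134675, "drops": 102975, "preemptions": 0}, "delta": {"events": 350437, "drops": 15983, "preemptions": 0}, "drop_pct": 4.56088},
{"sample": 2968, "cur": {"events": 48311955, "drops": 131484, "preemptions": 0}, "delta": {"events": 177280, "drops": 28509, "preemptions": 0}, "drop_pct": 16.0813},
"""


def parse_samples(text):
    """Parse one JSON object per line, tolerating the trailing comma."""
    samples = []
    for line in text.splitlines():
        line = line.strip().rstrip(",")
        if line.startswith("{"):
            samples.append(json.loads(line))
    return samples


def recompute(prev, cur):
    """Derive delta events, delta drops and drop percentage from 'cur' totals."""
    events = cur["cur"]["events"] - prev["cur"]["events"]
    drops = cur["cur"]["drops"] - prev["cur"]["drops"]
    drop_pct = 100.0 * drops / events if events else 0.0
    return events, drops, drop_pct


if __name__ == "__main__":
    samples = parse_samples(RAW)
    for prev, cur in zip(samples, samples[1:]):
        events, drops, pct = recompute(prev, cur)
        print(cur["sample"], events, drops, round(pct, 4), "reported:", cur["delta"])
```

Since the stats interval is 1000 ms in this setup, each "delta" events value can be read directly as syscalls per second.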

This way you can measure the syscall activity on your servers when idle and use it as a reference. You can also check whether any drops are happening before starting the event-generator.

Start the event-generator tool and observe the statistics it prints for every round, while monitoring the number of syscalls per second in the Falco syscall output in parallel.

The event-generator cycle: in every round it generates load at a certain rate (EPS) and then rests. During the resting time it calculates the statistics for the round just completed, and the syscall count drops as well. In the next round it halves the sleep duration, aiming to double the rate, and the same cycle repeats until the tool is stopped.
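Because the sleep is halved on each round, it shrinks geometrically. As a rough illustration in Python (the function name is hypothetical, and which round number corresponds to which halving count depends on how the run was started, so it is left out here):

```python
def sleep_after_halvings(initial_sleep_us, halvings):
    """Sleep duration in microseconds after halving it `halvings` times."""
    return initial_sleep_us / (2 ** halvings)


if __name__ == "__main__":
    default_sleep_us = 100_000  # the documented --sleep default of 100ms
    for halvings in (0, 5, 10, 13, 14):
        print(halvings, f"{sleep_after_halvings(default_sleep_us, halvings):.3f}µs")
```

Starting from the default 100ms, thirteen halvings give about 12.207µs and fourteen give about 6.103µs, which are sleep values that appear in the event-generator logs in this document. After a dozen or so rounds the sleep is negligible, so the achievable rate is governed by how long the actions themselves take.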

event-generator logs:

```
INFO round #14                                     sleep="12.207µs"
INFO resting...
INFO syscall.ReadSensitiveFileUntrusted            actual=14224 expected=14458 ratio=0.9838151888227971
INFO syscall.WriteBelowBinaryDir                   actual=14219 expected=14456 ratio=0.9836054233536248
INFO syscall.ChangeThreadNamespace                 actual=14208 expected=14457 ratio=0.9827765096493049
INFO statistics                                    cpu="68.0%" lost="1%" res_mem="36 MB" throughput="8674.2 EPS" virt_mem="1.1 GB"
INFO
INFO round #15                                     sleep="6.103µs"
INFO resting...
INFO syscall.ReadSensitiveFileUntrusted            actual=15093 expected=16962 ratio=0.8898125221082419
INFO syscall.WriteBelowBinaryDir                   actual=15080 expected=16963 ratio=0.8889936921535105
INFO syscall.ChangeThreadNamespace                 actual=15058 expected=16961 ratio=0.8878014268026649
INFO statistics                                    cpu="72.6%" lost="11%" res_mem="36 MB" throughput="10177.2 EPS" virt_mem="1.1 GB"
```
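The per-action lines and the statistics line of a round can be related to each other with a little arithmetic. The Python sketch below sums the actual and expected counts for round #15 and derives a lost percentage and a throughput from them. Both formulas are inferred from the numbers in these logs, assuming the default 5s round duration; they are not taken from the event-generator source, and the tool's own rounding may differ.

```python
import re

# Per-action lines of round #15, copied from the event-generator logs above.
ROUND_15 = """\
INFO syscall.ReadSensitiveFileUntrusted            actual=15093 expected=16962 ratio=0.8898125221082419
INFO syscall.WriteBelowBinaryDir                   actual=15080 expected=16963 ratio=0.8889936921535105
INFO syscall.ChangeThreadNamespace                 actual=15058 expected=16961 ratio=0.8878014268026649
"""

PATTERN = re.compile(r"actual=(\d+) expected=(\d+)")


def round_totals(log_text):
    """Sum the actual and expected event counts over all actions in a round."""
    actual = expected = 0
    for a, e in PATTERN.findall(log_text):
        actual += int(a)
        expected += int(e)
    return actual, expected


def lost_pct(actual, expected):
    """Share of generated events that never came back from Falco, in percent."""
    return 100.0 * (expected - actual) / expected


def throughput_eps(expected, round_seconds=5.0):
    """Generated events per second, assuming the default 5s round duration."""
    return expected / round_seconds


if __name__ == "__main__":
    actual, expected = round_totals(ROUND_15)
    print("actual:", actual, "expected:", expected)
    print("lost: %.1f%%" % lost_pct(actual, expected))
    print("throughput: %.1f EPS" % throughput_eps(expected))
```

For round #15 this gives a loss of roughly 11% and a throughput of 10177.2 EPS, in line with the statistics line printed by the tool for that round.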

As soon as a non-zero value appears under drops in the "delta" section (sample 2967 in the Falco syscall logs above), stop the event-generator tool and capture the values of the event-generator round before it. Here the drops occurred in round #15, as seen in the event-generator statistics above, so we consider the values for round #14.
The total number of syscalls per second supported by Falco at the given resource setting can then be calculated by averaging the delta event counts of the samples before the one where the drop occurred (samples 2963, 2964, 2965 and 2966). The samples with low syscall counts (2961 and 2962) belong to the resting period of the event-generator cycle.
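The averaging step can be sketched in Python as follows. The data are the delta values from the Falco syscall logs above. The function name and the load_fraction threshold used to separate load samples from resting samples are assumptions made for this illustration; in the procedure described here that separation was done by eye.

```python
# (sample, delta events, delta drops) copied from the Falco syscall logs above.
SAMPLES = [
    (2961, 10700, 0),
    (2962, 97051, 0),
    (2963, 372718, 0),
    (2964, 373038, 0),
    (2965, 350059, 0),
    (2966, 364580, 0),
    (2967, 350437, 15983),
    (2968, 177280, 28509),
]


def supported_syscalls_per_sec(samples, load_fraction=0.5):
    """Average the load samples that directly precede the first drop.

    A sample counts as "load" when its event count is at least
    `load_fraction` of the peak seen before the drop (hypothetical rule).
    Returns (average, sample numbers used), or (None, []) if not computable.
    """
    first_drop = next((i for i, s in enumerate(samples) if s[2] > 0), None)
    if not first_drop:  # no drops at all, or drops in the very first sample
        return None, []
    before = samples[:first_drop]
    peak = max(events for _, events, _ in before)
    window = []
    for sample in reversed(before):
        if sample[1] < load_fraction * peak:
            break  # reached the resting period of the event-generator cycle
        window.append(sample)
    if not window:
        return None, []
    window.reverse()
    average = sum(events for _, events, _ in window) / len(window)
    return average, [number for number, _, _ in window]


if __name__ == "__main__":
    average, used = supported_syscalls_per_sec(SAMPLES)
    print("samples used:", used)
    print("average syscalls per second:", average)
```

For the samples above this selects 2963 to 2966 and yields an average of about 365K syscalls per second.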

Observations

The observations here are for reference only. The intention of this document is to illustrate a process for Falco performance testing, so it is highly recommended to carry out the performance testing in your own environment. The environment under test was a Kubernetes cluster deployed on VMs hosted on OpenStack.

CPU	Memory	Number of syscalls per second	EPS (rounded down)
500m (0.5)	512 Mi	Up to 150K	2800
1	512 Mi	Up to 250K	4200
2	512 Mi	Up to 320K	7000

Important points:

Adding or removing event types in the event-generator will affect the EPS, but the total syscall value should stay in roughly the same range.

If the EPS gets stuck in a certain range, review the types of events used to generate the syscall load. The event-generator runs actions sequentially (single-threaded), so the time taken by one loop is the sum of the time needed to execute all three events; if one of them is slow, the whole loop is slow. Once the sleeping time approaches 0, the EPS cannot grow any further because all the time in the loop is spent executing the actions. The EPS then fluctuates within a certain range instead of increasing with every round.
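This ceiling can be illustrated with a small model of a single-threaded loop, written here in Python. The per-action execution times are hypothetical numbers chosen for the illustration, not measurements, and the model assumes the sleep is applied once before each action.

```python
def max_eps(action_times_us, sleep_us=0.0):
    """Events per second for one thread running the actions in sequence,
    sleeping `sleep_us` microseconds before each action."""
    loop_us = sum(action_times_us) + sleep_us * len(action_times_us)
    return len(action_times_us) * 1_000_000 / loop_us


if __name__ == "__main__":
    healthy = [100, 100, 100]     # hypothetical: three fast actions
    one_slow = [100, 100, 2800]   # hypothetical: one action stalls on disk I/O
    for sleep_us in (100.0, 50.0, 25.0, 0.0):
        print(f"sleep={sleep_us:6.1f}µs",
              f"healthy={max_eps(healthy, sleep_us):8.1f} EPS",
              f"one_slow={max_eps(one_slow, sleep_us):7.1f} EPS")
```

In the healthy case the EPS keeps climbing as the sleep shrinks, while with one slow action the loop time is dominated by that action and further halving of the sleep barely changes the result.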

Example:

In the example below the EPS did not increase at the expected rate and even went down in one round. This indicates that one of the events in ChangeThreadNamespace|ReadSensitiveFileUntrusted|WriteBelowBinaryDir was slowing down the event-generator. In this case it was WriteBelowBinaryDir, as the server had an I/O issue on the root disk. After the problematic event WriteBelowBinaryDir was removed from the list, the benchmark worked fine.

```
event-generator bench "ChangeThreadNamespace|ReadSensitiveFileUntrusted|WriteBelowBinaryDir" --loop --grpc-unix-socket=unix:///var/run/falco/falco.sock --pid <Falco PID>
```

```
INFO round #12                                     sleep="97.656µs"
INFO resting...
INFO syscall.WriteBelowBinaryDir                   actual=1696 expected=1696 ratio=1
INFO syscall.ChangeThreadNamespace                 actual=1695 expected=1695 ratio=1
INFO syscall.ReadSensitiveFileUntrusted            actual=1696 expected=1696 ratio=1
INFO statistics                                    cpu="21.9%" lost="0%" res_mem="121 MB" throughput="1017.4 EPS" virt_mem="1.5 GB"
INFO
INFO round #13                                     sleep="48.828µs"
INFO resting...
INFO syscall.WriteBelowBinaryDir                   actual=1787 expected=1787 ratio=1
INFO syscall.ChangeThreadNamespace                 actual=1788 expected=1788 ratio=1
INFO syscall.ReadSensitiveFileUntrusted            actual=1786 expected=1786 ratio=1
INFO statistics                                    cpu="23.6%" lost="0%" res_mem="121 MB" throughput="1072.2 EPS" virt_mem="1.5 GB"
INFO
INFO round #14                                     sleep="24.414µs"
INFO resting...
INFO syscall.WriteBelowBinaryDir                   actual=1614 expected=1614 ratio=1
INFO syscall.ChangeThreadNamespace                 actual=1613 expected=1613 ratio=1
INFO syscall.ReadSensitiveFileUntrusted            actual=1615 expected=1615 ratio=1
INFO statistics                                    cpu="22.4%" lost="0%" res_mem="121 MB" throughput="968.4 EPS" virt_mem="1.5 GB"
```
