[#129] Tune cgroup resources

This should help by giving the server and operating system 3GB of
headroom instead of 1GB. Empirically, the OOM killer appears to be
working properly, killing user code rather than system processes, but
the small amount of headroom could still have been a problem.

Heavy swap usage could also have been a problem, so I disabled swap
for user code. I also reduced the CPU quota so that user code can no
longer burst, and bumped the pid quota because we had plenty of
headroom there.
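The arithmetic behind two of the new numbers can be sanity-checked with a short sketch. The figures (t3.large: 60% baseline CPU, burstable to 200%, pid_max of 2^22 on EC2) come from the comments in the slice file below; the variable names are mine.

```python
# Sanity-check the new CPU and pid limits against the quoted figures.

pid_max = 2 ** 22      # /proc/sys/kernel/pid_max observed on EC2
tasks_max = 400_000    # new TasksMax for the user slice
assert pid_max == 4_194_304
# "about a tenth of this space", rounded down to a round number:
assert tasks_max <= pid_max // 10

baseline_cpu = 60      # % sustained CPU on a t3.large
burst_cpu = 200        # % burst ceiling on a t3.large
cpu_quota = 60         # new CPUQuota, in %
# Pinning user code to the baseline reserves all burst capacity
# for the server and operating system.
assert cpu_quota == baseline_cpu
print(burst_cpu - cpu_quota)  # % of CPU reserved for server + OS
```

Running this prints `140`: with the quota at the 60% baseline, the full burst margin stays available to system processes.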
Radon Rosborough 2021-10-24 12:32:08 -07:00
parent bc900174a9
commit 0d92a77922
2 changed files with 20 additions and 8 deletions


@@ -3,12 +3,26 @@ Description=Resource limits for Riju user containers
 Before=slices.target
 [Slice]
+# t3.large instance has baseline CPU performance of 60% and is
+# burstable up to 200%. Reserve bursting for server + operating
+# system.
 CPUAccounting=true
-CPUQuota=100%
+CPUQuota=60%
+# t3.large instance has 8GB memory, so reserve 3GB for server +
+# operating system. Disable swap for now.
 MemoryAccounting=true
-MemoryMax=3G
-MemorySwapMax=8G
+MemoryMax=5G
+MemorySwapMax=0
+# Empirically, EC2 instances appear to have /proc/sys/kernel/pid_max
+# equal to 2^22 = 4194304. It should be safe to give about a tenth of
+# this space to user code.
 TasksAccounting=true
-TasksMax=2048
+TasksMax=400000
+# Attempt to deny access to EC2 Instance Metadata service from user
+# code.
+IPAccounting=true
+IPAddressDeny=169.254.169.254
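Once systemd applies the slice, the effective limits show up in cgroup v2 control files (e.g. `memory.max` and `pids.max` under the slice's cgroup directory), which contain either the literal string `max` or a plain integer. A small hypothetical helper for reading them, assuming the conventional `/sys/fs/cgroup/riju.slice/` mount point:

```python
# Hypothetical helper for verifying the limits the slice actually got.
# cgroup v2 control files hold either "max" (unlimited) or an integer.

def parse_cgroup_limit(text: str):
    """Return the limit as an int, or None for 'max' (unlimited)."""
    value = text.strip()
    if value == "max":
        return None
    return int(value)

# Expected contents once the new slice config is applied:
assert parse_cgroup_limit("5368709120\n") == 5 * 1024 ** 3  # MemoryMax=5G
assert parse_cgroup_limit("0\n") == 0                       # MemorySwapMax=0
assert parse_cgroup_limit("max\n") is None                  # no limit set
```

In production these strings would come from files like `/sys/fs/cgroup/riju.slice/memory.max`; the exact path depends on where the cgroup hierarchy is mounted.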


@@ -233,13 +233,11 @@ void session(char *uuid, char *lang, char *imageHash)
 "--name",
 container,
 "--cpus",
-"1",
+"0.6",
 "--memory",
 "1g",
-"--memory-swap",
-"8g",
 "--pids-limit",
-"2048",
+"4000",
 "--cgroup-parent",
 "riju.slice",
 "--label",
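The hunk above edits a C string array of `docker run` arguments. A sketch of the per-container resource flags after this commit, with a hypothetical helper name and container name standing in for the real C code:

```python
# Sketch of the per-container docker run resource flags after this
# commit. resource_args() is a hypothetical helper; the real code
# builds a C string array inside session().

def resource_args(container: str) -> list:
    return [
        "--name", container,
        "--cpus", "0.6",            # matches the slice's CPUQuota=60%
        "--memory", "1g",           # per-container cap under MemoryMax=5G
        "--pids-limit", "4000",     # per-container share of TasksMax=400000
        "--cgroup-parent", "riju.slice",
    ]

args = resource_args("riju-example")
assert "--memory-swap" not in args   # swap flag dropped in this commit
assert args[args.index("--cpus") + 1] == "0.6"
```

Note that the `--memory-swap` flag is removed entirely rather than set to `1g`; the slice-level `MemorySwapMax=0` is what actually disables swap for user code.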