[#129] Tune cgroup resources

This should help by giving the server and operating system 3GB of
headroom instead of 1GB. Empirically, the OOM killer appears to be
working properly, killing user code rather than system processes, but
the small amount of headroom could still have been a problem.

Heavy swap usage could also have been a problem, so I disabled swap
for user code. I also reduced the CPU quota so that user code can no
longer burst, and bumped the pid quota because we had plenty of
headroom there.
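The arithmetic behind two of the new numbers can be sanity-checked with a short sketch. The figures (t3.large: 60% baseline CPU, burstable to 200%, pid_max of 2^22 on EC2) come from the comments in the slice file below; the variable names are mine.

```python
# Sanity-check the new CPU and pid limits against the quoted figures.

pid_max = 2 ** 22      # /proc/sys/kernel/pid_max observed on EC2
tasks_max = 400_000    # new TasksMax for the user slice
assert pid_max == 4_194_304
# "about a tenth of this space", rounded down to a round number:
assert tasks_max <= pid_max // 10

baseline_cpu = 60      # % sustained CPU on a t3.large
burst_cpu = 200        # % burst ceiling on a t3.large
cpu_quota = 60         # new CPUQuota, in %
# Pinning user code to the baseline reserves all burst capacity
# for the server and operating system.
assert cpu_quota == baseline_cpu
print(burst_cpu - cpu_quota)  # % of CPU reserved for server + OS
```

Running this prints `140`: with the quota at the 60% baseline, the full burst margin stays available to system processes.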
Radon Rosborough 2021-10-24 12:32:08 -07:00
parent bc900174a9
commit 0d92a77922
2 changed files with 20 additions and 8 deletions


@@ -3,12 +3,26 @@ Description=Resource limits for Riju user containers
 Before=slices.target
 [Slice]
+# t3.large instance has baseline CPU performance of 60% and is
+# burstable up to 200%. Reserve bursting for server + operating
+# system.
 CPUAccounting=true
-CPUQuota=100%
+CPUQuota=60%
+# t3.large instance has 8GB memory, so reserve 3GB for server +
+# operating system. Disable swap for now.
 MemoryAccounting=true
-MemoryMax=3G
-MemorySwapMax=8G
+MemoryMax=5G
+MemorySwapMax=0
+# Empirically, EC2 instances appear to have /proc/sys/kernel/pid_max
+# equal to 2^22 = 4194304. It should be safe to give about a tenth of
+# this space to user code.
 TasksAccounting=true
-TasksMax=2048
+TasksMax=400000
+# Attempt to deny access to EC2 Instance Metadata service from user
+# code.
+IPAccounting=true
+IPAddressDeny=169.254.169.254
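Once systemd applies the slice, the effective limits show up in cgroup v2 control files (e.g. `memory.max` and `pids.max` under the slice's cgroup directory), which contain either the literal string `max` or a plain integer. A small hypothetical helper for reading them, assuming the conventional `/sys/fs/cgroup/riju.slice/` mount point:

```python
# Hypothetical helper for verifying the limits the slice actually got.
# cgroup v2 control files hold either "max" (unlimited) or an integer.

def parse_cgroup_limit(text: str):
    """Return the limit as an int, or None for 'max' (unlimited)."""
    value = text.strip()
    if value == "max":
        return None
    return int(value)

# Expected contents once the new slice config is applied:
assert parse_cgroup_limit("5368709120\n") == 5 * 1024 ** 3  # MemoryMax=5G
assert parse_cgroup_limit("0\n") == 0                       # MemorySwapMax=0
assert parse_cgroup_limit("max\n") is None                  # no limit set
```

In production these strings would come from files like `/sys/fs/cgroup/riju.slice/memory.max`; the exact path depends on where the cgroup hierarchy is mounted.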


@@ -233,13 +233,11 @@ void session(char *uuid, char *lang, char *imageHash)
 "--name",
 container,
 "--cpus",
-"1",
+"0.6",
 "--memory",
 "1g",
-"--memory-swap",
-"8g",
 "--pids-limit",
-"2048",
+"4000",
 "--cgroup-parent",
 "riju.slice",
 "--label",
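The hunk above edits a C string array of `docker run` arguments. A sketch of the per-container resource flags after this commit, with a hypothetical helper name and container name standing in for the real C code:

```python
# Sketch of the per-container docker run resource flags after this
# commit. resource_args() is a hypothetical helper; the real code
# builds a C string array inside session().

def resource_args(container: str) -> list:
    return [
        "--name", container,
        "--cpus", "0.6",            # matches the slice's CPUQuota=60%
        "--memory", "1g",           # per-container cap under MemoryMax=5G
        "--pids-limit", "4000",     # per-container share of TasksMax=400000
        "--cgroup-parent", "riju.slice",
    ]

args = resource_args("riju-example")
assert "--memory-swap" not in args   # swap flag dropped in this commit
assert args[args.index("--cpus") + 1] == "0.6"
```

Note that the `--memory-swap` flag is removed entirely rather than set to `1g`; the slice-level `MemorySwapMax=0` is what actually disables swap for user code.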