How to configure CPU/memory for an ECS service on EC2

Ats
3 min read · Aug 10, 2024


This is a note about what I did to adjust the CPU/memory of an existing EC2 with ECS service.

Photo by Christian Wiediger on Unsplash

Background

For the last few weeks, the service I work on kept receiving SIGTERM signals and restarting the application server. I investigated the cause and found a very likely candidate in the AWS health console: it was running out of memory.

The server runs on an EC2 instance, and ECS manages the instance. Fundamentally, I needed to investigate why memory had reached the limit and whether there was a memory leak. But the software was serving our customers and couldn't tolerate sudden shutdowns. So I decided to raise the limit first as a temporary fix and investigate the software later. Since this was my first time tweaking ECS settings, I started by finding out how to do it.

What I did

First of all, I checked the ECS task definition file. I've attached part of it below, with faked values.

{
  "family": "example",
  "executionRoleArn": "arn:aws:iam::111111111111:role/example",
  "containerDefinitions": [
    { container settings }
  ],
  "volumes": [],
  "placementConstraints": [],
  "taskRoleArn": "arn:aws:iam::111111111111:role/example-web",
  "requiresCompatibilities": [
    "EC2"
  ],
  "cpu": "256",
  "memory": "512"
}

I found there were settings for the memory limit, and checked the documentation for the ECS task definition parameters.
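Note that ECS can also limit memory per container, inside containerDefinitions, independently of the task-level values. A minimal illustrative fragment (the container name is made up; in ECS, memory is the hard limit and memoryReservation the soft one, both in MiB):

```json
{
  "containerDefinitions": [
    {
      "name": "example-web",
      "cpu": 512,
      "memory": 2048,
      "memoryReservation": 1024
    }
  ]
}
```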

From the documentation, the valid memory values depend on the CPU value. So I needed to change both the CPU and memory values. For that, I needed to know the maximum CPU available on my EC2 instance. The service used an m5 instance, so I googled the list of vCPU and memory values for each m5 size.

Let’s say the service used an m5.xlarge instance, which offers 4 vCPUs and 16 GiB of memory. I increased the memory limit to four times the current value, because I planned to lower it to an appropriate number after a proper investigation. If the maximum value was sufficiently large, memory usage should plateau at some point; otherwise, there would be a memory leak. So my task definition file looked like below.

{
  "family": "example",
  "executionRoleArn": "arn:aws:iam::111111111111:role/example",
  "containerDefinitions": [
    { container settings }
  ],
  "volumes": [],
  "placementConstraints": [],
  "taskRoleArn": "arn:aws:iam::111111111111:role/example-web",
  "requiresCompatibilities": [
    "EC2"
  ],
  "cpu": "512",
  "memory": "2048"
}
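The sizing arithmetic above can be sketched as a tiny helper. The 4x memory factor, 2x CPU factor, and the m5.xlarge ceilings are just my assumptions from this post, not ECS constants:

```python
# Hypothetical sizing helper: bump the task's limits by a safety factor
# while staying within what the instance type can actually offer.
# CPU is in ECS CPU units (1024 = 1 vCPU); memory is in MiB.

def scale_task_limits(cpu, memory, cpu_factor=2, mem_factor=4,
                      instance_cpu=4096, instance_memory=16384):
    new_cpu = min(cpu * cpu_factor, instance_cpu)
    new_memory = min(memory * mem_factor, instance_memory)
    return new_cpu, new_memory

print(scale_task_limits(256, 512))  # -> (512, 2048)
```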

I wasn’t sure whether changing the values in the file was all I had to do; I thought I might need to change something elsewhere too. So I tested the deployment in my staging environment first and watched the health monitor. Contrary to my fears, that was all the work required. I made the same change to the production task file and deployed it. Then, as expected, the SIGTERM signals stopped. (I still need to find the root cause.)
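For reference, the redeployment can also be done from the AWS CLI. The cluster and service names here are placeholders, and I actually deployed through our usual pipeline, so treat this as a sketch:

```shell
# Register a new revision of the task definition from the edited JSON file
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Point the service at the new revision; ECS rolls out replacement tasks
aws ecs update-service \
  --cluster example-cluster \
  --service example-web \
  --task-definition example
```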

That’s it!



Written by Ats

I like building something tangible like touch, gesture, and voice. Ruby on Rails / React Native / Yocto / Raspberry Pi / Interaction Design / CIID IDP alumni
