Frustrations with Fargate and ECS

Filed under aws on April 30, 2020

I just spent a good month wrangling ECS Fargate deployments, with a ton of CloudFormation to boot. I’m going to outline the stupid things I’ve had to deal with and some of the errors I’ve hit.

Fargate

Fargate is an AWS offering that lets you bring up Docker containers without needing to provision servers to run them on. Aside from that, it’s much the same as a normal ECS cluster from a developer’s perspective, as far as I can tell.

ECS itself has a number of nice abstractions that make it easier to conceptualise how your containers are running, such as breaking things down into Services and Tasks, and it lets you apply various security configurations to your containers. The flip side of this configurability is that I have a huge CloudFormation file for all my Fargate services, and I’m not game to touch the bits of the Fargate configuration I have working.

Things that shit me

Lack of feedback in the UI

Service events are hidden somewhere unintuitive in the UI. I found them once, I think, and have since reverted to using the command line tools because I can’t find them again.
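For what it’s worth, this is roughly how I pull the events up from the command line instead. The cluster and service names here are placeholders, not my real ones:

```shell
# Show the ten most recent service events (deployments, failed task
# placements, health check failures, etc.) without digging through the UI.
aws ecs describe-services \
  --cluster my-cluster \
  --services my-service \
  --query 'services[0].events[:10]' \
  --output table
```

The `events` list on a service is where ECS actually reports most of what went wrong, which is why the CLI ends up being more useful than the console here.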

Continues to try pulling a repo that doesn’t exist

This probably wasted the most time early on when I was writing my template. I accidentally passed in my Docker repos incorrectly from my makefile, and as a result had a misconfigured task that was trying to pull down an image that didn’t exist. Instead of erroring out and stopping my stack update, it just kept retrying while the GitLab build-minutes clock ticked away.

Had events been more visible in the UI this wouldn’t have been an issue, but it wasn’t until it occurred to me to use the command line that I found out what was going on.
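If you hit the same thing, the stopped tasks themselves record the reason for the failure. A sketch of how to dig it out, again with placeholder cluster and service names:

```shell
# List tasks that have stopped recently for the service.
aws ecs list-tasks \
  --cluster my-cluster \
  --service-name my-service \
  --desired-status STOPPED

# Ask why one of them stopped. A bad image reference shows up here
# as a "CannotPullContainerError" in stoppedReason.
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks <task-arn-from-above> \
  --query 'tasks[0].stoppedReason'
</imports>
```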

No feedback at all if your entrypoint is wrong

I’ve been trying to quickly convert manually deployed services into full CloudFormation stacks as I go. To accomplish this, I copy over my makefile, GitLab build file and CloudFormation templates and modify what’s necessary. One thing I’ve found I forget a lot is the container entrypoint.

The main problem here is that ECS won’t say anything or output any kind of message if the container fails to boot for this reason, preferring instead to drop into a boot loop and chew up your GitLab minutes. The first time I grappled with this, I was up until 2am before I realised my makefile was pushing dodgy parameter overrides into my template. I’m not sure how Amazon would fix this under the hood, but it’s a major gotcha for people like me who aren’t 100% familiar with the platform.
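For context, this is the bit of the task definition I keep forgetting to update when copying templates between services. The names and values here are illustrative, not from my actual template:

```yaml
MyTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: my-service
    RequiresCompatibilities: [FARGATE]
    NetworkMode: awsvpc
    Cpu: "256"
    Memory: "512"
    ContainerDefinitions:
      - Name: app
        Image: !Ref ImageUri
        # The easy ones to drop when copying between services:
        # a stale entrypoint or command silently boot-loops the task.
        EntryPoint: ["/app/entrypoint.sh"]
        Command: ["serve"]
```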

Conclusion

Looking back at all this, I think every one of my problems could have been solved by better error reporting in the UI. Annoying as hell, but I’ve made my AWS bed, so I’d better lie in it.