I’ve been a Linux enthusiast since I first started uni. I still have my first Kubuntu CD somewhere, and while these days I mostly work off the MacBook that work supplied me, my own personal ThinkPad is running Arch.
However, one thing I never bothered diving into too deeply was systemd. This came back to bite me a bit with my OpenStack install, as Zun containers were hanging on boot.
This led me down the rabbit hole, and I’ve spent about 3 weeks trying to figure it out. Originally, I thought it was a Zun problem, since it just looked like the container was never being started. After trawling through logs, I eventually tracked it down to the point where Kuryr was trying to attach the interfaces to the bridge.
I modified the systemd unit file to run the service as root. It wouldn’t even start.
I continued digging through configs, and realised that nearly everything was configured correctly, but Kuryr would still fail when it tried to make one particular call.
At some point yesterday evening, I noticed a couple of capability-related options in the Kuryr systemd unit, including AmbientCapabilities. I’d never heard of these before, but digging into it, it looked to be exactly what I needed. While the list already had some entries, it was missing a few, CAP_DAC_READ_SEARCH among them. I added these, and it started working.
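For anyone wanting to try something similar, a change like this can live in a drop-in file rather than in the packaged unit. This is only a sketch under assumptions: the unit name kuryr-libnetwork.service and the drop-in path are guesses for illustration, and CAP_DAC_READ_SEARCH is the only capability shown, so list whichever ones your own service is actually missing.

```ini
# /etc/systemd/system/kuryr-libnetwork.service.d/capabilities.conf
# (unit name and path are assumptions; adjust to match your install)
[Service]
# Repeated AmbientCapabilities= lines are merged with the ones already
# set in the main unit file, so this adds to the list without replacing it.
AmbientCapabilities=CAP_DAC_READ_SEARCH
# If the unit also restricts CapabilityBoundingSet=, the same capability
# has to be allowed there too, or the ambient grant has no effect.
```

After saving the file, `systemctl daemon-reload` followed by a restart of the service should pick up the change.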
I’m guessing that Kuryr was invoking ovs-vsctl as the kuryr user, which didn’t have permission to edit the OVS DB file. Adding these capabilities to the systemd unit file looks to have fixed it.
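One way to check a guess like this is to look at which capabilities the running process really ended up with. The kernel exposes them as hex bitmasks in /proc/PID/status (the CapInh, CapPrm, CapEff, CapBnd and CapAmb lines). Below is a rough Python sketch that decodes them. The helper names are my own invention, and the bit table only covers a handful of capabilities from linux/capability.h, so treat it as a starting point rather than a complete tool.

```python
from pathlib import Path

# Bit positions as defined in <linux/capability.h>.
# Deliberately incomplete: only a few capabilities relevant here are listed.
CAP_BITS = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    2: "CAP_DAC_READ_SEARCH",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    21: "CAP_SYS_ADMIN",
}

CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def decode_caps(mask_hex):
    """Turn a hex capability mask into the set of known capability names."""
    mask = int(mask_hex, 16)
    return {name for bit, name in CAP_BITS.items() if mask & (1 << bit)}


def process_caps(pid="self"):
    """Read the capability sets of a process from /proc (Linux only)."""
    caps = {}
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        key, _, value = line.partition(":")
        if key in CAP_FIELDS:
            caps[key] = decode_caps(value.strip())
    return caps
```

Calling `process_caps(pid)` with the service's main PID and comparing the CapEff and CapAmb sets before and after the unit change should show whether the new capabilities actually arrived. The `capsh --decode=MASK` command does the same decoding with the full capability list, if it is installed.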
I’m wondering what the security implications would be, but since my server is pretty isolated from the outside world I’m not too worried. I’ll continue to look into it and see if there’s a better way.
Looking into the man page, it seems that capabilities were introduced in Linux 2.2. I feel like it’s something I should have been aware of, since it explains how a lot of user permissions work under the hood. I’m fascinated now, so I’ll definitely be reading more into how I can use this to shore up and deploy applications.
I really hadn’t paid much mind to the virtual networking that OpenStack installed into Ubuntu. Diving into Kuryr’s source to see where it was failing also piqued my curiosity around how that all worked. I’ll definitely be having a better look into how to configure my own virtual networks.
Discoverability of these things
I’m a little disappointed in my Google-fu here, as I had to stare at and tinker with a lot of log files before I was even able to ask the right questions about what was happening on my machine. I’m not sure if it’s just that I’m not used to debugging these kinds of issues, or if the information just isn’t readily available. I think it’s more likely to be the former, so it’s probably time to brush back up on my Linux knowledge. I have a lifetime subscription to Whizlabs, so maybe I’ll use that to study for RHCE or something.
Definitely feeling humbled. My Linux knowledge is better than a lot of developers’, but I honestly feel like I haven’t even scratched the surface. I have a new mission to learn more about how Linux works under the hood, and I’m excited to see what else I can learn about. Maybe I’ll go back to digging into the kernel source code. Could be fun.