• 0 Posts
  • 14 Comments
Joined 1 year ago
cake
Cake day: June 15th, 2023

help-circle





  • It’s not that clear cut a problem. There seems to be two elements; the kernel driver had a memory safety bug; and a definitions file was deployed incorrectly, triggering the bug. The kernel driver definitely deserves a lot of scrutiny and static analysis should have told them this bug existed. The live updates are a bit different since this is a real-time response system. If malware starts actively exploiting a software vulnerability, they can’t wait for distribution maintainers to package their mitigation - they have to be deployed ASAP. They certainly should roll-out definitions progressively and monitor for anything anomalous but it has to be quick or the malware could beat them to it.

    This is more a code safety issue than CI/CD strategy. The bug was in the driver all along, but it had never been triggered before so it passed the tests and got rolled out to everyone. Critical code like this ought to be written in memory safe languages like Rust.



  • I would encourage you not to split things up too finely. A single repo for your environment would allow you to see all related changes with git. E.g. if you set up a new VM it might need a playbook to set something up, a script to automate a task, and a DNS entry. With a well put together commit message explaining why you’re making those changes there’s not much need for external documentation.

    Maybe if you want some more info organised in a wiki, point to the initial commit where you introduced some set up. That way you can see how something was structured. Or if you have a issue tracker you can comment with research on something and then close the issue when you commit a resolution.

    Try not to have info spread out too much or maintaining all the pieces will become a chore. Make it simple and easy to keep up.