The 5k lines of code are spread over 44 files. Most of it is test code; only about 2k LOC are functional code. The functions are generally small: only 5 exceed 100 LOC, and none reaches 200 LOC. Every function has a generous docstring. I set up a system in which a manager agent created action plans and executor agents carried them out. The manager then had to review and accept each execution. If it rejected the result (this happened once), the executor had to resume work until the manager confirmed the task was done. I stumbled upon this workflow on my own, and so far it seems to work.
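The manager/executor loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual setup: `run_task`, `Review`, and the `stub_*` functions are hypothetical names, the agents are plain Python callables standing in for real model calls, and the `max_rounds` cap is an added safeguard that the workflow as described does not mention.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    """Manager's verdict on one execution attempt (hypothetical type)."""
    accepted: bool
    feedback: str = ""


def run_task(
    task: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[List[str], str], str],
    review: Callable[[str, str], Review],
    max_rounds: int = 5,  # assumption: a cap so the loop cannot run forever
) -> Tuple[str, int]:
    """Plan once, then execute and review until the manager accepts."""
    steps = plan(task)   # manager writes the action plan
    feedback = ""        # empty on the first attempt
    for round_no in range(1, max_rounds + 1):
        result = execute(steps, feedback)  # executor works, or resumes with feedback
        verdict = review(task, result)     # manager reviews the result
        if verdict.accepted:
            return result, round_no
        feedback = verdict.feedback        # rejection reason goes back to the executor
    raise RuntimeError(f"task not accepted after {max_rounds} rounds")


# Stub agents standing in for real model calls.
def stub_plan(task: str) -> List[str]:
    return [f"implement {task}", f"write tests for {task}"]


def stub_execute(steps: List[str], feedback: str) -> str:
    # The first attempt skips the tests; once given feedback, all steps are done.
    done = steps if feedback else steps[:1]
    return "; ".join(done)


def stub_review(task: str, result: str) -> Review:
    if "write tests" in result:
        return Review(accepted=True)
    return Review(accepted=False, feedback="tests are missing")


result, rounds = run_task("parser", stub_plan, stub_execute, stub_review)
print(rounds, result)  # 2 implement parser; write tests for parser
```

With these stubs the manager rejects the first attempt because the tests are missing, and accepts the second, which mirrors the single rejection mentioned above. Passing the manager's feedback back into the executor is one possible way to model "resuming" the execution; a real setup might instead keep the executor's conversation state alive between rounds.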