Vaguely Aligned

Strange Loops and technological absurdity

Self-directed AI Dev: Cellular Automata

I benchmarked several agentic LLMs in a simple autonomous dev loop: a “Tech Lead” agent researches the problem and spawns dev sub-agents to complete the task. Each tech lead and dev agent has an assigned LLM and toolset. The methodology is described in more detail in this post. Take a look at the demos below. tl;dr: GLM 4.6 was the clear winner, and one of the cheapest to boot. ...

November 10, 2025 · 2 min · 359 words · Kenny

Experiments in Autonomous AI Development

I started 2025 deeply skeptical of AI development tools. I had played around a bit with generating code via back-and-forth conversations with LLM chat apps (ChatGPT and Claude) but was unimpressed: I viewed them as mostly a toy, fine for getting simple projects off the ground but an active waste of time on large, complex codebases. Then in March I tried Claude Code. I set it up on a side project (a codebase with ~10k lines of Elixir) and was immediately several times more productive. Switching between planning mode and build mode felt natural, closely analogous to how I reason about coding tasks myself. Yes, the AI made mistakes, but it built and tested the changes itself, so the debug feedback loop was tight and hands-off. I found myself spending much more time in “Architect” mode than thinking about individual lines of code or getting stuck on mundane bugs. ...

November 9, 2025 · 10 min · 2016 words · Kenny