They insisted on selling out the future for dubious short-term interests again #capitalism, so about three million tokens in, I have more thoughts on coding with an LLM.
It’s Chaotic
The model has its strengths and weaknesses, but it can be hard to predict how a specific task will fit.
Upgrading an internal site from Bootstrap 3 to 4 to 5 went great. Adding dark mode to it went only okay. The machine simply does not know what has contrast and what does not, and I spent a long time asking it for fixes on a component-by-component basis.
And sometimes, it just outright makes mistakes. My first-ever fix for an LLM-generated bug was for text disappearing from the website: along the way, it had transformed something of the form display(error ? err_msg : text) into if (error) { display(err_msg); }, quietly dropping the display for the normal/success case.
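In miniature, the bug looked something like this (names here are illustrative, not the actual code):

```javascript
// Stand-in for whatever actually puts text on the page.
const display = (s) => s;

// Original form: always displays something — err_msg on error, text otherwise.
const render = (error, errMsg, text) => display(error ? errMsg : text);

// The LLM's rewrite: the else branch silently vanished.
function renderBroken(error, errMsg, text) {
  if (error) {
    return display(errMsg);
  }
  // Missing: return display(text); — the success case now shows nothing.
}

console.log(render(false, "oops", "hello"));       // "hello"
console.log(renderBroken(false, "oops", "hello")); // undefined
```

The error path still worked, which is exactly why the regression slipped past a quick glance.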
On a different project, I accidentally clipped its wings by not having the vendor directory installed, and
it hallucinated some atrocious code. The model “not knowing what it didn’t know” greatly hampered its
ability to proceed… and it didn’t know that, either. It didn’t stop and ask for the problem to be fixed.
It just slopped some garbage out.
On a third project, it perfectly generated a GitHub workflow for “build an ECR image on push”, and then
flopped on its face with a manual workflow for “deploy such an ECR image into ECS”. It minimized IAM
permissions, blissfully unaware that ecs:DescribeTasks does not honor resource-level restrictions. That one
action must be granted on resource *, even to describe a specific task that is known in advance. Faced with the
error, it shuffled code around to do the same operation a different way, which also failed. The human had to
track it down in the AWS Console and documentation.
(I asked it to store the IAM policies in the repo for reference. I do not plug the LLM into AWS, GitHub,
MySQL, the bastion host, a web browser, or even git fetch.)
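The resulting policy ends up with an awkward split along these lines (a sketch, not the actual stored policy — the account, region, and ARNs are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ScopedWhereAwsAllowsIt",
      "Effect": "Allow",
      "Action": ["ecs:UpdateService"],
      "Resource": "arn:aws:ecs:us-east-1:123456789012:service/my-cluster/my-service"
    },
    {
      "Sid": "DescribeTasksNeedsStar",
      "Effect": "Allow",
      "Action": ["ecs:DescribeTasks"],
      "Resource": "*"
    }
  ]
}
```

That second statement is the kind of thing an LLM "minimizing permissions" keeps trying to tighten, and the kind of thing a human only resolves by reading the IAM documentation.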
It was pretty good at finding differences between an old system and a new one, but less effective at porting the missing features across. Most of the time went to the human untangling the most difficult pieces. In the aftermath, closer human review found that 27% of its commits were defective.
An equal number of commits were “not how I would code it,” which is also something that bothers me. It is my
name that goes onto the commit, and will show up in git blame later. Those also got patched up.
Secure Code is an Afterthought
Even with a strong existing pattern of CSRF and Allow-header mitigations (i.e. a couple of function calls in the setup), it was not able to generate code that handled these concerns. It probably knows how to configure a popular framework like Symfony or Laravel to do this, but it did not pick up the pattern from our own ancient code.
It might be no better than other developers on the team at XSS, but that is concerning in both directions. I
don’t want either of them introducing div.innerHTML = htmlStr! Building markup by string concatenation is a
security vulnerability in systems like this.
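The safe habit is boring: treat user data as text, not markup — element.textContent (or createElement plus append) in the browser, never innerHTML fed from concatenation. Stripped of the DOM, the principle looks like a tiny escaper (a sketch only; real code should lean on the framework's templating, not hand-rolled helpers):

```javascript
// Minimal HTML escaper — illustrates the principle, not a framework replacement.
// Ampersand must be escaped first so we don't double-escape the entities.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const userInput = '<img src=x onerror=alert(1)>';

// Dangerous: concatenation drops attacker-controlled markup straight into the page.
const unsafe = '<div>' + userInput + '</div>';

// Safer: the payload survives only as inert text.
const safe = '<div>' + escapeHtml(userInput) + '</div>';
```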
When generating some code for the GitHub workflows, it produced a command of the form THING=$(...) and then
proceeded to use $THING without checking that the command actually produced any output. For shell, it’s
always best to take the wheel back as soon as it starts into the weeds.
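The defensive version of that pattern is short, which makes it all the easier for a generator to skip (a sketch of the habit, not the actual workflow step — the echo stands in for whatever command produces the value):

```shell
#!/usr/bin/env bash
set -euo pipefail  # fail fast on errors, unset variables, and broken pipes

# Capture the output, then refuse to continue if it came back empty.
THING=$(echo "example-output")   # stand-in for the real command
if [ -z "${THING}" ]; then
  echo "error: command produced no output" >&2
  exit 1
fi
echo "using: ${THING}"
```

Without the emptiness check, a failed command quietly feeds an empty string into every later step.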
Good Prompts Take Knowledge
It saves time and money to point the LLM directly at .svcPop instead of describing “the service redemption
bubble” and making the machine thrash around, running half a dozen ripgrep commands to try to find it.
On the other hand, making changes with an LLM can quickly erode one’s low-level understanding of the code, making attempted “good” prompts into not-so-good ones. When I’m not the one making the changes, I lose understanding and effectiveness. If I’m using it to write using some new libraries, like Pest or AmPHP/Revolt, I am also losing both depth of learning and retention of what’s left. I can’t ambiently absorb knowledge from documentation I am not looking at.
(And as we saw, even if the prompt is good, the results may not be.)
Narrow Focus is Double Edged
The narrow focus on the task at hand is what makes the LLM useful at what it is doing, but it is also what builds technical debt. It’s happy to generate all-new CSS for anything it does, without worrying about whether any of it can be a shared concept across the codebase.
Whether I’m reading or writing the code, I’m thinking about this stuff. It was me who noticed the multiple ‘loading’ spinner images. When asked to replace all of them, the LLM generated a gif (?) in a non-theme color that wasn’t animated (???). Then it copied that over all four files (including the two unused ones), corrupting the layout where the smaller file had been used, and declared the job done. Oh, and the gif assumed a white background, on a site that already had a dark mode. I threw up my hands dramatically, tracked down an SVG, and fixed it all myself.
Meanwhile, its chaotic nature makes it somewhat random which CSS features will be used. This is especially noticeable for things like choosing between repeating selector prefixes, or nesting the blocks. It’s 2026 and nested CSS is Baseline 2023, so it’s not like this is going to break anything accessing the site, but it immediately raises questions about how to control the CSS feature usage for sites where we don’t have as much latitude in dictating choice of browser.
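The two shapes it bounces between look like this (a toy example, not our stylesheet):

```css
/* Repeated selector prefixes — works in every browser. */
.card { padding: 1rem; }
.card .title { font-weight: bold; }
.card .title:hover { text-decoration: underline; }

/* Nested form — same meaning, but requires native CSS nesting (Baseline 2023). */
.card {
  padding: 1rem;
  .title {
    font-weight: bold;
    &:hover { text-decoration: underline; }
  }
}
```

Both are fine for this site; the problem is that nothing in the prompt pins down which one you get, or what the floor on browser support is.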
But When It’s Good, It’s Good
When I’m using the LLM on a task that it is good for, making quick work of some long-delayed upgrade or feature request, the feeling approaches what others describe as manic. With all the downsides that entails, too: the selfishness, the hubris, and the possibility that it will turn out to be a complete waste after all.
But its siren song is sparkling and effervescent.
This iteration of models is good enough to see why people like it.
Tech Can’t Solve Social Problems
The code I upgraded to Bootstrap 5 had been on Bootstrap 3 because we aren’t spending any time on maintenance in the constant rush toward “more features.” Nothing can be done until the threat becomes existential. I’m worried that higher levels of the company will soon see LLMs as a way to continue this business-as-usual approach. The developers “have this tool to be more productive,” so we can expect more features, sooner.
I also don’t know if management has visibility into the LLM usage, to understand what they’re actually getting for their money. It’s entirely possible that such information is only available to someone who may or may not be using the corporate account’s resources for personal projects. Actually worrying about this is above my pay grade, especially since I have no evidence whatsoever, but still: it’s an obvious potential weakness.
The Chaos Demon
Sometimes, the LLM simply doesn’t follow the prompt. Or accept correction. The only thing to do is to Ctrl+C and try anew.
And push down the thought of a world with dangerous equipment going rogue like this. Self-driving cars. Industrial equipment. Weapons nominally in the hands of ICE or the police.