Sunday, March 1, 2026

Using LLMs Again

They insisted on selling out the future for dubious short-term interests again #capitalism, so about three million tokens in, I have more thoughts on coding with an LLM.

It’s Chaotic

The model has its strengths and weaknesses, but it can be hard to predict how a specific task will fit.

Upgrading an internal site from Bootstrap 3 to 4 to 5 went great.  Adding dark mode to it went only okay.  The machine simply does not know what has contrast and what does not.  I spent a long time asking it for fixes on a component-by-component basis.

And sometimes, it just outright makes mistakes.  My first-ever fix for an LLM-generated bug was for text disappearing from the website: along the way, it had transformed something of the form display(error ? err_msg : text) into if (error) { display(err_msg); }, which quit displaying text for the normal/success case.
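A minimal sketch of the shape of that bug (names are hypothetical; `display` is a stand-in for whatever actually rendered the text):

```javascript
// Hypothetical reconstruction of the bug described above; `display`
// stands in for whatever actually rendered the text on the site.
let shown = null;
function display(s) { shown = s; }

function original(error, errMsg, text) {
  display(error ? errMsg : text); // always displays something
}

function llmRewrite(error, errMsg, text) {
  if (error) { display(errMsg); } // success branch silently dropped
}

original(false, "oops", "hello");
console.log(shown); // "hello"

shown = null;
llmRewrite(false, "oops", "hello");
console.log(shown); // null — the normal-case text never renders
```

A faithful rewrite of the ternary would have needed an else branch; the LLM kept only half of the conditional.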

On a different project, I accidentally clipped its wings by not having the vendor directory installed, and it hallucinated some atrocious code.  The model “not knowing what it didn’t know” greatly hampered its ability to proceed… and it didn’t know that, either.  It didn’t stop and ask for the problem to be fixed.  It just slopped some garbage out.

On a third project, it perfectly generated a GitHub workflow for “build an ECR image on push”, and then flopped on its face with a manual workflow for “deploy such an ECR image into ECS”.  It minimized IAM permissions, blissfully unaware that ecs:DescribeTasks does not use a resource tag.  That one action must be given permissions on resource *, even to describe a specific task that is known in advance.  Faced with the error, it shuffled code around to do the same operation a different way, which also failed.  The human had to track it down in the AWS Console and documentation.

(I asked it to store the IAM policies in the repo for reference.  I do not plug the LLM into AWS, GitHub, MySQL, the bastion host, a web browser, or even git fetch.)

It was pretty good at finding differences between an old system and a new one, but less effective at porting the missing features across.  Most of the time was spent on the human working out the tangled mess of the most difficult pieces.  In the aftermath, closer human review found that 27% of its commits were defective.

An equal number of commits were “not how I would code it,” which also bothers me.  It is my name that goes onto the commit, and that name will show up in git blame later.  Those also got patched up.

Secure Code is an Afterthought

Even with a strong pattern of CSRF and Allow-header mitigations already in the codebase (i.e. a couple of function calls in the setup), it was not able to generate code that handled these concerns.  While it probably knows how to set up a popular framework like Symfony or Laravel to do it, it is not able to learn the pattern from our own ancient code.

It might be no better than other developers on the team at XSS, but that is concerning in both directions.  I don’t want either of them introducing div.innerHTML = htmlStr!  String concatenation is a security vulnerability in systems like this.
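To make the concern concrete, here is a minimal sketch of why concatenating strings into HTML is dangerous, with the standard escaping mitigation.  The `escapeHtml` helper is illustrative, not something from our codebase; in real DOM code, assigning to element.textContent sidesteps the problem entirely.

```javascript
// Illustrative helper, not a library function: escape the five
// HTML-significant characters.  Ampersand must be replaced first,
// or the entities introduced by later steps would be double-escaped.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;")
          .replace(/</g, "&lt;")
          .replace(/>/g, "&gt;")
          .replace(/"/g, "&quot;")
          .replace(/'/g, "&#39;");
}

const userInput = '<img src=x onerror="alert(1)">';

// Unsafe: if this string reaches innerHTML, the markup executes.
const unsafe = "<div>" + userInput + "</div>";

// Safer: the markup is inert text once escaped.
const safe = "<div>" + escapeHtml(userInput) + "</div>";
console.log(safe);
```

Even with the helper, escaping at every call site is exactly the kind of discipline that gets forgotten; APIs that treat input as text by default are the stronger fix.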

When generating some code for the GitHub workflows, it produced a command of the form THING=$(...) and then proceeded to use $THING without checking that the command had actually produced any output.  For shell, it’s always best to bail out as soon as anything goes into the weeds, whether via set -e or an explicit test that $THING is non-empty.

Good Prompts Take Knowledge

It saves time and money to point the LLM directly at .svcPop instead of describing “the service redemption bubble” and making the machine thrash around, running half a dozen ripgrep commands to try to find it.

On the other hand, making changes with an LLM can quickly erode one’s low-level understanding of the code, making attempted “good” prompts into not-so-good ones.  When I’m not the one making the changes, I lose understanding and effectiveness.  If I’m using it to write using some new libraries, like Pest or AmPHP/Revolt, I am also losing both depth of learning and retention of what’s left.  I can’t ambiently absorb knowledge from documentation I am not looking at.

(And as we saw, even if the prompt is good, the results may not be.)

Narrow Focus is Double Edged

The narrow focus on the task at hand is what makes the LLM useful at what it is doing, but it is also what builds technical debt.  It’s happy to generate all-new CSS for anything it does, without worrying about whether any of it can be a shared concept across the codebase.

Whether I’m reading or writing the code, I’m thinking about this stuff.  It was me who noticed the multiple ‘loading’ spinner images.  When asked to replace all of them, the LLM generated a gif (?) in a non-theme color that wasn’t even animated (???).  Then, it copied that over all four files (including the two unused ones), corrupting the layout where the smaller file had been used, and declared it done.  Oh, and the gif assumed a white background, on a site that already had a dark mode.  I threw up my hands dramatically, tracked down an SVG, and fixed it all myself.

Meanwhile, its chaotic nature makes it somewhat random which CSS features will be used.  This is especially noticeable for things like choosing between repeating selector prefixes, or nesting the blocks.  It’s 2026 and nested CSS is Baseline 2023, so it’s not like this is going to break anything accessing the site, but it immediately raises questions about how to control the CSS feature usage for sites where we don’t have as much latitude in dictating choice of browser.

But When It’s Good, It’s Good

When I’m using the LLM on a task that it is good for, making quick work of some long-delayed upgrade or feature request, the feeling approaches what others describe as manic.  With all the downsides that entails, too: the selfishness, the hubris, and the possibility that it will turn out to be a complete waste after all.

But its siren song is sparkling and effervescent.

This iteration of models is good enough to see why people like it.

Tech Can’t Solve Social Problems

The code I upgraded to Bootstrap 5 had been on Bootstrap 3 because we aren’t spending any time on maintenance in the constant rush toward “more features.”  Nothing can be done until the threat becomes existential.  I’m worried that higher levels of the company will soon see LLMs as a way to continue this business-as-usual approach.  The developers “have this tool to be more productive,” so we can expect more features, sooner.

I also don’t know if management has visibility into the LLM usage, to understand what they’re actually getting for their money.  It’s entirely possible that such information is only available to someone who may or may not be using the corporate account’s resources for personal projects.  Actually worrying about this is above my pay grade, especially since I have no evidence whatsoever, but still: it’s an obvious potential weakness.

The Chaos Demon

Sometimes, the LLM simply doesn’t follow the prompt.  Or accept correction.  The only thing to do is to Ctrl+C and try anew.

And push down the thought of a world with dangerous equipment going rogue like this.  Self-driving cars.  Industrial equipment.  Weapons nominally in the hands of ICE or the police.

Sunday, January 25, 2026

Identity Requires Long Term Secrets

One cannot remove all long-term credentials.  The process of establishing a session is one of identifying a stable entity (such as a user account) and giving that entity temporary access to the resource servers.  In simplest form, this is providing the username (entity) and password (secret that authenticates the username), and receiving a session cookie (temporary credentials) for accessing the service.

Somewhere at the root of trust must be a long-term credential.  Otherwise, if all temporary credentials have expired, how is the user authenticated in order to generate a new one?  What would stop anyone else from going through the same process for the user?

An individual service can outsource user authentication: it can email a code, use SMS or a voice call, or integrate with a third-party service like Okta or any OAuth provider.  In those cases, the long-term credential still exists, but the key store itself is externalized, and the service is at the mercy of that key store’s security.  Email is probably low risk for normal people, who have an account with multi-factor authentication at a large provider that’s going to notice ‘unusual’ logins, but my quirky personal email isn’t like that.

The other problem with outsourcing is that if the provider changes their mind about account requirements, users can get locked out of both their email and their service account at the same time.  (Ask someone how hard it is to maintain a secondary Google account.)

Everything else is a long-term credential stored by the service.  Passwords need a stored hash to be checked against.  Passkeys, authenticator-app codes, and client certificates are also linked to a user account, so they must be stored with it.  The service cannot accept any of these things for the wrong user.

Sunday, January 11, 2026

Trying Stage Manager

I tried Stage Manager on my desktop in macOS 26.  There’s not much to say about it, because it didn’t click for me.  I just don’t work with that many big windows.

If there’s a truly huge window like an image editor, that tends to be the only thing I’m using “at one time,” and there’s no need to Stage Manage through them.  When there is, Cmd+Tab has worked well.

When I’m working hard on my personal website, it tends to involve four windows arranged spatially: Podman Desktop, MacVim, and iTerm2 non-overlapping on one workspace, and Firefox (and its dev tools) on the next.

I was confused about the order of apps in the sidebar.  I eventually realized they were “swapping” between the app being restored and the one being minimized when changing apps, but that didn’t really help in terms of efficiency.  Using several apps in sequence means they keep moving around instead of having a consistent placement.

It might be more of a revelation on a laptop, where having half the screen size means having a quarter of the area for individual windows.  Or maybe I’m just set in my ways after 30 years.

Sunday, January 4, 2026

Lazy Init Only Scatters Latency

People report on the Internet that their “Hello World” NodeJS functions in AWS Lambda initialize in 100–150 ms on a cold start, while our real functions spend 1000–1700 ms.  Naturally, I went looking for ways to optimize, but what I found wasn’t exactly clear-cut.

A popular option is for the function to handle multiple events, choosing internal processing based on the shape of the event.  Maybe a large fraction of their events don’t need to handle a PDF, so they can skip loading the PDF library up front.

Unfortunately, my needs are for a function which handles just two events (git push and “EC2 instance state changed” events) and in both cases, the code needs to contact AWS services:

  1. git push will always fetch the chat token from Secrets Manager
  2. Instance-state changes track run-time using DynamoDB (and may need the chat token)

If I push the AWS SDK initialization out of the init phase, all I’m actually doing is moving the delay to run time.  I would need a relatively high-frequency request that didn’t use the AWS SDK at all in order to separate the SDK latency from average processing.  Even then, it still wouldn’t work if the first request needed AWS!
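The tradeoff can be seen in miniature with a memoized lazy initializer (a generic sketch, not the actual function’s code):

```javascript
// Generic lazy-init sketch: the expensive step (imagine an AWS SDK
// client constructor) runs on first call instead of at module load.
// The cost doesn't disappear; it just moves from init time to run time.
function lazy(init) {
  let value, done = false;
  return () => {
    if (!done) { value = init(); done = true; }
    return value;
  };
}

let initCount = 0;
const getClient = lazy(() => { initCount += 1; return { ready: true }; });

console.log(initCount); // 0 — nothing paid during the "init phase"
getClient();
getClient();
console.log(initCount); // 1 — paid once, at the first invocation
```

If every invocation ends up calling `getClient()`, the first one eats the full construction cost and the total cold-start latency is unchanged.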

Nonetheless, I did the experiments, and as far as I can tell, lazy init does exactly what I predicted: causes more run time and less init time, for a similar total, on a cold start.  Feeding it 33% more RAM+CPU lets it run 22% faster and consume more memory total, which suggests that it’s doing less garbage collection.  It would be nice to have GC stats, then!

Warm-start execution, for what it’s worth, is 10% of the overall runtime of a cold start, or what was about 25% of the cold-start run time before any changes were made.  Either GC or CPU throttling is hampering cold-start performance.

(I’d also love to know how AWS is doing the bin-packing in the background.  Do they allocate “bucket sizes” and put 257–512 MB functions into 512 MB “reservations,” or do they actually try to fill hosts precisely?  Actually, it’s probably oversubscribed, but by how much?  “Run code without thinking about servers,” they said, and I replied, “Don’t tell me what to do!”)

The experiment I didn’t do was whether using esbuild to publish a 1.60 MB single-file CommonJS bundle, instead of a 0.01 MB zip of ESM modules, would change anything.  Most sources say that keeping the file size down is the number one concern for init speed.  At this point, I think if I wanted more speed, I would port to Go.

Sunday, December 28, 2025

Two Thoughts on Ubuntu Signing Keys

Here’s something I don’t get: why is there a trusted “2012 CD signing key” on my Ubuntu 24.04 machines, when there is also a “2018” signing key?  Shouldn’t this be a transition that could have been completed within five years?  Shouldn’t we be able to tie the 2012 key to a specific repository set, instead of all packages?  “All packages” includes PPAs, and I really wish neither of those CD signing keys were valid for that purpose.

The cryptographic domains should be separated:

  1. One CD signing key, tied to the CD/DVD packages
  2. One online release signing key, tied to the Ubuntu main/security sources
  3. One key per PPA, tied to that PPA

Deprecating globally-trusted keys for PPAs is a good step, but the globally-trusted release keys (especially ones that are over a decade old) should be cleaned out immediately as well.

Semi-related pro tip: extrepo

Many packages are supported in extrepo, which handles the keys for you.  There is no need for arcane gpg format-conversion commands, no worrying about whether it goes into /usr (incorrect under Unix philosophy, but widely recommended) or /etc, no manually editing sources files, and especially no cursed curl | bash invocation.

$ sudo apt install extrepo

And then you can do stuff like:

$ extrepo search github
$ sudo extrepo enable github-cli
$ sudo apt install --update gh

This is especially useful for upstreams that distribute an official deb package, outside of PPAs.  I aim to get the code from as close to the source as possible, where the distro itself doesn’t suit my needs.

Sunday, December 21, 2025

Adventures with my old iPod Touch

The iPod Touch (4th Gen) in the car could no longer be detected, so we pulled it out of the console to find it in DFU mode.  Yikes.

I had extremely little hope, but I took it and its USB-A to 30-pin dock cable (the only extant cable of this type in my collection) inside, plugged it into a USB-C to USB-A adapter, and plugged it into the Mac.  It… er… worked. Sure, it appeared to open in Finder and not iTunes (rip), but it actually worked (on the second try; the advice for “unknown error 9” is to try again.  What are we doing as a profession.)  I was able to restore iOS 6.1.6 to it, although I did not have the option to keep my data.

But, I never moved my music onto the Mac.  I figured, with a hardware device that is fifteen years old, and back in factory state, surely, Linux should be able to sync to it?

The first problem was even getting it connected, because Amarok threw an error from ifuse.  Copying the command out of the error message and running it in a terminal worked totally fine.  (I didn’t think of this at the time, but… Amarok logs in the systemd journal.  Maybe its permissions have been stripped down too far.)

Once that was up and running, I restarted Amarok a couple of times, before I found out where it had hidden the iPod.  It’s under “Local Collection.”

I then waited a long time for things to sync.  I waited so long that I wandered off and forgot to set “Don’t Sleep”, so the computer suspended.  The iPod made its ancient, discordant glissando when the computer woke up, and then Amarok—and any process trying to stat() the ifuse mount point—froze.  ifuse sat there burning 100% CPU for a couple of minutes, and then I restarted.

(Apparently the sleep interval was fifteen minutes, the longest time that doesn’t make KDE System Settings complain about ‘using more energy.’  Well… I paid for it, one way or the other.)

I got it going again.  Amarok carefully loaded gigabytes of tracks onto the iPod Touch, then started complaining about checksum errors for the database.  The database is the part that makes the tracks useful, instead of having the iPod show “No content” and a button for the iTunes Store.  That ended up being the final boss that I couldn’t beat.  The tracks are still there, apparently, showing up as “Other” data on the Mac.

Yeah.

I plugged the backup drive into the Mac, imported everything, and exported it to the iPod Touch.  The double copy was orders of magnitude faster than Amarok’s unidirectional efforts.  I should never have been so lazy.

  • Free Software: 0
  • Proprietary OS: 2

I don’t know how we got here.

Sunday, December 14, 2025

Three Zsh Tips

To expand environment variables in the prompt, make sure setopt prompt_subst is in effect.  This is the default in sh/ksh modes, but not zsh mode.

To automatically report the time a command took if it consumes more than a certain amount of CPU time, set REPORTTIME to the limit in seconds.  That is, REPORTTIME=1 (I am impatient) will act as if I had run the command under time, whenever it consumes more than a second of CPU.

There’s a similar REPORTMEMORY variable to show the same (!) stats for processes that use more than the configured amount of memory during execution.  (Technically, RSS, the Resident Set Size.)  The value is in kilobytes, so REPORTMEMORY=10240 will print time statistics for processes larger than 10 MiB.  Relatedly, one should configure TIMEFMT to include “max RSS %M” in order to actually show the value that made the stats print.

Note that REPORTTIME and REPORTMEMORY do not have to be exported, as they’re only relevant to the executing shell.

# in ~/.zshrc
REPORTTIME=3
REPORTMEMORY=40960
TIMEFMT='%J  %U user %S system %P cpu %*E total; max RSS %M'

Sources: REPORTTIME and REPORTMEMORY are documented in the zshparam man page.  Prompt expansion is described in zshmisc, and the prompt_subst option is in zshoptions.