Tuesday, March 20, 2018

Supposedly Readable Code

There are two hard problems in Computer Science: cache invalidation, and naming things. —Phil Karlton

The problem with the long-term evolution of a codebase is that compatibility requirements end up constraining the design, and those constraints may be felt for a decade, depending on how history unfolds.

What my company now refers to as “Reserve,” which seems to be fairly universally understood by our clients and B2B partner soup, was initially called “Cost.” That was replaced by “Escrow” because the “Fee” is also a cost, just a different kind.  But Escrow didn’t sit right with people who make deals and sign contracts all day, because the money wasn’t necessarily being held by a third party.  (Depending on what kind of hash the salesmen made of it, it was held by either the first or second party.)

The point is, before coming up with a universally acceptable term, we needed some term, so Cost and Escrow got baked into the code and database structure to a certain extent.  Along with Reserve.
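
As a rough illustration (every name below is invented for the example and is not our actual schema), here is a Python sketch of what one concept living under three names can look like, along with the kind of small translation helper that tends to grow up around it:

```python
# Hypothetical example: one business concept stored under three historical names.
# None of these keys come from a real schema; they only illustrate the drift.
CANONICAL = "reserve"
LEGACY_NAMES = {"cost": CANONICAL, "escrow": CANONICAL}


def normalize_record(record):
    """Return a copy of `record` with legacy keys renamed to the current term."""
    normalized = {}
    for key, value in record.items():
        name = LEGACY_NAMES.get(key, key)
        if name in normalized and normalized[name] != value:
            # Two spellings of the same concept disagree; refuse to guess.
            raise ValueError(f"conflicting values for {name!r}")
        normalized[name] = value
    return normalized


old_row = {"client_id": 7, "cost": 1200, "fee": 35}
new_row = {"client_id": 7, "reserve": 1200, "fee": 35}
assert normalize_record(old_row) == normalize_record(new_row)
```

A helper like this hides the history, but it does not remove it: the old names are still in the data, and the next reader still has to learn the mapping.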

When someone new comes along, their first instinct is to complain about how “confusing” it is.  And I can see that.  It’s a single concept going by three names.

You get used to it, though.  As you work with it repeatedly, the concept gets compressed in your brain.  Here it’s Cost, there it’s Reserve; it’s the same thing in both places.

But getting used to it is a symptom of the “ignoring weak signals” problem.  (Is there a better name for that?  “Normalization of deviance” is heavy, too.) If we hired enough people, the confusion each of them hits would add up to a clear source of suckage that we’d really want to fix.

On the other hand, I’d love to do a cost-benefit analysis and find out just how important it really is to fix.  Unfortunately, that depends on measuring the “loss of productivity” from the multiple names, and measuring productivity to begin with is difficult.  I think the experimental design would also require fixing the problem first, just to get a decent measurement of single-name productivity for comparison.

So weak signals end up easy to ignore precisely because they’re weak, and we never find out what ignoring them costs us.

Another justification for ignoring them is that we can’t act on them all.  We have to prioritize.  After all, developers tend to be disagreeable.  I know: whenever I’m making a “quick bugfix” in some code I don’t own, I have to suppress strong urges to “fix” all the things, from naming to consistency to style conventions to collapsing $x = thing(); return $x; into a single return.  I’m pretty sure the rest of the team does the same for my code.

The funny thing is, I bet each one of us on the team thinks we write the most readable code.  I’ve been doing this longer than anyone, and I put a lot of effort into it.  I standardized my table alias names, and I wish everyone else followed that, because the code was a lot easier for me to read when “clients” was just “C” and not a mix of “C”, “c”, “cl”, or “cli” depending on which SQL statement one happens to be reading.
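
For what it’s worth, alias drift is one of the few readability complaints that can at least be counted. A rough Python sketch, assuming the SQL statements are available as plain strings and using a deliberately naive regular expression (it is not a SQL parser, and the keyword list is an incomplete guess):

```python
import re
from collections import Counter

# Words that may follow a table name without being an alias (not exhaustive).
NOT_ALIASES = {"where", "join", "inner", "left", "right", "on",
               "group", "order", "limit", "set", "values"}

# Matches "clients C" and "clients AS C", case-insensitively.
ALIAS_PATTERN = re.compile(r"\bclients\s+(?:as\s+)?([A-Za-z_]\w*)", re.IGNORECASE)


def clients_aliases(statements):
    """Count how often each alias is used for the clients table."""
    counts = Counter()
    for sql in statements:
        for alias in ALIAS_PATTERN.findall(sql):
            if alias.lower() not in NOT_ALIASES:
                counts[alias] += 1
    return counts


sample = [
    "SELECT C.id FROM clients C WHERE C.active = 1",
    "SELECT cl.id FROM clients AS cl",
    "SELECT * FROM clients WHERE id = 1",
]
print(clients_aliases(sample))
```

More than one key in the result means more than one spelling in use. It says nothing about which spelling is the readable one, of course.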

Between synonyms and the irritating slipperiness of natural language, then, is there such a thing as “readable code”?  There’s certainly code that’s been deliberately obfuscated, but barring that: can we measure code readability? Or is it just doomed to be “my latest code is best code” forever?
