Monday, May 16, 2016

My Coding Style

When I write code, I think in “framework” flavor.  How can I make this code modular and extensible?  How can I let other people use it without having to reimplement 80% of some method when they want to change anything about how it works, assuming they couldn’t just change this code?

I mostly build inside of classes and modules for the information hiding aspect of it.  I use Douglas Crockford’s module pattern in my JavaScript to minimize global namespace pollution.  Likewise, nearly all of my variables in Perl are lexical.

I’m also likely to pack small amounts of state and behavior into a tiny class, no matter how trivial it seems on the surface.  I’m the guy who writes a Csv class so that I can call $csv->put($data) and have all the other options (separator, etc.) baked into the class.  You could create a Tsv class with just a few lines to override the separator.  (You probably won’t ever need to. But you could, and that’s what makes me happy.)
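A rough sketch of that idea, written in Python only so it is easy to run. The names Csv, Tsv, put, escape, and separator are illustrative assumptions, not code from a real project.

```python
import io


class Csv:
    """Tiny writer: options are fixed up front, put() only handles data."""

    separator = ","  # hypothetical option baked into the class

    def __init__(self, out):
        self.out = out  # any object with a write() method

    def escape(self, value):
        # Quote a field if it contains the separator, a quote, or a newline.
        text = str(value)
        if self.separator in text or '"' in text or "\n" in text:
            text = '"' + text.replace('"', '""') + '"'
        return text

    def put(self, row):
        self.out.write(self.separator.join(self.escape(v) for v in row) + "\n")


class Tsv(Csv):
    """The few-line override: only the separator changes."""

    separator = "\t"


buf = io.StringIO()
Csv(buf).put(["id", "name, with comma"])
Tsv(buf).put(["id", "name"])
```

The subclass inherits the quoting and the put method, so the only thing it has to state is what differs.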

While I get functional programming and stream programming, I generally consider them sub-optimal for PHP and avoid them there where possible.

Why do I do it this way? What’s in my head when I’m programming?



Modularity

I worry a lot about modularity.  Can this be reused by dropping it into some other program?  Can it be extended for a new purpose, without disrupting the code that is currently using it?  Are there pieces of it that make sense as ‘primitive’ operations, that should be separated from a ‘driver’ layer that assembles primitives into the final algorithm I want to build?

When an object method gets large enough and only uses some variables in specific regions, I try to split it up.  It’s easier to modify code with fewer variables, so getting whole sets of temporary ones out of the method clears things up in and of itself.  It gives that section of code its own name, which can help readability, and it offers more places to override in subclasses.
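A minimal sketch of the primitive/driver split, again in Python for runnability. The Report class and its method names are made up for illustration.

```python
class Report:
    """Driver method run() assembles small, overridable primitives."""

    def load(self):
        # Primitive: fetch the raw rows (hard-coded here for the sketch).
        return [("apples", 3), ("pears", 5)]

    def format_row(self, name, count):
        # Primitive: render one row; its temporaries live here, not in run().
        return f"{name}: {count}"

    def run(self):
        # Driver: the overall algorithm, with no formatting details in it.
        return "\n".join(self.format_row(n, c) for n, c in self.load())


class ShoutingReport(Report):
    # A subclass changes one primitive without reimplementing run().
    def format_row(self, name, count):
        return super().format_row(name, count).upper()


plain = Report().run()
loud = ShoutingReport().run()
```

Each primitive has a name and a small set of variables, and each one is a place a subclass can hook in.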

This focus on modularity informs the cleanliness of the global namespace. Code that doesn’t use globals will not have collisions of global variable names.

I’m also a heavy user of dependency injection as the main form of composition. The application makes a database connection with PDO and passes that to the classes that need to access the database.  (Well, really, the ideal is to call a function to make it, with a host, user, and password all taken from the environment.  It’s very twelve-factor that way.)
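A sketch of the same shape in Python, with sqlite3 standing in for PDO so that it runs without a database server. The APP_DB_PATH variable, connect_from_env, and UserStore are hypothetical names; a real setup would read host, user, and password from the environment instead.

```python
import os
import sqlite3


def connect_from_env():
    # Hypothetical factory: the location comes from the environment,
    # defaulting to an in-memory database so the sketch runs anywhere.
    return sqlite3.connect(os.environ.get("APP_DB_PATH", ":memory:"))


class UserStore:
    def __init__(self, dbh):
        # The connection is injected; this class never creates its own.
        self.dbh = dbh

    def add(self, name):
        self.dbh.execute("INSERT INTO users (name) VALUES (?)", (name,))

    def names(self):
        rows = self.dbh.execute("SELECT name FROM users ORDER BY name")
        return [row[0] for row in rows]


# The application connects once and hands the handle to whoever needs it.
dbh = connect_from_env()
dbh.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
store = UserStore(dbh)
store.add("ada")
store.add("grace")
```

Because UserStore only ever sees the handle it was given, swapping in a different database (or a test double) needs no change to the class.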

The downside of all this is that actually performing extensions—overriding primitives in subclasses—sacrifices some readability.  Each sub-method call that’s added becomes at least one extra place to go look at to understand the overall method.  With the way I can easily add three layers of calls, it can be quite the adventure even without subclasses.

The dependency injection can be a problem for people who aren’t used to the pattern, too.  It’s hard to tell what something is, in a dynamically typed language, if the initial use of it in the class just says $this->dbh = $dbh; and makes no constraints on the class name.

I guess that’s another thing: I’ve generally avoided type hints in my PHP, opting for aggressive duck typing instead.  This was, in part, motivated by the way type hints instantly crashed the script with a fatal error in the 5.x series, with no way to recover gracefully for the user.  This lack of type hints sacrifices even more readability for flexibility.

It makes “reuse without change” easier, but makes understanding how to reuse it more complicated.

Efficiency

I usually have efficiency in mind at the macro level.  Tiny classes also get built because they can put initialization concerns up front and then stick to “only handling data” in the data-handling phase.

It’s a lot like databases: optimal usage is to call connect once and then make many queries on that single connection.

I’m all about the wins like that.  On a micro level, I’m not as concerned anymore, largely because the PHP core team has been improving performance pretty well.(1)

A tiny class also supports modularity, because outside code can pass in “a Writer object,” suitably configured, and code receiving this object can just use it without caring whether it’s CSV, TSV, or actually building a JSON response on the wire.
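A sketch of “pass in a Writer object,” in Python. CsvWriter, JsonLinesWriter, and export are assumed names for illustration, and field escaping is left out to keep it short.

```python
import io
import json


class CsvWriter:
    def __init__(self, out, separator=","):
        self.out = out
        self.separator = separator  # configured once, up front

    def put(self, row):
        self.out.write(self.separator.join(str(v) for v in row) + "\n")


class JsonLinesWriter:
    def __init__(self, out):
        self.out = out

    def put(self, row):
        self.out.write(json.dumps(list(row)) + "\n")


def export(rows, writer):
    # Receiving code only relies on put(); it never asks what format it is.
    for row in rows:
        writer.put(row)


rows = [(1, "ada"), (2, "grace")]
csv_out, tsv_out, json_out = io.StringIO(), io.StringIO(), io.StringIO()
export(rows, CsvWriter(csv_out))
export(rows, CsvWriter(tsv_out, separator="\t"))
export(rows, JsonLinesWriter(json_out))
```

The export function is the same for all three outputs; everything format-specific was decided when the writer was constructed.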

Go’s tiny interfaces may have influenced this a bit, but they are also notable for being both implicit and enforced.  PHP isn’t statically checked, so it doesn’t prevent anyone from calling methods that aren’t part of the interface that was hinted.  On the other hand, Go code which asks for a Reader can’t invoke a Close method, even if it actually received a ReadWriteCloser, because that method isn’t part of the interface.(2)

But not too much

There’s something odd about efficiency, though, because increasing modularity tends to decrease efficiency.  The more layers that end up in there, the more cross-layer calls happen: calls that wouldn’t necessarily exist in more linear, “non-extensible” code.  Every place an extension could happen is a place where an extra decision is made.  Call through it enough times, and efficiency will start to suffer.

But I like modularity.

But I like efficiency.

I end up with systems that are just modular enough, and look at things like the Symfony Components as excessively modularized.

I guess it’s a tradeoff and I favor lean code in general, but make exceptions for places where I can offer a place to extend the system.  As long as that place looks like it might be used at some point.

Avoiding Globals

Generally speaking, when there’s one global namespace for something, it is inviting collisions or action at a distance.  Therefore, my JavaScript uses a minimal number of global variables, preferring instead to wrap everything in IIFEs and use lexicals for everything.

Likewise, data I’d want to store and share between PHP functions becomes properties on a class, and the functions become its methods.  (It turns out, there are virtually no pure functions I want to write, that don’t exist in the language already.)

Python has less of this problem.  Although modules are global, each module has its own namespace, so I’m free to define module-level variables to share between functions when they don’t need a class.  On the other hand, if I were publishing the modules, I would want multiple callers to be able to use everything without conflicting, so it would be back to classes providing independent objects.
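A small sketch of that tradeoff. The names _state, bump, and Counter are hypothetical, and the “module” here is just this one file.

```python
# File-level state: fine while this file has a single caller, because the
# name lives in this module's namespace rather than a truly global one.
_state = {"count": 0}


def bump():
    _state["count"] += 1
    return _state["count"]


class Counter:
    """Per-object state: each caller gets an independent counter."""

    def __init__(self):
        self.count = 0

    def bump(self):
        self.count += 1
        return self.count


# Two callers sharing the file-level version see each other's changes...
first_shared = bump()
second_shared = bump()

# ...while two Counter objects do not interfere.
a, b = Counter(), Counter()
a.bump()
a.bump()
b.bump()
```

The function version is less ceremony for private use; the class version is what keeps independent callers from stepping on each other.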

Fun

Striving for modularity and efficiency is fun in itself.  Both setting the stage and using it later can be made into interesting problems.

Testability

Some of the modularity focuses on being testable.  We haven’t really gotten around to writing tests for our code, but I’ve written tests before and I have a pretty good idea of what shape they make the code.  So, my code sort of reflects that shape now.
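As a sketch of the shape tests tend to push code into: when a dependency is injected, a test can hand in a predictable stand-in. Greeter, FakeClock, and the hour method are invented for this example.

```python
class Greeter:
    def __init__(self, clock):
        # The clock is injected, so a test can supply a predictable one.
        self.clock = clock

    def greeting(self):
        return "Good morning" if self.clock.hour() < 12 else "Good afternoon"


class FakeClock:
    """Test double: always reports the hour it was built with."""

    def __init__(self, hour):
        self._hour = hour

    def hour(self):
        return self._hour


def test_greeting():
    assert Greeter(FakeClock(9)).greeting() == "Good morning"
    assert Greeter(FakeClock(15)).greeting() == "Good afternoon"


test_greeting()
```

If Greeter read the system time itself, the same test would pass or fail depending on when it ran.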

It’s kind of wasted without tests, though.  I wouldn’t be surprised if other developers look at my code and see the complexity of injection and layering without having any idea about the benefit it was intended for.

And, because “code you don’t use is code that’s broken,” it’s probably not even testable in practice.  Without the tests, I can’t prove it was split up well.

Consistency

I try to write idiomatic code, so that there’s a convention that’s recognizable.  I think of a lot of my code as straightforward in the small, with clear names (mostly) and simple algorithms implemented in the obvious manner.

I want wrong code to look wrong. And if it’s an inconsistent mess, the first thing I really want to do is reformat it to my standards.(3)

I’m getting to a point in my career where it “doesn’t matter” if it’s 100% consistent, though.  I’m trying to pass the torch, and I really don’t care if they want to adopt my style, because they’re the ones who are going to be stuck reading it.  (Mostly.)

Meta

All that said, coding isn’t only the mechanical application of principles. It’s also solving a problem. That means finding solutions, weighing tradeoffs, etc.  It means avoiding getting stuck in the details of one solution and forgetting about other possibilities.

That means everything I wrote above is more like a filter that “the code” goes through on the way to becoming an artifact.  That’s the automatic part of doing the ‘coding’ once the ‘design’ of a solution has been created.

I’m not sure, exactly, what happens on the design layer.  I’ve kind of reached a point where I look at the requirements, and they just… decompose themselves into my style.  I already have the outlines of my solution in mind.

In other words, I’ve lost the beginner’s mind.  There are just a few, familiar paths to tread, now.  It’s really weird to read other people’s code, because it’s 50% “I wouldn’t do it that way,” and 50% reminding myself that it does not need to be changed solely because it is different.

Endgame

(Inspiration for the title of this section: “The Skill Endgame” by Venantius, a post that got 0 love on proggit today, which saddens me, because it’s an excellent, in-depth post that’s widely applicable to life.)

I wish I could offer more, but that’s kind of the whole problem I have with teaching.  How do I get people to make that first leap, from a problem to an idea—any idea—of the code that will solve it?  How do I get people who are staring at their code, wondering why it’s buggy, to simulate it mentally and understand that they put the i = i + 1; outside of the loop, therefore, it’s iterating with i == 0 every time?  (True story.)
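For anyone who wants to trace that bug by hand, here is a reconstruction in Python. The original code was not necessarily in this language or this shape, and max_steps exists only so the buggy version terminates.

```python
def buggy(items, max_steps=5):
    visited = []
    i = 0
    steps = 0
    # max_steps is only here so the sketch stops; the real bug loops forever.
    while i < len(items) and steps < max_steps:
        visited.append(i)
        steps += 1
    i = i + 1  # outside the loop: runs once, after the damage is done
    return visited


def fixed(items):
    visited = []
    i = 0
    while i < len(items):
        visited.append(i)
        i = i + 1  # inside the loop: i advances on every pass
    return visited
```

Simulating buggy(["a", "b", "c"]) on paper gives the index 0 on every pass; fixed visits 0, 1, 2 and stops.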

By the time I was in college, I could already do these things, and I didn’t really understand how people couldn’t.  I’d already been programming in BASIC and m68k asm, so I’d developed a lot of the skills the hard way, without specifically trying to acquire them.  Through sheer luck, I also developed a lot of abstract understanding in the process, because those languages are stark opposites in usability.

Even back then, I knew I didn’t know how I knew, and I recognized that as a problem for training anyone else to do it.  I still don’t have an answer today.

It’s kind of a problem on both ends.  I don’t know what makes people better or worse at designing, so I can’t improve myself by pursuing the “better” end of the scale.  And I haven’t made much headway at improving others, either. I’m hoping this blog post helps, but I know I wouldn’t have paid it much attention when I was younger…

Notes

1. But I still have my habits from the 5.1.x days.  At that point, the more ‘basic’ code was, the faster it ran.  The exact same code ran faster in global scope than function scope, faster in a function than an object method, and faster in a standard loop than anything requiring a callback.  Likewise, function calls were faster than object calls because they generated fewer Zend opcodes.

2. You can get around this with a type assertion, but you shouldn’t, and you still have to deal with the case of “possibly not receiving something that has Close”: the compiler only knows about Reader, so the assertion is checked at run time.

3. I can usually resist this urge, because I have a stronger urge to avoid churn in the git history.

(Post has been updated 2016-06-15 to add clarity in some places, and divide some tangents into footnotes.  I don’t know how well that will work on Blogger.)
