Monday, December 26, 2011

PHP: retrospective

Once in a while, someone on Reddit asks for justification why everyone there hates PHP.  I never reply, because there's too much to list in a comment, but maybe I can write a definitive post here.

Most recently updated: 7 Feb 2014, for new features in PHP 5.6.0a1.

Friday, December 16, 2011

War Story: The Stock List

A day like any other: In order to test that all the categories of products are behaving correctly on the website, I spend an hour writing a page to display a table of in-stock (further subdivided) and out-of-stock items.

About 6 business days after finishing, while waiting for review: instead of reading the entire history of every single Planet MySQL blog, I spend another half hour fancying up the CSS of my page.  My boss catches me, asks what the page is about, rejects the hypothesis that testing is important, and lectures me.  We are not making enough money to pay you your pathetic rate; do not do extra work.

Several business days later: the system is finally approved and live.  Nobody in the office is trained on it when an order comes in.  The order is for an out-of-stock item.  The Big Boss is rather angry, and demands to know whether there is some way to find out "what the site thinks it has in stock."  My boss answers "No."  I am silent.  I'm already looking for a new job.

Business Day 88 (about four months into the 90-day evaluation period): after 2 days and 2 emails, I finally get a meeting with the Big Boss to announce that I'm going to terminate my at-will employment after Day 89 to start my next job, 45 miles closer to home, at $pay * 1.38 + $benefits * 1.25.  (I ultimately decide to tell him the exact offered salary, though I can't tell if he's BS'ing me on whether it's an acceptable/common question to ask, because I figure he won't match it.  He doesn't even try to come up with a counteroffer.)  He threatens that I might need to stay 2 weeks because he doesn't know if I can leave.  The last project was finished somewhere around Day 76, and has been waiting for review.  Every time I pinged my boss on a review, ever, including this final task, the answer was: "later today."

Day 89 was thankfully uneventful.

Tuesday, December 13, 2011

Observations on SPF: Sender Policy Framework

Recently at work, I updated our SPF policy to something accurate.  Along the way, to understand the policy I was deploying and what the previous version actually meant, I had to understand the various rules and types involved.

Thursday, December 8, 2011

War Story:

When I attended community college, the computers in the labs were running Windows 95, pretty much in a state of constant hilarity.  I'll get to that some other time, though; today's wacky hijinks are about The Server: the most secure machine on all of the campus, since it was the master authentication source.

Friday, December 2, 2011

The Chariot

I read "Linking in JSON" the other day.  I knew someone had already gotten started on JSON Schema.  (A quick search shows JSON namespace ideas floating around.)  JSON as the lightweight alternative to XML being turned into XML?  This is beginning to sound familiar.

With lightweight formats, we tend to get a proliferation of variants for different uses.  (Not just images, naturally, and CPAN manages to use more than one.)  Heavier formats tend to have a problem I'm going to call "Accessories Not Included": they get sufficiently large and complex that not all readers support all format options.  If the growth is arrested early enough, you end up with a handful of profiles; if it gets out of hand, you have over ten of them.

I never expected XML to be so widely used for so much stuff, or spawn so many related specs.  After all, it was verbose!  And you could make up just any tag name you wanted!  But it turns out to scale well from a not-quite-simple tree data structure, with annotated nodes, all the way up to unfashionable Enterprisey uses.  But scalability bothers people, because who knows what wacky thing someone else on the team is going to foist on you, so more restricted alternatives rise to popularity.  If this is true, the rise of Java should correlate to (non-game) companies getting burned enough times on C++.  (If you think about Safety, it makes sense.  He wants you to use Java, because he can't hack reversing the polarity of the template stack flow.)

This sort of thing is practically destined to keep happening.  More features generally cost more memory and processing time, or some other inconvenience like a compilation step, which is against the religion of some developers.  Thus, lightweight versions of things spring up in opposition to whatever is perceived to be too heavy.  Sometimes compilation is considered the lightweight alternative, since it's not done on every request.

Though sometimes many similar projects proliferate because they just aren't that hard.  It's easier to write a web framework than to learn one, so there are a lot of them.

Made it out of the link forest and need something more to kill the time?  Maybe you want to subscribe to my feed, or follow me on twitter.

Monday, November 28, 2011

Resolution Dependence

Why are device pixels so meaningful that we get stuck designing around pixels, even though we "know" we should design for device-independent units?

The main characteristic of a pixel is that it is crisp.  When rendered on a display with 50% more PPI, a 1px line will be either thinner (physical size reduced) or antialiased (blurred).  On the other hand, doubling the PPI lets those 1px lines render exactly as crisply, on precisely 2 physical pixels.  (More crispness is possible, but Apple's version doesn't alter any art.)

If a user wants to zoom so that features are physically 50% larger, then the same problems of rendering 1-pixel features on 1.5px areas occur, but this time we know we can't tweak physical size.  Antialiasing happens instead, resulting in a zoomed but blurry UI.  Worse, subpixel rendering adds noise when not rendering precisely onto the intended subpixels, but the font rendering is done by the time the zooming layer gets to see it on Linux.
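The arithmetic behind both cases can be sketched in a few lines of Python.  This is only a one-dimensional toy model of my own, not how any real compositor works: the function name `line_coverage` and the idea of reporting a covered fraction per device pixel are assumptions made for illustration.

```python
import math

def line_coverage(start_px, width_px, scale):
    """Return {device_pixel: covered_fraction} for a line scaled by `scale`.

    start_px and width_px are in design pixels; scale is device pixels
    per design pixel (2.0 for doubled PPI, 1.5 for a 50% zoom).
    """
    left = start_px * scale
    right = (start_px + width_px) * scale
    coverage = {}
    for pixel in range(math.floor(left), math.ceil(right)):
        # Overlap between the scaled line and this one device pixel.
        overlap = min(right, pixel + 1) - max(left, pixel)
        if overlap > 0:
            coverage[pixel] = overlap
    return coverage

print(line_coverage(1, 1, 2.0))  # {2: 1.0, 3: 1.0}
print(line_coverage(1, 1, 1.5))  # {1: 0.5, 2: 1.0}
```

At 2.0 the line lands on exactly two fully covered device pixels, so it stays crisp.  At 1.5 one device pixel is only half covered, and that half-covered pixel is where the renderer has to choose between blurring and changing the physical size.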

Unless everything is lovingly hinted and/or provided at multiple PPI steps, there's basically no solution to the problem.  I'm willing to bet that people will skip properly handling multiple PPI settings if it's any more complicated than supporting power-of-2 sizes.  As long as pixels matter, which they will up to 600 PPI or more, people are going to design for pixels.

Wednesday, November 23, 2011

REST and RPC: Not Actually Antonyms

Last month found me writing a rant about REST and the shortcomings of interpreting it as "REST = HTTP+HATEOAS".  I submerged myself into some writings of Fielding, and took some time for reflection, and I've found one of the sources of my problems with "REST".  (Another.)

This problem is that there's too much writing on the web that attacks "RPC systems" as the logical opposite of REST, and I took this assumption unquestioned.

Friday, November 18, 2011

Smart TV and Split Attention

I don't want to waste much effort on trivialities, but regarding Gruber's semi-recent post:
Imagine watching a baseball game on a TV where ESPN is a smart app, not a dumb channel. When you’re watching a game, you could tell the TV to show you the career statistics for the current batter. You could ask the HBO app which other movies this actress has been in. Point is: it’d be better for both viewers and the networks if a TV “channel” were an interactive app rather than a mere single stream of video.
This is not actually a universal good for viewers.  They'll probably like it and want it, but if there's one thing I have learned about myself, it is simply:

Splitting my attention between things means I don't remember either thing.

Worse, the things that a smart channel offers me in Gruber's vision—the things actually related to the show I'm watching—are useless trivialities.  If I had smart TV and lacked the discipline to avoid these side quests, then I wouldn't gain anything out of my screen time.  I'd forget the answers to the fleeting distractions, and also not be able to remember what I was watching in the first place.

I can say all this because I already know what the price of distraction is.  I refuse to pick up my iPod while watching things, no matter how interesting it seems at the time, because I'd rather focus on the show or movie.

What makes me happy?  It's not the Internet; it's not TV; it's not apps; it won't be all three of them rolled together into smart TV.  However, a smart TV done well will still be a success in the market.  We'll find out sooner or later whether Apple did it well.  It's almost certain that they'll try.

Wednesday, November 16, 2011

Programming Languages to Learn

Many languages these days are fairly Lispy, except for not being homoiconic.  Homoiconicity is what gives Lisp a full-strength macro system instead of C's token pasting or many other languages' nothing.

But which ones are absolutely vital to learn, and which ones are "just different languages"?

Tuesday, November 15, 2011

Private Streaming with CloudFront: A Guide

Update, 1 Oct 2012: This post is largely obsolete, as Amazon recently added private streaming support to the CloudFront section of the AWS Console.  The original post follows.

I'll just assume you're aware of the IaaS offering known as Amazon Web Services, AWS.  CloudFront is a CDN in the AWS micropayments-as-you-go style, which offers the ability to serve non-public content stored in S3.  This is a compendium of the things I learned setting up a private streaming distribution for use with PHP.

This is going to be fairly low-level, since I like to drink deeply of the systems I'm working with.  I don't think AWS works smoothly enough yet that you can put the API on the "it's magic" side of the line.

Saturday, November 5, 2011

PHP's hash(): how tiger192,3 and tiger192,4 differ

PHP lists a handful of hash algorithms for Tiger:
  • tiger128,3 tiger160,3 tiger192,3
  • tiger128,4 tiger160,4 tiger192,4
What's the difference?  Which one is standard?  Is one harder to break?  Why don't any of the outputs match either of Wikipedia's examples?

Tuesday, November 1, 2011

Renaming buckets on S3

A technical note, since the search engines of the internet don't seem to have noticed: Amazon's S3 management console lets you cut and paste files now (including whole folders).  So the process to "rename" an S3 bucket is simply:
  1. Create the new bucket with the desired name.
  2. Go to the old bucket and select all files: click the first and then shift+click the last.
  3. Above the file listing, in the button row, is one marked "Actions", which opens a menu that includes "Cut" and "Copy".  Pick one.
  4. Go to the new bucket, click Actions, and Paste your files.
Done.  No 3rd-party software required.

Why would anyone want to rename a bucket?  In our case, we created a StudlyCapsStyle bucket, which can't be used with CloudFront's dns-compatible-style.

In double-checking this post for accuracy, I noticed that Cut/Copy are available on the right-click menu for a single selection, but not the multi-select.  Weird.

Tuesday, October 25, 2011

Character Sets: Get PHP, Perl, MySQL, and Unicode to Play Together

This post is a companion to Perl and Unicode in Brief, an attempt to cover the same ground more concisely.

This is an extended remix of my recent post on the subject, only less of a rambling story and more focused.  Again, I'll start with some background definitions.

I'll also assume that you're going to make everything UTF-8, because as a US-centric American who has the luxury of using English, that's what makes the most sense for my systems.  However, if you understand everything I wrote, it should not be difficult to make everything UTF-16 or any other encoding you desire.

Friday, October 21, 2011

The Trouble with REST

Note: this post has been superseded.

REST is easy to describe.  It goes a little something like this: "You have some representation, and you send (or receive) the whole thing to read it or make changes."  People coming from Clojure would understand it as REST sends values.  I can GET an object, receive the value, manipulate it, and PUT the new value.  It's so easy because it just uses HTTP!

Right?  Maybe not.  If REST is so easy, why is there HATEOAS*?  Shouldn't that have been obvious?  Why do we have arguments about versioning and parameters and formats and headers on Reddit?

Wednesday, October 19, 2011

Notes on using mysqlbinlog for copying updates

I commented on this post, but for posterity:
It seems by sheer luck that I stumbled over a way to take care of everything. I save a copy of the interpreted binlog as it files through the pipe:

mysqlbinlog ... | tee binlog-play.sql | mysql ...

Then if I get an error message, mysql will tell me e.g. "Error ... at line 42100". Running "vim +42100 binlog-play.sql" lets me inspect the stream to see what went wrong in detail.

Inside binlog-play.sql, the "#at 112294949" comments can be used in e.g. "--start-position=112294949" to the next mysqlbinlog command, to retry the statement after I fix the problem. (Alternatively end_pos seems to tell the position of the next command, if I need to skip the one which failed, e.g. I was testing out CREATE FUNCTION and it was logged as "CREATE DEFINER=... FUNCTION" which RDS refuses.)

The final piece of the puzzle is that executing "FLUSH LOGS;" or "mysqladmin flush-logs" will push mysqld on to the next binlog file, so you can safely play out the one you want. Once you've finished processing a file through mysqlbinlog, you can just remember the file boundary, and flush mysql's logs if you want to process the one it's presently writing to.
This is in regard to piping mysqlbinlog output from one mysql server into the mysql client to execute on another; the post I linked above discusses doing so for switching to Amazon RDS.  The basic strategy is to minimize downtime by loading a database dump from the source on the destination, then use mysqlbinlog on the source and the mysql client to feed updates from the source to the destination.  The updates can be faster to load than a new dump; and when it's time to switch servers, it's a matter of stopping database clients, turning off the source mysqld, sending the final binlog updates, pointing the clients to the destination server, and turning the clients back on.  That beats waiting for a whole dump to load while the clients are off.
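The "find the position comment near the failing line" step from the comment above can be automated.  Here is a minimal Python sketch, assuming the saved stream contains position comments of the form "# at N" (the pattern also accepts "#at N"); the helper name `position_before_line` and the sample lines are made up for illustration.

```python
import re

# Matches position comments such as "# at 112294949" or "#at 112294949".
AT_COMMENT = re.compile(r"^#\s*at\s+(\d+)")

def position_before_line(lines, error_line):
    """Return the last position comment at or before a 1-based line number."""
    position = None
    for number, line in enumerate(lines, start=1):
        if number > error_line:
            break
        match = AT_COMMENT.match(line)
        if match:
            position = int(match.group(1))
    return position

# Made-up excerpt standing in for binlog-play.sql.
sample = [
    "# at 112294949",
    "INSERT INTO t VALUES (1);",
    "# at 112295060",
    "INSERT INTO t VALUES (2);",
]
print(position_before_line(sample, 2))  # 112294949
```

Against the real file, passing the open file object and the line number from the mysql error message (e.g. `position_before_line(open("binlog-play.sql"), 42100)`) should give the value to feed to --start-position for the retry.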

Tuesday, October 18, 2011

Character Sets, Encodings, MySQL, and your data

This post is a companion to Perl and Unicode in Brief, an attempt to cover similar ground more concisely.  There is also a revised version of the post you're currently reading.

I'm currently moving data from a (relatively old now) MySQL 5.0 server into Amazon RDS.  I've been here before, when I was moving data from MySQL 4.x into 5.0 and mangling character sets.  This time, I want to make 100% sure everything comes across with maximum fidelity, and also get the character encoding as stored to be labeled correctly in MySQL.

First, a quick definition or two:
  • Character Set: a specific table to translate between characters and numbers.  Example: ASCII defines characters for numbers 0-127; "A" is 65.  This can also be described as "a set of characters, and their corresponding representation inside the computer."
  • Character Encoding: a means of "packing" numbers from the character set into a container.  Example: UTF-8.  The Unicode character 0x2013 becomes 0xE2,80,93. The "E" signifies "Part 1 of 3", and the top bits of each remaining byte simply indicate "Continued"; the 0x2013 is then divided up to fit in the parts of the bytes that aren't indicating their "Part 1" or "Continued" status.  In the specific case of UTF-8, the encoding is designed so that the ASCII range 0-127 (0x00-7F) is encoded without change: a leading 0-7 means "Part 1 of 1".
  • 8-bit character encoding: In older, simpler days, character sets defined only as many characters as could fit in 8 bits, and defined the encoding as simply the numbers.  Character number 181 would encode as a byte (8 bits) with value 181.
  • A character encoding implies the associated character set, because the encoding defines how numbers in its character set become individual bytes.  How characters in other sets would be encoded is left undefined and basically impossible.
This last point is why MySQL lets you set "character sets" to UTF-8, though the latter is an encoding.
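To make the "Part 1 of 3" and "Continued" packing from the Character Encoding definition concrete, here is a small Python sketch that packs a code point into three UTF-8 bytes by hand and compares the result with Python's own encoder.  The function name `utf8_encode_3byte` is mine, and the sketch deliberately handles only the three-byte range (and ignores the surrogate block).

```python
def utf8_encode_3byte(code_point):
    """Pack a code point in the range U+0800..U+FFFF into three UTF-8 bytes."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles the 3-byte range")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: "Part 1 of 3" + top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: "Continued" + middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: "Continued" + low 6 bits
    ])

en_dash = 0x2013
manual = utf8_encode_3byte(en_dash)
builtin = chr(en_dash).encode("utf-8")
print(manual.hex(), builtin.hex())  # e28093 e28093
```

The 16 bits of 0x2013 are split 4 + 6 + 6, which is why the first byte carries so little of the number and the last byte (0x93) carries the low bits 0x13.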

Tuesday, October 11, 2011

iPad vs. Tablet PC

One of them succumbed to death by risk-aversion.

One of them couldn't let go of the tether and fly.

I think Linus said the same of svn: paraphrased, "If you're trying to make 'a better CVS' then you have already lost, because CVS is too broken to fix."

Hey, sapphirepaw: make sure what you do is good on its own, not "an X only different".

Saturday, October 8, 2011

Steve Jobs

I'm getting old: if I were to pass on at the same age Jobs did, my life would be more than half over already.

What separates me from Jobs?  There's the matter of leverage, where he could take his vision and coordinate the prototyping and development of it, into the iPod, the iPhone, the MacBook Air, the iPad.  There's also the matter of having vision.

In 2006 or so, I beheld my first iPod in real life, an old (FireWire based) model with a physical click-wheel.  In 2008 I picked up a different, small MP3 player and for the first time, immediately noticed the limitations of digital control.  Without having handled the iPod and getting a feel for the analog response of the wheel, I probably wouldn't have given the buttons a second thought.  Do you want to scroll on the generic?  Click-click-click-click.  Or click-and-hold, guess at how long you need to go (since the screen is slow enough to be unreadable at this scrolling speed, and they don't slow updates to compensate), and release.

The point here is, Jobs saw humans as inherently analog, and adapted all of his machines to analog control.  It's a simple thing, but Jobs was apparently devoted to HCI.  The "vision" simply falls out of that.

It's not like the limitations of digital control weren't apparent in the 1980s.  Compare Rad Racer to a real car's steering wheel.  Anyone focused on "how it feels" could have been Jobs back then, inventing 2010 in the 16-bit era instead of carrying 8-bit paradigms through the 1990s.

In contrast, I seem to lack vision because I'm busy implementing arbitrarily complex business rules at work, and staying away from the bleeding edge of gadgetry.  I'm not in the consumer space; I'm not taking any research toward the consumer space; and I'm not thinking about what's next for it, either (at least, not beyond what turns out to actually be the next thing*.)  But, I'm also having little impact on the wider world, writing code that never leaves the house.  It's important, but after I am gone, will these be the best years of my life?  Will I think college was the best time of my life, forever?

I think it's time to put my free time to better use and do something instead of watching the world slowly develop towards Jobs' vision on its own.

* I have a dead draft which discusses the crazy idea of "having a set-top box inside the remote" in 2006 or so.  It then points out that h.264-over-wifi ought to handle the bandwidth to do exactly that from your iPhone now.  It starts fleshing out what would be necessary to make it happen, then abruptly ends with a note: "Two days after I started writing this, Apple announced AirPlay."

Thursday, October 6, 2011

Setting Everything on Fire

I created a new user, gave them wheel group, and in case I needed another admin user, added %wheel to sudoers through visudo.  Then, I was trying to do more stuff, and...

[sudo] password for ec2-user: _

Wait.  What?  Not only does ec2-user have no password, but I didn't change its NOPASSWD line in sudoers.

It turns out that ec2-user is also in group wheel, and when confronted with the two permission sets, sudo did what I didn't mean: applied the %wheel rule and started requiring passwords for ec2-user.  Of course su was no help either: root likewise has no password set, because you have sudo as ec2-user....

Friday, September 9, 2011

War Story: The Training Jump

One of the first things I did at my current job was to rewrite a Perl/CGI (the module, and the actual cgi-script execution model) site into PHP.  Part of this site implemented a single-signon (SSO) system for a partner site that hosts our training videos.  Clicking the link led to the innocuously-named "" CGI script.

The goal in life of the training_jump is to redirect a user to the partner site, passing along the username and email address.  The partner site creates the user if necessary, starts a session for them on its server, and ultimately displays the actual training content.

Inside training_jump is an innocuous-looking "use OAuth::Lite;" line.  I didn't know what OAuth was at the time, so of course I went and looked it up: OAuth is designed to let a site like ExampleMashup authenticate someone as "twitter user chris" without needing to ask chris directly for their twitter password.  Of course, this makes no sense, because in our case, we possess the account, not our partner.  Likewise, once the login is complete, the user should end at our partner's site rather than our own.  We have nothing to use the oauth token for, because we don't perform any operation at the partner site aside from the login.

Yet here inside training_jump was OAuth.  The user hit training_jump; we redirected to the partner by IP address (!) with the OAuth request token, all the necessary user data, a callback URL (training_jump again); they duly redirected; we collected the response token and redirected the user back to the partner with that token as the parameter.  The end result is still kind of fragile, in that AFAIK, it only works in the first browser you sign up with.  If you log in with Firefox, then try it in Chrome, the latter gives you an error somewhere along the line instead of videos.

IIRC, research at the time indicated that there was no good PHP OAuth library, and/or the suitable libraries didn't implement the exact flavor of OAuth API that was being used by the Perl code.  I'm absolutely certain I considered replacing the Perl entirely, but I don't remember why I rejected PHP OAuth as a solution.

I couldn't simply continue using training_jump as-is, because the CGI module and PHP store their session data in different locations, in different formats.  The username in the PHP session wouldn't be accessible to pass through the authentication dance, and it was clearly inadvisable to modify training_jump to accept a username as a URL parameter.

Nowadays, training_jump has been succeeded by the cleverly named training_jump2, which actually reads request variables on stdin and produces an answer on stdout.  (The format of this text is much like LiveJournal's ancient API, from back when I had a LiveJournal client.  There was no convenient interchange format, as the Perl code didn't have JSON installed at the time and PHP didn't have XML.  "Lightweight" eats you again.)  The PHP training_jump manages the connection between server environment and training_jump2, and training_jump2 simply had its server environment replaced with communicating over pipes.

We're in the negotiation phase of moving to the provider's newer platform, which has a proper, encrypted SSO system.  training_jump2 is slated to become irrelevant, eventually.  In the meantime, it's the only bit of Perl CGI that never made the jump to mod_php.

Sunday, September 4, 2011

sapphirepaw's Introduction to Pointers, version 2

I programmed in assembly for some time, using pointers without understanding what they were, or that they were called pointers.  When I finally got to learning C, the pointer syntax was downright inscrutable, but when I got it, suddenly all of C and all of assembler lay clear before me, all at once.  It was a beautiful thing.

I was reminded about this while reading this post from HN.  It inspired me to try explaining pointers from the opposite direction.  Instead of trying to teach pointers via C syntax, let me try to start with pointers outside of programming, then discuss them in relation to C and PHP.

Monday, August 8, 2011

What Would "Better" Be? PDF Reader Edition

I opened up a PDF in the default Gnome PDF reader last night, and it was once again a terrible experience.  It opened with the zoom set to "fit page width", and the scrolling set to continuous.  There's no concept of a persistent user preference, or user preferences that override the document preferences.

Then I got to considering the underlying reasons why I didn't like the default display.

Thursday, July 14, 2011

The Facets of Net Neutrality

In its original conception, "Network Neutrality" as I understood it was about a lack of privilege amongst competing traffic sources: that Google, Viacom, the atheism reddit, the Anglican Council, and the Time Cube site would all be subject to equal traffic slowdowns in the face of congestion.  A bit of thought would suggest that treating individual packets equally was not, in fact, desirable: you probably don't want your VOIP call and each individual P2P connection to be subject to the same rules, really.  You'd rather the call got through even at the expense of delaying a few packets of your (or your neighbor's) download.

Certain large ISPs have been trying to twist it to mean they can charge on both sides, for content providers to be allowed to send data to "their" customers, though the customers are already paying (quite profitably for the companies) for their own access.  They would be charging everyone for access, so it's "neutral," right?  This is an anti-neutrality stance trying to co-opt the word so that it sounds like a good thing.

Pro-neutrality forces (in the first sense) argue that requiring content providers to pay for carriage, or for "premium" speeds, would completely destroy the internet as we know it.  Also, many of them believe they are preserving existing neutrality, but this turns out to be incorrect.  A content delivery network (CDN) essentially is an implementation of pay-for-speed, because the content provider pays for their content to be stored closer to end-users, which reduces load time for those users.  Although the end-user's ISP doesn't receive payment directly, the content provider's payment to the CDN also funds the overall system by paying for the CDN's own connectivity at the ends, and infrastructure in the middle.

I think the value of the Internet is in two things: uniformity of access for end-users, and fair division of capacity.  Uniformity of access is simply that any connection should be able to carry packets from any content provider, so that the view of "the Internet" from any one ISP is the same view as from any other.  Otherwise, "the Internet" would cease to have meaning, as it reverted to the days of online services like CompuServe, Prodigy, and AOL.

Fair division of capacity is exactly what it says on the tin, that speeds and latencies should be balanced among customers of an ISP.  I shouldn't be able to start a download and prevent Netflix from delivering video to my neighbor, and a bunch of people on 6Mbps connections shouldn't be able to deny service to 1.5Mbps subscribers.

The real emotional punch that gets brought into neutrality discussions seems to come from the leonine terms the ISPs would like to apply: around one-tenth of the current (often secret) usage limits, for as low as six-tenths of the price, as in Time-Warner's experiment last year. Though the current arrangement is apparently profitable and growing more so over time: the cost of carriage is falling faster than inflation is diluting revenues.  The fear is that ISPs will establish these terms "in order to build out next-generation networks" and then not follow through on that investment, artificially limiting their service and allowing inflated payments that do nothing but lift the artificial restriction—in order to offer what is on the market today.

Promises, after all, are cheap.

This fear is only exacerbated by the incumbent ISPs' wars against municipal broadband.  City-owned networks are being opposed in many states as 'unfair' competition.  In at least one case, the city in question embarked on its network building course because the ISP claimed they would never offer higher speed.  Yet as soon as the city decided to offer higher speed itself if nobody else was going to, the ISP frantically began upgrading their infrastructure, hurrying to complete it before the city's project was finished, so they could argue that the city network was 'unnecessary' due to the ISP offering its (new) high-speed service.

This fear is further exacerbated by the regular broadband reports showing that countries with more competition amongst ISPs, regardless of urbanization, have the fastest speeds and highest limits on data transferred, where applicable.  If larger companies truly did have more efficiency and more benefit to the customer as they claim, then the average US broadband connection should meet—or exceed—the average connection in Japan.  Instead, large companies' performance suggests they are the major impediment to improved service.

For the Internet to continue its course of innovation and convenience for the American consumer, protection of uniformity of access and fair division of capacity are sorely needed.  Placing these responsibilities into the hands of existing large ISPs who have been actively demonstrating their complete lack of commitment to the principles, or their customers, except when threatened en masse with an alternative network, is clearly the wrong course of action to ensure the result.  It is putting the fox with feathers stuck in its teeth in charge of the hen house.

Monday, July 11, 2011

TCP: Conflicting Goals

David Singleton writes in "Why mobile apps suck when you're mobile (TCP over 3G)":
TCP assumes that the connection has a more or less constant RTT and assumes delays are losses due to congestion somewhere on the path from A to B.
This struck a special chord with me, because I had just recently read about TCP algorithms that had been designed to combat "buffer bloat": instead of scaling strictly based on packet loss, assume increases in latency are due to buffering on the path.  Then, back off to avoid both packet loss and longer latency, which is measured by RTT.

Since 3G attempts to implement reliable delivery itself, TCP-in-3G bears performance characteristics similar to the TCP-in-TCP problem explained in Avery Pennarun's sshuttle README.  (sshuttle takes care to extract data from the one TCP connection and copy it to a technically distinct connection, instead of wrapping it, in order to avoid the problem.)  And actually, I see that Singleton linked to another source going into more detail, which I skipped reading the first time around.

So not only is 3G a bad transport for that reason, but the variable RTT its delivery mechanism introduces also sinks TCP algorithms which try to use increased RTT to avoid queueing in buffers.  The buffer-avoidance aspect can't distinguish between "bad" buffers like those in a cheap home router that take huge chunks of data off the Ethernet at 100 Mbps, then dribble it out at 0.6 Mbps to the Internet at large; and "good" buffers like those in the 3G system that are unclogging the spectrum rather than crowding other users of the tubes.

Singleton proposes some mitigations for app developers; I'd rather try to "fix" TCP so that it gracefully handles variable RTT.  It may violate the perfect conceptual segregation of the OSI Seven Layer Model, but simply having the phone's TCP stack aware of the wireless interface itself would go a long way toward mitigating the problem.  Perhaps if the 3G hardware could indicate "link restored" and "backlog cleared", TCP could skip using the RTT of packets received between those events in its congestion avoidance.

It seems like WiFi would need some mitigations as well.  It is particularly prone to periods of "solid" packet loss, occasionally even destroying the beacon signal and thus kicking everyone off, and periods of fairly reliable reception.  However, when you do get reception back, the data pours in without significant degradation in speed, so the underlying issue is a bit different.  Even so, a connection always seems to be particularly slow if it has the bad luck of being started during a period of loss.

In the end, the problems seem to come from allowing endpoints to specify receive-windows, but not the network.  TCP views the network as a dumb thing that it can draw conclusions about based on end-to-end behavior.  Yet the increasing prevalence of wireless, and of sending TCP over wireless links, seems to indicate that "the network" should be able to add metadata to the packets (probably at the IP level, since the network is conceptually unable to peek inside of IP data) to indicate that the delivery of the packet was delayed for reliability.  Unfortunately, rogue devices could set that bit for their buffer-bloated packets, so it's about as practical as the Evil Bit.

Sunday, June 5, 2011

Python's sum()

In Python, the sum() builtin gives you the ability to take a list, say [1, 2, 10] and find the sum of it as if you had written out 1 + 2 + 10.

The + operator is also defined for lists, where if you write out [1] + [2] + [10] you'll get a list back: [1, 2, 10]

What happens if we put these two observations together? Can we sum() a list of lists to get one flattened list?
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print sum([[1],[2],[10]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'list'
Nope. By default, sum() internally starts with "0 + (first element of sequence)" so you can only pass things that can be added to integers.
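The failure comes from the default starting value of 0, and sum() accepts a different starting value as its second argument.  A minimal sketch, written for Python 3 (the variable names are mine): passing an empty list as the start makes the flattening work, and itertools.chain does the same job without building a new list at every step.

```python
from itertools import chain

nested = [[1], [2], [10]]

# Start from an empty list instead of 0, so every addition is list + list.
flat_sum = sum(nested, [])

# Same result without repeated list copying; better for long inputs.
flat_chain = list(chain.from_iterable(nested))

print(flat_sum)    # [1, 2, 10]
print(flat_chain)  # [1, 2, 10]
```

The sum() version copies the accumulated list on every addition, so it gets slow for long inputs; the chain version is the one to reach for in real code.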

Friday, June 3, 2011

The First Step of a Long Journey

Over the past couple of weeks, I have assembled a reader in PHP, such that it understands code of the form (print (== (+ 4 4 6) (- 30 15 1))) and will be able to create PHP source that ultimately prints out "1".  It's kind of brokenly stupid in other ways, but it's the bare-bones skeleton of a working compiler.  Something I have never been able to build prior to this attempt, largely because I wanted to tokenize something superficially like PHP, and I always got bored of defining all the stupid tokens.  Going with s-expressions made for only a handful of token types so that I could get on with the interesting bits instead of grinding out pages of /&&|\|\|/ crud.  Because almost anything can go in an identifier, I can treat everything as identifiers for now.
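For a sense of how little machinery the s-expression route needs, here is a rough sketch of the same pipeline.  It is in Python rather than PHP, and it is not the actual code: the names tokenize, read, emit and INFIX are made up for illustration, and only the three operators from the example are handled.

```python
import re

# Two token types are enough: a parenthesis, or a run of anything else.
TOKEN = re.compile(r"[()]|[^\s()]+")

def tokenize(src):
    return TOKEN.findall(src)

def read(tokens):
    """Consume tokens from the front of the list and return one nested form."""
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens[0] != ")":   # unbalanced input raises IndexError here
            form.append(read(tokens))
        tokens.pop(0)             # drop the closing ")"
        return form
    return tok                    # everything else is treated as an identifier

# Operators emitted as infix PHP; any other head becomes a function call.
INFIX = {"+", "-", "=="}

def emit(form):
    """Turn a nested form into a PHP expression string."""
    if isinstance(form, str):
        return form
    head = form[0]
    args = [emit(arg) for arg in form[1:]]
    if head in INFIX:
        return "(" + (" " + head + " ").join(args) + ")"
    return head + "(" + ", ".join(args) + ")"

php = emit(read(tokenize("(print (== (+ 4 4 6) (- 30 15 1)))"))) + ";"
print(php)  # print(((4 + 4 + 6) == (30 - 15 - 1)));
```

Both sides of the comparison come to 14, so the emitted PHP prints "1".  Chaining == across more than two arguments would not mean what it does in Lisp, which is one of the many corners a sketch like this ignores.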

There are a few obvious things it needs next: string types.  Variables.  defun.  defmacro.  Separate namespaces for functions and variables, defined by context, so you can say (array_map htmlspecialchars row) and it will know that the first argument passed is a callable and the second is an expression, so that they can compile to 'htmlspecialchars' and $row, respectively.  And to serve its original purpose as an "enhanced PHP"-to-PHP compiler, it needs to read that source language rather than s-expressions.  Of course, with a non-sexp-based language, macros might not work out so well, but I do want to be able to run code to rewrite the AST (or the whole tokenizer: aka reader macros) at compile-time.

There's a bunch of features I want to add, too.  Proper named arguments.  Multiple-value return.  Ubiquitous lexical scope, so obviously let and its function equivalent (flet perhaps?). Something else that I'm forgetting at the moment.

In the long run, I also want to do some optimizations; ideally, I could turn $efoo = array_map('htmlspecialchars', $foo); into $efoo=array(); foreach ($foo as $k=>$v) $efoo[$k]=htmlspecialchars($v); as well as doing simple optimizations like i++; to ++i;.  I'd also love to be able to compile some 5.3 code like $foo::bar("baz"), ?:, and "nowdoc" syntax into 5.2-compatible renditions (answer to the first: call_user_func(array($foo, 'bar'), "baz") though my accumulated wisdom now considers such things to be a code smell).

The weird thing about this is that if I succeed, I'll be doing what Rasmus did to create PHP—riffing on an existing system in the domain to come up with something a little better.

Thursday, May 19, 2011

Accidental Lisp

It began with a simple bit of laziness: I wanted a preprocessor so that I could write as if PHP had multiple return values.  I'd write "return $x, $y;" in the callee, and "$a, $b = fn();" in the caller, and the preprocessor would rewrite it to valid PHP (throwing array() and list() around the appropriate expressions).

But I'm even too lazy for that.  To do this right, I'd need to fully parse the PHP, so I could understand more complicated return expressions like method calls.  So instead of that, I slapped together a lexer for s-expressions.  They're a lot less hairy, and this is just some twisted experiment.

I was halfway through putting together a parser this evening for the lexer output, when I realized: a few years ago, I ported the metacircular evaluator from the SICP lectures into Ruby... then discovered I would need to write an s-expression parser, which you get for free with Lisp.  (That project then died.)  But if I finish an s-expression parser... I can port the metacircular evaluator to it and have the world's stupidest Lisp-1 implementation, i.e. it'll be done in PHP.*

Alternatively, I can define a package in SBCL that emits PHP, and have the reader and macros for free.  Then my head exploded.

* Because this tool was intended for PHP shops, the compiler would have to be written in and emit PHP so there's no Scary Foreign Language involved, other than the compiler's input.  And originally, that input language was going to be almost PHP.

Tuesday, May 10, 2011

Quickie: Diffie-Hellman Groups

Relying on others' suggested magic numbers for crypto is probably a Bad Idea, so recently I studied Diffie-Hellman a while to understand what the "DH Group" parameter was in my IPSEC setup, and my PuTTY settings.

DH turns out to be a lot like RSA, so bit lengths are comparable between the two and neither is directly comparable to symmetric ciphers like AES.  A specific Diffie-Hellman exchange happens using some parameters: a generator for the base, and a prime to use as modulus.  (An exponent remains secret.)  DH Groups refer to specific, pre-chosen prime-and-generator pairs so that, for example, SSH can negotiate "group 14" instead of transferring the complete parameters themselves.
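A toy exchange shows how the pieces fit together.  This Python sketch uses a deliberately tiny prime so the arithmetic can be checked by hand; the values 23, 5, 6 and 15 are illustrative only and come from no standardized group.

```python
# Toy parameters for illustration only; real groups use primes of 2048 bits or more.
p = 23   # public prime modulus
g = 5    # public generator (the base)

a = 6    # Alice's secret exponent, never transmitted
b = 15   # Bob's secret exponent, never transmitted

# Each side sends g^secret mod p over the wire.
A = pow(g, a, p)   # Alice -> Bob
B = pow(g, b, p)   # Bob -> Alice

# Each side raises the other's public value to its own secret exponent.
alice_shared = pow(B, a, p)
bob_shared = pow(A, b, p)

print(A, B)                      # 8 19
print(alice_shared, bob_shared)  # 2 2
```

A DH group fixes p and g ahead of time, so the two ends only have to agree on a group number and exchange A and B.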

These groups have been standardized in RFC 2409, with additional groups defined in RFC 3526.  The latter RFC defines the bit lengths of the groups explicitly, stating that group 5 is 1536 bits, group 14 is 2048, and group 16 is 4096 bits.  As far as I can tell, groups 1 and 2 defined in the earlier RFC are only 768 and 1024 bits, respectively.

Note well: I believe this means DH groups 1 and 2 are dangerously short and should not be used to set up an IPSEC VPN today.  Likewise, PuTTY should really be configured out-of-the-box to warn about the use of anything less than DH group 14.  However, before I take my own advice, I need to do some experiments to determine whether the IPSEC client in iOS actually handles DH groups other than 2.  Edit from THE FUTURE: iOS 4.x does not accept other groups; iOS 5.x no longer accepts group 2, AFAICT.  I haven't gotten a working IPSEC VPN set up again, though, since it's not very important to me.  (Work provides a PPTP VPN.)

Wednesday, May 4, 2011

Quickie: The Necessity of Whimsical Names

Rackspace recently announced that they'd like to discontinue Slicehost at some point, migrate everyone to the EC2-like Rackspace Cloud, and make people worry per GB about the bandwidth they're consuming.  So I'm preparing a move to Linode for more of everything*, and in the planning, I've come across a new argument in favor of whimsical names for servers.

If I give each server a whimsical name, like "alice" and "bob", I can always refer to the old and new IP addresses by those names while the change of IP of "www" propagates through the DNS.  Between the time when the new address is set and the old one is expired (and note that there's no way to force an ISP's resolver to honor the TTL if they choose to assume "no TTLs will be shorter than an hour") the name being transitioned points to a more-or-less random server.

Basically, the whimsical name is like a server ID, and the service-based names are just conveniences.  Though a program is three lines long, someday it must be maintained; though a server hosts one service, someday it will have to be replaced.  When an organization gets big enough that it can't generate whimsy as fast as it needs servers, then it should go with something more regular for the server name, but each server should still have a unique, non-service-based name.

* Except bandwidth, but the 11% difference is smaller than my current monthly consumption, so it turns out not to matter much.  Even if it did matter, that much transfer on The Cloud (insert angelic chord here) would be expensive, so Linode still wins.

Thursday, April 21, 2011


They say you don't get a second chance to make a first impression, but that depends on who you are.  Apple seems to have managed a couple of major architecture transitions and their own Vista without too much ill will, yet Microsoft was practically crucified for Vista with no architecture transitions.

Fair warning: many links in this post lead to tvtropes.

Friday, March 4, 2011

The Authority of the User

I used to believe that my computer was mine, and no program had any authority to do anything without my consent.  (This can probably be traced back to my days on Slashdot, a decade ago; if I didn't get the opinion from there, they certainly reinforced it.)  I believed I was sufficiently smart to manage my own software, without everyone's updater constantly nagging me to do so.  I especially didn't want the updater to do it on its own; this often led to problems, especially when Firefox got updated behind the scenes while I was using it.  However, I liked automatic security updates on Linux, so I got rather used to restarting Firefox when links mysteriously failed to be followed, or menus and tabs couldn't be opened, these being the days before the "Firefox has been updated and needs to be restarted" notification.

Then, everything changed.

Wednesday, March 2, 2011

Quick tip: extending the man search path without $MANPATH

If you've ever tried to add a directory to man's search path, you've undoubtedly noticed that the MANPATH environment variable replaces rather than extends man's built-in search path.  Today, I rediscovered a clever little setup on a machine at work.
  1. Copy /etc/man.config to somewhere in your home dir.  Mine seems to be at ~/.config/man/man.config for optimal redundant redundancy.  (I will say that keeping the "man.config" name of the file makes vim highlight it without additional fuss.)
  2. Add your desired MANPATH lines to this file at whatever position you wish.  Don't forget to curse the lack of an include mechanism at this point, which prevents you from automatically getting changes to /etc/man.config.  Cheer up, because there probably won't be any.
  3. Add an alias to your shell.  For bash, you would put something like alias man='man -C ~/.config/man/man.config' (which obviously includes the name of the file chosen in step 1) into ~/.bashrc.  Remember to source ~/.bashrc to make it take effect in the current session.
That's all!  Now when you run man, your personal manpages will be searched as well.

The documentation for man on the system in question claims that it will use $PATH to guess at additional man page locations, but this does not actually work for me.  Having a command in ~/.install/bin does not allow man to find the manpage in ~/.install/share/man.

Monday, February 28, 2011

Variable scope, require, and use

I ran into some interesting problems in Perl, which prompted more learning about the require/use mechanisms and how constants are interpreted.  In this post, I'll lay out some general terms about variable scoping, such as lexical scope, dynamic scope, the differences between them, and how they all interact in Perl.  And then I'll cover require and use with that foundation in place.

If you've been wondering about lexicals or closures, this is your post.  I've tried to lay things out more or less from basic principles, despite the verbosity of the approach, because this has taken me years to understand.  I started programming with Perl in 2000 and still learned a bit more about it today.  Yes, it's 2011 now.  Hopefully, you can read this and get it in less time.

Saturday, February 19, 2011

Changing a Tablet's Active Area in Ubuntu Lucid

The following information applies to Ubuntu 10.04 LTS, Lucid Lynx, with xserver-xorg-input-wacom installed to provide xsetwacom.  This is about fine-tuning your tablet; if your tablet isn't working at all, you probably need bug #568064.

There used to be a wacomcpl program to graphically configure a Wacom tablet; this quit working with changes to the upstream project and/or the Tcl dependency, so it hasn't been working for me for some time.  Before it quit working, I set up a script to call the xsetwacom command-line program with the desired results, so the loss didn't affect me.  Mainly, I had adjusted the active area so that tracing a circle on the tablet would result in a circular shape on the monitor.

With a new monitor came a new need to reconfigure the tablet, without using wacomcpl this time.  I ultimately created a couple of formulas to make a strip of the tablet inactive.  Without further ado, these are the formulas:

 $x_offset = $w - ($h * $aspect) # narrower monitor
 $y_offset = $h - ($w / $aspect) # wider monitor

$aspect is the aspect ratio of the monitor, obtained by dividing where you write the colon.  For example, 16:10 = 16/10 = 1.6.  Alternatively, you can divide the width in pixels by the height, so a 2560x1600 display has an aspect of 2560/1600 = 1.6.  (If you have square pixels, which practically everyone does because they're so convenient.)  The monitor being narrower or wider refers to whether the monitor's aspect is lower or higher than the tablet's, respectively.  You can calculate the tablet's aspect by dividing $w by $h; obtaining them is the subject of the next section.

$w and $h come from the actual tablet, which you can find easily enough.  In these commands, $T represents your tablet's name, which you can get from `xsetwacom list dev`.  In my case, there's a tool name attached, so it prints "Wacom Bamboo 4x5 Pen STYLUS" (among other things) but only the "Wacom Bamboo 4x5 Pen" portion is the actual device name.  The first command simply resets the coordinates to cover the full tablet, just in case they have been changed.

 xsetwacom set "$T" xyDefault 1
 xsetwacom get "$T" BottomX
 xsetwacom get "$T" TopX
 xsetwacom get "$T" BottomY
 xsetwacom get "$T" TopY

$w is BottomX-TopX, and $h is BottomY-TopY. 

Armed with this information, you should now choose the correct formula from above, and substitute all the numbers.  In my case, the top coordinates are both 0, so BottomX=$w=14720, and BottomY=$h=9200.

My old monitor was much narrower (at 1280/1024=1.25) than the tablet (at 14720/9200=1.6), so I used the first formula, thus:

 $x_offset = 14720 - (9200*1.25) = 3220

And to set that value:

 xsetwacom set "$T" TopX 3220

My new monitor runs at 1920x1080, which yields 1.7778 for aspect.  The monitor is wider than the tablet, so now I need the second formula:

 $y_offset = 9200 - (14720/1.7778) = 920

Now that the offset is known, it's a simple matter to set up.  I just add it to the original TopY value (zero for me, so no different) and set that as the new TopY:

 xsetwacom set "$T" TopY 920

Altering TopX or TopY means that the inactive portion of the tablet runs down the left or across the top.  I don't really care where the dead zone ends up, so I chose the method that results in the fewest calculations needed.  You could just as easily set BottomX to BottomX-$x_offset to move the dead zone to the right side of the tablet, or adjust both TopX and BottomX by half of the $x_offset to keep the active area centered.
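If you'd rather not do the arithmetic by hand, the two formulas fit in a few lines of Python.  This is only a sketch: the function name dead_zone_offset is made up, it assumes the top coordinates start at 0 as they do on my tablet, and it only computes the number, leaving the xsetwacom call to you.

```python
def dead_zone_offset(w, h, aspect):
    """Return (parameter, offset) for a tablet of size w x h and a monitor aspect ratio."""
    tablet_aspect = w / h
    if aspect < tablet_aspect:
        # Narrower monitor: make a vertical strip of the tablet inactive.
        return ("TopX", round(w - h * aspect))
    if aspect > tablet_aspect:
        # Wider monitor: make a horizontal strip of the tablet inactive.
        return ("TopY", round(h - w / aspect))
    return (None, 0)  # same aspect, nothing to adjust


old_monitor = dead_zone_offset(14720, 9200, 1280 / 1024)
new_monitor = dead_zone_offset(14720, 9200, 1920 / 1080)

print(old_monitor)  # ('TopX', 3220)
print(new_monitor)  # ('TopY', 920)
```

The parameter name in the result says which xsetwacom value to set; if your TopX or TopY does not start at zero, add the offset to the original value as described above.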