Wednesday, August 22, 2012

Think of the Olden Days!

My first Linux machine had 128 MB of RAM.  The bzip2 warnings that you needed at least 4 MB of RAM to decompress any archive seemed obsolete at the time (even our then-4.5-year-old budget computer had shipped with twice that for Windows 95 RTM) and downright comical now that I have 4,096 MB at my disposal.

I was compressing something the other day with xz, and it was taking forever, so I opened up top to find only one core under heavy use.  Naturally.  The man page lists a -T<threads> option... which isn't implemented, because won't someone think of the memory!

OK, sure.  It appears to be xz -6, judging by the resident 93 MB; even with four cores, that's still under 10% of RAM.  The only ways it could come close to hurting are to run at xz -9, which consumes 8 times the memory and would seriously undermine the "reasonable speed" goal even with four threads; to run with 44 cores but no more RAM; or to run it on a dual-thread system with 256 MB.  The concern seems nearly obsolete already... will we be reading the man page in 2024 and finding that there are still no threads because threads use memory?
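
To put numbers on "under 10% of RAM," here's the arithmetic in shell form.  It uses only the figures above (93 MB resident at -6, and the 8x figure for -9), not anything measured, so treat the per-thread numbers as ballpark:

    # Ballpark check: per-thread compressor memory times thread count,
    # against total RAM.  93 MB is the resident size observed above at
    # xz -6; -9 is taken as roughly 8x that, per the figure above.
    threads=$(nproc)
    total_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
    echo "xz -6 x $threads threads: ~$((threads * 93)) MB of ${total_mb} MB"
    echo "xz -9 x $threads threads: ~$((threads * 93 * 8)) MB of ${total_mb} MB"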

The point of this little rant is this: someone has a bigger, better system than you.  Either one they paid a lot of money for and would like to see a return on investment, or one they bought further in the future than yours.  If you tune everything to work on your system as it is today, give or take a shift left or right by 1, you have a small window of adaptability that will soon be obsolete.  Especially pertinent here is that parallelizing compression adds no requirements on the decompressor.  A single-threaded system will unpack the result just as well, only more slowly; the choice of per-thread memory, by contrast, forces the decompressor to allocate enough to handle the compression settings.

(As with gzip and bzip2, there are parallel xz utilities out there, but only pbzip2 has made it into the repository.)
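
If you do reach for pbzip2, it keeps bzip2's interface; the filename is just a stand-in, and -p picks the thread count:

    # pbzip2 mirrors bzip2's interface; -p sets the number of worker threads
    pbzip2 -p4 backup.tar         # compress in parallel -> backup.tar.bz2
    pbzip2 -d -p4 backup.tar.bz2  # decompress (parallel only for pbzip2-made archives)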

Friday, August 17, 2012

Troubleshooting cloud-init on Amazon Linux

As it works, cloud-init drops files under /var/lib/cloud/data – you'll find your user-data.txt there, and if it was processed as an include, you'll also have user-data.txt.i.

If you're using #include to run a file from S3 and it isn't public (cloud-init has no support yet for IAM Roles, nor any special handling for S3), then user-data.txt.i will contain some XML indicating "Access Denied".  Otherwise, you should see your included script wrapped in an email-ish structure, plus an unwrapped (and executable) version under /var/lib/cloud/data/scripts.
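
Concretely, a quick look at those paths tells you which case you're in; nothing here is exotic, it's just the files named above:

    # What did cloud-init actually receive, and what did it run?
    ls -l /var/lib/cloud/data/
    cat /var/lib/cloud/data/user-data.txt.i   # "Access Denied" XML if the S3 fetch failed
    ls -l /var/lib/cloud/data/scripts/        # unwrapped, executable copies of included scripts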

Update 23 Aug: Per this thread, user data is run once per instance by default, so you can't test it by simple reboots unless you have edited /etc/init.d/cloud-init-user-scripts to change once-per-instance to always.  Or use your first boot to set up an init script for subsequent boots.  But this doesn't apply if you build an AMI—see the 1 Oct/8 Oct update below for notes on that.
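
If you go the editing route, the change amounts to a one-liner (a sketch, assuming the frequency string appears literally in that init script, as the thread implies):

    # Sketch of the edit described above: switch user scripts from
    # once-per-instance to always (assumes the string appears literally).
    sudo sed -i 's/once-per-instance/always/' /etc/init.d/cloud-init-user-scripts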

Update 2 Sep: I ended up dropping an upstart job into /etc/init/sapphirepaw-boot from my script.  The user data is #include\nhttp://example.s3....com/stage1.pl, and the upstart job is a task script that runs curl http://example.s3....com/stage1.pl | perl.  stage1.pl is public, and knows how to get the IAM Role credentials from the instance data, then use them to pull the private stage2.pl.  That, in turn, knows how to read the instance's EC2 tags and customize the machine accordingly.  Finally, stage2.pl ends up acting as the interpreter for scripts packed into role-foo.zip (some of them install further configuration files and such, so a zip is a nice, atomic unit to carry them all in).
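
The "get the IAM Role credentials from the instance data" step boils down to two GETs against the metadata service.  This is an illustrative sketch of that idea, shown as shell rather than a copy of the actual stage1.pl:

    # Illustrative sketch (not the real stage1.pl): the metadata service
    # names the role attached to the instance, then returns temporary
    # credentials that can sign the request for the private stage2.pl.
    base=http://169.254.169.254/latest/meta-data/iam/security-credentials
    role=$(curl -s "$base/")
    curl -s "$base/$role"   # JSON: AccessKeyId, SecretAccessKey, Token, Expiration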

Note that I have just duct-taped together my own ad-hoc, poorly specified clone of Chef or Puppet.  A smarter approach would have been to pack an AMI with one of those out of the box, then have stage2.pl fetch the relevant recipe and apply it with Chef/Puppet.  Another possibility would be creating an AMI per role, with no changes necessary on boot (aside, perhaps, from `git pull`) to minimize launch time.  That would prevent individual instances from serving multiple roles, but that could be a good thing at scale.

But now I'm just rambling; go forth and become awesome.

Update 1 Oct, 8 Oct: To cloud-init, "once per instance" means once per instance-id.  Building an AMI caches the initial boot script, and instances started from that AMI run the cached script, oblivious to whether the original has been updated in S3.  My scripts now actively destroy cloud-init's cached data.  Also, "the upstart job" I mentioned was replaced by a SysV-style script, because the SysV script I wanted to depend on is invisible to upstart: rc doesn't emit individual service events, only runlevel changes.
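
The cache-busting step, roughly; the exact layout under /var/lib/cloud varies by cloud-init version, so take the paths as an assumption based on the files discussed above:

    # Remove cloud-init's cached copy of the user data so instances
    # started from the AMI re-fetch the current script from S3.
    # (Paths are assumptions; only /var/lib/cloud/data is mentioned above.)
    rm -rf /var/lib/cloud/data/user-data.txt* /var/lib/cloud/data/scripts/*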

Monday, August 13, 2012

Muddy Waters of Crypto

After quite a bit of searching on the Internet, I've come away no clearer on where bcrypt stands relative to the alternatives.  I did, however, find a couple of odd complaints, which I'll take first.


Friday, August 3, 2012

Why does nobody use SRP?

Aside from parts of it being patented, SRP's security goal doesn't quite fit the way we use the Internet these days: it uses a Diffie-Hellman-like procedure to establish a secure channel keyed to the username presented.  Meanwhile, we have a standard for anonymous secure channels (TLS) over which we can exchange credentials without further crypto*, and using HTML forms means not being beholden to browser UI, such as HTTP authentication's ugly modal dialogs with no logout feature.


* Although it would be nice to be able to do <input type="password" hashmode="pbkdf2;some-salt" ...> to enable the server to store something other than a cleartext password, without all the dangers of trying to do crypto in JavaScript.

Bonus chatter: Someone once asked why I would use Digest auth even over TLS.  "In case TLS is broken" didn't appease him, but since then, we've seen high-profile failures like DigiNotar and Comodo, and attacks like BEAST.