Saturday, April 7, 2018

PHP 7.2 exposed our bad code

We upgraded to PHP 7.2, and then something horrible happened.

A “thanks for submitting client $x!” message was distributed not only to the agent who did the submission, but to all agents registered in the system.  A lot of drama ensued as confused messages started coming in, and the account rep forwarded the message to basically everyone in the company saying “what happened!?”

We stared at the code, but it was so simple that it couldn’t possibly be wrong, and besides, it hadn’t changed in months.  But the switch over from PHP 7.1 to 7.2 had happened mere hours before this event.

Because it was built almost a decade ago on raw PHP, the code has roughly this shape:

$sql = "SELECT email FROM agents
    WHERE username=:un AND active";
$sth = $dbh->prepare($sql);
$sth->bindParam(':un', $_SESSION['username'], PDO::PARAM_INT);

$cc = array_merge($cc, $sth->fetchAll(PDO::FETCH_COLUMN));
// ...

$mail #->...()

We had a mysterious variable mysteriously pulling in way too much information that it totally shouldn’t have had.  It didn’t make sense.  That shouldn’t have been physically possible.

So we turned off opcache, and we talked about changing our array management not to use array_merge() so much, and stuff.  We worried the template’s caching system was doing something drastically wrong and corrupting unrelated variables due to a PHP core bug of some sort.  And we waited.

It didn’t happen again on the next submission late in the day, but someone pointed out that the email to internal team was also being odd.  They knew Chris submitted it, but “Dave” appeared in the email.  That code was also pretty simple:

$sql = "SELECT first,last FROM agents
    WHERE username=:un";
$sth = $dbh->prepare($sql);
$sth->bindParam(':un', $client['agent'], PDO::PARAM_INT);
$owner_row = $sth->fetch();

And then this $owner_row was assigned into the template variables, where it was literally copied into the template without change.

What could possibly be going wrong?

And why Dave?  He was the first user of the first agency, and my intuition said something was obviously suspicious about that, but I just couldn’t place the problem.

Four man-hours of debugging, and turning off opcache, hadn’t turned up anything.  I was getting ready to roll back to PHP 7.1 out of desperation. But by then it was 5 PM and we’d gotten nowhere, so we called it a day.

But I hate leaving a mysterious technical problem unsolved, so later that night, I was cutting down the code and making it run on CLI when I saw it.

$sth->bindParam(':un', $_SESSION['username'], PDO::PARAM_INT);

There it was, right in the face.  So simple, I had read right over it at least three times.  We were binding the username string to the query as an integer. It was a latent bug that had just happened to work, but now, it didn’t.

And worse: the failure mode was apparently to skip applying that limitation! We didn’t get any warnings logged, nor exceptions thrown, despite running with PDO::ERRMODE_EXCEPTION activated.

Adding to the disaster, we didn’t have any limits applied—either in the SQL or by fetching out just one row of the response—because we knew the query could only return one row (username is, of course, a unique key.)

It had always silently worked before.  But in PHP 7.2, using native prepares with mysqlnd and MySQL 5.6 at the server, it suddenly switched from innocent to dangerous.

(Why didn’t the late-day submission fail spectacularly?  I simplified a little too much above, but the wild CC query actually happens when trying to CC a “secondary supervisor” of sorts; the second submission did not have a secondary supervisor assigned.  But it still had the bad owner query, which meant we needed to try harder to find the bug.)

No comments: