Friday, August 28, 2009

Snipping the blight... fantastic!

I've been fighting an elusive bug for the past few weeks: I've developed some major enhancements for my passwords site plus upgrading to more recent versions of Rails and associated plugins (improved security and testability being my primary goal).  Things were going along fine until I deployed my app to production: I was able to login without errors, but was then immediately returned to the login screen. Needless to say, the Capistrano deploy:rollback task was just what the doctor ordered (that's why I love Cap: it took a bit to get it setup originally, but it really makes deploying or undeploying applications a snap - if you have an app in production and you don't have Capistrano or something similar, do it now).

Anyway, I setup up a production environment on my development machine and, sure enough, it failed in production mode but ran fine in development mode. I modified the production settings so that I would get error messages [Note to self: put it back like it was!] and got a message saying that a module in the filter chain had returned false... it looked as if it were failing authentication somewhere in the login system.

I started digging into the code and using the debugger to see what was going on: from the symptoms and the error message, it seemed to be a problem with session state. Checking with my hosting tech support and local Rails folks, I learned there were reports of problems with sessions in Rails 2.3.2 and one of the local chaps, Brendon Whateley, reported some serious session issues that he had and chased the code back to Rails initialization and lots of 'magic' aliasing and overriding of ActiveRecord methods (nice stuff when it works). To quote:

"Having spent the best part of a week debugging the sudden failure of sessions recently, I may be able to help.  The first thing to figure out is if the session is saving state and not loading, or loading but not saving.  In my case the ActiveRecordStore was not serializing the session data, but was loading and saving the session object.  This meant that previously set session values still existed, but any changes did not persist across calls.  I was able to validate that the logins were successful, but suffered similar issues to you.

"In the end I tracked the problem down to overriding the ActiveRecord::Base.create_or_update method.  For some reason it completely broke ActiveSessionStore serialization, even if the redefined method contained the exact same code as the original.  I never completely figured out WHY that broke, but if I moved my changes to create and update methods that create_or_update called, I no longer experienced that problem."


Made sense to me, and since I am using some of the ActiveRecord callbacks, it seemed quite reasonable. So off I went, digging thru my code, researching how the callbacks were used, trying to find out if anyone else was using them, more debugging... all of which went nowhere.

Yesterday, I bit the bullet and upgraded to Rails 2.3.3. Very easy; just do
  gem install rails 
and try it again in production mode.

You can imagine my surprise when all of a sudden, the app crashed and emitted the following error messge:

ActiveRecord::StatementInvalid (Mysql::Error: Unknown column 
'assets.type_id' in 'on clause': SELECT count(DISTINCT `assets`.id) AS 
count_all FROM `assets`  LEFT OUTER JOIN `types` ON `types`.id = 
`assets`.type_id WHERE ((user_id = 1 and ( entry like '%' or entry 
like '%' ))) ):
  -e:2:in `load'
  -e:2
I had never received any such error message in Rails 2.3.2 (obviously some kind of bug there which caused this to appear as if it were a filter chain termination), but this made it clear what had happened. In the process of upgrading from Rails 1.x to 2.x, I had experimented with using some database constraint logic. It had been a tad problematic and I wanted to focus on the Rails upgrade so I removed those migrations. It was probably at that point that the development database and the production database got out of sync (I haven't yet gone back to my subversion repository to see exactly what/when it happened, but I may do that yet out of curiosity).  I had added a new "type_id" field that was required to support the new version of ActiveScaffold and it had never been added to the production database, even though the schemo_info table contained the same information in both databases (I know, because I checked before I started the whole deploy process).

So what did I learn from this?
  1. Even though there were several other folks looking at this, we all missed the obvious: development and production mode are different not only in the code base but also in the database. I  suffered from a perceptual set when identifying potential sources of the problem (and Paired Programming isn't always a silver bullet). In the future, I'm going to be a little more diligent when exploring potential solution spaces.
  2. Rails migrations, while very nice, are also fragile. I've had problems with them in the past: simply checking that the schema_info values are the same just isn't always enough. I'm going to add migrations and database state to my checklist of things to review before deployments or when making changes.
  3. Rails migrations, while very nice, are also something of a siren song, luring one into complacency about the database. "When they are good they are very, very good, and when they are bad they are horrid." After a long period of no problems, I had assumed (mistake!) that they were under control: in a way, I thought of them as if they were "source code control" for my database. They are not, and I'm going to pay more attention to that in the future.
  4. Rails upgrades are in the same category: they can be remarkably easy, such as my switch from 2.3.2 to 2.3.3 or they can be remarkably messy, as has been my switch from 1.3.6 to 2.3.3. On the one hand, I want to stick with a known version: there are just too many changes that I view as frivolous in the Rails code base to want to go through all the deprecations and just plan different stuff unless it's really necesary. At the same time, we do need to stay reasonably up-to-date to take advantage of the definite improvements such as better plugins and greater security. This is an individual decision, based on your particular requirements. Regardless, YMMV. 
So... one more bug snipped out of my app and that much closer to deployment. Some cleanup and more testing and it should be good to go.

No comments:

Post a Comment