Friday, August 28, 2009

Snipping the blight... fantastic!

I've been fighting an elusive bug for the past few weeks: I've developed some major enhancements for my passwords site plus upgrading to more recent versions of Rails and associated plugins (improved security and testability being my primary goal).  Things were going along fine until I deployed my app to production: I was able to login without errors, but was then immediately returned to the login screen. Needless to say, the Capistrano deploy:rollback task was just what the doctor ordered (that's why I love Cap: it took a bit to get it setup originally, but it really makes deploying or undeploying applications a snap - if you have an app in production and you don't have Capistrano or something similar, do it now).

Anyway, I setup up a production environment on my development machine and, sure enough, it failed in production mode but ran fine in development mode. I modified the production settings so that I would get error messages [Note to self: put it back like it was!] and got a message saying that a module in the filter chain had returned false... it looked as if it were failing authentication somewhere in the login system.

I started digging into the code and using the debugger to see what was going on: from the symptoms and the error message, it seemed to be a problem with session state. Checking with my hosting tech support and local Rails folks, I learned there were reports of problems with sessions in Rails 2.3.2 and one of the local chaps, Brendon Whateley, reported some serious session issues that he had and chased the code back to Rails initialization and lots of 'magic' aliasing and overriding of ActiveRecord methods (nice stuff when it works). To quote:

"Having spent the best part of a week debugging the sudden failure of sessions recently, I may be able to help.  The first thing to figure out is if the session is saving state and not loading, or loading but not saving.  In my case the ActiveRecordStore was not serializing the session data, but was loading and saving the session object.  This meant that previously set session values still existed, but any changes did not persist across calls.  I was able to validate that the logins were successful, but suffered similar issues to you.

"In the end I tracked the problem down to overriding the ActiveRecord::Base.create_or_update method.  For some reason it completely broke ActiveSessionStore serialization, even if the redefined method contained the exact same code as the original.  I never completely figured out WHY that broke, but if I moved my changes to create and update methods that create_or_update called, I no longer experienced that problem."


Made sense to me, and since I am using some of the ActiveRecord callbacks, it seemed quite reasonable. So off I went, digging thru my code, researching how the callbacks were used, trying to find out if anyone else was using them, more debugging... all of which went nowhere.

Yesterday, I bit the bullet and upgraded to Rails 2.3.3. Very easy; just do
  gem install rails 
and try it again in production mode.

You can imagine my surprise when all of a sudden, the app crashed and emitted the following error messge:

ActiveRecord::StatementInvalid (Mysql::Error: Unknown column 
'assets.type_id' in 'on clause': SELECT count(DISTINCT `assets`.id) AS 
count_all FROM `assets`  LEFT OUTER JOIN `types` ON `types`.id = 
`assets`.type_id WHERE ((user_id = 1 and ( entry like '%' or entry 
like '%' ))) ):
  -e:2:in `load'
  -e:2
I had never received any such error message in Rails 2.3.2 (obviously some kind of bug there which caused this to appear as if it were a filter chain termination), but this made it clear what had happened. In the process of upgrading from Rails 1.x to 2.x, I had experimented with using some database constraint logic. It had been a tad problematic and I wanted to focus on the Rails upgrade so I removed those migrations. It was probably at that point that the development database and the production database got out of sync (I haven't yet gone back to my subversion repository to see exactly what/when it happened, but I may do that yet out of curiosity).  I had added a new "type_id" field that was required to support the new version of ActiveScaffold and it had never been added to the production database, even though the schemo_info table contained the same information in both databases (I know, because I checked before I started the whole deploy process).

So what did I learn from this?
  1. Even though there were several other folks looking at this, we all missed the obvious: development and production mode are different not only in the code base but also in the database. I  suffered from a perceptual set when identifying potential sources of the problem (and Paired Programming isn't always a silver bullet). In the future, I'm going to be a little more diligent when exploring potential solution spaces.
  2. Rails migrations, while very nice, are also fragile. I've had problems with them in the past: simply checking that the schema_info values are the same just isn't always enough. I'm going to add migrations and database state to my checklist of things to review before deployments or when making changes.
  3. Rails migrations, while very nice, are also something of a siren song, luring one into complacency about the database. "When they are good they are very, very good, and when they are bad they are horrid." After a long period of no problems, I had assumed (mistake!) that they were under control: in a way, I thought of them as if they were "source code control" for my database. They are not, and I'm going to pay more attention to that in the future.
  4. Rails upgrades are in the same category: they can be remarkably easy, such as my switch from 2.3.2 to 2.3.3 or they can be remarkably messy, as has been my switch from 1.3.6 to 2.3.3. On the one hand, I want to stick with a known version: there are just too many changes that I view as frivolous in the Rails code base to want to go through all the deprecations and just plan different stuff unless it's really necesary. At the same time, we do need to stay reasonably up-to-date to take advantage of the definite improvements such as better plugins and greater security. This is an individual decision, based on your particular requirements. Regardless, YMMV. 
So... one more bug snipped out of my app and that much closer to deployment. Some cleanup and more testing and it should be good to go.

Wednesday, August 26, 2009

Learning RSpec/Cucumber - Day 1

I've wanted to improve my testing for some time now and so I'm pairing on the RSpec book with my friend, Allan Miller. We're going to work through the book/examples in tandem and get together regularly to review what we've learned. We just downloaded the RSpec book by Pragmatic and assigned Chapters 1-3 (this is the book that was highly recommended by our Ruby Meetup folks). We'll work through this and then meet in a couple of days to see how things went and move on to the next group of chapters.

For now, I'm running all this on my MacBook, without any IDE... just straight terminal access and MacVIM. Chapters 1 & 2 are introduction; chapter 3 is where you start actually doing stuff.

First problem (minor) that I ran into was that the rspec installation wasn't clean; I got the following errors:


Successfully installed rspec-1.2.8

1 gem installed

Installing ri documentation for rspec-1.2.8...

Installing RDoc documentation for rspec-1.2.8...

Could not find main page README.rdoc

Could not find main page README.rdoc

Could not find main page README.rdoc

Could not find main page README.rdoc


So far, this doesn't seem to have affected anything -- I just ignored it and moved on (so far, I don't use the ri or RDoc information at all).

The cucumber installation was a little more disconcerting:

Installing ri documentation for builder-2.1.2...

ERROR: While generating documentation for builder-2.1.2

... MESSAGE: Unhandled special: Special: type=17, text=""

... RDOC args: --ri --op /Library/Ruby/Gems/1.8/doc/builder-2.1.2/ri --title Builder -- Easy XML Building --main README --line-numbers --quiet lib CHANGES Rakefile README doc/releases/builder-1.2.4.rdoc doc/releases/builder-2.0.0.rdoc doc/releases/builder-2.1.1.rdoc --title builder-2.1.2 Documentation

(continuing with the rest of the installation)


but again, it was only documentation so I moved on from here as well (have to pick your battles, you know).

Everything went just fine, except for a couple of glitches: one was just some missing default snippets and the other was a result of those missing snippets. Here's what happened:
  1. The first attempt at cucumber did NOT display the snippets for the 3 undefined steps (pp 27-28 in my beta copy of the book) as expected. Rather than being able to copy and modify them I had to type them in by hand. [I tried the command twice just to make sure.. sure enough, didn’t display, although a later point where I had only one step undefined, it DID display the snippet template. Go figure.]
  2. Because of #1, I made a transcription error in the regexp for step 3: I omitted the ‘^’ exclusion character. This cost me several minutes (regexp’s are like that!). I have a feeling that there may be more of these... hopefully I’ll be able to get the step snippets more reliably in the future and then just copy them into my step definitions as a starting point.
All in all, a successful beginning... But I do have a concern -- given that we're using regexp's, there are MANY opportunities for small errors that can cost time. Seems like a bit of a contradiction: cucumber is a "higher level" testing language than rspec, but it uses one of the lowest level syntaxes to accomplish this.

I'll just have to wait and see what this leads to... and I'm sure going to copy/paste those snippets whenever I can.

Pair programming really works!

I've known this in the past and I had another "Aha!" moment the other day: pair programming (or debugging in this case) is great!

I've had a problem for the past week or so in one of my applications. I'm in the middle of a big upgrade (from Rails 1.3.6 to Rails 2.3.2 and lots of other associated changes), when I hit a hard stop; my screen wasn't getting updated correctly after an Ajax call.

I struggled with this for quite some time and then today called Allan Miller, a good friend of mine over to help me debug it. He sat down, I described he problem and demonstrated it. He said "Well, let's take a look and make sure that you're actually getting an XHR request." so we fired up FireBug and entered a couple of trials: the POST request was properly executed and my application did see it as an XHR request. Everything's working as exepcted so far, but no errors and no screen refresh.

Since he had been focusing on the Ajax call, I decided to show him the code on the page, and used FireBug's feature to "click an element to inspect" and pointed out the code [line breaks inserted for readability; this is all one line of code]:
<a class="ltr-locate" onclick="new Ajax.Updater('asset-content', '/asset/get_assets?ltr=F',
{asynchronous:false, evalScripts:true, parameters:'authenticity_token=
' + encodeURIComponent('xxx')}); return false;" href="/asset/get_assets" align="right">F</a>

"See" I said. "There's the AjaxUpdater call to update the... ... that's it!" The first field in the AjaxUpdater call signifies the named area on the page that will be updated as a result of the call. In this case 'asset-content'. "I'll bet that they changed the name in the intervening releases" I said.

Sure enough, that was the case: the name name is 'as_asset-content' and updating the link_to_remote call solved the problem. What a relief!

I've got 3 take-aways from this:
  1. Two heads are at least 10 times better than 1, if you compare 50 total minutes of work (25 minutes times 2 heads) with at least 10 hours of work with my one head. There's just something magical about having to REALLY explain your code to an outside observer that makes you think thru things much more carefully. It's making the thinking process external, rather than internal where your hidden assumptions come into play without your even knowing it.
  2. The AjaxUpdater call gives NO ERROR if the named area doesn't exist; it just quietly closes up shop, assuming that you knew what you were doing.
  3. I was reminded how great FireBug is for this kind of thing. Quick and easy to see that the Ajax call was firing with the right information so that we could move on to the next question.
Extrapolating that out, I would imagine that -- had there beeen two folks in the seat from the beginning -- this whole thing wouldn't have taken so long... and the code would probably be better for it.

Reminds me of the old "Double your pleasure..." chewing gum ad, only in this case it's really true.

Wednesday, August 19, 2009

Interesting svn "gotcha"... II

Turns out, there's any easy fix for this situation (see Interesting svn 'gotcha'). All you have to do is:
  1. Checkout your entire project into a different directory using your command line client.
  2. Run the script/plugin install... command on that working copy.
  3. Checkin the updates you just made.
  4. Update the IDE working copy to the latest from repository.
Easy; problem solved...

Monday, August 17, 2009

Interesting svn "gotcha"...

I just learned that you have to be careful about which version of subversion you are using if you have an IDE and also want to use the command-line client.

Starting about V1.4, there were changes in the internal structure of the subversion directory hierarchy (i.e., the .svn files that it creates to track things) which means that you have to make sure that your versions are in synch.

For example, I'm using Aptana Studio, which is eclipse-based, and which uses the latest subclipse client (subversion for Eclipse). That latest version is V1.6. However, on my Kubuntu development box (V8.04/Hardy Heron), the latest subversion package right now is V1.4. If I try to run a subversion command from the shell, I get this error message:
jseidel@EDP15:~/AptanaStudio/4MyPasswords$ svn status
svn: This client is too old to work with working copy '.'. You need
to get a newer Subversion client, or to downgrade this working copy.
See http://subversion.tigris.org/faq.html#working-copy-format-change
for details.

The help link above will explain that -- due to new features in the later versions -- they had to introduce new structure which is incompatible with the older clients.

Since I don't want to downgrade my working copy, I went looking for V1.6 command-line clients. Sad to say, they aren't packaged (yet) for Kubuntu/Hardy --so far I only found it for the Karmic version of Ubuntu.

There are other svn clients out there, either as wrappers around the command line client, or full-on clients. One that I looked at was kdesvn. This looked to be a nice client and was at V1.6. Unfortunately, the developer only provides a Fedora package, so I would have to build my own from source.

I went to the subversion site, but the referenced Ubuntu binaries only pointed back to the Ubuntu site and the older version, so I tried building it from source. Bad move! As stated on their site, there are a number of dependencies which I started to work through one at a time. When I reached a point of having to figure this out:
jseidel@EDP15:~/downloads/subversion-1.6.4/subversion-1.6.4$ apr/buildconf
buildcheck: checking installation...

buildcheck: autoconf version 2.61 (ok)

buildcheck: autoheader version 2.61 (ok)

buildcheck: libtool version 1.5.26 (ok)

buildcheck: local copy of find_apr.m4 does not match APR's copy.

An updated copy of find_apr.m4 may need to be checked in.
buildcheck: local copy of PrintPath does not match APR's copy.

An updated copy of PrintPath may need to be checked in.
Copying libtool helper files ...

buildconf: Using libtool.m4 at /usr/share/aclocal/libtool.m4.

Creating include/arch/unix/apr_private.h.in ...

Creating configure ...

configure.ac:185: error: possibly undefined macro: AC_PROG_LIBTOOL

If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
Generating 'make' outputs ...

apr/buildconf: 91: build/gen-build.py: not found rebuilding rpm spec file

apr/buildconf: 107: build/get-version.sh: not found

cat: ./build/rpm/apr.spec.in: No such file or directory


I quit trying: I want to work on my application, not have to learn THIS much about the internals of the tool I want to use... Argh! For now, I'm sticking with the Aptana support which is working (with some help from the Aptana support folks), although I do miss the ability to do quick svn commands from the shell.

I think Linux needs to come up with a better installation mechanism...