Parsing XML is expensive

Using tests to identify performance issues

Posted by John Thomas 3-Aug-2016

These days, testing your code is easy. Tools like RSpec are mature, leaving you little excuse for not having good test coverage. (And don't forget that TDD is sexy...) Most people find that tests improve the design of their code, and that tests give nearly instantaneous visibility and feedback. Today I found another great benefit of unit tests: identifying performance bottlenecks.

It's easy to get lost in building new features and meeting deadlines while forgetting to benchmark your code. I was running my RSpec test suite the other day and noticed some rather sluggish behavior. The suite was composed entirely of unit tests; there were no integration tests slowing things down. Annoyed with how long my tests were taking, I dug deeper into which ones were eating up the time. I noticed a couple of things:

1) My tests were not written well :).

I had a lot of tests that were there solely to validate certain data transformations, all of which were performed by a single method. My tests looked like this:

context '#transform_data' do

  before do
    @my_data = MyDataObject.new
    @my_data.transform_data
  end

  it 'sets key1' do
    expect(@my_data["key1"]).to_not eq(nil)
  end

  ...

  it 'sets key2' do
    expect(@my_data["key2"]).to_not eq(nil)
  end


end

Since each of my 'it' blocks only validated the result and performed no transformation of its own, I could have used a before(:all) block to call the #transform_data() method only once. That would have made my test suite run much faster, but at the time I didn't care, as my tests were "DRY".

2) The #transform_data() function was slow.

The fact that I was running #transform_data() before every assertion is actually what allowed me to recognize how slow that method was. Looking into it, I could see that it was basically just parsing XML and then transforming the data into a modified structure. Although I thought I knew the culprit, I ran some profilers to see what was taking so long. Sure enough, it was the XML parser. Up to this point I had been parsing XML using the standard Hash.from_xml() method. I was able to replace it with a much more efficient XML-to-Hash parser (XmlHasher). After I made the change, the method was about 3 times faster, my test suite ran much faster, and I was a happy coder again.
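A side-by-side timing like that can be sketched in a few lines. This is only an illustration under stated assumptions: time_parsers is a hypothetical helper, the XML string is a made-up stand-in for the real payload, and the two lambdas are placeholders so the sketch runs without extra gems. The comments show where the real parser calls would go.

```ruby
# Hypothetical helper: runs each callable over the same input and returns
# a hash of label => total elapsed seconds.
def time_parsers(xml, parsers, runs: 20)
  parsers.transform_values do |parser|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    runs.times { parser.call(xml) }
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Made-up input; a real comparison would use the actual large XML string.
xml = '<root>' + ('<item><name>n</name></item>' * 1_000) + '</root>'

parsers = {
  # Placeholder callables. With the gems installed, swap in the real parsers:
  #   'Hash.from_xml' => ->(s) { Hash.from_xml(s) },   # ActiveSupport
  #   'XmlHasher'     => ->(s) { XmlHasher.parse(s) }, # xmlhasher gem (assumed entry point)
  'scan tags'  => ->(s) { s.scan(/<(\w+)>/).flatten },
  'count tags' => ->(s) { s.count('<') }
}

timings = time_parsers(xml, parsers)
timings.each { |label, seconds| puts format('%-12s %.4fs', label, seconds) }
```

Repeating each call several times and using the monotonic clock keeps the numbers from being dominated by one-off noise, though a real profiler is still the better tool for finding where the time goes inside a method.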

The interesting part about this is that the Hash.from_xml() method only took a couple hundred milliseconds to complete (and this was a rather large XML string). There was almost no way I would have noticed a 100ms slowdown without running a profiler on my code (something I probably should have done). It was actually thanks to my inefficient tests that I was able to notice this potential performance issue.