Wednesday, July 29, 2009

Perlish code part deux

Wow. Lots of good comments on my last post. (well, lots of comments by this novice's standards, at any rate.) And, while I'm pretty sure some were tongue-in-cheek, it's obvious that my rather sparse discussion of conciseness was a bit too concise. So, maybe a longer example is in order.

Most programmers (perlers or not) will easily understand the following:
my @data = (
    { id         => 1,
      first_name => 'Barney',
      last_name  => 'Rubble',
      age        => 33,
      gender     => 'male',
    },
    # more hashrefs of the same shape
);

# Create a string of "first last, first last, ...", sorted by name, for just males
my $names = join ", ", map {
    "$_->{first_name} $_->{last_name}"
} sort {
    $a->{last_name} cmp $b->{last_name}
        || $a->{first_name} cmp $b->{first_name}
} grep {
    $_->{gender} eq 'male'
} @data;

I'm not even going to attempt to show how to do that in C. Needless to say, I'm betting it's going to be significantly longer than 8 lines. In fact, I'm betting it'll be closer to 10x that. The Java version is likely to be 5x that, and likely more if you add in building and maintaining the initial data structure.

That is concise Perl. It takes advantage of the native high-level data structures (hashes and arrays) and the operators built for them (sort, map, and grep) to create very easy-to-understand complex code. And, we're not even touching string manipulation, supposedly Perl's greatest strength.
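If the right-to-left chain reads oddly at first, the same pipeline can be unrolled into named stages. Here is a minimal, runnable sketch (the sample records are made up for illustration):

```perl
use strict;
use warnings;

my @data = (
    { first_name => 'Barney', last_name => 'Rubble',     gender => 'male'   },
    { first_name => 'Fred',   last_name => 'Flintstone', gender => 'male'   },
    { first_name => 'Wilma',  last_name => 'Flintstone', gender => 'female' },
);

# Same pipeline as above, one named stage per step
my @males  = grep { $_->{gender} eq 'male' } @data;
my @sorted = sort {
    $a->{last_name} cmp $b->{last_name}
        || $a->{first_name} cmp $b->{first_name}
} @males;
my $names  = join ", ", map { "$_->{first_name} $_->{last_name}" } @sorted;

print "$names\n";    # Fred Flintstone, Barney Rubble
```

Chained or unrolled, it's the same three operators; the chain just avoids naming the intermediate lists.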

Tuesday, July 28, 2009

What makes code Perlish?

There is a lot of discussion in lots of places about "perlish" code. What perlish code is. What it isn't. But, not a lot about how you can tell the difference.

The first metric to look for is conciseness. How many lines of code does it take to express a concept? For example, swapping two variables. In most languages, you have to use an explicit temp variable, such as in the following C code:

temp = i;
i = j;
j = temp;

That's three lines to express a single concept. Most imperative languages require the same thing. Perl (and similar languages like Python and Ruby) does it in one.

( $i, $j ) = ( $j, $i );

Saving two lines doesn't sound like much. But, think about it as a 3-to-1 reduction. That means 300 lines is now 100. Ohhhh.
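And the list-assignment idiom isn't limited to a single swap; rotating three variables is still one line:

```perl
use strict;
use warnings;

my ( $i, $j, $k ) = ( 1, 2, 3 );

# Rotate: each variable takes its right-hand neighbour's value
( $i, $j, $k ) = ( $j, $k, $i );

print "$i $j $k\n";    # 2 3 1
```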

Monday, July 20, 2009

Converting CDBI to DBIC (part 6): Phase 1

In part 5 of this series, the plan for how we're going to do this conversion was laid out. Now, for some actual working code.

The plan for our migration is going to leverage DBIx::Class::ResultSource's result_class attribute. First, some explanation. Unlike every other ORM I know about, DBIC decouples the operations on a group of rows from inflating those rows. This is the whole resultset thing. So, it only makes sense that you would be able to specify how you want to actually go about inflating the rows returned from a search. And DBIC does exactly that.

So, we have a set of CDBI classes. Let's work with one of them called App::CDBI::Foo. In order to make this work, we're going to want to have a corresponding DBIx::Class::ResultSource object. That resultsource object will be registered with our schema object (q.v. DBIC for more info) as handling stuff for the foo table that App::CDBI::Foo used to manage.

In order to get everything to work, we're going to need to tell that resultsource everything that the CDBI class knows. We're also going to have to inject an inflate_result() method into the CDBI class.

my $source = DBIx::Class::ResultSource::Table->new({});

$source->add_columns( $cdbi->columns );
$source->set_primary_key( $cdbi->primary_columns );

{
    no strict 'refs';
    *{ $cdbi . '::inflate_result' } = sub {
        my $self = shift;
        my ($source, $data, $prefetch) = @_;

        return $cdbi->construct( $data );
    };
}

$source->result_class( $cdbi );

$schema->register_source( $tablename => $source );

And, that's the basic structure. Unfortunately, there are issues. Some of which I'll deal with in later posts, some of which I can't (because they're your problems).
  1. One of the biggest reasons to migrate to DBIC is prefetch. You'll notice that our inflate_result doesn't actually do anything with $prefetch. I'll post a better one later.
  2. add_columns() can take a lot more information than CDBI ever stored. You'll want to populate that information somehow.
  3. $tablename and other variables appeared out of mid-air. You'll want to fix that. :)
  4. This code doesn't actually do anything for things like unique constraints, defining relationships, and the like. You'll want to fix that, too. :)
  5. Unless your code is amazingly clean, you probably have snippets like $obj->search(...) scattered around. You may need an AUTOLOAD to catch those (for now).
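For item 5, the AUTOLOAD idea can be seen in a standalone sketch. The class and method names below are invented for illustration, and instead of forwarding to a resultset (which a real shim would do), this version just records what legacy code still calls:

```perl
use strict;
use warnings;

package Legacy::Row;

our $AUTOLOAD;
our @caught;    # remembers which legacy methods were called

sub new { bless {}, shift }

sub AUTOLOAD {
    my $self = shift;
    ( my $method = $AUTOLOAD ) =~ s/.*:://;
    return if $method eq 'DESTROY';    # never trap object destruction
    push @caught, $method;
    # A real shim would forward here, something like:
    # $schema->resultset(...)->$method(@_)
    return;
}

package main;

my $row = Legacy::Row->new;
$row->search( gender => 'male' );          # old CDBI-style call, caught above
print "caught: @Legacy::Row::caught\n";    # caught: search
```

Logging what gets caught first, before forwarding anything, also gives you a hit list of the call sites to clean up later.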
And, later on, I'll show you how to migrate your actual objects to DBIx::Class::Row objects from CDBI.

Wednesday, July 15, 2009

Converting CDBI to DBIC (part 5): The plan- requirements

So, in part 4 of this series, I discussed why CDBICompat just wasn't going to cut it. What I didn't explain in great detail is just why CDBICompat needs to use tied variables (thus causing a nasty slowdown). It goes something like this:
  1. CDBI has pretty poor searching capabilities
  2. CDBI doesn't have prefetch
  3. CDBI doesn't cache very smartly
So, most heavy users of CDBI tend to write their own caching mechanisms. Given that CDBI is a row-centric ORM, these caches are almost always in the row. Given that most of these developers are smart, but under serious time constraints, these caches break encapsulation. So, something like

my @rows = CDBI::Class->search( ... );
foreach my $row ( @rows ) {
    $row->{_cache} = $row->expensive_method();
}
is very normal to see. And very expensive to convert away from. Any changeset that converts over every single one of these encapsulation breakages is going to be too huge to test with any confidence. As the applications we're looking at are large (> 100kLOC) and big moneymakers (often $M's per year), having confidence in the next push to production is key.

So, the conversion plan has to meet the following requirements:
  • allows us to use DBIC's big features - resultsets, prefetch, and SQLA searching.
  • allows us to phase our conversion so that we don't have massive changesets which are impossible to test.
  • doesn't impose any noticeable slowdowns, at least not noticeable by the users
With any other distribution, that would be a tall order. DBIx::Class, however, already has the single feature we need to make this happen. More on this in part 6.

For those who can't wait, I'll give you a hint. Go look at


Monday, July 13, 2009

Converting CDBI to DBIC (part 4): Why not CDBICompat?

This isn't the first time someone has tried to convert from one ORM to another. In fact, DBIC grew out of working with CDBI and needing something better. In the past, ORMs have usually been similar enough in their inner workings that a compatibility layer was enough. The userland could be migrated at some later date, if ever. The cost of the indirection would be nearly nothing.

DBIC, though, is different. So different that a compatibility layer obscures the whole purpose of converting from CDBI to DBIC. The very concept of resultsets breaks the mold. Like, shatters it, stomps all over it, and melts it in a vat of molten iron. The whole point of switching to DBIC is to get access to resultsets. CDBI's API has absolutely no facility to provide access to these features.

Second, you're stuck with CDBI's searching facilities. SQL::Abstract is extremely powerful. So powerful that several CDBI plugins were created to mimic its power. SQL::Abstract v2 will be even more powerful yet. But, CDBICompat cannot expose any of it.

And, worse yet, in order to accommodate the numerous abuses of CDBI that most people did in order to make it usable in larger applications, the compat layer ends up being about 20% slower than either raw DBIC or raw CDBI. In fact, the row object provided by CDBICompat is tied just to make sure things work.

It's the worst of both worlds - you're stuck with CDBI's API (the bad thing you're trying to escape) and a slowdown of your application's code. Given that you're usually trying to speed up your application and improve the ease of writing new features, this doesn't sound like a win.

Next up - the actual solution.

Tuesday, July 7, 2009

Performance vs. Correctness

This would seem to be a truism, but it bears repeating. Bertrand Meyer once said "Correctness is clearly the prime quality. If a system does not do what it is supposed to do, then everything else about it matters little." A lot of people seem to forget that if you get the wrong answer very quickly, it's still the wrong answer. And, a lot of those people are in the OSS and Perl communities. A corollary to that could be "Try to do it right. If you can't guarantee the right answer, then fail very quickly and very loudly."

Why bring this up now? Well, in Perl, performance concerns seem to crop up every so often, and in places where it makes no sense to bring them up. For example, when dealing with crazy date manipulations. DateTime is the de-facto standard for date manipulations in Perl, and for good reason. It is correct and, in most cases, fast. But, its interface is very low level. There is another module that provides different interfaces to dates - Date::Manip. But, even the author acknowledges in the POD that "It's the most powerful of the date modules. It's also the biggest and slowest." And, whenever it's brought up, inevitably someone will say "Yeah, it's pretty heavy."

My answer to that is "So what?" The very first criterion for a computer program is "Does it work?" And, by that I mean "Is it correct?" Once it's correct (and you have a regression suite to validate that), then you measure its performance. If its overall performance is acceptable, then you leave it alone.

Let me repeat - if overall performance is acceptable, then you leave it alone.

If overall performance needs to be improved, then you profile it. Chances are, the module you think is heavy isn't your bottleneck. Even if it is, it's very likely that a small change to one method will net you 80% (or more!) of your potential speed improvements.
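Once profiling has pointed at a real hot spot, measure candidate fixes in isolation before committing to one. Benchmark ships with core Perl; the two subs below are stand-ins for a "before" and "after" implementation, not anything from a real codebase:

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my @nums = 1 .. 1000;

# Two ways to build one string from a list. Benchmark only the
# code the profiler actually flagged, never your hunches.
my $results = timethese( 5_000, {
    append_in_loop => sub { my $s = ''; $s .= $_ for @nums; },
    join_once      => sub { my $s = join '', @nums; },
} );
```

timethese() prints a timing line per entry and returns the raw Benchmark objects in case you want to compare them programmatically.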

Pick modules based on features. Then, if you need it, optimize after profiling. And, module authors, don't refuse to fix a bug because it would hurt performance. If your module is wrong, then it needs to be fixed.