Monday, August 17, 2009

Falling off the Ironman wagon

*sighs* I've fallen off the wagon. I knew I should have posted last week, but I didn't. Blogging is much harder than I thought. If you know me at all, you know that I have at least one opinion about nearly every topic in the world. I love discussing pretty much every topic, particularly if there are more than two options and it's impossible to determine which is better in any given situation. Those kinds of indeterminate discussions are absolutely fun.

Yet, here we are - I have people actually reading this blog and I can't manage to write something every 8 days. I think it's because people aren't telling me how wrong I am on a regular basis. :)

Thursday, August 6, 2009

The first "usable" release of P6

Patrick Michaud just discussed the idea of releasing Rakudo, the Perl 6 implementation bootstrapped in Perl 6 and targeting Parrot. He makes a very interesting point: language development doesn't aim at a static target. Instead, it's very evolutionary. And, so, his goal is to have a "usable" and "useful" release of Perl 6 by Spring 2010 (April-ish).

So, what does this mean? Well, in my mind, it means a few things.

The first is that we, as a community, have been thinking we're going to get the most amazing thing in the world from the get-go. And, in retrospect, it's rather obvious that this just isn't reasonable. The whole point of agile development (and, believe me, P6 development is nothing if not agile!) is that you "release early, release often".

The second is that, frankly, a well-supported feature-light Perl 6 now is much better than a well-supported feature-complete Perl 6 released right before Duke Nukem Forever. I want to have some frozen API in my grubby hands before I spend my evenings away from my family hacking on CP6AN stuff.

And, lastly, we need to remember our history. Perl had been around for seven years before objects and sane packaging were supported at all. Threads? Unicode? Both became stable with 5.8.0 - that's barely seven years ago. Moose is barely three years old. DBIx::Class and Catalyst aren't much older. And, frankly, we drove the development of every one of those features. As a dev, I want users to tell me what they want to use. The P6 devs aren't any different.

So, instead of griping that a feature-light P6 will be available, start thinking about how you can (ab)use those P6 features that will be available (and we should have that list in the next month or two). Maybe we should start voting for our favorite features and see what we can do to help get them into the first production-ready Rakudo release. Go Team!

Wednesday, July 29, 2009

Perlish code part deux

Wow. Lots of good comments on my last post. (well, lots of comments by this novice's standards, at any rate.) And, while I'm pretty sure some were tongue-in-cheek, it's obvious that my rather sparse discussion of conciseness was a bit too concise. So, maybe a longer example is in order.

Most programmers (perlers or not) will easily understand the following:
my @data = (
    {
        id         => 1,
        first_name => 'Barney',
        last_name  => 'Rubble',
        age        => 33,
        gender     => 'male',
    },
    # more of the same
);

# Create a string of "first last, first last, ...", sorted by name, for just males
my $names = join ", ", map {
    "$_->{first_name} $_->{last_name}"
} sort {
    $a->{last_name} cmp $b->{last_name}
        || $a->{first_name} cmp $b->{first_name}
} grep {
    $_->{gender} eq 'male'
} @data;

I'm not even going to attempt to show how to do that in C. Needless to say, I'm betting it would be significantly longer than 8 lines. In fact, I'm betting it'll be closer to 10x that. The Java version is likely to be 5x that, and likely more if you add in the building and maintenance of the initial data structure.

That is concise Perl. It takes advantage of the native high-level data structures (hashes and arrays) and the operators built for them (sort, map, and grep) to create very easy-to-understand complex code. And, we're not even touching string manipulation, supposedly Perl's greatest strength.

Tuesday, July 28, 2009

What makes code Perlish?

There is a lot of discussion in lots of places about "perlish" code. What perlish code is. What it isn't. But, not a lot about how you can tell the difference.

The first metric to look for is conciseness. How many lines of code does it take to express a concept? For example, swapping two variables. In most languages, you have to use an explicit temp variable, such as in the following C code:

temp = i;
i = j;
j = temp;

That's three lines to express a single concept. Most imperative languages require the same thing. Perl (and similar languages like Python and Ruby) does it in one.

( $i, $j ) = ( $j, $i );

Saving two lines doesn't sound like much. But, think about it as a 3-to-1 reduction. That means 300 lines is now 100. Ohhhh.

Monday, July 20, 2009

Converting CDBI to DBIC (part 6): Phase 1

In part 5 of this series, the plan for how we're going to do this conversion was laid out. Now, for some actual working code.

The plan for our migration is going to leverage DBIx::Class::ResultSource's result_class attribute. First, some explanation. Unlike every other ORM I know about, DBIC decouples the operations on a group of rows from inflating those rows. This is the whole resultset thing. So, it only makes sense that you would be able to specify how you want to actually go about inflating the rows returned from a search. And DBIC does exactly that.

So, we have a set of CDBI classes. Let's work with one of them called App::CDBI::Foo. In order to make this work, we're going to want to have a corresponding DBIx::Class::ResultSource object. That resultsource object will be registered with our schema object (see the DBIC docs for more info) as handling stuff for the foo table that App::CDBI::Foo used to manage.

In order to get everything to work, we're going to need to tell that resultsource everything that the CDBI class knows. We're also going to have to inject an inflate_result() method into the CDBI class.

# Build a fresh resultsource for the table our CDBI class manages.
# ($tablename is one of the "thin air" variables from issue 3 below.)
my $source = DBIx::Class::ResultSource::Table->new({ name => $tablename });

# Teach the resultsource everything the CDBI class already knows.
$source->add_columns( $cdbi->columns );
$source->set_primary_key( $cdbi->primary_columns );

# Inject an inflate_result() into the CDBI class so DBIC hands rows
# back as CDBI objects. (no strict 'refs' is required to assign
# through a symbolic glob under use strict.)
{
    no strict 'refs';
    *{ $cdbi . '::inflate_result' } = sub {
        my $class = shift;    # the result class name
        my ($source, $data, $prefetch) = @_;

        # $prefetch is ignored for now - see issue 1 below.
        return $cdbi->construct( $data );
    };
}

$source->result_class( $cdbi );

$schema->register_source( $tablename => $source );
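
To see it working, here's a quick check (the column name is invented) that a search through the DBIC schema now hands back CDBI objects:

# Sanity check (hypothetical column): rows found via DBIC come back
# as App::CDBI::Foo objects, built by the injected inflate_result().
my @foos = $schema->resultset( $tablename )
                  ->search( { first_name => 'Barney' } )
                  ->all;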

And, that's the basic structure. Unfortunately, there are issues. Some of which I'll deal with in later posts, some of which I can't (because they're your problems).
  1. One of the biggest reasons to migrate to DBIC is prefetch. You'll notice that our inflate_result doesn't actually do anything with $prefetch. I'll post a better one later.
  2. add_columns() can take a lot more information than CDBI ever stored. You'll want to populate that information somehow.
  3. $tablename and other variables appeared out of thin air. You'll want to fix that. :)
  4. This code doesn't actually do anything for things like unique constraints, defining relationships, and the like. You'll want to fix that, too. :)
  5. Unless your code is amazingly clean, you probably have snippets like $obj->search(...) scattered around. You may need an AUTOLOAD to catch those (for now) - see the sketch below.
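
Here's a rough sketch of such an AUTOLOAD - purely illustrative, and it assumes the $cdbi, $schema, and $tablename variables from the snippet above:

# Rough sketch: forward leftover CDBI-style calls (like
# $obj->search(...)) to the corresponding DBIC resultset.
{
    no strict 'refs';
    *{ $cdbi . '::AUTOLOAD' } = sub {
        my $invocant = shift;

        # Perl puts the called name in the class's $AUTOLOAD variable.
        ( my $method = ${ $cdbi . '::AUTOLOAD' } ) =~ s/.*:://;
        return if $method eq 'DESTROY';

        my $rs = $schema->resultset( $tablename );
        return $rs->$method( @_ ) if $rs->can( $method );

        require Carp;
        Carp::croak( "Can't locate method $method via $cdbi" );
    };
}
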
And, later on, I'll show you how to migrate your actual objects to DBIx::Class::Row objects from CDBI.

Wednesday, July 15, 2009

Converting CDBI to DBIC (part 5): The plan - requirements

So, in part 4 of this series, I discussed why CDBICompat just wasn't going to cut it. What I didn't explain in great detail is just why CDBICompat needs to use tied variables (thus causing a nasty slowdown). It goes something like this:
  1. CDBI has pretty poor searching capabilities
  2. CDBI doesn't have prefetch
  3. CDBI doesn't cache very smartly
So, most heavy users of CDBI tend to write their own caching mechanisms. Given that CDBI is a row-centric ORM, these caches are almost always in the row. Given that most of these developers are smart, but under serious time constraints, these caches break encapsulation. So, something like

my @rows = CDBI::Class->search( ... );
foreach my $row ( @rows ) {
    $row->{_cache} = $row->expensive_method();
}

is very normal to see. And very expensive to convert away from. Any changeset that converts over every single one of these encapsulation breakages is going to be too huge to test with any confidence. As the applications we're looking at are large (> 100kLOC) and big moneymakers (often $M's per year), having confidence in the next push to production is key.

So, the conversion plan has to meet the following requirements:
  • allows us to use DBIC's big features - resultsets, prefetch, and SQLA searching.
  • allows us to phase our conversion so that we don't have massive changesets which are impossible to test.
  • doesn't impose any slowdowns, at least none noticeable by the users
With any other distribution, that would be a tall order. DBIx::Class, however, already has the single feature we need to make this happen. More on this in part 6.

For those who can't wait, I'll give you a hint. Go look at DBIx::Class::ResultClass::HashRefInflator.
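
And, to save you a trip to the docs, here's how it's used (the resultset name is made up):

# Documented HashRefInflator usage - rows come back as plain hashrefs
# instead of row objects, skipping the inflation cost entirely.
my $rs = $schema->resultset('Foo');
$rs->result_class('DBIx::Class::ResultClass::HashRefInflator');

my @hashrefs = $rs->all;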

Monday, July 13, 2009

Converting CDBI to DBIC (part 4): Why not CDBICompat?

This isn't the first time someone has tried to convert from one ORM to another. In fact, DBIC grew out of working with CDBI and needing something better. In the past, ORMs have usually been similar enough in their inner workings that a compatibility layer was enough. The userland could be migrated at some later date, if ever. The cost of the indirection would be nearly nothing.

DBIC, though, is different. So different that a compatibility layer obscures the whole purpose of converting from CDBI to DBIC. The very concept of resultsets breaks the mold. Like, shatters it, stomps all over it, and melts it in a vat of molten iron. The whole point of switching to DBIC is to get access to resultsets. CDBI's API has absolutely no facility to provide access to these features.
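
If you haven't seen resultsets in action, a tiny example (the moniker and columns are invented) shows why CDBI has nothing to map them to:

# Resultsets are lazy and chainable - no SQL runs until you ask for rows.
my $males  = $schema->resultset('Person')->search( { gender => 'male' } );

# Refine the set further - still no query has been issued.
my $adults = $males->search( { age => { '>=' => 18 } } );

# Only now does DBIC build and run a single SELECT.
my @rows = $adults->all;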

Second, you're stuck with CDBI's searching facilities. SQL::Abstract is extremely powerful. So powerful that several CDBI plugins were created to mimic its power. SQL::Abstract v2 will be more powerful still. But, CDBICompat cannot expose any of it.

And, worse yet, in order to accommodate the numerous abuses of CDBI that most people committed to make it usable in larger applications, the compat layer ends up being about 20% slower than either raw DBIC or raw CDBI. In fact, the row object provided by CDBICompat is tied just to make sure things work.

It's the worst of both worlds - you're stuck with CDBI's API (the bad thing you're trying to escape) and a slowdown of your application's code. Given that you're usually trying to speed up your application and improve the ease of writing new features, this doesn't sound like a win.

Next up - the actual solution.

Tuesday, July 7, 2009

Performance vs. Correctness

This would seem to be a truism, but it bears repeating. Bertrand Meyer once said "Correctness is clearly the prime quality. If a system does not do what it is supposed to do, then everything else about it matters little." A lot of people seem to forget that if you get the wrong answer very quickly, it's still the wrong answer. And, a lot of those people are in the OSS and Perl communities. A corollary to Meyer's point could be "Try to do it right. If you can't guarantee the right answer, then fail very quickly and even more loudly."

Why bring this up now? Well, in Perl, performance concerns seem to crop up every so often, and in places where it makes no sense to bring them up. For example, when dealing with crazy date manipulations. DateTime is the de facto standard for date manipulations in Perl, and for good reason. It is correct and, in most cases, fast. But, its interface is very low level. There is another module that provides different interfaces to dates - Date::Manip. But, even the author acknowledges in the POD that "It's the most powerful of the date modules. It's also the biggest and slowest." And, whenever it's brought up, inevitably someone will say "Yeah, it's pretty heavy."

My answer to that is "So what?" The very first criterion for a computer program is "Does it work?" And, by that I mean "Is it correct?" Once it's correct (and you have a regression suite to validate that), then you measure its performance. If its overall performance is acceptable, then you leave it alone.

Let me repeat - if overall performance is acceptable, then you leave it alone.

If overall performance needs to be improved, then you profile it. Chances are, the module you think is heavy isn't your bottleneck. Even if it is, it's very likely that a small change to one method will net you 80% (or more!) of your potential speed improvements.
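
For what it's worth, profiling in Perl is cheap to try. A typical session with Devel::NYTProf (one popular profiler; the script name is made up) looks like this:

perl -d:NYTProf myapp.pl   # run the app under the profiler
nytprofhtml                # turn the results into a browsable HTML report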

Pick modules based on features. Then, if you need it, optimize after profiling. And, module authors, don't refuse to fix a bug because it would hurt performance. If your module is wrong, then it needs to be fixed.

Sunday, June 28, 2009

Converting CDBI to DBIC (part 3): Why C3 is important

If you want to override any of the methods in CDBI, you have to inherit from CDBI, then override using standard OO methods. With any large application, you end up with an explosion of classes in your hierarchy. More than any other hierarchy, the ORM hierarchy really cries out for roles (or traits). Enough of your tables need to be treated the same in some places, but differently in others, that you want to be able to say "for classes A, B, and C, their create() is overridden as so, but A and B have their update() overridden as so, and B and C have their delete() overridden as so". With standard inheritance, it's almost impossible to do that and keep it comprehensible for the next developer.

DBIC, on the other hand, allows you to define capabilities using components. By providing a traits-like solution, you can easily extend the behavior of your Rows and ResultSets in ways that can cross class hierarchy lines. In other words, it's sane multiple-inheritance-like behavior for a very common use-case of multiple inheritance. In our CDBI example above, you would have three components - one each for the create, update, and delete overrides. A, B, and C would all use the create override, A and B use the update override, and B and C use the delete override. Given that each one is likely to be independent (not related to each other), the code becomes more self-documenting.
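
To make that concrete, here's a minimal sketch of a component - every name here is invented - using the C3-provided next::method to chain to the next update() in line:

# A component that touches an updated_at column before delegating to
# the next update() in C3 order. (Assumes an epoch-integer column.)
package My::Component::TimestampOnUpdate;
use strict;
use warnings;
use base 'DBIx::Class';

sub update {
    my ( $self, @args ) = @_;
    $self->updated_at( time() ) if $self->can('updated_at');
    return $self->next::method( @args );
}

1;

# Each table class opts in as needed; the leading '+' means the
# component doesn't live under the DBIx::Class:: namespace.
package My::Schema::B;
use base 'DBIx::Class';
__PACKAGE__->load_components(qw( +My::Component::TimestampOnUpdate Core ));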

Saturday, June 27, 2009

Converting CDBI to DBIC (part 2): The supposed similarities

(You can catch Part 1 for the context)

The first thing you notice when comparing CDBI and DBIC is that the APIs look really really similar. They both have the same method names for almost everything most people use them for. So, you'd think that it's as easy as swapping out use statements, adding a few lines to the table definitions, and calling it a day.

And, unless your application is so exceedingly simple that you were able to keep to the officially published CDBI API, not a single test will pass. (You do have tests, right?) Every problem arises from having had to abuse CDBI in order to get work done.

Class::DBI is built upon Ima::DBI, a connection caching and SQL management distribution. Every method provided by CDBI is built using Ima::DBI's set_sql() and transform_sql() methods. These methods, while pretty neat, are very hard to extend because they use string transformation.

DBIx::Class, on the other hand, is built on three major concepts that CDBI doesn't have:
  • everything is componentized using C3 method resolution
  • SQL::Abstract to generate the SQL (a taste of it follows this list)
  • first-class distinction between ResultSets and Rows
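
As promised, a taste of SQL::Abstract (the table and columns are invented):

# Perl data structures in, SQL string plus bind values out.
use SQL::Abstract;

my $sqla = SQL::Abstract->new;
my ( $sql, @bind ) = $sqla->select(
    'person',                                    # table
    [ 'first_name', 'last_name' ],               # columns
    { gender => 'male', age => { '>' => 30 } },  # where
    [ 'last_name' ],                             # order by
);

# $sql is now something like:
#   SELECT first_name, last_name FROM person
#   WHERE ( age > ? AND gender = ? ) ORDER BY last_name
# and @bind is ( 30, 'male' )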
These three differences mean that what should, in theory, be a simple conversion between two modules that expose similar APIs becomes a much more difficult thing to do. Over the next few posts, I'll examine why each of these differences is important and how each one complicates the conversion process.

Friday, June 26, 2009

Converting CDBI to DBIC (part 1): The context

For years, Class::DBI was the gold standard ORM in the Perl community, and for good reason. It was simple to deploy, easy to use, and, for the most part, dwim'ed. Oh, we Perlers love our dwimmery. If it doesn't dwim, then we get pissy.

Now, CDBI wasn't designed like Perl. Perl has always been built to make the hard things easy and the impossible merely hard. CDBI, on the other hand, was designed with the 80/20 rule in mind - make as many as possible of the things most people do every day extremely easy. And, for everything else, there's always an easy way to drop down to raw SQL. And this was, for most people, good enough. It easily supported rapid prototyping and small applications and, while slightly annoying at the edges, it worked.

But, as applications are wont to do, some of those applications grew up. They didn't stay small or fade away. The schemas grew from 5 or 10 tables to monstrosities of 250 tables or more, many having dozens of columns. The codebases weighed in at hundreds of thousands of lines of code. And, instead of building the code, the focus became maintenance. And, those things that were rare in the past became more common - not just relatively more common, but more common in absolute terms. Instead of 3 or 4 dips into raw SQL, there were 90 or 100. And, wow, was scaling an issue.

Many ORMs were built to try and take CDBI's place. Lots of good work was done, but the gold standard seems to have settled on DBIx::Class, and for good reason. DBIC maintains that 80/20 design philosophy, but manages to marry it to the Perl philosophy of making hard things easy.

Unlike CDBI, DBIC uses SQL::Abstract for its SQL generation, meaning fewer dips into raw SQL. Scaling is saner because lessons were learned. Yet, all the really easy things are still really easy. Rapid prototyping is still rapid.

And, this is why we want to convert.

(Follow on in Part 2)