May 31

I’ve been searching for the best way to normalize an argument list to a string, such that two argument lists convert to the same string iff they are equivalent. My ideal algorithm would

  1. Compare embedded hashes and lists deeply, rather than by reference
  2. Ignore hash key order
  3. Ignore difference between 3 and “3″
  4. Generate a relatively readable string
  5. Perform well (XS preferred over Perl)

This is necessary for memoizing a function, or for caching a web page with query arguments.

As a strawman example, Memoize uses this as a default normalizer, which fails #1 and #3:

$argstr = join chr(28),@_;  

The best candidate I’ve found to date is

JSON::XS->new->utf8->canonical

as it is fast, readable, and hash-key-order agnostic. CHI uses this to generate keys from arbitrary references.

However, JSON::XS treats the number 3 and the string “3″ differently, based on how the scalar was used recently. This can generate different strings for essentially equivalent argument lists and reduce the memoization effect. (The vast majority of functions won’t know or care if they get 3 or “3″.)

For fun I looked at a bunch of serializers to see which ones differentiate 3 and “3″:

Data::Dump   : equal - [3] vs [3]
Data::Dumper : not equal - [3] vs ['3']
FreezeThaw   : equal - FrT;@1|@1|$1|3 vs FrT;@1|@1|$1|3
JSON::PP     : not equal - [3] vs ["3"]
JSON::XS     : not equal - [3] vs ["3"]
Storable     : not equal - <unprintable>
YAML         : equal - ---n- 3n vs ---n- 3n
YAML::Syck   : equal - --- n- 3n vs --- n- 3n
YAML::XS     : not equal - ---n- 3n vs ---n- '3'n

It seems in general like the more sophisticated modules make this differentiation, perhaps because it is more “correct”, though it is the opposite of what I want in this case. :) Of the ones that report “equal”, not sure how to get them to ignore hash key order.

I could walk the argument list beforehand and stringify all numbers, but this would require making a deep copy and would violate #5.

If I find a great result that requires more than a few lines of code, I’ll stick it in CPAN, e.g. Params::Normalize.

May 06

Memoization is a technique for optimizing a function over repeated calls. When you call the function, the return value is cached (based on the arguments passed) before being returned to you. Next time you call the function with the same arguments, you’ll get the value back immediately.

Memoize is the standard Perl memoization solution and after twelve+ years still works well in the common case. However, since Perl caching support has come a long way, and memoization is just a specific form of caching, I wanted to try pairing memoization with modern cache features. Hence, CHI::Memoize.

Here are some of the nice features that came out of this:

  • The ability to cache to any of CHI’s other backends. e.g.
        memoize( 'func', driver => 'File', root_dir => '/path/to/cache' );
        memoize( 'func', driver => 'Memcached', servers => ["127.0.0.1:11211"] );
    
  • The ability to expire memoized values based on time or a condition. e.g.

        memoize( 'func', expires_in => '1h' );
        memoize( 'func', expire_if => sub { ... } );
    
  • A better key normalizer. Memoize just joins the keys into a string, which doesn’t work for references/undef and can generate multiple keys for the same hash. In contrast, C relies on CHI’s automatic serialization of non-scalar keys. So these will be memoized together:

        memoized_function( a => 5, b => 6, c => { d => 7, e => 8 } );
        memoized_function( b => 6, c => { e => 8, d => 7 }, a => 5 );
    

    and it’s easy to specify your own key, e.g. memoize on just the second and third arguments:

        memoize( 'func', key => sub { $_[1], $_[2] } );
    

Subsets of these features were already available in separate distributions, but not all in one place.

Now that this available I’m curious to see where I use it in place of the traditional get-and-set cache pattern.

preload preload preload