Sep 21

Mason finally has a tidier: masontidy. It runs perltidy on the various sections of Perl code embedded in Mason components. It works on both Mason 1 and Mason 2 syntax.

One nice trick is that it indents %-lines relative to each other, regardless of intervening content. e.g.

    <ul>
    % foreach my $article (@articles) {
    %     my $title = $article->title;
    <li><% $title %></li>
    %     if ( $article->has_related_links ) {
    %         foreach my $link ( $article->related_links ) {

Given my devotion to code tidying in general, this ought to have been written a long time ago, but the perfect became the enemy of the good. I always imagined a tidier that would reformat the HTML content simultaneously, so that for example the <li> would be indented inside the <ul> above.

This turns out to be difficult and fraught with edge cases. What if an HTML tag is generated inside Perl code and can’t be seen by the tidier? What about embedded javascript and CSS? Every time I encountered these problems I’d shelve the project.

In the end I decided half a solution is better than none. The current masontidy doesn’t attempt to tidy the HTML or non-Perl content. Perhaps someday I’ll figure out how to do it, but the current tool is still a big improvement.

Sep 05

Tools like perltidy and perlcritic are great for cleaning and validating code, but how do you ensure they get run consistently?

One way is to add an enforcement check at commit time. That is, don’t allow a commit unless the code is tidied and valid.

This prevents the “alternating tidy commit” phenomenon, as in

swartz> git log --pretty=oneline 
5cbbf6c    oops, run perltidy again
055a43a    specify append return value
d26a2c0    run perltidy
b0a97f8    add stats
2cb0a4b    run perltidy
de34b12    fix bug

and eliminates useless stylistic differences between revisions of files.

Using hooks

The latest tidyall distribution contains hooks for running tidyall whenever you commit/push to svn or git. If a file has not been tidied or is deemed invalid, then
the operation is aborted and you must fix the problem before retrying.

In each case, you should commit a tidyall.ini file at the top of your project specifying
which tidiers/validators to apply to which files.
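For example, a minimal tidyall.ini might look like this (the plugins and file globs here are illustrative, not a recommendation):

```ini
; Tidy all Perl source and test files
[PerlTidy]
select = **/*.{pl,pm,t}

; Validate library code only
[PerlCritic]
select = lib/**/*.pm
```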


A subversion pre-commit hook. In hooks/pre-commit in your svn repo:

#!/usr/bin/perl
use Code::TidyAll::SVN::Precommit;
use Log::Any::Adapter (File => "/path/to/hooks/logs/tidyall.log");
use strict;
use warnings;

Code::TidyAll::SVN::Precommit->check();


and then

% svn commit -m "fixups" CHI/ 
Sending        CHI/
Transmitting file data ..svn: Commit failed (details follow):
svn: Commit blocked by pre-commit hook (exit code 255) with output:
2 files did not pass tidyall check
lib/ *** 'PerlTidy': needs tidying
lib/CHI/ *** 'PerlCritic': Code before strictures are enabled
  at /tmp/Code-TidyAll-0e6K/ line 2

In an emergency the hook can be bypassed by prefixing the comment with “NO
TIDYALL”, e.g.

% svn commit -m "NO TIDYALL - this is an emergency!" CHI/ 


A git pre-commit hook. In .git/hooks/pre-commit:

#!/usr/bin/perl
use Code::TidyAll::Git::Precommit;
use strict;
use warnings;

Code::TidyAll::Git::Precommit->check();


and then

% git commit -m "fixups" CHI/ 
2 files did not pass tidyall check
lib/ *** 'PerlTidy': needs tidying
lib/CHI/ *** 'PerlCritic': Code before strictures are enabled

In an emergency the hook can be bypassed by passing --no-verify:

% git commit --no-verify ...

This hook must be explicitly placed in every copy of the repo, although you can partially automate this process. There is (unlike the other two hooks here) no way to require or enforce that the hook is in place, so it may not be ideal for large groups.


A git pre-receive hook, which runs whenever something is pushed to a repo. You might use this to validate pushes from multiple developers to a shared repo on a remote server.

In .git/hooks/pre-receive in the shared repo:

#!/usr/bin/perl
use Code::TidyAll::Git::Prereceive;
use strict;
use warnings;

Code::TidyAll::Git::Prereceive->check();


and then

% git push
Counting objects: 9, done.
remote: [checked] lib/CHI/        
remote: Code before strictures are enabled on line 13 [TestingAndDebugging::RequireUseStrict]        
remote: 1 file did not pass tidyall check        
To ...
 ! [remote rejected] master -> master (pre-receive hook declined)

Unlike the git pre-commit hook above, this can be enforced, and in fact there is currently no way to skip the check without going in and disabling the hook. I’d like to add a flag, but I’m not sure how it would get passed to the hook; advice welcome.

Using a commit alias

A problem with all of the hooks above is that they won’t actually tidy your files for you. They’ll simply tell you what hasn’t been tidied, then send you off to fix things. It’s all a bit tedious.

Unfortunately, modifying your code from svn/git hooks is a no-no; see here, here and here for explanations. (How did we manage before stackoverflow?)

So what I like to do is create an alias for my commit commands, like so:

alias svc='tidyall --svn && svn commit -m '
alias gic='tidyall --git && git commit -m '

Now, if I type

gic "fix some bugs" -a

it will run tidyall --git and only proceed with the commit if that succeeds. The --git and --svn flags mean “process all files that have been added/modified according to git/svn status”. This might be overkill if you’re only committing some of the files, but it’s more efficient than using --all.

As long as I use these aliases, my files should always be tidied by the time the hooks are checking them. But the hooks are still useful as a double-check, and an enforcement layer
that’s harder to skip accidentally.


Some will argue that commits should never be blocked by correctness checks. The “commit early and often” philosophy suggests that a commit might be valuable even if the code is currently untidy or invalid; this is especially true in the case of git, where commits are not shared and are designed to be performed often.

Moreover, if you’re in a technical emergency and need to commit code to deploy a fix, it would be unfortunate to be delayed by a nagging validator. (Jeff Thalhammer, perlcritic author, has said that he dislikes running perlcritic on commit for this reason.)

My responses to these arguments are (1) I’ve never personally seen a situation where it was important to commit untidy or invalid code, and (2) the escape hatches built into the first two hooks (“NO TIDYALL” and --no-verify) will hopefully allow you to proceed during an emergency. But we can agree to disagree. :)

An alternative to running tidyall on commit is to run it during unit tests via Test::Code::TidyAll. In fact, if you’ve got a smoke tester that runs after every commit, this might end up being about the same.

Aug 21

I’m an avid user of code tidiers and validators to enforce quality and consistent style, both in personal and team projects.

In the Perl world you are probably familiar with perltidy, perlcritic and podtidy, but every language has its own tools: htmltidy for HTML, jslint and jshint for Javascript, csstidy for CSS, etc.

In a web site project I might work with half a dozen of these tools, each with their own syntax and applicable only to certain files. I want to apply some of them while editing, some when I commit, and some only when I run tests. There must be a better way!

Enter tidyall

tidyall is a unifier for code tidiers and validators. You can run it on a single file or an entire project hierarchy, and configure which tidiers/validators are applied to which files. Features include:

  • A cache to only process files that have changed

  • A standard backup mechanism with auto-pruning

  • A plugin API that makes it trivial to add new tidiers, validators, and pre/post processors

  • Support for multiple modes (e.g. editor, commit, test, dzil), with different plugins running in each mode


To use tidyall in a project, simply put a tidyall.ini file at the top of it. Here’s the tidyall.ini that I’m using for CHI:

[PerlTidy]
argv = -noll -blbp=0
select = {bin,lib,t}/**/*.{pl,pm,t}

[PodTidy]
select = {bin,lib}/**/*.{pl,pm,pod}

[PerlCritic]
select = {bin,lib}/**/*.{pl,pm}
argv = --profile $ROOT/perlcriticrc
except_modes = editor

[+...]
select = {bin,lib,t}/**/*.{pl,pm,t}

[+...]
select = {bin,lib,t}/**/*.{pl,pm,t}

These sections, in order, do the following:

  • Apply perltidy with settings “-noll -blbp=0” to *.pl, *.pm, and *.t files.

  • Apply podtidy with default settings to *.pl, *.pm and *.pod files.

  • Apply perlcritic using the perlcriticrc in the same directory to *.pl and *.pm files; but skip this when invoking tidyall from an editor.

  • Use a preprocessor/postprocessor to hide Method::Signatures::Simple keywords (method and function) from perltidy and perlcritic.

  • Use a preprocessor/postprocessor to hide Moose attributes from perltidy, then sort and align attributes in a way I prefer.

Ways of using tidyall

Here are a variety of modes you might use tidyall in.

In your code editor

I like having a single keystroke (ctrl-t) to process the file I’m working on. The distribution contains an Emacs implementation of this command. Its effects are fully undoable and it reports any errors in a separate window.

This is the only editor I know how to program, so others will have to be contributed. :)

From the command line

Of course tidyall can be run manually, against a specific file:

% tidyall file [file...]

or against all the files in the project (skipping those that haven’t changed):

% tidyall -a

or against all the files you’ve added or modified according to svn:

% tidyall --svn

In svn and git commit hooks

The distribution includes an SVN precommit hook that checks if all files are tidied and valid according to tidyall, and rejects the commit if not. e.g.

% svn commit -m "fixups" CHI/ 
Sending        CHI/
Transmitting file data ..svn: Commit failed (details follow):
svn: Commit blocked by pre-commit hook (exit code 255) with output:
2 files did not pass tidyall check
lib/ *** 'PerlTidy': needs tidying
lib/CHI/ *** 'PerlCritic': Code before strictures are enabled
  at /tmp/Code-TidyAll-0e6K/ line 2

This replaces myriad scripts out there that perform perltidy or perlcritic in a precommit hook.

Git support is coming next.

In unit tests

Test::Code::TidyAll checks that all the files in your distribution are in a tidied and valid state. This replaces a bunch of separate testing modules (Test::Perl::Tidy, Test::Perl::Critic, Test::Pod, etc.), each with their own syntax and rules about which files to select.
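Assuming a tidyall.ini at your distribution root, the whole test file can be as small as this sketch:

```perl
# t/tidyall.t - fails if any file configured in tidyall.ini
# is untidy or invalid
use Test::Code::TidyAll;

tidyall_ok();
```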

In Dist::Zilla

Dist::Zilla::Plugin::TidyAll is a Dist::Zilla plugin that runs tidyall on files when building a release.

Next steps

Git support, then more plugins for Perl and other languages!

Any other neat-freaks find this useful? Feedback welcome.

Aug 16

Lately I’ve become a big fan of the Nginx + Starman combination for Perl-based web development. mod_perl has served me well for fifteen years, but with Plack/PSGI replacing the mod_perl API, it makes less sense to configure and install Apache just to invoke a PSGI handler. Starman has near-zero configuration, and Nginx provides a perfect complement for HTTP acceleration and serving static files.

To facilitate using these servers I added Nginx and Starman plugins for Server::Control, which previously existed mainly for Apache.

So what’s Server::Control?

Server::Control is a set of libraries for controlling servers, where a server is any background process that listens to a port and has a pid file. Think apachectl on steroids
(and not just for Apache).

In the happy case, controlling a pid-file server is simple – just run the command to start it, and run kill `cat /path/to/pidfile` to stop it. Where Server::Control comes in is handling all the little unhappy cases.

For example, accidentally starting a server that’s already running or stopping a server that isn’t:

    % bin/ -k start
    server 'mhq' is already running (pid 5912) and listening to port 5000

    % bin/ -k stop
    server 'mhq' is not running

or trying to start a server whose port is blocked by another process:

    % bin/ -k start
    cannot start server 'mhq' - pid file
    '/Users/swartz/git/mason-site.git/data/' does not exist,
    but something (possibly pid 5943 - "/usr/local/bin/plackup") is listening
    to localhost:5000

or a corrupt pid file, left over from a reboot or a kill -9:

    % bin/ -k start
    pid file '/Users/swartz/git/mason-site.git/data/' contains 
    a non-existing process id '5985'!
    deleting bogus pid file '/Users/swartz/git/mason-site.git/data/'

or a server that starts but isn’t listening to the expected port:

    % bin/ -k start
    waiting for server start
    after 10 secs, server 'mhq' appears to be running (pid 6167), but not
    listening to port 5000

or a server that starts but isn’t serving content correctly (using the validate_url and validate_regex parameters):

    % bin/ -k start
    waiting for server start
    server 'mhq' is now running (pid 6080) and listening to port 5000
    validating url 'http://localhost:5000/'
    content of 'http://localhost:5000/' (12798 bytes) did not match
    regex 'qr/(?-xism:Welcome to Mason and Poet)/'

So I always take the extra time to set up Server::Control, and it usually pays off in reduced frustration in the end. For convenience I have aliases like this set up to start, stop, restart and ping (check the status of) each server on a machine:

    alias ctlmhq='/home/swartz/servers/mhq/bin/ -k'
    alias stamhq='ctlmhq start'
    alias stomhq='ctlmhq stop'
    alias remhq='ctlmhq restart'
    alias pingmhq='ctlmhq ping'

May 31

I’ve been searching for the best way to normalize an argument list to a string, such that two argument lists convert to the same string iff they are equivalent. My ideal algorithm would

  1. Compare embedded hashes and lists deeply, rather than by reference
  2. Ignore hash key order
  3. Ignore difference between 3 and “3”
  4. Generate a relatively readable string
  5. Perform well (XS preferred over Perl)

This is necessary for memoizing a function, or for caching a web page with query arguments.

As a strawman example, Memoize uses this as a default normalizer, which fails #1 and #3:

$argstr = join chr(28),@_;  
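To see why this fails criterion #1: two deeply-equal references stringify to distinct addresses, so equivalent argument lists get different keys. A small core-Perl sketch:

```perl
use strict;
use warnings;

# Two deeply-equal hashrefs stringify to distinct addresses like
# HASH(0x55f3...), so the default normalizer keys them differently.
my @args1 = ( { x => 1 } );
my @args2 = ( { x => 1 } );

my $key1 = join chr(28), @args1;
my $key2 = join chr(28), @args2;

print $key1 ne $key2 ? "different keys\n" : "same key\n";
```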

The best candidate I’ve found to date is JSON::XS in canonical mode:

$argstr = JSON::XS->new->canonical->encode( \@_ );

as it is fast, readable, and hash-key-order agnostic. CHI uses this to generate keys from arbitrary references.
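The hash-key-order property comes from canonical mode, which sorts hash keys before encoding. Core JSON::PP supports the same flag, so the effect can be sketched without XS:

```perl
use strict;
use warnings;
use JSON::PP;    # core module; JSON::XS accepts the same canonical flag

my $enc = JSON::PP->new->canonical;    # sort hash keys deterministically

my $k1 = $enc->encode( { a => 5, b => 6, c => { d => 7, e => 8 } } );
my $k2 = $enc->encode( { b => 6, c => { e => 8, d => 7 }, a => 5 } );

print $k1 eq $k2 ? "same key\n" : "different keys\n";
```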

However, JSON::XS treats the number 3 and the string “3” differently, based on how the scalar was used recently. This can generate different strings for essentially equivalent argument lists and reduce the memoization effect. (The vast majority of functions won’t know or care if they get 3 or “3”.)

For fun I looked at a bunch of serializers to see which ones differentiate 3 and “3”:

Data::Dump   : equal - [3] vs [3]
Data::Dumper : not equal - [3] vs ['3']
FreezeThaw   : equal - FrT;@1|@1|$1|3 vs FrT;@1|@1|$1|3
JSON::PP     : not equal - [3] vs ["3"]
JSON::XS     : not equal - [3] vs ["3"]
Storable     : not equal - <unprintable>
YAML         : equal - ---\n- 3\n vs ---\n- 3\n
YAML::Syck   : equal - --- \n- 3\n vs --- \n- 3\n
YAML::XS     : not equal - ---\n- 3\n vs ---\n- '3'\n

It seems in general like the more sophisticated modules make this differentiation, perhaps because it is more “correct”, though it is the opposite of what I want in this case. :) Of the ones that report “equal”, not sure how to get them to ignore hash key order.

I could walk the argument list beforehand and stringify all numbers, but this would require making a deep copy and would violate #5.

If I find a great result that requires more than a few lines of code, I’ll stick it in CPAN, e.g. Params::Normalize.

May 06

Memoization is a technique for optimizing a function over repeated calls. When you call the function, the return value is cached (based on the arguments passed) before being returned to you. Next time you call the function with the same arguments, you’ll get the value back immediately.
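In core Perl, the pattern is essentially a cache wrapped around a function call; a toy sketch (real memoizers also handle references, list context, and so on):

```perl
use strict;
use warnings;

# A toy memoizer: cache return values keyed by the argument.
my %memo;

sub expensive_square {
    my ($n) = @_;
    return $n * $n;    # stand-in for genuinely slow work
}

sub memoized_square {
    my ($n) = @_;
    $memo{$n} //= expensive_square($n);    # compute once, then reuse
    return $memo{$n};
}

print memoized_square(7), "\n";    # computed: 49
print memoized_square(7), "\n";    # cached:   49
```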

Memoize is the standard Perl memoization solution and after twelve+ years still works well in the common case. However, since Perl caching support has come a long way, and memoization is just a specific form of caching, I wanted to try pairing memoization with modern cache features. Hence, CHI::Memoize.

Here are some of the nice features that came out of this:

  • The ability to cache to any of CHI’s other backends. e.g.
        memoize( 'func', driver => 'File', root_dir => '/path/to/cache' );
        memoize( 'func', driver => 'Memcached', servers => [""] );
  • The ability to expire memoized values based on time or a condition. e.g.

        memoize( 'func', expires_in => '1h' );
        memoize( 'func', expire_if => sub { ... } );
  • A better key normalizer. Memoize just joins the arguments into a string, which doesn’t work for references/undef and can generate multiple keys for the same hash. In contrast, CHI::Memoize relies on CHI’s automatic serialization of non-scalar keys. So these will be memoized together:

        memoized_function( a => 5, b => 6, c => { d => 7, e => 8 } );
        memoized_function( b => 6, c => { e => 8, d => 7 }, a => 5 );

    and it’s easy to specify your own key, e.g. memoize on just the second and third arguments:

        memoize( 'func', key => sub { $_[1], $_[2] } );

Subsets of these features were already available in separate distributions, but not all in one place.

Now that this is available I’m curious to see where I use it in place of the traditional get-and-set cache pattern.

Apr 21

I’m pleased to announce Poet, a modern Perl web framework designed especially for Mason developers.

Features include:

Poet was designed and developed over the past six years at Hearst Digital Media. Today it is used to generate Hearst’s magazine websites (including Cosmopolitan, Esquire, and Good Housekeeping) as well as associated content management, subscription management and ad rotation systems. I’m very grateful to Hearst for agreeing to this open source release (though they bear no responsibility for its support or maintenance).

Why another Perl web framework?

To answer this requires a bit of history.

HTML::Mason was one of the early Perl “web frameworks”. Like its JSP/ASP/PHP contemporaries, its main trick was embedding code in HTML, but it contained enough web-specific goodies to serve as a one-stop solution. It relied heavily on mod_perl and had mailing lists filled with web-related discussions having nothing to do with templating.

Over time, a new breed of web framework emerged – Catalyst and Jifty and Mojolicious and Dancer in the Perl world, Rails and Sinatra and Django elsewhere. In these frameworks the templates moved from center stage to become just one piece of a large system.

HTML::Mason faced an identity dilemma; should it be a pure templating framework, or try to expand and better serve its traditional web development audience? In the end, and with coaxing from co-author Dave Rolsky, Mason 2 shifted decisively towards the former. It shed most of its web-specific code, thanks in large part to Plack/PSGI, and became more of a generic templating system (albeit still destined to spend much of its time generating HTML).

There are two ways to use Mason

So one legitimate way to use Mason is as a dedicated View component in a larger MVC framework like Catalyst or Dancer. Hence Catalyst::View::Mason2 and Dancer::Template::Mason2. (This is how Dave prefers to roll.)

But for me, and for some others, Mason remains a great way to handle the whole web request – to dispatch URLs to components and process HTTP arguments and implement common behaviors for sets of pages. I prefer my page logic right next to my page view, rather than flipping between a controller and view that are often annoyingly coupled.

Moreover, fifteen+ years had left me with a pile of useful ideas, techniques, and conventions for web development. Mason wasn’t the appropriate place for them any more (if it ever was), but I needed to collect them somewhere.

This is where Poet comes in. Poet doesn’t need a controller layer; it turns web requests into Mason requests, and happily lets Mason handle the rest of the work. Poet doesn’t have Mason’s identity crisis; it is proudly web-centric, the place to put all the web-related goodness that Mason developers want nearby.

There’s much more to come than I could put in this initial release, and I’m looking forward to pressing on with it! I hope it makes at least a few of your lives easier, and as always I welcome the feedback.

Mar 04

At work we have over 200 modules and Mason components that use
CHI to cache some data or HTML. Each has its own
distinct namespace to prevent collisions.

Each namespace uses one of several different storage types — memcached, local file, NFS
file – depending on its usage characteristics. Each storage type has a set of default
parameters (e.g. root_dir for file) that rarely change. Finally, there are some defaults
we want to use across all of our caches.

To maintain a coherent cache strategy — and our sanity — we need a single place to
see and adjust all this configuration.

So instead of repeatedly embedding parameters like this:

my $cache = CHI->new
   (namespace => 'Foo', driver => 'File', root_dir => '/path/to/root',
    depth => 3, expires_in => '15m', expires_variance => 0.2);


my $cache = CHI->new
   (namespace => 'Bar', driver => 'Memcached',
    servers => [ "", "" ],
    compress_threshold => 10_000, expires_in => '1h', expires_variance => 0.2);

we can do this:

my $cache = CHI->new(namespace => 'Foo');


my $cache = CHI->new(namespace => 'Bar');

then in a YAML configuration file:

defaults:
  expires_variance: 0.2

storage:
  local_file:
    driver: File
    root_dir: /path/to/local/root
  nfs_file:
    driver: File
    root_dir: /path/to/nfs/root
  memcached:
    driver: Memcached
    servers: [ ... ]
    compress_threshold: 10_000

namespace:
  Foo: { storage: local_file, expires_in: 15m }
  Bar: { storage: memcached,  expires_in: 1h }
  Baz: { storage: memcached,  expires_in: 2h }

In the first section we define overall defaults. In the second we define a set of
storage types, each with its own defaults. In the third we assign each namespace to a
storage type and an expiration time. Each level can override the defaults of previous
levels, and arguments passed to CHI->new override anything in configuration.

Support for this kind of configuration is available as of CHI 0.52. You should first
create a CHI subclass for your application, so as not to interfere with other CHI users in
the same process:

package My::CHI;
use base qw(CHI);

Then specify configuration with a hash or file:

My::CHI->config({ storage => ..., namespace => ..., defaults => ... });


Even if you don’t have 200 namespaces, it’s nice to have a single place where you can
fiddle with cache controls.

Jan 31

chromatic shows users how to run tests faster on cpanm and perlbrew installs. This strikes me as well-meaning advice that misses a much more basic point:

cpanm and perlbrew should not run tests by default.

This may sound heretical. Perl has always had a strong testing culture, and end-user testing may have once played a valuable role in testing a distribution under many systems. But we now have a CPAN Testers network which will run tests on countless systems and Perl versions, and report failures back to the author promptly and automatically. Distributions can be sent through the Testers’ gauntlet before ever being officially released. In this environment, it’s hard to see much additional value in ad hoc end-user testing.

As Dave Rolsky points out, we install most other software without running tests and rarely give it a thought.

None of this would matter if end-user testing was free. But it is not.

The costs of end-user testing

Slower installs. On my system, a fresh install of Moose and its dependencies takes three times longer with tests (2 minutes versus 41 seconds). A fresh install of Catalyst and its dependencies takes nearly four times longer with tests (9.5 minutes versus 2.5 minutes).

How many new Perl users find CPAN installs much slower than they need to be? How many would choose a 3-4 times speedup if they knew it was an option? It’s like having a turbo button and leaving it unpressed by default.

False positives. The more tests CPAN authors write, the more likely an occasional false-positive failure sneaks through (as in “Failed 1/1746 tests.”) In most such cases the module will still work for the user’s purposes. But the default behavior is to prevent the module from being installed at all. If your module depends on other modules, then any failure up the dependency chain likewise prevents your module from being installed, even if the failure has no bearing on your module’s efficacy.

How many new Perl users have unnecessarily failed to install a module like Moose or Catalyst because of an obscure, temporary failure deep in the dependency chain?

Fear of dependencies and code reuse. Slower installs and false positives are the main reasons why people complain about distributions having “too many dependencies”. (If the dependencies installed quickly and reliably, as they do with --notest and apt-get and yum, would anyone complain or even notice?) These complaints in turn encourage module authors to reduce or eliminate their dependencies, thus reinventing where they could be reusing.

It’s the wrong default for new users

Some veteran Perl folks may like tests to run on every install. That’s fine. But I suspect new Perl users just want things to install quickly and reliably, and in any event don’t have the experience to evaluate or take action on a test failure (especially one in an obscure dependency). For these users running tests is simply the wrong default.

I turn on --notest on each system I administer and preach it enthusiastically to every new Perl user I encounter. But I wish I didn’t have to mention it at all.

Oct 07

Despite its lofty martial-arts name, Tie::CHI is a simple module that allows you to tie a hash to a persistent CHI cache, using any of CHI’s backends.

I hardly ever choose Tie interfaces — too much magic — but occasionally they do produce pretty code. In this case, we have a watchdog script that sends USR2 signals to httpd processes that grow too large (so that they’ll log their call stack). Sometimes these processes stick around for a while, so I only want to send a certain number of signals per process.

    my %kill_count;
    if ( $vsize > $max_vsize ) {
        if ( $kill_count{$pid}++ < $max_kills ) {
            kill( 'USR2', $pid );
            $log->warn(sprintf( "pid %d vsize %dmb > %dmb, sending USR2",
                $pid, $vsize, $max_vsize ));
        }
    }

Then it occurred to me that the watchdog restarts frequently, so I need to keep the kill counts persistent; and I should only limit on an hourly basis, because the same pid will eventually come around again. We already have a custom CHI subclass that we use for caching all over our application, so it was easy to plug it in:

    use Tie::CHI;
    my $cache = HM::Cache->new
       (namespace => 'watchdog/kill_count', expires_in => '1 hour');
    my %kill_count;
    tie %kill_count, 'Tie::CHI', $cache;

And voila, %kill_count is persistent, and its values decay after an hour.
