Friday, February 06, 2009

Alarming

I found the following weirdness in an old script of mine at work:
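# @purge_array, $logfile and $timeout are set earlier in the script.
for my $cmd (@purge_array) {
    print "Running command $cmd\n";
    eval {
        # Called each time the alarm goes off while the command is running.
        local $SIG{ALRM} = sub {
            # Seconds since the third-party program last wrote to its log.
            my $mod_time = time - (stat($logfile))[9];
            if ( $mod_time > 180 ) {
                die "alarm\n";      # log is stale: give up on this command
            }
            else {
                alarm $timeout;     # log still moving: a new alarm replaces the old one
            }
        };
        alarm $timeout;
        system($cmd);
        alarm 0;                    # command finished: cancel the pending alarm
    };
    if ($@ eq "alarm\n") {
        warn "Gave up on '$cmd': no log activity for over 3 minutes\n";
    }
    elsif ($@) {
        warn "Error running '$cmd': $@";
    }
}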


It took me a minute to figure out what I was thinking. Basically there is this list of commands (@purge_array) that I have to run through a third-party program that sometimes just hangs instead of exiting. It doesn't use any CPU when it hangs, and it doesn't hold file handles open; it just stops, but it still shows up in the process list.

Originally all the commands were part of our nightly maintenance scripts, and once in a while the maintenance script would still be running when everyone got into the office the next day. This was my approach to fixing it. We decided that if the logfile hadn't been updated in three minutes, the command had failed. A command may take more than three minutes to run, but it would log periodically, never going more than a minute between log entries, so we picked three minutes as the worst-case cutoff.

Perl has this neat alarm function. You call alarm with a number of seconds, then call whatever expensive function, then call alarm again with "0" to turn the alarm off. If the expensive function completes in time, everything's cool and the script continues. If it takes longer than the number of seconds you passed to alarm, the process receives an ALRM signal, and unless you've set up a handler for it, that kills the script.
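In practice alarm is usually paired with a handler that dies and an eval that catches the die, so a timeout abandons one piece of work instead of killing the whole script. Here is a minimal sketch of that idiom, close to the one in perlfunc's entry for alarm. The run_with_timeout helper is a hypothetical name, and a four-argument select stands in for the expensive function, since perlfunc warns against mixing alarm and sleep.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds.
# Returns 1 if $code finished in time, 0 if the alarm fired.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $ok = eval {
        # The handler only has to die; the surrounding eval catches it.
        local $SIG{ALRM} = sub { die "alarm\n" };   # "\n" suppresses "at ... line ..."
        alarm $seconds;    # start the countdown
        $code->();
        alarm 0;           # finished in time: cancel the pending alarm
        1;
    };
    alarm 0;               # belt and braces, in case $code died for another reason
    return 1 if $ok;
    die $@ unless $@ eq "alarm\n";   # propagate errors that aren't timeouts
    return 0;
}

# A quick job, and a job that "hangs" (select sleeps for 5 seconds).
print run_with_timeout(2, sub { 1 }) ? "quick: finished\n" : "quick: timed out\n";
print run_with_timeout(1, sub { select(undef, undef, undef, 5) })
    ? "slow: finished\n" : "slow: timed out\n";
```

Run as-is, it prints "quick: finished" and, about a second later, "slow: timed out".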

Perl also lets you install your own handler for a signal, such as the alarm signal, by assigning a subroutine to $SIG{ALRM}. I wrote my own alarm handler that subtracted the time the logfile was last modified from the current time. If the difference was more than three minutes (180 seconds), it died; otherwise it reset the alarm and let the script wait some more. In theory this happens in the background while the third-party program is still running, so the handler can keep resetting the alarm as long as logfile entries keep coming.
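The log-watching handler can be sketched as a standalone subroutine. This is a hedged sketch, not the original script: run_watched, $stale_limit and $check_every are hypothetical names, and instead of system() it forks and execs the command itself so that, when the log goes stale, it can kill the hung process with SIGKILL (assuming the hung program can't be relied on to handle a gentler SIGTERM). Because Perl defers %SIG handlers to safe points between operations, it waits with a non-blocking waitpid in short slices rather than one long blocking call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not from the original script): run @cmd and kill it
# if $logfile goes more than $stale_limit seconds without being modified.
# An ALRM handler checks the log every $check_every seconds.
# Returns the command's exit code, or undef if it was killed as hung.
sub run_watched {
    my ($logfile, $stale_limit, $check_every, @cmd) = @_;

    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: become the command. exec only returns if it failed.
        { exec { $cmd[0] } @cmd }
        warn "exec '$cmd[0]' failed: $!\n";
        POSIX::_exit(127);    # skip END blocks and destructors in the child
    }

    my $status;
    eval {
        local $SIG{ALRM} = sub {
            my $mtime = (stat $logfile)[9];
            # Treat a missing log file as stale.
            my $age = defined $mtime ? time - $mtime : $stale_limit + 1;
            die "stale\n" if $age > $stale_limit;
            alarm $check_every;    # log is still moving: check again later
        };
        alarm $check_every;
        # Wait in short slices so the deferred handler gets a chance to run.
        while (waitpid($pid, POSIX::WNOHANG()) == 0) {
            select(undef, undef, undef, 0.25);
        }
        $status = $?;
        alarm 0;                   # child exited: cancel the pending alarm
    };
    alarm 0;

    return $status >> 8 if defined $status;   # child exited on its own
    die $@ unless $@ eq "stale\n";            # propagate unexpected errors
    kill 'KILL', $pid;                        # hung: it won't exit by itself
    waitpid($pid, 0);                         # reap it so no zombie is left
    return undef;
}
```

With the script's own values this would be something like run_watched($logfile, 180, $timeout, 'some_program', @args), where 'some_program' stands in for the third-party command. If the commands are shell strings like the ones in @purge_array, ('sh', '-c', $cmd) works too, but then the pid belongs to the shell, and killing it may leave the real program running unless the shell execs it.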

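The final perk of all this is that I'm doing it inside an "eval" block, which makes purists cringe for reasons that aren't clear to me. A block eval catches any die that happens inside it, so when the alarm handler dies, the only thing that exits is the eval, and the next item in @purge_array can be processed without the whole script failing. The die doesn't stop the third-party program itself, though; nothing here sends it a signal, so a hung copy can be left behind in the process list.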

It was "eval", in fact, that led to my fight with Randal Schwartz and my eventual apostasy from the church of Perlmonks.org. You can read more about that here, if you like. I promise there is hardly any real code discussed.
