Forking jobs on a multiprocessor machine

If you’re running on a multi-core or multi-processor machine but your application doesn’t take advantage of the additional CPUs, you can still keep the machine busy by running many jobs simultaneously. We use this approach to run the same application (e.g. PhiSpy or Mauve) on multiple genomes. Here’s a code snippet that lets you run as many jobs as you want in parallel.


With this piece of Perl you can set how many processors you want to occupy using Proc::Queue’s size setting (be nice to your colleagues and save them one or two!) and then run jobs in parallel. Proc::Queue overrides fork so that it blocks whenever the maximum number of children are already running.


#!/usr/bin/perl -w
use strict;
use Proc::Queue size => 6;   # run at most 6 children at once
use POSIX ":sys_wait_h";     # imports WNOHANG

my @files = ("file1", "file2", "filen"); # ... a list of files to process

while (@files) {
    my $file = shift(@files);

    my $pid = fork;          # Proc::Queue blocks here until a slot is free

    if (defined $pid && $pid == 0) {
        # child: run one job, then exit
        my $cmd = "progressiveMauve --output=$file.xmfa $file"; # per-file output, so parallel jobs don't clobber each other
        print STDERR "Running $cmd\n";
        system($cmd);
        exit(0);
    }

    1 while waitpid(-1, WNOHANG) > 0; # reap finished children
}

1 while wait != -1; # wait for the last children to finish
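
If your input is a whole directory of genomes rather than a hand-typed list, you could build @files with glob instead. Here’s a minimal sketch, assuming a genomes/ directory of FASTA files with a .fna extension (both names are placeholders for illustration):

#!/usr/bin/perl -w
use strict;

# Hypothetical layout: one FASTA file per genome in genomes/
my @files = glob("genomes/*.fna");
die "No .fna files found in genomes/\n" unless @files;
print STDERR scalar(@files), " genomes queued\n";

Dropping this in place of the hard-coded list above leaves the rest of the forking loop unchanged.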