Before I start, the usual disclaimer: I’m trained as a biologist, so don’t be surprised that I didn’t know “map”.
Rob pointed me today to Perl’s “map,” a little function I didn’t know about that seems to have the potential to solve many of my problems.
Map is documented in the standard Perl documentation (run perldoc -f map).
So, why do I need it?
Let’s say, for example, I have this list of protein pairs that are similar to each other:
my @sims = (1115.1, 1116.1, 1116.1, 1115.1, 1115.1, 1118.2, 1118.2, 1115.1, 1118.2, 1116.1, 1116.1, 1118.2);
These are simply three homologs that hit each other reciprocally. To get a set of unique IDs of these homologs using Perl, my options are:
Option 1:
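my %hash;
# Use each ID as a hash key; duplicates overwrite the same key.
for my $k (@sims) {
    $hash{$k} = 1;
}
# Print the unique IDs, separated by tabs.
print join "\t", keys %hash;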
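Option 2: the same loop, using the default variable $_
my %hash;
# Without a named loop variable, each element is placed in $_.
for (@sims) {
    $hash{$_} = 1;
}
print join "\t", keys %hash;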
Option 3: much shorter, with map
my %hash;
# map runs the block once per element, with $_ set to that element.
map { $hash{$_} = 1 } @sims;
print join "\t", keys %hash;
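They all print the three unique IDs (the order can differ between runs, because Perl does not keep hash keys in any particular order):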
1118.2 1115.1 1116.1
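For what it’s worth, map also returns a list, so the hash can be built in a single assignment instead of being filled in as a side effect. What follows is only a minimal sketch of that idiom, not part of Rob’s suggestion. The names %seen and @unique are mine, quoting the IDs with qw() so that they are stored as strings is my own assumption, and the sort is there only to make the output order reproducible.

```perl
use strict;
use warnings;

# IDs quoted with qw() so they stay strings (an assumption; the post uses bare numbers).
my @sims = qw(1115.1 1116.1 1116.1 1115.1 1115.1 1118.2
              1118.2 1115.1 1118.2 1116.1 1116.1 1118.2);

# map returns a flat list of (ID, 1) pairs, which initializes the hash directly.
my %seen = map { $_ => 1 } @sims;

# Sort the keys so the printed order does not depend on hash ordering.
my @unique = sort keys %seen;

print join("\t", @unique), "\n";
```

Run as a script, this should print the same three IDs, tab-separated and in sorted order.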