minimap2 hints

Here are some tips and tricks for minimap2 that I keep forgetting!

--split-prefix

If you have a large (>4 GB) multisequence index file, there are two options.

The first (preferred) is to increase the value of -I when you build the index, so that the whole index is kept in memory. Note: this must be done when you build the index; you can't build the index and then change -I at run time.
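As a rough sketch of the first option, the snippet below counts the bases in a reference and assembles an index-build command with a larger -I. The file names ref.fa and ref.mmi, the 16G value, and both helper functions are made up for illustration; only the -I and -d flags come from minimap2 itself.

```python
import shlex


def count_reference_bases(fasta_lines):
    """Count sequence characters in FASTA text, skipping '>' header lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def index_command(reference, index, batch="16G"):
    """Build the argv for an index build with a larger -I batch size.

    The "16G" default is an arbitrary example value, not a recommendation.
    """
    return ["minimap2", "-I", batch, "-d", index, reference]


# Toy FASTA lines stand in for a real reference file.
toy = [">chr1", "ACGTACGT", "ACGT", ">chr2", "GGCC"]
total = count_reference_bases(toy)
cmd = index_command("ref.fa", "ref.mmi")
print(total, shlex.join(cmd))
```

In practice you would compare the base count of your real reference with the -I value you plan to use, and pick a value comfortably above it if you have the memory.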

The second is to use --split-prefix with a string, which minimap2 uses as the prefix for its temporary files. For Snakemake, there are two options:

1. You can use "{sample}" as your prefix like so:

params:
    prfx = "{sample}"
...
shell:
    """
        minimap2 --split-prefix {params.prfx} ...
    """

2. You can use a random 6 character string like so:

import random, string

params:
    # the name must match {params.prfx} in the shell command below
    prfx = ''.join(random.choices(string.ascii_uppercase + string.digits, k=6))
...
shell:
    """
        minimap2 --split-prefix {params.prfx} ...
    """
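If you want the random prefix in more than one rule, it may be tidier to pull it into a small helper. This is only a sketch: random_prefix and ALPHABET are names I made up, and the logic is the same random.choices call as above.

```python
import random
import string

# Same character set as the inline version: A-Z plus 0-9.
ALPHABET = string.ascii_uppercase + string.digits


def random_prefix(length=6):
    """Return a random string of `length` uppercase letters and digits."""
    return "".join(random.choices(ALPHABET, k=length))


print(random_prefix())
```

In a Snakefile this could probably be wired in as prfx = lambda wildcards: random_prefix(), so that each job draws its own prefix. I have not tested that wiring here, so treat it as an assumption.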

Here is the catch: things will probably break if your index file is small. If you see the message [W::sam_hdr_create] Duplicated sequence, it is probably because you have split a small index and the sequence IDs are being duplicated. Remove the --split-prefix option and you should be good.
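To confirm that duplicated sequence IDs really are the problem, you can look for repeated @SQ names in the SAM header. The sketch below does that on a toy header; duplicated_sq_names is a hypothetical helper, and it relies only on the standard SAM layout where each @SQ line carries an SN: field.

```python
from collections import Counter


def duplicated_sq_names(header_lines):
    """Return the sorted sequence names that appear on more than one @SQ line."""
    names = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Toy header with chr1 listed twice, as you might see after a bad split.
header = [
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:2000",
    "@SQ\tSN:chr1\tLN:1000",
]
print(duplicated_sq_names(header))
```

An empty list means the header is clean; any names that come back are the ones being duplicated.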