Metagenome analysis spans a wide range of methods and tools in the bioinformatics community. These tools give scientists the biological information present in a sequenced environmental sample, specifically the genetic functions encoded in the DNA of the sampled metagenome. Most of them were developed to compare a single metagenome file against databases of sequences and annotation data.
This project instead performs a comparative analysis across multiple metagenomic FASTA files. Pieces of length n taken from the sequences in one file are loaded into a hash table, so that sequences from the other files can be compared against them quickly and precisely. Finding similar sequences and structures across numerous metagenomes can give insight into which biological functions are shared between related and unrelated organisms.
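The hash table approach can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes that the n-length pieces are overlapping windows, that only exact matches count, and that reverse complements are ignored. The function names (`read_fasta`, `kmers`, `build_index`, `shared_pieces`) and the two tiny in-memory "files" are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def kmers(sequence, n):
    """Yield every overlapping n-length piece of a sequence (assumed windowing)."""
    for i in range(len(sequence) - n + 1):
        yield sequence[i:i + n]


def build_index(handle, n):
    """Map each n-length piece of the first file to the headers that contain it."""
    index = defaultdict(set)
    for header, sequence in read_fasta(handle):
        for piece in kmers(sequence, n):
            index[piece].add(header)
    return index


def shared_pieces(index, handle, n):
    """Count the distinct pieces shared by each (query, indexed) header pair."""
    counts = defaultdict(int)
    for header, sequence in read_fasta(handle):
        for piece in set(kmers(sequence, n)):
            # A dict lookup is the hash table probe: average constant time.
            for indexed_header in index.get(piece, ()):
                counts[(header, indexed_header)] += 1
    return dict(counts)


# Made-up sequences standing in for two metagenome FASTA files.
file_a = StringIO(">a1\nACGTACGT\n>a2\nTTTTGGGG\n")
file_b = StringIO(">b1\nCGTACGAA\n>b2\nCCCCCCCC\n")

index = build_index(file_a, n=4)
result = shared_pieces(index, file_b, n=4)
print(result)  # {('b1', 'a1'): 3}
```

With real data, the `StringIO` objects would be replaced by opened FASTA files, and the choice of n trades sensitivity against memory: smaller pieces match more often by chance, while larger pieces make the table sparser but miss sequences that differ by a single base.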