Recently, I’ve been facing a problem: my metagenome comparator program is running too slowly. The main culprit is that it performs the same kinds of operations over and over inside a loop: reading lines from text files, creating objects, inserting those objects into a data structure, retrieving them from the data structure, and writing the data structures to disk (just to name a few). So it would be natural to suggest to someone in my position to parallelize it all, and that’s exactly what I want to do. However, I’ve never written any kind of parallel application, so I need to do some learning and research on parallel programming. (More of my ramblings after the Read More break)
What I Know
- Multi-threading is the implementation most people would go with when parallelizing their code. With today’s multi-core CPUs, multi-threading is a very powerful approach for any program or application that takes advantage of it. Every spawned thread has access to the same memory, so threads can synchronize their tasks efficiently and effectively. The overhead of managing threads doesn’t look too complicated either, since it’s easy to control which pieces of data each thread touches (a minimal Java sketch of this follows the list).
- Cluster computing is another form of parallel computing, in which multiple linked computers split up the work needed to complete a task; this setup is sometimes known as Beowulf computing. Most clusters use a Message Passing Interface (MPI) to handle the communication between the nodes (a skeleton of that approach also follows the list).
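To make the multi-threading option a bit more concrete, here is a minimal sketch of how the per-chunk work could be handed to a fixed-size thread pool with java.util.concurrent. The SequenceChunk and compareChunk names are placeholders I made up for illustration; they are not the actual classes or methods in my comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelComparatorSketch {

    // Placeholder for whatever unit of work the comparator processes.
    static class SequenceChunk {
        final List<String> lines;
        SequenceChunk(List<String> lines) { this.lines = lines; }
    }

    // Placeholder for the per-chunk work: creating objects, inserting them
    // into a data structure, looking them up, and so on.
    static String compareChunk(SequenceChunk chunk) {
        return "processed " + chunk.lines.size() + " lines";
    }

    public static void main(String[] args) throws Exception {
        // In the real program these chunks would come from the input files.
        List<SequenceChunk> chunks = new ArrayList<>();
        chunks.add(new SequenceChunk(Arrays.asList("ACGT", "TTGA")));

        // One worker thread per core on an 8-core machine.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<String>> results = new ArrayList<>();
        for (SequenceChunk chunk : chunks) {
            results.add(pool.submit(() -> compareChunk(chunk)));
        }
        for (Future<String> result : results) {
            System.out.println(result.get()); // blocks until that chunk is finished
        }
        pool.shutdown();
    }
}
```

Each thread works on its own chunk, so the only shared state to worry about is whatever structure ends up collecting the results (here, just the returned strings).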
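For comparison, here is roughly what the cluster route could look like, assuming a Java MPI binding such as MPJ Express were available on the cluster; that is an assumption on my part, and the launch details would depend on how the cluster is actually configured.

```java
import mpi.MPI;

public class MpiComparatorSketch {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);                    // start up the MPI runtime
        int rank = MPI.COMM_WORLD.Rank();  // this process's id within the job
        int size = MPI.COMM_WORLD.Size();  // total number of processes in the job

        // Each process would take the slice of the input assigned to its rank,
        // run the comparator on it, and write or send back its partial results.
        System.out.println("node " + rank + " of " + size + " starting its share of the work");

        MPI.Finalize();                    // shut down the MPI runtime
    }
}
```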
What I Need
- The implementation I choose will mostly depend on the hardware available. The lab has an 8-core processor (Octopussy) and a cluster. I’ve been using Octopussy to develop my program and have already run into a few problems with it. The amount of memory I can reserve for the Java Virtual Machine (JVM) heap maxes out at 2500 MB. Although there is usually more than 3 GB available, the JVM has to reserve the heap as one contiguous block of memory because of the address arithmetic it does on references and the Java heap space. So memory is the main issue to consider, given the large amount of data and the number of metagenome sequences being analyzed (a quick way to check the reserved heap is sketched after this list).
- The ease of implementation is another factor. A program usually has to be written with parallelization in mind, although good abstraction and method design ahead of time can make the switch much simpler, and I feel I did well enough on that front for it not to be much of a problem. In this case, the ease of implementation really comes down to how long it takes to learn either multi-threading or cluster computing.
- The entire program is written in Java, so that will also play a major role in which route I choose. If there are too many hoops to jump through just to get it running on the cluster, then I’ll probably lean in the multi-threading direction.
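As a quick sanity check on the memory ceiling, the JVM will report how much heap it actually reserved; this is just a small sketch using the standard Runtime API.

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx cap the JVM reserved at startup.
        System.out.println("max heap:   " + rt.maxMemory()   / (1024 * 1024) + " MB");
        System.out.println("total heap: " + rt.totalMemory() / (1024 * 1024) + " MB");
        System.out.println("free heap:  " + rt.freeMemory()  / (1024 * 1024) + " MB");
    }
}
```

Launched as `java -Xmx2500m HeapCheck`, the first line should come out to roughly 2500 MB (usually a bit less because of JVM overhead).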
I’m hoping to make a choice soon so I can get this rolling. I need to look into running jobs on our cluster and find out the peculiarities that come along with it. Multi-threading looks to be straightforward, requiring only that I adjust the necessary methods to accommodate the limited amount of memory available on Octopussy.