Tag Archives: python

Creating a conda environment for GPU programming with pytorch and tensorflow

After a few mis-steps, here is how I set up a conda environment to use in Jupyter with tensorflow and pytorch running on the GPU.

As a note, I do this on the node with the GPU, so that things (hopefully) compile correctly!

1. Create an environment

First, create a new environment for tensorflow and friends, and activate it.

mamba create -n gpu_notebook cudatoolkit tensorflow nvidia::cuda pytorch torchvision torchaudio pytorch-cuda -c pytorch -c nvidia
mamba activate gpu_notebook

2. Install the python libraries

Install the usual suspect python packages that you will probably want to use. For convenience, I usually put these in a file in my Git repo called requirements.txt.

$ cat requirements.txt 
jupyter
matplotlib
natsort
numpy
pandas
scipy
scikit-learn
seaborn
statsmodels

$ pip install -r requirements.txt
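
If you want to confirm that everything landed in the environment, here is a minimal sketch that asks Python which of those packages are installed. The helper name missing_packages is my own invention for illustration; it relies only on the standard library's importlib.metadata.

```python
from importlib import metadata


def missing_packages(requirements):
    """Return the distribution names from `requirements` that are not installed."""
    missing = []
    for name in requirements:
        try:
            # raises PackageNotFoundError if the distribution is not installed
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# the same names that are listed in requirements.txt
wanted = [
    "jupyter", "matplotlib", "natsort", "numpy", "pandas",
    "scipy", "scikit-learn", "seaborn", "statsmodels",
]
print("Missing packages:", missing_packages(wanted) or "none")
```

Run it with the python from the gpu_notebook environment, otherwise you are checking a different installation.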

3. Rename your jupyter kernel

When you open jupyter there is a list of kernels that you can connect to. (If you have a notebook open, that list is at the top right.) Renaming your jupyter kernel makes it much easier to find the kernel associated with this conda environment. The default name is something like Python 3, which is not helpful if you have lots of them!

a. Find where your kernel is installed

This command lists your jupyter kernels:

jupyter kernelspec list

You’ll see your kernel(s) and their locations. Each listed location contains a file called kernel.json.

b. Edit that file:

vi $HOME/miniconda3/envs/gpu_notebook/share/jupyter/kernels/python3/kernel.json

c. Change the name to be meaningful

Change the value associated with the display_name key. Set it to something meaningful so you can find it in your browser!
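
If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch, assuming kernel.json is ordinary JSON with a display_name key as described above; the function name rename_kernel and the example name in the comment are hypothetical.

```python
import json
from pathlib import Path


def rename_kernel(kernel_json, new_name):
    """Set the display_name in a Jupyter kernel.json file and return the new spec."""
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    # only the display name changes; argv, language, etc. are left untouched
    spec["display_name"] = new_name
    path.write_text(json.dumps(spec, indent=1) + "\n")
    return spec


# Example call (the path is the one from step b; adjust it to your own installation):
# rename_kernel(
#     Path.home() / "miniconda3/envs/gpu_notebook/share/jupyter/kernels/python3/kernel.json",
#     "Python 3 (gpu_notebook)",
# )
```

You will probably need to refresh the Jupyter page in your browser before the new name shows up.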

4. Set up the XLA_FLAGS environment variable.

This was essential for me to get tensorflow working. There is a directory somewhere in your conda environment with the libdevice library that is needed. For my installation that was in nvvm/libdevice/libdevice.10.bc. Of course you can find yours with:

find ~/miniconda3/ -name libdevice

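You want to set the XLA_FLAGS variable to point to the directory that contains the nvvm folder (for me, that is the root of the conda environment). With the environment activated, this command stores the variable inside the conda environment, so it is always set when the conda environment is activated, and unset when it is deactivated.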

conda env config vars set XLA_FLAGS=--xla_gpu_cuda_data_dir=$HOME/miniconda3/envs/gpu_notebook
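
If you are not sure which directory to use, here is a minimal sketch that searches for the libdevice file and works out the matching flag. It assumes the nvvm/libdevice/libdevice.*.bc layout I described above; the function names find_cuda_data_dir and xla_flags_for are hypothetical helpers, not part of conda or tensorflow.

```python
from pathlib import Path


def find_cuda_data_dir(root):
    """Return the directory that contains nvvm/libdevice/libdevice*.bc, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    for bitcode in sorted(root.rglob("libdevice*.bc")):
        libdevice_dir = bitcode.parent
        # we want <data_dir>/nvvm/libdevice/libdevice.NN.bc
        if libdevice_dir.name == "libdevice" and libdevice_dir.parent.name == "nvvm":
            return libdevice_dir.parent.parent
    return None


def xla_flags_for(root):
    """Build the XLA_FLAGS value for the CUDA data directory found under root."""
    data_dir = find_cuda_data_dir(root)
    if data_dir is None:
        return None
    return f"--xla_gpu_cuda_data_dir={data_dir}"


# Example (adjust the path to your own installation):
# print(xla_flags_for(Path.home() / "miniconda3" / "envs" / "gpu_notebook"))
```

The printed value is what goes after XLA_FLAGS= in the conda env config vars set command.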

5. Activate the environment

Reactivate the environment so that the new XLA_FLAGS setting takes effect, and don’t forget to submit your jobs to a node with GPU capabilities!
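
Once you are on a GPU node with the environment active, a quick sanity check is to ask both libraries whether they can see the GPU. This is a minimal sketch; the function name gpu_report is hypothetical, and it reports None for a library that cannot be imported rather than failing.

```python
def gpu_report():
    """Report whether pytorch and tensorflow can see a GPU (None = not importable)."""
    report = {}
    try:
        import torch
        report["pytorch"] = bool(torch.cuda.is_available())
    except ImportError:
        report["pytorch"] = None
    try:
        import tensorflow as tf
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        report["tensorflow"] = None
    return report


report = gpu_report()
for library, has_gpu in report.items():
    print(f"{library}: GPU visible = {has_gpu}")
```

If either line says False on a node that definitely has a GPU, the environment (or XLA_FLAGS) is the first place I would look.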

statsmodels.mixedlm Singular Matrix error

When building linear mixed models with Python’s statsmodels module, I repeatedly, and often inexplicably, ran into np.linalg.LinAlgError: Singular matrix errors.

There are a couple of things to check for with these errors:

First, drop any rows where there are NaN values for the predictors:

e.g. if your predictors are in a list called predictors, try this

df = df.dropna(subset=predictors)

Second, remove any columns whose sum is zero:

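# sum only the numeric columns, then drop the ones that sum to zero
col_sums = df.sum(axis=0, numeric_only=True)
to_drop = list(col_sums[col_sums == 0].index)
df = df.drop(columns=to_drop)  # drop() returns a new data frame, so keep the result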

Third, now that you have dropped columns, make sure your list of predictors only contains columns that are still in the data frame. Something like

updated_predictors = list(set(predictors).intersection(set(df.columns)))
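
Putting those three checks together, here is a minimal sketch of a helper that cleans the data frame in one go. The function name clean_for_mixedlm and the toy data are made up for illustration. Note two small differences from the snippets above, both my own choices: it only looks at the predictor columns when checking for zero sums (so the response column is never dropped), and it keeps the predictors in their original order.

```python
import pandas as pd


def clean_for_mixedlm(df, predictors):
    """Drop NaN rows and all-zero predictor columns; return (df, surviving predictors)."""
    # 1. drop any rows with missing values in the predictors
    df = df.dropna(subset=predictors)
    # 2. drop numeric predictor columns whose sum is zero
    col_sums = df[predictors].sum(axis=0, numeric_only=True)
    to_drop = list(col_sums[col_sums == 0].index)
    df = df.drop(columns=to_drop)
    # 3. keep only the predictors that are still in the data frame
    updated_predictors = [p for p in predictors if p in df.columns]
    return df, updated_predictors


# toy data, purely illustrative
toy = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "a": [1, 0, 2, None],
    "b": [0, 0, 0, 0],
    "group": ["g1", "g1", "g2", "g2"],
})
cleaned, updated = clean_for_mixedlm(toy, ["a", "b"])
print(updated)
print(cleaned)
```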

Finally, when all that doesn’t work, you should try different methods to fit the model. These are the methods I currently use, and I try them in this order and save the results for the first one that completes.

import sys
import numpy as np

results = None
for meth in ('bfgs', 'lbfgs', 'cg', 'powell', 'nm'):
    try:
        # keep the first fit that completes
        results = model.fit(method=meth)
        print(f"Method {meth} PASSED", file=sys.stderr)
        break
    except np.linalg.LinAlgError:
        print(f"Method {meth} failed", file=sys.stderr)
if results is not None:
    print(results.summary())

Converting phyloseq objects to read in other languages

Phyloseq is an R package for microbiome analysis that incorporates several data types.

Occasionally our colleagues share a phyloseq object with us as an .rds file (R Data Serialization format). It is quite simple to convert that for use in other languages (e.g. Python or even Excel!).

Converting the data to .tsv format

This approach requires an R installation somewhere, but we don’t need many commands, so you can probably use a remote R installation on a server!

If you have not yet installed phyloseq, you can do so with bioconductor:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("phyloseq")

Next, we load the phyloseq package and read the .rds file:

library("phyloseq");
packageVersion("phyloseq"); # check the version we are using
# you may need to use setwd("C:/Users/username/Downloads") to move to wherever you downloaded the file!
p <- readRDS("phyloseq.rds"); # change the filename here! 
print(p)

This will print typical output from a phyloseq object like:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 3210 taxa and 11 samples ]
sample_data() Sample Data:       [ 11 samples by 12 sample variables ]
tax_table()   Taxonomy Table:    [ 3210 taxa by 7 taxonomic ranks ]

These are our base phyloseq objects, and we can explore them:

print(otu_table(p))
print(sample_data(p))

And we can also write them to tab separated text in a .tsv file:

write.table(otu_table(p), "p_otu.tsv", sep="\t")
write.table(sample_data(p), "p_sample.tsv", sep="\t")
write.table(tax_table(p), "p_tax.tsv", sep="\t")

Read those files into Python

You can now use pandas to read those files into Python:

import pandas as pd

otu = pd.read_csv("p_otu.tsv", sep="\t")
otu

# sometimes the sample metadata has characters that can't be read using `utf-8` so we have to use `latin-1`
samples = pd.read_csv("p_sample.tsv", sep="\t", encoding='latin-1')
samples

tax = pd.read_csv("p_tax.tsv", sep="\t")
tax
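
Once the tables are loaded, a common next step is to attach the taxonomy to the counts. Here is a minimal sketch, assuming the taxa are the rows of the OTU table (phyloseq objects can also store them as columns, in which case you would transpose first). The helper names read_r_table and add_taxonomy are hypothetical, and the two small tables are invented toy data laid out the way write.table writes them: the header has one fewer field than the data rows, so pandas uses the first column as the row names.

```python
import io

import pandas as pd


def read_r_table(source, **kwargs):
    """Read a tab-separated table written by R's write.table."""
    return pd.read_csv(source, sep="\t", **kwargs)


def add_taxonomy(otu, tax):
    """Join the taxonomy columns onto the OTU counts using the shared taxon IDs."""
    return otu.join(tax, how="left")


# toy stand-ins for p_otu.tsv and p_tax.tsv (made-up values)
otu_text = '"S1"\t"S2"\n"OTU1"\t5\t0\n"OTU2"\t3\t7\n'
tax_text = (
    '"Kingdom"\t"Phylum"\n'
    '"OTU1"\t"Bacteria"\t"Firmicutes"\n'
    '"OTU2"\t"Bacteria"\t"Bacteroidetes"\n'
)

otu_counts = read_r_table(io.StringIO(otu_text))
taxonomy = read_r_table(io.StringIO(tax_text))
merged = add_taxonomy(otu_counts, taxonomy)
print(merged)
```

With the real files, pass the file names (and encoding='latin-1' if you need it) to read_r_table instead of the StringIO objects.
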
Global Distribution of Crassphage Map

How to make beautiful maps

Making maps is hard. Even though we’ve been making maps for hundreds of years, it is still hard. Making good looking maps is really hard. We published a map that is both beautiful and tells a story, and this is the story of how we made that map.

But a figure like this does not appear immediately; it takes work to get something to look this good, and needless to say, it wasn’t me who made it look so great!

Installing PyFBA (and necessary modules) without admin permissions

As easy as it is to install PyFBA using the pip command, it can be quite cumbersome to do so when you are working on a system where you have not been granted administrative or sudo permissions. Here is a quick guide that has worked for me when installing PyFBA on a CentOS 6.3 system running a Sun Grid Engine cluster. If you are working on a Linux system and you do have admin and sudo permissions, please follow the install guide here.

Autocompletion in default Python shell

Some Python shells, like IPython, provide autocompletion functionality while typing in code. If you’re like me and use the default shell in your terminal (the one that starts up when you just execute python), this feature isn’t automatically available. Luckily, I discovered a way to make this possible!
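
For the impatient, one common way to do this (not necessarily the approach described in the full post) uses the standard library's rlcompleter and readline modules. This is a minimal sketch; the function name enable_tab_completion is hypothetical, and readline is not available on every platform, so the sketch returns False instead of failing when it is missing. Recent Python 3 versions usually switch this on by default when readline is present.

```python
import rlcompleter  # importing this registers a completer with readline, if available


def enable_tab_completion():
    """Bind the tab key to completion; return False if readline is unavailable."""
    try:
        import readline
    except ImportError:
        return False
    if "libedit" in (readline.__doc__ or ""):
        # macOS often ships libedit instead of GNU readline, with a different syntax
        readline.parse_and_bind("bind ^I rl_complete")
    else:
        readline.parse_and_bind("tab: complete")
    return True


enabled = enable_tab_completion()
```

To have this run every time you start python, save it in a file and point the PYTHONSTARTUP environment variable at that file.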
