
Hail is an open-source, scalable framework for exploring and analyzing genomic data.

http://blog.cloudera.com/blog/2017/05/hail-scalable-genomics-analysis-with-spark/

Running Hail locally 

You’ll need:

  • Java 8 (loaded below with module load java/1.8)
  • Python 3 with conda (loaded below with module load python/3.5_intel)
  • a Spark 2 distribution
  • the Hail distribution (hail.zip)

After downloading, edit and run the bash commands below to set up the Hail environment variables. You may want to add the export lines to a dot-file such as your .bashrc or .profile so that you don’t need to rerun these commands in each new session.

Here, fill in the path to the untarred Spark package.

Code Block
$ export SPARK_HOME=<path to spark> 

Unzip the Hail distribution.

Code Block
$ unzip <path to hail.zip>

Here, fill in the path to the unzipped Hail distribution.

Code Block
$ export HAIL_HOME=<path to hail> 
$ export PATH=$PATH:$HAIL_HOME/bin/
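
For example, to persist these settings as suggested above (a sketch assuming a bash shell and ~/.bashrc; substitute your actual paths and preferred dot-file):

Code Block
$ echo 'export SPARK_HOME=<path to spark>' >> ~/.bashrc
$ echo 'export HAIL_HOME=<path to hail>' >> ~/.bashrc
$ echo 'export PATH=$PATH:$HAIL_HOME/bin/' >> ~/.bashrc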

To install Python dependencies, create a conda environment for Hail:

Code Block
$ module load python/3.5_intel
$ module load java/1.8
$ conda env create -n hail -f $HAIL_HOME/python/hail/environment.yml
$ source activate hail

Start Hail

Code Block
$ jhail
>>> import hail as hl
>>> import hail.expr.aggregators as agg
>>> hl.init()

If the above cell ran without error, you can get started!

Once you’ve set up Hail, we recommend that you run the Python tutorials to get an overview of Hail functionality and learn about the powerful query language. To try Hail out, start a Jupyter Notebook server in the tutorials directory.
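
For example (a sketch assuming the tutorials ship in $HAIL_HOME/tutorials; check your distribution for the exact location):

Code Block
$ cd $HAIL_HOME/tutorials   # assumes the tutorials directory ships with the distribution
$ jhail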

You can click on the “01-genome-wide-association-study” notebook to get started!

In the future, if you want to run:

  • Hail in Python, use hail
  • Hail in IPython, use ihail
  • Hail in a Jupyter Notebook, use jhail

Hail will not import correctly from a normal Python interpreter, a normal IPython interpreter, or a normal Jupyter Notebook; use the wrappers above.

BLAS and LAPACK

Hail depends on the native BLAS and LAPACK linear-algebra libraries. If they are missing from your environment, install them into the conda environment from conda-forge:

Code Block
$ conda install -c conda-forge blas
$ conda install -c conda-forge lapack

Building Hail from source

You’ll need:

  • Java 8
  • a C++ compiler and CMake (here gcc 7)
  • Git
  • a Spark 2 distribution

Load the build tools with environment modules, then clone and build Hail:

Code Block
$ module load cmake gcc/7 git
$ git clone --branch 0.1 https://github.com/broadinstitute/hail.git
$ cd hail
$ ./gradlew -Dspark.version=2.0.2 shadowJar

Set the following environment variables by filling in the paths to the Spark and Hail directories below, and export all four of them (consider adding them to your .bashrc or .profile):

Code Block
$ export SPARK_HOME=<path to spark>
$ export HAIL_HOME=<path to hail>
$ export PATH=$PATH:$HAIL_HOME/bin/
$ export SPARK_CLASSPATH=$HAIL_HOME/build/libs/hail-all-spark.jar
$ export PYTHONPATH="$PYTHONPATH:$HAIL_HOME/build/distributions/hail-python.zip:$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip"
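
With these variables exported, a plain Python interpreter can find the Hail package. A quick sanity check (this assumes build/distributions/hail-python.zip exists; the archiveZip target below creates it):

Code Block
$ python -c "import hail"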

Running on a Spark cluster

Hail can run on any cluster that has Spark 2 installed.

Build the Hail jar and a zip of the Hail Python files to ship to the cluster:

Code Block
$ ./gradlew -Dspark.version=2.0.2 shadowJar archiveZip
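
You can then open an interactive shell with the Hail artifacts attached (a sketch; the exact pyspark invocation, master URL, and Spark configuration depend on your cluster):

Code Block
$ pyspark --jars $HAIL_HOME/build/libs/hail-all-spark.jar \
    --py-files $HAIL_HOME/build/distributions/hail-python.zip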

Within the interactive shell, check that you can create a HailContext by running the following commands. Note that you have to pass in the existing SparkContext instance sc to the HailContext constructor.

Code Block
>>> from hail import *
>>> hc = HailContext(sc)
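
From there you can start loading data; for example (a sketch using the Hail 0.1 API; the VCF path is hypothetical):

Code Block
>>> # hypothetical input file; substitute a VCF you have
>>> vds = hc.import_vcf('/path/to/sample.vcf.bgz')
>>> vds.count()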