12.2 AlphaFold 3 (AF3)
Description
AlphaFold 3, developed by Google DeepMind and Isomorphic Labs, is an advanced AI system that predicts the 3D structures of proteins and their interactions with molecules such as DNA, RNA, ligands, and ions.
Model Parameters Terms of Use
With the release of AlphaFold 3, Google DeepMind has changed the software's licensing structure. Unlike AlphaFold 2, AlphaFold 3 is not entirely freely available. The source code remains open source, and we provide the datasets required for inference. The model parameters, which result from training the AlphaFold 3 model, are required to run the inference calculations that predict molecular structures. However, they are distributed separately from the source code under their own terms of use, so you must obtain them directly from Google. LRZ users who want to run AlphaFold 3 must read and understand the following terms of use for the model parameters:
- AlphaFold 3 MODEL PARAMETERS TERMS OF USE
- AlphaFold 3 MODEL PARAMETERS PROHIBITED USE POLICY
- AlphaFold 3 OUTPUT TERMS OF USE
Obtaining the Model Parameters
To obtain a copy of the model parameters, you must complete and submit this form. Make sure you have read the terms and conditions in the form and that you can comply with them. Approval of the request is at the sole discretion of Google DeepMind; LRZ cannot help with completing the form or intervene if Google DeepMind rejects the request.
To comply with the terms of use, each AlphaFold 3 user on the LRZ must request, download, and use their own individual copy of the model parameters.
Instructions for filling out the form
- Make sure to read and understand the section titled Key Things to Know When Using the AlphaFold 3 Model Parameters and Output.
- In the first email field (labeled “Email”), enter your organisational email address.
- In the second email field (labeled “Google account email address (e.g., Gmail)”), you must enter an email address that ends with gmail.com. Entering your organizational email address in this field will result in the rejection of your request to access the model parameters. If you do not have a Gmail address, please visit gmail.com to create one.
- In the field labeled “URL of public-facing website for non-commercial organization,” enter your organization's website.
- Once all the fields on the first page are completed, click the Next button.
- On the second page, you will find a single question: "Do you intend to provide access to the AlphaFold 3 model parameters to other researchers within your non-commercial organization? (E.g. as part of a centrally managed computing cluster)" Select No as your answer.
- Click Next, read the text on the final page, and if you fully understand the terms of use, complete the form and submit it.
- The response time for form submission can vary from a few hours to several days.
Download the Model Parameters
You will receive two emails. The first arrives shortly after you submit the form and acknowledges receipt of your request. After Google approves your request, you will receive a second email within a few hours to several days. It contains a link to a Google Drive page from which you can download the model parameters as a single compressed file (af3.bin.zstd), approximately 1 GB in size.
Please be aware that the download link expires 7 days after you receive the email. Make sure to store the model parameters in a secure location, for example your home directory on the LRZ, within this time frame.
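The folder layout used by the job scripts later in this guide can be prepared as soon as the file is downloaded. A minimal sketch; the download location ~/af3.bin.zstd is an assumption, so adjust it to wherever your browser saved the file:

```shell
# Create the directory layout expected by the job scripts in this guide
mkdir -p "$HOME/alphafold3/af_input" \
         "$HOME/alphafold3/af_output" \
         "$HOME/alphafold3/model_parameter"

# Move the downloaded parameters file into place, if it is present
# (~/af3.bin.zstd is an assumed download location -- adjust as needed)
if [ -f "$HOME/af3.bin.zstd" ]; then
    mv "$HOME/af3.bin.zstd" "$HOME/alphafold3/model_parameter/"
fi
```

Remember that each user must keep their own individual copy of the parameters, per the terms of use.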
Running AlphaFold 3
The AlphaFold 3 model prediction process involves two steps:
- Data Pipeline & Database Search (CPU-Only)
- This step runs entirely on CPUs and does not require a GPU.
- It is both memory- and CPU-intensive and may take several hours to complete.
- Model Inference (GPU Required)
- This step requires a GPU.
- Only NVIDIA A100 and H100 GPUs with 80 GB of GPU memory are officially supported.
- Older or lower-memory GPUs, such as V100, P100, or A100 slices (20 GB), can be used, but they will require smaller input sizes.
- The AlphaFold 3 documentation offers strategies you can use to run with larger inputs.
To run it on a node with older GPUs, please refer to the section Running With Older GPUs below.
You can run AlphaFold 3 in two ways:
- As a Batch Job
- As an Interactive Job
Below, you will find examples for both. Please adapt them as needed to your workflow.
Please note that the LRZ AI systems do not officially support Docker containers; instead, we use Enroot containers for AlphaFold 3. The Enroot container is built from the Dockerfile. Further information regarding Enroot containers on the LRZ AI systems can be found under Enroot Container.
Releases
- This document uses the Enroot container image for AlphaFold 3 based on version v3.0.1, named alphafold3_v3-0-1.sqsh.
- See the ENROOT_IMAGE variable in the af3.sh script for an example of the Enroot container image name.
- For more details, refer to 12.1 Available Enroot Container Images.
- If a new version is provided by LRZ and you want to use it, please update the Enroot container image name accordingly.
Batch Job
The script below provides an example of running AlphaFold 3 as a batch job. You can save the script below as af3.sh.
#!/bin/bash
#SBATCH --job-name=af3                 # Specify a name for the job
#SBATCH --partition=lrz-hgx-h100-94x4  # Specify the partition
#SBATCH --gres=gpu:1                   # Request 1 GPU
#SBATCH --cpus-per-task=20             # Number of CPU cores per task
#SBATCH -o logs/job_output_%j.out      # Standard output log with job ID
#SBATCH -e logs/job_error_%j.err       # Error log with job ID

# Create the log directory if it does not exist
mkdir -p logs

# Do not change these paths:
# Path to the dataset and the Enroot image directory.
export DATABASE_DIR=/dss/dssfs04/pn69za/pn69za-dss-0004/datasets/alphafold3
export ENROOT_IMAGE=/dss/dssfs04/pn69za/pn69za-dss-0004/containers/shared/alphafold3/alphafold3_v3-0-1.sqsh

# User-defined locations (can be modified):
# Path to the model directory that you downloaded
export MODEL_DIR=$HOME/alphafold3/model_parameter
# Directory containing the input JSON file(s) (single or multiple files)
export INPUT_DIR=$HOME/alphafold3/af_input
# Directory where the output will be written
export OUTPUT_DIR=$HOME/alphafold3/af_output

# Run AlphaFold 3 inside the Enroot container with the specified mounts
srun --container-mounts=$MODEL_DIR:/workspace/models,$DATABASE_DIR:/workspace/public_databases,$INPUT_DIR:/workspace/af_input,$OUTPUT_DIR:/workspace/af_output \
     --container-image=$ENROOT_IMAGE \
     python3 run_alphafold.py \
     --json_path=/workspace/af_input/fold_input.json \
     --model_dir=/workspace/models \
     --output_dir=/workspace/af_output

# Print completion message
echo 'AlphaFold3 execution completed.'
A convenient way to run the script is to create the folder structure shown below.
alphafold3
├── af3.sh
├── af_input
│   └── fold_input.json
├── af_output
└── model_parameter
    └── af3.bin.zstd
Please follow the instructions provided to complete this setup:
- Create a folder named alphafold3 in your home directory.
- Inside this folder, create the af3.sh script as shown above.
- Inside alphafold3, create three folders:
- The first, named af_input, will contain your input files.
- The second, named af_output, will be where the output will be written.
- The third, named model_parameter, is where you should place the model parameter file (af3.bin.zstd), which you downloaded by following the instructions in the Google Form.
- Inside the af_input folder you created, place your input file, which can either be a single JSON file or a directory containing multiple JSON files.
- For a single JSON file, use the --json_path flag followed by the path to the file.
- For multiple JSON files, use the --input_dir flag followed by the path to the directory containing the JSON files.
- Please note that if you want to run with multiple JSON files, you must change the corresponding flag in the af3.sh script: instead of --json_path=/workspace/af_input/fold_input.json for a single JSON file, use --input_dir=/workspace/af_input/ to specify the directory containing the JSON files.
- Further details about the input file format can be found under AlphaFold 3 Input.
- Go to the alphafold3 folder on the login node and run the following command:
sbatch af3.sh
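For reference, a minimal single-protein fold_input.json might look like the following. The job name and amino-acid sequence are placeholders; see the AlphaFold 3 Input documentation for the full schema, including DNA, RNA, ligand, and ion entries:

```json
{
  "name": "example_job",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MVLSPADKTNVKAAW"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
```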
Interactive Job
AlphaFold 3 can also be run via an interactive job. You can follow the instructions below to submit interactive jobs.
First, allocate one of the available partitions from the login node, as shown below. In this example, the lrz-hgx-h100-94x4 partition is allocated.
salloc --partition lrz-hgx-h100-94x4 --job-name af3 --gres=gpu:1 --cpus-per-gpu=20
- If no run time is specified for an interactive job, it will run for one hour by default! Once the allocation expires, the program will be signalled and killed, and the interactive job is no longer available for starting further programs.
- If you specify a runtime for your interactive job and it finishes earlier than the allocated time or if you disconnect from the terminal, the interactive session will remain active until the specified time expires. This may lead to unnecessary resource consumption and potentially impact system availability. Therefore, once your job is complete and you do not intend to use the allocated partition, remember to terminate your interactive session.
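For example, an explicit time limit can be set with the --time option, and the session can be released as soon as you are done. The two-hour value below is only an illustration, and <jobid> is a placeholder for your allocation's job ID:

```
# Request the allocation with an explicit time limit (example value)
salloc --partition lrz-hgx-h100-94x4 --job-name af3 --gres=gpu:1 --cpus-per-gpu=20 --time=02:00:00

# When finished, release the allocation explicitly
exit             # leave the interactive shell, or
scancel <jobid>  # cancel the allocation by job ID if it is still active
```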
Once the partition is allocated, you will be placed on a compute node. Next, start an interactive batch session on the compute node using the following command.
srun --pty bash
After the interactive session starts, create a container named alphafold3 from the Enroot image alphafold3_v3-0-1.sqsh by running the following command.
ENROOT_IMAGE=/dss/dssfs04/pn69za/pn69za-dss-0004/containers/shared/alphafold3/alphafold3_v3-0-1.sqsh
enroot create --name alphafold3 $ENROOT_IMAGE
Finally, use the command below to start the container and run AlphaFold 3.
DATABASE_DIR=/dss/dssfs04/pn69za/pn69za-dss-0004/datasets/alphafold3
MODEL_DIR=$HOME/alphafold3/model_parameter
INPUT_DIR=$HOME/alphafold3/af_input
OUTPUT_DIR=$HOME/alphafold3/af_output

enroot start \
  --mount $MODEL_DIR:/workspace/models \
  --mount $DATABASE_DIR:/workspace/public_databases \
  --mount $INPUT_DIR:/workspace/af_input \
  --mount $OUTPUT_DIR:/workspace/af_output \
  alphafold3 \
  python3 run_alphafold.py \
  --json_path=/workspace/af_input/fold_input.json \
  --model_dir=/workspace/models \
  --output_dir=/workspace/af_output
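After the run, the container created above can be inspected and removed with standard Enroot commands; a short sketch:

```
enroot list                       # show containers created on this node
enroot remove --force alphafold3  # delete the alphafold3 container
```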
Running With Older GPUs
All GPUs with CUDA compute capability 7.x (e.g., V100) produce incorrect outputs with many clashing residues, leading to a ranking score of -99 or lower. To avoid this issue, set the environment variable XLA_FLAGS to include the following flag:
XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
This flag tells XLA to skip the "custom-kernel-fusion-rewriter" high-level optimizer pass, preventing the kernel fusion that causes incorrect outputs on older GPUs. Make sure to include this flag in your setup to ensure correct behavior on compute capability 7.x GPUs.
Moreover, AlphaFold 3 uses Flash Attention, an optimized attention mechanism that improves both speed and memory efficiency. In newer versions, Flash Attention leverages Triton, which speeds up operations through custom GPU kernels. However, Triton is not supported on all GPU architectures, particularly those with a compute capability lower than 8.0, so you may encounter errors such as "Triton Flash Attention is unsupported on this GPU generation" on older GPUs. To avoid this issue, use the following flag, which switches to the XLA (Accelerated Linear Algebra) attention backend, which is compatible with a wider range of GPUs.
--flash_attention_implementation=xla
You can find full examples for both batch and interactive jobs on older GPUs, combining the XLA_FLAGS setting with the XLA Flash Attention backend, in the sections below: Running Batch Job on Older GPUs and Running Interactive Job on Older GPUs.
Running Batch Job on Older GPUs
#!/bin/bash
#SBATCH --job-name=af3                 # Specify a name for the job
#SBATCH --partition=lrz-dgx-1-v100x8   # Specify the partition
#SBATCH --gres=gpu:1                   # Request 1 GPU
#SBATCH --cpus-per-task=20             # Number of CPU cores per task
#SBATCH -o logs/job_output_%j.out      # Standard output log with job ID
#SBATCH -e logs/job_error_%j.err       # Error log with job ID

# Create the log directory
mkdir -p logs

# Do not change these paths:
# Path to the dataset and the Enroot image directory.
export DATABASE_DIR=/dss/dssfs04/pn69za/pn69za-dss-0004/datasets/alphafold3
export ENROOT_IMAGE=/dss/dssfs04/pn69za/pn69za-dss-0004/containers/shared/alphafold3/alphafold3_v3-0-1.sqsh

# User-defined locations (can be modified):
# Path to the model directory that you downloaded
export MODEL_DIR=$HOME/alphafold3/model_parameter
# Directory containing the input JSON file(s) (single or multiple files)
export INPUT_DIR=$HOME/alphafold3/af_input
# Directory where the output will be written
export OUTPUT_DIR=$HOME/alphafold3/af_output

# To run on older GPUs
export XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"

# Run AlphaFold 3 inside the Enroot container with the specified mounts
srun --container-mounts=$MODEL_DIR:/workspace/models,$DATABASE_DIR:/workspace/public_databases,$INPUT_DIR:/workspace/af_input,$OUTPUT_DIR:/workspace/af_output \
     --container-image=$ENROOT_IMAGE \
     --container-env=XLA_FLAGS \
     python3 run_alphafold.py \
     --json_path=/workspace/af_input/fold_input.json \
     --model_dir=/workspace/models \
     --output_dir=/workspace/af_output \
     --flash_attention_implementation=xla

# Print completion message
echo 'AlphaFold3 execution completed.'
Running Interactive Job on Older GPUs
DATABASE_DIR=/dss/dssfs04/pn69za/pn69za-dss-0004/datasets/alphafold3
MODEL_DIR=$HOME/alphafold3/model_parameter
INPUT_DIR=$HOME/alphafold3/af_input
OUTPUT_DIR=$HOME/alphafold3/af_output

# To run on older GPUs
export XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"

enroot start \
  --mount $MODEL_DIR:/workspace/models \
  --mount $DATABASE_DIR:/workspace/public_databases \
  --mount $INPUT_DIR:/workspace/af_input \
  --mount $OUTPUT_DIR:/workspace/af_output \
  --env XLA_FLAGS \
  alphafold3 \
  python3 run_alphafold.py \
  --json_path=/workspace/af_input/fold_input.json \
  --model_dir=/workspace/models \
  --output_dir=/workspace/af_output \
  --flash_attention_implementation=xla