Thank you for showing where the files were downloaded!
I never comment, but honestly, thank you so much for this! It was so straightforward, and I was about to give up completely on AlphaFold!
Glad you found the video useful. ChimeraX uses ColabFold, which is an optimized version of AlphaFold. You can also run ColabFold without ChimeraX from its web page: colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb
Setting up AlphaFold on your own computer is a lot of trouble. It requires Linux, Docker, about 3 TB of free disk space, days to download the databases, and preferably a high-end Nvidia GPU, and even after all that work it is about 10 times slower than ColabFold. So thanks should go to the developers of ColabFold.
Thank you so much, this was very helpful for my project. It took a whopping 22 minutes for me, but I believe faster versions will pop up soon.
AlphaFold gives me 5 models. Which one should I treat as the best model for my query protein?
Hi. First, thanks for all of the helpful videos that ChimeraX puts out on using this great software. I really enjoy using it, especially with the ColabFold features added in. Second, is there a way to access the sequence alignments that ColabFold finds for your protein sequences at the beginning of the structure prediction? It would be nice to see which organisms these alignments come from and to analyze the sequence variation in greater detail.
I was trying to predict a dimer of a relatively long protein (875 aa) but got the following error:
19:39:12 Could not predict af1750. Not Enough GPU memory? INTERNAL: CUBLAS_STATUS_EXECUTION_FAILED
19:39:12 Done
Downloading structure predictions to directory Downloads/ChimeraX/AlphaFold
cp: cannot stat '*_relaxed_rank_1_model_*.pdb': No such file or directory
cp: cannot stat '*_unrelaxed_rank_1_model_*_scores.json': No such file or directory
Is there a way to resolve this error? I am using a Google Colab subscription.
Thanks.
You are predicting 1750 residues (two copies of 875), which is too large. As the error message suggests, there is not enough GPU memory. Unfortunately the older GPUs that Google Colab provides (16 GB of memory) often run out of memory above roughly 1000 amino acids. Running larger structures requires installing AlphaFold yourself on a Linux machine with a high-end GPU, which is not an easy task. Here are some benchmarks for AlphaFold on large sequences: www.rbvi.ucsf.edu/chimerax/data/alphafold-jan2022/afspeed.html
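As a rough pre-flight check, the arithmetic behind that limit can be sketched in Python. This is only an illustration: the names `total_residues`, `likely_too_large` and `APPROX_COLAB_LIMIT` are hypothetical and not part of ChimeraX or ColabFold, the 1000-residue figure is the approximate number quoted above rather than a hard cutoff, and ignoring whitespace is a choice made by this helper.

```python
# Approximate limit quoted above for 16 GB Google Colab GPUs.
# It is an estimate from this thread, not a guaranteed cutoff.
APPROX_COLAB_LIMIT = 1000


def total_residues(sequences: str) -> int:
    """Sum the lengths of comma-separated protein sequences, ignoring whitespace."""
    chains = ["".join(chain.split()) for chain in sequences.split(",")]
    return sum(len(chain) for chain in chains)


def likely_too_large(sequences: str, limit: int = APPROX_COLAB_LIMIT) -> bool:
    """Return True when the combined length exceeds the approximate limit."""
    return total_residues(sequences) > limit


if __name__ == "__main__":
    monomer = "A" * 875              # placeholder sequence, not a real protein
    dimer = monomer + "," + monomer  # two chains, as in the failed af1750 run
    print(total_residues(dimer))     # 1750
    print(likely_too_large(dimer))   # True
```

A dimer counts both copies, which is why an 875-residue protein ends up well past the approximate limit.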
Do you need to add a space or new line between the commas?
Hi, very nice explanations indeed. I know this video is not new, but I only saw it today. I have a question: I use ChimeraX version 1.6.1 (2023-05-09) and would like to know how to run ColabFold (now at version 1.5.2). There is only the alphafold command in this version, and at the beginning of your video you said that you modified ChimeraX to do this. So my naive question is how to do that. Thanks a lot, Didier
I want to predict my vaccine structure. Do I also put the adjuvant sequence or just the epitope protein from MHC 1 AND 2? Any response is appreciated.
Thanks so much for this helpful video and clear explanation! I just wonder why you predicted two proteins. Do you think this can predict ligand binding, for example to see whether or not a ligand binds to the target protein? Many thanks
Most proteins function as part of multi-protein complexes, so it is useful to predict complexes of more than one protein. AlphaFold cannot make predictions with ligands, ions, solvent, or nucleic acids. It only handles the 20 standard (unmodified) amino acids.
My protein is more than 3000 amino acids long, and its full PDB structure is not available. Which server or tool would you recommend? Please reply.
Hi, is there any way to pinpoint the specific residues at which the proteins are interacting?
If you predict a multiprotein complex the interface between the proteins often contains tens to hundreds of residues. Lots of ChimeraX tools help you look at that, such as the Contacts tool and Interfaces tool. There is documentation online.
Thank you for the tutorial. I wonder how I can display the protein by prediction confidence. I just displayed it by chain and I wanted to reset it to displaying by prediction confidence.
The AlphaFold pLDDT per-residue confidence values are in the B-factor column of the PDB file. You can color the ribbon by confidence using the ChimeraX command "color bfactor palette alphafold". The AlphaFold Error Plot panel (menu Tools / Structure Prediction), which shows the predicted aligned error, also has a "Color pLDDT" button that does the same thing.
More info about coloring by PAE and pLDDT is here: www.rbvi.ucsf.edu/chimerax/data/pae-apr2022/pae.html
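For anyone who wants the numbers rather than the colors, here is a minimal sketch of reading pLDDT from the B-factor column of a predicted PDB file. It assumes standard fixed-column ATOM records and uses the CA atom as the representative for each residue. The function names and the demo helper `format_atom_line` are hypothetical, and the confidence bands follow the commonly cited AlphaFold Database convention (above 90, 70 to 90, 50 to 70, below 50), not anything ChimeraX-specific.

```python
def plddt_by_residue(pdb_text: str) -> dict:
    """Map (chain ID, residue number) to the pLDDT stored in the B-factor column."""
    scores = {}
    for line in pdb_text.splitlines():
        # One atom per residue is enough: take the alpha carbon (atom name, columns 13-16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]                               # column 22
            resnum = int(line[22:26])                      # columns 23-26
            scores[(chain, resnum)] = float(line[60:66])   # columns 61-66
    return scores


def confidence_band(plddt: float) -> str:
    """Label a pLDDT value using the AlphaFold Database style bands (assumed here)."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def format_atom_line(serial, name, resname, chain, resnum, bfactor):
    """Hypothetical demo helper: build a fixed-column ATOM record with dummy coordinates."""
    return ("ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
            .format(serial, name, resname, chain, resnum, 0.0, 0.0, 0.0, 1.0, bfactor))


if __name__ == "__main__":
    demo = "\n".join([
        format_atom_line(1, " N  ", "MET", "A", 1, 91.5),
        format_atom_line(2, " CA ", "MET", "A", 1, 91.5),
        format_atom_line(3, " CA ", "LYS", "A", 2, 42.0),
    ])
    for residue, score in plddt_by_residue(demo).items():
        print(residue, score, confidence_band(score))
```

To use it on a real prediction, read one of the downloaded PDB files as text and pass the contents to `plddt_by_residue`.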
Thanks for this video. I wonder which version of ChimeraX is used in this video.
The first line of the ChimeraX Log panel in the video shows that the version is an August 15, 2022 daily build. But all ChimeraX versions 1.4 and newer use ColabFold. ChimeraX loads the Python script from GitHub (github.com/RBVI/ChimeraX/blob/develop/src/bundles/alphafold/src/alphafold21_predict_colab.ipynb) each time you run an AlphaFold prediction. That script is periodically updated, and every ChimeraX version from 1.4 onward uses it.
What is use_amber among the functions of ColabFold? And in the pseudocode "to specify inter-protein chainbreaks for modeling complexes", does "chainbreaks" mean chain damage such as an SS break?
The ChimeraX AlphaFold user interface does not have a use_amber flag, so I guess you are looking at ColabFold code. It probably means energy-minimizing the predicted structures with the Amber force field using OpenMM. The ChimeraX option that enables this is in the AlphaFold panel: press Options, and it is called "Energy-minimize predicted structures". It is off by default.
There are many Colab notebooks. Which one is best for predicting protein complexes, and which Google Colab notebook does ChimeraX use?
ChimeraX uses ColabFold, which is an optimized version of AlphaFold that is about 10 times faster. It also uses enhanced sequence databases. This is described in the ChimeraX documentation; a Google search for "chimerax alphafold" will find it, including a reference to the ColabFold journal article.
I'm running ChimeraX version 1.5 (2022-11-24), but it launches alphafold21_predict_colab.ipynb, as in your other video ("Running AlphaFold to Predict Protein Complexes from ChimeraX"), and not colabfold_predict.ipynb as shown here. Do you know how to change this? Thanks!
All versions of ChimeraX use the same version of ColabFold. The alphafold21_predict_colab.ipynb notebook that runs the calculation on Google Colab is identical to colabfold_predict.ipynb. Also, ChimeraX fetches this script from GitHub for each prediction, which allows me to periodically update the script to newer AlphaFold / ColabFold versions. In the next few months I hope to update to a ColabFold that uses AlphaFold 2.3, which is more memory efficient and allows larger structures to be predicted.
Hi, can you please help me with predictions for modified sequences? For example, I have a protein that has an acetyl group. How do I input that in the sequence to predict a heterodimeric structure?
AlphaFold only predicts structures containing the 20 standard amino acids.
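A quick way to see whether a sequence stays within that alphabet is to scan it for anything other than the 20 standard one-letter codes. The sketch below is only an illustration with a hypothetical function name. It flags characters such as X or U. A modification like an acetyl group has no one-letter code at all, so it cannot be written into the sequence in the first place.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def nonstandard_positions(sequence: str) -> list:
    """Return (1-based position, character) pairs for letters outside the 20 standard codes."""
    return [
        (position, letter)
        for position, letter in enumerate(sequence.upper(), start=1)
        if letter not in STANDARD_AMINO_ACIDS
    ]


if __name__ == "__main__":
    print(nonstandard_positions("MKTAYIAK"))  # []
    print(nonstandard_positions("MKTAYXU"))   # [(6, 'X'), (7, 'U')]
```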
What is the maximum sequence length (number of residues) it can handle?
Google Colab only offers older GPUs with at most 16 GB of memory, which can handle a total sequence length of about 1000. More details here: www.rbvi.ucsf.edu/pipermail/chimerax-users/2022-October/004507.html
Hi, is there any way to predict a homodimer protein?
Yes, paste two copies of the sequence into the AlphaFold panel, separated by a comma, and press the Predict button.
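If you prefer to build that comma-separated text programmatically before pasting it, a minimal sketch looks like this. The function name `multimer_input` is hypothetical, and stripping whitespace and uppercasing are tidy-ups chosen for this example, not documented requirements of the AlphaFold panel.

```python
def multimer_input(sequence: str, copies: int = 2) -> str:
    """Repeat one sequence `copies` times, joined by commas (two copies gives a homodimer)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Remove spaces and line breaks left over from copy and paste, then normalize case.
    cleaned = "".join(sequence.split()).upper()
    return ",".join([cleaned] * copies)


if __name__ == "__main__":
    print(multimer_input("mkt ayi\nak"))       # MKTAYIAK,MKTAYIAK
    print(multimer_input("MKTAY", copies=3))   # MKTAY,MKTAY,MKTAY
```

Keep in mind that the total length of all copies is what counts toward the GPU memory limit.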
@ucsfchimerax8387 thank you!
Thanks for sharing.
excellent
😊
Many thanks for this update but I also encountered an error, which seems to have a rather complicated solution. However, before the crash I was able to obtain the unrelaxed model...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
393 remove_from_list(seq_list, 'prokaryote') # Obsolete "prokaryote" flag
394
--> 395 run_prediction(seq_list, use_templates = use_templates, energy_minimize = not dont_minimize)
3 frames
/usr/local/lib/python3.7/dist-packages/alphafold/model/model.py in predict(self, feat, random_seed)
192
193 sub_feat["prev"] = result["prev"]
--> 194 result, _ = self.apply(self.params, key, sub_feat)
195 confidences = get_confidence_metrics(result, multimer_mode=self.multimer_mode)
196 if self.config.model.stop_at_score_ranker == "plddt":
ValueError: INTERNAL: Failed to launch CUDA kernel: fusion_848 with block dimensions: 96x1x1 and grid dimensions: 22240x1x1: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
I've seen various CUDA_ERROR_ILLEGAL_ADDRESS errors when running AlphaFold. CUDA is the programming platform used on Nvidia graphics processors. I think this error is also usually associated with sequences that are too long and results from running out of memory, but I don't have good evidence for that. The same error occurs on the PDB 6UM1 test case shown here: www.rbvi.ucsf.edu/chimerax/data/alphafold-jan2022/afspeed.html. If you had a more modern GPU with more memory than Google Colab offers, it would probably work. Another possibility is that it is a bug. To test that, you could slightly vary your input, for example by deleting one residue at the end of the sequence, and see whether the same error occurs.
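The "vary the input slightly" test can be prepared with a few lines of Python. This sketch assumes comma-separated chains as entered in the AlphaFold panel and trims one residue from the end of the last chain. The function name is hypothetical.

```python
def drop_last_residue(sequences: str) -> str:
    """Remove the final residue of the last chain in a comma-separated sequence string."""
    chains = sequences.split(",")
    if len(chains[-1]) < 2:
        raise ValueError("last chain is too short to trim")
    chains[-1] = chains[-1][:-1]
    return ",".join(chains)


if __name__ == "__main__":
    print(drop_last_residue("MKTAYIAK"))     # MKTAYIA
    print(drop_last_residue("MKTAY,MKTAY"))  # MKTAY,MKTA
```

If the trimmed input still fails with the same error, running out of memory becomes the more likely explanation than a one-off bug.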