Understanding AlphaFold outputs - Exercises


A. Setup

1. You will need ChimeraX installed & open

If not, it’s easily downloadable from their website for any OS (Windows, Linux, Mac): https://www.cgl.ucsf.edu/chimerax/download.html

NB: CEA and I2BC members should have ChimeraX accessible within the app store

2. Fetch the input files

Input files for this exercise can be downloaded from AlphaFold2@I2BC or from Zenodo using either one of these links respectively:

These input files are organised as follows and contain AlphaFold2 predictions for the Nter dimer of pORF1.

ORF1_Nter_dimer/
├── msas
│   └── ORF1_Nter_dimer.a3m
├── predictions
│   ├── cite.bibtex
│   ├── config.json
│   ├── log.txt
│   └── ORF1_Nter_dimer
│       ├── models
│       │   ├── ORF1_Nter_dimer_relaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_002_alphafold2_multimer_v3_model_2_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_003_alphafold2_multimer_v3_model_5_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_004_alphafold2_multimer_v3_model_4_seed_42555.pdb
│       │   └── ORF1_Nter_dimer_unrelaxed_rank_005_alphafold2_multimer_v3_model_3_seed_42555.pdb
│       ├── ORF1_Nter_dimer.a3m
│       ├── ORF1_Nter_dimer.done.txt
│       ├── ORF1_Nter_dimer.fasta
│       ├── plots
│       │   ├── ORF1_Nter_dimer_coverage.png
│       │   ├── ORF1_Nter_dimer_pae.png
│       │   └── ORF1_Nter_dimer_plddt.png
│       └── scores
│           ├── ORF1_Nter_dimer_predicted_aligned_error_v1.json
│           ├── ORF1_Nter_dimer_scores_rank_001_alphafold2_multimer_v3_model_1_seed_42555.json
│           ├── ORF1_Nter_dimer_scores_rank_002_alphafold2_multimer_v3_model_2_seed_42555.json
│           ├── ORF1_Nter_dimer_scores_rank_003_alphafold2_multimer_v3_model_5_seed_42555.json
│           ├── ORF1_Nter_dimer_scores_rank_004_alphafold2_multimer_v3_model_4_seed_42555.json
│           └── ORF1_Nter_dimer_scores_rank_005_alphafold2_multimer_v3_model_3_seed_42555.json
├── progress.log
├── query.fasta
├── README.txt
├── scripts
│   └── utils.py
└── start_analysis.pml

7 directories, 27 files

Note that models are in the models folder in mmCIF format and their associated scores are in the scores folder in JSON format with the same basename. ORF1_Nter_dimer_predicted_aligned_error_v1.json is the old format for AF scores and can be ignored in this exercise. *****


The following sections are divided into questions (Q) and tasks (T). Explanations and answers can be found in the relative sections by developping the dedicated links.

B. Getting a general idea

T1: load all 5 models in ChimeraX

Click to have a few hints

To load a structure into ChimeraX, you can use one of the following:

Click here to see what your window should look like

You should see 5 structures on the left in the viewing pannel and 5 models listed within the Models pannel on the right.

T2: align the structures to more easily compare them

Hints:

Click to see the answer

In the command line, you can type the following:

matchmaker #2-5 & /A to #1/A

This will align chain A from models 2 to 5 onto chain A of model 1

Q3: are the models consistent with each other?

Q4: how many binding modes do you observe?

Q5: are the predicted interfaces symetrical?

Q6: are the answers to Q3-5 expected considering the possible dodecameric context?

Hints:

Click to see the answer

To help you answer questions 3 & 4, you can type the following in the command line:

color /A grey;
hide #2-5 & /A cartoons;
show #1/A surfaces;

This will colour all “A” chains in grey, then hide all “A” chain cartoon representations except the one in model 1 and finally show chain A in model 1 as surface.
You can, of course, also use the buttons in the menu and tool bars to do the same thing.

Q3: the answer is yes & no! The 5 models are clearly not all superimposable. However, models 1 & 2 are nearly superimposable, and we observe the same with models 3, 4 & 5. In that aspect, the models are quite consistent with each other.

Q4: 2. Part of this question was already answered in Q3: models 1 and 2 form a first potential binding mode and models 3 to 5 form the other potential binding mode.

To help you answer question 5, you can type the following in the command line:

preset "initial styles" "original look";
hide #2-5 models;
color #1 bychain;

This will colour all “A” chains in grey, then hide all “A” chain cartoon representations except the one in model 1 and finally show chain A in model 1 as surface.
You can, of course, also use the buttons in the menu and tool bars to do the same thing.

For a more detailed analysis, you can open the sequence pannel in the Menu bar > Tools > Sequence > Show Sequence Viewer, when prompted, don’t group identical sequences together and select only chains A & B of model 1. Then select the residues at the interface between chains A & B of model 1:

select #1/A :<4.5 & #1/B | #1/B :<4.5 & #1/A residues true

You’ll see in the sequence pannel that it’s different residues that are highlighted from one chain to the other.

Q5: no. You can see this visually if you look hard enough but you can also help yourself of the sequence pannel & selection of the interface residues.

Q6: yes. What we see doesn’t go against the theory that the dimer is actually part of a subset of a dodecamer. If you feel up to it, you can even try duplicating the first model and superimpose the “B” chain of the duplicated one onto the “A” chain for the initial one, then compare the duplicated complex with one of the models in the other binding mode, like so:

combine #1 name rank1bis;
matchmaker #6/B to #1/A;
hide all models;
show #3,6 models;

See how close models 3 and 6 (=duplicated #1) are?




C. Looking at the scores

AlphaFold2 scores for pORF Nter dimer

Q7: what do you think of the global scores?

Hints:

Click to see the answer

Both the average pLDDT and the pTM-scores are relatively low. As a rule of thumb, anything between 0.5 and 0.7 for the pTM-score is ok-ish but you should be careful, and anything below 70 in pLDDT is not very reliable. Models of ranks 1 and 2 are flirting with this limit.

NB: it’s a multimeric prediction, thus models are ranked according to their ipTM-score.

T8: upload the local scores for model 1

Hints:

Click to see what your window should look like
Initial PAE plot for model 1

The PAE pannel might appear as a separate window which you can integrate into the ChimeraX window by dragging & dropping it into it. In this example, the PAE pannel shares a spot with the Log pannel, accessible through the tabs at the bottom of the image.

NB: JSON files can also be uploaded through the command line e.g.:

alphafold pae #1 file /path/to/scores/file.json

T9: change the PEA plot colours

Hints:

Click to see a few examples

Colours in the “right click menu” are quite limited. If you’d like to colour your plot as red-white-blue for example, you can use the following command:

alphafold pae #1 palette bluered plot true

To have a list of possible palettes, use palette list. You can also create your own colour gradient by separating colour names with colons : e.g. blue:white:red. Possible colours can be listed with color list

T10: colour model 1 by its pLDDT scores

Hints:

Click to see what your windows should look like
PAE plot in red-blue scale and model 1 coloured per pLDDT

Here, we coloured the PAE with the bluered colour scale and coloured the structure with the following command:

color bfactor #1 palette alphafold

Q11: why do you think the region highlighted below has a high predicted error?

PAE plot of model 1

Hints:

Click to see the answer

This zone in the PAE plot shows the predicted distance error between a well-structured domain and a predicted loop region with very poor pLDDT scores. The latter even suggests that this region might be disordered, thus AlphaFold isn’t confident in the actual position of this loop in reference to the rest of the structure.

Selected zone in PAE plot & structure

T12: colour the structure according to domains in the PAE

Hints:

Click to see what your window should look like
Detected domains within the PAE

ChimeraX has many inbuilt features, of which, the identification of predicted well-packed domains from a PAE plot. Here, we used the command line to colour the structure by detected domain:

alphafold pae #1 colorDomains true

And we used the “right click menu” on the PAE plot to colour the PAE as the structure.

This is a useful tool to more easily find correspondances between the structure and the PAE plot and to help you read the PAE plot.

Q13: are pLDDT and PAE values in agreement along the structure?

Hints:

Click to see what your window should look like
PAE & structure coloured by pLDDT

To help you answer this question, you can use the same trick as before and colour the structure using the pLDDT and right click on the PAE to update the colouring as in the structure.

color bfactor #1 palette alphafold

The resulting plot isn’t the most easy to read but if you look closely, it depicts the pLDDT score along the diagonal, together with the PAE error values in grey scale.

PAE values and pLDDT scores represent 2 different types of confidence levels, but as you can see, in this prediction, they are quite in agreement in the Nter predicted domains.

Q14: what are your general conclusions on the confidence of these predictions?

Click to see the answer

The results are promising but further analysis might help to get more confidence in this interface.



D. Including accessory analyses

It’s important to double-up the output analysis with complementary information such as electrostatics, possible hydrogen bond networks or salt bridges, conservation etc. to try to answer the following questions: is the interface plausible? Does it have strong contacts? Is it stable?

As a reminder, here’s a table of forces that commonly stabilise proteins:

Forces that stabilise proteins

H-bonds & salt bridges

T15: find the H-bonds between chains A & B of model 1

Hints:

Click to see what your window should look like
H-bonds within model 1

If you’ve used the same criteria and the same model, you should see in the Log pannel that there are:

You’ll also see them as pseudo-bonds in your visualisation window and a sub-object will appear in the models pannel corresponding to this “contacts object”. You can hide or delete them anytime by selecting them within that pannel and clicking on “Hide” or “Close” respectively.


About the criteria used above:

When searching for H-bonds, you can, for example, relax the constraints on angles and distances to allow for “imperfect” models (i.e. models with possible clashes and “deformaties” in the side chains). You can also choose if you want to restrict yourself to bonds between two different chains (“Include intramodel” true) or if you would like to include intermolecule bonds too.

If you use the menu bar: Tools > Structure Analysis > H-bonds, you can define these options easily:

H-bonds menu

This will find H-bonds with ±0.5 Å distance tolerance and ±30° angle tolerance.

If you ticked “Select atoms” above, you can easily add the labels of the residues involved in H-bonds and Salt bridges if you go in Tools > Actions > labels, then select the label type you want (e.g. Chain, code, number).

In the command line, your command could look like this:

hbonds #1 color yellow interModel false distSlop 0.5 angleSlop 30.0 twoColors true slopColor orange intraMol false intraRes false select true reveal true log true;
color #1 bychain;
color #1 byhetero;
select up;
label sel text "{0.chain_id} {0.name} {0.number}{0.insertion_code}";

In the above, you should read the command lines as “option name”/“option value” pairs e.g. color yellow to show bonds as yellow, interModel false to not include inter-model bonds, etc.

Q16: repeat the task above but only select salt-bridges: how many salt bridges can you find?

Hints:

Click to see the answer

Q16: 3. If you have a look at the log pannel, you should see something like this:

Finding intramodel H-bonds
Constraints relaxed by 0.5 angstroms and 30 degrees
Models used:
    1 ORF1_Nter_dimer_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb

3 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/A ARG 113 NE   /B GLU 254 OE2  no hydrogen  3.149  N/A
/A ARG 113 NH2  /B GLU 254 OE1  no hydrogen  1.710  N/A
/B ARG 283 NE   /A GLU 222 OE1  no hydrogen  3.117  N/A

3 hydrogen bonds found

As you can see, there are 3 detected salt bridges, involving 2 different pairs of residues:

Salt bridge found by ChimeraX in model 1

Salt bridges can be seen as molecular clips of interfaces. They provide conformational specificity and can contribute to molecular recognition and catalysis.


How we got these results:

The proceedings are basically the same as above except you should tick the “salt bridges only” checkbox when using the menu bar or add the saltOnly true option in the command line like so:

hbonds #1 color lime interModel false distSlop 0.5 angleSlop 30.0 intraMol false intraRes false select true reveal true log true saltOnly true name salt_bridges;
select up;
label sel text "{0.chain_id} {0.name} {0.number}{0.insertion_code}";

In the above, we also decided to give the contact object a new name: “salt_bridges”


NB: creating custom functions

There are ways of creating home-made functions in ChimeraX (like ChimeraX’s hbonds, color or label commands that you’ve seen previously). They are called aliases. Below is an example which will create the check_interface command, which will put structures in cartoon style, colour them by chain, show the interface residues as sticks and finally calculate the hbonds and the salt bridges and show them in 2 different colours (yellow and lime):

alias check_interface show $1 cartoons; delete H; color $1 bychain; select $1/$2 :<4.5 & $1/$3 | $1/$3 :<4.5 & $1/$2 residues true; show sel atoms; color sel byhetero; hbonds $1/$2 restrict $1/$3 distSlop 0.5 angleSlop 30.0 color yellow twoColors true reveal true log true; hbonds $1/$2 restrict $1/$3 distSlop 0.5 angleSlop 30.0 color lime reveal true log true saltOnly true name salt_bridges; view sel clip true; ~select;

To use the new command, you have to specify the model number and the 2 chain IDs you would like to analyse the interface of. For example: check_interface #1 A B to highlight the interface between chains A & B of model 1.

Electrostatics & hydrophobicity

Electrostatic forces are guiding long-range forces. These interactions already start at larger distances and help binding partners come together (the estimated electrostatic energy between 2 molecules positioned at 10A from one another is ~1 KJ/mol, which is larger that any other energy component at equal distance (Zhang 2012)). Electrostatic interactions not only involve the binding partners, but also the solvent, which has to be displaced from the binding interface first, creating a large desolvation penalty compensated through binding of both partners. These interactions tend to be more specific for a binding partner.

On the other hand, hydrophobic interactions can be considered as glue and will stabilise the interface in a generally non-specific way.

T17: colour model 1 by electrostatics

Hints:

Click to see what your window should look like
Model 1 coloured by electrostatics

You can whether use the Tool bar: Molecule Display > electrostatic or use the command line with:

coulombic #1 key true

key true activates the colour scale that shows up on the bottom right corner

This might take a while if your protein/protein complex is big as ChimeraX computes the electrostatic surface on the go.

As you can see in the models pannel, the surface of each chain is saved within a separate subobject (e.g. #1.1 and #1.2 for chains A and B) and can be easily switched on an off by ticking the corresponding box.

Q18: To what colour does red and blue correspond?

Hints:

Click to see the answer

Red = Negative charge (acidic) Blue = Positive charge (base)

T19: colour model 1 by hydrophobicity

Hints:

Click to see what your window should look like
Model 1 coloured by hydrophobicity

You can whether use the Tool bar: Molecule Display > hydrophobic or use the command line with:

mlp #1 key true

key true activates the colour scale that shows up on the bottom right corner

Q20: To what colour does blue and yellow correspond?

Hints:

Click to see the answer

Blue = hydrophilic (likes water) Yellow = hydrophobic / lipophilic (avoids water, e.g. aromatic residues like PHE or TYR, or non-polar side-chains like LEU or MET)

Q21: can you see any hydrophobic patches on the surface of the protein?

Q22: given the function of replication polyprotein pORF1 HEV, why do you think they’re important?

Click to see the answers

Q21: yes. There’s an alpha-helix in Cter (residues ~410) that contains many lipophilic residues.

Hydrophobic patch in chain B of model 1

Q22: the theory is that replication polyprotein pORF1 directs synthesis of new viral RNA in infected cells through functionning as a dodecameric pore, homologously to what has been described for Chikungunya virus. As you can see, in the dodecameric context, these alpha-helices surround the complex.

Hydrophobic patches in dodecamer context

Conservation

Proteins are subjected to evolutionary pressures in order to conserve their function, of which, their interaction with binding partners. Thus, looking at how conserved residues are in the protein can be an important indicator.

In order to calculate a conservation score for each residue, we need a Multiple Sequence Alignment (MSA) containing homologous sequences to our protein(s).

ChimeraX has a function to compute these scores directly from an MSA. There are also external tools such as Consurf or MSA-tools within our institute that do the same thing with different algorithms.

T23: open the MSA in fasta format in ChimeraX

Hints:

T24: colour model 1 by conservation

Hints:

Click to see the answer

You can use the following command to colour by conservation with the default colour palette:

color byattr r:seq_conservation #1

An example of a more complete command to color in the (-2,3) range and add a color bar (cf. key true):

color byattr r:seq_conservation #1 palette cyanmaroon range -2,3 novalue silver key true

An example to colour chains A & B with two different colours scales:

color byattr r:seq_conservation #1/A palette yellow:red range -2,3 novalue silver key true;
color byattr r:seq_conservation #1/B palette white:cyan:blue range -2,3 novalue silver key true
Model 1 coloured by conservation