Understanding AlphaFold outputs - Exercises

A. Setup

1. You will need ChimeraX installed & open

If not, it’s easily downloadable from their website for any OS (Windows, Linux, Mac): https://www.cgl.ucsf.edu/chimerax/download.html

NB: CEA and I2BC members should have ChimeraX accessible within the app store

2. Fetch the input files

Input files for this exercise can be downloaded from AlphaFold2@I2BC or from Zenodo using either one of these links respectively:

These input files are organised as follows and contain AlphaFold2 predictions for the Nter dimer of pORF1.

ORF1_Nter_dimer/
├── msas
│   └── ORF1_Nter_dimer.a3m
├── predictions
│   ├── cite.bibtex
│   ├── config.json
│   ├── log.txt
│   └── ORF1_Nter_dimer
│       ├── models
│       │   ├── ORF1_Nter_dimer_relaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_002_alphafold2_multimer_v3_model_2_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_003_alphafold2_multimer_v3_model_5_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_004_alphafold2_multimer_v3_model_4_seed_42555.pdb
│       │   └── ORF1_Nter_dimer_unrelaxed_rank_005_alphafold2_multimer_v3_model_3_seed_42555.pdb
│       ├── ORF1_Nter_dimer.a3m
│       ├── ORF1_Nter_dimer.done.txt
│       ├── ORF1_Nter_dimer.fasta
│       ├── plots
│       │   ├── ORF1_Nter_dimer_coverage.png
│       │   ├── ORF1_Nter_dimer_pae.png
│       │   └── ORF1_Nter_dimer_plddt.png
│       └── scores
│           ├── ORF1_Nter_dimer_predicted_aligned_error_v1.json
│           ├── ORF1_Nter_dimer_scores_rank_001_alphafold2_multimer_v3_model_1_seed_42555.json
│           ├── ORF1_Nter_dimer_scores_rank_002_alphafold2_multimer_v3_model_2_seed_42555.json
│           ├── ORF1_Nter_dimer_scores_rank_003_alphafold2_multimer_v3_model_5_seed_42555.json
│           ├── ORF1_Nter_dimer_scores_rank_004_alphafold2_multimer_v3_model_4_seed_42555.json
│           └── ORF1_Nter_dimer_scores_rank_005_alphafold2_multimer_v3_model_3_seed_42555.json
├── progress.log
├── query.fasta
├── README.txt
├── scripts
│   └── utils.py
└── start_analysis.pml

7 directories, 27 files

Note that models are in the models folder in mmCIF format and their associated scores are in the scores folder in JSON format with the same basename. ORF1_Nter_dimer_predicted_aligned_error_v1.json is the old format for AF scores and can be ignored in this exercise. *****

The following sections are divided into questions (Q) and tasks (T). Explanations and answers can be found in the relative sections by developping the dedicated links.

B. Getting a general idea

T1: load all 5 models in ChimeraX

Click to have a few hints

To load a structure into ChimeraX, you can use one of the following:

the Menu bar: File > Open
the Tool bar: Home tab > Open button

Click here to see what your window should look like

You should see 5 structures on the left in the viewing pannel and 5 models listed within the Models pannel on the right.

T2: align the structures to more easily compare them

Hints:

the tools is accessible in the Tool bar under Structure Analysis > Matchmaker
or you can use the matchmaker command in the command line directly
another hint: align just one of the 2 chains (e.g. chain A)

Click to see the answer

In the command line, you can type the following:

matchmaker #2-5 & /A to #1/A

This will align chain A from models 2 to 5 onto chain A of model 1

Q3: are the models consistent with each other?

Q4: how many binding modes do you observe?

Q5: are the predicted interfaces symetrical?

Q6: are the answers to Q3-5 expected considering the possible dodecameric context?

Hints:

it might help to play around with the colours & textures to answer these 2 questions e.g. colour all “A” chains a same colour, show them as surface and/or hide all “A” chains except 1

Click to see the answer

To help you answer questions 3 & 4, you can type the following in the command line:

color /A grey;
hide #2-5 & /A cartoons;
show #1/A surfaces;

This will colour all “A” chains in grey, then hide all “A” chain cartoon representations except the one in model 1 and finally show chain A in model 1 as surface.
You can, of course, also use the buttons in the menu and tool bars to do the same thing.

Q3: the answer is yes & no! The 5 models are clearly not all superimposable. However, models 1 & 2 are nearly superimposable, and we observe the same with models 3, 4 & 5. In that aspect, the models are quite consistent with each other.

Q4: 2. Part of this question was already answered in Q3: models 1 and 2 form a first potential binding mode and models 3 to 5 form the other potential binding mode.

To help you answer question 5, you can type the following in the command line:

preset "initial styles" "original look";
hide #2-5 models;
color #1 bychain;

For a more detailed analysis, you can open the sequence pannel in the Menu bar > Tools > Sequence > Show Sequence Viewer, when prompted, don’t group identical sequences together and select only chains A & B of model 1. Then select the residues at the interface between chains A & B of model 1:

select #1/A :<4.5 & #1/B | #1/B :<4.5 & #1/A residues true

You’ll see in the sequence pannel that it’s different residues that are highlighted from one chain to the other.

Q5: no. You can see this visually if you look hard enough but you can also help yourself of the sequence pannel & selection of the interface residues.

Q6: yes. What we see doesn’t go against the theory that the dimer is actually part of a subset of a dodecamer. If you feel up to it, you can even try duplicating the first model and superimpose the “B” chain of the duplicated one onto the “A” chain for the initial one, then compare the duplicated complex with one of the models in the other binding mode, like so:

combine #1 name rank1bis;
matchmaker #6/B to #1/A;
hide all models;
show #3,6 models;

See how close models 3 and 6 (=duplicated #1) are?

C. Looking at the scores

Q7: what do you think of the global scores?

Hints:

pLDDT judges confidence in local environments, [0-100], the higher, the better
*pTM-scores judge the global topology of the structure, [0-1], the higher, the better

Click to see the answer

Both the average pLDDT and the pTM-scores are relatively low. As a rule of thumb, anything between 0.5 and 0.7 for the pTM-score is ok-ish but you should be careful, and anything below 70 in pLDDT is not very reliable. Models of ranks 1 and 2 are flirting with this limit.

NB: it’s a multimeric prediction, thus models are ranked according to their ipTM-score.

T8: upload the local scores for model 1

Hints:

local scores are in JSON format and are named after their respective mmCIF file
to upload the scores, use the Menu bar: Tools > Structure Prediction > AlphaFold Error Plot. Make sure the right structure is selected and click on Browse to fetch the corresponding JSON file within the scores folder.

Click to see what your window should look like

The PAE pannel might appear as a separate window which you can integrate into the ChimeraX window by dragging & dropping it into it. In this example, the PAE pannel shares a spot with the Log pannel, accessible through the tabs at the bottom of the image.

NB: JSON files can also be uploaded through the command line e.g.:

alphafold pae #1 file /path/to/scores/file.json

T9: change the PEA plot colours

Hints:

right click on the PAE and you’ll have the choice between “colour from structure”, “rainbow” (current view) and “green”
for more control over the colours, you can use the command line with the alphafold pae command

Click to see a few examples

Colours in the “right click menu” are quite limited. If you’d like to colour your plot as red-white-blue for example, you can use the following command:

alphafold pae #1 palette bluered plot true

To have a list of possible palettes, use palette list. You can also create your own colour gradient by separating colour names with colons : e.g. blue:white:red. Possible colours can be listed with color list

T10: colour model 1 by its pLDDT scores

Hints:

the buttons at the bottom of the PAE plot are quite useful for this
you can also use the color bfactor command

Click to see what your windows should look like

PAE plot in red-blue scale and model 1 coloured per pLDDT

Here, we coloured the PAE with the bluered colour scale and coloured the structure with the following command:

color bfactor #1 palette alphafold

Q11: why do you think the region highlighted below has a high predicted error?

Hints:

you can use the mouse to select this region on the PEA plot, this will highlight the corresponding regions in the structure
compare this also with the pLDDT scores of this region

Click to see the answer

This zone in the PAE plot shows the predicted distance error between a well-structured domain and a predicted loop region with very poor pLDDT scores. The latter even suggests that this region might be disordered, thus AlphaFold isn’t confident in the actual position of this loop in reference to the rest of the structure.

T12: colour the structure according to domains in the PAE

Hints:

you can use the “Color PAE Domains” button under the PAE plot
or you can use the alphafold pae command to do the same

Click to see what your window should look like

ChimeraX has many inbuilt features, of which, the identification of predicted well-packed domains from a PAE plot. Here, we used the command line to colour the structure by detected domain:

alphafold pae #1 colorDomains true

And we used the “right click menu” on the PAE plot to colour the PAE as the structure.

This is a useful tool to more easily find correspondances between the structure and the PAE plot and to help you read the PAE plot.

Q13: are pLDDT and PAE values in agreement along the structure?

Hints:

you can use the “Color PAE Domains” button under the PAE plot
or you can use the alphafold pae command to do the same

Click to see what your window should look like

To help you answer this question, you can use the same trick as before and colour the structure using the pLDDT and right click on the PAE to update the colouring as in the structure.

color bfactor #1 palette alphafold

The resulting plot isn’t the most easy to read but if you look closely, it depicts the pLDDT score along the diagonal, together with the PAE error values in grey scale.

PAE values and pLDDT scores represent 2 different types of confidence levels, but as you can see, in this prediction, they are quite in agreement in the Nter predicted domains.

Q14: what are your general conclusions on the confidence of these predictions?

Click to see the answer

Global scores are a bit limit
pLDDT scores are reasonable in Nter of each protein but the Cter weren’t predicted with high confidence
PAE values are in general accordance with the pLDDT and are relatively low at the predicted interface between the 2 well-structured Cter domains

The results are promising but further analysis might help to get more confidence in this interface.

D. Including accessory analyses

It’s important to double-up the output analysis with complementary information such as electrostatics, possible hydrogen bond networks or salt bridges, conservation etc. to try to answer the following questions: is the interface plausible? Does it have strong contacts? Is it stable?

As a reminder, here’s a table of forces that commonly stabilise proteins:

H-bonds & salt bridges

T15: find the H-bonds between chains A & B of model 1

Hints:

in the menu bar, Tools > Structure Analysis > H-bonds will open up a window that you might find useful
alternatively, you can opt for using the command line with hbonds

Click to see what your window should look like

If you’ve used the same criteria and the same model, you should see in the Log pannel that there are:

25 hydrogen bonds found
5 strict hydrogen bonds found

You’ll also see them as pseudo-bonds in your visualisation window and a sub-object will appear in the models pannel corresponding to this “contacts object”. You can hide or delete them anytime by selecting them within that pannel and clicking on “Hide” or “Close” respectively.

About the criteria used above:

When searching for H-bonds, you can, for example, relax the constraints on angles and distances to allow for “imperfect” models (i.e. models with possible clashes and “deformaties” in the side chains). You can also choose if you want to restrict yourself to bonds between two different chains (“Include intramodel” true) or if you would like to include intermolecule bonds too.

If you use the menu bar: Tools > Structure Analysis > H-bonds, you can define these options easily:

This will find H-bonds with ±0.5 Å distance tolerance and ±30° angle tolerance.

If you ticked “Select atoms” above, you can easily add the labels of the residues involved in H-bonds and Salt bridges if you go in Tools > Actions > labels, then select the label type you want (e.g. Chain, code, number).

In the command line, your command could look like this:

hbonds #1 color yellow interModel false distSlop 0.5 angleSlop 30.0 twoColors true slopColor orange intraMol false intraRes false select true reveal true log true;
color #1 bychain;
color #1 byhetero;
select up;
label sel text "{0.chain_id} {0.name} {0.number}{0.insertion_code}";

In the above, you should read the command lines as “option name”/“option value” pairs e.g. color yellow to show bonds as yellow, interModel false to not include inter-model bonds, etc.

Q16: repeat the task above but only select salt-bridges: how many salt bridges can you find?

Hints:

the “salt bridges only” checkbox might come in handy
alternatively, in the command line, checkout the saltOnly option

Click to see the answer

Q16: 3. If you have a look at the log pannel, you should see something like this:

Finding intramodel H-bonds
Constraints relaxed by 0.5 angstroms and 30 degrees
Models used:
    1 ORF1_Nter_dimer_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb

3 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/A ARG 113 NE   /B GLU 254 OE2  no hydrogen  3.149  N/A
/A ARG 113 NH2  /B GLU 254 OE1  no hydrogen  1.710  N/A
/B ARG 283 NE   /A GLU 222 OE1  no hydrogen  3.117  N/A

3 hydrogen bonds found

As you can see, there are 3 detected salt bridges, involving 2 different pairs of residues:

ARG-113/A - GLU-154/B
GLU-222/A - ARG-283/B

Salt bridge found by ChimeraX in model 1

Salt bridges can be seen as molecular clips of interfaces. They provide conformational specificity and can contribute to molecular recognition and catalysis.

How we got these results:

The proceedings are basically the same as above except you should tick the “salt bridges only” checkbox when using the menu bar or add the saltOnly true option in the command line like so:

hbonds #1 color lime interModel false distSlop 0.5 angleSlop 30.0 intraMol false intraRes false select true reveal true log true saltOnly true name salt_bridges;
select up;
label sel text "{0.chain_id} {0.name} {0.number}{0.insertion_code}";

In the above, we also decided to give the contact object a new name: “salt_bridges”

NB: creating custom functions

There are ways of creating home-made functions in ChimeraX (like ChimeraX’s hbonds, color or label commands that you’ve seen previously). They are called aliases. Below is an example which will create the check_interface command, which will put structures in cartoon style, colour them by chain, show the interface residues as sticks and finally calculate the hbonds and the salt bridges and show them in 2 different colours (yellow and lime):

alias check_interface show $1 cartoons; delete H; color $1 bychain; select $1/$2 :<4.5 & $1/$3 | $1/$3 :<4.5 & $1/$2 residues true; show sel atoms; color sel byhetero; hbonds $1/$2 restrict $1/$3 distSlop 0.5 angleSlop 30.0 color yellow twoColors true reveal true log true; hbonds $1/$2 restrict $1/$3 distSlop 0.5 angleSlop 30.0 color lime reveal true log true saltOnly true name salt_bridges; view sel clip true; ~select;

To use the new command, you have to specify the model number and the 2 chain IDs you would like to analyse the interface of. For example: check_interface #1 A B to highlight the interface between chains A & B of model 1.

Electrostatics & hydrophobicity

Electrostatic forces are guiding long-range forces. These interactions already start at larger distances and help binding partners come together (the estimated electrostatic energy between 2 molecules positioned at 10A from one another is ~1 KJ/mol, which is larger that any other energy component at equal distance (Zhang 2012)). Electrostatic interactions not only involve the binding partners, but also the solvent, which has to be displaced from the binding interface first, creating a large desolvation penalty compensated through binding of both partners. These interactions tend to be more specific for a binding partner.

On the other hand, hydrophobic interactions can be considered as glue and will stabilise the interface in a generally non-specific way.

T17: colour model 1 by electrostatics

Hints:

you should find what you’re looking for in the “Molecule Display” tab of the tool bar
alternatively, in the command line, checkout the coulombic command

Click to see what your window should look like

You can whether use the Tool bar: Molecule Display > electrostatic or use the command line with:

coulombic #1 key true

key true activates the colour scale that shows up on the bottom right corner

This might take a while if your protein/protein complex is big as ChimeraX computes the electrostatic surface on the go.

As you can see in the models pannel, the surface of each chain is saved within a separate subobject (e.g. #1.1 and #1.2 for chains A and B) and can be easily switched on an off by ticking the corresponding box.

Q18: To what colour does red and blue correspond?

Hints:

have a look at what amino acids are behind the red and blue patches & what chemical groups are involved

Click to see the answer

Red = Negative charge (acidic) Blue = Positive charge (base)

T19: colour model 1 by hydrophobicity

Hints:

you should find what you’re looking for in the “Molecule Display” tab of the tool bar
alternatively, in the command line, checkout the mlp command

Click to see what your window should look like

You can whether use the Tool bar: Molecule Display > hydrophobic or use the command line with:

mlp #1 key true

key true activates the colour scale that shows up on the bottom right corner

Q20: To what colour does blue and yellow correspond?

Hints:

have a look at what amino acids are behind the blue and yellow patches & what chemical groups are involved

Click to see the answer

Blue = hydrophilic (likes water) Yellow = hydrophobic / lipophilic (avoids water, e.g. aromatic residues like PHE or TYR, or non-polar side-chains like LEU or MET)

Q21: can you see any hydrophobic patches on the surface of the protein?

Q22: given the function of replication polyprotein pORF1 HEV, why do you think they’re important?

Click to see the answers

Q21: yes. There’s an alpha-helix in Cter (residues ~410) that contains many lipophilic residues.

Q22: the theory is that replication polyprotein pORF1 directs synthesis of new viral RNA in infected cells through functionning as a dodecameric pore, homologously to what has been described for Chikungunya virus. As you can see, in the dodecameric context, these alpha-helices surround the complex.

Hydrophobic patches in dodecamer context

Conservation

Proteins are subjected to evolutionary pressures in order to conserve their function, of which, their interaction with binding partners. Thus, looking at how conserved residues are in the protein can be an important indicator.

In order to calculate a conservation score for each residue, we need a Multiple Sequence Alignment (MSA) containing homologous sequences to our protein(s).

ChimeraX has a function to compute these scores directly from an MSA. There are also external tools such as Consurf or MSA-tools within our institute that do the same thing with different algorithms.

T23: open the MSA in fasta format in ChimeraX

Hints:

Tool bar > Open and select the fasta file within the predictions > ORF1_Nter_dimer folder
then right click on the new MSA pannel Structure > Associations to associate it with the already uploaded structures

T24: colour model 1 by conservation

Hints:

try using the color byattr function

Click to see the answer

You can use the following command to colour by conservation with the default colour palette:

color byattr r:seq_conservation #1

An example of a more complete command to color in the (-2,3) range and add a color bar (cf. key true):

color byattr r:seq_conservation #1 palette cyanmaroon range -2,3 novalue silver key true

An example to colour chains A & B with two different colours scales:

color byattr r:seq_conservation #1/A palette yellow:red range -2,3 novalue silver key true;
color byattr r:seq_conservation #1/B palette white:cyan:blue range -2,3 novalue silver key true