If not, it’s easily downloadable from their website for any OS (Windows, Linux, Mac): https://www.cgl.ucsf.edu/chimerax/download.html
NB: CEA and I2BC members should have ChimeraX accessible within the app store
Input files for this exercise can be downloaded from AlphaFold2@I2BC or from Zenodo using either one of these links respectively:
These input files are organised as follows and contain AlphaFold2 predictions for the Nter dimer of pORF1.
ORF1_Nter_dimer/
├── msas
│ └── ORF1_Nter_dimer.a3m
├── predictions
│ ├── cite.bibtex
│ ├── config.json
│ ├── log.txt
│ └── ORF1_Nter_dimer
│ ├── models
│ │ ├── ORF1_Nter_dimer_relaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb
│ │ ├── ORF1_Nter_dimer_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb
│ │ ├── ORF1_Nter_dimer_unrelaxed_rank_002_alphafold2_multimer_v3_model_2_seed_42555.pdb
│ │ ├── ORF1_Nter_dimer_unrelaxed_rank_003_alphafold2_multimer_v3_model_5_seed_42555.pdb
│ │ ├── ORF1_Nter_dimer_unrelaxed_rank_004_alphafold2_multimer_v3_model_4_seed_42555.pdb
│ │ └── ORF1_Nter_dimer_unrelaxed_rank_005_alphafold2_multimer_v3_model_3_seed_42555.pdb
│ ├── ORF1_Nter_dimer.a3m
│ ├── ORF1_Nter_dimer.done.txt
│ ├── ORF1_Nter_dimer.fasta
│ ├── plots
│ │ ├── ORF1_Nter_dimer_coverage.png
│ │ ├── ORF1_Nter_dimer_pae.png
│ │ └── ORF1_Nter_dimer_plddt.png
│ └── scores
│ ├── ORF1_Nter_dimer_predicted_aligned_error_v1.json
│ ├── ORF1_Nter_dimer_scores_rank_001_alphafold2_multimer_v3_model_1_seed_42555.json
│ ├── ORF1_Nter_dimer_scores_rank_002_alphafold2_multimer_v3_model_2_seed_42555.json
│ ├── ORF1_Nter_dimer_scores_rank_003_alphafold2_multimer_v3_model_5_seed_42555.json
│ ├── ORF1_Nter_dimer_scores_rank_004_alphafold2_multimer_v3_model_4_seed_42555.json
│ └── ORF1_Nter_dimer_scores_rank_005_alphafold2_multimer_v3_model_3_seed_42555.json
├── progress.log
├── query.fasta
├── README.txt
├── scripts
│ └── utils.py
└── start_analysis.pml
7 directories, 27 files
Note that models are in the models
folder in mmCIF
format and their associated scores are in the scores
folder
in JSON format with the same basename.
ORF1_Nter_dimer_predicted_aligned_error_v1.json
is the old
format for AF scores and can be ignored in this exercise. *****
The following sections are divided into questions (Q) and tasks (T). Explanations and answers can be found in the relative sections by developping the dedicated links.
T1: load all 5 models in ChimeraX
To load a structure into ChimeraX, you can use one of the following:
File
> Open
Home
tab > Open
buttonT2: align the structures to more easily compare them
Hints:
Structure Analysis
> Matchmaker
matchmaker
command in the command
line directlyIn the command line, you can type the following:
matchmaker #2-5 & /A to #1/A
This will align chain A from models 2 to 5 onto chain A of model 1
Q3: are the models consistent with each other?
Q4: how many binding modes do you observe?
Q5: are the predicted interfaces symetrical?
Q6: are the answers to Q3-5 expected considering the possible dodecameric context?
Hints:
To help you answer questions 3 & 4, you can type the following in the command line:
color /A grey;
hide #2-5 & /A cartoons;
show #1/A surfaces;
This will colour all “A” chains in grey, then hide all “A” chain
cartoon representations except the one in model 1 and finally show chain
A in model 1 as surface.
You can, of course, also use the buttons in the menu and tool bars to do
the same thing.
Q3: the answer is yes & no! The 5 models are clearly not all superimposable. However, models 1 & 2 are nearly superimposable, and we observe the same with models 3, 4 & 5. In that aspect, the models are quite consistent with each other.
Q4: 2. Part of this question was already answered in Q3: models 1 and 2 form a first potential binding mode and models 3 to 5 form the other potential binding mode.
To help you answer question 5, you can type the following in the command line:
preset "initial styles" "original look";
hide #2-5 models;
color #1 bychain;
This will colour all “A” chains in grey, then hide all “A” chain
cartoon representations except the one in model 1 and finally show chain
A in model 1 as surface.
You can, of course, also use the buttons in the menu and tool bars to do
the same thing.
For a more detailed analysis, you can open the sequence pannel in the
Menu bar
> Tools
> Sequence
> Show Sequence Viewer
, when prompted, don’t group
identical sequences together and select only chains A & B of model
1. Then select the residues at the interface between chains A & B of
model 1:
select #1/A :<4.5 & #1/B | #1/B :<4.5 & #1/A residues true
You’ll see in the sequence pannel that it’s different residues that are highlighted from one chain to the other.
Q5: no. You can see this visually if you look hard enough but you can also help yourself of the sequence pannel & selection of the interface residues.
Q6: yes. What we see doesn’t go against the theory that the dimer is actually part of a subset of a dodecamer. If you feel up to it, you can even try duplicating the first model and superimpose the “B” chain of the duplicated one onto the “A” chain for the initial one, then compare the duplicated complex with one of the models in the other binding mode, like so:
combine #1 name rank1bis;
matchmaker #6/B to #1/A;
hide all models;
show #3,6 models;
See how close models 3 and 6 (=duplicated #1) are?
Q7: what do you think of the global scores?
Hints:
Both the average pLDDT and the pTM-scores are relatively low. As a rule of thumb, anything between 0.5 and 0.7 for the pTM-score is ok-ish but you should be careful, and anything below 70 in pLDDT is not very reliable. Models of ranks 1 and 2 are flirting with this limit.
NB: it’s a multimeric prediction, thus models are ranked according to their ipTM-score.
T8: upload the local scores for model 1
Hints:
Tools
>
Structure Prediction
>
AlphaFold Error Plot
. Make sure the right structure is
selected and click on Browse
to fetch the corresponding
JSON file within the scores
folder.The PAE pannel might appear as a separate window which you can integrate into the ChimeraX window by dragging & dropping it into it. In this example, the PAE pannel shares a spot with the Log pannel, accessible through the tabs at the bottom of the image.
NB: JSON files can also be uploaded through the command line e.g.:
alphafold pae #1 file /path/to/scores/file.json
T9: change the PEA plot colours
Hints:
alphafold pae
commandColours in the “right click menu” are quite limited. If you’d like to colour your plot as red-white-blue for example, you can use the following command:
alphafold pae #1 palette bluered plot true
To have a list of possible palettes, use palette list
.
You can also create your own colour gradient by separating colour names
with colons :
e.g. blue:white:red
. Possible
colours can be listed with color list
T10: colour model 1 by its pLDDT scores
Hints:
color bfactor
commandHere, we coloured the PAE with the bluered
colour scale
and coloured the structure with the following command:
color bfactor #1 palette alphafold
Q11: why do you think the region highlighted below has a high predicted error?
Hints:
This zone in the PAE plot shows the predicted distance error between a well-structured domain and a predicted loop region with very poor pLDDT scores. The latter even suggests that this region might be disordered, thus AlphaFold isn’t confident in the actual position of this loop in reference to the rest of the structure.
T12: colour the structure according to domains in the PAE
Hints:
alphafold pae
command to do the
sameChimeraX has many inbuilt features, of which, the identification of predicted well-packed domains from a PAE plot. Here, we used the command line to colour the structure by detected domain:
alphafold pae #1 colorDomains true
And we used the “right click menu” on the PAE plot to colour the PAE as the structure.
This is a useful tool to more easily find correspondances between the structure and the PAE plot and to help you read the PAE plot.
Q13: are pLDDT and PAE values in agreement along the structure?
Hints:
alphafold pae
command to do the
sameTo help you answer this question, you can use the same trick as before and colour the structure using the pLDDT and right click on the PAE to update the colouring as in the structure.
color bfactor #1 palette alphafold
The resulting plot isn’t the most easy to read but if you look closely, it depicts the pLDDT score along the diagonal, together with the PAE error values in grey scale.
PAE values and pLDDT scores represent 2 different types of confidence levels, but as you can see, in this prediction, they are quite in agreement in the Nter predicted domains.
Q14: what are your general conclusions on the confidence of these predictions?
The results are promising but further analysis might help to get more confidence in this interface.
It’s important to double-up the output analysis with complementary information such as electrostatics, possible hydrogen bond networks or salt bridges, conservation etc. to try to answer the following questions: is the interface plausible? Does it have strong contacts? Is it stable?
As a reminder, here’s a table of forces that commonly stabilise proteins:
T15: find the H-bonds between chains A & B of model 1
Hints:
Tools
>
Structure Analysis
> H-bonds
will open up a
window that you might find usefulhbonds
If you’ve used the same criteria and the same model, you should see in the Log pannel that there are:
You’ll also see them as pseudo-bonds in your visualisation window and a sub-object will appear in the models pannel corresponding to this “contacts object”. You can hide or delete them anytime by selecting them within that pannel and clicking on “Hide” or “Close” respectively.
About the criteria used above:
When searching for H-bonds, you can, for example, relax the constraints on angles and distances to allow for “imperfect” models (i.e. models with possible clashes and “deformaties” in the side chains). You can also choose if you want to restrict yourself to bonds between two different chains (“Include intramodel” true) or if you would like to include intermolecule bonds too.
If you use the menu bar: Tools
>
Structure Analysis
> H-bonds
, you can
define these options easily:
This will find H-bonds with ±0.5 Å distance tolerance and ±30° angle tolerance.
If you ticked “Select atoms” above, you can easily add the labels of
the residues involved in H-bonds and Salt bridges if you go in
Tools
> Actions
> labels
,
then select the label type you want (e.g. Chain, code, number).
In the command line, your command could look like this:
hbonds #1 color yellow interModel false distSlop 0.5 angleSlop 30.0 twoColors true slopColor orange intraMol false intraRes false select true reveal true log true;
color #1 bychain;
color #1 byhetero;
select up;
label sel text "{0.chain_id} {0.name} {0.number}{0.insertion_code}";
In the above, you should read the command lines as “option
name”/“option value” pairs e.g. color yellow
to show bonds
as yellow, interModel false
to not include inter-model
bonds, etc.
Q16: repeat the task above but only select salt-bridges: how many salt bridges can you find?
Hints:
saltOnly
optionQ16: 3. If you have a look at the log pannel, you should see something like this:
Finding intramodel H-bonds
Constraints relaxed by 0.5 angstroms and 30 degrees
Models used:
1 ORF1_Nter_dimer_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb
3 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
/A ARG 113 NE /B GLU 254 OE2 no hydrogen 3.149 N/A
/A ARG 113 NH2 /B GLU 254 OE1 no hydrogen 1.710 N/A
/B ARG 283 NE /A GLU 222 OE1 no hydrogen 3.117 N/A
3 hydrogen bonds found
As you can see, there are 3 detected salt bridges, involving 2 different pairs of residues:
Salt bridges can be seen as molecular clips of interfaces. They provide conformational specificity and can contribute to molecular recognition and catalysis.
How we got these results:
The proceedings are basically the same as above except you should
tick the “salt bridges only” checkbox when using the menu bar or add the
saltOnly true
option in the command line like so:
hbonds #1 color lime interModel false distSlop 0.5 angleSlop 30.0 intraMol false intraRes false select true reveal true log true saltOnly true name salt_bridges;
select up;
label sel text "{0.chain_id} {0.name} {0.number}{0.insertion_code}";
In the above, we also decided to give the contact object a new name: “salt_bridges”
NB: creating custom functions
There are ways of creating home-made functions in ChimeraX (like
ChimeraX’s hbonds, color or label commands that you’ve seen previously).
They are called aliases. Below is an example which will create the
check_interface
command, which will put structures in
cartoon style, colour them by chain, show the interface residues as
sticks and finally calculate the hbonds and the salt bridges and show
them in 2 different colours (yellow and lime):
alias check_interface show $1 cartoons; delete H; color $1 bychain; select $1/$2 :<4.5 & $1/$3 | $1/$3 :<4.5 & $1/$2 residues true; show sel atoms; color sel byhetero; hbonds $1/$2 restrict $1/$3 distSlop 0.5 angleSlop 30.0 color yellow twoColors true reveal true log true; hbonds $1/$2 restrict $1/$3 distSlop 0.5 angleSlop 30.0 color lime reveal true log true saltOnly true name salt_bridges; view sel clip true; ~select;
To use the new command, you have to specify the model number and the
2 chain IDs you would like to analyse the interface of. For example:
check_interface #1 A B
to highlight the interface between
chains A & B of model 1.
Electrostatic forces are guiding long-range forces. These interactions already start at larger distances and help binding partners come together (the estimated electrostatic energy between 2 molecules positioned at 10A from one another is ~1 KJ/mol, which is larger that any other energy component at equal distance (Zhang 2012)). Electrostatic interactions not only involve the binding partners, but also the solvent, which has to be displaced from the binding interface first, creating a large desolvation penalty compensated through binding of both partners. These interactions tend to be more specific for a binding partner.
On the other hand, hydrophobic interactions can be considered as glue and will stabilise the interface in a generally non-specific way.
T17: colour model 1 by electrostatics
Hints:
coulombic
commandYou can whether use the Tool bar: Molecule Display
>
electrostatic
or use the command line with:
coulombic #1 key true
key true
activates the colour scale that shows up on the
bottom right corner
This might take a while if your protein/protein complex is big as ChimeraX computes the electrostatic surface on the go.
As you can see in the models pannel, the surface of each chain is saved within a separate subobject (e.g. #1.1 and #1.2 for chains A and B) and can be easily switched on an off by ticking the corresponding box.
Q18: To what colour does red and blue correspond?
Hints:
Red = Negative charge (acidic) Blue = Positive charge (base)
T19: colour model 1 by hydrophobicity
Hints:
mlp
commandYou can whether use the Tool bar: Molecule Display
>
hydrophobic
or use the command line with:
mlp #1 key true
key true
activates the colour scale that shows up on the
bottom right corner
Q20: To what colour does blue and yellow correspond?
Hints:
Blue = hydrophilic (likes water) Yellow = hydrophobic / lipophilic (avoids water, e.g. aromatic residues like PHE or TYR, or non-polar side-chains like LEU or MET)
Q21: can you see any hydrophobic patches on the surface of the protein?
Q22: given the function of replication polyprotein pORF1 HEV, why do you think they’re important?
Q21: yes. There’s an alpha-helix in Cter (residues ~410) that contains many lipophilic residues.
Q22: the theory is that replication polyprotein pORF1 directs synthesis of new viral RNA in infected cells through functionning as a dodecameric pore, homologously to what has been described for Chikungunya virus. As you can see, in the dodecameric context, these alpha-helices surround the complex.
Proteins are subjected to evolutionary pressures in order to conserve their function, of which, their interaction with binding partners. Thus, looking at how conserved residues are in the protein can be an important indicator.
In order to calculate a conservation score for each residue, we need a Multiple Sequence Alignment (MSA) containing homologous sequences to our protein(s).
ChimeraX has a function to compute these scores directly from an MSA. There are also external tools such as Consurf or MSA-tools within our institute that do the same thing with different algorithms.
T23: open the MSA in fasta format in ChimeraX
Hints:
Open
and select the fasta file within the
predictions
> ORF1_Nter_dimer
folderStructure
>
Associations
to associate it with the already uploaded
structuresT24: colour model 1 by conservation
Hints:
color byattr
functionYou can use the following command to colour by conservation with the default colour palette:
color byattr r:seq_conservation #1
An example of a more complete command to color in the (-2,3) range and add a color bar (cf. key true):
color byattr r:seq_conservation #1 palette cyanmaroon range -2,3 novalue silver key true
An example to colour chains A & B with two different colours scales:
color byattr r:seq_conservation #1/A palette yellow:red range -2,3 novalue silver key true;
color byattr r:seq_conservation #1/B palette white:cyan:blue range -2,3 novalue silver key true