Learning how to manipulate and visualise protein structures
Demo Step 1: The data
From now on, we’ll be manipulating protein structures predicted by AlphaFold, together with their per-residue (pLDDT) and per-residue-pair (PAE) scores, in ChimeraX. We’ll be working on the N-terminal dimer prediction of the pORF1 HEV polyprotein (UniProt ID: P33424). Since running the prediction can take some time, we’ve made the results available on Zenodo.
Fetch the data
Download the zip file from the Zenodo project below and unzip it:
Note: the above data corresponds to the output of the AlphaFold@I2BC run “ORF1_Nter_dimer”.
Data content
The results you just downloaded were generated in 2 steps. We first searched for homologs of our query protein using MMseqs2. The resulting multiple sequence alignment (MSA) corresponds to the ORF1_Nter_dimer.a3m file in the msas directory. This MSA was then used by AlphaFold2 to predict the structure of our query. The outputs of the prediction are in the predictions directory.
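To get a feel for the MSA before opening any structure, the a3m file can be read with a few lines of Python. The sketch below is illustrative only: the function names read_a3m and aligned_length are hypothetical (they are not part of the course material), and it assumes the usual a3m conventions, namely FASTA-like ">" headers, lowercase letters for insertions relative to the query, and optional "#" comment lines, which are skipped.

```python
from pathlib import Path


def read_a3m(path):
    """Return a list of (header, sequence) pairs from an a3m file.

    Assumption: lines starting with '#' are comment/metadata lines
    and can be ignored.
    """
    entries = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


def aligned_length(sequence):
    """Count alignment columns; lowercase letters are insertions in a3m."""
    return sum(1 for c in sequence if not c.islower())


# Path taken from the folder hierarchy of the downloaded archive.
a3m_path = Path("ORF1_Nter_dimer/msas/ORF1_Nter_dimer.a3m")
if a3m_path.exists():
    msa = read_a3m(a3m_path)
    print(f"{len(msa)} sequences in the MSA (query included)")
    print(f"query: {msa[0][0]}, {aligned_length(msa[0][1])} columns")
```

The first entry of an a3m file is normally the query itself, so the number of homologs is the sequence count minus one.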
The most important files in this example exercise are the five predicted models, ranked 1 to 5, in PDB format (3D coordinates of all atoms) in the models folder. For each of these models, the associated prediction scores are provided in JSON format in the scores folder.
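Because the score files are plain JSON, they can be inspected outside ChimeraX as well. The following is a minimal sketch, not the course's own tooling: the function name summarise_scores is hypothetical, and the key names "plddt", "pae", "ptm" and "iptm" are an assumption about the file layout, so every key is treated as optional. Open one file in a text editor first to confirm what it actually contains.

```python
import json
from pathlib import Path
from statistics import mean


def summarise_scores(path):
    """Summarise one score file; every key is treated as optional.

    Assumed (unverified) layout: "plddt" is a list with one value per
    residue, "pae" is a square matrix stored as a list of rows, and
    "ptm" / "iptm" are single numbers.
    """
    data = json.loads(Path(path).read_text())
    summary = {}
    plddt = data.get("plddt")
    if plddt:
        summary["n_residues"] = len(plddt)
        summary["mean_plddt"] = mean(plddt)
    pae = data.get("pae")
    if pae:
        summary["max_pae"] = max(max(row) for row in pae)
    for key in ("ptm", "iptm"):
        if key in data:
            summary[key] = data[key]
    return summary


# Folder taken from the hierarchy of the downloaded archive.
scores_dir = Path("ORF1_Nter_dimer/predictions/ORF1_Nter_dimer/scores")
for score_file in sorted(scores_dir.glob("*_scores_rank_*.json")):
    print(score_file.name, summarise_scores(score_file))
```

If the directory is absent, the loop simply prints nothing, so the snippet is safe to run from any location.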
Note that the models folder contains one additional model: a declashed version of the rank 1 model (its file name contains “relaxed” instead of “unrelaxed”). This structure has no associated score file.
Data organisation
The folder hierarchy should look like the one shown below (the standard output of the AlphaFold@I2BC server). The query.fasta file corresponds to the overall input sequence used for the prediction. All files except the PNG plots are readable with a simple text editor such as Notepad++, Notepad, Gedit…
ORF1_Nter_dimer
├── msas
│   └── ORF1_Nter_dimer.a3m
├── predictions
│   ├── cite.bibtex
│   ├── config.json
│   ├── log.txt
│   └── ORF1_Nter_dimer
│       ├── ORF1_Nter_dimer.done.txt
│       ├── ORF1_Nter_dimer.a3m
│       ├── ORF1_Nter_dimer.fasta
│       ├── plots
│       │   ├── ORF1_Nter_dimer_coverage.png
│       │   ├── ORF1_Nter_dimer_pae.png
│       │   └── ORF1_Nter_dimer_plddt.png
│       ├── models
│       │   ├── ORF1_Nter_dimer_relaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_002_alphafold2_multimer_v3_model_2_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_003_alphafold2_multimer_v3_model_5_seed_42555.pdb
│       │   ├── ORF1_Nter_dimer_unrelaxed_rank_004_alphafold2_multimer_v3_model_4_seed_42555.pdb
│       │   └── ORF1_Nter_dimer_unrelaxed_rank_005_alphafold2_multimer_v3_model_3_seed_42555.pdb
│       └── scores
│           ├── ORF1_Nter_dimer_predicted_aligned_error_v1.json
│           ├── ORF1_Nter_dimer_scores_rank_001_alphafold2_multimer_v3_model_1_seed_42555.json
│           ├── ORF1_Nter_dimer_scores_rank_002_alphafold2_multimer_v3_model_2_seed_42555.json
│           ├── ORF1_Nter_dimer_scores_rank_003_alphafold2_multimer_v3_model_5_seed_42555.json
│           ├── ORF1_Nter_dimer_scores_rank_004_alphafold2_multimer_v3_model_4_seed_42555.json
│           └── ORF1_Nter_dimer_scores_rank_005_alphafold2_multimer_v3_model_3_seed_42555.json
├── progress.log
├── query.fasta
├── README.txt
├── scripts
│   └── utils.py
└── start_analysis.pml

7 directories, 27 files
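Since model and score files share the same rank number in their names, they can be matched programmatically. The sketch below is one possible way to do this, using only the file-name patterns visible in the listing; the helper names rank_of and pair_models_with_scores are hypothetical and not part of the provided scripts.

```python
import re
from pathlib import Path

# The rank appears in file names as "_rank_001_", "_rank_002_", ...
RANK_RE = re.compile(r"_rank_(\d+)_")


def rank_of(filename):
    """Extract the integer rank from a model or score file name."""
    match = RANK_RE.search(filename)
    return int(match.group(1)) if match else None


def pair_models_with_scores(run_dir):
    """Return (rank, model_path, score_path) tuples sorted by rank.

    Only unrelaxed models are paired, since the relaxed model has no
    score file of its own. score_path is None if no match is found.
    """
    run_dir = Path(run_dir)
    scores = {
        rank_of(p.name): p
        for p in run_dir.glob("scores/*_scores_rank_*.json")
    }
    pairs = []
    for model in run_dir.glob("models/*_unrelaxed_rank_*.pdb"):
        rank = rank_of(model.name)
        pairs.append((rank, model, scores.get(rank)))
    return sorted(pairs, key=lambda item: item[0])


# Folder taken from the hierarchy of the downloaded archive.
run_dir = Path("ORF1_Nter_dimer/predictions/ORF1_Nter_dimer")
for rank, model, score in pair_models_with_scores(run_dir):
    print(rank, model.name, score.name if score else "no score file")
```

Matching on the rank rather than on the full file name keeps the pairing independent of the model number and seed that follow it.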
The above course material is under a CC-BY-SA license.