Benchmark: Critical assessment of coiled-coil predictions based on protein structure data

Coiled-coil prediction benchmark on the entire RCSB PDB

This page provides interactive access to the results of the study »Critical assessment of coiled-coil predictions based on protein structure data«. The study is an up-to-date evaluation of the most commonly used coiled-coil prediction tools against the most comprehensive reference data set available: the entire Protein Data Bank (PDB) [10], resolved down to each amino acid and its secondary structure.

PDB-CC is a WebGL-based PDB viewer for coiled-coil predictions that uses JSmol or ChemDoodle Web Components. It shows the predictions of the most commonly used coiled-coil prediction tools for a broad set of PDB files (as of 2018-12, 147,073 files available), indicating where coiled-coil regions and the corresponding predictions are located. Each PDB file was analysed in advance: BioRuby [9] was used for parsing, DSSP [1] to assign secondary structures, and SOCKET [2] to identify coiled-coil domains from the 3D structure of the protein models, which serves as the ground truth for comparison. In addition, we ran the following selection of coiled-coil prediction tools (Ncoils [3], Marcoil [4], Multicoil2 [6], Paircoil2 [8], Multicoil [5], Paircoil [7]) on all PDB files and produced predictions for every PDB chain sequence. For a fair and transparent comparison, we used only the standard parameter sets recommended by the respective software developers.
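
The per-residue comparison between a structure-based ground truth and a sequence-based prediction can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the boolean-mask representation of a chain, and the toy data are all assumptions made for this example.

```python
from typing import Dict, Sequence


def residue_confusion(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, int]:
    """Count per-residue agreement between two coiled-coil masks.

    truth[i] is True if residue i lies in a coiled coil according to the
    structure-based reference (hypothetically derived from SOCKET output);
    predicted[i] is True if a prediction tool marks residue i as coiled coil.
    """
    if len(truth) != len(predicted):
        raise ValueError("both masks must cover the same residues")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for is_cc, is_predicted in zip(truth, predicted):
        if is_cc and is_predicted:
            counts["tp"] += 1  # coiled coil, correctly predicted
        elif is_predicted:
            counts["fp"] += 1  # predicted, but not in the reference
        elif is_cc:
            counts["fn"] += 1  # in the reference, but missed
        else:
            counts["tn"] += 1  # neither
    return counts


# Toy 10-residue chain (invented data, not taken from any PDB entry).
truth = [False, False, True, True, True, True, False, False, False, False]
predicted = [False, True, True, True, True, False, False, False, False, False]
counts = residue_confusion(truth, predicted)
print(counts)
```

Counting at residue level rather than per chain keeps partial overlaps visible: a prediction shifted by one residue yields both a false positive and a false negative instead of a single hit or miss.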

Publications

If this study was helpful for your research, please cite:

Simm, D., Hatje, K., Waack, S. & Kollmar, M. (2021). Critical assessment of coiled-coil predictions based on protein structure data. Scientific Reports, 11, 12439.
- nature.com/articles/s41598-021-91886-w
- Supplemental benchmark data (10.6084/m9.figshare.9994706)

Inspect coiled-coil predictions for specific PDB structures:

Choose or search for a PDB ID in one of the prepared PDB sets: all entries without restriction («all»), entries with detected coiled coils («coiled-coils»), entries with predictions («predictions»), or the intersection of both («matches»). The 3D structure of the selected molecule is then loaded and displayed in the chosen viewer. The SOCKET annotations and the available coiled-coil predictions can be displayed by selecting the highlighted regions in the sequence domain figures.
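
The relation between the four PDB sets can be sketched with plain set operations. This is only an illustration under assumptions: the function name is hypothetical, the identifiers are placeholders rather than real PDB IDs, and the way the page actually assembles its sets is not described here.

```python
from typing import Dict, Set


def build_selection_sets(all_ids: Set[str],
                         socket_ids: Set[str],
                         predicted_ids: Set[str]) -> Dict[str, Set[str]]:
    """Group entries into the four selection sets named on this page.

    socket_ids: entries in which a coiled coil was detected in the structure.
    predicted_ids: entries for which at least one tool predicted a coiled coil.
    """
    return {
        "all": set(all_ids),
        "coiled-coils": socket_ids & all_ids,
        "predictions": predicted_ids & all_ids,
        # «matches» is the intersection of detected and predicted entries.
        "matches": socket_ids & predicted_ids & all_ids,
    }


# Placeholder identifiers (deliberately not valid PDB IDs).
sets = build_selection_sets(
    all_ids={"0AAA", "0BBB", "0CCC", "0DDD"},
    socket_ids={"0AAA", "0BBB"},
    predicted_ids={"0BBB", "0CCC"},
)
print(sorted(sets["matches"]))
```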


Software References

[1] Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637. - 10.1002/bip.360221211

[2] Walshaw, J. & Woolfson, D. N. (2001). SOCKET: a program for identifying and analysing coiled-coil motifs within protein structures. J. Mol. Biol., 307(5), 1427-1450. - 10.1006/jmbi.2001.4545

[3] Lupas, A., Van Dyke, M. & Stock, J. (1991). Predicting coiled coils from protein sequences. Science, 252(5009), 1162-1164. - jstor.org/stable/2876291

[4] Delorenzi, M. & Speed, T. (2002). An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics, 18, 617–625. - 10.1093/bioinformatics/18.4.617

[5] Wolf, E., Kim, P. S. & Berger, B. (1997). MultiCoil: A program for predicting two- and three-stranded coiled coils. Protein Science, 6, 1179–1189. - 10.1002/pro.5560060606

[6] Trigg, J., Gutwin, K., Keating, A. E. & Berger, B. (2011). Multicoil2: Predicting Coiled Coils and Their Oligomerization States from Sequence in the Twilight Zone. PLOS ONE, 6, 1–10. - 10.1371/journal.pone.0023519

[7] Berger, B., Wilson, D. B., Wolf, E., Tonchev, T., Milla, M. & Kim, P. S. (1995). Predicting Coiled Coils by Use of Pairwise Residue Correlations. Proceedings of the National Academy of Sciences USA, 92, 8259-8263. - 10.1073/pnas.92.18.8259

[8] McDonnell, A. V., Jiang, T., Keating, A. E. & Berger, B. (2006). Paircoil2: improved prediction of coiled coils from sequence. Bioinformatics, 22, 356–358. - 10.1093/bioinformatics/bti797

[9] Goto, N., Prins, P., Nakao, M., Bonnal, R., Aerts, J. & Katayama, T. (2010). BioRuby. Bioinformatics, 26, 2617–2619. - 10.1093/bioinformatics/btq475

[10] Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Research, 28, 235-242. - 10.1093/nar/28.1.235

Rights and Restrictions

Use of Waggawagga by non-academics requires permission. Waggawagga may be obtained upon request and used under a GNU General Public License.