Our computational pipeline starts from the complete proteome of the Aspergillus species under consideration.
In the initial phase of the pipeline, subcellular localization annotations supported by published experimental evidence were gathered from the UniProt database. First, proteins annotated in UniProt as intracellular on the basis of experimental evidence were filtered out and excluded from subsequent steps of the pipeline (Figure 1). Second, the remaining proteins, which lack experimental evidence of intracellular localization, were classified into two mutually exclusive categories (Figure 1). The first category contains proteins that are secreted to the extracellular space or localized to the cell membrane, according to either UniProt subcellular localization annotations with experimental evidence or compiled lists from high-throughput proteomic studies. The second category contains proteins with no experimental evidence, from UniProt or from high-throughput proteomic studies, of being either secreted to the extracellular matrix or localized to the cell membrane.
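The initial triage can be read as a three-way decision per protein. The sketch below is illustrative only, not the pipeline's own code: the `Protein` record, its field names, the `triage` function and the location labels are all hypothetical, and each protein is simplified to a single experimentally supported UniProt location (real UniProt entries can list several).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical location labels; UniProt uses its own controlled vocabulary.
INTRACELLULAR = "intracellular"
EXTRACELLULAR = "extracellular"
CELL_MEMBRANE = "cell membrane"


@dataclass
class Protein:
    accession: str
    # Experimentally supported UniProt localization, or None if absent (assumption).
    uniprot_exp_location: Optional[str] = None
    # True if listed as secreted or cell membrane in a high-throughput proteomic study.
    in_proteomics_list: bool = False


def triage(protein: Protein) -> str:
    """Assign a protein to the initial stage of the pipeline (sketch)."""
    if protein.uniprot_exp_location == INTRACELLULAR:
        return "excluded"  # experimentally intracellular: dropped from later steps
    if (protein.uniprot_exp_location in (EXTRACELLULAR, CELL_MEMBRANE)
            or protein.in_proteomics_list):
        return "category 1"  # experimental evidence of secretion or membrane localization
    return "category 2"  # no experimental evidence either way


proteome = [
    Protein("P1", uniprot_exp_location=INTRACELLULAR),
    Protein("P2", uniprot_exp_location=EXTRACELLULAR),
    Protein("P3", in_proteomics_list=True),
    Protein("P4"),
]
print({p.accession: triage(p) for p in proteome})
# {'P1': 'excluded', 'P2': 'category 1', 'P3': 'category 1', 'P4': 'category 2'}
```

Checking the intracellular annotation first mirrors the order described above: a protein experimentally annotated as intracellular is excluded even if it also appears in a proteomic list.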
Proteins in the first (experimentally verified) category were then checked for a signal peptide (using SignalP 4.1, Phobius and UniProt annotation with experimental evidence), a glycosylphosphatidylinositol (GPI) anchor (using PredGPI, big-PI and UniProt annotation with experimental evidence) or a transmembrane (TM) domain (using TMHMM 2.0, Phobius and UniProt annotation with experimental evidence), the presence of any of which confirms passage through the classical secretory pathway (Branch A in Figure 1). Among these classical secretory pathway proteins, predicted GPI anchors and TM domains were used to separate cell membrane proteins from extracellular proteins (Branch A in Figure 1). Proteins lacking a predicted signal peptide, GPI anchor and TM domain, but with experimental evidence from UniProt annotation or high-throughput proteomic studies of being secreted, were classified as extracellular proteins secreted through a non-classical secretion pathway (Branch A in Figure 1).
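Branch A can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the name `classify_branch_a`, the boolean inputs and the output labels are hypothetical, and each feature flag is assumed to already be the final call (tool consensus, or the UniProt override).

```python
from typing import Optional


def classify_branch_a(signal_peptide: bool, gpi_anchor: bool, tm_domain: bool,
                      experimentally_secreted: bool) -> Optional[str]:
    """Branch A sketch for experimentally verified (category 1) proteins."""
    if signal_peptide or gpi_anchor or tm_domain:
        # Classical secretory pathway; a membrane anchor marks a cell membrane protein.
        if gpi_anchor or tm_domain:
            return "cell membrane"
        return "extracellular (classical)"
    if experimentally_secreted:
        # No secretory features, yet experimentally observed as secreted.
        return "extracellular (non-classical)"
    return None  # not assigned by Branch A in this sketch


print(classify_branch_a(True, False, False, True))   # extracellular (classical)
print(classify_branch_a(False, False, True, False))  # cell membrane
print(classify_branch_a(False, False, False, True))  # extracellular (non-classical)
```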
Proteins in the second category, which lack experimental evidence from UniProt annotation or high-throughput proteomic studies, were then screened for localization using computational prediction tools as follows. First, they were screened for a signal peptide, GPI anchor or TM domain, any of which suggests translocation into the endoplasmic reticulum (ER) and sorting via the classical secretion pathway (Figure 1). Next, proteins with a signal peptide, GPI anchor or TM domain that also carry an ER retention signal (detected using PS SCAN with PROSITE pattern PS00014) were excluded from further analysis (Branch B in Figure 1). Finally, proteins predicted to have a GPI anchor or TM domain and predicted to localize to the cell membrane (using WoLF PSORT 0.2, TargetP 1.1, ProtComp 6 and UniProt annotation with experimental evidence) were classified as cell membrane proteins, while proteins predicted to have neither a GPI anchor nor a TM domain and predicted to localize extracellularly were classified as extracellular proteins sorted by the classical secretion pathway (Branch B in Figure 1).
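Branch B adds two filters on top of the feature calls: the ER retention signal and the predicted subcellular localization. The sketch below assumes those are supplied as a boolean and a single location label; `classify_branch_b` and the labels are hypothetical names used only for illustration.

```python
from typing import Optional


def classify_branch_b(signal_peptide: bool, gpi_anchor: bool, tm_domain: bool,
                      er_retention: bool, predicted_location: Optional[str]) -> Optional[str]:
    """Branch B sketch for category 2 proteins (no experimental evidence)."""
    if not (signal_peptide or gpi_anchor or tm_domain):
        return None  # no classical secretory features: left to Branch C
    if er_retention:
        return None  # likely ER-resident: excluded from further analysis
    anchored = gpi_anchor or tm_domain
    if anchored and predicted_location == "cell membrane":
        return "cell membrane"
    if not anchored and predicted_location == "extracellular":
        return "extracellular (classical)"
    return None  # features and predicted localization do not agree


print(classify_branch_b(True, False, False, False, "extracellular"))  # extracellular (classical)
print(classify_branch_b(True, False, False, True, "extracellular"))   # None
```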
Lastly, the subset of second-category proteins that also lack a signal peptide, GPI anchor and TM domain was checked for orthology to known secreted proteins from other fungal species using OrthoMCL (Branch C in Figure 1). Proteins in this subset that are orthologs of experimentally identified secreted proteins in other fungi were then assessed for an ER retention signal and for their predicted subcellular localization (Branch C in Figure 1). Those without an ER retention signal and with a predicted extracellular localization were classified as extracellular proteins secreted through a non-classical secretion pathway (Branch C in Figure 1). Note that in this work we chose an orthology-based method, relying on experimentally verified secreted proteins across all fungi, to predict proteins that pass through the non-classical pathway.
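The orthology step of Branch C can be illustrated as a lookup in ortholog groups. This sketch assumes the OrthoMCL output has already been parsed into a dictionary mapping protein IDs to group IDs (the parsing is not shown); `classify_branch_c`, the IDs and the labels are hypothetical.

```python
from typing import Dict, Optional, Set


def classify_branch_c(protein_id: str,
                      ortholog_group: Dict[str, str],
                      known_secreted: Set[str],
                      er_retention: bool,
                      predicted_location: Optional[str]) -> Optional[str]:
    """Branch C sketch for category 2 proteins lacking SP, GPI anchor and TM domain."""
    # Ortholog groups containing at least one experimentally verified
    # secreted protein from another fungal species.
    secreted_groups = {ortholog_group[p] for p in known_secreted if p in ortholog_group}
    group = ortholog_group.get(protein_id)
    if group is None or group not in secreted_groups:
        return None  # no ortholog among known secreted fungal proteins
    if er_retention:
        return None  # ER retention signal: not treated as secreted
    if predicted_location == "extracellular":
        return "extracellular (non-classical)"
    return None


# Hypothetical group assignment: query proteins A1, A2; other-fungus proteins F1, F2.
groups = {"A1": "G1", "F1": "G1", "A2": "G2", "F2": "G3"}
secreted = {"F1", "F2"}
print(classify_branch_c("A1", groups, secreted, False, "extracellular"))  # extracellular (non-classical)
print(classify_branch_c("A2", groups, secreted, False, "extracellular"))  # None
```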
In our computational pipeline for secretome prediction (Figure 1):
(a) Prediction of signal peptides at the N-terminus of protein sequences is based on SignalP 4.1 predictions, Phobius predictions and UniProt annotations with published experimental evidence, if available.
(b) Prediction of GPI anchors in the protein sequences is based on PredGPI predictions, big-PI predictions and UniProt annotations with published experimental evidence, if available.
(c) Prediction of TM domains in the protein sequences is based on TMHMM 2.0 predictions, Phobius predictions and UniProt annotations with published experimental evidence, if available.
(d) ER-resident proteins are identified from ER retention signal predictions by PS SCAN (PROSITE pattern PS00014).
(e) Prediction of subcellular localization of proteins is based on WoLF PSORT 0.2 predictions, TargetP 1.1 predictions, ProtComp 6 predictions and UniProt annotations with published experimental evidence, if available.
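For readers without a local PS SCAN installation, the ER retention check in (d) can be approximated with a regular expression. This is a swapped-in technique, not PS SCAN itself, and it assumes PS00014 (ER_TARGET) is the C-terminal pattern `[KRHQSA]-[DENQ]-E-L>`; please verify against the current PROSITE entry. The function name `has_er_retention_signal` is hypothetical.

```python
import re

# Assumed PROSITE PS00014 pattern: [KRHQSA]-[DENQ]-E-L>
# '-' separates positions; '>' anchors the motif at the C-terminus (\Z here).
ER_RETENTION = re.compile(r"[KRHQSA][DENQ]EL\Z")


def has_er_retention_signal(sequence: str) -> bool:
    """Return True if the sequence ends in a KDEL-like ER retention motif."""
    return ER_RETENTION.search(sequence.strip().upper()) is not None


print(has_er_retention_signal("MKTLLLAVAVLAQAKDEL"))   # True
print(has_er_retention_signal("MKTLLLAVAVLAQAKDELG"))  # False
print(has_er_retention_signal("MHDELAAAA"))            # False
```

The motif only counts at the very end of the sequence, which is why an internal HDEL in the last example is not reported.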
When integrating predictions from the different tools with UniProt annotations supported by published experimental evidence to decide whether a protein sequence contains a signal peptide, GPI anchor or TM domain, the UniProt annotation, if available, is used alone and overrides the tool predictions; otherwise, a consensus decision is made from the tool predictions. Likewise, when deciding the subcellular localization of a protein, a UniProt annotation with published experimental evidence, if available, overrides the tool predictions; otherwise, the decision is made from the tool predictions by majority rule.
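The two integration rules share one structure: an experimental UniProt annotation wins outright, and only in its absence are the tool predictions combined. The sketch below assumes that "consensus" for a feature means all tools agree on its presence, and that the localization majority must be strict, with no decision on a tie; both thresholds are our assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def decide_feature(tool_calls: Sequence[bool], uniprot_exp: Optional[bool]) -> bool:
    """Presence of a signal peptide, GPI anchor or TM domain."""
    if uniprot_exp is not None:
        return uniprot_exp  # experimental annotation overrides the tools
    # Assumption: consensus = every tool predicts the feature.
    return len(tool_calls) > 0 and all(tool_calls)


def decide_location(tool_calls: Sequence[str], uniprot_exp: Optional[str]) -> Optional[str]:
    """Subcellular localization by majority rule, with UniProt override."""
    if uniprot_exp is not None:
        return uniprot_exp
    if not tool_calls:
        return None
    label, votes = Counter(tool_calls).most_common(1)[0]
    # Assumption: require a strict majority; otherwise leave undecided.
    return label if votes > len(tool_calls) / 2 else None


print(decide_location(["extracellular", "extracellular", "cytoplasm"], None))  # extracellular
print(decide_location(["extracellular", "cell membrane", "cytoplasm"], None))  # None
print(decide_feature([True, False], True))                                     # True
```

With three localization tools, two agreeing votes are enough; three different answers leave the protein undecided in this sketch.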
Funding:
Research in the group of Areejit Samal at The Institute of Mathematical Sciences (IMSc), Chennai is financially supported by the Department of Science and Technology (DST), Government of India, through the award of a start-up grant (YSS/2015/000060) and a Ramanujan fellowship (SB/S2/RJN-006/2014), by the Max Planck Society, Germany, through the award of a Max Planck Partner Group, and by intramural funds from the Department of Atomic Energy (DAE), Government of India. The funders had no role in study design, prediction, analysis or the decision to publish this work.
Contact:
If you have queries regarding our pipeline or datasets, please contact R. P. Vivek-Ananth.