Variable Length Character N-Gram Embedding of Protein Sequences for Secondary Structure Prediction

Ashish K. Sharma; Rajeev Srivastava

doi:10.2174/0929866527666201103145635

ISSN: 0929-8665
E-ISSN: 1875-5305

Source: Protein and Peptide Letters, Volume 28, Issue 5, May 2021, p. 501 - 507
Available online: 01 May 2021

Abstract

Background: Predicting a protein's secondary structure from its amino acid sequence is an essential step towards predicting its 3-D structure. Prediction performance improves when homologous multiple sequence alignment information is incorporated. However, homologous information is not available for all proteins, so it is necessary to predict protein secondary structure from single sequences.

Objective and Methods: Protein secondary structure is predicted from primary sequences using n-gram word embedding and a deep recurrent neural network. Secondary structure depends on both local and long-range neighbor residues in the primary sequence. In the proposed work, variable-length character n-gram words capture the local contextual information of amino acid residues, and an embedding vector represents each of these n-gram words. A bidirectional long short-term memory (Bi-LSTM) model then captures long-range context by extracting information from the past and future residues in the primary sequence.

Results: The proposed model is evaluated on three public datasets: ss.txt, RS126, and CASP9. It achieves Q3 accuracies of 92.57%, 86.48%, and 89.66% on ss.txt, RS126, and CASP9, respectively.

Conclusion: The performance of the proposed model is compared with state-of-the-art methods available in the literature. The comparative analysis shows that the proposed model performs better than these methods.
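As an illustration of two ideas named in the abstract, the sketch below shows one way to split a protein sequence into variable-length character n-gram words and how a Q3 score is computed. The abstract does not state the n-gram lengths, the windowing scheme, or how the embeddings are trained, so the function names, the range of 1 to 3 residues, and the "n-grams starting at each residue" scheme are assumptions made for illustration, not the authors' implementation. Q3 is taken here in its usual sense: the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly.

```python
from typing import List


def variable_length_ngrams(sequence: str, min_n: int = 1, max_n: int = 3) -> List[List[str]]:
    """For each residue position, list the character n-grams that start there.

    The lengths min_n..max_n are illustrative defaults (an assumption);
    the abstract does not specify which lengths the paper uses.
    """
    words = []
    for i in range(len(sequence)):
        # Keep only n-grams that fit entirely inside the sequence.
        grams = [
            sequence[i:i + n]
            for n in range(min_n, max_n + 1)
            if i + n <= len(sequence)
        ]
        words.append(grams)
    return words


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy three-residue sequence: each position yields its 1-, 2-, and 3-grams.
    print(variable_length_ngrams("MKV"))
    # Three of four labels agree.
    print(q3_accuracy("HHEC", "HHCC"))
```

In a full pipeline, each n-gram word would be mapped to an embedding vector and the per-residue vectors passed to a Bi-LSTM; that part is omitted here because the abstract gives no architectural details.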


Article Type: Research Article

Keyword(s): amino acids sequence; bidirectional long short-term memory; character n-gram embedding; deep learning; protein secondary structure; Proteomics
