
An algorithm is described for the automatic clustering of database protein sequences from the Bothrops jararacussu venom gland according to the sequence similarity of their domains. The program was written in C and Perl. Each nucleotide sequence in FASTA format generates six ORFs, one per reading frame, and the algorithm compares a domain with each ORF protein sequence in the database. As a result, the user obtains a list of all sequences found for a specific domain, together with a display of the sequence, the domain, and the number of hits. The algorithm lists only the sequences that present a minimum similarity of 30 hits and the best alignment. This limit was considered appropriate. The algorithm is available on the Internet (www.compbionet.org.br/cgi-domains/homesnake) and can quickly and accurately organize large databases into classes.
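The two mechanical steps mentioned in the abstract, deriving six ORFs from each nucleotide sequence and keeping only sequences that reach the 30-hit limit, can be illustrated with a minimal Python sketch. This is not the authors' C and Perl program. The function names, the use of the standard genetic code, and the shape of the hit table (`hits_per_frame`) are all assumptions made for illustration. The sketch also assumes that "best alignment" means the reading frame with the most hits for a given sequence, and it treats the hit counts as already computed, since the abstract does not say how a hit is scored.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here;
# the abstract does not say which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to protein translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def best_frames(hits_per_frame, min_hits=30):
    """Keep, per sequence, the frame with the most hits if it reaches min_hits.

    hits_per_frame is a hypothetical structure:
    {sequence_id: {frame_label: hit_count}}.
    """
    kept = {}
    for seq_id, frame_hits in hits_per_frame.items():
        if not frame_hits:
            continue
        frame, hits = max(frame_hits.items(), key=lambda kv: kv[1])
        if hits >= min_hits:
            kept[seq_id] = (frame, hits)
    return kept
```

For example, `six_frame_translations("ATGGCCTAA")["+1"]` gives `"MA*"`, and `best_frames({"a": {"+1": 42, "-2": 10}, "b": {"+1": 12}})` keeps only sequence `"a"` with its `+1` frame, because `"b"` falls below the 30-hit limit.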