Saturday, 08/07/2004 7:35:06 AM


Some more of IBM's research - The work of the Bioinformatics & Pattern Discovery Group focuses on a number of theoretical and applied problems relevant to computational molecular biology. During the last several years we have been working on the following:

pattern and association discovery in event streams;
multiple sequence alignment;
new approaches for similarity searching in protein/DNA databases;
the analysis of gene expression data;
the parallelization of our algorithms on both shared-memory and message-passing architectures;
establishing lower bounds on the number of irredundant motifs contained in a given database, as well as algorithms for finding these motifs;
the automated annotation of proteins directly from sequence;
the discovery of genes in prokaryotic genomes using dictionary-driven approaches;
the characterization and prediction of local 3D structure directly from sequence;
the discovery of tandem repeats in DNA sequences;
the automated classification of protein sets into families;
the automatic generation of composite descriptors for arbitrary collections of biological sequences;
new techniques for principal component analysis with application to rational drug design, gene expression analysis, and other problems;
the determination of archaea-, bacteria- and eukaryota-specific signatures;
comparative genomics;
and others.

We have also compiled the Bio-Dictionary(TM), an exhaustive collection of 1-dimensional patterns (which we refer to as seqlets, for 'small sequences'), by processing the GenPept and SwissProt & TrEMBL databases with Teiresias. This collection allows us to fully characterize the sequence space of natural proteins, to the extent allowed by the sampling provided by the sequences in the processed databases. We have shown that the seqlets contained in the Bio-Dictionary(TM) capture conserved functional and structural signatures both within and across family boundaries.
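
To give a rough feel for what collecting seqlet-like patterns from a set of sequences involves, here is a toy Python sketch. It is not Teiresias and not how the Bio-Dictionary was built: it only enumerates short patterns with wildcards ('.') and keeps those that recur. The function name `elementary_patterns` and the example sequences are made up for illustration. The parameters `L` (literals per pattern), `W` (maximum span) and `K` (minimum support) loosely mirror the parameters usually described for Teiresias, and counting support as the number of distinct sequences is an assumption of this sketch.

```python
from collections import defaultdict
from itertools import combinations


def elementary_patterns(sequences, L=3, W=5, K=2):
    """Toy pattern enumeration (hypothetical helper, not Teiresias).

    Returns {pattern: set of sequence indices} for every pattern that
    has exactly L literal characters, spans at most W positions, starts
    and ends with a literal, and occurs in at least K distinct sequences.
    Requires L >= 2.
    """
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            for span in range(L, W + 1):
                if start + span > len(seq):
                    break
                window = seq[start:start + span]
                # First and last positions stay literal; pick the
                # remaining L - 2 literal positions from the interior.
                for interior in combinations(range(1, span - 1), L - 2):
                    keep = {0, span - 1, *interior}
                    pattern = "".join(
                        c if i in keep else "." for i, c in enumerate(window)
                    )
                    support[pattern].add(idx)
    # Keep only patterns seen in at least K different sequences.
    return {p: s for p, s in support.items() if len(s) >= K}


# Made-up example sequences, for illustration only.
toy_sequences = ["MKVLAAG", "TKVLSAG", "QQKVLQQ"]
shared = elementary_patterns(toy_sequences, L=3, W=4, K=2)
for pattern in sorted(shared):
    print(pattern, sorted(shared[pattern]))
```

A real pattern-discovery tool goes further, for example by combining short patterns into longer maximal ones; this sketch stops at the enumeration step.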

We continuously produce metadata (i.e. 'content') from public databases of biological sequences. Recently, we began making available annotations of complete genomes