498: Statistical Physics, Biological Information and Complexity.

Homework 3 Essays: Bioinformatics

Author:  Tommy Angelini
Title: Protein Chips (13 kb)

Abstract:

In this course, our studies have been entirely based upon the methods of comparison, relying on the assumption that somewhere there is indeed real
information.  After all, comparisons are not made for their own sake, and meaningful comparisons must make reference to some known facts.  Where
does the information come from in bioinformatics?  In this essay I will present one of the methods for collecting such information: protein chips.
Author: Anoush Aghajani-Talesh

Title: PSIC: Profile extraction from sequence alignments with position-specific counts of independent observations (79 kb)

Abstract:

This essay describes a heuristic method for the extraction of profiles from multiple sequence alignments.
Author: Marco V. Bayas
Title: Hidden Markov Models in Protein Modelling (15 kb)

Abstract:

The use of Hidden Markov Models (HMMs) in protein modeling is described.  Sequence alignment based on profile HMMs can help identify protein family
members and offers some advantages. This possibility is discussed.
Author: Swarbhanu Chatterjee
Title: Hidden Markov Models and Sequence Alignment (171 kb)

Abstract:

Hidden Markov Models are a sophisticated and useful tool for sequence alignment. Conventional tools are not able to analyze the large amount of data that is available. In this review paper, I explain what HMMs are, describe the different kinds of HMM that exist, and show how they are useful.
Author: Soon Yong Chang
Title: Review of the paper: Minimum Entropy Approach to Word Segmentation Problems by Bin Wang, from LANL archive physics/0008232 v1 29/Aug/2000.

Abstract:

This paper focuses on the non-coding portion of DNA, where the regulatory elements of genes are found. Unlike the coding portion, where the word size is fixed at 3 letters (a codon), this region of DNA allows greater flexibility in the possible size of the words. A proper understanding of the segmentation holds the promise of a better understanding of the non-coding region of DNA.
Author: Jordi Cohen
Title: Non-symmetric score matrices and membrane proteins (106 kb)

Abstract:

Protein search engines such as BLAST use score matrices to assign a score to protein alignments. The usual scoring matrices, such as PAM and BLOSUM, are very well suited for general-purpose searches, but they perform sub-optimally when they are used to compare hydrophobic protein domains such as those found in membrane proteins. It has been suggested that new scoring matrices should be developed that would be especially suited for the comparison of the hydrophobic parts of proteins. New scoring matrices are introduced here that perform noticeably better in queries that involve such protein segments.  These matrices have unusual properties such as asymmetric off-diagonal components as well as negative diagonal elements. In this essay, I will present these matrices and describe their properties.
Author: Peter Fleck
Title: 46 kb

Author: Parag Ghosh
Title: Determining protein function from Comparative Genome Analysis (19 kb)

Abstract:
 

This essay aims at identifying proteins that participate in a functional pathway. The underlying assumption is that proteins that function together in a pathway or structural complex are likely to be preserved together or eliminated together in organisms during the process of evolution.  This property of correlated evolution is studied here by characterizing each protein by its phylogenetic profile. This method not only brings out functional correlations among proteins but also helps us to predict the functions of uncharacterized proteins.
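
The idea of comparing phylogenetic profiles can be illustrated with a minimal sketch. It assumes a profile is a tuple of 0/1 flags recording whether a protein is present in each of a fixed list of organisms, and it uses the Hamming distance as the similarity measure. The protein names, the toy profiles, and the choice of distance are illustrative assumptions, not taken from the essay.

```python
def hamming(a, b):
    """Number of organisms in which two presence/absence profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same organisms")
    return sum(x != y for x, y in zip(a, b))


def closest_profiles(profiles, query):
    """Rank all other proteins by profile distance to the query protein."""
    target = profiles[query]
    return sorted(
        (hamming(target, profile), name)
        for name, profile in profiles.items()
        if name != query
    )


# Hypothetical profiles over six organisms (1 = present, 0 = absent).
profiles = {
    "protA": (1, 0, 1, 1, 0, 1),
    "protB": (1, 0, 1, 1, 0, 1),
    "protC": (0, 1, 0, 0, 1, 0),
    "protD": (1, 0, 1, 0, 0, 1),
}

ranked = closest_profiles(profiles, "protA")
for distance, name in ranked:
    print(name, distance)
```

Proteins at small distance from the query share its pattern of presence and absence across organisms, which is the kind of correlation the abstract describes as evidence of a shared pathway.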
Author: Matt Gordon
Title: Parallelization of Sequence Comparison Dynamic Programming Algorithms (33 kb)
 
The need for fast comparison of genetic sequences has become evident with the rapid expansion of genetic sequence databases.  This paper
discusses the most popular sequence alignment algorithms, based on dynamic programming, and how they can be effectively sped up by use of
parallel processing algorithms that distribute the computing requirements over several processors.  Key issues such as load balancing
and efficient processor usage are addressed.
Author: Paul Grayson
Title: Comparison of the proteins in eukaryotes (50 kb)

Abstract:

This essay discusses a recent study comparing the entire genomes of three eukaryotic organisms.  The study identified most of the protein sequences in each species, in order to examine their differences.  Comparison to known features of the species allows us to begin to understand why their genomes contain differing numbers of copies of particular types of protein.
Author: Chalermpol Kanchanawarin
Title: Alignment-Ambiguous Regions of Genes (42 kb)

Abstract:

When aligning multiple DNA sequences, what should you do with regions that cannot be aligned without ambiguity?
In most studies, these alignment-ambiguous regions are simply removed before analysis is carried out.
In an article in the opinion section of the December 2001 issue of TRENDS in Ecology & Evolution, Michael S.Y. Lee
stresses the importance of properly studying and analysing these alignment-ambiguous regions (e.g. the rapidly evolving regions of genes) in
molecular phylogenetic and evolutionary studies. Three promising methods for analysing such regions are suggested, with examples.
Author: David Larson
Title: Hidden Markov Models (135 kb)

Abstract:

This paper presents a definition of hidden Markov models and some of the mathematics behind them.  It also discusses some of the uses and
applications of these models, with a primary focus on categorizing DNA sequences.
Author: Yan Li
Title: CLUSTAL W Method for Multiple Alignment (814 kb)

Abstract:

This article describes a reliable and efficient method for multiple sequence alignment: the CLUSTAL W method. A brief introduction to the progressive
approach is followed by a summary of the improvements in sensitivity made by CLUSTAL W.  The modifications made by CLUSTAL W are discussed in detail, together with its limitations.
Author: Ian O'Dwyer
Title: Comparison of the Draft Human Genomes (190 kb)

Abstract:

This essay reviews a statistical comparison of the two draft versions of
the human genome.  It is found that, although both genomes share some similar
features at a macro level, they differ in the details.
Author: Kapil Rajaraman
Title: The Minimal-Gene-Set (676 kb)

Abstract:

An interesting challenge facing the biological community is the construction of a genome with the minimum set of required genes. This essay reviews a comparative genomics approach to the problem; this approach, combined with some biochemistry, may lead to a solution.
Author: Rahul Roy
Title: Decoding Noncoding DNA (346 kb)

Abstract:

The century of biological revolution, brought about by the large-scale sequencing of genomes, has provided us with more data than we can handle. The number of sequences and genes in the databases has been growing exponentially.
Author: Prasanth Sankar
Title: Building a dictionary for the identification of regulatory genes

Abstract:

This essay describes a recently published ab initio bioinformatic method for identifying regulatory genes. The method requires no experimental input, unlike other methods used for the same purpose, and develops a scheme to generate words from randomly placed letters. The success of the method in identifying regulatory genes is modest.
Author: Martin Ph. Stehno
Title: Motif or artifact? (141 kb)

Abstract:

Conventional motif-finding algorithms are optimized for either soundness or completeness. Good soundness is achieved when the output lists only a few
motifs that are very likely to represent binding sites. On the other hand, there are algorithms which are designed to give a complete list of binding
sites. The downside of this is that the list will typically also contain hundreds of small variations of strong motifs, which are not considered to be
motifs in their own right. Here I report a new method of post-processing the output of such an algorithm. The method was invented by
Blanchette and Sinha (2001) and clears the motif list of artifacts of strong motifs. What remains is a small number of IUPAC ambiguity code sequences that very
likely represent real binding sites.
Author: Kalin Vestigan
Title: Multiple Alignment with Hidden Markov Models (127 kb)

Abstract:

Hidden Markov Models (HMMs) are an implementation of the idea that the scoring parameters should guide the multiple alignment as much as the alignment should determine the scoring parameters.
Author: Elizabeth Villa Rodriguez
Title: Phylogenetic studies on the origin of HIV-1 (111 kb)

Abstract:

This essay reviews the scientific efforts by several groups to investigate, by means of molecular phylogenetics, whether a polio vaccine was to blame for the
introduction of HIV into the landscape of viruses that affect humans.
Author: Qing-jun Wang
Title: Identify Borders of Biological Meaningful Units (213 kb)

Abstract:

This essay describes recent progress in identifying the borders of biologically meaningful units in a genome. A segmentation procedure based on the Jensen-Shannon divergence, with a new stopping criterion based on the Bayesian information criterion, is introduced. The procedure was applied
to the complete genome of E. coli and the left telomere of chromosome 12 of yeast, and produced highly accurate segmentations.
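
A single step of a Jensen-Shannon segmentation can be sketched as follows. This is a minimal illustration, not the procedure from the essay: it assumes the divergence is computed on single-nucleotide compositions with weights proportional to segment length, it scans every cut point by brute force, and it omits the Bayesian information criterion stopping rule entirely. The function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in bits) of a composition given as symbol counts."""
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two segments."""
    n_l, n_r = len(left), len(right)
    n = n_l + n_r
    c_l, c_r = Counter(left), Counter(right)
    h_all = entropy(c_l + c_r, n)
    return h_all - (n_l / n) * entropy(c_l, n_l) - (n_r / n) * entropy(c_r, n_r)


def best_cut(seq):
    """Return (position, divergence) of the cut that maximizes the divergence."""
    best_pos, best_js = None, -1.0
    for i in range(1, len(seq)):
        d = js_divergence(seq[:i], seq[i:])
        if d > best_js:
            best_pos, best_js = i, d
    return best_pos, best_js


# Toy sequence with an obvious compositional border in the middle.
seq = "A" * 20 + "G" * 20
pos, score = best_cut(seq)
print(pos, score)
```

A full segmentation would apply this step recursively to each resulting segment and stop when a criterion such as the one mentioned in the abstract judges a further cut to be unjustified.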
Author: Paul Welander
Title:  Comparative Genomics by way of Relative Abundance Analysis (117 kb)
 
Dr. Samuel Karlin and colleagues at Stanford University have developed a method for assessing genomic similarities based on relative abundances of
short nucleotide chains. The goal of such a method is to eliminate the need for homologous sequences that have been previously aligned by another
procedure. This approach deviates from previous methods of genomic analysis by utilizing information derived from the entire genome rather
than from specific subsequences.  The resulting genomic comparisons are generally in agreement with accepted phylogenies.
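
An alignment-free comparison of this kind can be sketched for the shortest case, dinucleotides. The sketch assumes the relative abundance of a dinucleotide xy is its observed frequency divided by the product of the frequencies of x and y, and that two sequences are compared by the mean absolute difference over all 16 dinucleotides. It works on a single strand only and does not symmetrize over the reverse complement; the function names and test sequences are hypothetical.

```python
from collections import Counter
from itertools import product


def relative_abundance(seq):
    """Map each dinucleotide xy to f_xy / (f_x * f_y) for one strand."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n, mono[y] / n
        fxy = di[x + y] / (n - 1)
        # Undefined ratios (a base that never occurs) are set to 0 here.
        rho[x + y] = fxy / (fx * fy) if fx > 0 and fy > 0 else 0.0
    return rho


def abundance_distance(seq_a, seq_b):
    """Mean absolute difference of relative abundances over 16 dinucleotides."""
    ra, rb = relative_abundance(seq_a), relative_abundance(seq_b)
    return sum(abs(ra[k] - rb[k]) for k in ra) / 16


same = abundance_distance("ACGT" * 10, "ACGT" * 10)
different = abundance_distance("ACGT" * 10, "AACCGGTT" * 5)
print(same, different)
```

Because the profile is built from counts over the whole sequence, no prior alignment of homologous regions is needed, which is the property the abstract emphasizes.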
Author: Kalin Vestigan
Title: Multiple Alignment with Hidden Markov Models (119 kb)

Author:  Jian Xu
Title: Building a dictionary for DNA -- Decoding the regulatory regions of a genome (285 kb)

Abstract:

This essay explains how statistical physics can be used to find the genes in the regulatory region, which has been referred to as "junk".  Using a free energy analog,
the authors built a dictionary for the regulatory region, then confirmed some old "words" (regulatory factors) and found some new "words".
Author: Jin Yu
Title: Gene recognition in completely sequenced bacterial genomes (423 kb)

Abstract:

Approaches to gene recognition are traditionally divided into intrinsic and extrinsic approaches. This essay introduces work that uses bacterial DNA regions significantly related to known proteins to extract codon usage statistics and other intrinsic recognition parameters, which are then applied to unexplored parts of a genome. The leading idea of this work is that extrinsic evidence should be given higher priority than intrinsic information.


Author: Guojun Zhu
Title: Bioinformatics Framework at Organism Level (113 kb)

Abstract:

This essay describes the bioinformatics framework and database at the organism level. I describe its structure, its compromises and limitations, its
use, and its future, and discuss some examples.

