This course is designed to introduce a broad range of computational
problems in molecular biology. Solution techniques draw from
several branches of mathematics: combinatorics, probability,
optimization, and dynamical systems.
Prerequisites:
- Fundamentals of Computing (CSC 1410)
- Linear algebra, Algorithm analysis, Graph theory, and Differential equations (MATH 5198)
- Probability and statistics (MATH 3800)
- Introductory knowledge of molecular biology (BIOL 5099)
This course is generally taught each spring semester.
Topics:
- Sequence alignment
Evolutionary events change DNA sequences, and alignment is a way to
understand how one genome relates to another. The alignment is done
at the nucleotide level, aligning DNA or RNA segments. It is also done
at the amino acid (residue) level, aligning proteins. Two sequences can be
aligned efficiently, using dynamic programming as a solution technique.
Aligning multiple sequences is hard, and there are several approaches to it.
- Phylogenetic trees
A phylogenetic tree is a graphical presentation of the evolutionary
history of a set of species or their parts. We want to compute a phylogenetic
tree in order to understand how life works. Specifically, such trees can help
with sequence alignment, predict protein structure, predict gene expression,
design enhanced organisms (like wheat, rice, ...), map pathogen strain
diversity for vaccines, and assist in the epidemiology of infectious diseases or
genetic defects. Some computational methods are distance based; others are
based on parsimony or maximum likelihood.
- Gene expression arrays
Modern biology uses high-throughput methods based on chips, called
microarrays, that measure gene expression indirectly. The volume
of data on one chip can be enormous, and one biologist might generate several
chips per day. It is the goal of computational biology to obtain knowledge
from this large volume of data, which is riddled with error. Clustering
techniques are presented with illustrations of what can happen if they are
used improperly, particularly with respect to data conditioning. Principal Component
Analysis and data visualization are among the other topics, using data from a
variety of biological databases.
- Markov models
The elementary Markov model is described and applied to a variety of problems,
such as CpG island recognition, coding region recognition, and gene finding. The
model is then extended to Hidden Markov Models, showing many modern applications in biology.
(HMMs originally were motivated by problems in speech recognition.)
- Protein structure I: Basics
This topic begins with what proteins are and how they are made and categorized. It
then moves into an understanding of the Protein Data Bank - what information it
contains, how to obtain it, and how it relates to other databases. Visualization
of proteins is illustrated with RasMol and/or CHIME.
- Protein structure II: Analysis
Starting with molecular geometry, we relate coordinates, angles, and distances.
With perfect information we can transform from any one of these to any other,
but there are problems in which we do not have perfect information, such as
when distances are given between only some of the pairs of atoms. The dynamics of folding
is then considered, showing the mathematical equations in the context of the biology,
chemistry, and physics. The fundamental protein folding problem is defined, and
its complexity discussed in depth (viz., the Levinthal paradox).
- Systems biology
Networks describe complex biological systems, such as protein-protein
interactions, gene regulation, cell signaling, metabolic processes, and more.
The cutting edge of modern biology is treating these networks as part of a
system with highly interacting parts.
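
As an illustration of the dynamic programming technique named under Sequence alignment, the following is a minimal sketch of a global pairwise alignment score in the style of Needleman-Wunsch. The function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration, not the scoring scheme used in the course.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b.

    The scoring parameters are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table has (len(a) + 1) x (len(b) + 1) entries and each is filled in constant time, which is why the two-sequence case is tractable. Extending the same table to many sequences adds one dimension per sequence, which is one way to see why multiple alignment needs other approaches.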
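
The relation between coordinates, distances, and angles described under Protein structure II can be sketched with a few lines of geometry. The function names below are hypothetical, and the points are plain 3-D tuples rather than records read from the Protein Data Bank; this is only a sketch of the perfect-information case.

```python
import math


def distance(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(p, q)


def angle_from_coordinates(a, b, c):
    """Angle at b, in degrees, formed by the points a-b-c."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def angle_from_distances(d_ab, d_bc, d_ac):
    """Same angle at b, recovered from the three pairwise distances
    using the law of cosines."""
    cos_theta = (d_ab ** 2 + d_bc ** 2 - d_ac ** 2) / (2 * d_ab * d_bc)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
    print(angle_from_coordinates(a, b, c))
    print(angle_from_distances(distance(a, b), distance(b, c), distance(a, c)))
```

Both routes give the same angle when all three distances are known. When only some pairwise distances are available, `angle_from_distances` cannot be evaluated for every triple, which is the imperfect-information setting the topic refers to.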