CS262

Announcements
3/3: Cristina's office hours on Thursday 3/5 will be moved to 3:30-4:30.
2/26: Problem Set 4 has been posted.
2/21: On PS3 problem 4(d), you need to provide a tree on which your algorithm cannot do better than O(n^2), regardless of the order in which the nodes are chosen.
2/11: Problem Set 3 has been posted.
1/30: There will be no problem session this Friday, January 30.
1/28: Problem Set 2 has been posted.
1/20: There was a typo in the Problem Set 1 due date - it is due Wednesday, January 28 at the beginning of class.
1/20: Students planning to scribe a lecture should email the staff list with their top 3 choices (in order). Lectures will be assigned on a first-come, first-served basis.
1/20: Students looking for homework groups should email the staff list ASAP and/or attend the problem session this Friday, January 23.
1/14: Problem Set 1 has been posted.
1/14: Locations and times have been finalized for all staff office hours and the problem session. The first problem session will be this Friday, January 16.
1/14: The staff mailing list ude.drofnats.stsil@ffats-9080niw-262sc (written backwards) has been created. Please use this list rather than emailing course staff individually.
Course Description
Genomics is a new and very active application area of computer science. Over the past ten years there has been an explosion of genomic data: the entire DNA sequences of several organisms, including humans, are now available. These are long strings of base pairs (A,C,G,T) containing all the information necessary for an organism's development and life. Computer science is playing a central role in genomics: from sequencing and assembling DNA sequences to analyzing genomes in order to locate genes, repeat families, similarities between sequences of different organisms, and several other applications. The area of computational genomics includes both applications of older methods and development of novel algorithms for the analysis of genomic sequences. This course aims to present some of the most basic and useful algorithms for sequence analysis, together with the minimal biological background necessary for a computer science student to appreciate their application to current genomics research. Sequence alignments, hidden Markov models, multiple alignment algorithms and heuristics such as Gibbs sampling, and the probabilistic interpretation of alignments will be covered. Applications of these tools to sequence analysis will be presented: comparing genomes of different species, gene finding, gene regulation, whole genome sequencing and assembly. Whenever possible, examples will be drawn from the most current developments in genomics research.

Prerequisites
The following courses are strongly recommended:
  • CS161: Design and Analysis of Algorithms, or equivalent familiarity with algorithmic and data structure concepts.

Textbooks

Durbin, Eddy, Krogh, Mitchison "Biological Sequence Analysis"

Gusfield "Algorithms on Strings, Trees, and Sequences"

Requirements and Grading
  1. Homework. The course grade is based entirely on the homework; there is NO FINAL. The course will have four challenging problem sets of equal size and grading weight. These must be handed in at the beginning of class on the due date, which will usually be two weeks after they are handed out. Recognizing that students may face unusual circumstances and require some flexibility in the course of the quarter, each student will have a total of three free late days (weekends are NOT counted) to use as s/he sees fit. Once these late days are exhausted, any homework turned in late will be penalized at the rate of 20% per late day (or fraction thereof). Under no circumstances will a homework be accepted more than three days after its due date.

    Late homeworks should be turned in to a member of the course staff, or, if none are available, placed under the door of S266 Clark Center. You must write the time and date of submission on the assignment. It is an honor code violation to write down the wrong time. Students with biological and computational backgrounds are encouraged to work together.

  2. Scribing. Optionally, a student can scribe one lecture. Lecture notes are due one week after the lecture date, and the grade on the lecture notes will replace the two lowest-scoring homework problems. To ensure even coverage of the lectures, please sign up to scribe beforehand with one of the course staff.



Collaboration and Honor Code

Students may discuss and work on problems in groups of at most three people but must write up their own solutions. A student can be part of at most one group. If a student works individually, the lowest-scoring problem on each problem set will be dropped. When writing up the solutions, students should list the names of the people with whom they discussed the assignment and should not use written notes from group work.

Students are expected not to look at the solutions from previous years. Copying or intentionally referring to solutions from previous years will be considered an honor code violation.

Class Schedule
Lecture: Monday and Wednesday, 11:50 am - 1:05 pm in Clark S361
Problem Sessions (optional): Friday, 3:00 - 4:00 pm in Clark S361

Instructor
Serafim Batzoglou
Office: Clark S266
Office hours: Mon 1:30 - 2:30 pm
Phone: (650) 723-3334
Email: ude.drofnats@mifares (written backwards to avoid spam)

Teaching Assistants
Eugene Davydov
Office: Clark S256
Office hours: Tue 1:00 - 3:00 pm, Wed 1:30 - 2:30 pm
Phone: (650) 725-6503
Email: ude.drofnats@vodyvade

Cristina Pop
Office: Gates B26A
Office hours: Mon 4:30 - 5:30 pm, Thu 1:00 - 2:00 pm
Phone: (650) 723-6319
Email: ude.drofnats@popc

Communication
All email correspondence should be sent to the course staff mailing list, ude.drofnats.stsil@ffats-9080niw-262sc. Alternatively, you can communicate your questions in person after lecture or during office hours.

Additional Material and Tutorials
Some additional materials can be found Here

Last Year's Lecture Notes
The lecture notes from the Winter 2008 edition of this class are available from the CS 262 Winter 2008 website.
Schedule (future tentative)
As the quarter progresses, the following schedule will be updated accordingly. Please check back often for the latest material.

Lecture | Date | Title | Reading | Homework | Scribe
1 | 1/7 | Course Overview, Basic Biology | | | Max Libbrecht
2 | 1/12 | Sequence Alignment, Dynamic Programming | Durbin Chapters 1, 2; Gusfield Chapters 11, 12.1, 12.2, 12.7 | | Zinnia Zheng
3 | 1/14 | Global Alignment Variants, Local Alignment, Gap Scoring | | PS 1 out |
4 | 1/21 | Linear-Space Alignment, Heuristic Local Aligners (BLAST) | | | Cory Barr
5 | 1/26 | Hidden Markov Models - Viterbi, Forward Algorithms | Durbin Chapters 3, 4 | | Nicholas Dovidio
6 | 1/28 | HMMs - Backward Algorithm, Higher-Order HMMs, State Duration Modeling | | PS 1 due; PS 2 out | Noru Perez
7 | 2/2 | CpG Islands, Learning, Baum-Welch Algorithm | | | Nathan Howard
8 | 2/4 | Pair HMMs, Conditional Random Fields | Durbin Chapter 4 | | David Hall
9 | 2/9 | DNA Sequencing | ARACHNE, Euler, Genome sizes, transposons, genomic mapping--mathematical analysis | | Robert Bruggner
10 | 2/11 | Sequencing Cont'd, Physical Mapping, Fragment Assembly | | PS 2 due; PS 3 out | Sina Firouz
11 | 2/18 | Molecular Evolution and Phylogenetic Trees | Gusfield Chapter 5; Genescan, Twinscan, EasyGene, SLAM | | Linda Liu
12 | 2/23 | Fragment Assembly Cont'd | | | Harry Robertson
13 | 2/25 | Multiple Sequence Alignment | Gene Regulation and Motif Finding references below | PS 3 due; PS 4 out | Sean Meador
14 | 3/2 | Chaining of Local Alignments, Protein Profile HMMs and Classification | | | Pegah Afshar
15 | 3/4 | Gene Recognition | AVID, LAGAN | | Daniel Newburger
16 | 3/9 | Gene Regulation, Microarrays | Chaining: Gusfield 13.3, Multiple Alignment: suggested reading Gusfield 14.1, 14.2, 14.5, 14.5, 14.10.1-14.10.2; Durbin Chapter 6 | | Karl Uhlig
17 | 3/11 | Motif Finding | | PS 4 due | Sean Holbert