CS262

Announcements
3/19: Solutions to Problem Set 4 have been posted: ppt pdf. We recommend the PowerPoint slides because they include explanatory notes.
3/13: The Friday 3/14 Problem Session is cancelled. Solutions to Problem set 4 will not be presented on Friday since not everyone will have turned in their solutions. Instead, we will release the solutions on the website once everyone's solutions are in.
3/12: Problem set 4 is due! Please either turn it in in class or email it to cs262.win08@gmail.com by 12:50 pm.
3/5: A scribing slot has opened up for March 10. If you would like to scribe this lecture, please email cs262.win08@gmail.com.
3/5: Problem set 3 has been graded! Come pick it up during office hours, problem sessions, or lecture.
2/27: Problem set 4 has been posted.
2/27: Problem set 3 is due! Please either turn it in in class or email it to cs262.win08@gmail.com by 12:50 pm.
2/22: Problem set 2 has been graded! Come pick it up during office hours, problem sessions, or lecture.
2/13: Problem set 3 has been posted.
2/13: Problem set 2 is due! Please either turn it in in class or email it to cs262.win08@gmail.com by 12:50 pm.
2/11: There was a mistake in the formulas in PS2 problem 1.c.iii. The formulas given in lecture are correct - please see the updated pdf.
2/6: Problem set 1 has been graded! Come pick it up during office hours, problem sessions, or lecture.
2/2: Update: Slot filled! A scribing slot has opened up for March 3. If you would like to scribe this lecture, please email cs262.win08@gmail.com.
1/30: Problem set 1 is due! Please either turn it in in class or email it to cs262.win08@gmail.com by 12:50 pm.
1/30: Problem set 2 has been posted.
1/19: Scribing assignments have been made for all lectures.
1/16: Problem set 1 has been posted.
1/10: There will be no problem session on Friday January 11.
1/10: If you need help finding a group for collaborating on the problem sets, please email cs262.win08@gmail.com with a request.
1/10: If you would like to scribe one of the lectures, please email cs262.win08@gmail.com with the date you would like.
Course Description
Genomics is a new and very active application area of computer science. Over the past ten years there has been an explosion of genomics data -- the entire DNA sequences of several organisms, including human, are now available. These are long strings of base pairs (A,C,G,T) containing all the information necessary for an organism's development and life. Computer science is playing a central role in genomics: from sequencing and assembling DNA sequences to analyzing genomes in order to locate genes, repeat families, and similarities between sequences of different organisms, among several other applications. The area of computational genomics includes both applications of older methods and the development of novel algorithms for the analysis of genomic sequences. This course aims to present some of the most basic and useful algorithms for sequence analysis, together with the minimal biological background necessary for a computer science student to appreciate their application to current genomics research. Sequence alignments, hidden Markov models, multiple alignment algorithms and heuristics such as Gibbs sampling, and the probabilistic interpretation of alignments will be covered. Applications of these tools to sequence analysis will be presented: comparing genomes of different species, gene finding, gene regulation, whole genome sequencing and assembly. Whenever possible, examples will be drawn from the most current developments in genomics research.

Prerequisites
The following courses are recommended:
  • CS161: Design and Analysis of Algorithms, or equivalent familiarity with algorithmic and data structure concepts.

Textbooks

Durbin, Eddy, Krogh, Mitchison "Biological Sequence Analysis"

Gusfield "Algorithms on Strings, Trees, and Sequences"

Requirements and Grading
  1. Homework. Course will be graded based on the homeworks, NO FINAL. The course will have four challenging problem sets of equal size and grading weight. These must be handed in at the beginning of class on the due date, which will usually be two weeks after they are handed out. Recognizing that students may face unusual circumstances and require some flexibility in the course of the quarter, each student will have a total of three free late days (weekends are NOT counted) to use as s/he sees fit. Once these late days are exhausted, any homework turned in late will be penalized at the rate of 20% per late day (or fraction thereof). Under no circumstances will a homework be accepted more than three days after its due date.

    Late homeworks should be turned in to a member of the course staff, or, if none are available, placed under the door of S266 Clark Center. You must write the time and date of submission on the assignment. It is an honor code violation to write down the wrong time. Students with biological and computational backgrounds are encouraged to work together.

  2. Scribing. Optionally, a student can scribe one lecture. Lecture notes will be due one week after the lecture date, and the grade on the lecture notes will replace the two lowest-scoring problems in the homeworks. To ensure even coverage of the lectures, please sign up to scribe beforehand with one of the course staff.
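
The late-day arithmetic in item 1 can be sketched as follows. This is only an illustration, not an official grading tool: the function name `late_penalty` and its return shape are hypothetical, and it assumes that the caller has already excluded weekends from `days_late`, that free late days are spent before any penalty applies, and that the three-day cutoff is measured in the same units as the late days.

```python
import math

FREE_LATE_DAYS = 3            # per student, for the whole quarter
PENALTY_PERCENT_PER_DAY = 20  # percent of the score lost per penalized late day
MAX_DAYS_LATE = 3             # anything later than this is not accepted


def late_penalty(days_late, free_days_left=FREE_LATE_DAYS):
    """Return (accepted, penalty_percent, free_days_left_after).

    days_late: days elapsed since the deadline, weekends already excluded
               (assumption); may be fractional.
    free_days_left: free late days the student still has.
    """
    if days_late > MAX_DAYS_LATE:
        # Not accepted at all; no free late days are consumed.
        return False, 100, free_days_left
    charged = math.ceil(days_late)  # a fraction of a day counts as a full day
    free_used = min(charged, free_days_left)
    penalized = charged - free_used
    return True, PENALTY_PERCENT_PER_DAY * penalized, free_days_left - free_used
```

For example, under these assumptions a homework turned in two days late by a student with one free late day remaining gives `late_penalty(2, 1) == (True, 20, 0)`: one day is covered by the free late day and the other costs 20%.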



Collaboration and Honor Code

Students may discuss and work on problems in groups of at most three people but must write up their own solutions. A student can be part of at most one group. If a student works individually, then the worst problem per problem set will be dropped. When writing up the solutions, students should write the names of people with whom they discussed the assignment. Also, when writing up the solutions students should not use written notes from group work.

Students are expected not to look at the solutions from previous years. Copying or intentionally referring to solutions from previous years will be considered an honor code violation.

Class Schedule
Lecture: Monday Wednesday 12:50 - 2:05 pm in Skilling 193
Problem Sessions (optional): Friday 1:15 - 2:05 pm in McCullough 115

Instructor
Serafim Batzoglou
Office: Clark Center S266
Office hours: Monday 2:15 - 3:30 pm in Clark Center S266
Phone: (650) 723-3334
Email: ude.drofnats@mifares (written backwards to avoid spam)

Teaching Assistants
Andreas Sundquist
Office: Clark Center S260
Office hours: Thursdays noon - 2 pm in Clark Center S260
Phone: (650) 725-6094
Email: ude.drofnats@iuqdnusa

Marc Schaub
Office: Clark Center S260
Office hours: Tuesdays 1 - 3 pm in Gates B24A (phone during office hours: (650) 725-4385)
Phone: (650) 725-6094 (except during office hours)
Email: ude.drofnats@buahcs.cram

Communication
Questions should be sent to the course email address cs262.win08@gmail.com, or communicated to course staff in person after lecture or during office hours.

Additional Material and Tutorials
Some additional materials can be found here.

Last Year's Lecture Notes
Below are the lecture notes from the Winter 2007 edition of this class. Please note that the content and order will not exactly correspond with this year's lectures. This year's lecture notes will be made available in the Schedule section below after they are submitted by the student scribing the lecture.

1. Introduction: Biology Background
2. Sequence Alignment--Dynamic Programming
3. Sequence Alignment Cont'd--Linear-Space Alignment
4. Heuristic Local Aligners; Four-Russians Algorithm
5. Hidden Markov Models--Decoding & Evaluation
6. Learning: EM / Baum-Welch
7. Learning cont'd
8. Pair HMMs for Sequence Alignment
9. DNA Sequencing
10. DNA Sequencing and Fragment Assembly
11. Cont'd Fragment Assembly
12. Molecular Evolution and Phylogenetic Trees
13. Multiple Sequence Alignment
14. Chaining of Local Alignments, Protein Profile HMMs and Classification
15. Gene Recognition
16. Gene Regulation, Microarrays
17. Motif Finding
18. Protein Interaction Networks
19. Protein Structure Prediction
20. RNA Secondary Structure Prediction

Schedule (future tentative)
As the quarter progresses, the following schedule will be updated accordingly. Please check back often for the latest material.

# | Date | Title | Reading | Homeworks | Scribe
1 | 1/9 | Course overview, Basic biology, Sequence Alignment, Dynamic Programming |  |  | Clare Kasemset
2 | 1/14 | Biology introduction | Durbin Chapters 1, 2; Gusfield Chapters 11, 12.1, 12.2, 12.7 |  | Dipankar Bhatt Acharya
3 | 1/16 | Sequence Alignment Cont'd--Linear-Space Alignment |  | PS 1 out | Robin Zhou
4 | 1/21 | No Class (Martin Luther King, Jr., Day) |  |  |
5 | 1/23 | Heuristic Local Aligners; Four-Russians Algorithm |  |  | Chuang Peng
6 | 1/28 | Hidden Markov Models--Decoding & Evaluation | Durbin Chapters 3, 4 |  | Jason Auerbach
7 | 1/30 | Learning: EM / Baum-Welch |  | PS 1 due; PS 2 out | Huy Seng
8 | 2/4 | Learning cont'd |  |  | Chung Ng
9 | 2/6 | Pair HMMs for Sequence Alignment | Durbin Chapter 4 |  | Hieu Nguyen
10 | 2/11 | DNA Sequencing | ARACHNE, Euler, Genome sizes, transposons, genomic mapping--mathematical analysis |  | Patrick Shih
11 | 2/13 | DNA Sequencing and Fragment Assembly |  | PS 2 due; PS 3 out | Karan Mangla
12 | 2/18 | No Class (Presidents' Day) |  |  |
13 | 2/20 | Cont'd Fragment Assembly | Gusfield Chapter 5; Genescan, Twinscan, EasyGene, SLAM |  | Fah Sathirapongsasuti
14 | 2/25 | Molecular Evolution and Phylogenetic Trees |  |  | Saeed Hassanpour
15 | 2/27 | Multiple Sequence Alignment | Gene Regulation and Motif Finding references below | PS 3 due; PS 4 out | Sarim Baig
16 | 3/3 | Chaining of Local Alignments, Protein Profile HMMs and Classification |  |  | Nuwan Seneratna
17 | 3/5 | Gene Recognition | AVID, LAGAN |  | Khan Shing
18 | 3/10 | Gene Regulation, Microarrays | Chaining: Gusfield 13.3, Multiple Alignment: suggested reading Gusfield 14.1, 14.2, 14.5, 14.5, 14.10.1-14.10.2; Durbin Chapter 6 |  | Daniel Kluesing
19 | 3/12 | Motif Finding |  | PS 4 due | Crystal Fong