# This package was developed by Adam Sadovsky and Gal Chechik
# (C) 2005-2008
# Can be used for academic purposes.
#
# For other uses and licensing please contact gal@ai.stanford.edu

Installation:

    tar -xvzf findmotifs.tgz
    make clean all

To run:

    findmotifs.pl --help

Implementation details
----------------------

A motif = common ancestor (dist <= A anc threshold)
          and no common descendant (up to dist <= A desc threshold)
V motif = common descendant (dist <= V desc threshold)
          and no common ancestor (up to dist <= V anc threshold)
P motif = common descendant (dist <= P desc threshold)
          and common ancestor (dist <= P anc threshold)
C motif = cycle of length <= C threshold

* Motifs are relations, so order matters. E.g. if A-B is a V12, then
  B-A is a V21.
* When searching for a common descendant/ancestor, we only find the
  common desc/anc of least distance. Consequence: if two genes are in
  both a V11 and a V12, only the V11 will be listed.
* Since genes can catalyze multiple reactions, it is possible for a gene
  to be in a motif with itself (these aren't filtered out).
* The weak A motif is a superset of a-and, and the weak V motif is a
  superset of v-and (b/c any genes in a v-and/a-and share a common
  descendant/ancestor).
* P motifs contain all T motifs (b/c any genes in a T motif have a
  common ancestor and a common descendant).
* Output with _cmpd contains the nodes (compounds) between each pair of
  genes. This is currently implemented only for T-chains and for V/A
  motifs. The compound output seems to work but hasn't really been
  tested yet.
* Currently, v-and and a-and (and v-nec and a-nec) only work for
  depth=1.
* Cycles (C motifs) work for weak and strong, but for weak there will be
  repeated cycles of the form A-B-C, B-C-A, C-A-B.
* For T-chains, we prevent a chain from passing through the same node or
  gene more than once. In weak mode, this means not using any of the
  first gene's start-nodes again in the chain (and not reusing any other
  nodes). It also means that none of the last gene's end-nodes can have
  been used before. (Note: this is a complicated definition in weak
  mode.)
* If the input file was produced on Windows, run dos2unix to convert the
  line endings.
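
As a rough illustration of the A/V/P definitions above, here is a minimal Python sketch. It is not the package's Perl code. It assumes a simplified gene-to-gene directed graph (the real tool works with genes and compound nodes), a single pair of thresholds instead of one pair per motif type, that a gene does not count as its own ancestor or descendant, and that "least distance" means the smallest sum of the two distances. All function names are hypothetical.

```python
from collections import deque

def bfs_dist(graph, start, limit):
    """Return {node: distance} for nodes reachable from start in 1..limit steps."""
    dist = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if d == limit:
            continue  # do not expand beyond the threshold
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                dist[nxt] = d + 1
                queue.append((nxt, d + 1))
    return dist

def reverse(graph):
    """Flip every edge so that BFS on the result walks towards ancestors."""
    rev = {}
    for u, targets in graph.items():
        for v in targets:
            rev.setdefault(v, []).append(u)
    return rev

def closest_common(dist_a, dist_b):
    """Return (da, db) for the shared node with the smallest da + db, or None."""
    shared = set(dist_a) & set(dist_b)
    if not shared:
        return None
    # Assumption: "least distance" = smallest sum; ties broken by the pair itself.
    return min(((dist_a[n], dist_b[n]) for n in shared),
               key=lambda p: (p[0] + p[1], p))

def classify(graph, a, b, anc_limit=2, desc_limit=2):
    """Label the ordered pair (a, b) as 'Axy', 'Vxy', 'P', or None."""
    rev = reverse(graph)
    desc = closest_common(bfs_dist(graph, a, desc_limit),
                          bfs_dist(graph, b, desc_limit))
    anc = closest_common(bfs_dist(rev, a, anc_limit),
                         bfs_dist(rev, b, anc_limit))
    if anc is not None and desc is not None:
        return "P"
    if desc is not None:
        return "V%d%d" % desc
    if anc is not None:
        return "A%d%d" % anc
    return None

# Hypothetical toy graph: g1 -> g3, g2 -> g4 -> g3, g0 -> g5, g0 -> g6
toy = {"g1": ["g3"], "g2": ["g4"], "g4": ["g3"], "g0": ["g5", "g6"]}
print(classify(toy, "g1", "g2"))  # V12
print(classify(toy, "g2", "g1"))  # V21 (order matters)
print(classify(toy, "g5", "g6"))  # A11
```

Under these assumptions, a pair that has both a V11 and a V12 relation is reported only as V11, because the closer common descendant wins.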
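
The repeated weak-mode cycles mentioned above (A-B-C, B-C-A, C-A-B) can be collapsed in post-processing. The Python sketch below is one possible approach, not part of the package. It assumes each cycle has already been parsed into a non-empty list of gene names (the actual output format is not described here), and the function names are hypothetical.

```python
def canonical_rotation(cycle):
    """Return the lexicographically smallest rotation of a non-empty cycle."""
    cycle = list(cycle)
    return min(tuple(cycle[i:] + cycle[:i]) for i in range(len(cycle)))

def unique_cycles(cycles):
    """Keep one representative per set of rotations, in first-seen order."""
    seen = set()
    result = []
    for cycle in cycles:
        key = canonical_rotation(cycle)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

cycles = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"], ["A", "C", "B"]]
print(unique_cycles(cycles))  # [('A', 'B', 'C'), ('A', 'C', 'B')]
```

Only rotations are merged here; A-C-B runs in the opposite direction from A-B-C and is kept as a separate cycle.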