This is a breakdown of how frequently the motif occurs in the various genomes and how many of those hits can be considered "homologous". There are several types of occurrences to consider: how frequently the motif occurs in the genome; how frequently the motif occurs near genes (within +/- 500kb); and how many distinct genes the motif occurs near (each gene may have multiple hits). Also included is a breakdown of "within" versus "across" species Hamming distance, as described in the paper.
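For readers unfamiliar with the term, here is a minimal sketch of a "within" versus "across" species Hamming distance comparison. The paper's exact definition is not reproduced on this page, so the sketch assumes the simplest reading: the mean pairwise distance among sites of one species versus the mean pairwise distance between sites of two species. The function names are made up for illustration.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_within(sites):
    """Mean pairwise Hamming distance among the sites of one species.

    Assumption: 'within' means all unordered pairs from the same species.
    """
    pairs = list(combinations(sites, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_across(sites_a, sites_b):
    """Mean Hamming distance over all pairs drawn from two species.

    Assumption: 'across' means every site in A against every site in B.
    """
    pairs = list(product(sites_a, sites_b))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)
```

If the across-species mean is not much larger than the within-species mean, the sites are about as similar between species as they are inside one genome.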
This is a standard motif logo, though the characters have been boxed and scaled for visual perspicuity. Less clear are the controls at the top of that logo: the first toggle determines which group of sites the logo is computed for---conserved sites, sites near genes, or all sites. The second toggle determines which species you want to consider---sites in human, mouse, or rat. This toggle affects only which sequences are used in the computation of the logo, and does not describe anything about the sites; a "conserved" "human only" setting makes sense: it is all sites in human that have orthologous hits in all three species.
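As a rough illustration of what the two toggles do, here is a sketch that first filters the sites and then computes letter heights the way a textbook sequence logo does (information content per column, in bits, split among the bases by frequency). The site records and their field names (`seq`, `species`, `conserved`, `near_gene`) are hypothetical, and the page may apply corrections that this sketch omits.

```python
import math
from collections import Counter


def select_sites(sites, group="all", species=None):
    """Apply the two toggles: site group first, then species.

    `sites` is a list of dicts with hypothetical keys
    'seq', 'species', 'conserved' and 'near_gene'.
    """
    chosen = []
    for site in sites:
        if group == "conserved" and not site["conserved"]:
            continue
        if group == "near_genes" and not site["near_gene"]:
            continue
        if species is not None and site["species"] != species:
            continue
        chosen.append(site["seq"])
    return chosen


def logo_heights(seqs):
    """Per-column letter heights in bits for equal-length DNA sequences.

    Column information is 2 - H, where H is the Shannon entropy of the
    base frequencies; each base gets frequency * information.
    No small-sample correction is applied.
    """
    heights = []
    for column in zip(*seqs):
        total = len(column)
        freqs = {base: n / total for base, n in Counter(column).items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        info = 2.0 - entropy
        heights.append({base: p * info for base, p in freqs.items()})
    return heights
```

A "conserved", "human only" logo would then be `logo_heights(select_sites(sites, "conserved", "human"))`.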
This is a very risky textfield. Basically, if you recognize this motif or have some sort of insight into its occurrences, please note that here. This field was used while hand-annotating the database to check for peculiar motifs or uninformative ones (see a visual description of the process for more information on that). Your entry will be automatically timestamped; the database is occasionally cleaned of "test" annotations (if someone enters "This is a test annotation" or some such, it will get removed eventually).
Each site that lies near a gene is tracked with its position relative to that gene. Within a gene, positions are assigned floating point numbers between 0.0 and 1.0 inclusive, where 0 represents the beginning of the gene and 1 represents the end. Upstream regions are represented by -1..-500kb; this is all corrected for strand, so -500kb is always relative to the TSS. The less intuitive part is that the proportional position is computed over exons only---if a site occurs in an intron, it is assigned the position of the end of the previous exon. As with the motif logo, you may specify the source and conservation level of the data used to construct the histogram. It's a "pseudo" histogram because the X axis is not quite continuous.
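The exon-only position rule is easier to see in code. The following is a sketch, not the code behind this page: it assumes inclusive genomic coordinates, handles the minus strand by mirroring coordinates, and counts exonic bases that precede the site (so the very last base of a gene comes out just under 1.0). How downstream sites are represented is not described above, so the sketch simply refuses them. The function name is made up.

```python
def relative_position(site, exons, strand="+"):
    """Position of `site` relative to a gene.

    exons: list of (start, end) pairs, inclusive genomic coordinates.
    Returns a negative offset in bp for upstream sites, or a fraction
    in [0, 1) of exonic sequence for sites inside the gene.
    """
    if strand == "-":
        # Mirror coordinates so transcription always runs left to right.
        site = -site
        exons = [(-end, -start) for start, end in exons]
    exons = sorted(exons)
    gene_start, gene_end = exons[0][0], exons[-1][1]

    if site < gene_start:
        return float(site - gene_start)  # upstream: negative bp from the TSS
    if site > gene_end:
        raise ValueError("downstream sites are not covered by this sketch")

    total = sum(end - start + 1 for start, end in exons)
    covered = 0  # exonic bases that precede the site
    for start, end in exons:
        if site < start:
            # Site is in the intron before this exon: keep the position
            # reached at the end of the previous exon.
            break
        if site <= end:
            covered += site - start
            break
        covered += end - start + 1
    return covered / total
```

With exons at 100..199 and 300..399, a site at 150 gets 0.25, and every site in the intron (200..299) gets 0.5, the same value as the first base of the second exon.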
Beneath the positional bias histogram is the "proportional positional bias" chart. The problem with the first histogram is that many genes do not have the full megabase+gene window, since they occur in the middle of a group of genes (but, since we derive all annotations from Ensembl, they do not overlap). Thus, you would expect a larger percentage of the sites that occur upstream of a gene to occur closer to 0 than farther from 0, not because the motif is necessarily a regulator, but because more genes have -20..0 regions than -500000..-499980 regions. Unfortunately, this attempt to correct for that seems to overweight features farther out, presumably due to an inconsistency in the model. That is, I'm pretty sure the graph is computed correctly, but it seems to rely on a flawed model. Thus, it is unclear how one could use the positional bias to draw any conclusion about motif occurrences. On the other hand, it does help eliminate some motifs as being clearly not important (e.g., if all hits are scattered from 0 to 1, it is likely that the motif is part of a coding region).
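To make the idea of the correction concrete, here is one way such a normalization could look. This is a guess at the kind of model described above, not the code that draws the chart: each upstream bin's hit count is divided by the number of genes whose upstream region actually reaches that bin. The function and parameter names are hypothetical.

```python
def proportional_bias(hit_distances, upstream_lengths, bin_size, n_bins):
    """Hits per bin divided by the number of genes that have that bin.

    hit_distances:    distance upstream (in bp, >= 0) of each site.
    upstream_lengths: for each gene, how many upstream bp are available
                      before the neighboring gene (at most 500kb).
    """
    hits = [0] * n_bins
    for distance in hit_distances:
        b = distance // bin_size
        if 0 <= b < n_bins:
            hits[b] += 1

    result = []
    for b in range(n_bins):
        # Genes whose upstream region extends past the start of this bin.
        available = sum(1 for length in upstream_lengths if length > b * bin_size)
        result.append(hits[b] / available if available else 0.0)
    return result
```

For example, with hits at distances 5, 15, 15 and 25, three genes with 10, 20 and 30 bp of upstream sequence, and 10 bp bins, the raw counts are 1, 2, 1 but the normalized values are 1/3, 1 and 1. The single hit in the last bin is divided by a single gene, which shows how a scheme like this can inflate the far bins when few genes reach them.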
We extracted all zinc finger and POU domain transcription factors from TRANSFAC (public edition) that had multiple confirmed binding sites or a position weight matrix in primates or rodents. In the TRANSFAC "schema" (I use this word loosely here---it's a schema in the sense of "way of doing things", not in the sense of "well-designed database layout"), this corresponds to Factors with one or more Binding Sequence or Matrix (not all Factors exhibit these properties). We compared all of our motifs against those transcription factors to see what matched. Very little did, except for RE1. Rather than choose a priori cutoffs for you, there are toggles and fields that let you change the alignment properties you consider relevant.
Using a particular motif (i.e., the one whose page you are viewing) as a reference set, this table tells you which other motifs overlap with it. Keeping in mind that "this" refers to the motif whose page you are viewing and "that" refers to the other motif, the numbers in the table should make sense. The controls at the top of the table govern what gets reported. Clicking on the name of a motif will take you to its details page.
An overlap width of 10 and "true positive" threshold of 10% were used to group motifs into families, which were then annotated according to the process for motifs; there were 7 families with about 200 motifs in them.
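The grouping step can be pictured with the sketch below. It is only an illustration under loudly stated assumptions: "overlap width" is read as the maximum distance in bp between two sites on the same chromosome, the "true positive" threshold is read as the fraction of one motif's sites that overlap the other's, and families are taken to be connected groups of motifs linked this way. None of these readings is confirmed by the text above, and all names are made up.

```python
def overlap_fraction(this_sites, that_sites, width=10):
    """Fraction of this motif's sites within `width` bp of a site of that motif.

    Sites are (chromosome, position) tuples. Assumed definition of overlap.
    """
    if not this_sites:
        return 0.0
    hits = 0
    for chrom, pos in this_sites:
        if any(c == chrom and abs(pos - p) <= width for c, p in that_sites):
            hits += 1
    return hits / len(this_sites)


def group_families(motif_sites, width=10, threshold=0.10):
    """Link motifs whose overlap fraction reaches the threshold in either
    direction, then return the connected groups with more than one member."""
    names = list(motif_sites)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            forward = overlap_fraction(motif_sites[a], motif_sites[b], width)
            backward = overlap_fraction(motif_sites[b], motif_sites[a], width)
            if max(forward, backward) >= threshold:
                parent[find(a)] = find(b)

    families = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return [sorted(members) for members in families.values() if len(members) > 1]
```

Taking the larger of the two directions is a design choice of the sketch; requiring both directions to pass would produce smaller, tighter families.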
Neil Jones. Last modified 8/02/2006.