Which input formats does CEAS accept?
For now, CEAS accepts only BED
or GFF files, which provide flexible
and easy ways to define the regions of interest in the genome.
For our internal bookkeeping purposes, we currently require four fields in
the submitted BED file; namely, (1) chromosome, (2) chromosome_start,
(3) chromosome_end, and (4) uniqueID. Here are more detailed descriptions of
the fields that are needed for each ChIP-region:
chromosome - Name of the chromosome containing the region of interest.
This field should start with "chr", followed by 1, 2, 3, ... 22, X or Y
(e.g. chr3, chrY).
chromosome_start - Starting position of the region.
Please note that in our convention, each chromosome starts at 0. For further
information, click here.
chromosome_end - Exclusive upper bound of the region;
e.g. chr1 1100 1200 corresponds to the 100 base pairs on chromosome 1
starting at 1100 and ending at 1199.
uniqueID - User-defined name of the region.
This label will be displayed on the summary page as an identifier of the
region you specified.
For further information, please see UCSC's definition of the BED format.
N.B.: These data fields should be in the order
chromosome chromosome_start chromosome_end uniqueID.
CEAS also allows several non-data lines in BED files
-- for example, header information. All non-data lines will be ignored.
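As an illustration of the four-field layout described above, here is a minimal Python sketch of how a line in such a BED file could be read. It is not CEAS's own parser: the function name parse_bed_line and the rule used to recognize non-data lines (anything that does not start with a valid chromosome name followed by two integers) are assumptions made for this example.

```python
import re

# Chromosome names as described above: "chr" followed by 1..22, X or Y.
CHROM_RE = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y)$")


def parse_bed_line(line):
    """Return (chromosome, start, end, unique_id), or None for a non-data line.

    Hypothetical helper: a line is treated as non-data (and ignored) when it
    has fewer than four fields, an unrecognized chromosome name, or
    non-integer coordinates.
    """
    fields = line.split()
    if len(fields) < 4 or not CHROM_RE.match(fields[0]):
        return None
    try:
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        return None
    # chromosome_start is inclusive, chromosome_end is exclusive.
    if start < 0 or end <= start:
        raise ValueError(f"bad coordinates: {line!r}")
    return fields[0], start, end, fields[3]
```

For example, parse_bed_line("chr1 1100 1200 R1") yields the region named R1 covering positions 1100 to 1199, while a header line such as "track name=foo" is skipped.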
Similarly, the GFF format provides structured information associated with DNA,
RNA and Protein sequences. The current specification can be found here.
The GFF fields used in CEAS are:
seqname source feature start end score strand frame [attributes] [comments]
We require the "chromosome" information to be in the
How does CEAS represent the chromosome coordinates?
The starting coordinate of each chromosome in CEAS is 0. The input
chromosome_start position is included but the input chromosome_end position
is not included in our analysis. For example, if you specify
"chr1 10 50 R1"
our analysis will be performed in the region 10, 11, 12, ... , 49. If your
convention is such that the first nucleotide is assigned the position 1
and you are interested in the region starting at A and ending at B on
chromosome 1, including the boundary points, then your input should be
chr1 A-1 B
For each region you specify, CEAS' result page will automatically provide
a hyperlink to the corresponding information on the UCSC Genome Browser,
where one-based start coordinates are displayed.
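The conversion between the two conventions described above can be sketched in a few lines of Python. The function names are hypothetical and chosen only for this example.

```python
def one_based_to_ceas(a, b):
    """Convert a 1-based inclusive interval [a, b] to CEAS input coordinates.

    CEAS counts from 0, includes chromosome_start and excludes
    chromosome_end, so the input becomes (A-1, B).
    """
    if a < 1 or b < a:
        raise ValueError("expected 1 <= a <= b")
    return a - 1, b


def ceas_to_one_based(start, end):
    """Convert CEAS coordinates (0-based, end exclusive) to 1-based inclusive."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end
```

Under either convention the region length is unchanged: "chr1 10 50 R1" covers the 40 positions 10 through 49, which are positions 11 through 50 when counting from 1.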
What kind of output information does CEAS provide?
CEAS' result page provides the following information:
For each region you specify, a repeat-masked sequence (character "N" for
the masked base) will be provided. Current retrieval websites mask only
RepeatMasker repeats and tandem repeats with period of 12 or less. CEAS
outputs these sequences in FASTA format for download.
Biologists are often interested in the level of conservation of their
ChIP-regions across species. CEAS uses the high-quality phastCons information
from the UCSC GoldenPath genome resource, which assigns a conservation score
based on a phylogenetic Hidden Markov Model to virtually every nucleotide in
the human genome. CEAS generates a thumbnail phastCons conservation plot for
each ChIP-region, allowing biologists to skim through hundreds of ChIP-regions
in a single pdf file. In addition, the server extends each ChIP-region to 3kb,
aligns the regions at centers, calculates the average phastCons score at each
aligned position, and generates an average conservation plot, which can give
biologists an idea of how conserved their ChIP-regions are (in the middle of
the plot) compared to the genomic background (at both ends of the plot).
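The averaging step behind the average conservation plot can be sketched as follows. This is only an illustration under stated assumptions, not the server's implementation: score_at is a hypothetical callable that returns the phastCons score for a position (or None when no score exists), "extending to 3kb" is interpreted as taking a 3000 bp window centered on each region, and positions without a score are simply left out of the average.

```python
def average_conservation(regions, score_at, width=3000):
    """Average per-position scores over regions aligned at their centers.

    regions:  iterable of (chromosome, start, end), 0-based, end exclusive.
    score_at: hypothetical callable (chromosome, position) -> float or None.
    Returns a list of `width` averages (None where no region had a score).
    """
    totals = [0.0] * width
    counts = [0] * width
    for chrom, start, end in regions:
        center = (start + end) // 2
        window_start = center - width // 2
        for offset in range(width):
            pos = window_start + offset
            if pos < 0:
                continue  # window runs off the start of the chromosome
            score = score_at(chrom, pos)
            if score is not None:
                totals[offset] += score
                counts[offset] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]
```

The middle of the returned list corresponds to the centers of the ChIP-regions, and the two ends correspond to the flanking genomic background.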
Nearby gene mapping
For each ChIP-region, CEAS reports the nearest RefSeq genes in both upstream
and downstream directions on both strands, unless no gene is found within
300kb. When a ChIP-region lies within a gene, CEAS reports whether it is in
the 5' UTR, 3' UTR, a coding exon, or an intron. The server also provides
summary statistics for gene mapping of all the ChIP-regions, including the
percentages of ChIP-regions that reside in proximal promoters (1kb upstream
from RefSeq 5' start), immediate downstream (1kb downstream from RefSeq 3' end),
5' UTRs, 3' UTRs, coding exons, introns, and enhancers (more than 1kb from
RefSeq). This rough estimate of the ChIP-region distribution will help biologists
understand the specific binding behavior of their transcription factor.
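To make the strand handling in these categories concrete, here is a rough Python sketch that classifies a single position against one gene. It is a simplification made for this example: the function name is hypothetical, the gene is given only by its transcript boundaries (so UTRs, coding exons and introns are collapsed into "within gene"), and using one representative position per ChIP-region is an assumption, not something CEAS documents.

```python
def classify_position(pos, tx_start, tx_end, strand, flank=1000):
    """Classify a 0-based position against a gene spanning [tx_start, tx_end).

    On the "+" strand the 5' start is tx_start; on the "-" strand the 5'
    start is tx_end, so upstream and downstream swap sides.
    """
    if tx_start <= pos < tx_end:
        return "within gene"
    left = tx_start - flank <= pos < tx_start   # lower-coordinate flank
    right = tx_end <= pos < tx_end + flank      # higher-coordinate flank
    if strand == "+":
        upstream, downstream = left, right
    else:
        upstream, downstream = right, left
    if upstream:
        return "proximal promoter"
    if downstream:
        return "immediate downstream"
    return "enhancer"  # more than `flank` bp away from the gene
```

Counting these labels over all ChIP-regions and dividing by the number of regions gives percentages of the kind reported in the summary statistics.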
Motif finding and enrichment analysis
CEAS finds enriched sequence motifs in the ChIP-regions that are putatively
bound by the ChIP-chip transcription factor and its cooperative binding
partners. CEAS has pre-collected all the motif matrices in TRANSFAC and
JASPAR databases, and filtered out the motifs that are either from microbial
genomes or constructed with fewer than 10 sites. CEAS searches for approximately
800 well-characterized eukaryotic motifs. Given the user's ChIP-regions, CEAS
counts the number of hits for each motif both within the ChIP-regions and
in the whole genome. It then reports those motifs that are significantly enriched
in the ChIP-regions, with a fold change > 2 and a binomial test p-value < 1E-10.
For each reported motif, CEAS provides its fold change, p-value, hit sequences
in the ChIP-regions, and sequence logo.
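The reporting thresholds above can be illustrated with a small Python sketch. The exact statistical model CEAS uses is not spelled out here, so the following is an assumption made for this example: every base in the ChIP-regions is treated as one trial, the genome-wide hit rate is the background probability, and the p-value is the one-sided binomial probability of seeing at least the observed number of hits. All function names are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * log_p + (n - i) * log_q
        for i in range(k, n + 1)
    ]
    m = max(terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in terms))


def motif_enrichment(chip_hits, chip_bases, genome_hits, genome_bases):
    """Return (fold change, p-value) under the assumptions stated above."""
    if genome_hits <= 0:
        raise ValueError("need at least one genome-wide hit")
    background = genome_hits / genome_bases
    fold = (chip_hits / chip_bases) / background
    return fold, binomial_upper_tail(chip_hits, chip_bases, background)


def is_reported(fold, p_value):
    """Apply the thresholds quoted above: fold change > 2, p-value < 1E-10."""
    return fold > 2 and p_value < 1e-10
```

For instance, a motif with 100 hits in 10,000 ChIP-region bases against 1,000 hits in 1,000,000 genome bases has a fold change of about 10 and passes both thresholds under this model.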
Why haven't I yet received an email from CEAS for my results?
CEAS will typically finish your job in several minutes or hours; an input
file containing 10000 regions will take about 1 hour. When the result page is
generated for your submission, CEAS will send you an email providing you with
a URL for your results.
If you don't receive an email, please check the following:
(1) Please make sure that you have provided the correct email address.
(2) If your email server has a spam filter, please make sure that it doesn't
reject CEAS' email as spam.
If you still encounter a problem, please feel free to contact us. Thank you!
An example output is shown below.
The top window contains links to each of the analysis results. Excerpts from
the result sections are shown in the blue callouts in counterclockwise order
as genomic sequence of the ChIP-regions in FASTA format, average conservation
plot of the ChIP-regions, sequence logo of an enriched motif, motif site list
with fold change and p-values, and summary of nearby gene mapping of all the