ChIP-chip has become a popular technique to identify genome-wide in vivo
protein-DNA interactions. With genome tiling microarrays commercially
available from Affymetrix, Nimblegen and Agilent, more and more academic
laboratories are adopting this technology to detect cis-regulatory elements
in mammalian genomes. Despite the importance of ChIP-chip, few web servers
integrate the necessary downstream analysis functions with the capability
to process genome-scale ChIP-regions. So far, all the major ChIP-chip papers
in mammalian systems have been published as a direct result of powerful
bioinformatics support (e.g. Rick Young with David Gifford, Mike Snyder with
Mark Gerstein, Kevin Struhl with Tom Gingeras, and Myles Brown with X.
Shirley Liu), which is not available to smaller labs.
The Cis-regulatory Element Annotation System (CEAS) integrates many useful
tools to simplify ChIP-chip analysis for biologists. It can handle hundreds
or thousands of regions from high-throughput ChIP-chip experiments.
Given genome-scale ChIP-regions in the UCSC Genome Browser .bed file format,
the CEAS server retrieves information from different sources to help with
downstream analysis. Specifically, it provides the following information:
1. Fully repeat-masked genomic DNA sequence for the ChIP-regions, for qPCR
validation and transcription factor motif finding. The current UCSC Genome
Browser does not remove segmental duplications and simple repeats in its DNA
retrieval function, which can complicate qPCR primer design and sequence
motif finding.
2. GC content and evolutionary conservation of each ChIP-region and their
averages. CEAS uses PhastCons conservation scores from UCSC Genome
Bioinformatics, which are based on multiz alignment of human, chimp, mouse,
rat, dog, chicken, fugu, and zebrafish genomic DNA. CEAS generates a
thumbnail conservation plot for each ChIP-region and an average conservation
plot for all the ChIP-regions, which can be directly used in ChIP-chip
biologists' publications.
3. ChIP-region nearby gene mapping. CEAS examines both upstream and
downstream sequences on both strands to map the nearest RefSeq and miRNA
genes up to 300 kb away. In each direction, CEAS reports the distance
between a ChIP-region and its nearest gene. When a ChIP-region is within a
gene, CEAS reports whether the ChIP-region is mapped to the 5'UTR, 3'UTR, a
coding exon, or an intron. CEAS also provides summary statistics for the
location of all the ChIP-regions based on this gene mapping.
4. Transcription factor motif finding on the fully repeat-masked ChIP
sequences. CEAS finds enriched TRANSFAC and JASPAR motifs in the
ChIP-regions that are the putative binding motifs for the transcription
factor of interest (against which ChIP-chip is conducted) and its
cooperative binding partners. CEAS provides the sequence logo, motif
enrichment fold change and p-value for each enriched motif, and combines
redundant enriched motifs. CEAS pre-computes all the motif occurrence
information and stores it in the database, whereas current TRANSFAC
motif-matching programs cannot handle thousands of input sequences.
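
The nearby gene mapping in item 3 can be illustrated with a minimal sketch.
This is not CEAS's actual implementation: the Gene record, the function names
and the plain linear scan are assumptions made for illustration. Coordinates
are taken to be half-open, as in .bed files, and directions are reported as
"left" and "right" along the chromosome rather than per strand.

```python
from typing import NamedTuple

MAX_DISTANCE = 300_000  # 300 kb search window, as stated in item 3


class Gene(NamedTuple):
    """Hypothetical gene record; not a CEAS data structure."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def parse_bed_line(line):
    """Read chromosome, start and end from the first three BED columns."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


def nearest_genes(region, genes, max_distance=MAX_DISTANCE):
    """Find overlapping genes and the nearest gene on each side of a region.

    Returns a dict with the names of overlapping genes and, for each side,
    either None or a (gene name, distance in bp) tuple.
    """
    chrom, start, end = region
    result = {"overlapping": [], "left": None, "right": None}
    for gene in genes:
        if gene.chrom != chrom:
            continue
        if gene.end <= start:        # gene lies entirely to the left
            side, distance = "left", start - gene.end
        elif gene.start >= end:      # gene lies entirely to the right
            side, distance = "right", gene.start - end
        else:                        # intervals overlap
            result["overlapping"].append(gene.name)
            continue
        if distance > max_distance:  # outside the search window
            continue
        best = result[side]
        if best is None or distance < best[1]:
            result[side] = (gene.name, distance)
    return result


genes = [
    Gene("geneA", "chr1", 1_000, 2_000),
    Gene("geneB", "chr1", 10_000, 12_000),
    Gene("geneC", "chr1", 500_000, 510_000),
]
region = parse_bed_line("chr1\t5000\t5500")
print(nearest_genes(region, genes))
```

A linear scan is adequate for a handful of regions; for thousands of regions
against a full gene annotation, sorting the genes per chromosome and using
binary search or an interval tree would be the more usual choice.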
In summary, CEAS retrieves useful information (e.g. sequence retrieval) for
the validation of ChIP-chip experiments, assembles important knowledge (e.g.
conservation plots, nearby gene mapping, and motif logos) to be included in
biologists' publications, and generates useful hypotheses (e.g.
transcription factor cooperative partners) for further study.
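
The text does not say how the motif enrichment fold change and p-value are
computed. As one plausible illustration, the sketch below compares the
fraction of ChIP-regions containing a motif with the fraction of background
regions containing it, and scores the difference with a one-sided binomial
test. The choice of test, the region-level counting and all function names
are assumptions, not the CEAS method.

```python
import math


def log_binom_pmf(i, n, p):
    """Log of the Binomial(n, p) probability of exactly i successes."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    # Sum the probability mass in log space to avoid overflow for large n.
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


def motif_enrichment(chip_hits, chip_total, bg_hits, bg_total):
    """Return (fold change, p-value) for one motif.

    chip_hits / chip_total: ChIP-regions with the motif / all ChIP-regions.
    bg_hits / bg_total: the same counts for a background region set.
    """
    bg_rate = bg_hits / bg_total
    if not 0.0 < bg_rate < 1.0:
        raise ValueError("background rate must be strictly between 0 and 1")
    fold_change = (chip_hits / chip_total) / bg_rate
    p_value = binomial_upper_tail(chip_hits, chip_total, bg_rate)
    return fold_change, p_value


# Hypothetical counts: 30 of 100 ChIP-regions versus 100 of 1000 background.
fold, p = motif_enrichment(30, 100, 100, 1000)
print(f"fold change = {fold:.2f}, p-value = {p:.3g}")
```

When many motifs are tested at once, as with the full TRANSFAC and JASPAR
collections, the resulting p-values would normally need a multiple-testing
correction before being interpreted.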