Data Analysis and
BioInformatics in real-time qPCR (1)
What is
Bioinformatics?
In the last few decades,
advances in molecular biology and the
equipment available for research in this
field have allowed the increasingly rapid
sequencing of large portions of the genomes
of several species. In fact, to date,
several bacterial genomes, as well as those
of some simple eukaryotes (e.g., Saccharomyces
cerevisiae, or baker's yeast)
and more complex eukaryotes (C. elegans and
Drosophila) have been sequenced in full. The
Human Genome Project, designed to sequence
all 24 of the human chromosomes, is also
progressing and a rough draft was completed
in the spring of 2000.
http://www.library.csi.cuny.edu/~davis/molbiol/lecture_notes/bioinformatics_genomics/bioinformaticsIntro.html
The various types of
data:
Many different types
of data are collected and stored in
databases to facilitate retrieval. Depicted
here are amino acid sequences, protein
domain cartoons, different renderings of
three-dimensional structures, and protein
hydrophobicity data. Databases consisting of
data derived experimentally such as
nucleotide sequences and three-dimensional
structures are known as primary databases.
Those data that are derived from the
analysis or treatment of primary data such
as secondary structures, hydrophobicity
plots, and domains are stored in secondary
databases. A protein database consisting of
the conceptual translation of nucleotide
sequences would also be considered a
secondary database.
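As a hedged illustration of how a secondary protein entry could be derived from a primary nucleotide record, the sketch below performs a conceptual translation with the standard genetic code. The function name `translate` and the example sequence are illustrative assumptions; they do not come from any particular database's tooling.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Conceptually translate a DNA (or RNA) coding sequence in frame 1.

    Stop codons are written as '*'. Codons containing anything other than
    A, C, G or T are written as 'X'. Trailing bases that do not fill a
    complete codon are ignored.
    """
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Hypothetical nucleotide record used only for illustration.
print(translate("ATGGCCAAGTAA"))
```

A real secondary database would also record the reading frame, the genetic code table used and the source accession, which this sketch leaves out.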
The
analysis and interpretation of various data
types: Illustrated here are
various ways in which individual entries in
sequence and structure databases can be
compiled to reveal patterns and trends in
biology. For example, sequence families or
neighborhoods can be defined and annotated
based on the similarity of each sequence to
other members of the family. Common sequence
features in sequence families can be
identified in multiple alignments. These
motifs may provide clues to the biochemical
function of members of the family. Clustering
of sequences into trees that reflect the
degree of similarity between each sequence and
all of the others in the family reveals
evolutionary relationships. Finally,
identification of homologs to each gene in
well-characterized metabolic pathways provides
information about the prevalence of that
pathway in other organisms.
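The idea of finding common sequence features in a multiple alignment can be sketched as follows. This is a minimal example that assumes the sequences are already aligned to equal length with '-' as the gap character; the function name `conserved_columns` and the toy alignment are hypothetical.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=1.0):
    """Return (column_index, residue) pairs for conserved alignment columns.

    A column counts as conserved when its most common character is not a
    gap and occurs in at least `min_fraction` of the sequences.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(alignment) >= min_fraction:
            hits.append((i, residue))
    return hits


# Toy alignment of three hypothetical family members.
aligned = ["MKT-LLG", "MKSALLG", "MRT-LIG"]
print(conserved_columns(aligned))
```

Runs of adjacent conserved columns are candidate motifs; real motif discovery would also weigh similar residues and sequence redundancy.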
http://www.ncbi.nlm.nih.gov/Education/Bioinformatics/datatypes.html
http://www.ncbi.nlm.nih.gov/Education/Bioinformatics/dataanal.html
qPCR
software applications:
- Normalisation and
Housekeeping Genes:
Molecular
Biology
Freeware for Windows
A. General - below
B. Microarray -
next page
C. Java programs - next
page
Good places to start are Genamics
SoftwareSeek, BioExchange
and eBioinfogen.
For general software see Winsite.
The following sites are arranged in the order in which I
discovered them. At some point they will be
clustered by preference:
A. DNA,
RNA
and genomic analysis
B. Plasmid
graphic
packages
C. Primer
design
D. Protein
analysis
E. Viewing
three
dimensional structures
F. Alignments
G. Phylogeny
H. Miscellaneous
Statistical power
calculations
R. V. Lenth
Department of Statistics and Actuarial Science,
University of Iowa, Iowa City 52242
ABSTRACT:
This
article
focuses on how to do meaningful power
calculations and sample-size determination for
common study designs. There are 3 important
guiding principles. First, certain types of
retrospective power calculations should be
avoided, because they add no new information to
an analysis. Second, effect size should
be specified on the actual scale of measurement,
not on a standardized scale. Third, rarely can a
definitive study be done without first doing a
pilot study. Some simple examples as well as a
complex example are given. Power calculations
are illustrated using Java applets developed by
the author.
http://www.stat.uiowa.edu/~rlenth/Power/
and
http://www.stat.uiowa.edu/~rlenth/Power/oldversion.html
(runs more stably in Internet Explorer 7)
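To make the second guiding principle concrete, here is a minimal sketch of a sample-size calculation in which the effect size is given on the actual scale of measurement (for example, a difference in Cq values) together with an anticipated standard deviation. It uses the textbook normal approximation for a two-sided, two-sample comparison of means; it is not the method implemented in the author's Java applets, and the function name `n_per_group` is an assumption.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    delta: smallest difference worth detecting, in measurement units
    sd:    anticipated standard deviation, ideally from a pilot study
    Uses the normal approximation with equal group sizes; a t-based
    calculation gives slightly larger numbers for small samples.
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical planning values: detect a 1.0-unit difference when sd is 1.0.
print(n_per_group(delta=1.0, sd=1.0))
```

Halving the detectable difference roughly quadruples the required sample size, which is one reason a pilot estimate of the standard deviation matters.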
Java applets for power and sample size
This software is intended to be useful in planning
statistical studies. It is not intended to be
used for analysis of data that have already been
collected. Each selection provides a graphical
interface for studying the power of one or more
tests. They include sliders (convertible to
number-entry fields) for varying parameters, and a
simple provision for graphing one variable against
another.
Each dialog window also offers a Help menu.
Please read the Help menus before contacting me with
questions.
The "Balanced ANOVA" selection provides another dialog
with a list of several popular experimental designs,
plus a provision for specifying your own model.
Note: The
dialogs open in separate windows. If you're running
this on an Apple Macintosh, the applets' menus are
added to the screen menubar -- so, for example, you'll
have two "Help" menus there!
You may also download this software to run it on
your own PC.
Power Calculator
Written in PHP by Arno
Ouwehand, using the DSTPLAN
distribution by Barry Brown
et al. These calculators extend the functionality of
the old Xlisp-Stat based Power Calculator: they compute
not only the power for a given sample size, or the
sample size for a given power, but also
the other available items when specified.
Further statistical calculators
here => http://calculators.stat.ucla.edu/
by UCLA Department of Statistics
URI Genomics
& Sequencing Center
Calculator
for determining the number of copies
of a template
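A copy-number calculation of this kind is usually based on the mass and length of the template. The sketch below uses the common approximation of 650 g/mol per base pair of double-stranded DNA; whether the linked calculator uses exactly these constants is an assumption, and the function name `template_copies` is hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS = 650.0            # assumed average g/mol per base pair of dsDNA


def template_copies(mass_ng, length_bp):
    """Estimate the number of dsDNA template copies in a given mass.

    mass_ng:   amount of template in nanograms
    length_bp: template length in base pairs
    """
    if mass_ng < 0 or length_bp <= 0:
        raise ValueError("mass must be >= 0 and length must be > 0")
    moles = (mass_ng * 1e-9) / (length_bp * BP_MASS)
    return moles * AVOGADRO


# Hypothetical example: 1 ng of a 1000 bp amplicon.
print(f"{template_copies(1.0, 1000):.3e} copies")
```

For single-stranded templates or RNA a different average mass per nucleotide applies, so the constant would need to be changed.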
qPCR-DAMS: a Database Tool to
Analyze, Manage, and Store Both Relative and Absolute Quantitative Real-Time PCR data.
Jin N, He K, Liu L.
Physiol Genomics. 2006
Physiological Sciences, Oklahoma State University,
Stillwater, OK, USA.
Quantitative
real-time
PCR is an important high throughput method in
biomedical sciences. However, existing
software has limitations in handling both
relative and absolute quantification. We
designed qPCR-DAMS (Quantitative PCR Data
Analysis and Management System), a database
tool based on Access 2003, to deal with such
shortcomings by the addition of integrated
mathematical procedures. qPCR-DAMS allows a
user to choose among four methods for data
processing within a single software package:
(I) Ratio relative quantification, (II)
Absolute level, (III) Normalized absolute
expression, and (IV) Ratio absolute
quantification. qPCR-DAMS also provides a tool
for multiple reference gene normalization.
qPCR-DAMS has three quality control steps and
a data display system to monitor data
variation. In summary, qPCR-DAMS is a handy
tool for real-time PCR users.
Availability: This software is free for
academic use and downloadable at http://download.gene-quantification.info/
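The abstract does not state how qPCR-DAMS combines several reference genes. As an assumption, the sketch below uses the widely used approach of dividing the target quantity by the geometric mean of the reference gene quantities; the function name `normalised_expression` and the input values are hypothetical.

```python
from statistics import geometric_mean


def normalised_expression(target_quantity, reference_quantities):
    """Normalise a target gene quantity against several reference genes.

    The normalisation factor is the geometric mean of the reference gene
    quantities measured in the same sample (all values must be positive).
    """
    factor = geometric_mean(reference_quantities)
    return target_quantity / factor


# Hypothetical absolute quantities (copies per reaction) for one sample.
print(normalised_expression(8.0e4, [2.0e4, 8.0e4]))
```

The geometric mean is less sensitive than the arithmetic mean to a single highly abundant reference gene, which is why it is a common choice for this purpose.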
FastPCR is free
software for Microsoft Windows, based
on a new approach to the design of PCR primers
for standard and long PCR, inverse PCR,
direct amino acid sequence degenerate PCR,
multiplex PCR and in silico PCR, as well as for
sequence alignment, clustering and searching for any kind
of repeat sequence.
At present the program runs only on
Microsoft Windows, but C#
.NET versions for Linux and Mac are
currently in preparation.
FastPCR can work with multiple nucleic acid or protein
sequences simultaneously (up to 1,000,000). Multiplex
PCR primer design and in silico PCR
are also supported. The program is well suited
to homology searches against personal databases,
which resemble searches with the basic local
alignment search tool (BLAST) algorithm (a
segment-to-segment alignment principle similar
to DIALIGN).
The
program includes various bioinformatics tools
and supports the clustering of sequences. A new
repeat-search approach was developed and applied
in the program, which makes searches for
all types of DNA repeats fast and
powerful.
FastPCR software has several
specific, ready-to-use templates for many PCR
and sequencing applications:
- Standard, inverse and long PCR - locates optimal primers for PCR, hybridisation or sequencing.
- Multiplex PCR primer design - fast primer design with a cross-dimer test for highly sensitive multiplex PCR.
- Design of group-specific PCR primers.
- Degenerate PCR - primers are designed directly on an amino acid sequence.
- In silico PCR - prediction of probable PCR products and search for mismatched primer locations.
- Primer secondary structures - self-dimer and cross-dimer analyses; primer alignment and melting temperature calculation.
- False priming - checks primers for multiple annealing sites using sequence alignment algorithms.
- Primer quality - a unique way of determining PCR efficiency.
- Comprehensive primer report - comprehensive analysis of primer pairs and individual primers.
The software supports several file
formats: FASTA, text and Excel files.
Tools:
- Primer tests and dimer detection;
- Powerful repeats search: inverted, direct, simple and others;
- Clustering of sequences;
- Make complement, reverse complement and inverted strand;
- Search for a sequence written in the universal degenerate code, with alignment;
- Extract the sequence from selected sites;
- Protein/DNA translation;
- Calculation of the PCR annealing temperature when the PCR product is unknown;
- Database tools;
- Restriction analysis.
Each application document contains
customisable search settings, based on the
latest published primer selection criteria for
those applications.
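Several of the listed tools are simple string operations. The sketch below shows complement, reverse complement and sequence inversion for plain A/C/G/T sequences; it is not FastPCR's code, and reading "inverted" as the sequence written backwards is an assumption.

```python
# Translation table for the four unambiguous DNA bases (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the complementary strand, read in the same direction."""
    return seq.translate(COMPLEMENT)


def inverted(seq: str) -> str:
    """Return the sequence written backwards, without complementing."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


primer = "ATGCCGTA"   # hypothetical primer sequence
print(complement(primer), inverted(primer), reverse_complement(primer))
```

Degenerate IUPAC codes (R, Y, N and so on) are left unchanged by this table; supporting them would require extending the mapping.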
Bioinformatics analysis of alternative splicing
Christopher Lee & Qi Wang
Briefings in Bioinformatics, Volume 6, Number 1, Pages 23-33
Over the past few
years, the analysis of alternative splicing using
bioinformatics has emerged as an important new
field, and has significantly changed our view of
genome function. One exciting front has been the
analysis of microarray data to measure alternative
splicing genome-wide. Pioneering studies of both
human and mouse data have produced algorithms for
discerning evidence of alternative splicing and
clustering genes and samples by their alternative
splicing patterns. Moreover, these data indicate
the presence of alternative splice forms in up to
80 per cent of human genes. Comparative genomics
studies in both mammals and insects have
demonstrated that alternative splicing can in some
cases be predicted directly from comparisons of
genome sequences, based on heightened sequence
conservation and exon length. Such studies have
also provided new insights into the connection
between alternative splicing and a variety of
evolutionary processes such as Alu-based
exonisation, exon creation and loss. A number of
groups have used a combination of bioinformatics,
comparative genomics and experimental validation
to identify new motifs for splice regulatory
factors, analyse the balance of factors that
regulate alternative splicing, and propose a new
mechanism for regulation based on the interaction
of alternative splicing and nonsense-mediated
decay. Bioinformatics studies of the functional
impact of alternative splicing have revealed a
wide range of regulatory mechanisms, from NAGNAG
sites that add a single amino acid; to short
peptide segments that can play surprisingly
complex roles in switching protein conformation
and function (as in the Piccolo C2A domain); to
events that entirely remove a specific protein
interaction domain or membrane anchoring domain.
Common to many bioinformatics studies is a new
emphasis on graph representations of alternative
splicing structures, which have many advantages
for analysis.
Comparison of different melting temperature calculation methods for short DNA sequences.
Alejandro Panjkovich & Francisco Melo
Bioinformatics 21(6): 711-722
Motivation:
The overall performance of several
molecular biology techniques involving
DNA/DNA hybridization depends on the
accurate prediction of the experimental
value of a critical parameter: the melting
temperature Tm. To date, many
computer software programs based on
different methods and/or parameterizations
are available for the theoretical
estimation of the experimental Tm
value of any given short oligonucleotide
sequence. However, in most cases, large
and significant differences in the
estimations of Tm were obtained
while using different methods. Thus, it is
difficult to decide which Tm value
is the accurate one. In addition, it seems
that most people who use these methods are
unaware of their limitations, which are
well described in the literature but are neither
stated properly nor enforced as input restrictions
by most of the web servers and standalone
software programs that implement them.
Results: A quantitative comparison
on the similarities and differences among
some of the published DNA/DNA Tm
calculation methods is reported. The
comparison was carried out for a large set
of short oligonucleotide sequences ranging
from 16 to 30 nt long, which span the
whole range of CG-content. The results
showed that significant differences were
observed in all the methods, which in some
cases depend on the oligonucleotide length
and CG-content in a non-trivial manner.
Based on these results, the regions of
consensus and disagreement for the methods
in the oligonucleotide feature space were
reported. Owing to the lack of sufficient
experimental data, a fair and complete
assessment of accuracy for the different
methods is not yet possible. In spite of
this limitation, a consensus Tm
with minimal error probability was
calculated by averaging the values
obtained from two or more methods that
exhibit similar behavior to each
particular combination of oligonucleotide
length and CG-content class. Using a total
of 348 DNA sequences in the size range
between 16mer and 30mer, for which the
experimental Tm data are
available, we demonstrated that the
consensus Tm is a robust and
accurate measure. It is expected that the
results of this work will constitute
a useful set of guidelines to be
followed for the successful experimental
implementation of various molecular
biology techniques, such as quantitative
PCR, multiplex PCR and the design of
optimal DNA microarrays.
Availability: A binary software
distribution to calculate the consensus Tm
described in this work for thousands of
oligonucleotides simultaneously for the
LINUX operating system is freely available
upon request to the authors or from our
website http://protein.bio.puc.cl/melting-temperatures.html
Supplementary information: The large
set of oligonucleotides, the detailed
results of the comparative and accuracy
benchmarks, and hundreds of comparative
graphs generated during this work are
available at our website http://protein.bio.puc.cl/melting-temperatures.html
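The idea of averaging the estimates of several methods can be sketched as follows. The two formulas used here (the Wallace rule and a basic GC-content formula) are simple textbook approximations chosen only for illustration; they are not necessarily among the methods benchmarked in the paper, which the abstract does not name. The paper also averages only methods that behave similarly for a given length and CG-content class, whereas this sketch averages unconditionally.

```python
def _gc_and_length(seq):
    """Return (number of G/C bases, total length) for an A/C/G/T oligo."""
    s = seq.upper()
    if not s or set(s) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return s.count("G") + s.count("C"), len(s)


def tm_wallace(seq):
    """Wallace rule: 2 degrees C per A/T plus 4 degrees C per G/C."""
    gc, n = _gc_and_length(seq)
    return 2.0 * (n - gc) + 4.0 * gc


def tm_gc(seq):
    """Basic GC-content formula: 64.9 + 41 * (GC - 16.4) / N."""
    gc, n = _gc_and_length(seq)
    return 64.9 + 41.0 * (gc - 16.4) / n


def consensus_tm(seq, methods=(tm_wallace, tm_gc)):
    """Average the Tm estimates of the given methods."""
    return sum(method(seq) for method in methods) / len(methods)


oligo = "ATGCATGCATGCATGCATGC"   # hypothetical 20-mer with 50% GC
print(tm_wallace(oligo), round(tm_gc(oligo), 2), round(consensus_tm(oligo), 2))
```

The large gap between the two estimates for the same oligonucleotide illustrates the disagreement between methods that motivates the paper.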
A data-driven clustering method for time course
gene expression data.
Ma P, Castillo-Davis CI, Zhong W, Liu JS.
Nucleic Acids Res. 2006 Mar
1;34(4):1261-9. Print 2006.
Department of Statistics, Harvard University,
Cambridge, MA 02138, USA.
Gene
expression
over time is, biologically, a continuous process and
can thus be represented by a
continuous function, i.e. a curve. Individual genes
often share similar expression
patterns (functional forms). However, the shape of
each function, the number of such
functions, and the genes that share similar functional
forms are typically unknown. Here we introduce an
approach that allows direct
discovery of related patterns of gene expression and
their underlying functions (curves)
from data without a priori specification of either cluster
number or functional form. Smoothing spline
clustering (SSC) models natural
properties of gene expression over time, taking into
account natural differences in gene
expression within a cluster of similarly expressed
genes, the effects of experimental
measurement error, and missing data. Furthermore, SSC
provides a visual summary of each cluster's gene
expression function and goodness-of-fit
by way of a 'mean curve' construct and its
associated confidence bands. We
apply this method to gene expression data over the
life-cycle of Drosophila
melanogaster and Caenorhabditis elegans to discover
17 and 16 unique patterns of gene
expression in each species, respectively. New and
previously described expression
patterns in both species are discovered, the
majority of which are biologically
meaningful and exhibit statistically significant
gene function enrichment.
Distribution-insensitive
cluster
analysis in SAS on real-time PCR gene
expression data of steadily expressed genes.
Tichopad A, Pecen L, Pfaffl MW.
Comput Methods Programs Biomed. 2006
Apr;82(1):44-50. Epub 2006
Cluster analysis is a tool often employed in
micro-array techniques but used less in
real-time PCR. Herein we present core SAS code that
takes the correlation coefficient, instead of Euclidean
distances, as a dissimilarity measure. The
dissimilarity measure is made robust using a
rank-order correlation coefficient rather than a
parametric one. There is no need for an overall
probability adjustment like in scoring methods based
on repeated pair-wise comparisons. The rank-order
correlation matrix gives a good base for the
clustering procedure of gene expression data
obtained by real-time RT-PCR as it disregards the
different expression levels. Associated with each
cluster is a linear combination of the variables in
the cluster, which is the first principal component.
A large set of variables can then be replaced by the
set of cluster components with little loss of
information. In this way, distinct clusters
containing unregulated housekeeping genes along with
other steadily expressed genes can be disclosed and
utilized for standardization purposes. Simulated
data in parallel with the data from a biological
experiment were taken to validate the SAS macro. For
both cases, good intuitive results were obtained.
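The published code is a SAS macro. As a language-neutral illustration of the underlying idea, the Python sketch below computes a rank-order (Spearman) correlation between two genes and converts it to a dissimilarity as 1 - rho. That conversion is one common choice and is an assumption here, as are the function names and the example Cq values.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1          # mean of the tied positions
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (vx * vy) ** 0.5


def rank_dissimilarity(x, y):
    """Dissimilarity 1 - rho, where rho is the Spearman correlation."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


# Hypothetical Cq values of two genes across the same four samples.
gene_a = [20.1, 21.3, 19.8, 22.0]
gene_b = [30.2, 31.0, 29.9, 32.5]
print(rank_dissimilarity(gene_a, gene_b))
```

Because only ranks are compared, the two genes are treated as similar even though their expression levels differ by about ten cycles, which matches the point made in the abstract.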
Real-time RT-PCR: new approaches to exact mRNA quantification
(original title: Real-time RT-PCR: Neue Ansätze zur exakten mRNA Quantifizierung)
BioSpektrum 1/2004 (in German)
The molecular technologies of genomics,
transcriptomics and proteomics are increasingly
taking over the classical research fields of the
life sciences. The enormous flood of data and
results obtained is of disproportionately great
benefit in molecular diagnostics and physiology
as well as in functional genomics. Ever newer and
more sophisticated methods and applications are
therefore needed to describe complex physiological
processes. Since we are only at the beginning of
this molecular era, it is necessary to optimise
these techniques and to understand them completely.
One of these technically refined methods for the
reliable and exact quantification of specific mRNA
is real-time RT-PCR. This article essentially
describes efficiency-corrected relative
quantification, the normalisation of expression
results against a non-regulated housekeeping gene,
the calculation of real-time PCR efficiency, and
the computation and statistical evaluation of
expression results. All topics described can be
looked up in detail in internationally published
original papers on the corresponding website.
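The efficiency-corrected relative quantification and the efficiency calculation mentioned in this abstract can be sketched as follows. The formulas are the widely used ones (efficiency from the slope of a standard curve, and a ratio of efficiency-weighted Cq differences for target and reference gene); that they match the article's presentation in every detail is an assumption, and the function names and example values are hypothetical.

```python
def efficiency_from_slope(slope):
    """Amplification efficiency E from a standard-curve slope.

    The slope comes from regressing Cq on log10(input amount), so it is
    negative. E = 10 ** (-1 / slope); E = 2.0 means perfect doubling.
    """
    if slope >= 0:
        raise ValueError("a standard-curve slope must be negative")
    return 10 ** (-1.0 / slope)


def relative_expression_ratio(e_target, dcq_target, e_ref, dcq_ref):
    """Efficiency-corrected expression ratio of sample versus control.

    dcq_target and dcq_ref are Cq(control) - Cq(sample) for the target
    gene and the housekeeping (reference) gene respectively.
    """
    return (e_target ** dcq_target) / (e_ref ** dcq_ref)


# Hypothetical run: target Cq drops by 3 cycles, reference gene unchanged.
print(round(efficiency_from_slope(-3.32), 3))
print(relative_expression_ratio(2.0, 3.0, 2.0, 0.0))
```

When both efficiencies are exactly 2 the expression reduces to the familiar 2 to the power of delta-delta-Cq form.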
Nucleic Acids Research - Recent Hot Papers
Nucleic Acids Research 2005 vol 33
(Database issue)
The 2005 Database Issue of Nucleic Acids
Research is the twelfth in a series dedicated
to factual databases in the field of molecular
biology. Such databases are an essential
resource for working biologists and this
compilation provides descriptions and updates
of the most important of these databases and
serves to introduce newly ...
http://nar.oupjournals.org/content/vol33/suppl_1/
Database
Categories List
- Nucleotide Sequence Databases
- RNA sequence databases
- Protein sequence databases
- Structure Databases
- Genomics Databases (non-vertebrate)
- Metabolic and Signaling Pathways
- Human and other Vertebrate Genomes
- Human Genes and Diseases
- Microarray Data and other Gene
Expression Databases
- Proteomics Resources
- Other Molecular Biology Databases
- Organelle databases
- Plant databases
- Immunological databases
Nucleic Acids
Research 2004 vol 32 (Web Server issue)
Last year
Nucleic Acids Research published a special
issue devoted to web servers. This issue
complemented the annual Database Issue, which
has now appeared in 11 successive years. The
Web Server Issue highlights the many servers
that are available on the web to perform
useful computations on DNA, RNA and protein
sequences and structures. Between them, the
two issues provide an unparalleled array of
useful computational services. The new Web
Server Issue aims to provide a repository in
which authors of web servers can highlight
their offerings and readers can find out what
is available.
In the current issue there are reports of 137
web servers that run the gamut from BLAST
services to three-dimensional protein
structure prediction. The servers described
have all been subjected to rigorous peer
review, are available free of charge and
provide invaluable resources to the scientific
community. The scientists and programmers who
have provided these resources deserve our
immense thanks. They illustrate the very best
of the scientific spirit that transcends
national boundaries and promotes cooperation
and the sharing of resources.
http://nar.oupjournals.org/content/vol32/suppl_2/index.dtl
A web server for performing electronic
PCR
Kirill Rotmistrovsky, Wonhee Jang and Gregory
D. Schuler
National Center for Biotechnology Information,
National Library of Medicine, National
Institutes of Health, Bethesda, MD 20984, USA
‘Electronic PCR’ (e-PCR) refers to a
computational procedure that is used to search
DNA sequences for sequence tagged sites
(STSs), each of which is defined by a pair of
primer sequences and an expected PCR product
size. To gain speed, our implementation
extracts short ‘words’ from the 3' end of each
primer and stores them in a sorted hash table
that can be accessed efficiently during the
search. One recent improvement is the use of
overlapping discontinuous words to allow
matches to be found despite the presence of a
mismatch. Moreover, it is possible to allow
gaps in the alignment between the primer and
the sequence. The effect of these changes is
to improve sensitivity without significantly
affecting specificity. The new software
provides a search mode using a query STS
against a sequence database to augment the
previously available mode using a query
sequence against an STS database. Finally,
e-PCR may now be used through a web service,
with search results linked to other web
resources such as the UniSTS database and the
MapViewer genome browser. The e-PCR web server
may be found at www.ncbi.nlm.nih.gov/sutils/e-pcr
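The word-lookup idea described above can be sketched with an ordinary Python dictionary standing in for the hash table. This is a simplified illustration, not NCBI's implementation: it uses contiguous exact words only (no discontinuous words, mismatches or gaps), scans one strand, and the primer name and word size are assumptions.

```python
from collections import defaultdict


def build_word_index(primers, word_size=7):
    """Map the 3'-terminal word of each primer to the primer names."""
    index = defaultdict(list)
    for name, seq in primers.items():
        if len(seq) < word_size:
            raise ValueError(f"primer {name} is shorter than the word size")
        index[seq[-word_size:].upper()].append(name)
    return index


def scan(sequence, index, word_size=7):
    """Yield (end_position, primer_name) for each candidate primer site.

    end_position is the 0-based index just past the matching word. A hit
    only nominates a site; the full primer would still need to be compared
    with the sequence upstream of the word.
    """
    sequence = sequence.upper()
    for start in range(len(sequence) - word_size + 1):
        word = sequence[start:start + word_size]
        for name in index.get(word, ()):
            yield start + word_size, name


# Hypothetical primer and target sequence.
index = build_word_index({"sts1_fwd": "GATTACAGATTACA"})
print(list(scan("CCGATTACAGATTACATT", index)))
```

Looking up each window in the index costs constant time on average, so the scan stays fast even when many thousands of primers are indexed.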
Sequence Mapping by Electronic PCR
Gregory D. Schuler
Genome Research
Vol. 7, No. 5, pp. 541-550, May 1997
National Center for Biotechnology Information,
National Library of Medicine, National
Institutes of Health, Bethesda, Maryland 20984
The
highly specific and sensitive PCR provides the
basis for sequence-tagged sites (STSs), unique
landmarks that have been used widely in the
construction of genetic and physical maps of
the human genome. Electronic PCR (e-PCR)
refers to the process of recovering these
unique sites in DNA sequences by searching for
subsequences that closely match the PCR
primers and have the correct order,
orientation, and spacing such that they could
plausibly prime the amplification of a PCR
product of the correct molecular weight. A
software tool was developed to provide an
efficient implementation of this search
strategy and allow the sort of en masse
searching that is required for modern genome
analysis. Some sample searches were performed
to demonstrate a number of factors that can
affect the likelihood of obtaining a match.
Analysis of one large sequence database record
revealed the presence of several
microsatellite and gene-based markers and
allowed the exact base-pair distances among
them to be calculated. This example provides a
demonstration of how e-PCR can be used to
integrate the growing body of genomic sequence
data with existing maps, reveal relationships
among markers that existed previously on
different maps, and correlate genetic
distances with physical distances.
iPCR = Virtual PCR
http://www.ch.embnet.org/software/iPCR_form.html
In silico PCR
In silico
simulation of molecular biology
experiments
http://insilico.ehu.es
In silico experiments with complete
genomes
This site
has been developed by Dr. Joseba
Bikandi, Dr. Rosario San Millán and
co-workers in the Department of
Immunology, Microbiology and
Parasitology, Faculty of Pharmacy,
in the University of the Basque
Country.
Some tools
included in this site, or their earlier
versions, were primarily developed
to obtain theoretical PCR results
with Salmonella by the research groups of Dr.
Javier Garaizar and Dr. Aitor
Rementeria. Later
they were adapted for use with
any bacterial species sequenced
to date. The list of genomes is
updated shortly after they become
available at NCBI, and the number
of tools available will also
increase in the near future.
Additional databases used by these
tools have been obtained from NCBI
and in some cases a link will
redirect users to NCBI in order to
obtain specific information.
UCSC In-Silico PCR
http://genome.brc.mcw.edu/cgi-bin/hgPcr/
In-Silico PCR searches a sequence database with
a pair of PCR primers, using an indexing
strategy for fast performance.
Configuration
Options
- Genome and Assembly - The
sequence database to search.
- Forward Primer - Must be at
least 15 bases in length.
- Reverse Primer - On the
opposite strand from the forward primer.
Minimum length of 15 bases.
- Max Product Size - Maximum
size of amplified region.
- Min Perfect Match - Number
of bases that match exactly on 3' end of
primers. Minimum match size is 15.
- Min Good Match - Number of
bases on 3' end of primers where at least 2
out of 3 bases match.
- Flip Reverse Primer - Invert
the sequence order of the reverse primer and
complement it.
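The options above can be illustrated with a small exact-match search. This is a sketch under simplifying assumptions, not the UCSC implementation: it requires perfect matches over the whole primer, searches only the given strand of the template, does no indexing, and does not enforce the 15-base minimum. The function names and the toy sequences (with short primers for readability) are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template, forward, reverse, max_product_size=4000):
    """Find products amplified by exactly matching primers.

    The reverse primer is given 5' to 3' on the opposite strand, so its
    reverse complement is what appears on the template strand.
    Returns a list of (start, end, product) with 0-based, end-exclusive
    coordinates on the template.
    """
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse.upper())
    products = []
    start = template.find(forward)
    while start != -1:
        site = template.find(reverse_site, start + len(forward))
        while site != -1:
            end = site + len(reverse_site)
            if end - start > max_product_size:
                break               # later sites only give longer products
            products.append((start, end, template[start:end]))
            site = template.find(reverse_site, site + 1)
        start = template.find(forward, start + 1)
    return products


template = "TTACGGATCCAGTTTTTTTTGCATGCAAGG"
print(in_silico_pcr(template, "ACGGATCC", "TTGCATGC"))
```

If the reverse primer is supplied in template-strand orientation instead, applying `reverse_complement` to it first plays the role of the "Flip Reverse Primer" option.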
New real-time PCR primer and probe
databases:
Publication: Pattyn, F., Speleman, F., De Paepe, A. & Vandesompele, J.
(2003). RTPrimerDB: the Real-Time PCR primer
and probe database. Nucleic
Acids Research 31(1): 122-123.
Publication: Wang, X. & Seed, B.
(2003). A PCR primer bank for quantitative gene
expression analysis.
Nucleic Acids Research 31(24):
e154, pp. 1-8.
- The Quantitative PCR Primer Database (QPPD) provides
information about primers and probes that can be
used to quantitate human and mouse mRNA by reverse
transcription polymerase chain reaction (RT-PCR)
assays. All data have been gathered from published
articles cited in PubMed.