The genomic and transcriptomic landscape of a tumor is a major determinant
of the efficacy of several drugs and is central to clinical trial design for
targeted therapies. Furthermore, many genomic alterations that are drug targets
occur in several tumor types and are prime candidates for targeted drugs. However,
finding the most relevant targets in cancer sample sets is difficult, although it can be
greatly facilitated by integrating multiple types of data.
We have built a Gene Ontology-based drug prioritization tool, GOPredict. We have trained
the prioritization tool for breast and ovarian cancer by integrating genomic and transcriptomic data from
The Cancer Genome Atlas and
signaling pathway information from Gene Ontology. The code
for the stand-alone GOPredict can be downloaded below. Using the stand-alone application requires installing
Anduril and Moksiskaan (below).
With the prioritization webapp, users can submit their own small breast or ovarian cancer data sets,
and GOPredict will predict the most relevant drugs and drug targets based on the submitted data
and pre-calculated statistics, one sample at a time (below).
This program is for research purposes only. For other purposes, contact the
Systems Biology Laboratory.
User guide (webapp)
Input data for GOPredict should be a tab-delimited file with two columns. The first column contains
Ensembl format gene identifiers. The second column contains
the status of that gene in the sample, coded as '-1' for downregulation, '1' for upregulation
and '0' for no change. Here is an example file. Note that the input
file must be tab-delimited and must not contain quotation marks.
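As a quick sanity check before uploading, the format rules above can be sketched as a small validator.
This is only an illustration, not part of GOPredict: the function name validate_status_lines is
hypothetical, and the sketch assumes human Ensembl gene identifiers (ENSG followed by 11 digits) and
a file without a header row.

```python
import re

# Assumption: human Ensembl gene identifiers, e.g. ENSG00000141510.
ENSEMBL_GENE = re.compile(r"^ENSG\d{11}$")
VALID_STATUS = {"-1", "0", "1"}


def validate_status_lines(lines):
    """Return a list of (line_number, message) problems; empty if the input looks valid."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if '"' in line or "'" in line:
            problems.append((number, "contains quotation marks"))
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append((number, "expected exactly two tab-separated columns"))
            continue
        gene, status = fields
        if not ENSEMBL_GENE.match(gene):
            problems.append((number, "first column is not an Ensembl gene identifier"))
        if status not in VALID_STATUS:
            problems.append((number, "status must be -1, 0 or 1"))
    return problems
```

To check a file on disk, pass its lines, for example validate_status_lines(open("sample.tsv")),
where sample.tsv is a placeholder file name.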
Status information can be obtained in multiple ways. In the simplest case, your data set contains gene
expression measurements for your samples. These can then be analyzed for downregulation and upregulation
to obtain the status information for each sample. Another simple use case is for copy-number data
where '1' corresponds to amplification and '-1' to deletion of a gene. In a more
complex scenario, you can integrate copy-number
and expression data. For example, you can set the status of a gene to be '1' if and only if the sample
shows both expression upregulation and copy-number amplification. The interpretation of the result is
dependent on how the status is assigned. See the GOPredict paper for how we combine multiple data levels
into a status matrix.
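The integration rule described above can be sketched in a few lines. This is a minimal illustration
under stated assumptions, not the procedure from the GOPredict paper: the function names and the
threshold parameters are hypothetical, and treating '-1' symmetrically (both downregulation and
deletion required) is an assumption that extends the '1' rule given in the text.

```python
def call_status(value, low, high):
    """Discretize one measurement: 1 if value >= high, -1 if value <= low, else 0.

    The thresholds are placeholders; choose them to suit your own data.
    """
    if value >= high:
        return 1
    if value <= low:
        return -1
    return 0


def integrate_status(expression_status, copy_number_status):
    """Combine two per-gene calls into a single status.

    Returns 1 only if both calls are 1 (upregulation and amplification).
    Assumption: returns -1 only if both calls are -1 (downregulation and deletion).
    Returns 0 otherwise.
    """
    if expression_status == 1 and copy_number_status == 1:
        return 1
    if expression_status == -1 and copy_number_status == -1:
        return -1
    return 0
```

With this rule, a gene that is amplified but not upregulated receives status '0', so the resulting
scores reflect only alterations supported by both data levels.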
If you wish to query multiple samples, we recommend that you install the
Anduril framework and Moksiskaan database,
and download and run the stand-alone analysis
scripts, or contact us. You can also use
the web interface to run an individual query for each sample, or apply some form of dimension reduction to
obtain one status vector over the whole sample set.
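One simple way to collapse a sample set into a single status vector is a per-gene majority vote.
The sketch below is only one possible choice and is not prescribed by GOPredict; the function name
consensus_status and the min_fraction parameter are hypothetical, and genes missing from a sample
are assumed to have status '0'.

```python
from collections import Counter


def consensus_status(sample_statuses, min_fraction=0.5):
    """Collapse per-sample status dicts (gene -> -1/0/1) into one status per gene.

    A gene gets 1 (or -1) when more than min_fraction of the samples carry that
    status, and 0 otherwise. Keep min_fraction at 0.5 or above so that 1 and -1
    cannot both pass the cut-off.
    """
    genes = set().union(*sample_statuses)
    n_samples = len(sample_statuses)
    consensus = {}
    for gene in sorted(genes):
        # Assumption: a gene absent from a sample counts as unchanged (0).
        counts = Counter(sample.get(gene, 0) for sample in sample_statuses)
        if counts[1] / n_samples > min_fraction:
            consensus[gene] = 1
        elif counts[-1] / n_samples > min_fraction:
            consensus[gene] = -1
        else:
            consensus[gene] = 0
    return consensus
```

The resulting dictionary can be written out as a two-column tab-delimited file and submitted as a
single query. A consensus vector hides between-sample heterogeneity, so per-sample queries remain
preferable when individual predictions matter.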
The output contains three columns: the name of the drug, the inhibition score and the penalty term.
The larger the inhibition score, the stronger the anti-cancer effect (i.e., promotion of apoptosis, inhibition
of proliferation) the drug is predicted to have. The penalty term quantifies the pro-cancer effect the
drug is predicted to have (i.e., promotion of proliferation, inhibition of apoptosis).
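When reviewing the output, it can help to order the drugs so that high inhibition scores with low
penalties come first. The sketch below assumes the three output columns have already been read into
(drug, inhibition_score, penalty) tuples. The function name rank_drugs is hypothetical, and using the
penalty only as a tie-breaker is an illustrative choice, not a rule defined by GOPredict.

```python
def rank_drugs(rows):
    """Sort (drug, inhibition_score, penalty) tuples for inspection.

    Drugs are ordered by inhibition score, highest first. Among drugs with equal
    scores, the one with the lower penalty comes first (an illustrative choice).
    """
    return sorted(rows, key=lambda row: (-row[1], row[2]))
```

For example, given two drugs with the same inhibition score, the one with the smaller penalty term
is listed first.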
Louhimo, R.*, Laakso, M.*, Belitskin, D., Klefström, J., Lehtonen, R. and Hautaniemi, S.
Data integration to prioritize drugs using genomics and literature data