The genomic and transcriptomic landscape of a tumor is a major determinant of the efficacy of several drugs and is central to clinical trial design for targeted therapies. Furthermore, many genomic alterations that are drug targets occur in several tumor types, which makes them prime candidates for targeted drugs. Finding the most relevant targets in a set of cancer samples is difficult, but integrating multiple types of data makes the task considerably easier.

We have built a Gene Ontology based drug prioritization tool, GOPredict. We have trained the prioritization tool for breast and ovarian cancer by integrating genomic and transcriptomic data from The Cancer Genome Atlas with signaling pathway information from Gene Ontology. The code for the stand-alone GOPredict can be downloaded below. Using the stand-alone application requires installing Anduril and Moksiskaan (below). With the prioritization webapp, users can submit their own small breast or ovarian cancer data sets, one sample at a time, and GOPredict will predict the most relevant drugs and drug targets based on the submitted data and pre-calculated statistics (below).

This program is for research purposes only. For other purposes, contact the Systems Biology Laboratory.


User guide (webapp)

Input data for GOPredict should be a tab-delimited file with two columns. The first column contains Ensembl format gene identifiers. The second column contains the status of that gene in the sample, coded as '-1' for downregulation, '1' for upregulation and '0' for no change. Here is an example file. Note that the input file must be tab-delimited and must not contain quotation marks.
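The format rules above can be checked before submitting a file. The following is a minimal sketch, not part of GOPredict: the function name validate_status_lines is hypothetical, and it assumes human Ensembl gene identifiers (which begin with "ENSG") and a file without a header row.

```python
# Hypothetical pre-submission check; not part of GOPredict itself.
VALID_STATUS = {"-1", "0", "1"}


def validate_status_lines(lines):
    """Return a list of (line_number, message) problems found in the input."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if '"' in line or "'" in line:
            problems.append((number, "contains quotation marks"))
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append((number, "expected exactly two tab-separated columns"))
            continue
        gene_id, status = fields
        # Assumption: human Ensembl gene IDs, which start with "ENSG".
        if not gene_id.startswith("ENSG"):
            problems.append((number, "first column does not look like an Ensembl gene ID"))
        if status not in VALID_STATUS:
            problems.append((number, "status must be -1, 0 or 1"))
    return problems


if __name__ == "__main__":
    example = ["ENSG00000141510\t-1\n", "ENSG00000012048\t1\n"]
    for number, message in validate_status_lines(example):
        print(f"line {number}: {message}")
```

An empty result means the lines satisfy the rules stated above; it does not guarantee that every identifier is known to the webapp.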

Status information can be obtained in multiple ways. In the simplest case, your data set contains gene expression measurements for your samples, which can be analyzed for downregulation and upregulation to obtain the status of each gene in each sample. Another simple use case is copy-number data, where '1' corresponds to amplification and '-1' to deletion of a gene. In a more complex scenario, you can integrate copy-number and expression data. For example, you can set the status of a gene to '1' if and only if the sample shows both expression upregulation and copy-number amplification. The interpretation of the results depends on how the status is assigned. See the GOPredict paper for how we combine multiple data levels into a status matrix.
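The integration rule described above can be sketched in a few lines. This is an illustration only, not the procedure from the GOPredict paper: the function name integrate_status is hypothetical, the inputs are assumed to be per-gene calls already coded as -1, 0 or 1, and the symmetric rule for '-1' (downregulation together with deletion) is an assumption added here.

```python
# Illustrative sketch; the GOPredict paper describes the authors' own method.
def integrate_status(expression, copy_number):
    """Combine per-gene expression and copy-number calls into one status.

    Both arguments map Ensembl gene IDs to -1, 0 or 1.
    A gene gets 1 only if it is both upregulated and amplified.
    Assumption: a gene gets -1 only if it is both downregulated and deleted.
    Genes missing from either input are treated as 0 (no change).
    """
    status = {}
    for gene in sorted(set(expression) | set(copy_number)):
        expr_call = expression.get(gene, 0)
        cn_call = copy_number.get(gene, 0)
        if expr_call == 1 and cn_call == 1:
            status[gene] = 1
        elif expr_call == -1 and cn_call == -1:
            status[gene] = -1
        else:
            status[gene] = 0
    return status


if __name__ == "__main__":
    expression = {"ENSG00000141736": 1, "ENSG00000141510": -1, "ENSG00000012048": 1}
    copy_number = {"ENSG00000141736": 1, "ENSG00000141510": -1, "ENSG00000012048": 0}
    # Print in the two-column, tab-delimited input format.
    for gene, call in integrate_status(expression, copy_number).items():
        print(f"{gene}\t{call}")
```

Requiring agreement between the two data levels is stricter than using either alone, so fewer genes receive a non-zero status and the results should be read accordingly.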

If you wish to query multiple samples, we recommend that you install the Anduril framework and the Moksiskaan database and then download and run the stand-alone analysis scripts, or contact us. Alternatively, you can use the web interface to run a separate query for each sample, or apply dimension reduction to obtain a single status vector for the whole sample set.
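One simple way to collapse several samples into a single status vector is a per-gene majority vote. This is only one possible reduction and is not prescribed by GOPredict: the function name consensus_status and the min_fraction threshold are hypothetical choices made for this sketch.

```python
# Hypothetical reduction of many samples to one status vector by majority vote.
def consensus_status(samples, min_fraction=0.5):
    """Return one status per gene across a list of per-sample status dicts.

    A gene is called 1 (or -1) when more than min_fraction of the samples
    carry that call and it is more frequent than the opposite call;
    otherwise the gene is called 0. Missing genes count as 0.
    """
    n_samples = len(samples)
    if n_samples == 0:
        return {}
    genes = set().union(*samples)  # iterating a dict yields its keys
    consensus = {}
    for gene in sorted(genes):
        calls = [sample.get(gene, 0) for sample in samples]
        up = calls.count(1) / n_samples
        down = calls.count(-1) / n_samples
        if up > min_fraction and up > down:
            consensus[gene] = 1
        elif down > min_fraction and down > up:
            consensus[gene] = -1
        else:
            consensus[gene] = 0
    return consensus


if __name__ == "__main__":
    samples = [
        {"ENSG00000141736": 1, "ENSG00000141510": -1},
        {"ENSG00000141736": 1, "ENSG00000141510": 0},
        {"ENSG00000141736": 0, "ENSG00000141510": -1},
    ]
    for gene, call in consensus_status(samples).items():
        print(f"{gene}\t{call}")
```

A vote of this kind hides heterogeneity between samples, so per-sample queries remain preferable when differences between individual tumors matter.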

The output contains three columns: the name of the drug, the inhibition score and the penalty term. The larger the inhibition score, the greater the anti-cancer effect (i.e., promotion of apoptosis, inhibition of proliferation) the drug is predicted to have. The penalty term quantifies the pro-cancer effect the drug is predicted to have (i.e., promotion of proliferation, inhibition of apoptosis).

Analyze with GOPredict

Example input


Breast cancer Ovarian cancer

Source code


Louhimo, R.*, Laakso, M.*, Belitskin, D., Klefström, J., Lehtonen, R. and Hautaniemi, S.
Data integration to prioritize drugs using genomics and literature data (submitted)


  • Anduril bioinformatics framework
  • Moksiskaan pathway integration tool for Anduril