Input • oncoEnrichR

Input data

The main input to oncoEnrichR is a list of human genes, typically the top ranked hits from a high-throughput screen. The gene list can be provided by using any of the following identifiers:

Primary gene symbols (e.g. KRAS)
Entrez gene IDs (e.g. 3845)
Uniprot accessions (e.g. P01116)
Ensembl gene identifiers (e.g. ENSG00000133703)
Ensembl transcript identifiers (e.g. ENST00000311936)
Ensembl protein identifiers (e.g. ENSP00000308495)
RefSeq transcript identifiers (e.g. NM_004985)
RefSeq peptide identifiers (e.g. NP_004976)

The type of identifier used should be specified using the --query_id_type argument. Similarly, if a background gene set is specified (through the --bgset argument, for use in enrichment analysis), the identifier type should be set with the --bgset_id_type argument.

If the user submits gene symbols which are no longer considered as primary gene symbols, oncoEnrichR attempts to map such cases as synonyms/aliases for the primary gene symbols.

Number of input genes

In order to keep the size of the HTML output report at a manageable level, there is currently an upper limit of

n = 1000

genes that can be used as input to the tool. When running the tool through Galaxy, the limitation with respect to the number of input genes is stricter (n = 200). Note also that if the number of input genes is very low (i.e. n = 1-5), some analysis modules are not applicable for analysis (functional enrichment, protein-protein interaction network etc).

IMPORTANT NOTE: Due to its large size, the HTML report can be slow to load when generating full reports with the maximum number of genes (n = 1000). We generally recommend to use oncoEnrichR with smaller querysets (< 500), as this will produce reports that can be more efficiently loaded and viewed. If you want to submit a query that pushes the limit (n = 1000), we recommend that you carefully configure the report contents/modules, in that sense producing more managable reports.