Home

PICKER-HG Web Server

PerformIng Classification and Knowledge Extraction via Rules using random forests on Human Genes

PICKER-HG is an easy-to-use service that lets biologists apply a state-of-the-art data mining tool to their own data.


Quick start guide

Step 1: Click on "Load dataset" to load your training genes (a set of genes with known class labels, e.g. over-expressed or under-expressed) and testing genes (a set of genes with unknown class labels that one wishes to predict, e.g. genes without over- or under-expression data). Optionally, after loading your data, you can check statistics about your dataset by clicking on "Dataset statistics".
Step 2: Click on "Train model" to train the classification model. This may take some time.
Step 3: Check the testing-set predictions (the predictions made by the data mining algorithm, which learns from the characteristics, or features, of the supplied training genes) and the estimated predictive accuracy by clicking on "See predictive accuracy and testing results". To see the rules (sequences of conditions leading to a prediction, automatically constructed by the data mining algorithm from the training genes) generated by the system, click on "See If-Then rules".

What this server can do:

Output predictions: Estimate the probability that each unlabelled gene belongs to each of the classes defined by the user, helping biologists identify possible targets for further analysis.
Generate rules: Automatically find rules that "explain" the system's classifications, possibly giving users new biological insights.
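To illustrate how a random forest can turn the votes of its individual trees into a class probability for an unlabelled gene, here is a minimal sketch. It assumes simple majority-vote averaging; the three "trees" and the features `go_term_1` and `ppi_partner_2` are hypothetical hand-written stand-ins, not models or features produced by PICKER-HG.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained tree: any function mapping a gene's
# feature values to a predicted class label. In a real random forest these
# trees are learned from the training genes; here they are hand-written.
Tree = Callable[[Dict[str, float]], str]


def class_probabilities(
    trees: List[Tree], gene: Dict[str, float], classes: List[str]
) -> Dict[str, float]:
    """Estimate P(class | gene) as the fraction of trees that vote for each class."""
    votes = Counter(tree(gene) for tree in trees)
    return {label: votes[label] / len(trees) for label in classes}


# Three toy trees over hypothetical binary features (1 = gene has the property).
trees: List[Tree] = [
    lambda g: "over" if g["go_term_1"] == 1 else "under",
    lambda g: "over" if g["ppi_partner_2"] == 1 else "under",
    lambda g: "over" if g["go_term_1"] == 1 and g["ppi_partner_2"] == 1 else "under",
]

unlabelled_gene = {"go_term_1": 1, "ppi_partner_2": 0}
probs = class_probabilities(trees, unlabelled_gene, ["over", "under"])
print(probs)  # one of three trees votes "over", two vote "under"
```

Some implementations, such as scikit-learn's `RandomForestClassifier`, instead average each tree's own class-probability estimates rather than counting hard votes; which variant the server uses is not stated here. Either way, ranking unlabelled genes by their probability for the class of interest is one way to shortlist candidates for follow-up experiments.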

For more detailed information, please continue reading below.

Welcome to the PerformIng Classification and Knowledge Extraction via Rules using random forests on Human Genes (PICKER-HG) web server. Its main objective is to let biologists easily apply a state-of-the-art data mining (machine learning) tool to their own data (for more detailed information about the web server, see the help page).

This web server requires the user to enter a list of gene-class label pairs (the training data), from which it builds a Random Forest (RF) classification model. The web server then extracts from the RF a list of "predictive" if-then rules that the user can interpret. The system builds these if-then rules automatically, with the goal of classifying the training data with high accuracy.
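A random forest is a collection of decision trees, and a common starting point for rule extraction is to read each root-to-leaf path of a tree as one if-then rule. The sketch below shows that generic idea on a single hand-written tree with hypothetical features (`expr_liver`, `go_term_1`); how PICKER-HG selects or combines rules across the whole forest is not described here, so this should not be read as the server's actual method.

```python
from typing import Dict, List, Optional, Tuple, Union

# A condition is (feature, operator, threshold); operator is "<=" or ">".
Condition = Tuple[str, str, float]
Rule = Tuple[List[Condition], str]  # (conditions, predicted class)

# Hypothetical trained decision tree: internal nodes test "feature <= threshold",
# leaves (plain strings) hold the predicted class label.
Node = Union[str, dict]
tree: Node = {
    "feature": "expr_liver", "threshold": 5.0,
    "left": "under",                                   # expr_liver <= 5.0
    "right": {                                         # expr_liver > 5.0
        "feature": "go_term_1", "threshold": 0.5,
        "left": "under",                               # gene lacks the GO term
        "right": "over",                               # gene has the GO term
    },
}


def extract_rules(node: Node, path: Optional[List[Condition]] = None) -> List[Rule]:
    """Turn every root-to-leaf path of the tree into one IF-THEN rule."""
    path = path or []
    if isinstance(node, str):  # leaf: the conditions on the path form a rule
        return [(path, node)]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + [(f, "<=", t)])
            + extract_rules(node["right"], path + [(f, ">", t)]))


def matches(conditions: List[Condition], gene: Dict[str, float]) -> bool:
    """True if the gene satisfies every condition of a rule."""
    return all(gene[f] <= t if op == "<=" else gene[f] > t for f, op, t in conditions)


rules = extract_rules(tree)
for conds, label in rules:
    text = " AND ".join(f"{f} {op} {t}" for f, op, t in conds)
    print(f"IF {text} THEN class = {label}")

gene = {"expr_liver": 7.2, "go_term_1": 1}
print([label for conds, label in rules if matches(conds, gene)])  # ['over']
```

Within a single tree exactly one rule fires for any gene, which is why each extracted rule can be read on its own as a human-interpretable explanation of one prediction path.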

The predictive performance of the RF model is estimated using the applied to the training data and is also reported by the system.

The user can also provide an optional list of "testing" genes that will be classified by the Random Forest model. These genes may be targets for further biological experiments.

For more information about the data mining concepts in this web server, and a step-by-step guide on how to use the server, please see the help section.

Dataset versions

GTEx dataset: GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8

BioGRID PPIs: BIOGRID-ALL-3.4.146

Gene Ontology (GO): releases/2017-03-14

Disclaimer

By using this site, its content, information, and software, you agree to assume all risks associated with your use or transfer of information and/or software. This site comes without warranty of any kind. The site's maintainers do not warrant that the site will operate correctly or that the site or its server are free of computer viruses or other harmful devices. You agree to hold the site's maintainers harmless from any claims relating to the use of this site.