Continuous Evaluation of Relational Learning in Biomedicine (CERLIB)


Predicting relations between biological entities using machine learning is a common and important task in computational biology. We propose a new challenge that aims to continuously evaluate prediction methods as new biological knowledge becomes available. We invite individuals and teams to predict biological relations. New relations between biological entities are retrieved at regular intervals using SPARQL queries, and predictions are automatically evaluated against the query results.


We will frequently (at least monthly) update a set of high-confidence relations (usually reflecting experimental evidence) drawn from several biological knowledge bases. Each updated set of relations will receive a timestamp. Evaluation consists of comparing predictions to the set of relations that has newly become available. Each relation type is evaluated separately.

With respect to submissions, please keep the following in mind:

  • All submissions are also time-stamped at submission time.
  • Initial submissions made to CERLIB are not enrolled in any task at submission time.
  • Any submission made before an update but not yet enrolled in a task is enrolled at the time the updated set of relations becomes available.
  • Any submission made before the update time that was already enrolled is evaluated against the new set of relations.
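The enrollment rules above can be sketched as a small decision function. This is an illustration only: the two-state model (enrolled or not) and the function signature are assumptions, not part of the CERLIB API.

```python
from datetime import datetime

def on_update(submitted_at, enrolled, update_at):
    """Apply the CERLIB enrollment rules for one update (sketch).

    submitted_at, update_at: datetime timestamps
    enrolled: whether the submission was already enrolled in the task
    Returns (now_enrolled, evaluated_against_this_update).
    """
    if submitted_at >= update_at:
        # Submitted after this update: unaffected by it.
        return enrolled, False
    if not enrolled:
        # Pre-update, not yet enrolled: enrolls now, evaluated next time.
        return True, False
    # Pre-update and already enrolled: evaluated against the new relations.
    return True, True

# Example timeline (hypothetical timestamps):
update = datetime(2024, 2, 1)
print(on_update(datetime(2024, 1, 15), False, update))  # (True, False)
print(on_update(datetime(2024, 1, 15), True, update))   # (True, True)
```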

For each challenge, we provide two types of information: a dataset, usually in RDF, and a SPARQL query to an endpoint serving the evaluation data. The SPARQL query retrieves the set of relations considered true at the time the query is evaluated; it is executed monthly to generate test data and update the evaluations.
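As a sketch of what this retrieval looks like, the snippet below shows a hypothetical SPARQL query (the predicate IRI is illustrative, not the actual challenge vocabulary) and parses a SPARQL 1.1 JSON results document into a set of relations using only the standard library:

```python
import json

# A hypothetical query of the kind CERLIB might execute monthly;
# the predicate IRI is an illustrative placeholder.
QUERY = """
SELECT ?subject ?object WHERE {
  ?subject <http://example.org/interactsWith> ?object .
}
"""

def relations_from_sparql_json(payload):
    """Extract (subject, object) pairs from a SPARQL 1.1 JSON results document."""
    data = json.loads(payload)
    return {
        (row["subject"]["value"], row["object"]["value"])
        for row in data["results"]["bindings"]
    }

# A minimal results document, as an endpoint would return it:
sample = json.dumps({
    "head": {"vars": ["subject", "object"]},
    "results": {"bindings": [
        {"subject": {"type": "uri", "value": "http://example.org/P1"},
         "object": {"type": "uri", "value": "http://example.org/P2"}},
    ]},
})
print(relations_from_sparql_json(sample))
```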

Evaluation metrics

We use two types of evaluation metrics.

Metric One: This applies to relations in which both the subject and object are treated as plain entities, without any semantics that may arise from specific axioms in the graph. For these, we compute recall at rank k (Hits@k), area under the ROC curve (AUROC), area under the precision-recall curve (AUPR), and mean reciprocal rank (MRR).
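The rank-based metrics are straightforward to compute once each true relation has a rank among the scored candidates. A minimal sketch (the convention that ranks are 1-based and derived by sorting candidates by prediction score is an assumption; AUROC and AUPR are computed from the scores themselves):

```python
def hits_at_k(ranks, k):
    """Fraction of true relations whose predicted rank is at most k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all true relations."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# ranks[i] is the 1-based rank of the i-th true relation among all
# candidates, ordered by descending prediction score.
ranks = [1, 3, 10, 2]
print(hits_at_k(ranks, 3))          # 0.75
print(mean_reciprocal_rank(ranks))  # (1 + 1/3 + 1/10 + 1/2) / 4
```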

Metric Two: This applies to relations between biological entities and classes in ontologies; it aims to account for the semantics of relations to ontology classes (for example, if R(x,C) and SubClassOf(C,D), then a prediction of R(x,D) is considered correct but less specific than R(x,C)). For these relations we use the evaluation metrics developed by CAFA, in particular F_max and S_min, which take class specificity into account.
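To illustrate the idea behind F_max, here is a minimal single-entity sketch. It simplifies CAFA, which averages precision and recall over all evaluated entities, and it assumes true classes and prediction scores have already been propagated to superclasses; the class identifiers are hypothetical.

```python
def f_max(true_classes, scores):
    """F_max for one entity (a simplification of the CAFA metric).

    true_classes: set of ontology classes annotated to the entity,
        already propagated to all superclasses.
    scores: dict mapping predicted class -> confidence in [0, 1],
        likewise propagated (a superclass scores at least as high
        as any of its subclasses).
    """
    best = 0.0
    for t in sorted(set(scores.values())):
        predicted = {c for c, s in scores.items() if s >= t}
        tp = len(predicted & true_classes)
        if tp == 0:
            continue
        precision = tp / len(predicted)
        recall = tp / len(true_classes)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

true_classes = {"GO:A", "GO:B"}  # hypothetical class IDs
scores = {"GO:A": 0.9, "GO:B": 0.6, "GO:C": 0.4}
print(f_max(true_classes, scores))  # 1.0 (perfect at threshold 0.6)
```

Because a prediction of a superclass still counts as a (less specific) true positive after propagation, this scheme rewards R(x,D) when R(x,C) holds and SubClassOf(C,D), exactly as described above.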


All team members need to register at the website using their email addresses or ORCID accounts. To form a team, one member registers the team and invites the other members to join. Any team member can then submit predictions.


Submissions must be compressed tab-separated files (.tsv.gz) with four columns:

  • subject
  • predicate
  • object
  • prediction score
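A submission file in this format can be produced with the standard library alone; the example below writes and re-reads a two-row file (the IRIs and scores are hypothetical placeholders):

```python
import csv
import gzip

# Hypothetical predictions: (subject, predicate, object, prediction score).
predictions = [
    ("http://example.org/P1", "http://example.org/interactsWith",
     "http://example.org/P2", 0.93),
    ("http://example.org/P1", "http://example.org/interactsWith",
     "http://example.org/P3", 0.41),
]

with gzip.open("submission.tsv.gz", "wt", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerows(predictions)

# Read the file back to check the format:
with gzip.open("submission.tsv.gz", "rt", newline="") as fh:
    rows = list(csv.reader(fh, delimiter="\t"))
print(rows[0])
```

Note that scores are read back as strings; evaluation presumably parses them as floats.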

To submit, first find an open Task and follow the instructions.