Base class

polaris.benchmark.BenchmarkSpecification

Bases: PredictiveTaskSpecificationMixin, BaseArtifactModel, BaseSplitSpecificationMixin, ABC

This class wraps a Dataset with additional data to specify the evaluation logic.

Specifically, it specifies:

  1. Which dataset to use (see Dataset);
  2. A task definition (we currently only support predictive tasks);
  3. A predefined, static train-test split to use during evaluation.

Subclasses

Polaris includes various subclasses of the BenchmarkSpecification that provide a more precise data-model or additional logic, e.g. SingleTaskBenchmarkSpecification.

Examples:

Basic API usage:

import polaris as po

# Load the benchmark from the Hub
benchmark = po.load_benchmark("polaris/hello-world-benchmark")

# Get the train and test data-loaders
train, test = benchmark.get_train_test_split()

# Use the training data to train your model
# Get the input as an array with 'train.inputs' and 'train.targets'
# Or simply iterate over the train object.
for x, y in train:
    ...

# Work your magic to accurately predict the test set
predictions = [0.0 for x in test]

# Evaluate your predictions
results = benchmark.evaluate(predictions)

# Submit your results
results.upload_to_hub(owner="dummy-user")

Attributes:

dataset (BaseDataset): The dataset the benchmark specification is based on.

readme (str): Markdown text that can be used to provide a formatted description of the benchmark. If you use the Polaris Hub, note that this field is more easily edited through the Hub UI, which provides a rich-text Markdown editor.

For additional metadata attributes, see the base classes.

evaluate

evaluate(
    y_pred: IncomingPredictionsType | None = None,
    y_prob: IncomingPredictionsType | None = None,
) -> BenchmarkResults

Execute the evaluation protocol for the benchmark, given a set of predictions.

What about y_true?

Contrary to other frameworks that you might be familiar with, we opted for a signature that includes just the predictions. This reduces the chance of accidentally using the test targets during training.

For this method, we make the following assumptions:

  1. There can be one or multiple test set(s);
  2. There can be one or multiple target(s);
  3. The metrics are constant across test sets;
  4. The metrics are constant across targets;
  5. There can be metrics which measure across tasks.
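The nesting of the prediction object follows directly from assumptions 1 and 2 above. As a sketch (the target and test-subset labels such as "LogD", "Solubility", and "test_ood" are hypothetical; the real labels come from the benchmark's metadata):

```python
# Prediction objects for benchmark.evaluate(), illustrating the nesting rules.

# 1. Single target, single test set: a bare array-like is enough.
single = [0.0, 1.2, 0.7]

# 2. Multiple targets: wrap per-target predictions in a dict keyed by
#    target label.
multi_target = {
    "LogD": [0.0, 1.2, 0.7],
    "Solubility": [3.1, 2.9, 4.0],
}

# 3. Multiple test sets AND multiple targets: nest the target dicts under
#    the test-subset labels.
multi_both = {
    "test": {"LogD": [0.0, 1.2], "Solubility": [3.1, 2.9]},
    "test_ood": {"LogD": [0.7], "Solubility": [4.0]},
}

print(sorted(multi_both))  # the test-subset labels are the outermost keys
```

The same nesting applies to y_prob, so a prediction object built once can be mirrored for probabilities.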

Parameters:

y_pred (IncomingPredictionsType | None, default: None): The predictions for the test set, as NumPy arrays. If there are multiple targets, wrap the predictions in a dictionary with the target labels as keys. If there are additionally multiple test sets, further wrap that dictionary in an outer dictionary with the test-subset labels as keys.

y_prob (IncomingPredictionsType | None, default: None): The predicted probabilities for the test set, formatted analogously to y_pred based on the number of tasks and test sets.

Returns:

BenchmarkResults: A BenchmarkResults object, which can be submitted directly to the Polaris Hub.

Examples:

  1. For regression benchmarks:

     pred_scores = your_model.predict_score(molecules)  # predict continuous score values
     benchmark.evaluate(y_pred=pred_scores)

  2. For classification benchmarks:
    • If roc_auc and pr_auc are in the metric list, both class probabilities and label predictions are required:

      pred_probs = your_model.predict_proba(molecules)  # predict probabilities
      pred_labels = your_model.predict_labels(molecules)  # predict class labels
      benchmark.evaluate(y_pred=pred_labels, y_prob=pred_probs)

    • Otherwise, label predictions alone suffice:

      benchmark.evaluate(y_pred=pred_labels)

upload_to_hub

upload_to_hub(
    settings: PolarisHubSettings | None = None,
    cache_auth_token: bool = True,
    access: AccessType = "private",
    owner: HubOwner | str | None = None,
    **kwargs: dict,
)

A thin convenience wrapper around the PolarisHubClient.upload_benchmark method.

to_json

to_json(destination: str) -> str

Save the benchmark to a destination directory as a JSON file.

Multiple files

Perhaps unintuitively, this method creates multiple files in the destination directory, because it also saves the dataset the benchmark is based on to that destination. See the docstring of Dataset.to_json for more information.

Parameters:

destination (str, required): The directory to save the associated data to.

Returns:

str: The path to the JSON file.
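The "multiple files" behavior described above can be illustrated with a minimal, self-contained sketch. This is not the Polaris implementation; the filenames ("benchmark.json", "dataset.json") and the helper are hypothetical stand-ins for the documented contract: the method writes both the benchmark JSON and its dataset's files into the destination directory, then returns the path to the benchmark JSON.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch of the to_json contract; filenames are illustrative.
def to_json_sketch(destination: str) -> str:
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    # The dataset is saved alongside the benchmark (see Dataset.to_json).
    (dest / "dataset.json").write_text(json.dumps({"name": "dummy-dataset"}))
    # The benchmark JSON references the dataset file it was saved with.
    benchmark_path = dest / "benchmark.json"
    benchmark_path.write_text(json.dumps({"dataset": "dataset.json"}))
    # The return value is the path to the benchmark JSON, not the directory.
    return str(benchmark_path)

with tempfile.TemporaryDirectory() as tmp:
    path = to_json_sketch(tmp)
    print(sorted(p.name for p in Path(tmp).iterdir()))
    # → ['benchmark.json', 'dataset.json']
```

The practical takeaway: treat the destination as a directory owned by the export, rather than expecting a single standalone file.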


Subclasses

polaris.benchmark.SingleTaskBenchmarkSpecification

Bases: SingleTaskMixin, BenchmarkV1Specification

Single-task benchmark for the base specification.


polaris.benchmark.MultiTaskBenchmarkSpecification

Bases: MultiTaskMixin, BenchmarkV1Specification

Multitask benchmark for the base specification.