Benchmark

polaris.benchmark.BenchmarkV2Specification

Bases: SplitSpecificationV2Mixin, BenchmarkSpecification[BenchmarkResultsV2]

get_train_test_split

get_train_test_split(
    featurization_fn: Callable | None = None,
) -> tuple[Subset, dict[str, Subset]]

Construct the train and test sets, given the split in the benchmark specification.

Returns Subset objects, which offer several ways of accessing the data and can therefore easily serve as a basis for building framework-specific (e.g. PyTorch, TensorFlow) data loaders.

Parameters:

featurization_fn (Callable | None, default: None)
    A function to apply to the input data. For multi-input benchmarks, this function expects an input in the format specified by the input_format parameter.

Returns:

tuple[Subset, dict[str, Subset]]
    A tuple with the train Subset and the test Subset(s). If there are multiple test sets, they are returned in a dictionary and each test set has an associated name. The targets of the test sets cannot be accessed.
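
A minimal usage sketch is shown below; the featurize function is a hypothetical stand-in for whatever featurization your model actually requires (e.g. fingerprints or other descriptors), and the exact input it receives depends on the benchmark's inputs.

import polaris as po

# Load the benchmark from the Hub (same benchmark as in the example further down)
benchmark = po.load_benchmark("polaris/hello-world-benchmark")

# Hypothetical featurization function: a stand-in for a real descriptor
# or fingerprint computation applied to the input data.
def featurize(smiles: str) -> list[float]:
    return [float(len(smiles))]

train, test = benchmark.get_train_test_split(featurization_fn=featurize)

# Access the data directly via the Subset ...
X_train, y_train = train.inputs, train.targets

# ... or iterate over it, e.g. as the basis for a framework-specific data loader.
for x, y in train:
    ...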

evaluate

evaluate(
    y_pred: IncomingPredictionsType | None = None,
    y_prob: IncomingPredictionsType | None = None,
) -> BenchmarkResultsV2

Execute the evaluation protocol for the benchmark, given a set of predictions.

What about y_true?

Contrary to other frameworks that you might be familiar with, we opted for a signature that includes just the predictions. This reduces the chance of accidentally using the test targets during training.

For this method, we make the following assumptions:

  1. There can be one or multiple test sets;
  2. There can be one or multiple targets;
  3. The metrics are constant across test sets;
  4. The metrics are constant across targets;
  5. There can be metrics that measure across tasks.

Parameters:

y_pred (IncomingPredictionsType | None, default: None)
    The predictions for the test set, as NumPy arrays. If there are multiple targets, the predictions should be wrapped in a dictionary with the target labels as keys. If there are multiple test sets, the predictions should be further wrapped in a dictionary with the test subset labels as keys.

y_prob (IncomingPredictionsType | None, default: None)
    The predicted probabilities for the test set, formatted similarly to the predictions, based on the number of tasks and test sets.

Returns:

BenchmarkResultsV2
    A BenchmarkResultsV2 object. This object can be directly submitted to the Polaris Hub.

Examples:

  1. For regression benchmarks:

     # Predict continuous score values
     pred_scores = your_model.predict_score(molecules)
     benchmark.evaluate(y_pred=pred_scores)

  2. For classification benchmarks:
    • If roc_auc and pr_auc are in the metric list, both class probabilities and label predictions are required:

      # Predict class probabilities and class labels
      pred_probs = your_model.predict_proba(molecules)
      pred_labels = your_model.predict_labels(molecules)
      benchmark.evaluate(y_pred=pred_labels, y_prob=pred_probs)

    • Otherwise:

      benchmark.evaluate(y_pred=pred_labels)
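
For benchmarks with multiple targets and/or multiple test sets, the predictions are wrapped in (nested) dictionaries as described for y_pred above. A minimal sketch, assuming benchmark is already loaded and using hypothetical target labels ("LogD", "Solubility") and test-set labels ("test", "test_scaffold"); the actual labels are defined by the benchmark:

import numpy as np

# Single test set, multiple targets: a dictionary keyed by target label.
y_pred = {
    "LogD": np.array([1.2, 3.4, 2.1]),
    "Solubility": np.array([0.5, 0.7, 0.1]),
}
results = benchmark.evaluate(y_pred=y_pred)

# Multiple test sets: wrap the per-target dictionaries in an outer
# dictionary keyed by test-set label.
y_pred = {
    "test": {"LogD": np.array([1.2, 3.4]), "Solubility": np.array([0.5, 0.7])},
    "test_scaffold": {"LogD": np.array([2.1]), "Solubility": np.array([0.1])},
}
results = benchmark.evaluate(y_pred=y_pred)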

polaris.benchmark.BenchmarkSpecification

Bases: PredictiveTaskSpecificationMixin, BaseArtifactModel, BaseSplitSpecificationMixin, ABC, Generic[BenchmarkResultsType]

This class wraps a dataset with additional data to specify the evaluation logic.

Specifically, it specifies:

  1. Which dataset to use;
  2. A task definition (we currently only support predictive tasks);
  3. A predefined, static train-test split to use during evaluation.

Examples:

Basic API usage:

import polaris as po

# Load the benchmark from the Hub
benchmark = po.load_benchmark("polaris/hello-world-benchmark")

# Get the train and test data-loaders
train, test = benchmark.get_train_test_split()

# Use the training data to train your model
# Access the inputs and targets as arrays via 'train.inputs' and 'train.targets'
# Or simply iterate over the train object.
for x, y in train:
    ...

# Work your magic to accurately predict the test set
predictions = [0.0 for x in test]

# Evaluate your predictions
results = benchmark.evaluate(predictions)

# Submit your results
results.upload_to_hub(owner="dummy-user")

Attributes:

dataset (BaseDataset)
    The dataset the benchmark specification is based on.

readme (str)
    Markdown text that can be used to provide a formatted description of the benchmark. If using the Polaris Hub, note that this field is more easily edited through the Hub UI, which provides a rich-text editor for writing Markdown.

For additional metadata attributes, see the base classes.

upload_to_hub

upload_to_hub(
    settings: PolarisHubSettings | None = None,
    cache_auth_token: bool = True,
    access: AccessType = "private",
    owner: HubOwner | str | None = None,
    **kwargs: dict,
)

A lightweight convenience wrapper around the PolarisHubClient.upload_benchmark method.
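
A minimal sketch, assuming benchmark is a benchmark specification you have built or loaded locally and my-org is a hypothetical owner on the Hub:

# Upload the benchmark to the Polaris Hub. The access argument controls
# visibility on the Hub and defaults to "private".
benchmark.upload_to_hub(owner="my-org", access="private")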

to_json

to_json(destination: str) -> str

Save the benchmark to a destination directory as a JSON file.

Multiple files

Perhaps unintuitively, this method creates multiple files in the destination directory, as it also saves the dataset the benchmark is based on to that destination.

Parameters:

destination (str, required)
    The directory to save the associated data to.

Returns:

str
    The path to the JSON file.
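
A minimal sketch, assuming benchmark is a benchmark specification and ./my-benchmark is an arbitrary local directory:

# Save the benchmark (and the dataset it is based on) to a local directory.
path = benchmark.to_json("./my-benchmark")

# The returned path points to the benchmark's JSON file; the dataset files
# are written to the same destination directory.
print(path)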