Base class

polaris.benchmark.BenchmarkSpecification

Bases: BaseArtifactModel, ChecksumMixin

This class wraps a Dataset with additional data to specify the evaluation logic.

Specifically, it specifies:

  1. Which dataset to use (see Dataset);
  2. Which columns are used as input and which columns are used as target;
  3. Which metrics should be used to evaluate performance on this task;
  4. A predefined, static train-test split to use during evaluation.

Subclasses

Polaris includes various subclasses of BenchmarkSpecification that provide a more precise data model or additional logic, e.g. SingleTaskBenchmarkSpecification.

Examples:

Basic API usage:

import polaris as po

# Load the benchmark from the Hub
benchmark = po.load_benchmark("polaris/hello-world-benchmark")

# Get the train and test data-loaders
train, test = benchmark.get_train_test_split()

# Use the training data to train your model.
# Get the inputs and targets as arrays via 'train.inputs' and 'train.targets',
# or simply iterate over the train object.
for x, y in train:
    ...

# Work your magic to accurately predict the test set
predictions = [0.0 for x in test]

# Evaluate your predictions
results = benchmark.evaluate(predictions)

# Submit your results
results.upload_to_hub(owner="dummy-user")

Attributes:

  • dataset (Union[DatasetV1, CompetitionDataset, str, dict[str, Any]]): The dataset the benchmark specification is based on.
  • target_cols (ColumnsType): The column(s) of the original dataset that should be used as target.
  • input_cols (ColumnsType): The column(s) of the original dataset that should be used as input.
  • split (SplitType): The predefined train-test split to use for evaluation.
  • metrics (Union[str, Metric, list[str | Metric]]): The metrics to use for evaluating performance.
  • main_metric (str | Metric | None): The main metric used to rank methods. If None, the first entry of the metrics field is used.
  • readme (str): Markdown text that can be used to provide a formatted description of the benchmark. If using the Polaris Hub, note that this field is more easily edited through the Hub UI, which provides a rich text editor for writing markdown.
  • target_types (dict[str, Union[TargetType, str, None]]): A dictionary that maps target columns to their type. If not specified, this is automatically inferred.

For additional meta-data attributes, see the BaseArtifactModel class.
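
As a rough illustration of how these attributes fit together, the sketch below constructs a benchmark programmatically. The dataset slug, column names, metric identifier, and split indices are placeholders, and the exact accepted formats for split and metrics may differ from what is shown here:

import polaris as po
from polaris.benchmark import SingleTaskBenchmarkSpecification

# Load (or build) the dataset the benchmark should be based on
dataset = po.load_dataset("my-org/my-dataset")  # hypothetical dataset slug

benchmark = SingleTaskBenchmarkSpecification(
    dataset=dataset,
    input_cols="smiles",                             # assumed input column name
    target_cols="expt",                              # assumed target column name
    split=(list(range(80)), list(range(80, 100))),   # assumed (train, test) index format
    metrics=["mean_absolute_error"],                 # assumed metric identifier
    main_metric="mean_absolute_error",
)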

n_train_datapoints property

n_train_datapoints: int

The size of the train set.

n_test_sets property

n_test_sets: int

The number of test sets.

n_test_datapoints property

n_test_datapoints: dict[str, int]

The size of (each of) the test set(s).

n_classes property

n_classes: dict[str, int]

The number of classes for each of the target columns.

task_type property

task_type: str

The high-level task type of the benchmark.
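
A quick way to inspect these properties before training; the printed values are benchmark-specific:

import polaris as po

benchmark = po.load_benchmark("polaris/hello-world-benchmark")

print(benchmark.task_type)           # high-level task type of the benchmark
print(benchmark.n_train_datapoints)  # size of the train set
print(benchmark.n_test_sets)         # number of test sets
print(benchmark.n_test_datapoints)   # dict mapping test-set name to its size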

get_train_test_split

get_train_test_split(featurization_fn: Optional[Callable] = None) -> tuple[Subset, Union[Subset, dict[str, Subset]]]

Construct the train and test sets, given the split in the benchmark specification.

Returns Subset objects, which offer several ways of accessing the data and can thus easily serve as a basis for framework-specific (e.g. PyTorch, TensorFlow) data loaders.

Parameters:

  • featurization_fn (Optional[Callable], default: None): A function to apply to the input data. For multi-input benchmarks, this function expects an input in the format specified by the input_format parameter.

Returns:

  • tuple[Subset, Union[Subset, dict[str, Subset]]]: A tuple with the train Subset and the test Subset object(s). If there are multiple test sets, these are returned in a dictionary keyed by test-set name. The targets of the test set(s) cannot be accessed.
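
A hedged sketch of passing a featurization_fn; the featurizer below is a toy stand-in for a real molecular featurizer (e.g. a fingerprint function):

import numpy as np
import polaris as po

def featurize(smiles: str) -> np.ndarray:
    # Toy featurizer for illustration only: a single feature, the SMILES string length
    return np.array([len(smiles)], dtype=float)

benchmark = po.load_benchmark("polaris/hello-world-benchmark")
train, test = benchmark.get_train_test_split(featurization_fn=featurize)

# The Subset objects expose the (featurized) inputs and targets
X_train, y_train = train.inputs, train.targets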

evaluate

evaluate(y_pred: Optional[PredictionsType] = None, y_prob: Optional[PredictionsType] = None) -> BenchmarkResults

Execute the evaluation protocol for the benchmark, given a set of predictions.

What about y_true?

Contrary to other frameworks that you might be familiar with, we opted for a signature that includes just the predictions. This reduces the chance of accidentally using the test targets during training.

Expected structure for y_pred and y_prob arguments

The supplied y_pred and y_prob arguments must adhere to a certain structure, depending on the number of tasks and test sets included in the benchmark. Refer to the following for guidance on the correct structure when creating your y_pred and y_prob objects (a minimal sketch follows the list):

  • Single task, single set: [values...]
  • Multi-task, single set: {task_name_1: [values...], task_name_2: [values...]}
  • Single task, multi-set: {test_set_1: {task_name: [values...]}, test_set_2: {task_name: [values...]}}
  • Multi-task, multi-set: {test_set_1: {task_name_1: [values...], task_name_2: [values...]}, test_set_2: {task_name_1: [values...], task_name_2: [values...]}}
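
A minimal sketch of assembling y_pred for a hypothetical multi-task, multi-test-set benchmark. The test-set and task names below are invented; use benchmark.target_cols and the keys of the test dictionary returned by get_train_test_split() to find the real labels:

import numpy as np

y_pred = {
    "test": {
        "task_a": np.array([0.1, 0.4, 0.3]),
        "task_b": np.array([1.2, 0.8, 0.5]),
    },
    "test_ood": {
        "task_a": np.array([0.2, 0.6, 0.7]),
        "task_b": np.array([0.9, 1.1, 0.4]),
    },
}

results = benchmark.evaluate(y_pred=y_pred)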

For this method, we make the following assumptions:

  1. There can be one or multiple test set(s);
  2. There can be one or multiple target(s);
  3. The metrics are constant across test sets;
  4. The metrics are constant across targets;
  5. There can be metrics which measure across tasks.

Parameters:

  • y_pred (Optional[PredictionsType], default: None): The predictions for the test set, as NumPy arrays. If there are multiple targets, the predictions should be wrapped in a dictionary with the target labels as keys. If there are multiple test sets, the predictions should be further wrapped in a dictionary with the test subset labels as keys.
  • y_prob (Optional[PredictionsType], default: None): The predicted probabilities for the test set, as NumPy arrays.

Returns:

  • BenchmarkResults: A BenchmarkResults object, which can be directly submitted to the Polaris Hub.

Examples:

  1. For regression benchmarks:

     pred_scores = your_model.predict_score(molecules)  # predict continuous score values
     benchmark.evaluate(y_pred=pred_scores)

  2. For classification benchmarks:
    • If roc_auc and pr_auc are in the metric list, both class probabilities and label predictions are required:

      pred_probs = your_model.predict_proba(molecules)    # predict probabilities
      pred_labels = your_model.predict_labels(molecules)  # predict class labels
      benchmark.evaluate(y_pred=pred_labels, y_prob=pred_probs)

    • Otherwise:

      benchmark.evaluate(y_pred=pred_labels)

upload_to_hub

upload_to_hub(settings: Optional[PolarisHubSettings] = None, cache_auth_token: bool = True, access: Optional[AccessType] = 'private', owner: Union[HubOwner, str, None] = None, **kwargs: dict)

A lightweight, convenient wrapper around the PolarisHubClient.upload_benchmark method.
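
A minimal usage sketch; the owner value below is a placeholder and the access level shown simply mirrors the default:

benchmark.upload_to_hub(owner="my-username", access="private")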

to_json

to_json(destination: str) -> str

Save the benchmark to a destination directory as a JSON file.

Multiple files

Perhaps unintuitively, this method creates multiple files in the destination directory, as it also saves the dataset it is based on to the specified destination. See the docstring of Dataset.to_json for more information.

Parameters:

  • destination (str, required): The directory to save the associated data to.

Returns:

  • str: The path to the JSON file.
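
A short sketch of saving a benchmark locally; the destination directory is a placeholder:

import polaris as po

benchmark = po.load_benchmark("polaris/hello-world-benchmark")
path = benchmark.to_json("./my-benchmark")  # hypothetical destination directory
print(path)  # path to the generated JSON file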


Subclasses

polaris.benchmark.SingleTaskBenchmarkSpecification

Bases: BenchmarkSpecification

Subclass for any single-task benchmark specification.

In addition to the data model and logic of the base class, this class verifies that there is just a single target column.

task_type property

task_type: str

The high-level task type of the benchmark.


polaris.benchmark.MultiTaskBenchmarkSpecification

Bases: BenchmarkSpecification

Subclass for any multi-task benchmark specification.

In addition to the data model and logic of the base class, this class verifies that there are multiple target columns.

task_type property

task_type: str

The high-level task type of the benchmark.