# Base class

## polaris.benchmark.BenchmarkSpecification

Bases: `PredictiveTaskSpecificationMixin`, `BaseArtifactModel`, `BaseSplitSpecificationMixin`, `ABC`
This class wraps a `Dataset` with additional data to specify the evaluation logic. Specifically, it specifies:

- Which dataset to use (see `Dataset`);
- A task definition (we currently only support predictive tasks);
- A predefined, static train-test split to use during evaluation.
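Conceptually, a benchmark specification therefore bundles three things: a dataset, a task definition (input and target columns), and a fixed split. The following is a minimal sketch of that structure in plain Python; all names here (`ToyBenchmarkSpec`, the column names) are illustrative and not part of the Polaris API:

```python
from dataclasses import dataclass

# Illustrative only: a simplified stand-in for what a benchmark
# specification bundles together. The real Polaris classes carry
# much richer metadata and validation.
@dataclass
class ToyBenchmarkSpec:
    dataset: dict     # maps column name -> list of values
    input_cols: list  # which columns are model inputs
    target_cols: list # which columns are prediction targets
    split: tuple      # (train_indices, test_indices)

    def get_train_test_split(self):
        train_idx, test_idx = self.split
        def subset(idx):
            return {c: [self.dataset[c][i] for i in idx]
                    for c in self.input_cols + self.target_cols}
        return subset(train_idx), subset(test_idx)

spec = ToyBenchmarkSpec(
    dataset={"smiles": ["C", "CC", "CCC", "CCCC"], "y": [0.1, 0.2, 0.3, 0.4]},
    input_cols=["smiles"],
    target_cols=["y"],
    split=([0, 1, 2], [3]),
)
train, test = spec.get_train_test_split()
print(test)  # {'smiles': ['CCCC'], 'y': [0.4]}
```

Because the split is part of the specification itself, every user of the benchmark evaluates against exactly the same train-test partition.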
## Subclasses

Polaris includes various subclasses of the `BenchmarkSpecification` that provide a more precise data model or additional logic, e.g. `SingleTaskBenchmarkSpecification`.
Examples:

Basic API usage:

```python
import polaris as po

# Load the benchmark from the Hub
benchmark = po.load_benchmark("polaris/hello-world-benchmark")

# Get the train and test data-loaders
train, test = benchmark.get_train_test_split()

# Use the training data to train your model.
# Get the inputs as an array with 'train.inputs' and the targets with 'train.targets',
# or simply iterate over the train object.
for x, y in train:
    ...

# Work your magic to accurately predict the test set
predictions = [0.0 for x in test]

# Evaluate your predictions
results = benchmark.evaluate(predictions)

# Submit your results
results.upload_to_hub(owner="dummy-user")
```
Attributes:

Name | Type | Description |
---|---|---|
`dataset` | `BaseDataset` | The dataset the benchmark specification is based on. |
`readme` | `str` | Markdown text that can be used to provide a formatted description of the benchmark. If using the Polaris Hub, it is worth noting that this field is more easily edited through the Hub UI, as it provides a rich text editor for writing markdown. |
For additional metadata attributes, see the base classes.
## evaluate

```python
evaluate(
    y_pred: IncomingPredictionsType | None = None,
    y_prob: IncomingPredictionsType | None = None,
) -> BenchmarkResults
```

Execute the evaluation protocol for the benchmark, given a set of predictions.

**What about `y_true`?**

Contrary to other frameworks that you might be familiar with, we opted for a signature that includes just the predictions. This reduces the chance of accidentally using the test targets during training.
For this method, we make the following assumptions:
- There can be one or multiple test set(s);
- There can be one or multiple target(s);
- The metrics are constant across test sets;
- The metrics are constant across targets;
- There can be metrics which measure across tasks.
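Given these assumptions, predictions are nested first by test set and then by target. A sketch of the expected shapes using plain dictionaries and NumPy arrays; the subset and target labels below (`"test"`, `"test_scaffold"`, `"LOG_SOLUBILITY"`, `"LOG_PERMEABILITY"`) are hypothetical examples, not names defined by any particular benchmark:

```python
import numpy as np

# Single test set, single target: a bare array is enough.
y_pred_simple = np.array([0.1, 0.2, 0.3])

# Multiple targets: wrap the predictions in a dict keyed by target label.
y_pred_multi_target = {
    "LOG_SOLUBILITY": np.array([0.1, 0.2, 0.3]),
    "LOG_PERMEABILITY": np.array([1.0, 2.0, 3.0]),
}

# Multiple test sets: wrap further in a dict keyed by test-subset label.
y_pred_multi_test = {
    "test": y_pred_multi_target,
    "test_scaffold": {
        "LOG_SOLUBILITY": np.array([0.4]),
        "LOG_PERMEABILITY": np.array([4.0]),
    },
}

print(sorted(y_pred_multi_test["test"].keys()))
```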
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`y_pred` | `IncomingPredictionsType \| None` | The predictions for the test set, as NumPy arrays. If there are multiple targets, the predictions should be wrapped in a dictionary with the target labels as keys. If there are multiple test sets, the predictions should be further wrapped in a dictionary with the test subset labels as keys. | `None` |
`y_prob` | `IncomingPredictionsType \| None` | The predicted probabilities for the test set, formatted similarly to the predictions, based on the number of tasks and test sets. | `None` |
Returns:

Type | Description |
---|---|
`BenchmarkResults` | A `BenchmarkResults` object. |
Examples:

- For regression benchmarks:

  ```python
  pred_scores = your_model.predict_score(molecules)  # predict continuous score values
  benchmark.evaluate(y_pred=pred_scores)
  ```

- For classification benchmarks:

  - If `roc_auc` and `pr_auc` are in the metric list, both class probabilities and label predictions are required:

    ```python
    pred_probs = your_model.predict_proba(molecules)  # predict probabilities
    pred_labels = your_model.predict_labels(molecules)  # predict class labels
    benchmark.evaluate(y_pred=pred_labels, y_prob=pred_probs)
    ```

  - Otherwise:

    ```python
    benchmark.evaluate(y_pred=pred_labels)
    ```
## upload_to_hub

```python
upload_to_hub(
    settings: PolarisHubSettings | None = None,
    cache_auth_token: bool = True,
    access: AccessType = "private",
    owner: HubOwner | str | None = None,
    **kwargs: dict,
)
```

A thin, convenient wrapper around the `PolarisHubClient.upload_benchmark` method.
## to_json

Save the benchmark to a destination directory as a JSON file.

**Multiple files**

Perhaps unintuitively, this method creates multiple files in the destination directory, as it also saves the dataset it is based on to the specified destination. See the docstring of `Dataset.to_json` for more information.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`destination` | `str` | The directory to save the associated data to. | required |

Returns:

Type | Description |
---|---|
`str` | The path to the JSON file. |
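To illustrate the multiple-files behavior, here is a toy sketch (not the Polaris implementation; all class and file names are made up) of a `to_json`-style method that also writes the wrapped dataset into the same destination directory:

```python
import json
import os
import tempfile

# Toy sketch: a "benchmark" that, analogous to the behavior described
# above, also saves the dataset it wraps into the destination directory.
class ToyDataset:
    def to_json(self, destination: str) -> str:
        path = os.path.join(destination, "dataset.json")
        with open(path, "w") as f:
            json.dump({"rows": 4}, f)
        return path

class ToyBenchmark:
    def __init__(self, dataset: ToyDataset):
        self.dataset = dataset

    def to_json(self, destination: str) -> str:
        # Saving the benchmark also saves its dataset,
        # so more than one file appears in `destination`.
        dataset_path = self.dataset.to_json(destination)
        path = os.path.join(destination, "benchmark.json")
        with open(path, "w") as f:
            json.dump({"dataset": dataset_path}, f)
        return path

with tempfile.TemporaryDirectory() as tmp:
    ToyBenchmark(ToyDataset()).to_json(tmp)
    files = sorted(os.listdir(tmp))
    print(files)  # ['benchmark.json', 'dataset.json']
```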
# Subclasses

## polaris.benchmark.SingleTaskBenchmarkSpecification

Bases: `SingleTaskMixin`, `BenchmarkV1Specification`

Single-task benchmark for the base specification.

## polaris.benchmark.MultiTaskBenchmarkSpecification

Bases: `MultiTaskMixin`, `BenchmarkV1Specification`

Multitask benchmark for the base specification.