Benchmark
polaris.benchmark.BenchmarkV2Specification
Bases: SplitSpecificationV2Mixin, BenchmarkSpecification[BenchmarkResultsV2]
get_train_test_split
get_train_test_split(
featurization_fn: Callable | None = None,
) -> tuple[Subset, dict[str, Subset]]
Construct the train and test sets, given the split in the benchmark specification.

Returns Subset objects, which offer several ways of accessing the data and can thus easily serve as a basis for building framework-specific (e.g. PyTorch, TensorFlow) data loaders.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
featurization_fn | Callable \| None | A function to apply to the input data. If a multi-input benchmark, this function expects an input in the format specified by the | None |
Returns:

Type | Description |
---|---|
tuple[Subset, dict[str, Subset]] | A tuple with the train Subset and a dictionary of named test Subset objects. |
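Because the returned Subset objects can be iterated over and accessed in several ways, they can serve as the backing dataset for a framework-specific loader. Below is a minimal, illustrative sketch of building a PyTorch DataLoader on top of the train split, assuming Subset's sequence interface (len() and indexing); the featurize function is a hypothetical placeholder, and depending on the benchmark's input and target types a custom collate_fn may be needed.

```python
import numpy as np
from torch.utils.data import DataLoader

import polaris as po

def featurize(smiles: str) -> np.ndarray:
    # Hypothetical placeholder featurizer: replace with a real one (e.g. a fingerprint).
    return np.zeros(2048, dtype=np.float32)

benchmark = po.load_benchmark("polaris/hello-world-benchmark")

# The featurization function is applied to the benchmark inputs.
train, test = benchmark.get_train_test_split(featurization_fn=featurize)

# Assuming Subset supports len() and indexing, it can back a PyTorch DataLoader directly.
train_loader = DataLoader(train, batch_size=64, shuffle=True)

for x_batch, y_batch in train_loader:
    ...  # train your model on the batch
```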
evaluate
evaluate(
y_pred: IncomingPredictionsType | None = None,
y_prob: IncomingPredictionsType | None = None,
) -> BenchmarkResultsV2
Execute the evaluation protocol for the benchmark, given a set of predictions.
What about y_true?
Contrary to other frameworks that you might be familiar with, we opted for a signature that includes just the predictions. This reduces the chance of accidentally using the test targets during training.
For this method, we make the following assumptions:
- There can be one or multiple test set(s);
- There can be one or multiple target(s);
- The metrics are constant across test sets;
- The metrics are constant across targets;
- There can be metrics which measure across tasks.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
y_pred | IncomingPredictionsType \| None | The predictions for the test set, as NumPy arrays. If there are multiple targets, the predictions should be wrapped in a dictionary with the target labels as keys. If there are multiple test sets, the predictions should be further wrapped in a dictionary with the test subset labels as keys. | None |
y_prob | IncomingPredictionsType \| None | The predicted probabilities for the test set, formatted similarly to predictions, based on the number of tasks and test sets. | None |
Returns:

Type | Description |
---|---|
BenchmarkResultsV2 | A BenchmarkResultsV2 object with the results of the evaluation. |
Examples:

- For regression benchmarks:

        pred_scores = your_model.predict_score(molecules)  # predict continuous score values
        benchmark.evaluate(y_pred=pred_scores)

- For classification benchmarks:
    - If roc_auc and pr_auc are in the metric list, both class probabilities and label predictions are required:

            pred_probs = your_model.predict_proba(molecules)  # predict probabilities
            pred_labels = your_model.predict_labels(molecules)  # predict class labels
            benchmark.evaluate(y_pred=pred_labels, y_prob=pred_probs)

    - Otherwise:

            benchmark.evaluate(y_pred=pred_labels)
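When a benchmark has multiple targets and/or multiple test sets, the predictions are nested dictionaries as described for y_pred above. A minimal sketch of the expected shapes; the target labels ("target_a", "target_b") and test-set labels ("test", "test_ood") are hypothetical placeholders, and the dummy arrays stand in for real model outputs:

```python
import numpy as np

# Dummy predictions, purely to illustrate the expected nesting.
preds_a = np.zeros(100)
preds_b = np.zeros(100)

# Single target, single test set: a plain array is enough.
benchmark.evaluate(y_pred=preds_a)

# Multiple targets, single test set: a dictionary keyed by target label.
benchmark.evaluate(y_pred={"target_a": preds_a, "target_b": preds_b})

# Multiple test sets: wrap further in a dictionary keyed by test-set label.
benchmark.evaluate(
    y_pred={
        "test": {"target_a": preds_a, "target_b": preds_b},
        "test_ood": {"target_a": preds_a, "target_b": preds_b},
    }
)
```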
polaris.benchmark.BenchmarkSpecification
Bases: PredictiveTaskSpecificationMixin, BaseArtifactModel, BaseSplitSpecificationMixin, ABC, Generic[BenchmarkResultsType]
This class wraps a dataset with additional data to specify the evaluation logic.
Specifically, it specifies:
- Which dataset to use;
- A task definition (we currently only support predictive tasks);
- A predefined, static train-test split to use during evaluation.
Examples:
Basic API usage:

    import polaris as po

    # Load the benchmark from the Hub
    benchmark = po.load_benchmark("polaris/hello-world-benchmark")

    # Get the train and test data-loaders
    train, test = benchmark.get_train_test_split()

    # Use the training data to train your model
    # Get the input as an array with 'train.inputs' and 'train.targets'
    # Or simply iterate over the train object.
    for x, y in train:
        ...

    # Work your magic to accurately predict the test set
    predictions = [0.0 for x in test]

    # Evaluate your predictions
    results = benchmark.evaluate(predictions)

    # Submit your results
    results.upload_to_hub(owner="dummy-user")
Attributes:

Name | Type | Description |
---|---|---|
dataset | BaseDataset | The dataset the benchmark specification is based on. |
readme | str | Markdown text that can be used to provide a formatted description of the benchmark. If using the Polaris Hub, it is worth noting that this field is more easily edited through the Hub UI as it provides a rich text editor for writing markdown. |
For additional metadata attributes, see the base classes.
upload_to_hub
upload_to_hub(
settings: PolarisHubSettings | None = None,
cache_auth_token: bool = True,
access: AccessType = "private",
owner: HubOwner | str | None = None,
**kwargs: dict,
)
Very light, convenient wrapper around the PolarisHubClient.upload_benchmark method.
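A minimal usage sketch based on the signature above; the owner value is a placeholder and the access level shown is the documented default:

```python
# Upload the benchmark to the Polaris Hub under a given owner ("my-org" is a placeholder).
benchmark.upload_to_hub(owner="my-org", access="private")
```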
to_json
Save the benchmark to a destination directory as a JSON file.
Multiple files
Perhaps unintuitively, this method creates multiple files in the destination directory, as it also saves the dataset it is based on to the specified destination.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
destination | str | The directory to save the associated data to. | required |
Returns:

Type | Description |
---|---|
str | The path to the JSON file. |
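For example, saving a benchmark (and the dataset it wraps) to a local directory might look like the sketch below; the destination path is a placeholder:

```python
# Writes the benchmark JSON plus the underlying dataset files into ./my-benchmark.
path = benchmark.to_json(destination="./my-benchmark")
print(path)  # path to the benchmark JSON file
```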