Evaluation
polaris.evaluate.BenchmarkPredictions
Bases: BaseModel
Base model to represent predictions in the Polaris code base.
Guided by Postel's Law, this class normalizes different formats to a single, internal representation.
Attributes:
Name | Type | Description |
---|---|---|
predictions | PredictionsType | The predictions for the benchmark. |
target_labels | list[str] | The target columns for the associated benchmark. |
test_set_labels | list[str] | The names of the test sets for the associated benchmark. |
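As an illustration of this normalization, the sketch below constructs the same predictions from a raw array and from a fully nested dict. The single-task, single-test-set setup and the label names are assumptions for the example, and depending on the library version the constructor may require additional fields (e.g. test set sizes) not shown here.

```python
import numpy as np
from polaris.evaluate import BenchmarkPredictions

# Minimal sketch: a raw array and the fully nested {test set -> target -> values}
# dict are assumed to normalize to the same internal representation.
flat = BenchmarkPredictions(
    predictions=np.array([0.1, 0.5, 0.9]),
    target_labels=["EGFR_WT"],  # illustrative target column
    test_set_labels=["test"],   # illustrative test set name
)
nested = BenchmarkPredictions(
    predictions={"test": {"EGFR_WT": np.array([0.1, 0.5, 0.9])}},
    target_labels=["EGFR_WT"],
    test_set_labels=["test"],
)
```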
polaris.evaluate.ResultsMetadata
Bases: BaseArtifactModel
Base class for evaluation results
Attributes:
Name | Type | Description |
---|---|---|
github_url | HttpUrlString \| None | The URL to the GitHub repository of the code used to generate these results. |
paper_url | HttpUrlString \| None | The URL to the paper describing the methodology used to generate these results. |
contributors | list[HubUser] | The users that are credited for these results. |
_created_at | datetime | The time-stamp at which the results were created. Automatically set. |
For additional meta-data attributes, see the BaseArtifactModel class.
polaris.evaluate.EvaluationResult
Bases: ResultsMetadata
Class for saving evaluation results
The actual results are saved in the results field using the following tabular format:
Test set | Target label | Metric | Score |
---|---|---|---|
test_iid | EGFR_WT | AUC | 0.9 |
test_ood | EGFR_WT | AUC | 0.75 |
... | ... | ... | ... |
test_ood | EGFR_L858R | AUC | 0.79 |
Categorizing methods
An open question is how best to categorize a methodology (e.g. a model). This is needed because we would also like to aggregate results across benchmarks and competitions, to say something about which (types of) methods perform best in general.
Attributes:
Name | Type | Description |
---|---|---|
results | DataFrame | Evaluation results are stored directly in a dataframe or in a serialized, JSON compatible dict that can be decoded into the associated tabular format. |
For additional meta-data attributes, see the ResultsMetadata class.
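To make the tabular format above concrete, the sketch below rebuilds the example table with pandas; the values mirror the example rows and are purely illustrative.

```python
import pandas as pd

# The example results table from above, expressed as a pandas DataFrame.
results = pd.DataFrame(
    [
        {"Test set": "test_iid", "Target label": "EGFR_WT", "Metric": "AUC", "Score": 0.90},
        {"Test set": "test_ood", "Target label": "EGFR_WT", "Metric": "AUC", "Score": 0.75},
        {"Test set": "test_ood", "Target label": "EGFR_L858R", "Metric": "AUC", "Score": 0.79},
    ]
)

# Typical slicing, e.g. to compare in- vs. out-of-distribution performance.
print(results[results["Test set"] == "test_ood"])
```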
polaris.evaluate.BenchmarkResults
Bases: EvaluationResult
Class specific to results for standard benchmarks.
This object is returned by BenchmarkSpecification.evaluate.
In addition to the metrics on the test set, it contains additional meta-data and logic to integrate
the results with the Polaris Hub.
Attributes:
benchmark_name: The name of the benchmark for which these results were generated. Together with the benchmark owner, this uniquely identifies the benchmark on the Hub.
benchmark_owner: The owner of the benchmark for which these results were generated. Together with the benchmark name, this uniquely identifies the benchmark on the Hub.
upload_to_hub
upload_to_hub(access: AccessType = 'private', owner: HubOwner | str | None = None, **kwargs: dict) -> BenchmarkResults
Very light, convenient wrapper around the PolarisHubClient.upload_results method.
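A hedged sketch of the typical end-to-end flow, assuming you are logged in to the Polaris Hub; the benchmark name, owner, and metadata values below are illustrative, and the placeholder predictions should be replaced with real model output.

```python
import polaris as po

# Illustrative benchmark; any benchmark on the Polaris Hub follows the same flow.
benchmark = po.load_benchmark("polaris/hello-world-benchmark")
train, test = benchmark.get_train_test_split()

predictions = [0.0] * len(test)  # placeholder; use your model's predictions

results = benchmark.evaluate(predictions)  # returns a BenchmarkResults object

# Optional metadata inherited from ResultsMetadata (values are illustrative).
results.name = "my-first-result"
results.github_url = "https://github.com/my-org/my-model"
results.paper_url = "https://example.com/my-model-paper"

results.upload_to_hub(owner="my-username", access="private")
```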
polaris.evaluate.MetricInfo
Bases: BaseModel
Metric metadata
Attributes:
Name | Type | Description |
---|---|---|
fn | Callable | The callable that actually computes the metric. |
is_multitask | bool | Whether the metric expects a single set of predictions or a dict of predictions. |
kwargs | dict | Additional parameters required for the metric. |
direction | DirectionType | The direction for ranking of the metric, "max" for maximization and "min" for minimization. |
y_type | PredictionKwargs | The type of predictions expected by the metric interface. |
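The sketch below describes an existing scikit-learn callable with MetricInfo, purely to illustrate the fields in the table above; the y_type value and the idea of constructing MetricInfo by hand are assumptions, not the library's official metric registration mechanism.

```python
from sklearn.metrics import matthews_corrcoef
from polaris.evaluate import MetricInfo

# Illustrative only: field names follow the attribute table above.
info = MetricInfo(
    fn=matthews_corrcoef,  # the callable that computes the metric
    is_multitask=False,    # expects a single set of predictions
    kwargs={},             # no extra parameters needed
    direction="max",       # higher is better
    y_type="y_pred",       # assumed member of PredictionKwargs
)
```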
polaris.evaluate.Metric
Bases: BaseModel
A Metric in Polaris.
A metric consists of a default metric (a callable labeled with additional metadata) and an optional config. The config can change how the metric is computed, for example by grouping the data before computing the metric.
Attributes:
Name | Type | Description |
---|---|---|
label | MetricLabel | The actual callable that is at the core of the metric implementation. |
custom_name | str \| None | An optional, custom name of the metric. Names should be unique within the context of a benchmark. |
config | GroupedMetricConfig \| None | For more complex metrics, this object should hold all parameters for the metric. |
fn | Callable | The callable that actually computes the metric, automatically set based on the label. |
is_multitask | bool | Whether the metric expects a single set of predictions or a dict of predictions, automatically set based on the label. |
kwargs | dict | Additional parameters required for the metric, automatically set based on the label. |
direction | DirectionType | The direction for ranking of the metric, "max" for maximization and "min" for minimization, automatically set based on the label. |
y_type | PredictionKwargs | The type of predictions expected by the metric interface, automatically set based on the label. |
score
score(y_true: GroundTruth, y_pred: BenchmarkPredictions | None = None, y_prob: BenchmarkPredictions | None = None) -> float
Compute the metric.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y_true | GroundTruth | The true target values. | required |
y_pred | BenchmarkPredictions \| None | The predicted target values, if any. | None |
y_prob | BenchmarkPredictions \| None | The predicted target probabilities, if any. | None |
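A minimal sketch of how the label-derived attributes come together, assuming "mean_absolute_error" is one of the supported default metric labels; since score() expects the benchmark-internal GroundTruth and BenchmarkPredictions types, the example exercises the underlying callable directly.

```python
import numpy as np
from polaris.evaluate import Metric

metric = Metric(label="mean_absolute_error")  # assumed to be a supported default label

# These attributes are resolved automatically from the label (see the table above).
print(metric.direction, metric.is_multitask, metric.y_type)

# The underlying callable can be exercised on plain arrays; score() adds the
# normalization and optional grouping logic on top of it.
value = metric.fn(np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.5, 3.5]), **metric.kwargs)
print(value)  # 0.5 with scikit-learn's mean_absolute_error
```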
polaris.evaluate.metrics.generic_metrics
absolute_average_fold_error
Calculate the Absolute Average Fold Error (AAFE) metric. It measures the fold change between predicted values and observed values. The implementation is based on this paper.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y_true | ndarray | The true target values of shape (n_samples,). | required |
y_pred | ndarray | The predicted target values of shape (n_samples,). | required |
Returns:
Name | Type | Description |
---|---|---|
aafe | float | The Absolute Average Fold Error. |
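For reference, the sketch below implements the common AAFE definition, ten to the power of the mean absolute log10 fold change between prediction and observation; the library's exact implementation may differ in details such as input validation.

```python
import numpy as np

def aafe_sketch(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # AAFE = 10 ** mean(|log10(y_pred / y_true)|)
    return float(10 ** np.mean(np.abs(np.log10(y_pred / y_true))))

y_true = np.array([1.0, 10.0, 100.0])
y_pred = np.array([2.0, 5.0, 200.0])
print(aafe_sketch(y_true, y_pred))  # 2.0: every prediction is off by exactly 2-fold
```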
cohen_kappa_score
Scikit-learn cohen_kappa_score wrapper with renamed arguments.
polaris.evaluate.metrics.docking_metrics
rmsd_coverage
rmsd_coverage(y_pred: Union[str, List[dm.Mol]], y_true: Union[str, list[dm.Mol]], max_rsmd: float = 2)
Calculate the coverage of molecules with an RMSD less than a threshold (2 Å by default) compared to the reference molecule conformer.
It is assumed that the predicted binding conformers are extracted from the docking output, where the receptor (protein) coordinates have been aligned with the original crystal structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y_pred | Union[str, List[dm.Mol]] | List of predicted binding conformers. | required |
y_true | Union[str, list[dm.Mol]] | List of ground truth binding conformers. | required |
max_rsmd | float | The threshold for determining an acceptable RMSD. | 2 |
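A hedged usage sketch, assuming that string inputs are paths to multi-molecule SDF files and that the predicted and reference conformers are in matching order; the file paths are illustrative.

```python
import datamol as dm
from polaris.evaluate.metrics.docking_metrics import rmsd_coverage

# Either pass SDF file paths directly (assumed interpretation of the str inputs)...
coverage = rmsd_coverage(y_pred="predicted_poses.sdf", y_true="reference_poses.sdf", max_rsmd=2.0)

# ...or lists of datamol/RDKit molecules loaded up front.
preds = dm.read_sdf("predicted_poses.sdf")
refs = dm.read_sdf("reference_poses.sdf")
coverage = rmsd_coverage(y_pred=preds, y_true=refs)
print(coverage)
```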