Evaluation

polaris.evaluate.BenchmarkPredictions

Bases: BaseModel

Base model to represent predictions in the Polaris code base.

Guided by Postel's Law, this class normalizes different formats to a single, internal representation.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| predictions | PredictionsType | The predictions for the benchmark. |
| target_labels | list[str] | The target columns for the associated benchmark. |
| test_set_labels | list[str] | The names of the test sets for the associated benchmark. |

check_test_set_size

check_test_set_size() -> Self

Verify that the size of all predictions matches the expected test set sizes

get_subset

get_subset(test_set_subset: list[str] | None = None, target_subset: list[str] | None = None) -> BenchmarkPredictions

Return a subset of the original predictions

get_size

get_size(test_set_subset: list[str] | None = None, target_subset: list[str] | None = None) -> int

Return the total number of predictions, allowing for filtering by test set and target

flatten

flatten() -> np.ndarray

Return the predictions as a single, flat numpy array

__len__

__len__() -> int

Return the total number of predictions
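
A minimal sketch of working with this class directly, assuming it can be constructed from the documented fields; in typical workflows it is created for you during evaluation rather than by hand. The test_set_sizes argument shown below is an assumption used to satisfy the size check and may not be needed on every version.

```python
import numpy as np
from polaris.evaluate import BenchmarkPredictions

# Sketch only: a plain array for a single-target, single-test-set benchmark is
# assumed to normalize to the same internal representation as the fully nested
# {test_set: {target: array}} form.
preds = BenchmarkPredictions(
    predictions=np.array([0.1, 0.9, 0.4]),
    target_labels=["EGFR_WT"],
    test_set_labels=["test"],
    test_set_sizes={"test": 3},  # assumption: expected size per test set
)

print(len(preds))       # 3, the total number of predictions
flat = preds.flatten()  # a single, flat numpy array
subset = preds.get_subset(target_subset=["EGFR_WT"])
```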


polaris.evaluate.ResultsMetadata

Bases: BaseArtifactModel

Base class for evaluation results

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| github_url | HttpUrlString \| None | The URL to the GitHub repository of the code used to generate these results. |
| paper_url | HttpUrlString \| None | The URL to the paper describing the methodology used to generate these results. |
| contributors | list[HubUser] | The users that are credited for these results. |
| _created_at | datetime | The time-stamp at which the results were created. Automatically set. |

For additional meta-data attributes, see the BaseArtifactModel class.


polaris.evaluate.EvaluationResult

Bases: ResultsMetadata

Class for saving evaluation results

The actual results are saved in the results field using the following tabular format:

| Test set | Target label | Metric | Score |
| --- | --- | --- | --- |
| test_iid | EGFR_WT | AUC | 0.9 |
| test_ood | EGFR_WT | AUC | 0.75 |
| ... | ... | ... | ... |
| test_ood | EGFR_L858R | AUC | 0.79 |
Categorizing methods

An open question is how best to categorize a methodology (e.g. a model). This is needed because we would also like to aggregate results across benchmarks and competitions, to say something about which (types of) methods perform best in general.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| results | DataFrame | Evaluation results are stored directly in a dataframe or in a serialized, JSON-compatible dict that can be decoded into the associated tabular format. |

For additional meta-data attributes, see the ResultsMetadata class.
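
As a hedged illustration of how the tabular results can be consumed: the results field behaves like a regular pandas DataFrame, so it can be filtered per test set, target label, or metric. The column names below are taken from the example table above and may differ slightly in your installed version.

```python
import pandas as pd

# `result` is assumed to be an existing EvaluationResult instance.
df: pd.DataFrame = result.results

# Keep only the out-of-distribution AUC rows from the tabular results.
ood_auc = df[(df["Test set"] == "test_ood") & (df["Metric"] == "AUC")]
print(ood_auc[["Target label", "Score"]])
```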


polaris.evaluate.BenchmarkResults

Bases: EvaluationResult

Class specific to results for standard benchmarks.

This object is returned by BenchmarkSpecification.evaluate. In addition to the metrics on the test set, it contains additional meta-data and logic to integrate the results with the Polaris Hub.

Attributes:

benchmark_name: The name of the benchmark for which these results were generated. Together with the benchmark owner, this uniquely identifies the benchmark on the Hub.

benchmark_owner: The owner of the benchmark for which these results were generated. Together with the benchmark name, this uniquely identifies the benchmark on the Hub.

upload_to_hub

upload_to_hub(access: AccessType = 'private', owner: HubOwner | str | None = None, **kwargs: dict) -> BenchmarkResults

A very light, convenient wrapper around the PolarisHubClient.upload_results method.
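
For context, a typical end-to-end flow (modelled on the Polaris tutorials) looks roughly like the sketch below. The benchmark slug, user names, and the fitted model are illustrative placeholders, and the optional metadata fields come from ResultsMetadata.

```python
import polaris as po

# Load a benchmark from the Polaris Hub (the slug here is illustrative).
benchmark = po.load_benchmark("polaris/hello-world-benchmark")
train, test = benchmark.get_train_test_split()

# `model` is a hypothetical, already-fitted estimator.
y_pred = model.predict(test.X)

# evaluate() returns a BenchmarkResults object that can be pushed to the Hub.
results = benchmark.evaluate(y_pred)
results.name = "my-first-result"
results.github_url = "https://github.com/my-org/my-model"  # optional metadata

results.upload_to_hub(owner="my-username", access="private")
```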


polaris.evaluate.MetricInfo

Bases: BaseModel

Metric metadata

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| fn | Callable | The callable that actually computes the metric. |
| is_multitask | bool | Whether the metric expects a single set of predictions or a dict of predictions. |
| kwargs | dict | Additional parameters required for the metric. |
| direction | DirectionType | The direction for ranking of the metric, "max" for maximization and "min" for minimization. |
| y_type | PredictionKwargs | The type of predictions expected by the metric interface. |


polaris.evaluate.Metric

Bases: BaseModel

A Metric in Polaris.

A metric consists of a default metric, which is a callable labeled with additional metadata, as well as a config. The config can change how the metric is computed, for example by grouping the data before computing the metric.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| label | MetricLabel | The label of the default metric that is at the core of the metric implementation. |
| custom_name | str \| None | An optional, custom name for the metric. Names should be unique within the context of a benchmark. |
| config | GroupedMetricConfig \| None | For more complex metrics, this object should hold all parameters for the metric. |
| fn | Callable | The callable that actually computes the metric, automatically set based on the label. |
| is_multitask | bool | Whether the metric expects a single set of predictions or a dict of predictions, automatically set based on the label. |
| kwargs | dict | Additional parameters required for the metric, automatically set based on the label. |
| direction | DirectionType | The direction for ranking of the metric, "max" for maximization and "min" for minimization, automatically set based on the label. |
| y_type | PredictionKwargs | The type of predictions expected by the metric interface, automatically set based on the label. |

score

score(y_true: GroundTruth, y_pred: BenchmarkPredictions | None = None, y_prob: BenchmarkPredictions | None = None) -> float

Compute the metric.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_true | GroundTruth | The true target values. | required |
| y_pred | BenchmarkPredictions \| None | The predicted target values, if any. | None |
| y_prob | BenchmarkPredictions \| None | The predicted target probabilities, if any. | None |

polaris.evaluate.metrics.generic_metrics

pearsonr

pearsonr(y_true: np.ndarray, y_pred: np.ndarray)

Calculate the Pearson r correlation

spearman

spearman(y_true: np.ndarray, y_pred: np.ndarray)

Calculate a Spearman correlation

absolute_average_fold_error

absolute_average_fold_error(y_true: np.ndarray, y_pred: np.ndarray) -> float

Calculate the Absolute Average Fold Error (AAFE) metric. It measures the fold change between predicted values and observed values. The implementation is based on this paper.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_true | ndarray | The true target values of shape (n_samples,). | required |
| y_pred | ndarray | The predicted target values of shape (n_samples,). | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| aafe | float | The Absolute Average Fold Error. |
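
For reference, the commonly used definition of AAFE is 10 raised to the mean absolute log10 fold error, so a value of 1 indicates perfect agreement and 2 indicates predictions that are off by two-fold on average. The snippet below is a sketch of that textbook formula, not necessarily a verbatim copy of the implementation here.

```python
import numpy as np

def aafe_sketch(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Textbook definition: 10 ** mean(|log10(y_pred / y_true)|)
    return float(10 ** np.mean(np.abs(np.log10(y_pred / y_true))))

# Predictions off by two-fold in either direction yield an AAFE of ~2.0.
print(aafe_sketch(np.array([1.0, 10.0]), np.array([2.0, 5.0])))
```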

cohen_kappa_score

cohen_kappa_score(y_true, y_pred, **kwargs)

Scikit-learn cohen_kappa_score wrapper with renamed arguments

average_precision_score

average_precision_score(y_true, y_score, **kwargs)

Scikit-learn average_precision_score wrapper that raises an error if y_true has no positive class
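
As a rough sketch of why the renaming is needed: scikit-learn's cohen_kappa_score names its two label arrays y1 and y2, so the wrapper lets the metric share the y_true/y_pred convention used by the other Polaris metrics. The following is an illustrative re-implementation, not the exact Polaris code.

```python
from sklearn.metrics import cohen_kappa_score as _sk_cohen_kappa_score

def cohen_kappa_score(y_true, y_pred, **kwargs):
    # scikit-learn calls these arguments y1 and y2; renaming them here keeps the
    # common y_true/y_pred metric interface.
    return _sk_cohen_kappa_score(y1=y_true, y2=y_pred, **kwargs)
```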

polaris.evaluate.metrics.docking_metrics

rmsd_coverage

rmsd_coverage(y_pred: Union[str, List[dm.Mol]], y_true: Union[str, list[dm.Mol]], max_rsmd: float = 2)

Calculate the coverage of molecules with an RMSD less than a threshold (2 Å by default) compared to the reference molecule conformer.

It is assumed that the predicted binding conformers are extracted from the docking output, where the receptor (protein) coordinates have been aligned with the original crystal structure.

Parameters:

| Name | Description |
| --- | --- |
| y_pred | List of predicted binding conformers. |
| y_true | List of ground truth binding conformers. |
| max_rsmd | The RMSD threshold for counting a predicted pose as acceptable. |
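
A hedged usage sketch: the SDF paths are placeholders, and it assumes both files contain 3D poses for the same ligands in matching order.

```python
import datamol as dm
from polaris.evaluate.metrics.docking_metrics import rmsd_coverage

# Placeholder paths; predicted and reference poses must be aligned to the same
# receptor frame, as noted above.
predicted_poses = dm.read_sdf("predicted_poses.sdf")
reference_poses = dm.read_sdf("crystal_poses.sdf")

# Coverage of predicted poses within 2 Å RMSD of the reference conformers.
coverage = rmsd_coverage(y_pred=predicted_poses, y_true=reference_poses, max_rsmd=2.0)
print(coverage)
```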