

Bases: BaseModel

Base model to represent predictions in the Polaris code base.

Guided by Postel's Law, this class normalizes different formats to a single, internal representation.


Name Type Description
predictions PredictionsType

The predictions for the benchmark.

target_labels list[str]

The target columns for the associated benchmark.

test_set_labels list[str]

The names of the test sets for the associated benchmark.

test_set_sizes dict[str, int]

The number of rows in each test set for the associated benchmark.


_serialize_predictions(predictions: PredictionsType)

Recursively converts all numpy values in the predictions dictionary to lists so they can be serialized.
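As an illustration of the idea, here is a minimal sketch of such a recursive conversion. The helper name `to_serializable` is hypothetical and this is not the Polaris implementation; it only assumes the nested dictionary layout described on this page.

```python
import json

import numpy as np


def to_serializable(value):
    """Recursively replace numpy arrays and scalars with plain Python values.

    Hypothetical helper for illustration, not part of Polaris.
    """
    if isinstance(value, dict):
        return {key: to_serializable(item) for key, item in value.items()}
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):
        # numpy scalar types such as np.float64 or np.int64
        return value.item()
    return value


predictions = {"test": {"y": np.array([0.1, 0.2, 0.3])}}
serialized = to_serializable(predictions)

# The converted structure can now be passed to the standard JSON encoder.
encoded = json.dumps(serialized)
```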

_validate_predictions classmethod

_validate_predictions(data: dict) -> dict

Normalizes the predictions format to a standard representation we use internally


check_test_set_size() -> Self

Verify that the size of all predictions matches the size of the corresponding test set.
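A size check of this kind could look like the sketch below. It assumes the nested `{test_set_name: {target_column: np.ndarray}}` layout and a `test_set_sizes` mapping as described on this page; the function name `check_sizes` and the error message are hypothetical.

```python
import numpy as np


def check_sizes(predictions, test_set_sizes):
    """Raise ValueError if a prediction array does not match its test set size.

    Hypothetical helper for illustration, not part of Polaris.
    """
    for test_set, targets in predictions.items():
        expected = test_set_sizes[test_set]
        for target, values in targets.items():
            if len(values) != expected:
                raise ValueError(
                    f"Expected {expected} predictions for target '{target}' "
                    f"in test set '{test_set}', got {len(values)}."
                )


predictions = {"test": {"y": np.array([0.1, 0.2, 0.3])}}
check_sizes(predictions, {"test": 3})  # passes silently
```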

_normalize_predictions classmethod

_normalize_predictions(
    predictions: IncomingPredictionsType,
    target_labels: list[str],
    test_set_labels: list[str],
) -> PredictionsType

Normalizes the predictions to a standard representation we use internally. This standard representation is a nested, two-level dictionary: {test_set_name: {target_column: np.ndarray}}
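To make the target representation concrete, the following sketch normalizes a few plausible input shapes into the nested form. Which input formats Polaris actually accepts is not stated here, so the accepted shapes (a bare array, a dictionary keyed by target, a dictionary keyed by test set) and the function name `normalize` are assumptions for illustration only.

```python
import numpy as np


def normalize(predictions, target_labels, test_set_labels):
    """Return {test_set_name: {target_column: np.ndarray}}.

    Hypothetical helper; the accepted input shapes are assumptions.
    """
    single_target = len(target_labels) == 1
    single_test_set = len(test_set_labels) == 1

    # Case 1: a bare array or list is only unambiguous with one target and one test set.
    if not isinstance(predictions, dict):
        if not (single_target and single_test_set):
            raise ValueError("A bare array is ambiguous with multiple targets or test sets.")
        return {test_set_labels[0]: {target_labels[0]: np.asarray(predictions)}}

    # Case 2: already nested; only convert the leaves to arrays.
    if all(isinstance(value, dict) for value in predictions.values()):
        return {
            test_set: {target: np.asarray(values) for target, values in targets.items()}
            for test_set, targets in predictions.items()
        }

    keys = set(predictions)
    # Case 3: keyed by target, with a single test set.
    if single_test_set and keys == set(target_labels):
        return {test_set_labels[0]: {t: np.asarray(v) for t, v in predictions.items()}}
    # Case 4: keyed by test set, with a single target.
    if single_target and keys == set(test_set_labels):
        return {ts: {target_labels[0]: np.asarray(v)} for ts, v in predictions.items()}

    raise ValueError("Could not interpret the predictions format.")


normalized = normalize([1.0, 2.0, 3.0], target_labels=["y"], test_set_labels=["test"])
```

Accepting several input shapes and emitting one internal shape keeps downstream code, such as metric computation, free of format checks.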

_is_fully_specified classmethod

_is_fully_specified(
    predictions: IncomingPredictionsType,
    target_labels: list[str],
    test_set_labels: list[str],
) -> bool

Check if the predictions are fully specified for the target columns and test set names.
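One way to read "fully specified" is that the predictions are already a two-level dictionary whose outer keys are exactly the test set names and whose inner keys are exactly the target columns. The sketch below implements that reading; both the interpretation and the function name `is_fully_specified` are assumptions, not the Polaris source.

```python
def is_fully_specified(predictions, target_labels, test_set_labels):
    """Return True if predictions is a nested dict covering every test set and target.

    Hypothetical helper for illustration, not part of Polaris.
    """
    if not isinstance(predictions, dict):
        return False
    if set(predictions) != set(test_set_labels):
        return False
    return all(
        isinstance(targets, dict) and set(targets) == set(target_labels)
        for targets in predictions.values()
    )


full = {"test": {"y1": [0.1], "y2": [0.2]}}
partial = {"test": {"y1": [0.1]}}
```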


    test_set_subset: list[str] | None = None,
    target_subset: list[str] | None = None,
) -> BenchmarkPredictions

Return a subset of the original predictions


    test_set_subset: list[str] | None = None,
    target_subset: list[str] | None = None,
) -> int

Return the total number of predictions, allowing for filtering by test set and target


flatten() -> np.ndarray

Return the predictions as a single, flat numpy array
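The subsetting, counting, and flattening operations described above can be sketched on a plain nested dictionary. The function names `subset`, `size`, and `flatten` are hypothetical stand-ins and operate on dictionaries rather than on a BenchmarkPredictions instance; the flattening order (dictionary insertion order) is also an assumption.

```python
import numpy as np


def subset(predictions, test_set_subset=None, target_subset=None):
    """Keep only the requested test sets and targets; None means keep everything."""
    return {
        test_set: {
            target: values
            for target, values in targets.items()
            if target_subset is None or target in target_subset
        }
        for test_set, targets in predictions.items()
        if test_set_subset is None or test_set in test_set_subset
    }


def size(predictions, test_set_subset=None, target_subset=None):
    """Count the individual predictions in the (optionally filtered) structure."""
    selected = subset(predictions, test_set_subset, target_subset)
    return sum(len(values) for targets in selected.values() for values in targets.values())


def flatten(predictions):
    """Concatenate every prediction array, in dictionary insertion order."""
    arrays = [np.asarray(v).ravel() for targets in predictions.values() for v in targets.values()]
    return np.concatenate(arrays) if arrays else np.array([])


predictions = {
    "test_a": {"y1": np.array([1.0, 2.0]), "y2": np.array([3.0, 4.0])},
    "test_b": {"y1": np.array([5.0]), "y2": np.array([6.0])},
}
total = size(predictions)
only_a = size(predictions, test_set_subset=["test_a"])
only_y1 = size(predictions, target_subset=["y1"])
flat = flatten(predictions)
```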


__len__() -> int

Return the total number of predictions


Bases: BaseArtifactModel

V1 implementation of evaluation results without model field support


Name Type Description
github_url HttpUrlString | None

The URL to the code repository that was used to generate these results.

paper_url HttpUrlString | None

The URL to the paper describing the methodology used to generate these results.

contributors list[HubUser]

The users that are credited for these results.

For additional metadata attributes, see the base classes.


Bases: ResultsMetadataV1, BaseEvaluationResult

V1 implementation of evaluation results without model field support


Bases: EvaluationResultV1, BaseBenchmarkResults

V1 implementation of benchmark results without model field support


Bases: BaseModel

Metric metadata


Name Type Description
fn Callable

The callable that actually computes the metric.

is_multitask bool

Whether the metric expects a single set of predictions or a dict of predictions.

kwargs dict

Additional parameters required for the metric.

direction DirectionType

The direction for ranking of the metric, "max" for maximization and "min" for minimization.

y_type PredictionKwargs

The type of predictions expected by the metric interface.
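To illustrate how a direction of "max" or "min" can drive ranking, here is a small sketch that orders named scores from best to worst. The function `rank_results` and the example scores are hypothetical and not part of Polaris.

```python
def rank_results(scores, direction):
    """Return result names ordered from best to worst for the given direction.

    Hypothetical helper for illustration, not part of Polaris.
    """
    if direction not in ("max", "min"):
        raise ValueError(f"Unknown direction: {direction!r}")
    # For "max" the highest score ranks first; for "min" the lowest does.
    return sorted(scores, key=scores.get, reverse=(direction == "max"))


scores = {"model_a": 0.71, "model_b": 0.84, "model_c": 0.65}
best_for_accuracy = rank_results(scores, "max")
best_for_error = rank_results(scores, "min")
```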


Bases: BaseModel

A Metric in Polaris.

A metric consists of a default metric, which is a callable labeled with additional metadata, as well as a config. The config can change how the metric is computed, for example by grouping the data before computing the metric.
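The grouping idea can be sketched as follows: compute the base metric within each group, then aggregate the per-group scores. The function `grouped_metric`, its parameters, and the choice of the mean as aggregation are assumptions for illustration; the actual fields of GroupedMetricConfig are not described here.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Plain MAE, used here as the base metric."""
    return float(np.mean(np.abs(y_true - y_pred)))


def grouped_metric(metric_fn, y_true, y_pred, groups, aggregate=np.mean):
    """Compute metric_fn per group, then aggregate the per-group scores.

    Hypothetical helper for illustration, not part of Polaris.
    """
    y_true, y_pred, groups = (np.asarray(a) for a in (y_true, y_pred, groups))
    scores = []
    for group in np.unique(groups):
        mask = groups == group
        scores.append(metric_fn(y_true[mask], y_pred[mask]))
    return float(aggregate(scores))


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])
groups = np.array(["a", "a", "a", "b"])

ungrouped = mean_absolute_error(y_true, y_pred)
grouped = grouped_metric(mean_absolute_error, y_true, y_pred, groups)
```

In this example the single poorly predicted point sits alone in group "b", so the grouped score weights it more heavily than the ungrouped score does.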


Name Type Description
label MetricLabel

The actual callable that is at the core of the metric implementation.

custom_name str | None

An optional, custom name for the metric. Names should be unique within the context of a benchmark.

config GroupedMetricConfig | None

For more complex metrics, this object should hold all parameters for the metric.

fn Callable

The callable that actually computes the metric, automatically set based on the label.

is_multitask bool

Whether the metric expects a single set of predictions or a dict of predictions, automatically set based on the label.

kwargs dict

Additional parameters required for the metric, automatically set based on the label.

direction DirectionType

The direction for ranking of the metric, "max" for maximization and "min" for minimization, automatically set based on the label.

y_type PredictionKwargs

The type of predictions expected by the metric interface, automatically set based on the label.


    y_true: GroundTruth,
    y_pred: BenchmarkPredictions | None = None,
    y_prob: BenchmarkPredictions | None = None,
) -> float

Compute the metric.


Name Type Description Default
y_true GroundTruth

The true target values.

y_pred BenchmarkPredictions | None

The predicted target values, if any.

y_prob BenchmarkPredictions | None

The predicted target probabilities, if any.




pearsonr(y_true: np.ndarray, y_pred: np.ndarray)

Calculate the Pearson r correlation.


spearman(y_true: np.ndarray, y_pred: np.ndarray)

Calculate the Spearman correlation.
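For readers unfamiliar with the two coefficients, the sketch below computes both with NumPy only: Pearson r measures linear association, and Spearman's coefficient is Pearson r applied to ranks. The function names are hypothetical and this does not show how Polaris computes them.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def average_ranks(values):
    """Rank values starting at 1, assigning tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for value in np.unique(values):
        mask = values == value
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman_rho(y_true, y_pred):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(y_true), average_ranks(y_pred))


y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = y_true**2  # monotonic but not linear in y_true

r = pearson_r(y_true, y_pred)
rho = spearman_rho(y_true, y_pred)
```

Because `y_pred` is a monotonic but non-linear function of `y_true`, the rank-based coefficient is 1 while Pearson r is slightly lower.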


absolute_average_fold_error(y_true: np.ndarray, y_pred: np.ndarray) -> float

Calculate the Absolute Average Fold Error (AAFE) metric. It measures the fold change between predicted values and observed values. The implementation is based on this paper.


Name Type Description Default
y_true ndarray

The true target values of shape (n_samples,)

y_pred ndarray

The predicted target values of shape (n_samples,).



Name Type Description
aafe float

The Absolute Average Fold Error.
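A definition of AAFE commonly used in the pharmacokinetics literature is 10 raised to the mean absolute log10 ratio of predicted to observed values. The sketch below implements that definition; it is an assumption that this matches the referenced paper, and the Polaris implementation may differ in detail. The function name `aafe_log10` is hypothetical.

```python
import numpy as np


def aafe_log10(y_true, y_pred):
    """AAFE as 10 ** mean(|log10(y_pred / y_true)|).

    Hypothetical helper implementing one common definition; requires
    strictly positive values so that the logarithm is defined.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape.")
    if np.any(y_true <= 0) or np.any(y_pred <= 0):
        raise ValueError("AAFE requires strictly positive values.")
    return float(10 ** np.mean(np.abs(np.log10(y_pred / y_true))))


# Two predictions are off by a factor of 2 (one high, one low), one is exact.
value = aafe_log10(y_true=[1.0, 10.0, 100.0], y_pred=[2.0, 5.0, 100.0])
perfect = aafe_log10(y_true=[1.0, 10.0], y_pred=[1.0, 10.0])
```

Under this definition a value of 1 means perfect agreement, and over- and under-prediction by the same factor contribute equally.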


cohen_kappa_score(y_true, y_pred, **kwargs)

scikit-learn cohen_kappa_score wrapper with renamed arguments.


average_precision_score(y_true, y_score, **kwargs)

scikit-learn average_precision_score wrapper that raises an error if y_true has no positive class.



_rmsd(mol_probe: dm.Mol, mol_ref: dm.Mol) -> float

Calculate the RMSD between the predicted molecule and the closest ground truth molecule. The RMSD is calculated using the first conformer of the predicted molecule and considers only heavy atoms. It is assumed that the predicted binding conformers are extracted from the docking output, where the receptor (protein) coordinates have been aligned with the original crystal structure.


Name Type Description Default
mol_probe Mol

Predicted molecule (docked ligand) with exactly one conformer.

mol_ref Mol

Ground truth molecule (crystal ligand) with at least one conformer. If multiple conformers are present, the lowest RMSD will be reported.



Type Description
float

The RMSD between the two molecules, taking symmetry into account.


    y_pred: Union[str, List[dm.Mol]],
    y_true: Union[str, list[dm.Mol]],
    max_rsmd: float = 2,

Calculate the coverage of molecules with an RMSD less than a threshold (2 Å by default) compared to the reference molecule conformer.

It is assumed that the predicted binding conformers are extracted from the docking output, where the receptor (protein) coordinates have been aligned with the original crystal structure.


Name Type Description
y_pred Union[str, List[dm.Mol]]

List of predicted binding conformers.

y_true Union[str, list[dm.Mol]]

List of ground truth binding conformers.

max_rsmd float

The threshold for determining an acceptable RMSD.
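The coverage computation can be sketched with plain coordinate arrays. This simplified version assumes atoms are already matched one-to-one and poses are already in the same reference frame; it does not handle molecular symmetry, heavy-atom selection, or multiple reference conformers, all of which the functions documented above account for. The names `rmsd` and `coverage` are hypothetical, and the strict `<` comparison follows the "less than a threshold" wording.

```python
import numpy as np


def rmsd(coords_probe, coords_ref):
    """RMSD between two (n_atoms, 3) coordinate arrays with matched atom order."""
    diff = np.asarray(coords_probe, dtype=float) - np.asarray(coords_ref, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))


def coverage(rmsds, max_rmsd=2.0):
    """Fraction of poses whose RMSD is below the threshold (in angstroms)."""
    return float(np.mean(np.asarray(rmsds, dtype=float) < max_rmsd))


reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
good_pose = reference + np.array([0.0, 0.0, 1.0])  # shifted by 1 angstrom
bad_pose = reference + np.array([0.0, 0.0, 3.0])  # shifted by 3 angstroms

rmsds = [rmsd(good_pose, reference), rmsd(bad_pose, reference)]
fraction = coverage(rmsds)
```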