
Actions

auroris.curation.actions.BaseAction

Bases: BaseModel, ABC

An action in the curation process.

The importance of reproducibility

One of the main goals in designing auroris is to make the curation process easy to reproduce, since reproducibility is key to scientific research. For this reason, every BaseAction must be serializable and uniquely identified by a name.

Attributes:

name (str): The name that uniquely identifies the action. It is used to serialize and deserialize the action.

prefix (str): The prefix used when the action adds columns to a dataset. If not set, it defaults to the name in uppercase.


StereoIsomerACDetection

Bases: BaseAction

Automatic detection of activity shifts between stereoisomers.

See auroris.curation.functional.detect_streoisomer_activity_cliff for the docs of the stereoisomer_id_col, y_cols and threshold attributes

Attributes:

mol_col (Optional[str]): Column with the SMILES strings or RDKit Molecule objects. If specified, it is used to render an image of the activity cliffs.
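The exact criterion used by `detect_streoisomer_activity_cliff` is not described on this page, so the following is only a sketch under an assumption: stereoisomers share a `stereoisomer_id_col` value, and a group is flagged when the spread (max minus min) of a bioactivity column exceeds `threshold`. The function `flag_stereoisomer_shifts` and the `AC_<column>` output name are hypothetical, and missing values are not handled.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def flag_stereoisomer_shifts(
    rows: Sequence[dict],
    stereoisomer_id_col: str,
    y_cols: List[str],
    threshold: float,
) -> List[dict]:
    """Hypothetical sketch: flag stereoisomer groups whose activity spread exceeds threshold."""
    # Group row indices by the identifier shared by all stereoisomers of a molecule.
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[stereoisomer_id_col]].append(i)

    out = [dict(row) for row in rows]  # copy so the input rows are left untouched
    for col in y_cols:
        flag_col = f"AC_{col}"  # hypothetical output column name
        for indices in groups.values():
            values = [rows[i][col] for i in indices]
            # A group with a single member cannot show a shift between stereoisomers.
            is_cliff = len(values) > 1 and (max(values) - min(values)) > threshold
            for i in indices:
                out[i][flag_col] = is_cliff
    return out
```

Every member of a flagged group is marked, since the shift is a property of the stereoisomer set rather than of any single measurement.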

Deduplication

Bases: BaseAction

Automatic deduplication of data points.

See auroris.curation.functional.deduplicate for the docs of the deduplicate_on, y_cols, keep and method attributes
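To illustrate how `deduplicate_on`, `y_cols`, `keep` and `method` could interact, here is a stdlib sketch. It assumes that rows sharing the same `deduplicate_on` values are collapsed into one, that `method` names the aggregation applied to `y_cols`, and that `keep` ("first" or "last") picks which duplicate supplies the remaining columns. These semantics and the function `deduplicate_rows` are assumptions, not the behavior documented for `auroris.curation.functional.deduplicate`.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def deduplicate_rows(
    rows: Sequence[dict],
    deduplicate_on: List[str],
    y_cols: List[str],
    keep: str = "first",
    method: str = "median",
) -> List[dict]:
    """Hypothetical sketch: collapse rows sharing a key and aggregate the y_cols values."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    aggregate = {"mean": statistics.mean, "median": statistics.median}[method]

    # Dicts preserve insertion order, so groups come out in order of first appearance.
    groups: Dict[Tuple, List[dict]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in deduplicate_on)].append(row)

    result = []
    for members in groups.values():
        # `keep` decides which duplicate provides the non-aggregated columns.
        base = dict(members[0] if keep == "first" else members[-1])
        for col in y_cols:
            base[col] = aggregate([m[col] for m in members])
        result.append(base)
    return result
```

The median is a reasonable default for repeated bioactivity measurements because a single discordant replicate shifts it less than the mean.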

Discretization

Bases: BaseAction

Thresholding of bioactivity columns into binary or multiclass labels.

See auroris.curation.functional.discretize for the docs of the thresholds, inplace, allow_nan and label_order attributes

Attributes:

input_column (str): The column to discretize.

log_scale (bool): Whether the visual depiction of the discretization should use a log scale.
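Thresholding into binary or multiclass labels can be sketched with a sorted list of cut points: one threshold gives two classes, two thresholds give three, and so on. The function `discretize_values`, the "ascending"/"descending" values for `label_order`, the choice that a value equal to a threshold falls into the upper class, and `None` as the label for missing values are all assumptions made for illustration; `auroris.curation.functional.discretize` may differ on each point.

```python
import bisect
import math
from typing import List, Optional, Sequence


def discretize_values(
    values: Sequence[Optional[float]],
    thresholds: List[float],
    label_order: str = "ascending",
    allow_nan: bool = True,
) -> List[Optional[int]]:
    """Hypothetical sketch: map each value to a class index using sorted thresholds."""
    cuts = sorted(thresholds)
    n_classes = len(cuts) + 1
    labels: List[Optional[int]] = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            if not allow_nan:
                raise ValueError("missing value encountered while allow_nan=False")
            labels.append(None)  # keep missing values unlabeled
            continue
        # bisect_right counts the thresholds <= v, which is the ascending class index.
        idx = bisect.bisect_right(cuts, v)
        # With descending order, the highest bin receives label 0.
        labels.append(idx if label_order == "ascending" else n_classes - 1 - idx)
    return labels


print(discretize_values([1.0, 5.0, 7.0], [5.0]))  # [0, 1, 1]
```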

ContinuousDistributionVisualization

Bases: BaseAction

Visualize one or more continuous distribution(s).

See auroris.visualization.visualize_continuous_distribution for the docs of the log_scale and bins attributes

Attributes:

y_cols (List[str]): The columns whose distributions should be visualized.
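The `bins` and `log_scale` attributes control how each distribution is binned before plotting. The stdlib sketch below computes equal-width histogram edges and counts, taking log10 of the values first when `log_scale` is set. It does not reproduce `auroris.visualization.visualize_continuous_distribution` (which renders a figure); the function `histogram_counts` and its equal-width binning are assumptions, and log scaling here requires strictly positive values.

```python
import math
from typing import List, Sequence, Tuple


def histogram_counts(
    values: Sequence[float], bins: int = 10, log_scale: bool = False
) -> Tuple[List[float], List[int]]:
    """Hypothetical sketch: equal-width bin edges and counts, optionally in log10 space."""
    # log10 requires strictly positive values, as is typical for potency data.
    data = [math.log10(v) for v in values] if log_scale else list(values)
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        # Clamp so the maximum value falls into the last bin instead of overflowing.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts
```

Binning in log space spreads out values that span several orders of magnitude, such as IC50s, which would otherwise pile up in the first bin.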

MoleculeCuration

Bases: BaseAction

Automated molecule curation and chemical space visualization.

See auroris.curation.functional.curate_molecules for the docs of the remove_stereo, fix_mol, count_stereoisomers, and count_stereocenters attributes

Attributes:

input_column (str): The name of the column that contains the molecules (either dm.Mol objects or SMILES strings).

X_col (Optional[str]): Column with custom features for each molecule. If None, ECFP fingerprints are used.

y_cols (Optional[Union[str, List[str]]]): Column names for the bioactivities, used to color-code the chemical space visualization.

OutlierDetection

Bases: BaseAction

Automatic detection of outliers.

See auroris.curation.functional.detect_outliers for the docs of the method and kwargs attributes

Attributes:

columns (List[str]): The columns for which to detect outliers.
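This page does not list which values `method` accepts, so the sketch below uses a z-score rule as one common, easily checked choice; it is not necessarily what `auroris.curation.functional.detect_outliers` implements. The function `zscore_outliers` is hypothetical, and its `threshold` argument stands in for something that might be passed through `kwargs`.

```python
import statistics
from typing import Dict, List, Sequence


def zscore_outliers(
    rows: Sequence[dict], columns: List[str], threshold: float = 3.0
) -> Dict[str, List[bool]]:
    """Hypothetical sketch: flag values whose absolute z-score exceeds threshold, per column."""
    flags: Dict[str, List[bool]] = {}
    for col in columns:
        values = [row[col] for row in rows]
        mean = statistics.mean(values)
        std = statistics.pstdev(values)  # population standard deviation
        if std == 0:
            flags[col] = [False] * len(values)  # constant column: nothing stands out
            continue
        flags[col] = [abs(v - mean) / std > threshold for v in values]
    return flags
```

In small samples the z-score of a single extreme value is bounded (for n points it cannot exceed (n - 1) / sqrt(n)), so a fixed threshold of 3 can miss obvious outliers in short columns; robust alternatives based on the median are often preferred for that reason.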