# Submit to a Benchmark

This tutorial is an extended version of the Quickstart Guide.
```python
import polaris as po
from polaris.hub.client import PolarisHubClient

# Authenticate with the Polaris Hub
with PolarisHubClient() as client:
    client.login()
```
## Load from the Hub

Datasets and benchmarks are identified by an `owner/slug` id.
```python
benchmark = po.load_benchmark("polaris/hello-world-benchmark")
```
Loading a benchmark will automatically load the underlying dataset.
You can also load the dataset directly.
```python
dataset = po.load_dataset("polaris/hello-world")
```

The benchmark specifies how to split the dataset into a train and test set.

```python
train, test = benchmark.get_train_test_split()
```
The resulting objects support several ways of accessing the data:
- The objects are iterable;
- The objects can be indexed;
- The objects have properties to access all data at once.
```python
for x, y in train:
    pass

for i in range(len(train)):
    x, y = train[i]

x = train.inputs
y = train.targets
```
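To make these three access patterns concrete, here is a minimal, hypothetical container class that behaves the same way. This is an illustration only, not Polaris's actual implementation:

```python
class ToySubset:
    """Hypothetical stand-in for a train subset: iterable,
    indexable, and exposing all data at once via properties."""

    def __init__(self, inputs, targets):
        self._inputs = list(inputs)
        self._targets = list(targets)

    def __len__(self):
        return len(self._inputs)

    def __getitem__(self, i):
        # Indexing yields an (x, y) tuple, so iteration does too.
        return self._inputs[i], self._targets[i]

    @property
    def inputs(self):
        return self._inputs

    @property
    def targets(self):
        return self._targets


train_toy = ToySubset(["CCO", "CCN"], [0.5, 1.2])
assert train_toy[0] == ("CCO", 0.5)
assert [y for _, y in train_toy] == [0.5, 1.2]
```

Because the class defines `__getitem__` and `__len__`, Python's iteration protocol gives you `for x, y in train_toy` for free.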
To avoid accidental access to the test targets, the test object does not expose the labels and will throw an error if you try to access them explicitly.
```python
for x in test:
    pass

for i in range(len(test)):
    x = test[i]

x = test.inputs

# NOTE: The below will throw an error!
# y = test.targets
```
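The guard on the test targets can be sketched the same way: a hypothetical subset whose `targets` property raises. Again, this is an illustration, not the actual Polaris class or exception type:

```python
class ToyTestSubset:
    """Hypothetical test subset: inputs are accessible, targets are not."""

    def __init__(self, inputs):
        self._inputs = list(inputs)

    def __len__(self):
        return len(self._inputs)

    def __getitem__(self, i):
        # Test-time indexing yields only the input.
        return self._inputs[i]

    @property
    def inputs(self):
        return self._inputs

    @property
    def targets(self):
        raise AttributeError("Test targets are hidden to prevent leakage.")


test_toy = ToyTestSubset(["CCO", "CCN"])
assert test_toy[0] == "CCO"  # inputs work fine
# test_toy.targets  # would raise AttributeError
```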
We also support conversion to other typical formats.
```python
df_train = train.as_dataframe()
```
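The resulting frame pairs each input with its target. As a rough sketch of the shape you can expect, built here with plain pandas (the column names and values are illustrative, not the exact ones Polaris produces):

```python
import pandas as pd

# Hypothetical inputs/targets mirroring train.inputs / train.targets
smiles = ["CCO", "CCN", "CCC"]
targets = [0.5, 1.2, 0.8]

df_sketch = pd.DataFrame({"smiles": smiles, "target": targets})
print(df_sketch.shape)  # (3, 2): one row per molecule, one column per field
```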
## Submit your results
In this example, we will train a simple Random Forest model on an ECFP representation, using scikit-learn and datamol.
```python
import datamol as dm
from sklearn.ensemble import RandomForestRegressor

# We will recreate the split to pass a featurization function.
train, test = benchmark.get_train_test_split(featurization_fn=dm.to_fp)

# Define a model and train
model = RandomForestRegressor(max_depth=2, random_state=0)
model.fit(train.X, train.y)

predictions = model.predict(test.X)
```
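If you want to dry-run this modelling step without Hub access, the same scikit-learn pattern works on synthetic fingerprint-like data. The shapes and values below are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-ins for ECFP features: 100 "molecules" x 128 bits
X_train = rng.integers(0, 2, size=(100, 128))
y_train = rng.random(100)
X_test = rng.integers(0, 2, size=(20, 128))

toy_model = RandomForestRegressor(max_depth=2, random_state=0)
toy_model.fit(X_train, y_train)

toy_predictions = toy_model.predict(X_test)
print(toy_predictions.shape)  # (20,): one prediction per test row
```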
As mentioned before, evaluating a submission is done through the `evaluate()` endpoint.
```python
results = benchmark.evaluate(predictions)
results
```
Before uploading the results to the Hub, you can provide some additional information about the results that will be displayed on the Polaris Hub.
```python
# For a complete list of metadata, check out the BenchmarkResults object
results.name = "hello-world-result"
results.github_url = "https://github.com/polaris-hub/polaris-hub"
results.paper_url = "https://polarishub.io/"
results.description = "Hello, World!"
results.tags = ["random_forest", "ecfp"]
results.user_attributes = {"Framework": "Scikit-learn"}
```
Finally, let's upload the results to the Hub!
```python
results.upload_to_hub(owner="my-username", access="public")
```
That's it! Just like that, you have submitted a result to a Polaris benchmark.

The End.