# Submit to a Benchmark

This tutorial is an extended version of the Quickstart Guide.
```python
import polaris as po
from polaris.hub.client import PolarisHubClient

# Authenticate with the Polaris Hub
with PolarisHubClient() as client:
    client.login()
```
## Load from the Hub

Datasets and benchmarks are identified by an `owner/slug` id.
```python
benchmark = po.load_benchmark("polaris/hello-world-benchmark")
```
Loading a benchmark will automatically load the underlying dataset.
You can also load the dataset directly.
```python
dataset = po.load_dataset("polaris/hello-world")
```

The benchmark specifies how to split the dataset into a train and test set.

```python
train, test = benchmark.get_train_test_split()
```
The resulting objects support several ways of accessing the data:
- The objects are iterable;
- The objects can be indexed;
- The objects have properties to access all data at once.
```python
for x, y in train:
    pass

for i in range(len(train)):
    x, y = train[i]

x = train.inputs
y = train.targets
```
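To make these three access patterns concrete, here is a minimal, hypothetical container class that behaves the same way. This is an illustration only, not Polaris's actual implementation:

```python
class ToySubset:
    """Hypothetical stand-in for a train subset: iterable,
    indexable, and exposing all data at once via properties."""

    def __init__(self, inputs, targets):
        self._inputs = list(inputs)
        self._targets = list(targets)

    def __len__(self):
        return len(self._inputs)

    def __getitem__(self, i):
        # Indexing yields an (x, y) tuple, so iteration does too.
        return self._inputs[i], self._targets[i]

    @property
    def inputs(self):
        return self._inputs

    @property
    def targets(self):
        return self._targets


train_toy = ToySubset(["CCO", "CCN"], [0.5, 1.2])
assert train_toy[0] == ("CCO", 0.5)
assert [y for _, y in train_toy] == [0.5, 1.2]
```

Because the class defines `__getitem__` and `__len__`, Python's iteration protocol gives you `for x, y in train_toy` for free.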
To avoid accidental access to the test targets, the test object does not expose the labels and will throw an error if you try to access them explicitly.
```python
for x in test:
    pass

for i in range(len(test)):
    x = test[i]

x = test.inputs

# NOTE: The below will throw an error!
# y = test.targets
```
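The guard on the test targets can be sketched the same way: a hypothetical subset whose `targets` property raises. Again, this is an illustration, not the actual Polaris class or exception type:

```python
class ToyTestSubset:
    """Hypothetical test subset: inputs are accessible, targets are not."""

    def __init__(self, inputs):
        self._inputs = list(inputs)

    def __len__(self):
        return len(self._inputs)

    def __getitem__(self, i):
        # Test-time indexing yields only the input.
        return self._inputs[i]

    @property
    def inputs(self):
        return self._inputs

    @property
    def targets(self):
        raise AttributeError("Test targets are hidden to prevent leakage.")


test_toy = ToyTestSubset(["CCO", "CCN"])
assert test_toy[0] == "CCO"  # inputs work fine
# test_toy.targets  # would raise AttributeError
```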
We also support conversion to other typical formats.
```python
df_train = train.as_dataframe()
```
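The resulting frame pairs each input with its target. As a rough sketch of the shape you can expect, built here with plain pandas (the column names and values are illustrative, not the exact ones Polaris produces):

```python
import pandas as pd

# Hypothetical inputs/targets mirroring train.inputs / train.targets
smiles = ["CCO", "CCN", "CCC"]
targets = [0.5, 1.2, 0.8]

df_sketch = pd.DataFrame({"smiles": smiles, "target": targets})
print(df_sketch.shape)  # (3, 2): one row per molecule, one column per field
```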
## Submit your results
In this example, we will train a simple Random Forest model on an ECFP representation, using scikit-learn and datamol.
```python
import datamol as dm
from sklearn.ensemble import RandomForestRegressor

# We will recreate the split to pass a featurization function.
train, test = benchmark.get_train_test_split(featurization_fn=dm.to_fp)

# Define a model and train
model = RandomForestRegressor(max_depth=2, random_state=0)
model.fit(train.X, train.y)

predictions = model.predict(test.X)
```
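If you want to dry-run this modelling step without Hub access, the same scikit-learn pattern works on synthetic fingerprint-like data. The shapes and values below are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-ins for ECFP features: 100 "molecules" x 128 bits
X_train = rng.integers(0, 2, size=(100, 128))
y_train = rng.random(100)
X_test = rng.integers(0, 2, size=(20, 128))

toy_model = RandomForestRegressor(max_depth=2, random_state=0)
toy_model.fit(X_train, y_train)

toy_predictions = toy_model.predict(X_test)
print(toy_predictions.shape)  # (20,): one prediction per test row
```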
As mentioned before, evaluating a submission is done through the `evaluate()` endpoint.
```python
results = benchmark.evaluate(predictions)
results
```
Before uploading the results to the Hub, you can provide some additional information about the results that will be displayed on the Polaris Hub.
```python
# For a complete list of metadata, check out the BenchmarkResults object
results.name = "hello-world-result"
results.github_url = "https://github.com/polaris-hub/polaris-hub"
results.paper_url = "https://polarishub.io/"
results.description = "Hello, World!"
results.tags = ["random_forest", "ecfp"]
results.user_attributes = {"Framework": "Scikit-learn"}
```
Finally, let's upload the results to the Hub!
```python
results.upload_to_hub(owner="my-username", access="public")
```
That's it! Just like that, you have submitted a result to a Polaris benchmark.

The End.