Client

polaris.hub.settings.PolarisHubSettings

Bases: BaseSettings

Settings for the OAuth2 Polaris Hub API Client.

Secrecy of these settings

Because the Polaris Hub uses PKCE (Proof Key for Code Exchange) for OAuth2, these values do not have to be kept secret. See RFC 7636 for more info.

Attributes:

Name Type Description
hub_url HttpUrlString

The URL to the main page of the Polaris Hub.

api_url Optional[HttpUrlString]

The URL to the main entrypoint of the Polaris API.

authorize_url HttpUrlString

The URL of the OAuth2 authorization endpoint.

callback_url HttpUrlString

The URL to which the user is redirected after authorization.

token_fetch_url HttpUrlString

The URL of the OAuth2 token endpoint.

user_info_url HttpUrlString

The URL of the OAuth2 user info endpoint.

scopes str

The OAuth2 scopes that are requested.

client_id str

The OAuth2 client ID.

ca_bundle Optional[Union[str, bool]]

The path to a CA bundle file for requests. Allows for custom SSL certificates to be used.


polaris.hub.client.PolarisHubClient

PolarisHubClient(settings: Optional[PolarisHubSettings] = None, cache_auth_token: bool = True, **kwargs: dict)

Bases: OAuth2Client

A client for the Polaris Hub API. The Polaris Hub is a central repository of datasets, benchmarks and results. Visit it here: https://polarishub.io/.

This class subclasses the authlib client, which in turn subclasses the httpx client. See the relevant docs to learn more about how to use these clients outside of the integration with the Polaris Hub.

Closing the client

The client should be closed after all requests have been made. For convenience, you can also use the client as a context manager to automatically close the client when the context is exited. Note that once the client has been closed, it cannot be used anymore.

# Make sure to close the client once finished
client = PolarisHubClient()
client.get(...)
client.close()

# Or use the client as a context manager
with PolarisHubClient() as client:
    client.get(...)

Async Client

authlib also supports an async client. Because we don't expect to make multiple requests to the Hub in parallel, and because the Python asyncio API adds complexity, we are sticking to the sync client, at least for now.

Parameters:

Name Type Description Default
settings Optional[PolarisHubSettings]

A PolarisHubSettings instance.

None
cache_auth_token bool

Whether to cache the auth token to a file.

True
**kwargs dict

Additional keyword arguments passed to the authlib OAuth2Client constructor.

{}

user_info property

user_info: dict

Get information about the currently logged-in user through the OAuth2 User Info flow.

user_as_owner property

user_as_owner: HubOwner

Easily get the currently logged-in user as a HubOwner instance.

login

login(overwrite: bool = False, auto_open_browser: bool = True)

Log in to the Polaris Hub using the OAuth2 protocol.

Headless authentication

It is currently not possible to log in to the Polaris Hub without a browser. See this GitHub issue for more info.

Parameters:

Name Type Description Default
overwrite bool

Whether to overwrite the current token if the user is already logged in.

False
auto_open_browser bool

Whether to automatically open the browser to visit the authorization URL.

True

list_datasets

list_datasets(limit: int = 100, offset: int = 0) -> list[str]

List all available datasets on the Polaris Hub.

Parameters:

Name Type Description Default
limit int

The maximum number of datasets to return.

100
offset int

The offset from which to start returning datasets.

0

Returns:

Type Description
list[str]

A list of dataset names in the format owner/dataset_name.

get_dataset

get_dataset(owner: Union[str, HubOwner], name: str, verify_checksum: bool = True) -> Dataset

Load a dataset from the Polaris Hub.

Parameters:

Name Type Description Default
owner Union[str, HubOwner]

The owner of the dataset. Can be either a user or organization from the Polaris Hub.

required
name str

The name of the dataset.

required
verify_checksum bool

Whether to use the checksum to verify the integrity of the dataset.

True

Returns:

Type Description
Dataset

A Dataset instance, if it exists.
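
The listing methods return names in the form owner/dataset_name, while get_dataset takes the owner and the name as separate arguments. A small helper such as the following can bridge the two. The name split_artifact_id is hypothetical, and the sketch assumes that neither part contains a slash.

```python
def split_artifact_id(artifact_id: str) -> tuple[str, str]:
    """Split 'owner/name' into an (owner, name) tuple."""
    owner, sep, name = artifact_id.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"Expected 'owner/name', got {artifact_id!r}")
    return owner, name


owner, name = split_artifact_id("some-owner/some-dataset")
# With a logged-in client, the parts could then be passed on:
# dataset = client.get_dataset(owner, name)
```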

open_zarr_file

open_zarr_file(owner: Union[str, HubOwner], name: str, path: str, mode: IOMode, as_consolidated: bool = True) -> zarr.hierarchy.Group

Open a Zarr file from a Polaris dataset.

Parameters:

Name Type Description Default
owner Union[str, HubOwner]

Which Hub user or organization owns the artifact.

required
name str

Name of the dataset.

required
path str

Path to the Zarr file within the dataset.

required
mode IOMode

The mode in which the file is opened.

required
as_consolidated bool

Whether to open the store with consolidated metadata for optimized reading. This is only applicable in 'r' and 'r+' modes.

True

Returns:

Type Description
Group

The Zarr object representing the dataset.

list_benchmarks

list_benchmarks(limit: int = 100, offset: int = 0) -> list[str]

List all available benchmarks on the Polaris Hub.

Parameters:

Name Type Description Default
limit int

The maximum number of benchmarks to return.

100
offset int

The offset from which to start returning benchmarks.

0

Returns:

Type Description
list[str]

A list of benchmark names in the format owner/benchmark_name.

get_benchmark

get_benchmark(owner: Union[str, HubOwner], name: str, verify_checksum: bool = True) -> BenchmarkSpecification

Load a benchmark from the Polaris Hub.

Parameters:

Name Type Description Default
owner Union[str, HubOwner]

The owner of the benchmark. Can be either a user or organization from the Polaris Hub.

required
name str

The name of the benchmark.

required
verify_checksum bool

Whether to use the checksum to verify the integrity of the benchmark.

True

Returns:

Type Description
BenchmarkSpecification

A BenchmarkSpecification instance, if it exists.

upload_results

upload_results(results: BenchmarkResults, access: AccessType = 'private', owner: Optional[Union[HubOwner, str]] = None)

Upload the results to the Polaris Hub.

Owner

The owner of the results will automatically be inferred by the hub from the user making the request. Contrary to other artifact types, an organization cannot own a set of results. However, you can specify the BenchmarkResults.contributors field to share credit with other hub users.

Required meta-data

The Polaris client and hub maintain different requirements as to which meta-data is required. The hub's requirements are stricter, so you might get errors about missing meta-data when uploading. Make sure to fill in as much of the meta-data as possible before uploading.

Benchmark name and owner

Importantly, results.benchmark_name and results.benchmark_owner must be specified and match an existing benchmark on the Polaris Hub. If these results were generated by benchmark.evaluate(...), this is done automatically.

Parameters:

Name Type Description Default
results BenchmarkResults

The results to upload.

required
access AccessType

Grant public or private access to the results.

'private'
owner Optional[Union[HubOwner, str]]

Which Hub user or organization owns the artifact. Takes precedence over results.owner.

None
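
Because results.benchmark_name and results.benchmark_owner must both be set, a quick local check before uploading can surface a missing field early. The sketch below uses a stand-in dataclass instead of the real BenchmarkResults class. ResultsStub and missing_benchmark_fields are hypothetical names, and the plain string owner is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultsStub:
    """Stand-in carrying only the two fields discussed above."""

    benchmark_name: Optional[str] = None
    benchmark_owner: Optional[str] = None


def missing_benchmark_fields(results: object) -> list[str]:
    """Return the names of required benchmark fields that are unset or empty."""
    required = ("benchmark_name", "benchmark_owner")
    return [field for field in required if not getattr(results, field, None)]


incomplete = missing_benchmark_fields(ResultsStub(benchmark_name="some-benchmark"))
complete = missing_benchmark_fields(
    ResultsStub(benchmark_name="some-benchmark", benchmark_owner="some-owner")
)
```

This only checks that the fields are present. Whether they match an existing benchmark can only be confirmed by the Hub itself.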

upload_dataset

upload_dataset(dataset: Dataset, access: AccessType = 'private', timeout: TimeoutTypes = (10, 200), owner: Optional[Union[HubOwner, str]] = None, if_exists: ZarrConflictResolution = 'replace')

Upload the dataset to the Polaris Hub.

Owner

You have to manually specify the owner in the dataset data model. Because the owner could be a user or an organization, we cannot automatically infer this from just the logged-in user.

Required meta-data

The Polaris client and hub maintain different requirements as to which meta-data is required. The hub's requirements are stricter, so you might get errors about missing meta-data when uploading. Make sure to fill in as much of the meta-data as possible before uploading.

Parameters:

Name Type Description Default
dataset Dataset

The dataset to upload.

required
access AccessType

Grant public or private access to the dataset.

'private'
timeout TimeoutTypes

Request timeout values. This can be a single value with the timeout in seconds for all IO operations, or a more granular tuple with (connect_timeout, write_timeout). Since datasets can get large, you may need to increase the write timeout when uploading larger datasets. The type of the timeout parameter comes from httpx. See also: https://www.python-httpx.org/advanced/#timeout-configuration

(10, 200)
owner Optional[Union[HubOwner, str]]

Which Hub user or organization owns the artifact. Takes precedence over dataset.owner.

None
if_exists ZarrConflictResolution

Action for handling existing files in the Zarr archive. Options are 'raise' to throw an error, 'replace' to overwrite, or 'skip' to proceed without altering the existing files.

'replace'
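
The three if_exists options can be pictured with a toy in-memory store. The sketch below only illustrates the semantics described above and is not how the Polaris client or Zarr implements them. The function write_with_policy and the dict-based store are hypothetical.

```python
def write_with_policy(
    store: dict[str, bytes], key: str, data: bytes, if_exists: str = "replace"
) -> bool:
    """Write data under key, handling an existing entry per if_exists.

    Returns True if the store was written to, False if the write was skipped.
    """
    if if_exists not in ("raise", "replace", "skip"):
        raise ValueError(f"Unknown if_exists option: {if_exists!r}")
    if key in store:
        if if_exists == "raise":
            raise FileExistsError(key)
        if if_exists == "skip":
            return False
    # Either the key is new, or the policy is 'replace'.
    store[key] = data
    return True


store = {"a.zarr/0": b"old"}
replaced = write_with_policy(store, "a.zarr/0", b"new", if_exists="replace")
skipped = write_with_policy(store, "a.zarr/0", b"newer", if_exists="skip")
```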

upload_benchmark

upload_benchmark(benchmark: BenchmarkSpecification, access: AccessType = 'private', owner: Optional[Union[HubOwner, str]] = None)

Upload the benchmark to the Polaris Hub.

Owner

You have to manually specify the owner in the benchmark data model. Because the owner could be a user or an organization, we cannot automatically infer this from the logged-in user.

Required meta-data

The Polaris client and hub maintain different requirements as to which meta-data is required. The hub's requirements are stricter, so you might get errors about missing meta-data when uploading. Make sure to fill in as much of the meta-data as possible before uploading.

Non-existent datasets

The client will not upload the associated dataset to the hub if it does not yet exist. Make sure to specify an existing dataset or upload the dataset first.

Parameters:

Name Type Description Default
benchmark BenchmarkSpecification

The benchmark to upload.

required
access AccessType

Grant public or private access to the benchmark.

'private'
owner Optional[Union[HubOwner, str]]

Which Hub user or organization owns the artifact. Takes precedence over benchmark.owner.

None