API Reference

class xarray_mongodb.XarrayMongoDB(database: Database, collection: str = 'xarray', *, chunk_size_bytes: int = 261120, embed_threshold_bytes: int = 261120, ureg: pint.registry.UnitRegistry | None = None)

Synchronous driver for MongoDB to read/write xarray objects

Parameters
  • database (pymongo.database.Database) – MongoDB database to read from and write to

  • collection (str) – prefix of the collections to store the xarray data. Two collections will actually be created, <collection>.meta and <collection>.chunks.

  • chunk_size_bytes (int) – Size of the payload in a document in the chunks collection. Not to be confused with dask chunks. dask chunks that are larger than chunk_size_bytes will be transparently split across multiple MongoDB documents.

  • embed_threshold_bytes (int) –

    Cumulative size of variable buffers that will be embedded into the metadata documents in <collection>.meta. Buffers that exceed the threshold (starting from the largest) will be stored into the chunks documents in <collection>.chunks.

    Note

    • Embedded variables ignore the load parameter of get()

    • dask variables are never embedded, regardless of size

    • set embed_threshold_bytes=0 to force all buffers to be saved to <collection>.chunks, with the only exception of size zero non-dask variables

    • size zero non-dask variables are always embedded

  • ureg (pint.registry.UnitRegistry) – pint registry to allow putting and getting arrays with units. If omitted, it defaults to the global registry defined with pint.set_application_registry(). If the global registry was never set, it defaults to a standard registry built with defaults_en.txt.
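
The interplay between embed_threshold_bytes and chunk_size_bytes described above can be modelled in a few lines of plain Python. This is an illustrative sketch only: plan_storage and n_chunk_documents are hypothetical helpers, not part of the xarray_mongodb API, and the real library operates on actual variable buffers rather than (nbytes, is_dask) pairs.

```python
import math

CHUNK_SIZE_BYTES = 261120       # default payload size of a chunks document
EMBED_THRESHOLD_BYTES = 261120  # default cumulative embedding budget

def plan_storage(buffers, *, embed_threshold=EMBED_THRESHOLD_BYTES):
    """Decide which buffers land in <collection>.meta (embedded) and which
    land in <collection>.chunks, following the rules documented above:
    dask variables are never embedded; size-zero non-dask variables always
    are; the rest are embedded while their cumulative size fits within the
    threshold, evicting the largest first.

    buffers maps variable name -> (nbytes, is_dask).
    """
    chunked = {name for name, (_, is_dask) in buffers.items() if is_dask}
    embedded = set(buffers) - chunked
    while sum(buffers[n][0] for n in embedded) > embed_threshold:
        largest = max(embedded, key=lambda n: buffers[n][0])
        embedded.discard(largest)
        chunked.add(largest)
    return embedded, chunked

def n_chunk_documents(nbytes, *, chunk_size=CHUNK_SIZE_BYTES):
    """A dask chunk larger than chunk_size_bytes is transparently split
    across this many MongoDB documents."""
    return max(1, math.ceil(nbytes / chunk_size))
```

For example, with the default threshold a 300,000-byte buffer plus a 1,000-byte buffer exceed the budget, so the larger of the two is evicted to the chunks collection, where it spans two documents.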

get(_id: ObjectId, load: bool | collections.abc.Collection[str] | None = None) xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset

Read an xarray object back from MongoDB

Parameters

  • _id (ObjectId) – MongoDB object ID, as returned by put()

  • load –

Determines which variables to load immediately and which to delay loading with dask. Must be one of:

  • None (default) – match whatever was stored with put(), including chunk sizes

  • True – immediately load all variables into memory. dask chunk information, if any, will be discarded.

  • False – only load indices into memory; delay the loading of everything else with dask

  • collection of str – variable names to load immediately into memory. Indices are always loaded regardless of this setting, and non-existing variable names are ignored. When retrieving a DataArray, you can target its data with the special hardcoded variable name __DataArray__.

Note

Embedded variables (see embed_threshold_bytes) are always loaded regardless of this flag.

Returns

xarray.DataArray or xarray.Dataset, depending on what was stored with put()

Raises

DocumentNotFoundError

_id not found in the MongoDB ‘meta’ collection, or one or more chunks are missing in the ‘chunks’ collection. This error typically happens when:

  • documents were deleted from the database

  • the Delayed returned by put() was never computed

  • one or more chunks of the dask variables failed to compute at any point during the graph resolution

If chunk loading is delayed with dask (see the load parameter), this exception may be raised at compute() time.

It is possible to invoke get() before put() is computed, as long as:

  • the load parameter is None or False, or is a collection that does not list any variables that were backed by dask during put()

  • the output of get() is computed after the output of put() is computed

Warning

The dask graph (if any) underlying the returned xarray object contains full access credentials to the MongoDB server. Handle it with caution before pickling it, storing it on disk, or sending it over the network, e.g. through dask distributed.
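
The load semantics listed above can be captured as a small pure-Python model. This is a hypothetical sketch, not library code: resolve_eager and its parameters (stored_eager standing for the variables that were not dask-backed at put() time) are invented here for illustration.

```python
def resolve_eager(load, *, variables, indices, embedded, stored_eager):
    """Return the set of variable names get() would load immediately,
    per the documented rules for the 'load' parameter."""
    if load is None:
        eager = set(stored_eager)       # match what put() stored
    elif load is True:
        eager = set(variables)          # everything, eagerly
    elif load is False:
        eager = set()                   # indices only (added below)
    else:
        eager = set(load) & set(variables)  # unknown names are ignored
    # Indices and embedded variables are always loaded eagerly.
    return eager | set(indices) | set(embedded)
```

For instance, with load=False the result contains only the indices plus any embedded variables, which are loaded regardless of the flag.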

put(x: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset) tuple[bson.objectid.ObjectId, dask.delayed.Delayed | None]

Write an xarray object to MongoDB. Variables that are backed by dask are not computed; instead their insertion in the database is delayed. All other variables are immediately inserted.

This method automatically creates an index on the ‘chunks’ collection if there isn’t one yet.

Parameters

x (xarray.DataArray or xarray.Dataset)

Returns

Tuple of:

  • MongoDB _id of the inserted object

  • dask delayed object, or None if there are no variables using dask. It must be explicitly computed in order to fully store the Dataset/DataArray in the database.

Warning

The dask delayed object contains full access credentials to the MongoDB server. Handle it with caution before pickling it, storing it on disk, or sending it over the network, e.g. through dask distributed.
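
The (_id, delayed) return contract lends itself to a small helper. The sketch below is illustrative: FakeDelayed is a throwaway stand-in for dask.delayed.Delayed, and ensure_stored is a hypothetical wrapper, not part of the library.

```python
class FakeDelayed:
    """Stand-in for dask.delayed.Delayed, for demonstration only."""
    def __init__(self):
        self.computed = False

    def compute(self):
        self.computed = True

def ensure_stored(put_result):
    """Take put()'s (_id, delayed) return value and make sure every
    variable actually reached the database before the _id is used."""
    _id, delayed = put_result
    if delayed is not None:   # None means there were no dask variables
        delayed.compute()     # flush the delayed chunk inserts
    return _id
```

Skipping the compute() step is exactly what leads to the DocumentNotFoundError scenario described under get(): the metadata document exists, but chunks are missing.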

class xarray_mongodb.XarrayMongoDBAsyncIO(database: AsyncIOMotorDatabase, collection: str = 'xarray', *, chunk_size_bytes: int = 261120, embed_threshold_bytes: int = 261120, ureg: pint.registry.UnitRegistry | None = None)

asyncio driver for MongoDB to read/write xarray objects

Parameters are the same as for xarray_mongodb.XarrayMongoDB.

async get(_id: ObjectId, load: bool | collections.abc.Collection[str] | None = None) xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset

Asynchronous variant of xarray_mongodb.XarrayMongoDB.get()

async put(x: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset) tuple[bson.objectid.ObjectId, dask.delayed.Delayed | None]

Asynchronous variant of xarray_mongodb.XarrayMongoDB.put()
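
The asyncio variant keeps the same call pattern; only put() and get() are awaited. The class below is a throwaway stand-in written to demonstrate that pattern with an in-memory dict; the real XarrayMongoDBAsyncIO wraps a motor AsyncIOMotorDatabase, and the Delayed it returns (if any) would still need to be computed.

```python
import asyncio

class FakeAsyncDriver:
    """Throwaway stand-in mimicking XarrayMongoDBAsyncIO's call pattern."""
    def __init__(self):
        self._store = {}

    async def put(self, x):
        _id = str(len(self._store))
        self._store[_id] = x
        return _id, None  # None: no dask-backed variables to compute

    async def get(self, _id, load=None):
        return self._store[_id]

async def main():
    xdb = FakeAsyncDriver()
    _id, delayed = await xdb.put({"data": [1, 2, 3]})
    if delayed is not None:
        delayed.compute()  # with real dask variables, this must still run
    return await xdb.get(_id)

result = asyncio.run(main())
```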

exception xarray_mongodb.DocumentNotFoundError

One or more documents not found in MongoDB