API Reference
- class xarray_mongodb.XarrayMongoDB(database: Database, collection: str = 'xarray', *, chunk_size_bytes: int = 261120, embed_threshold_bytes: int = 261120, ureg: pint.registry.UnitRegistry | None = None)
Synchronous driver for MongoDB to read/write xarray objects
- Parameters:
database – pymongo.database.Database
collection (str) – prefix of the collections to store the xarray data. Two collections will actually be created, <collection>.meta and <collection>.chunks.
chunk_size_bytes (int) – Size of the payload in a document in the chunks collection. Not to be confused with dask chunks. dask chunks that are larger than chunk_size_bytes will be transparently split across multiple MongoDB documents.
embed_threshold_bytes (int) – Cumulative size of variable buffers that will be embedded into the metadata documents in <collection>.meta. Buffers that exceed the threshold (starting from the largest) will be stored in the chunks documents in <collection>.chunks.
Note
- Embedded variables ignore the load parameter of get()
- dask variables are never embedded, regardless of size
- size zero non-dask variables are always embedded
- set embed_threshold_bytes=0 to force all buffers to be saved to <collection>.chunks, with the only exception of size zero non-dask variables
ureg (pint.registry.UnitRegistry) – pint registry to allow putting and getting arrays with units. If omitted, it defaults to the global registry defined with pint.set_application_registry(). If the global registry was never set, it defaults to a standard registry built with defaults_en.txt.
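For example, a minimal sketch of constructing the synchronous driver against a local MongoDB server; the connection URL, database name and keyword values are illustrative, not requirements:

    import pymongo
    import xarray_mongodb

    client = pymongo.MongoClient("mongodb://localhost:27017")
    xdb = xarray_mongodb.XarrayMongoDB(
        client["mydb"],              # pymongo.database.Database
        collection="xarray",         # data lands in xarray.meta and xarray.chunks
        embed_threshold_bytes=0,     # force all non-zero buffers into xarray.chunks
    )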
- get(_id: ObjectId, load: bool | collections.abc.Collection[str] | None = None) → xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset
Read an xarray object back from MongoDB
- Parameters:
load – Determines which variables to load immediately and which to delay loading with dask. Must be one of:
- None (default)
Match whatever was stored with put(), including chunk sizes
- True
Immediately load all variables into memory. dask chunk information, if any, will be discarded.
- False
Only load indices in memory; delay the loading of everything else with dask.
- collection of str
variable names that must be immediately loaded into memory. Regardless of this, indices are always loaded. Non-existing variables are ignored. When retrieving a DataArray, you can target the data with the special hardcoded variable name __DataArray__.
Note
Embedded variables (see embed_threshold_bytes) are always loaded regardless of this flag.
- Returns:
xarray.DataArray or xarray.Dataset, depending on what was stored with put()
- Raises:
DocumentNotFoundError – _id not found in the MongoDB ‘meta’ collection, or one or more chunks are missing in the ‘chunks’ collection. This error typically happens when:
- documents were deleted from the database
- the Delayed returned by put() was never computed
- one or more chunks of the dask variables failed to compute at any point during graph resolution
If chunk loading is delayed with dask (see the load parameter), this exception may be raised at compute() time.
It is possible to invoke get() before put() is computed, as long as:
- the load parameter is None or False, or does not list any variables that were backed by dask during put()
- the output of get() is computed after the output of put() is computed
Warning
The dask graph (if any) underlying the returned xarray object contains full access credentials to the MongoDB server. This commands caution when pickling it and storing it on disk, or when sending it over the network, e.g. through dask distributed.
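A sketch of the different load strategies, assuming a local MongoDB server; the database name, variable name and data are illustrative:

    import pymongo
    import xarray
    import xarray_mongodb

    # embed_threshold_bytes=0 so the tiny variable below is not embedded and the
    # load parameter actually matters (embedded variables are always loaded)
    xdb = xarray_mongodb.XarrayMongoDB(
        pymongo.MongoClient()["mydb"], embed_threshold_bytes=0
    )

    # Store a small in-memory dataset so there is something to read back
    ds = xarray.Dataset({"temperature": ("x", [10.0, 11.5, 12.3])})
    _id, _ = xdb.put(ds)  # no dask variables, so the delayed part is None

    ds_default = xdb.get(_id)                        # match what put() stored
    ds_eager = xdb.get(_id, load=True)               # everything in memory
    ds_lazy = xdb.get(_id, load=False)               # indices only; the rest is dask
    ds_partial = xdb.get(_id, load=["temperature"])  # only this variable eagerly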
- put(x: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset) → tuple[bson.objectid.ObjectId, dask.delayed.Delayed | None]
Write an xarray object to MongoDB. Variables that are backed by dask are not computed; instead their insertion in the database is delayed. All other variables are immediately inserted.
This method automatically creates an index on the ‘chunks’ collection if there isn’t one yet.
- Parameters:
x – xarray.DataArray or xarray.Dataset
- Returns:
Tuple of:
- MongoDB _id of the inserted object
- dask delayed object, or None if there are no variables using dask. It must be explicitly computed in order to fully store the Dataset/DataArray in the database.
Warning
The dask delayed object contains full access credentials to the MongoDB server. This commands caution when pickling it and storing it on disk, or when sending it over the network, e.g. through dask distributed.
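A sketch of writing a dask-backed Dataset, again with illustrative names, shape and chunk sizes; note that the object is fully persisted only after the returned delayed is computed:

    import dask.array as da
    import pymongo
    import xarray
    import xarray_mongodb

    xdb = xarray_mongodb.XarrayMongoDB(pymongo.MongoClient()["mydb"])

    # Dataset with a dask-backed variable
    ds = xarray.Dataset(
        {"temperature": (("x", "y"), da.random.random((1000, 1000), chunks=250))}
    )

    _id, delayed = xdb.put(ds)  # metadata and non-dask variables are written now
    if delayed is not None:
        delayed.compute()       # dask chunks are written only at this point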
- class xarray_mongodb.XarrayMongoDBAsyncIO(database: AsyncIOMotorDatabase, collection: str = 'xarray', *, chunk_size_bytes: int = 261120, embed_threshold_bytes: int = 261120, ureg: pint.registry.UnitRegistry | None = None)
asyncio driver for MongoDB to read/write xarray objects
- Parameters:
database – motor.motor_asyncio.AsyncIOMotorDatabase
collection (str) – See XarrayMongoDB
chunk_size_bytes (int) – See XarrayMongoDB
embed_threshold_bytes (int) – See XarrayMongoDB
ureg (pint.registry.UnitRegistry) – See XarrayMongoDB
- async get(_id: ObjectId, load: bool | collections.abc.Collection[str] | None = None) → xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset
Asynchronous variant of xarray_mongodb.XarrayMongoDB.get()
- async put(x: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset) → tuple[bson.objectid.ObjectId, dask.delayed.Delayed | None]
Asynchronous variant of xarray_mongodb.XarrayMongoDB.put()
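A sketch of the asyncio variant with motor; the put() and get() calls are awaited, while any returned delayed object is still computed with dask as in the synchronous case. Database and variable names are illustrative:

    import asyncio

    import motor.motor_asyncio
    import xarray
    import xarray_mongodb


    async def main():
        client = motor.motor_asyncio.AsyncIOMotorClient()
        xdb = xarray_mongodb.XarrayMongoDBAsyncIO(client["mydb"])

        ds = xarray.Dataset({"temperature": ("x", [10.0, 11.5, 12.3])})
        _id, delayed = await xdb.put(ds)
        if delayed is not None:
            delayed.compute()  # computed with dask, as in the synchronous case

        print(await xdb.get(_id, load=True))


    asyncio.run(main())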
- exception xarray_mongodb.DocumentNotFoundError
One or more documents not found in MongoDB
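For instance, a sketch of handling a lookup of an _id that was never stored, assuming a local MongoDB server:

    import bson
    import pymongo
    import xarray_mongodb

    xdb = xarray_mongodb.XarrayMongoDB(pymongo.MongoClient()["mydb"])

    try:
        xdb.get(bson.ObjectId())  # a fresh _id with no matching 'meta' document
    except xarray_mongodb.DocumentNotFoundError as e:
        print("not found:", e)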