xarray_mongodb
xarray_mongodb allows storing xarray objects on MongoDB. Its design is heavily influenced by GridFS.
Current features
Synchronous operations with PyMongo
asyncio support with Motor
Units annotations with Pint
Delayed put/get of xarray objects backed by dask. Only metadata and numpy-backed variables (e.g. indices) are written and read back at the time of graph definition.
Support for dask distributed. Note that the full init parameters of the MongoDB client are sent over the network; this includes access credentials. Make sure that network communication between the dask client and the scheduler, and between the scheduler and the workers, is secure.
Data is stored on the database in a format that is agnostic to Python; this allows writing clients in different languages.
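To make the GridFS influence and the language-agnostic storage idea more concrete, here is a minimal sketch of GridFS-style chunking in plain Python. It is not the actual xarray_mongodb schema: the helper names split_into_chunks and join_chunks are hypothetical, and the field names files_id, n and data are borrowed from GridFS chunk documents purely for illustration. The point is that each document holds only an identifier, an integer and raw bytes, which any MongoDB driver in any language can read.

```python
def split_into_chunks(files_id, payload, chunk_size=255 * 1024):
    """Split a bytes payload into GridFS-style chunk documents.

    Hypothetical helper for illustration only. Each document carries the
    id of its parent record, a sequence number ``n`` and a slice of the
    raw bytes that is at most ``chunk_size`` long.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        {"files_id": files_id, "n": n, "data": payload[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(payload), chunk_size))
    ]


def join_chunks(chunks):
    """Rebuild the payload; chunks may arrive in any order."""
    ordered = sorted(chunks, key=lambda doc: doc["n"])
    return b"".join(doc["data"] for doc in ordered)


raw = bytes(range(10))
docs = split_into_chunks("demo", raw, chunk_size=4)
restored = join_chunks(reversed(docs))
```

Here the 10-byte buffer is split into three documents of 4, 4 and 2 bytes, and the sequence number is enough to reassemble them regardless of the order in which they are fetched.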
Future features
Sparse arrays with PyData Sparse
Limitations
The Motor Tornado driver is not supported due to lack of developer interest; submissions are welcome.
At the time of writing, Dask and Pint are not supported at the same time due to limitations in the Pint and xarray packages.
attrs are limited to the data types natively accepted by PyMongo.
Non-string xarray dimensions and variable names are not supported.
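The last two limitations can be checked before calling put. The sketch below is a hedged, standard-library-only illustration: find_unsupported and check_names are hypothetical helpers, not part of xarray_mongodb, and the accepted types are an assumed conservative subset of what PyMongo encodes natively (PyMongo accepts more, for example ObjectId). In practice one would pass in the attrs mapping and the dimension or variable names of the xarray object.

```python
import datetime

# Assumption: a conservative subset of the types PyMongo encodes to BSON
# without custom codecs. The real list is longer.
_SCALARS = (type(None), bool, int, float, str, bytes, datetime.datetime)


def find_unsupported(obj, path="attrs"):
    """Return descriptions of values that fall outside the subset above."""
    if isinstance(obj, dict):
        problems = []
        for key, value in obj.items():
            if not isinstance(key, str):
                problems.append(f"{path}: non-string key {key!r}")
            else:
                problems.extend(find_unsupported(value, f"{path}.{key}"))
        return problems
    if isinstance(obj, (list, tuple)):
        problems = []
        for i, value in enumerate(obj):
            problems.extend(find_unsupported(value, f"{path}[{i}]"))
        return problems
    if isinstance(obj, _SCALARS):
        # BSON integers are limited to 64 bits.
        if (
            isinstance(obj, int)
            and not isinstance(obj, bool)
            and not -(2**63) <= obj < 2**63
        ):
            return [f"{path}: integer out of 64-bit range"]
        return []
    return [f"{path}: unsupported type {type(obj).__name__}"]


def check_names(names):
    """Return the dimension or variable names that are not strings."""
    return [name for name in names if not isinstance(name, str)]
```

An empty list from both helpers means nothing was flagged by this subset check; it does not guarantee that the database will accept the object.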
Quick start
>>> import pymongo
>>> import xarray
>>> import xarray_mongodb
>>> db = pymongo.MongoClient()['mydb']
>>> xdb = xarray_mongodb.XarrayMongoDB(db)
>>> a = xarray.DataArray([1, 2], dims=['x'], coords={'x': ['x1', 'x2']})
>>> _id, _ = xdb.put(a)
>>> xdb.get(_id)
<xarray.DataArray (x: 2)>
array([1, 2])
Coordinates:
* x (x) <U2 'x1' 'x2'
Dask support:
>>> _id, future = xdb.put(a.chunk(1)) # store metadata and numpy variables
>>> future.compute() # store dask variables
>>> b = xdb.get(_id) # retrieve metadata and numpy variables
>>> b
<xarray.DataArray (x: 2)>
dask.array<shape=(2,), dtype=int64, chunksize=(1,)>
Coordinates:
* x (x) <U2 'x1' 'x2'
>>> b.compute() # retrieve dask variables
<xarray.DataArray (x: 2)>
array([1, 2])
Coordinates:
* x (x) <U2 'x1' 'x2'
License
xarray_mongodb is available under the open source Apache License.
The database storage specifications are patent-free and in the public domain. Anybody can write an alternative implementation; compatibility with the Python module is not enforced by law, but strongly encouraged.