xarray_mongodb allows storing xarray objects on MongoDB. Its design is heavily influenced by GridFS.

Current Features

  • Synchronous operations with PyMongo

  • asyncio support with Motor

  • Units annotations with Pint

  • Delayed put/get of xarray objects backed by dask. Only metadata and numpy-backed variables (e.g. indices) are written and read back at the time of graph definition.

  • Support for dask distributed. Note that the full init parameters of the MongoDB client are sent over the network; this includes access credentials. Make sure that network communications between the dask client and the scheduler, and between the scheduler and the workers, are secure.

  • Data is stored on the database in a format that is agnostic to Python; this allows writing clients in different languages.

Limitations


  • The Motor Tornado driver is not supported due to lack of developer interest - submissions are welcome.

  • At the time of writing, Dask and Pint are not supported at the same time due to limitations in the Pint and xarray packages.

  • attrs are limited to the data types natively accepted by PyMongo

  • Non-string xarray dimensions and variable names are not supported
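The last two items above can be checked before calling put. The following is a minimal sketch, not part of xarray_mongodb: the helper name find_unsupported and the NATIVE_TYPES tuple are hypothetical, and the tuple is a deliberately conservative subset of what PyMongo encodes (bson-specific types such as ObjectId are left out to avoid third-party imports).

```python
import datetime
from collections.abc import Mapping

# Assumption: a conservative subset of the scalar types PyMongo encodes
# natively. Containers (dict, list, tuple) are handled separately below.
NATIVE_TYPES = (type(None), bool, int, float, str, bytes, datetime.datetime)


def find_unsupported(names, attrs):
    """Return a list of problems; an empty list means none were found.

    names: dimension and/or variable names of the xarray object
    attrs: the object's attrs mapping
    """
    problems = []
    for name in names:
        if not isinstance(name, str):
            problems.append(f"non-string name: {name!r}")

    def walk(value, path):
        if isinstance(value, Mapping):
            for key, item in value.items():
                # BSON document keys must be strings
                if not isinstance(key, str):
                    problems.append(f"non-string key at {path}: {key!r}")
                walk(item, f"{path}.{key}")
        elif isinstance(value, (list, tuple)):
            for i, item in enumerate(value):
                walk(item, f"{path}[{i}]")
        elif not isinstance(value, NATIVE_TYPES):
            problems.append(
                f"unsupported attr type at {path}: {type(value).__name__}"
            )
        elif isinstance(value, int) and not -2**63 <= value < 2**63:
            # BSON integers are at most 64 bits wide
            problems.append(f"integer too large for BSON at {path}")

    walk(attrs, "attrs")
    return problems
```

For a DataArray a, this could be called as find_unsupported(a.dims, a.attrs); for a Dataset, the variable names can be passed along with the dimensions. Values such as numpy scalars would be reported and may need converting to plain Python types first.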

Quick start

>>> import pymongo
>>> import xarray
>>> import xarray_mongodb

>>> db = pymongo.MongoClient()['mydb']
>>> xdb = xarray_mongodb.XarrayMongoDB(db)
>>> a = xarray.DataArray([1, 2], dims=['x'], coords={'x': ['x1', 'x2']})
>>> _id, _ = xdb.put(a)
>>> xdb.get(_id)
<xarray.DataArray (x: 2)>
array([1, 2])
  * x        (x) <U2 'x1' 'x2'

Dask support:

>>> _id, future = xdb.put(a.chunk(1))  # store metadata and numpy variables
>>> future.compute()  # store dask variables
>>> b = xdb.get(_id)  # retrieve metadata and numpy variables
>>> b
<xarray.DataArray (x: 2)>
dask.array<shape=(2,), dtype=int64, chunksize=(1,)>
  * x        (x) <U2 'x1' 'x2'

>>> b.compute()  # retrieve dask variables
<xarray.DataArray (x: 2)>
array([1, 2])
  * x        (x) <U2 'x1' 'x2'



xarray_mongodb is developed by Amphora and is available under the open source Apache License.

The database storage specifications are patent-free and in the public domain. Anybody can write an alternative implementation; compatibility with the Python module is not enforced by law, but is strongly encouraged.