Developer notes: Pint and Sparse

Note

This page is for people contributing patches to the xarray_mongodb library itself.

If you only want to use Pint or Sparse, make sure you satisfy the dependencies (see Installation) and feed the data through. Also read the documentation of the ureg parameter when initialising XarrayMongoDB.

For details on how pint and sparse objects are stored in the database, see Database Reference.

What NEP18 is, and how it impacts xarray_mongodb

Several “numpy-like” libraries implement the duck-type interface specified in NEP18 (the __array_function__ protocol), so that both numpy and other NEP18-compatible libraries can transparently wrap around them.
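
As a rough illustration of the protocol that NEP18 defines, the sketch below implements __array_function__ on a toy wrapper class. LoggedArray is a hypothetical name invented for this example; it is not part of numpy, xarray_mongodb, or any of the libraries discussed on this page, and it only unwraps top-level positional arguments.

```python
import numpy as np


class LoggedArray:
    """Hypothetical duck array: wraps a numpy.ndarray and logs numpy calls."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []  # names of the numpy functions dispatched to us

    def __array_function__(self, func, types, args, kwargs):
        # Decline to handle argument types we know nothing about
        if not all(issubclass(t, (LoggedArray, np.ndarray)) for t in types):
            return NotImplemented
        self.calls.append(func.__name__)
        # Unwrap top-level LoggedArray arguments, then defer to numpy itself
        unwrapped = tuple(
            a.data if isinstance(a, LoggedArray) else a for a in args
        )
        result = func(*unwrapped, **kwargs)
        # Re-wrap array results so that the duck type is preserved
        return LoggedArray(result) if isinstance(result, np.ndarray) else result


x = LoggedArray([0.0, 0.0, 1.1])
total = np.sum(x)      # numpy dispatches to LoggedArray.__array_function__
flipped = np.flip(x)   # the array result is wrapped back into a LoggedArray
```

Libraries such as pint, dask and sparse rely on the same hook, in a far more complete form, which is what allows them to be stacked on top of each other.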

xarray_mongodb does not, itself, use NEP18. However, it explicitly supports several data types that NEP18 makes possible, namely those provided by Pint and Sparse.

Worst case

The most complicated use case that xarray_mongodb has to deal with is

  1. an xarray.Variable, which wraps around

  2. a pint.Quantity, which wraps around

  3. a dask.array.Array, which wraps around

  4. a sparse.COO, which is built on top of

  5. two numpy.ndarray objects (data and coords).

The order is always the one described above. Simpler use cases may remove any of the intermediate layers, but there is always an xarray.Variable at the top, and at the bottom the data is always stored in numpy.ndarray objects.
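
The fixed ordering means that the stack can be walked top-down with one optional step per layer. The sketch below illustrates the idea under loud assumptions: peel_layers is a hypothetical helper, not a function of xarray_mongodb; the attribute checks are a simplification chosen so that the example runs without xarray, pint, dask or sparse installed; and plain lists stand in for numpy.ndarray.

```python
from types import SimpleNamespace


def peel_layers(variable):
    """Walk down the stack, skipping any layer that is absent.

    Hypothetical helper: the duck-typed attribute checks are assumptions
    made for this sketch, not the checks xarray_mongodb performs.
    """
    info = {"units": None, "lazy": False, "sparse": False}
    obj = variable.data              # 1. the variable is always on top
    if hasattr(obj, "magnitude") and hasattr(obj, "units"):
        info["units"] = str(obj.units)
        obj = obj.magnitude          # 2. pint.Quantity-like layer
    if hasattr(obj, "compute"):
        info["lazy"] = True
        obj = obj.compute()          # 3. dask-like layer
    if hasattr(obj, "coords") and hasattr(obj, "fill_value"):
        info["sparse"] = True        # 4. sparse.COO-like layer
        info["buffers"] = [obj.data, obj.coords]  # 5. two arrays
    else:
        info["buffers"] = [obj]      # 5. a single dense array
    return info


# Stand-ins mimicking the worst case; lists play the role of numpy.ndarray
coo = SimpleNamespace(data=[1.1], coords=[[2]], fill_value=0.0)
lazy = SimpleNamespace(compute=lambda: coo)
quantity = SimpleNamespace(magnitude=lazy, units="kilogram")
worst = peel_layers(SimpleNamespace(data=quantity))

# Simplest case: the variable wraps a dense array directly
simple = peel_layers(SimpleNamespace(data=[0.0, 0.0, 1.1]))
```

Because each check is independent, removing any intermediate layer simply skips the corresponding step; the order of the checks mirrors the order of the layers.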

Note

At the time of writing, the example below doesn’t work; see pint#878.

>>> import dask.array as da
>>> import numpy as np
>>> import pint
>>> import sparse
>>> import xarray
>>> ureg = pint.UnitRegistry()
>>> a = xarray.DataArray(
...     ureg.Quantity(
...         da.from_array(
...             sparse.COO.from_numpy(
...                 np.array([0, 0, 1.1])
...             )
...         ), "kg"
...     )
... )
>>> a
<xarray.DataArray (dim_0: 3)>
dask.array<array, shape=(3,), dtype=float64, chunksize=(3,), chunktype=pint.Quantity>
Dimensions without coordinates: dim_0
>>> a.data
<Quantity(<dask.array<array, shape=(3,), dtype=float64, chunksize=(3,),
           chunktype=COO>>, 'kilogram')>
>>> a.data.magnitude
dask.array<array, shape=(3,), dtype=float64, chunksize=(3,), chunktype=COO>
>>> a.data.units
<Unit('kilogram')>
>>> a.data.magnitude._meta
<COO: shape=(0,), dtype=float64, nnz=0, fill_value=0.0>
>>> a.data.magnitude.compute()
<COO: shape=(3,), dtype=float64, nnz=1, fill_value=0.0>
>>> a.data.magnitude.compute().data
array([1.1])
>>> a.data.magnitude.compute().coords
array([[2]])
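
The last two outputs are the two numpy arrays at the bottom of the stack. As a minimal sketch of how they relate to the dense data, the snippet below rebuilds the original array from them using numpy alone. The values are copied from the output above; this illustrates the storage idea and is not the code that sparse itself uses.

```python
import numpy as np

# Values copied from the doctest output above
data = np.array([1.1])     # the values that differ from fill_value
coords = np.array([[2]])   # shape (ndim, nnz): one row of indices per dimension
shape = (3,)
fill_value = 0.0

# Start from an array of fill values, then scatter each stored value
# to the position given by its coordinates
dense = np.full(shape, fill_value)
dense[tuple(coords)] = data
```

Here dense holds the same three values that were passed to sparse.COO.from_numpy at the start of the example.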