Developer notes: Pint and Sparse

Note

This page is for people contributing patches to the xarray_mongodb library itself.

If you only want to use Pint or Sparse, make sure you satisfy the dependencies (see Installation) and feed the data through! Also read the documentation of the ureg parameter when initialising XarrayMongoDB.

To learn how pint and sparse objects are stored in the database, see Database Reference.

What is NEP18, and how it impacts xarray_mongodb

Several “numpy-like” libraries support a duck-type interface, specified in NEP18, so that both numpy and other NEP18-compatible libraries can transparently wrap around them.
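As a rough illustration of the NEP18 protocol (this is not code from xarray_mongodb), the sketch below defines a toy duck array. LoggedArray is a hypothetical name invented for this example. It assumes numpy >= 1.17, where the __array_function__ hook is enabled by default.

```python
import numpy as np


class LoggedArray:
    """Toy duck array (hypothetical): wraps a numpy.ndarray and
    intercepts NumPy functions through the NEP18 hook."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []  # names of the NumPy functions dispatched to us

    def __array_function__(self, func, types, args, kwargs):
        # NumPy invokes this hook instead of its own implementation
        # whenever a LoggedArray is passed to a NumPy API function.
        self.calls.append(func.__name__)
        # Unwrap top-level LoggedArray arguments and defer to NumPy.
        unwrapped = tuple(
            a.data if isinstance(a, LoggedArray) else a for a in args
        )
        return func(*unwrapped, **kwargs)


x = LoggedArray([0.0, 0.0, 1.1])
total = np.sum(x)  # dispatched to LoggedArray.__array_function__
```

Real NEP18 libraries such as pint and sparse do much more in this hook (unit handling, sparse-aware algorithms), but the dispatch mechanism is the same.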

xarray_mongodb does not, itself, use NEP18. However, it does explicitly support several data types that are possible thanks to NEP18, namely the nested combinations of pint, dask and sparse objects described below.

Worst case

The most complicated use case that xarray_mongodb has to deal with is

  1. a xarray.Variable, which wraps around

  2. a pint.Quantity, which wraps around

  3. a dask.array.Array, which wraps around

  4. a sparse.COO, which is built on top of

  5. two numpy.ndarray objects (data and coords).

The order is always the one described above. Simpler use cases may remove any of the intermediate layers; at the top there is always a xarray.Variable, and at the bottom the data is always stored in numpy.ndarray objects.

Note

At the time of writing, the example below doesn’t work; see pint#878.

>>> import dask.array as da
>>> import numpy as np
>>> import pint
>>> import sparse
>>> import xarray
>>> ureg = pint.UnitRegistry()
>>> a = xarray.DataArray(
...     ureg.Quantity(
...         da.from_array(
...             sparse.COO.from_numpy(
...                 np.array([0, 0, 1.1])
...             )
...         ), "kg"
...     )
... )
>>> a
<xarray.DataArray (dim_0: 3)>
dask.array<array, shape=(3,), dtype=float64, chunksize=(3,), chunktype=pint.Quantity>
Dimensions without coordinates: dim_0
>>> a.data
<Quantity(<dask.array<array, shape=(3,), dtype=float64, chunksize=(3,),
           chunktype=COO>>, 'kilogram')>
>>> a.data.magnitude
dask.array<array, shape=(3,), dtype=float64, chunksize=(3,), chunktype=COO>
>>> a.data.units
<Unit('kilogram')>
>>> a.data.magnitude._meta
<COO: shape=(0,), dtype=float64, nnz=0, fill_value=0.0>
>>> a.data.magnitude.compute()
<COO: shape=(3,), dtype=float64, nnz=1, fill_value=0.0>
>>> a.data.magnitude.compute().data
array([1.1])
>>> a.data.magnitude.compute().coords
array([[2]])

Legacy support

xarray_mongodb has to cope with a few caveats when running against legacy versions of its dependencies:

  • It requires numpy >= 1.15; however, NEP18 was first introduced in v1.16 (as an opt-in experimental feature) and consolidated in v1.17, where it is enabled by default.

  • It requires dask >= 1.2; however, the da.Array._meta property, which exposes wrapped non-numpy objects, was not added until v2.0.

Hence, there is a set of minimum required versions when pint and sparse are not involved, and a different set of much more recent ones when they are.
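The two tiers of requirements can be sketched as a simple version gate. This is a minimal illustration, not the library's actual check: the helper names are hypothetical, and the thresholds are assumptions taken from the two bullet points above (numpy 1.17 for consolidated NEP18, dask 2.0 for da.Array._meta). The authoritative minimums, including those for pint and sparse themselves, are the ones listed under Minimum dependency versions.

```python
import re

# Assumed thresholds, taken from the bullet points above; not authoritative.
NEP18_MIN_NUMPY = (1, 17)  # NEP18 enabled by default
META_MIN_DASK = (2, 0)     # da.Array._meta available


def major_minor(version):
    """Return (major, minor) from a version string such as '1.17.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_duck_arrays(numpy_version, dask_version):
    """True if both versions are recent enough for pint/sparse wrapping."""
    return (
        major_minor(numpy_version) >= NEP18_MIN_NUMPY
        and major_minor(dask_version) >= META_MIN_DASK
    )
```

In practice one would pass numpy.__version__ and dask.__version__; version strings are used here only so that the sketch has no third-party imports.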

See also: Minimum dependency versions.