Developer notes: Pint and Sparse
Note
This page is for people contributing patches to the xarray_mongodb library itself.
If you just want to use Pint or Sparse, make sure you satisfy the dependencies (see Installation) and feed the data through! Also read the documentation of the ureg parameter when initialising XarrayMongoDB.
For how pint and sparse objects are stored on the database, see Database Reference.
What is NEP18, and how it impacts xarray_mongodb
Several “numpy-like” libraries implement the duck-type interface specified in NEP18 (the __array_function__ protocol). As a result, numpy functions work on them transparently, and other NEP18-compatible libraries can wrap around them.
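To make the dispatch mechanism concrete, here is a minimal sketch of a NEP18 duck array. The class TracedArray, the helper _unwrap and the CALLS log are hypothetical and exist only for illustration. The only real API involved is numpy's __array_function__ hook, which is enabled by default since numpy 1.17.

```python
import numpy as np

# Hypothetical log of the numpy functions that were dispatched to TracedArray.
CALLS = []


def _unwrap(obj):
    """Recursively replace TracedArray instances with the ndarray they wrap."""
    if isinstance(obj, TracedArray):
        return obj.data
    if isinstance(obj, (list, tuple)):
        return type(obj)(_unwrap(item) for item in obj)
    return obj


class TracedArray:
    """Hypothetical minimal duck array that wraps a numpy.ndarray via NEP18."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Refuse calls that involve duck-array types we know nothing about.
        if not all(issubclass(t, (TracedArray, np.ndarray)) for t in types):
            return NotImplemented
        CALLS.append(func.__name__)
        result = func(*_unwrap(args), **{k: _unwrap(v) for k, v in kwargs.items()})
        # Re-wrap array results so that the duck type survives the call.
        return TracedArray(result) if isinstance(result, np.ndarray) else result


x = TracedArray([1.0, 2.0, 3.0])
total = np.sum(x)                 # dispatched to TracedArray.__array_function__
doubled = np.concatenate([x, x])  # wrappers nested inside a list are found too
```

Libraries such as pint, dask and sparse follow the same pattern, which is what lets them stack on top of each other.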
xarray_mongodb does not, itself, use NEP18. However, it does explicitly support several data types that are possible thanks to NEP18. Namely,
- An xarray.Variable can directly wrap:
  - a numpy.ndarray, or
  - a pint.Quantity, or
  - a sparse.COO, or
  - a dask.array.Array.
  The wrapped object is accessible through the .data property.
  Note
  xarray.IndexVariable wraps a pandas.Index, but the .data property converts it on the fly to a numpy.ndarray.
- A pint.Quantity can directly wrap:
  - a numpy.ndarray, or
  - a sparse.COO, or
  - a dask.array.Array.
  The wrapped object is accessible through the .magnitude property.
  Note
  Vanilla pint can also wrap int, float, or decimal.Decimal, but these are automatically transformed to numpy.ndarray as soon as xarray wraps around the Quantity.
- A dask.array.Array can directly wrap:
  - a numpy.ndarray, or
  - a sparse.COO.
  The wrapped object cannot be accessed until the dask graph is computed; however, its metadata is visible without computing through the ._meta property.
  Note
  dask wrapping pint, while theoretically possible due to how NEP18 works, is not supported.
- A sparse.COO is always backed by two numpy.ndarray objects, .data and .coords.
Worst case
The most complicated use case that xarray_mongodb has to deal with is an xarray.Variable, which wraps around a pint.Quantity, which wraps around a dask.array.Array, which wraps around a sparse.COO, which is built on top of two numpy.ndarray objects.
The order is always the one described above. Simpler use cases may remove any of the intermediate layers; at the top there is always an xarray.Variable, and at the bottom the data is always stored in numpy.ndarray objects.
Note
At the moment of writing, the example below doesn’t work; see pint#878.
>>> import dask.array as da
>>> import numpy as np
>>> import pint
>>> import sparse
>>> import xarray
>>> ureg = pint.UnitRegistry()
>>> a = xarray.DataArray(
... ureg.Quantity(
... da.from_array(
... sparse.COO.from_numpy(
... np.array([0, 0, 1.1])
... )
... ), "kg"
... )
... )
>>> a
<xarray.DataArray (dim_0: 3)>
dask.array<array, shape=(3,), dtype=float64, chunksize=(3,), chunktype=pint.Quantity>
Dimensions without coordinates: dim_0
>>> a.data
<Quantity(<dask.array<array, shape=(3,), dtype=float64, chunksize=(3,),
chunktype=COO>>, 'kilogram')>
>>> a.data.magnitude
<dask.array<array, shape=(3,), dtype=float64, chunksize=(3,), chunktype=COO>
>>> a.data.units
<Unit('kilogram')>
>>> a.data.magnitude._meta
<COO: shape=(0,), dtype=float64, nnz=0, fill_value=0.0>
>>> a.data.magnitude.compute()
<COO: shape=(3,), dtype=float64, nnz=1, fill_value=0.0>
>>> a.data.magnitude.compute().data
array([1.1])
>>> a.data.magnitude.compute().coords
array([[2]])