Developer notes: Pint and Sparse
================================
.. note::
This page is for people contributing patches to the xarray_mongodb library itself.
If you just want to use `Pint `_ or
`Sparse `_, just make sure you satisfy the dependencies
(see :doc:`installing`) and feed the data through! Also read the documentation of the
``ureg`` parameter when initialising :class:`~xarray_mongodb.XarrayMongoDB`.
For how pint and sparse objects are stored on the database, see :doc:`db_reference`.
What is NEP18, and how it impacts xarray_mongodb
------------------------------------------------
Several "numpy-like" libraries support a duck-type interface, specified in
`NEP18 `_, so that
both numpy and other NEP18-compatible libraries can transparently wrap around them.
xarray_mongodb does not, itself, use NEP18. However, it does explicitly support several
data types that are possible thanks to NEP18. Namely,
- A :class:`xarray.Variable` can directly wrap:
- a :class:`numpy.ndarray`, or
- a :class:`pint.Quantity`, or
- a :class:`sparse.COO`, or
- a :class:`dask.array.Array`.
The wrapped object is accessible through the ``.data`` property.
.. note::
:class:`xarray.IndexVariable` wraps a :class:`pandas.Index`, but the ``.data``
property converts it on the fly to a :class:`numpy.ndarray`.
- A :class:`pint.Quantity` can directly wrap:
- a :class:`numpy.ndarray`, or
- a :class:`sparse.COO`, or
- a :class:`dask.array.Array`.
.. note::
Vanilla pint can also wrap int, float, :class:`decimal.Decimal`, but they are
automatically transformed to :class:`numpy.ndarray` as soon as xarray wraps around
the Quantity.
The wrapped object is accessible through the ``.magnitude`` property.
- A :class:`dask.array.Array` can directly wrap:
- a :class:`numpy.ndarray`, or
- a :class:`sparse.COO`.
The wrapped object cannot be accessed until the dask graph is computed; however the
object meta-data is visible without computing through the ``._meta`` property.
.. note::
dask wrapping pint, while theoretically possible due to how NEP18 works, is not
supported.
- A :class:`sparse.COO` is always backed by two :class:`numpy.ndarray` objects,
``.data`` and ``.coords``.
Worst case
----------
The most complicated use case that xarray_mongodb has to deal with is
1. a :class:`xarray.Variable`, which wraps around
2. a :class:`pint.Quantity`, which wraps around
3. a :class:`dask.array.Array`, which wraps around
4. a :class:`sparse.COO`, which is built on top of
5. two :class:`numpy.ndarray`.
The order is always the one described above. Simpler use cases may remove any of the
intermediate layers; at the top there's always has a :class:`xarray.Variable` and at the
bottom the data is always stored by :class:`numpy.ndarray`.
.. note::
At the moment of writing, the example below doesn't work; see
`pint#878 `_.
.. code::
>>> import dask.array as da
>>> import numpy as np
>>> import pint
>>> import sparse
>>> import xarray
>>> ureg = pint.UnitRegistry()
>>> a = xarray.DataArray(
... ureg.Quantity(
... da.from_array(
... sparse.COO.from_numpy(
... np.array([0, 0, 1.1])
... )
... ), "kg"
... )
... )
>>> a
dask.array
Dimensions without coordinates: dim_0
>>> a.data
>, 'kilogram')>
>>> a.data.magnitude
>>> a.data.units
>>> a.data.magnitude._meta
>>> a.data.magnitude.compute()
>>> a.data.magnitude.compute().data
array([1.1])
>>> a.data.magnitude.compute().coords
array([[2]])