Data and metadata
This chapter explains data and metadata in FINAM.
Data arrays
Internally, all data is passed as numpy.ndarray (or numpy.ma.MaskedArray, see Masked arrays), wrapped in pint.Quantity.
In addition, a time axis with a single entry is added at index 0.
Data can be pushed to outputs as any type that can be wrapped in numpy.ndarray.
This includes lists and scalar values.
Wrapping, as well as adding the time axis and units, is performed internally, based on the available metadata (see section Metadata).
Inputs always receive data in the wrapped numpy.ndarray form, with units and time axis.
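The wrapping step can be pictured with plain numpy. The sketch below is not FINAM's implementation (which also attaches pint units and handles masked data); the helper name add_time_axis is hypothetical and only shows how a scalar, a list or an array gains a leading time axis with a single entry.

```python
import numpy as np


def add_time_axis(data):
    """Hypothetical helper: wrap data in an ndarray and prepend a time axis of length 1.

    Simplification: units are not attached and masks are not preserved here.
    """
    arr = np.asarray(data)         # lists and scalars become ndarrays
    return np.expand_dims(arr, 0)  # new axis at index 0 holds the single time entry


scalar = add_time_axis(3.5)
field = add_time_axis([[1, 2, 3], [4, 5, 6]])
print(scalar.shape)  # (1,)
print(field.shape)   # (1, 2, 3)
```

A scalar ends up as a one-element array, so every piece of exchanged data has the same leading time dimension regardless of its spatial shape.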
Several tool functions are provided in finam.data to convert to and from the internal data structure:
- full(value, info): Creates a new data array with units according to the given info, filled with the given value.
- full_like(data, value): Creates a new data array with the same shape, type and units as a given object, filled with the given value.
- prepare(data, info, time): Wraps data, checks or adds the time axis and units based on info (see The Info object). Performs metadata checks.
- strip_time(xdata, grid): Squeezes away the time axis if there is a single entry only, and raises an error otherwise.
- get_magnitude(xdata): Extracts data without units. Returns a numpy.ndarray without units, but with the time axis preserved.
- get_units(xdata): Gets the pint.Unit units of the data.
- get_dimensionality(xdata): Gets the pint dimensionality of the data (like length, mass, …).
- has_time_axis(xdata, grid): Checks if the data has a time axis.
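The documented behaviour of strip_time and has_time_axis can be mimicked with numpy to make it concrete. Both helpers below are hypothetical; in particular, the time-axis check is an assumption that simply compares the number of array dimensions with the number of grid dimensions, and may differ from FINAM's actual logic.

```python
import numpy as np


def strip_time_sketch(xdata):
    """Hypothetical re-implementation of the documented strip_time behaviour."""
    if xdata.shape[0] != 1:
        raise ValueError(f"cannot strip a time axis with {xdata.shape[0]} entries")
    return np.squeeze(xdata, axis=0)  # drop the leading time axis, keep spatial axes


def has_time_axis_sketch(xdata, grid_ndim):
    """Assumed check: data has exactly one extra leading dimension compared to the grid."""
    return xdata.ndim == grid_ndim + 1


data = np.zeros((1, 4, 3))           # one time entry on a 4 x 3 grid
print(strip_time_sketch(data).shape)  # (4, 3)
print(has_time_axis_sketch(data, 2))  # True
```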
Masked arrays
FINAM uses numpy.ma.MaskedArray inside pint.Quantity to represent masked data.
Masked data does not require any special treatment and can be used like usual numpy arrays.
By default, FINAM allows data to have flexible masks, which means the mask can change over time.
In the Info object (see below), the mask of the data can be specified:
- Mask.FLEX: data can be masked or unmasked, and the mask may change over time (default).
- Mask.NONE: data is unmasked and exchanged as plain numpy.ndarray.
- numpy.ndarray or bool: data is masked with a given mask that is constant over time.
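The three kinds of mask specification can be illustrated with a small consistency check. This is a sketch under stated assumptions: MaskSpec is a hypothetical stand-in for finam.Mask, conforms is a hypothetical helper rather than a FINAM function, and treating a fixed mask as "must be a MaskedArray with exactly that mask" is a simplification.

```python
from enum import Enum

import numpy as np


class MaskSpec(Enum):
    """Hypothetical stand-in for finam.Mask."""
    FLEX = "flex"
    NONE = "none"


def conforms(data, spec):
    """Return True if data is consistent with the mask specification spec."""
    if spec is MaskSpec.FLEX:
        return True  # masked or unmasked; the mask may vary between time steps
    if spec is MaskSpec.NONE:
        return not isinstance(data, np.ma.MaskedArray)  # plain ndarray expected
    # Otherwise spec is a fixed boolean mask (array or scalar bool)
    if not isinstance(data, np.ma.MaskedArray):
        return False
    expected = np.broadcast_to(np.asarray(spec, dtype=bool), data.shape)
    return bool(np.array_equal(np.ma.getmaskarray(data), expected))


fixed = np.array([False, True, False])
masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=fixed)
print(conforms(masked, fixed))           # True
print(conforms(masked, MaskSpec.NONE))   # False
```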
Convenience functions for masked arrays are:
- is_masked_array to check if the given data is a masked array
- has_masked_values to check if the given data is a masked array and has some values masked
- filled to create a copy of the data with masked entries filled with a given value, if it is a masked array
- to_masked to create a masked version of the data
- to_compressed to create a flattened version of the data only containing the unmasked values
- from_compressed to create a full masked array from a compressed version of the data
- masks_compatible to check if mask settings in info objects are compatible
- masks_equal to check if masks are equal
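The flattened ("compressed") representation and its reconstruction, as described above, can be illustrated with numpy.ma alone. The helpers compress and decompress are hypothetical and only cover C order; they are not FINAM's implementation.

```python
import numpy as np


def compress(data):
    """Flatten a masked array to its unmasked values (C order)."""
    return np.ma.compressed(data)


def decompress(values, shape, mask):
    """Rebuild a masked array of the given shape from its unmasked values."""
    mask = np.asarray(mask, dtype=bool)
    full = np.zeros(shape, dtype=np.asarray(values).dtype)
    full[~mask] = values  # unmasked slots are filled in C order
    return np.ma.masked_array(full, mask=mask)


mask = np.array([[False, True], [False, False]])
data = np.ma.masked_array([[1.0, 2.0], [3.0, 4.0]], mask=mask)
values = compress(data)
print(values.tolist())  # [1.0, 3.0, 4.0]
restored = decompress(values, data.shape, mask)
print(np.ma.filled(restored, -1.0).tolist())  # [[1.0, -1.0], [3.0, 4.0]]
```

np.ma.filled plays the role described for filled: masked entries are replaced by a given value in a plain copy of the data.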
Warning
Due to a numpy bug, quantities should not be created from masked data using multiplication syntax (i.e. magnitude * units).
Instead, use the method Quantity of UNITS:
data = finam.UNITS.Quantity(magnitude, "m")
(See issues pint#633, numpy#15200)
Metadata
In FINAM, all data is associated with metadata.
Inputs and outputs of components specify the metadata describing the data they send or receive. Internally, this is used for consistency checks, and for automated data transformations.
FINAM metadata follows the CF Conventions.
There are two types of mandatory metadata:
- Grid specification
- Units (missing units are assumed as dimensionless)
Metadata is passed around as objects of type Info:
The Info object
Objects of type Info represent the metadata associated with an input or output.
It has the following properties:
- time: initial time stamp for the associated data
- grid: the Grid specification
- meta: a dict for all other metadata
- mask: the mask specification for the data, either Mask, numpy.ndarray or bool
For convenience, entries in meta can be used like normal member variables:
from datetime import datetime
from finam import Info, Mask, NoGrid

info = Info(
    time=datetime(2000, 1, 1),
    grid=NoGrid(),
    mask=Mask.NONE,
    units="m",
    foo="bar"
)
print(info.units)
print(info.foo)
m
bar
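One common way to let dictionary entries behave like member variables is a __getattr__ fallback. The class below is a hypothetical, simplified stand-in written only to show that pattern; it is not finam.Info and does not reproduce its validation or unit handling.

```python
class InfoSketch:
    """Hypothetical, simplified stand-in for finam.Info (not the real class)."""

    def __init__(self, time=None, grid=None, mask=None, **meta):
        self.time = time
        self.grid = grid
        self.mask = mask
        self.meta = meta  # every other keyword, e.g. units="m", ends up here

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails: fall back to meta
        meta = self.__dict__.get("meta", {})
        if name in meta:
            return meta[name]
        raise AttributeError(name)


info = InfoSketch(units="m", foo="bar")
print(info.units)  # m
print(info.foo)    # bar
```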
When creating inputs or outputs in components, the Info object does not need to be constructed explicitly.
In component code, the following two calls are equivalent:
time = datetime(2000, 1, 1)
self.inputs.add(name="A", time=time, grid=NoGrid(), units="m")
self.inputs.add(name="B", info=Info(time=time, grid=NoGrid(), units="m"))
Metadata from source or target
Any Info attributes initialized with None (the default for all entries) will be filled from the metadata on the other end of the coupling link.
E.g., if the time and grid specification of an input are intended to be taken from the connected output, the input can be initialized like this:
self.inputs.add(name="Input_A", time=None, grid=None, units="m")
This works in the same way for outputs to get metadata from connected inputs.
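The filling rule can be sketched with plain dicts standing in for Info objects. The helper fill_from_other_end is hypothetical, and raising an error when both ends leave an entry undefined is an assumption of this sketch; the real exchange happens during the connect phase.

```python
def fill_from_other_end(own, other):
    """Hypothetical helper: take every None entry of own from other."""
    filled = dict(own)
    for key, value in own.items():
        if value is None:
            if other.get(key) is None:
                # Assumption: undefined on both ends cannot be resolved
                raise ValueError(f"'{key}' is undefined on both ends of the link")
            filled[key] = other[key]
    return filled


input_meta = {"time": None, "grid": None, "units": "m"}
output_meta = {"time": "2000-01-01", "grid": "grid-from-output", "units": "km"}
print(fill_from_other_end(input_meta, output_meta))
# {'time': '2000-01-01', 'grid': 'grid-from-output', 'units': 'm'}
```

Entries that are already set on the input, such as units here, are kept; a unit mismatch like this one is then resolved by conversion rather than by copying.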
For more details on metadata exchange, see chapter The Connect Phase.
Grid specification
Most of the data exchanged through FINAM will be spatio-temporal by its nature. FINAM supports different types of structured grids and unstructured grids/meshes, as well as unstructured point data.
For data that is not on a spatial grid, a placeholder “no-grid” type is provided.
Inputs as well as outputs must specify the grid specification for the data they receive and send, respectively. We provide regridding adapters to transform between different grids or meshes in an automated way.
Coordinate Reference System (CRS) conversions are also covered by the regridding adapters.
Available grid types are:
Non-spatial grids
NoGrid(dims)
For data that is not on a spatial grid.
dims specifies the number of dimensions, like 0 for scalars, 1 for 1D arrays, etc.
Spatial grids
All spatial grids can have up to 3 dimensions.
RectilinearGrid(axes=[axis_x, axis_y, axis_z])
For rectilinear grids, with uneven spacing along some axes.
UniformGrid(dims=(sx, sy, sz), spacing=(dx, dy, dz), origin=(ox, oy, oz))
For uniform rectangular grids, with even spacing along each axis.
A sub-class of RectilinearGrid.
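A uniform grid is a rectilinear grid whose axes happen to be evenly spaced, which is why UniformGrid can sub-class RectilinearGrid: its axes can be derived from dims, spacing and origin. The helper uniform_axes below is hypothetical and assumes that dims counts grid points (not cells) along each axis.

```python
import numpy as np


def uniform_axes(dims, spacing, origin):
    """Point coordinates along each axis of a uniform grid (hypothetical helper).

    Assumes dims counts grid points, not cells, along each axis.
    """
    return [o + s * np.arange(n) for n, s, o in zip(dims, spacing, origin)]


axes = uniform_axes(dims=(4, 3), spacing=(10.0, 5.0), origin=(0.0, 100.0))
print(axes[0].tolist())  # [0.0, 10.0, 20.0, 30.0]
print(axes[1].tolist())  # [100.0, 105.0, 110.0]
```

The resulting list of axes has the same form as the axes argument of RectilinearGrid.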
EsriGrid(nrows, ncols, cellsize, xllcorner, yllcorner)
For grids with square cells, according to the ESRI/ASCII grid standard.
A sub-class of UniformGrid.
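The ESRI/ASCII header values map naturally onto uniform-grid parameters. The function below is a hypothetical sketch, assuming that uniform-grid dims count points (so n cells give n + 1 points) and ignoring the top-to-bottom row order in which ESRI files store their values.

```python
def esri_to_uniform(nrows, ncols, cellsize, xllcorner, yllcorner):
    """Map ESRI ASCII header values to uniform-grid parameters (hypothetical helper).

    Assumes dims counts grid points, so n cells along an axis give n + 1 points.
    """
    dims = (ncols + 1, nrows + 1)    # x (columns) first, then y (rows)
    spacing = (cellsize, cellsize)   # ESRI cells are square
    origin = (xllcorner, yllcorner)  # lower-left corner of the lower-left cell
    return dims, spacing, origin


print(esri_to_uniform(nrows=3, ncols=4, cellsize=25.0, xllcorner=1000.0, yllcorner=2000.0))
# ((5, 4), (25.0, 25.0), (1000.0, 2000.0))
```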
UnstructuredGrid(points, cells, celltypes)
For unstructured grids (or meshes), composed of triangles and/or quads in 2D, and tetrahedrons or hexahedrons in 3D.
UnstructuredPoints(points)
For unstructured point-associated data that does not require cells.
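The points and cells of UnstructuredGrid(points, cells, celltypes) can be pictured as two index-linked arrays. The sketch below uses plain numpy with two triangles forming a unit square; it omits the celltypes argument, and the array layout is an illustration rather than a statement of FINAM's exact expectations.

```python
import numpy as np

# Four corner points of a unit square, one row per point (x, y)
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
# Two triangles, each listed by the indices of its three points
cells = np.array([[0, 1, 2], [0, 2, 3]])

# Cell data would sit at the cell centres, point data directly at the points
centres = points[cells].mean(axis=1)
print(centres.shape)  # (2, 2)
```

For UnstructuredPoints, only the points array is needed.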
Class diagram grids
The following figure shows a diagram of grid classes inheritance hierarchy.
Figure 1: FINAM grids class diagram.
Common grid properties
CRS: All spatial grid types have a property crs for the Coordinate Reference System.
The property can take any value understood by pyproj.
In many cases, this will just be an EPSG code, like crs="EPSG:32632".
Order: All structured grids have an order attribute for being in either Fortran ("F") or C ("C") order.
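The difference between the two orders shows up whenever grid data is flattened or reshaped. numpy's ravel illustrates it directly; this example only demonstrates the memory-order convention, not how FINAM uses the order attribute internally.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

c_order = data.ravel(order="C")  # last index varies fastest
f_order = data.ravel(order="F")  # first index varies fastest
print(c_order.tolist())  # [1, 2, 3, 4, 5, 6]
print(f_order.tolist())  # [1, 4, 2, 5, 3, 6]
```

Reshaping with the same order restores the original array, so data stays consistent as long as both sides agree on the order.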
Data location: For all spatial grids except UnstructuredPoints, data can be associated to either cells or points, given by the data_location attribute.
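For a structured grid, the data location determines the expected data shape: along each axis there is one cell fewer than there are points. The sketch below shows this with hypothetical helper names and assumes axes given in x, y order; it is not FINAM code.

```python
import numpy as np


def cell_centres(axis_points):
    """Midpoints between neighbouring points of one axis (hypothetical helper)."""
    axis_points = np.asarray(axis_points, dtype=float)
    return 0.5 * (axis_points[:-1] + axis_points[1:])


x = [0.0, 10.0, 20.0, 30.0]  # 4 points along x
y = [100.0, 105.0, 110.0]    # 3 points along y
point_shape = (len(x), len(y))          # shape of point-associated data
cell_shape = (len(x) - 1, len(y) - 1)   # shape of cell-associated data
print(point_shape, cell_shape)   # (4, 3) (3, 2)
print(cell_centres(x).tolist())  # [5.0, 15.0, 25.0]
```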
Axis names: Grid axes are named according to the axes_names attribute.
Axis order: Regular grids can have inverted axis order (i.e. zyx instead of xyz), indicated by the axes_reversed attribute.
Axis direction: Axis direction can be inverted, like with descending values for the y axis.
This is indicated by the axes_increase
attribute, which is a tuple of boolean values.
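What the two attributes describe can be illustrated on a small raster stored in (y, x) order with descending y, as is common for image-like data. The numpy operations below only show the meaning of reversed and decreasing axes; they are not how FINAM's regridding adapters handle them.

```python
import numpy as np

# Data in reversed axis order (y, x) with descending y values
data_yx = np.array([[7, 8, 9],    # first row: largest y
                    [1, 2, 3]])   # last row: smallest y

# Undo the decreasing y axis: flip so that y increases with the index
flipped = np.flip(data_yx, axis=0)
# Undo the reversed axis order: transpose from (y, x) to (x, y)
data_xy = flipped.T
print(data_xy.tolist())  # [[1, 7], [2, 8], [3, 9]]
```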
Units
All data in FINAM has units of measurement. The units can, however, be “dimensionless” for quantities without physical units.
Unit conversions along links between components are done automatically, based on the metadata provided by the receiving inputs.
So if an input was initialized with units="km", and data is passed in meters, the input will internally do the conversion to kilometers.
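The conversion on the receiving side boils down to scaling by the ratio of unit factors. The sketch below replaces pint with a hypothetical hand-written factor table for lengths only, to show the arithmetic for the meters-to-kilometers case.

```python
# Hypothetical conversion factors to the base unit metre; pint does this for real
TO_METRE = {"m": 1.0, "km": 1000.0, "cm": 0.01}


def convert(value, from_units, to_units):
    """Convert a length value between the units listed in TO_METRE."""
    return value * TO_METRE[from_units] / TO_METRE[to_units]


print(convert(1500.0, "m", "km"))  # 1.5
```

With pint itself, the same conversion can be written as finam.UNITS.Quantity(1500, "m").to("km").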
FINAM uses the pint library for units handling, and follows the CF Conventions.
For direct access to pint units, the central units registry is exposed by UNITS.
Metadata flow
For details on how metadata is provided, and how it is passed around during coupling, see chapter The Connect Phase.
Composition metadata
Besides metadata for data exchange, FINAM provides functionality to access metadata that describes a given Composition and the corresponding simulation.
Users can call Composition.metadata to retrieve a nested dict of all metadata.
This encompasses general metadata like the simulation time frame, as well as metadata for individual components and adapters and the coupling links.
Component as well as Adapter provide default implementations of Component.metadata and Adapter.metadata, respectively.
Developers can override these properties to add their own specific metadata. For examples, see the API docs for Component.metadata and Adapter.metadata.
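Overriding a metadata property typically means extending the parent's dict via super(). The classes below are hypothetical stand-ins, not FINAM's Component or Adapter, and the contents of their default metadata are invented for illustration; the actual defaults are documented in the API docs.

```python
class ComponentSketch:
    """Hypothetical base class standing in for a component with default metadata."""

    def __init__(self, name):
        self.name = name

    @property
    def metadata(self):
        return {"name": self.name, "class": type(self).__name__}


class RiverModel(ComponentSketch):
    """Hypothetical component that adds its own entries to the default metadata."""

    def __init__(self, name, step_days):
        super().__init__(name)
        self.step_days = step_days

    @property
    def metadata(self):
        md = super().metadata  # start from the default implementation
        md["step_days"] = self.step_days
        return md


print(RiverModel("river", step_days=1).metadata)
# {'name': 'river', 'class': 'RiverModel', 'step_days': 1}
```

Building on super().metadata keeps the default entries intact, so the composition-level dict still contains everything the base class provides.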