Data and metadata#

This chapter explains data and metadata in FINAM.

Data arrays#

Internally, all data is passed as numpy.ndarray (or numpy.ma.MaskedArray, see Masked arrays), wrapped in pint.Quantity. In addition, a time axis with a single entry is added at index 0.

Data can be pushed to outputs as any type that can be wrapped in numpy.ndarray. This includes lists and scalar values. Wrapping the data and adding the time axis and units are performed internally, based on the available metadata (see section Metadata).

Inputs always receive data in the wrapped numpy.ndarray form, with units and time axis.

Several tool functions are provided in the data module to convert to and from the internal data structure:

full(value, info) Creates a new data array with units according to the given info, filled with given value.
full_like(data, value) Creates a new data array with the same shape, type and units as a given object.
prepare(data, info, time) Wraps data, checks or adds time axis and units based on info (see The Info object). Performs metadata checks.
strip_time(xdata, grid) Squeezes away the time axis if there is a single entry only, and raises an error otherwise.
get_magnitude(xdata) Extracts data without units. Returns a numpy.ndarray array without units, but with time axis preserved.
get_units(xdata) Gets the pint.Unit units of the data.
get_dimensionality(xdata) Gets the pint dimensionality of the data (like length, mass, …)
has_time_axis(xdata, grid) Checks if the data has a time axis.

Masked arrays#

FINAM uses numpy.ma.MaskedArray inside pint.Quantity to represent masked data. Masked data does not require any special treatment and can be used like usual numpy arrays.

Convenience functions for masked arrays are:

is_masked_array to check if the given data is a masked array
has_masked_values to check if the given data is a masked array and has some values masked
filled to create a copy of the data with masked entries filled with a given value, if it is a masked array
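The distinctions these helpers draw can be illustrated with numpy's own masked-array functions. This is a sketch using numpy.ma directly on bare arrays; the FINAM functions listed above are not called here.

```python
import numpy as np

plain = np.array([1.0, 2.0, 3.0])
# A masked array with the middle entry masked.
masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
# A masked array type in which no entry is actually masked.
unmasked = np.ma.masked_array([1.0, 2.0, 3.0])

# Is the object a masked array at all?
plain_is_ma = np.ma.isMaskedArray(plain)        # False
unmasked_is_ma = np.ma.isMaskedArray(unmasked)  # True

# Does it contain at least one masked value?
unmasked_has_mask = np.ma.is_masked(unmasked)   # False
masked_has_mask = np.ma.is_masked(masked)       # True

# Copy with masked entries replaced by a fill value.
filled = masked.filled(-1.0)                    # array([ 1., -1.,  3.])
```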

Warning

Due to a numpy bug, quantities should not be created from masked data using multiplication syntax (i.e. magnitude * units). Instead, use the Quantity method of UNITS:

data = finam.UNITS.Quantity(magnitude, "m")

(See issues pint#633, numpy#15200)

Metadata#

In FINAM, all data is associated with metadata.

Inputs and outputs of components specify the metadata describing the data they send or receive. Internally, this is used for consistency checks, and for automated data transformations.

FINAM metadata follows the CF Conventions.

There are two types of mandatory metadata:

Grid specification
Units (missing units are assumed as dimensionless)

Metadata is passed around as objects of type Info.

The `Info` object#

Objects of type Info represent the metadata associated with an input or output. They have the following properties:

grid - for the Grid specification
meta - a dict for all other metadata

For convenience, entries in meta can be used like normal member variables:

from datetime import datetime

from finam import Info, NoGrid

info = Info(
    time=datetime(2000, 1, 1),
    grid=NoGrid(),
    units="m",
    foo="bar",
)

print(info.units)
print(info.foo)

m
bar
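How dictionary entries can behave like member variables is sketched below in plain Python. InfoSketch is a hypothetical class written for this illustration; it is not FINAM's Info implementation and only demonstrates the attribute forwarding idea.

```python
class InfoSketch:
    """Hypothetical stand-in for Info: fixed fields plus a meta dict."""

    def __init__(self, time=None, grid=None, **meta):
        self.time = time
        self.grid = grid
        self.meta = dict(meta)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so time, grid and meta are never routed through here.
        meta = self.__dict__.get("meta", {})
        if name in meta:
            return meta[name]
        raise AttributeError(name)


info = InfoSketch(grid="no-grid", units="m", foo="bar")

print(info.units)  # m
print(info.foo)    # bar
```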

When creating inputs or outputs in components, the Info object does not need to be constructed explicitly. In component code, the following two calls to add are equivalent:

time = datetime(2000, 1, 1)
self.inputs.add(name="A", time=time, grid=NoGrid(), units="m")
self.inputs.add(name="B", info=Info(time=time, grid=NoGrid(), units="m"))

Metadata from source or target#

Any Info attributes initialized with None will be filled from the metadata on the other end of the coupling link. E.g. if the grid specification of an input is intended to be taken from the connected output, the input can be initialized like this:

self.inputs.add(name="Input_A", time=None, grid=None, units="m")

This works in the same way for outputs to get metadata from connected inputs.
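The filling rule can be pictured as a simple merge in which None marks a value to be taken from the other side. The sketch below uses plain dictionaries and a hypothetical helper fill_missing; the actual exchange during the connect phase is more involved and is not reproduced here.

```python
def fill_missing(own, other):
    """Return a copy of own with None entries replaced by values from other."""
    return {
        key: other.get(key) if value is None else value
        for key, value in own.items()
    }


# Metadata declared by an input: time and grid are left open.
input_meta = {"time": None, "grid": None, "units": "m"}
# Metadata of the connected output (placeholder values for illustration).
output_meta = {"time": "2000-01-01", "grid": "uniform 10x20", "units": "m"}

resolved = fill_missing(input_meta, output_meta)
print(resolved)
```

Values that the input specified explicitly, such as the units here, are kept as they are.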

For more details on metadata exchange, see chapter The Connect Phase.

Grid specification#

Most of the data exchanged through FINAM will be spatio-temporal by nature. FINAM supports different types of structured grids and unstructured grids/meshes, as well as unstructured point data.

For data that is not on a spatial grid, a placeholder “no-grid” type is provided.

Inputs as well as outputs must specify the grid specification for the data they send and receive, respectively. We provide regridding adapters to transform between different grids or meshes in an automated way.

Coordinate Reference Systems (CRS) conversions are also covered by the regridding adapters.

Available grid types are:

Non-spatial grids#

NoGrid(dims)

For data that is not on a spatial grid. dims specifies the number of dimensions, like 0 for scalars, 1 for 1D arrays, etc.

Spatial grids#

All spatial grids can have up to 3 dimensions.

RectilinearGrid(axes=[axis_x, axis_y, axis_z])

For rectilinear grids, with uneven spacing along some axes.

UniformGrid(dims=(sx, sy, sz), spacing=(dx, dy, dz), origin=(ox, oy, oz))

For uniform rectangular grids, with even spacing along each axis. A sub-class of RectilinearGrid.

EsriGrid(nrows, ncols, cellsize, xllcorner, yllcorner)

For square grids according to the ESRI/ASCII grid standard. A sub-class of UniformGrid.

UnstructuredGrid(points, cells, celltypes)

For unstructured grids (or meshes), composed of triangles and/or quads in 2D, and tetrahedrons or hexahedrons in 3D.

UnstructuredPoints(points)

For unstructured point-associated data that does not require cells.
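The difference between uniform and rectilinear grids comes down to how the axis coordinates are defined. The numpy sketch below illustrates this without using the FINAM grid classes; the helper uniform_axes is hypothetical, and it assumes that dims counts the coordinate values along each axis.

```python
import numpy as np


def uniform_axes(dims, spacing, origin):
    """Build evenly spaced axis coordinates from dims, spacing and origin."""
    return [o + d * np.arange(n) for n, d, o in zip(dims, spacing, origin)]


# Uniform case: each axis is fully described by three numbers.
axes = uniform_axes(dims=(4, 3), spacing=(2.0, 0.5), origin=(10.0, 0.0))
print(axes[0])  # [10. 12. 14. 16.]
print(axes[1])  # [0.  0.5 1. ]

# Rectilinear case: axis coordinates are given explicitly and may be uneven.
axis_x = np.array([0.0, 1.0, 3.0, 7.0])
print(np.diff(axis_x))  # [1. 2. 4.]
```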

Class diagram grids#

The following figure shows the inheritance hierarchy of the grid classes.

Figure 1: FINAM grids class diagram.

Common grid properties#

CRS: All spatial grid types have a property crs for the Coordinate Reference System. The property can take any value understood by pyproj. In many cases, this will just be an EPSG code, like crs="EPSG:32632".

Order: All structured grids have an order attribute that specifies either Fortran ("F") or C ("C") order.

Data location: For all spatial grids except UnstructuredPoints, data can be associated with either cells or points, given by the data_location attribute.

Axis names: Grid axes are named according to the axes_names attribute.

Axis order: Regular grids can have inverted axis order (i.e. zyx instead of xyz), indicated by the axes_reversed attribute.

Axis direction: Axis direction can be inverted, like with descending values for the y axis. This is indicated by the axes_increase attribute, which is a tuple of boolean values.
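What the order and axis direction properties mean for the underlying arrays can be shown with numpy alone. This is an illustrative sketch with made-up values; it does not use FINAM grid objects or their attributes.

```python
import numpy as np

# Order: the same flat values fill a 2 x 3 array differently in C and Fortran order.
flat = np.arange(6)
c_order = flat.reshape((2, 3), order="C")  # [[0 1 2], [3 4 5]]
f_order = flat.reshape((2, 3), order="F")  # [[0 2 4], [1 3 5]]

# Axis direction: a y axis with descending values, and a field of shape (x, y).
y_axis = np.array([30.0, 20.0, 10.0])
field = np.array([[1, 2, 3], [4, 5, 6]])

# Making the axis increasing requires flipping the data along the same axis.
y_increasing = y_axis[::-1]               # [10. 20. 30.]
field_flipped = np.flip(field, axis=1)    # [[3 2 1], [6 5 4]]
```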

Units#

All data in FINAM has units of measurement. Data without actual units uses the “dimensionless” unit.

Unit conversions along links between components are done automatically, based on the metadata provided by the receiving inputs. So if an input was initialized with units="km", and data is passed in meters, the input will internally do the conversion to kilometers.

FINAM uses the pint library for units handling, and follows the CF Conventions.

For direct access to pint units, the central units registry is exposed by UNITS.

Metadata flow#

For details on how metadata is provided, and how it is passed around during coupling, see chapter The Connect Phase.

Composition metadata#

Besides metadata for data exchange, FINAM provides functionality to access metadata that describes a given Composition and corresponding simulation. Users can call Composition.metadata to retrieve a nested dict of all metadata. This encompasses general metadata like the simulation time frame, as well as metadata for individual components and adapters and the coupling links.

Component as well as Adapter provide default implementations of Component.metadata and Adapter.metadata, respectively. Developers can override these properties to add their own specific metadata. For examples, see the API docs for Component.metadata and Adapter.metadata.