Rules and Guidelines for the ITER Physics Data Model
Documentation maintained by F. Imbeaux (CEA)
Preamble
In order to interpret this page without reference to other IMAS Data Model documents, a number of notions and definitions are required, which we reproduce here. These form part of the Rules and Guidelines.
The organisation of the ITER physics data model comprises two principal components:
The Data Dictionary, defining the structuring and naming of data which are being moved between analysis components or being stored or recovered;
The mapping of the Data Dictionary onto storage of the data.
This page only discusses the structuring of data in the Data Dictionary.
The Data Dictionary Rules and Guidelines have been designed in order to fulfill the following aims:
Satisfy two very different Use Cases: Integrated Modelling workflows and hands-on data browsing by the physicist
Have identical data structures for experimental and simulated data
The ITER physics data are structured as trees to allow re-use of names, reference to sub-trees at any level of nesting, and targeted data recovery.
Within the context of a tree structure, we define the following:
A node is any element of a tree
A leaf is a node which is an end-point of a tree
A parent is one level above a particular node
A sibling is a node at the same level as a given node
A child is one level below a particular node
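The tree vocabulary above can be illustrated with a minimal sketch. The `Node` class and the node names (`example_ids`, `geometry`, `radius`, `height`) are hypothetical and are not part of the Data Dictionary; the sketch also assumes that siblings share the same parent, which is one reading of "at the same level".

```python
class Node:
    """Any element of the tree (hypothetical helper, for illustration only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # the node one level above, or None for the root
        self.children = []        # the nodes one level below

    def add_child(self, name):
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def is_leaf(self):
        # A leaf is an end-point of the tree: it has no children.
        return not self.children

    def siblings(self):
        # Assumption: siblings are the other children of the same parent.
        if self.parent is None:
            return []
        return [n for n in self.parent.children if n is not self]

    def path(self):
        # Walk up through the parents to build a UNIX-like path.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Node("example_ids")
geometry = root.add_child("geometry")
radius = geometry.add_child("radius")
height = geometry.add_child("height")

print(radius.path())                          # path from the root to the leaf
print([n.name for n in radius.siblings()])    # nodes sharing the same parent
```

Referring to a sub-tree then amounts to holding a reference to any non-leaf node, such as `geometry` here.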
We assume that a powerful feature of the Data Dictionary will be the automated definition of the data structures for all supported languages.
We have avoided, where reasonable, notions which stem from a particular technology; however, this choice might lead to tortuous implementation once a technology is selected. The strongest case of this is the avoidance of “properties”, whose implementation varies considerably between XML, HDF5, MDSplus and so on.
We have avoided choices which have been found to create problems in particular languages; in particular, certain names have to be avoided.
An Interface Data Structure (IDS) is an entry point of the Data Dictionary that a user can handle as a single entity. Examples are the full description of a tokamak subsystem (diagnostic, heating system, …) or an abstract physical concept (equilibrium, set of core plasma profiles, wave propagation, …). This concept allows tracing of data provenance and simple transfer of large numbers of variables between loosely or tightly coupled applications. IDSs thereby define standardized interface points between IMAS physics components.
An IDS is a part of the Data Dictionary, like an entry point into it, thus the IMAS components are interfaced with the same structures as those constituting the Data Dictionary. An IDS is marked by having a child ids_properties node, containing traceability and self-description information. Nested IDS can be foreseen but should have a clear usefulness for interfacing components in a workflow.
We make a distinction between categories of data according to their time-variation:
constant data are data which are not varying within the context of the data being referred to (e.g. pulse, simulation, calculation);
static data are likely to be constant over a wider range (e.g. nominal coil positions during operation);
dynamic data are those which vary in time within the context of the data.
An IDS may contain quantities with different timebases, essentially to have the ability to describe experimental data as it is acquired in the experiment. However, an IDS can also be filled in a synchronous way (i.e. all time-dependent quantities are stored on a unique timebase) and declared so, since this will likely be a frequent usage in IMAS workflows.
The quantities describing the N coordinates of an N-dimensional array are called the coordinates of the array.
Rules and guidelines
This section presents the current Rules and Guidelines. They have been revisited according to the evolution of the thinking on the ITER Physics Data Model. These Rules and Guidelines are structured by topics (Naming Conventions, Reserved node names, Structuring Conventions, Documentation Conventions, Self-Description Conventions, Technical Constraints). The first four topics are of interest to both the Data Model designer and the XML developer who will implement them. The last two topics are of interest only for the XML developer of the Data Model, since they do not impact the Data Model design.
Naming Conventions
ID | Rule | Motivation | Date of last modification |
---|---|---|---|
R1.1 | Node names shall be composed of | Avoidance of characters which might generate language or parsing difficulties; readability. | 28 May 2013 |
R1.2 | Node names shall only begin with | Avoidance of conflict in some languages; assistance to the interpreter to separate variables from numbers. | 29 March 2012 |
R1.3 | Names shall be semantically meaningful and not depend on familiarity with a specific implementation; the use of acronyms, abbreviations, prefixes and suffixes shall be restricted to a uniform and recognized set defined in this document, sections Recognized Acronyms and Recognized Abbreviations, prefixes and suffixes. | This is mandatory given the international nature of ITER; acronyms and abbreviations vary considerably between institutions. | 10 May 2013 |
R1.4 | Forbidden names defined in this document (section Forbidden names) shall not be used for Data Dictionary nodes. | Reserved names create problems when generating code or declarative statements in some programming languages. | 29 March 2012 |
R1.5 | Naming of the Data Dictionary shall be lower case, with underscores used for semantic separation for human reading clarity. The only exception is the naming of the units (Wb, eV, A, …). | Avoids confusion and allows straightforward usage in case-sensitive languages. We recommend the names of routines/modules/tools related to the datamodel to be lower case as well, to ease maintenance and usage ( | 10 May 2013 |
R1.6 | Node names shall not repeat the context of their parent identities where this would be redundant, unless it allows avoiding a conflict with a forbidden name (see R1.4). | Provides clarity and brevity. | 29 March 2012 |
R1.7 | Qualifiers should be suffixes, not prefixes (e.g. | Qualified names will appear grouped when sorted. This rule must be applied in a way that facilitates finding a quantity in the data dictionary. | 29 March 2012 |
R1.8 | If a node is an array of structures, its name shall be singular. | This aids clarity. Plurals should only be used if the node is a leaf or a structure describing multiple instances. | 5 June 2013 |
R1.9 | Nodes storing the same data but in different IDSs shall have the same name. | Homogeneity of the data dictionary. | 2 October 2016 |
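As an illustration of how the naming rules could be checked automatically, here is a minimal sketch. The exact character set required by R1.1 and R1.2 is not reproduced on this page, so the regular expression below is an assumption (a lower-case letter followed by lower-case letters, digits and underscores), chosen only to be consistent with R1.5. The function name `check_node_name` and the forbidden name `class` used in the example are hypothetical.

```python
import re

# ASSUMPTION: illustrative pattern only; R1.1/R1.2 define the real character set.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def check_node_name(name, forbidden=frozenset()):
    """Return a list of rule violations for a candidate node name."""
    problems = []
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("R1.5: name must be lower case with underscores")
    if name in forbidden:
        problems.append("R1.4: name is in the forbidden list")
    return problems


# Hypothetical candidate names and a hypothetical forbidden list.
for candidate in ("major_radius", "MajorRadius", "class"):
    print(candidate, check_node_name(candidate, forbidden={"class"}))
```

Rules that depend on meaning, such as R1.3 (semantically meaningful names) or R1.8 (singular names for arrays of structures), still need human review.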
ID | Guideline | Motivation | Date of last modification |
---|---|---|---|
G1.1 | Long clear names are strongly preferred to short ambiguous or unclear names. | The cost of confusion is far higher than the cost of a few characters. | 29 March 2012 |
G1.2 | Time-dependent additive corrections to static data must be named with a | For some applications, a higher precision for static data is needed which requires applying corrections. This applies to all geometrical data. ITER changes size slightly during the burn and more during baking etc. | 19 August 2017 |
Reserved node names
The following node names are reserved for a specific usage as defined below.
ID | Rule | Motivation | Date of last modification |
---|---|---|---|
R2.1 | As a consequence, different timebases cannot be placed at the same level in the tree structure. A timebase is dynamic and has a coordinate “1…N”. | User needs to find unambiguously the time vector relevant to a time-dependent quantity. | 29 August 2013 |
R2.2 | | Self-description of an instance of an IDS. | 10 May 2013 |
R2.3 | | Traceability of the code and its parameters that has produced an IDS | 3 July 2013 |
R2.4 | | Homogeneity of the data dictionary. | 3 July 2013 |
Structuring Conventions
ID | Rule | Motivation | Date of last modification |
---|---|---|---|
R3.1 | Each IDS node must have an Each IDS node must have a | | 29 August 2013 |
R3.2 | Nodes have either children or data, not both. | To have data-free nodes and data only in leaves. No need to have more complex options. | 29 March 2012 |
R3.3 | Arrays of structures shall be used to group quantities that describe the same object/concept but possibly of different sizes. | Arrays of structures allow the Data Dictionary to be flexible enough and avoid the creation of large sparse arrays. | 28 May 2013 |
R3.4 | The coordinates of a quantity must exist in the same IDS as this quantity. | Guarantee a consistent link between a quantity and its coordinates which is available when an IDS is used on its own. In case of nested IDS, the coordinates must be at least in the lowest level IDS. | 30 May 2013 |
R3.5 | A child node cannot have the same name as its parent. | A child node with the same name as its parent is confusing; moreover, it is not possible to declare such a structure in Java. | 29 March 2012 |
R3.6 | For time-dependent quantities, the time index shall be the last index of the array. | Contributes to homogeneity in the data model. NB: the CODAC convention on this has not been decided. | Guideline moved as a Rule. 4 June 2013 |
R3.7 | There should not be explicit nodes for indicating the size of a data item. However, in the expectedly rare case of an oversized array, an explicit node is required to document the rank of useful information for a given index. | This information is part of the metadata and can be retrieved from e.g. a PUAL “shape_of” instruction (or alternatively, a “get” instruction of the PUAL automatically allocates the returned variables to the correct size). | Guideline moved as a Rule. 4 June 2013 |
R3.8 | Use generic sub-structures for data of the same nature when available. If not available, they must be created. | Contributes to homogeneity in the data model. | Guideline moved as a Rule. 3 July 2013 |
R3.9 | Physical quantities that require only quantities from a single IDS to be computed must belong to this IDS. | Aid provenance traceability by grouping consistent quantities in the same IDS. Example: the plasma current as estimated by the magnetics belongs to the | 20 September 2013 |
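Two of the structuring rules lend themselves to a mechanical check: R3.2 (a node has either children or data, not both) and R3.5 (a child cannot have the same name as its parent). The sketch below is only an illustration under stated assumptions: the dictionary layout with `"data"` and `"children"` keys, the function `check_structure` and all node names are hypothetical and do not reflect how the Data Dictionary is actually stored.

```python
def check_structure(name, node, path=""):
    """Recursively collect R3.2 and R3.5 violations.

    `node` is a dict with an optional "data" entry and an optional
    "children" entry mapping child names to child nodes (hypothetical layout).
    """
    here = f"{path}/{name}" if path else name
    problems = []
    children = node.get("children", {})
    if children and "data" in node:
        problems.append(f"{here}: R3.2 node has both children and data")
    for child_name, child in children.items():
        if child_name == name:
            problems.append(f"{here}/{child_name}: R3.5 child named like its parent")
        problems.extend(check_structure(child_name, child, here))
    return problems


# A well-formed tree: data only in the leaf.
good = {"children": {"coil": {"children": {"current": {"data": [1.0, 2.0]}}}}}

# A malformed tree: the root carries data and children, and "coil" contains "coil".
bad = {"data": 1, "children": {"coil": {"children": {"coil": {"data": 0}}}}}

print(check_structure("example_ids", good))
print(check_structure("example_ids", bad))
```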
ID | Guideline | Motivation | Date of last modification |
---|---|---|---|
G3.1 | Data model structures should be designed from the usage point of view. | It is easy to create an apparently logical structure which becomes unwieldy during use. | 30 May 2013 |
G3.2 | Avoid as much as possible any ITER-specific definitions or features in the data model. | Maximise maintainability, generality and durability and allow using IMAS for other experiments. | 30 May 2013 |
G3.3 | Group quantities depending on the same coordinates at the same level. | Clarity. | 4 June 2013 |
G3.4 | The coordinates of a quantity should be siblings of the highest level nodes using these coordinates. However, if the coordinate is the index of an array of structures, the coordinate should be an immediate child of the array of structures. | Group quantities and their coordinates for clarity. An array of structures usually describes an object and it is logical to make the coordinate a property of the object. Note: the homogeneous timebase of an IDS is placed within | 4 June 2013 |
G3.5 | When multiple quantities have a common coordinate, define a single node for this common coordinate. Define multiple coordinate nodes otherwise. | Reduces complexity and enhances access performance. | 4 June 2013 |
G3.6 | When multiple quantities have a common coordinate, choose on a case by case basis the most suitable structuring: an array of structures or a dimensional leaf structure. | Rule 3.3 defines a case where using an array of structures is mandatory. When the leaves are commonly used separately and n is large, the latter structure is a better choice for performance, since it allows separate access to a given leaf and avoids having to retrieve a large size object with all leaves beneath. | 4 June 2013 |
G3.7 | When there are multiple methods for generating a set of physical quantities within an IDS (typically for the processing of measurements): Use different IDS occurrences when most of the IDS quantities depend on the generation method. Group the generated quantities under a | Example: the core_profile IDS contains only data generated by a given processing method (e.g. profile fitting): multiple generations of a set of core_profiles are stored using multiple IDS occurrences. Avoids replicating common quantities in different places. | 20 September 2013 |
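The trade-off described in guideline G3.6 can be sketched with plain Python containers. Everything here is hypothetical (the `channel` objects, the leaf names `name` and `value`, and the helper `to_dimensional_leaves`); the sketch only contrasts the two layouts and does not show how the Access Layer stores or retrieves data.

```python
# Array of structures: one structure per object; all leaves of one object travel together.
channels_as_structures = [
    {"name": "channel_a", "value": 0.5},
    {"name": "channel_b", "value": 0.7},
    {"name": "channel_c", "value": 0.9},
]

def to_dimensional_leaves(structures):
    """Regroup an array of structures into one list per leaf name.

    Assumes every structure carries the same leaves, each holding one item.
    """
    leaves = {}
    for structure in structures:
        for leaf_name, item in structure.items():
            leaves.setdefault(leaf_name, []).append(item)
    return leaves

# Dimensional leaf structure: one leaf per quantity, indexed by the object number n.
channels_as_leaves = to_dimensional_leaves(channels_as_structures)

# Reading a single quantity for all n:
values_from_structures = [s["value"] for s in channels_as_structures]  # visits every structure
values_from_leaves = channels_as_leaves["value"]                        # touches one leaf only

print(values_from_structures == values_from_leaves)
```

With an array of structures, reading one quantity for all n means visiting every structure, whereas the dimensional leaf layout gives direct access to that one leaf. The array of structures remains the natural choice when the objects have leaves of different sizes (R3.3).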
Documentation Conventions
ID | Rule | Motivation | Date of last modification |
---|---|---|---|
R4.1 | The documentation field of the data dictionary shall contain a complete, self-contained, English language description of the data item content, avoiding jargon and unofficial abbreviations. | Self-description of the Data Dictionary. The data model should not be separated from its documentation source. | 30 May 2013 |
R4.2 | The documentation field of the data dictionary shall not duplicate the information contained in some other field of the data dictionary. | Avoid duplication of information and risks of errors. | 4 June 2013 |
Self-description Conventions
In this section, the XML syntax indications are provided for the persons in charge of coding the Data Dictionary directly in XML. It has been shown that this is not a requirement for Data Dictionary contributors, who can also develop the Data Dictionary in other formats, e.g. an Excel spreadsheet. Nonetheless the information listed below has to be provided by the Data Dictionary contributor to be then implemented by the XML developer.
ID | Rule | Motivation | Date of last modification |
---|---|---|---|
R5.1 | The data type of an element must be coded by including an XML Group of the corresponding type to the element. The syntax is: | Self-description of the Data Dictionary to allow the creation of the structures in declarative languages. A list of the existing data types is provided section List of the existing data types. | 10 May 2013 |
R5.2 | The static/constant/dynamic character of a data item must be coded in the Data Dictionary under the | Self-description of the Data Dictionary. | 10 May 2013 |
R5.3 | Float and Complex data items shall have their units defined. If the quantity is dimensionless, the units shall be If the quantity is implemented via a generic structure with its definition and units given by its parent node, its units shall be The units of a quantity shall be self-described in the Data Dictionary under the XML The XML syntax for units is The units use the standard names with both lower and upper cases for clarity. The “/” operator shall not be used in the units, always use “.” operator and a negative exponent for units at the denominator. Exponents greater than 1 are indicated with the “^” character. | Self-description of the Data Dictionary; conformity with ITER Project Requirements. Example: the mhd_linear IDS describes a perturbed vector quantity as a parent node having the definition and units of the quantity with three coordinates as children (e.g. a_perturbed/coordinate1). The coordinates have units “as_parent”. Style convention. Example: | 2 October 2023 |
R5.4 | The coordinate properties of a quantity shall be self-described in the Data Dictionary under the The syntax of the coordinate list is The Path to the element shall use UNIX syntax. A coordinate which is simply a set of indices is marked as If the coordinate can be different elements, it is noted as: In the exceptional case of a quantity with a coordinate residing in another IDS the coordinate must be specified as | Self-description of the Data Dictionary. | 22 July 2015 |
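The unit style required by R5.3 (no “/” operator, factors joined with “.”, exponents written with “^”) can be checked with a few lines of code. This is a minimal sketch under assumptions: the function `check_units` is hypothetical, unit symbols are assumed to be plain ASCII letters, and only the special value “as_parent” quoted in the table is recognised; any other special values defined by the Data Dictionary are not handled here.

```python
import re

# ASSUMPTION: a factor is a run of ASCII letters, optionally followed by
# "^" and a signed integer exponent (e.g. "m", "s^-1", "m^2").
_FACTOR = re.compile(r"[A-Za-z]+(\^-?[0-9]+)?")


def check_units(units):
    """Return a list of style problems found in a units string."""
    if units == "as_parent":
        # Units inherited from the parent node (see the R5.3 example).
        return []
    problems = []
    if "/" in units:
        problems.append('"/" is not allowed; use "." with a negative exponent')
    # Split on both separators so that each factor is still inspected.
    for factor in units.replace("/", ".").split("."):
        if not _FACTOR.fullmatch(factor):
            problems.append(f"cannot parse factor {factor!r}")
    return problems


for candidate in ("m.s^-1", "m/s", "m.s-1", "as_parent"):
    print(candidate, check_units(candidate))
```

Under these assumptions a velocity is written `m.s^-1`; the forms `m/s` and `m.s-1` are both reported as problems.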
Limitations of the present implementation
These limitations on the Data Dictionary structure arise from the present implementation and do not represent Rules or Guidelines.
ID | Limitation | Comment |
---|---|---|
L1 | Nested IDS are not implemented yet. | The case of nested IDS has not been implemented yet but in principle could be implemented with no major difficulty. |
L2 | The timebase of a node must be located within the same array of structures as the node, or be reachable from the root of the IDS without going through an array of structures. This limitation doesn’t exist within a dynamic array of structures, in which all nodes share the same time base. | The New Low Level (2018) logic is based on “contexts” which start at the level of the nearest array of structures ancestor. Note that this limitation is much lighter than the one which existed with the previous implementation of the Access Layer (forcing the systematic use of data/time structures). |
L3 | The “series of bytes” datatypes (see section List of the existing data types) are defined in the Data Dictionary but are not implemented in the Access Layer. | The Access Layer shall be extended to handle these data types. |
Remaining issues
ID | Issue | Motivation |
---|---|---|
I1 | Design a mechanism for storing expressions instead of values and an expression evaluator. | Very important functionality. Saves storage and bandwidth. Would allow changing and facilitating the timebase implementation. |
I2 | Implement a referencing system for the static data. | This avoids copying the static data. |
I3 | Implement an optional “topic” metadata. A new documentation Rule will be needed for its usage. | Allow searching for all quantities belonging to a given topic (e.g. electron temperature). |
I4 | Implement a way of documenting the method to be used to interpolate a quantity. | Document how to use data. |
I5 | Discuss the precision desired for the data types and how to implement them in the various programming languages. | Make assumptions on precision explicit. |
Recognized Acronyms
The project acronym base list will be provided by IO.
The following acronyms have been used outside the basis list of project acronyms, and are proposed for adoption by IO POP.
Acronym | Definition and comment |
---|---|
API | Application Programming Interface |
CBS | CODAC Breakdown Structure, the division by CODAC of their equipment, using an EPICS conforming non-semantic naming convention |
DD | Data Dictionary |
DM | Data Model |
IDS | Interface Data Structure. Defines the point at which a node and its children can be used in a workflow. |
PAPI | Physics Application Programmer Interface |
PF | Poloidal Field, as in Poloidal Field system; includes all axisymmetric components, such as PF coils, CS coils, VS coils. |
PUAL | Physics User Access Layer, unique access to the Physics Data Model |
Recognized Abbreviations, prefixes and suffixes
Abbreviations
Abbreviation |
Example |
Definition and comment |
---|---|---|
|
|
Derivative of y with respect to quantity x. In this context only, t refers to time. In the definition text of the node, first derivative is assumed unless explicitly stated. If the node is a structure (parent of other nodes), containing as
children the derivatives of various quantities, y is omitted (e.g.
|
|
|
Derivative of y with respect to quantity x at constant z. In this context only, t refers to time. |
|
|
Second order derivative of y with respect to quantity x. In this context only, t refers to time. |
|
Electric field |
|
|
Magnetic field |
|
|
Electromagnetic vector potential |
|
|
Safety Factor |
|
|
Effective charge of the plasma |
|
|
Plasma current |
|
|
Plasma Internal Inductance |
|
|
Major radius (see exact definition in ITER_D_2F5MKL) |
|
|
Height in the machine coordinates (see exact definition in ITER_D_2F5MKL) |
|
|
When used in conjunction with Toroidal flux otherwise |
|
|
Poloidal angle |
|
|
Electrostatic potential |
|
|
Poloidal flux |
|
|
Electromagnetic super potential related to an MHD mode, see ref
[Antonsen/Lane Phys Fluids 23(6) 1980, formula 34], so that
|
|
|
Coordinates of N-Dimensional grids (from leftmost to rightmost) |
|
|
Poloidal Field |
|
|
Toroidal Field |
|
|
radian, used as units |
|
|
UTC time, a string to give absolute time |
|
|
Electron cyclotron (heating and current drive) |
|
|
Electron cyclotron emission |
|
|
Ion cyclotron (heating and current drive) |
|
|
Lower hybrid (heating and current drive) |
|
|
Poloidal mode number |
|
|
Toroidal mode number |
|
|
Magnetohydrodynamic |
|
|
Neoclassical Tearing Mode |
|
|
Neoclassical toroidal viscosity |
|
|
Neutral beam injection |
|
|
Atomic mass |
|
|
Nuclear charge |
|
|
Related to the vector product between electric and magnetic fields (\(E\times B\)), can be used between underscores at any place in the node name. |
|
|
General grid description |
|
|
|
Wave vector |
|
|
Wave refractive index |
|
|
Energy confinement time enhancement factor (with respect to a scaling expression), or related to H-mode. |
|
|
Atomic, molecular, nuclear, and surface related data |
|
|
Resonant Magnetic Perturbations |
|
|
Analog-to-Digital Converter |
|
|
Motional Stark Effect |
|
bes structure in the |
Beam Emission Spectroscopy |
|
|
Infrared |
|
|
Gyroaveraged |
|
|
Technical limit of a system. Always used with the suffix |
|
|
Plasma Control System |
|
|
Left term divided by right term |
|
|
Left term multiplied by right term |
|
|
Fiber Optic Current Sensor (diagnostic) |
Prefixes
Prefix |
Example |
Definition and comment |
---|---|---|
|
Electron temperature |
|
|
Ion temperature |
|
|
Electron density |
|
|
Ion density |
|
|
Characteristic time |
|
|
Current density, authorized only as a prefix |
|
|
Ion current density, authorized only as a prefix |
|
|
Voltage or electric potential, authorized only as a prefix |
|
|
Electromagnetic |
|
|
In the |
Additive correction to static data (see G1.2) or more generally a quantity defined relatively to another one. The name of the original quantity is used after the prefix |
Suffixes
Suffix |
Example |
Definition and comment |
---|---|---|
|
|
Minimum value of, not to be confused with minute, which is not a standard IM unit |
|
|
Maximum value of |
|
|
Sign of |
|
|
Number of …, requires underscore. |
|
|
|
|
|
Poloidal |
|
|
Normalised |
|
|
1-dimension, 2-dimensions, … |
|
High field side |
|
|
Low field side |
|
|
|
Parallel component with respect to the local magnetic field |
|
|
Perpendicular component with respect to the local magnetic field |
|
|
Denotes an integer quantity equivalent to a boolean with the following convention: .FALSE. = 0 and .TRUE. = 1. Boolean types don’t exist in all IMAS HLI languages and are thus not allowed in the Data Dictionary |
|
|
Standard deviation of a quantity |
Forbidden names
A list of known forbidden names is given below but it is not exhaustive: any name that creates potential conflicts with programming languages used by the IMAS is forbidden as a node name.
Forbidden name |
Motivation |
---|---|
|
Language conflict |
|
Language conflict |
|
Language conflict |
All single character names, with the exceptions listed in section 6.1 |
Clarity of the naming |
A more exhaustive list of forbidden names can be found in reserved_names.txt.
List of the existing data types
The following data types are available for data nodes (as opposed to parent nodes which have children, see rule R3.2):
Data type |
Definition |
---|---|
|
Integer and arrays of integers |
|
Real and arrays of reals |
|
String and arrays of strings |
|
Complex number and arrays of complex numbers. |
|
Series of bytes. Note: not implemented in the Access Layer. |