File naming convention

The OGCR project uses a standard file-naming convention for all newly submitted datasets (geospatial layers, images, vectors and tabular data), comparable to e.g. the MODIS file naming convention. This ensures consistency and ease of use both within the OGCR project and for end-users. It applies especially to WP3, WP4, WP5 and WP6. The 12-field file naming convention is listed below.

Table: An OGCR file name has a total of 12 fields.

| # | Component | Description | Example |
|---|-----------|-------------|---------|
| 1 | Product short name | Identifies the product and platform (e.g. based on the Task code); 8 characters maximum. Product code names are available in the OGCR product registry (under construction). It is recommended that each unique product name come with detailed technical documentation, i.e. a list of DOIs of peer-reviewed articles and/or URLs linking to computational notebooks on Codeberg.org or similar. | ogcrt31 |
| 2 | Variable short name | Identifies a physical variable (layer within a collection); 8 characters maximum. All OGCR variables are registered in the OGCR variable registry (under construction). | agbtha |
| 3 | Variable type | Variable type in terms of statistical distribution and/or aggregation level. See the list of possible items below. | m |
| 4 | Spatial resolution | Spatial resolution in metric units or decimal degrees (DD), e.g. 10m or 30m for metric values, and 00025dd for 0.00025 decimal degrees (about 30 m spatial resolution). | 10m |
| 5 | Vertical reference | Can be above ground (ag), at surface (sg) or below ground (bg), and can indicate depth intervals. | bg0..30cm |
| 6 | Begin time (temporal reference) | Begin time of the prediction interval, or begin time of observations or measurements, in compact calendar format (YYYYMMDDHHMMSS). Should specify at least the year of field sampling (begin time). | 20250101 |
| 7 | End time (temporal reference) | End time of the prediction interval, or end time of observations or measurements, in compact calendar format (YYYYMMDDHHMMSS). Should specify at least the year of field sampling (end time). | 20251231 |
| 8 | Bounding box and tile ID or open location code | Unique code and version determining the bounding box (mosaics and/or tiling ID). For mosaics use the code without ID. For individual farms and arbitrary bounding boxes, preferably use Open Location Code or a similar external standard/coding system. | eu01..id0005 |
| 9 | Projection system | The reference EPSG code. | epsg3035 |
| 10 | Collection version | Indicates the version of the processing collection based on semantic versioning, V<major>.<minor>.<patch>. | v1.0.1 |
| 11 | Scientific readiness level | Scientific readiness level (SRL) based on the ESA Scientific Readiness Levels (SRL) Handbook and/or Technology readiness level (TRL) based on the European Commission regulations. | srl03 |
| 12 | File format | The data format of the file, commonly tif for images. | .tif |
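As a minimal sketch of how the 12 fields could be assembled into a file name, the snippet below joins the first 11 fields with underscores and appends the file format after a dot. The function name `build_ogcr_name` and its checks are hypothetical and not part of any OGCR tooling; the only rules enforced are the ones stated in the table (8 characters maximum for the product and variable short names), plus the assumption that no field may contain an underscore. The values used are simply the example column of the table concatenated, so the combination is illustrative only.

```python
def build_ogcr_name(product, variable, var_type, resolution, vertical,
                    begin, end, bbox, projection, version, readiness,
                    file_format="tif"):
    """Join the 11 name fields with "_" and append the file format (hypothetical helper)."""
    # The table limits product and variable short names to 8 characters.
    for label, value in (("product", product), ("variable", variable)):
        if len(value) > 8:
            raise ValueError(f"{label} short name exceeds 8 characters: {value!r}")
    fields = [product, variable, var_type, resolution, vertical,
              begin, end, bbox, projection, version, readiness]
    # Assumption: "_" is reserved as the field separator, so fields must not contain it.
    for value in fields:
        if not value or "_" in value:
            raise ValueError(f"empty field or field containing '_': {value!r}")
    return "_".join(fields) + "." + file_format


# Example values taken from the "Example" column of the table.
name = build_ogcr_name("ogcrt31", "agbtha", "m", "10m", "bg0..30cm",
                       "20250101", "20251231", "eu01..id0005",
                       "epsg3035", "v1.0.1", "srl03")
print(name)
```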

This is an example of an early pre-beta release of soil organic carbon density predictions (mosaic) for the years 2023–2024 (two-year period) at 30 m spatial resolution for topsoil (0–30 cm), which can be used for testing purposes:

ogcrt33_sockgm3_m_30m_bg0..30cm_20230101_20241231_eu01_epsg3035_v1.0.1_srl04.tif
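A file name like the one above can be split back into its fields without opening the file. The following is a minimal sketch, assuming that the underscore is used only as a field separator and that the file format follows the last dot; the names `OgcrName` and `parse_ogcr_name` are hypothetical and chosen for illustration.

```python
from typing import NamedTuple


class OgcrName(NamedTuple):
    """The 12 fields of an OGCR file name, in table order (hypothetical container)."""
    product: str
    variable: str
    var_type: str
    resolution: str
    vertical: str
    begin: str
    end: str
    bbox: str
    projection: str
    version: str
    readiness: str
    file_format: str


def parse_ogcr_name(filename: str) -> OgcrName:
    # Split on the LAST dot only: the version (v1.0.1) and depth interval
    # (bg0..30cm) fields contain dots themselves.
    stem, sep, file_format = filename.rpartition(".")
    if not sep:
        raise ValueError(f"no file format suffix in {filename!r}")
    parts = stem.split("_")
    if len(parts) != 11:
        raise ValueError(f"expected 11 '_'-separated fields, got {len(parts)}")
    return OgcrName(*parts, file_format)


fields = parse_ogcr_name(
    "ogcrt33_sockgm3_m_30m_bg0..30cm_20230101_20241231_eu01_epsg3035_v1.0.1_srl04.tif"
)
print(fields.variable, fields.resolution, fields.begin, fields.end)
```

Keeping the fields as plain strings avoids guessing at encodings (for example how a `dd` resolution maps to a number); any such conversion would need to follow the registry once it is published.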

The following variable types are currently supported:

  • m = mean i.e. mean prediction value (applies to sums, differences and similar);
  • d = median value; equivalent to Q050;
  • c = class or factor; requires a domain with codes/levels;
  • cdD = change class (special cross-domain 2D matrix);
  • pc = percent cover within the pixel, i.e. values 0–100%;
  • p = probability i.e. values 0–100%;
  • q005 = 5% probability quantile (one side probability);
  • q095 = 95% probability quantile (one side probability);
  • l159 = lower 68% probability threshold (lower prediction interval);
  • u841 = upper 68% probability threshold (upper prediction interval);
  • sse = Shannon Scaled Entropy index;
  • sd = standard deviation or prediction error; for multiples of the standard deviation use e.g. “SD2”;
  • md = model deviation (in the case of ensemble predictions);
  • si = confidence interval range for the prediction of the mean value;
  • td = cumulative difference (usually based on time-series of values);
  • tm = temporal trend (usually beta coefficient fitted to time-series).

The following TRLs are recommended for this project:

  • TRL03 – Experimental proof of concept: early pre-beta project release for testing purposes only;
  • TRL04 – Technology validated in lab: early beta project release, validated and attached to a scientific publication; can be released publicly with a clear disclaimer (use at own risk);
  • TRL05 – Technology validated in relevant environment (industrially relevant environment in the case of key enabling technologies): beta project release validated and ready for use including for commercial use (use at own risk);
  • TRL06 – Technology demonstrated in relevant environment (industrially relevant environment in the case of key enabling technologies): official project release ready for use including for commercial use under a clear disclaimer;
  • TRL07 – System prototype demonstration in operational environment: official final release ready for use including for commercial use and for decision making and policy support.

The following SRLs are recommended for this project:

[Figure: Standard Science Readiness Levels used by ESA.]

Note that the derivatives from basic products should inherit most of the specifications determined by the input products e.g.:

ogcrt33_bsockgm3_tm_30m_bg0..30cm_20000101_20241231_eu01_epsg3035_v1.0.1_srl04.tif

is the trend (beta coefficient) derived from a time-series of tif files, representing changes in SOC across 25 years.
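The inheritance rule can be sketched in code. The snippet below is an illustration under stated assumptions, not a prescribed procedure: the helper `derive_name` is hypothetical, the 25 yearly input names are made up for the example, and the choice to take the earliest begin time and latest end time and to copy version and readiness level from the first input is an assumption consistent with the example above.

```python
def derive_name(inputs, var_type, variable=None):
    """Build the name of a derivative from its input names (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input name is required")
    rows = []
    for name in inputs:
        stem, _, file_format = name.rpartition(".")
        parts = stem.split("_")
        if len(parts) != 11:
            raise ValueError(f"unexpected number of fields in {name!r}")
        rows.append(parts + [file_format])
    # Resolution (3), vertical reference (4), bounding box (7) and
    # projection (8) must agree across inputs to be inherited.
    for idx in (3, 4, 7, 8):
        if len({row[idx] for row in rows}) != 1:
            raise ValueError(f"inputs disagree on field {idx + 1}")
    out = list(rows[0][:11])
    if variable is not None:
        out[1] = variable                   # optionally rename the variable
    out[2] = var_type                       # e.g. "tm" for temporal trend
    out[5] = min(row[5] for row in rows)    # earliest begin time
    out[6] = max(row[6] for row in rows)    # latest end time
    return "_".join(out) + "." + rows[0][11]


# Hypothetical yearly inputs for 2000..2024 (names invented for illustration).
yearly = [
    f"ogcrt33_sockgm3_m_30m_bg0..30cm_{y}0101_{y}1231_eu01_epsg3035_v1.0.1_srl04.tif"
    for y in range(2000, 2025)
]
derived = derive_name(yearly, var_type="tm", variable="bsockgm3")
print(derived)
```

Comparing the time fields as strings works here because all inputs use the same fixed-width numeric format.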

Note that this file naming convention has the following advantages:

  • Large quantities of files can be easily sorted and searched (one line queries in Bash). GDAL operations can be generated by using codes in the file name.
  • File-naming patterns can be used to seamlessly build virtual mosaics and composites.
  • Key spatiotemporal properties of the data are available in the file name e.g. variable type, O&M method, spatial resolution, bounding box, projection system, temporal references. Users can program analysis without opening or testing files.
  • Tiles, farm plots and complete mosaics can all be used.
  • The versioning system is ubiquitous.
  • All file-names are unique.
  • Dataset limitations and terms of use are indicated in the file name, which is linked to a generic disclaimer and terms of use.
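To illustrate the first two advantages, the sketch below selects all tiles of one layer with a shell-style pattern and prepares the argument list for a virtual mosaic. The file names in `names` are invented for the example, and the command is only assembled, not executed; `gdalbuildvrt` is the GDAL utility that takes an output VRT followed by input files. Naming the mosaic with the bounding-box code without the tile ID follows the rule given in the table.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing (names invented for illustration).
names = [
    "ogcrt33_sockgm3_m_30m_bg0..30cm_20230101_20241231_eu01..id0006_epsg3035_v1.0.1_srl04.tif",
    "ogcrt33_sockgm3_m_30m_bg0..30cm_20230101_20241231_eu01..id0005_epsg3035_v1.0.1_srl04.tif",
    "ogcrt33_sockgm3_sd_30m_bg0..30cm_20230101_20241231_eu01..id0005_epsg3035_v1.0.1_srl04.tif",
    "ogcrt33_sockgm3_m_30m_bg0..30cm_20210101_20221231_eu01..id0005_epsg3035_v1.0.1_srl04.tif",
]

# Only the tile ID is a wildcard: mean (m) predictions for 2023-2024.
pattern = ("ogcrt33_sockgm3_m_30m_bg0..30cm_20230101_20241231_"
           "eu01..id*_epsg3035_v1.0.1_srl04.tif")
tiles = sorted(n for n in names if fnmatchcase(n, pattern))

# Mosaic name: same fields, bounding-box code without the tile ID, .vrt suffix.
mosaic = pattern.replace("eu01..id*", "eu01").rsplit(".", 1)[0] + ".vrt"

# Argument list for GDAL's gdalbuildvrt (output first, then inputs); not run here.
command = ["gdalbuildvrt", mosaic, *tiles]
print(" ".join(command))
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every operating system.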

A list of vocabularies to be used as abbreviated names of variables will be provided.