Serving Meteo data with GeoServer, GeoBatch and GeoNetwork: the LaMMA use case
In this post I’d like to talk about the work we have done for the LaMMA consortium.
The purpose of this project is to build a complete Spatial Data Infrastructure (SDI) providing spatio-temporal raster data processing, publishing, and interactive visualisation facilities. This platform is a candidate to replace the current one, which was also built on Open Source software but was rather static and exposed no OGC services.
The data ingested into the system is generated by an existing processing infrastructure which produces a set of different MetOc models. Our goal is to manage the geophysical parameters (or variables) produced by the following models:
- ARW ECM
  - 3 km resolution
  - 9 km resolution
- GFS
  - 50 km resolution
The ingestion starts every day at noon and at midnight, hence there are two run times a day for each model at a given resolution, and the produced data contains a number of different forecast times:
- ARW ECM (3 days with an interval of 1 h)
- GFS (8 days with an interval of 6 h)
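To make the scheduling concrete, the following small sketch (function and variable names are ours, not part of the actual system) enumerates the forecast instants produced by a single run of each model:

```python
from datetime import datetime, timedelta

def forecast_times(run_time, span_days, step_hours):
    """Enumerate the forecast instants produced by one model run,
    from the run time itself up to run_time + span_days (inclusive)."""
    steps = span_days * 24 // step_hours
    return [run_time + timedelta(hours=h * step_hours) for h in range(steps + 1)]

# Two run times a day: noon and midnight.
run = datetime(2013, 5, 1, 12, 0)

arw_ecm = forecast_times(run, span_days=3, step_hours=1)  # 3 days, 1 h interval
gfs     = forecast_times(run, span_days=8, step_hours=6)  # 8 days, 6 h interval

print(len(arw_ecm))  # 73 forecast instants
print(len(gfs))      # 33 forecast instants
```

Each run thus yields a few dozen forecast instants per geophysical parameter, which is what drives the amount of data produced downstream.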
The data is produced in GRIB format (version 1).
Leveraging the OpenSDI Suite, specifically its GeoServer, GeoNetwork and GeoBatch components, as well as some other well-known Open Source projects (Apache Tomcat, Apache HTTP Server, PostgreSQL), we provided an extensible, standards-based platform to automatically ingest and publish data.
The infrastructure we have put together is depicted in the deployment diagram below.
This infrastructure has been designed from the beginning to scale to a large number of external users: it is based on a GeoServer master/slave setup where multiple slaves can be installed for higher throughput. Caching will be tackled in a subsequent phase.
As you can see, we provide three access levels for different types of users:
- Admins can locally access the entire infrastructure and add GeoServer instances to the cluster to improve performance
- Power users can remotely submit files for ingestion and administer GeoBatch via Basic Authentication
- Users can browse the ingested data by accessing one of the GeoServer slave machines through the Apache httpd proxy server; the load of these accesses is distributed among all available slaves
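As a rough illustration of the user-facing tier, an Apache httpd reverse proxy with mod_proxy_balancer can spread requests across the slaves; the fragment below is purely illustrative (hostnames, ports and paths are invented, and the project's actual proxy configuration is not shown here):

```apache
# Hypothetical httpd.conf fragment: balance /geoserver across two slaves.
<Proxy "balancer://geoserver-slaves">
    BalancerMember "http://slave1:8080/geoserver"
    BalancerMember "http://slave2:8080/geoserver"
    ProxySet lbmethod=byrequests
</Proxy>
ProxyPass        "/geoserver" "balancer://geoserver-slaves"
ProxyPassReverse "/geoserver" "balancer://geoserver-slaves"
```

With this kind of setup, adding a slave to the cluster only requires one more BalancerMember line and a graceful restart of httpd.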
As mentioned above, the main building blocks are as follows:
- GeoServer, for providing WMS, WCS and WFS services with support for the TIME and ELEVATION dimensions
- GeoNetwork, for publishing metadata for all the data, with specific customizations for managing the TIME dimension in the datasets
- GeoBatch, to perform preprocessing and near-real-time ingestion of data and related metadata with minimal human intervention
Using GeoBatch for ingestion and data preprocessing
|GeoBatch ingestion flow example|
The various building blocks comprising this flow are explained below:
- NetCDF2GeotiffAction reads the incoming GRIB file and produces a proper set of GeoTIFFs, performing on-the-fly tiling, pyramiding and unit conversions. Each GeoTIFF represents a 2D slice out of one of the original 4D cubes contained in the source GRIB file
- ImageMosaicAction uses the GeoServer Manager library to create the ImageMosaic store and layer in the GeoServer Master. The created ImageMosaic contains the proper configuration to parse the Time and Elevation dimensions’ values from the GeoTIFFs in order to create 4D layers in GeoServer.
- XstreamAction takes an XML file and deserializes it into a Java object, which is passed on to the next action.
- FreeMarkerAction produces a proper XML metadata file for publishing in GeoNetwork, using a pre-cooked template and the passed data model.
- GeoNetworkAction publishes the metadata on the target GeoNetwork
- ReloadAction forces a reload on all the GeoServer slaves in order to pick up the changes done by the master instance
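The 4D-to-2D decomposition performed by the first action can be sketched as follows; the file-name pattern and the elevation levels are purely illustrative (the real flow derives them from its own configuration), but they show why encoding the dimensions in the granule names lets the ImageMosaic parse them back later:

```python
from datetime import datetime, timedelta
from itertools import product

def slice_names(parameter, run_time, forecast_hours, elevations):
    """Name one 2D GeoTIFF per (forecast time, elevation) pair of a 4D cube.
    Encoding both dimensions in the file name allows the mosaic indexer to
    recover them with regex-based property collectors."""
    names = []
    for hours, z in product(forecast_hours, elevations):
        t = run_time + timedelta(hours=hours)
        names.append(f"{parameter}_{t:%Y%m%dT%H%M%S}_{z:04d}.tiff")
    return names

run = datetime(2013, 5, 1, 0, 0)
# 73 hourly forecast instants, 3 (made-up) pressure levels in hPa:
tiffs = slice_names("temperature", run,
                    forecast_hours=range(0, 73), elevations=[0, 850, 500])
print(len(tiffs))  # 73 forecast times x 3 levels = 219 slices
```

One 4D cube for a single parameter thus expands into a couple of hundred granules per run, which is why the whole chain has to run unattended.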
This type of flow (with a slightly different setup for each) is used to convert and publish the three different incoming models.
The other type of flow is the remove flow, which is composed of the following building blocks:
- ScriptingAction executes a remove.groovy script which will:
  - calculate the oldest time to retain
  - select the older files to be removed
  - search for and remove the matching metadata from GeoNetwork
  - remove the collected layers and stores from the GeoServer Master catalog
  - permanently delete the successfully removed files
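Conceptually, the retention step of that script boils down to something like the sketch below (the 7-day retention window is an assumption of ours, not the project’s actual setting, and the real logic lives in remove.groovy rather than Python):

```python
from datetime import datetime, timedelta

def partition_runs(run_times, now, retention_days=7):
    """Split the known run times into those to keep and those to remove:
    everything older than `now - retention_days` is a removal candidate."""
    oldest_to_retain = now - timedelta(days=retention_days)
    keep   = [t for t in run_times if t >= oldest_to_retain]
    remove = [t for t in run_times if t < oldest_to_retain]
    return keep, remove

now = datetime(2013, 5, 10, 12, 0)
runs = [now - timedelta(hours=12 * i) for i in range(30)]  # last 30 half-daily runs
keep, remove = partition_runs(runs, now)
print(len(keep), len(remove))  # 15 kept, 15 scheduled for removal
```

Everything in the `remove` set then drives the metadata, catalog and file-system cleanups listed above.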
Using GeoNetwork for metadata management
We have customized the metadata indexing (thanks, Lucene!) in GeoNetwork in order to be able to index meteorological model executions in terms of both their run times and their forecast times.
Generally speaking, the data we are dealing with is driven by a meteorological model which daily produces a certain number of geophysical parameters whose temporal validity spans a certain number of time instants (forecast times) in the future. In GeoNetwork we currently create a new metadata object for each geophysical parameter (e.g. Temperature) of a new model run; this metadata object contains multiple links to WMS requests, one for each forecast time, leveraging the TIME dimension in GeoServer (see picture below). Moreover, the forecast times themselves are indexed so that advanced searches can be performed on them.
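For illustration, the per-forecast-time WMS links stored in such a metadata object would look roughly like the ones built below; the host, workspace and layer names are invented, and only the TIME parameter handling reflects how GeoServer’s time dimension is addressed:

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

def wms_time_links(base_url, layer, forecast_times):
    """Build one WMS GetMap URL per forecast time, using the TIME dimension."""
    links = []
    for t in forecast_times:
        params = {
            "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
            "LAYERS": layer, "SRS": "EPSG:4326", "BBOX": "-180,-90,180,90",
            "WIDTH": "512", "HEIGHT": "256", "FORMAT": "image/png",
            "TIME": t.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        }
        links.append(f"{base_url}?{urlencode(params)}")
    return links

run = datetime(2013, 5, 1, 12, 0)
times = [run + timedelta(hours=h) for h in range(0, 73, 24)]  # daily samples
links = wms_time_links("http://example.org/geoserver/wms",
                       "lamma:temperature", times)
print(len(links))  # one GetMap link per sampled forecast time
```

A metadata record for one parameter of one run simply carries one such link per forecast instant, so a search hit leads the user straight to the right map at the right time.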
The GeoSolutions team,