Real Time Maritime Picture with GeoServer
In this post we briefly describe the architecture we put in place, and the challenges we had to overcome, to ingest and publish real time ship position data at scale. The aim is to let users visualize the so-called real time maritime picture for the world's seas and oceans through GeoServer OGC services (WMS, WFS and WPS), while finding the balance that maximizes both ingestion and visualization performance.
The goal of the project is to display, for each vessel, the latest known position reported in the last 48 hours, while enforcing authorization rights so that the information shown depends on the permissions of the user making the request (see pictures below). The received vessel positions must also be enriched with several additional data sets, for example fisheries information, to improve subsequent querying capabilities.
Maritime data is produced by a variety of sources (AIS, SAR, VMS, …) and maritime assets (vessels, ports, navigational aid systems, …) that, combined, provide a foundation for informed decision-making applications for activities such as maritime traffic monitoring, search and rescue operations and environmental marine disaster monitoring, just to name a few. The amount of maritime data collected per day is significant, and it usually arrives as a stream that needs to be processed, enriched and stored in near-real time. In a typical day (24 hours) we receive between 10 and 20 million positions, reported by around 300,000 ships, with peaks of activity during daylight. The system was dimensioned to handle up to 2500 positions per second. We had access to a large Oracle Exadata environment for storing positions, and using it was a requirement.
As mentioned above, data storage was based on Oracle Exadata (as required by the client) and we also had to integrate with a Kafka streaming platform to take care of the enrichment and ingestion of the data being received. A few custom GeoServer extensions were implemented to handle the authorization complexity, the advanced styling needs and data integration needs of maritime data (see picture below).
The authorization requirements are quite variable. They can be defined based on several criteria, for example geographical areas, the country that reported the position or the sensor type that reported the position. Authorization rights affect not only how the data is queried but also how it is stored. Users with different authorization rights may see a different last position for the same ship, so we can't simply store the last reported position for each ship and overwrite the existing one. Instead, for each ship we need to store the minimum set of latest reported positions that satisfies the defined authorization rules.
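To make the storage requirement concrete, here is a minimal in-memory sketch of the idea, not the project's actual implementation. It assumes each authorization rule can be expressed as a predicate over a position, and it keeps the newest matching position per ship and per rule. The `Position` fields, the rule names and the `LatestPositions` class are all hypothetical and chosen for illustration only.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, simplified position: the real schema is not described in this post.
record Position(String shipId, Instant reportedAt, String country, String sensor) {}

// Keeps, for every ship, the newest position that matches each authorization rule.
class LatestPositions {
    // Rule name -> rule; a rule is assumed to be a simple predicate over a position.
    private final Map<String, Predicate<Position>> rules;
    // Ship id -> (rule name -> newest position matching that rule).
    private final Map<String, Map<String, Position>> latestByShip = new HashMap<>();

    LatestPositions(Map<String, Predicate<Position>> rules) {
        this.rules = new LinkedHashMap<>(rules);
    }

    // Registers a position, replacing older entries only for the rules it matches.
    void accept(Position p) {
        Map<String, Position> perRule =
                latestByShip.computeIfAbsent(p.shipId(), id -> new HashMap<>());
        for (Map.Entry<String, Predicate<Position>> rule : rules.entrySet()) {
            if (rule.getValue().test(p)) {
                perRule.merge(rule.getKey(), p, (old, candidate) ->
                        candidate.reportedAt().isAfter(old.reportedAt()) ? candidate : old);
            }
        }
    }

    // The last position a user holding the given rule is allowed to see.
    Optional<Position> latestFor(String shipId, String ruleName) {
        return Optional.ofNullable(latestByShip.getOrDefault(shipId, Map.of()).get(ruleName));
    }

    // The distinct positions that have to be kept for a ship.
    Set<Position> stored(String shipId) {
        return new HashSet<>(latestByShip.getOrDefault(shipId, Map.of()).values());
    }
}
```

With one rule that accepts every sensor and another that accepts only AIS, an AIS report followed by a newer VMS report leaves two stored positions for the ship, one per rule. A later AIS report matches both rules and collapses the set back to a single position.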
The ingestion engine was built on top of Kafka, Kafka Streams and Spring Boot. Initially we relied on Kafka Connect to store the processed positions in Oracle. Unfortunately, after a few stress tests, it was clear that it was not the right technology for this use case:
- Failing to insert a position stalled the Kafka Connect task in question, and it was not possible to configure a retry.
- A rebalancing of the partitions produced a dead-lock in Oracle, hinting that some sessions were not properly closed.
- We didn’t have enough control over the SQL upsert query generated by Kafka Connect or the Java prepared statement binding process.
We ended up implementing our own component to store the processed positions in Oracle:
- The processed positions were stored in an intermediary Kafka topic. This allowed us to implement a safe back-pressure mechanism to avoid overwhelming Oracle with inserts during peaks.
- We carefully crafted an SQL upsert query and the corresponding Java prepared statements so that Oracle could optimize the execution of batch statements. The performance gain on writes was very significant.
- We implemented a configurable retry mechanism with an attempt limit. By default the system tries to upsert a batch up to five times, or for up to one minute, and then rejects the batch.
- The logging periodically reports statistics on the messages processed over a configurable time interval, detailed information about storage failures, and runtime information about the storing process (rebalancing, reconnections, configuration changes, …).
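The retry behaviour described in the list above can be sketched as follows. This is only an illustration under stated assumptions, not the component we deployed: the `BatchRetry` class, the `BatchOperation` interface and the fixed pause between attempts are hypothetical. The defaults mentioned in the text correspond to five attempts and a one minute budget.

```java
import java.time.Duration;

// Retries a batch operation until it succeeds, the attempt limit is reached,
// or the time budget is exhausted; in the last two cases the batch is rejected.
class BatchRetry {

    // Hypothetical callback standing in for "upsert this batch into the database".
    interface BatchOperation {
        void run() throws Exception;
    }

    // Returns true if the batch was stored, false if it was rejected.
    static boolean execute(BatchOperation op, int maxAttempts,
                           Duration maxElapsed, Duration pause) {
        long deadline = System.nanoTime() + maxElapsed.toNanos();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.run();
                return true;
            } catch (Exception e) {
                boolean lastAttempt = attempt == maxAttempts;
                boolean outOfTime = System.nanoTime() - deadline >= 0;
                if (lastAttempt || outOfTime) {
                    return false; // give up: the caller logs and rejects the batch
                }
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false; // only reached when maxAttempts is zero or negative
    }
}
```

A caller would wrap the JDBC batch execution (for example a `PreparedStatement.executeBatch()` call) in the `BatchOperation`, and route rejected batches to whatever failure handling is appropriate, such as detailed logging.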
The storage engine (Oracle Exadata) is a critical component of the system: it needs to sustain the write rate of the ingestion and, at the same time, the query rate of GeoServer. Finding the balance that maximizes both write and read operations required putting end-to-end monitoring in place and carefully crafting a performance test plan. We came to the conclusion that Oracle spatial indexing was not the best option for the required write and query load. Instead, GeoServer was extended to use latitudes and longitudes directly, building the final geometry and translating spatial operations on the fly. This allowed us to take advantage of the well-polished Oracle B-Tree index to index latitudes and longitudes.
The new version of the system was significantly faster than the previous one (performance testing was an extensive task in this project). All the implemented components were designed as distributed systems and are therefore horizontally scalable.
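As a rough illustration of translating a spatial operation into plain numeric comparisons, the sketch below turns a bounding box filter into range predicates that a B-Tree index on the coordinate columns can serve, and builds a point geometry (as WKT) from the two numbers. It is not the GeoServer extension itself: the `LAT` and `LON` column names, the `RangeFilter` record and the `BboxTranslator` class are assumptions made for this example.

```java
import java.util.List;

// A parameterized SQL predicate plus the values to bind, in order.
record RangeFilter(String sql, List<Double> params) {}

// Translates bounding box queries into range predicates on hypothetical
// numeric LAT and LON columns.
class BboxTranslator {

    static RangeFilter translate(double minLon, double minLat,
                                 double maxLon, double maxLat) {
        if (minLat > maxLat) {
            throw new IllegalArgumentException("minLat must not exceed maxLat");
        }
        if (minLon <= maxLon) {
            // Ordinary box: both coordinates are simple ranges.
            return new RangeFilter(
                    "LAT BETWEEN ? AND ? AND LON BETWEEN ? AND ?",
                    List.of(minLat, maxLat, minLon, maxLon));
        }
        // Box crossing the antimeridian: longitude wraps from +180 to -180.
        return new RangeFilter(
                "LAT BETWEEN ? AND ? AND (LON >= ? OR LON <= ?)",
                List.of(minLat, maxLat, minLon, maxLon));
    }

    // Builds the geometry on the fly from the stored numbers, here as WKT.
    static String toWkt(double lon, double lat) {
        return "POINT (" + lon + " " + lat + ")";
    }
}
```

A map request for a given extent then becomes a plain range scan on the coordinate columns. More complex spatial filters would need their own translation, or a coarse bounding box pre-filter followed by an exact check in memory.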
Visualizing the real time maritime picture
The real time maritime picture can be visualized in different projections; polar projections are often relevant in this use case (see below).
Several styles were implemented, each one providing a unique view of the real time maritime picture (see pictures below). The vessel symbol is rotated according to the vessel's course. The styling is based on the vessel information, the course of the vessel and all the enrichment data available for the vessel.
It is possible to filter, as well as highlight, vessels based on any of the available attributes. Highlighting is particularly challenging in this use case: we need to highlight a significant number of fast-moving features, so we have to highlight and paint them at the same time to avoid any glitches (see picture below).
Ongoing and future work
We are working on improving the rendering of the vessel positions: adding more styling options and making it possible, at high zoom levels, to visualize an approximate shape matching the vessel’s real dimensions.
We are also planning the ingestion of vessel tracks and their visualization through GeoServer. This task will force us to push the boundaries of our system, extending its integration with big data technologies and allowing GeoServer to query cloud-based long-term storage.
We are proud to work on such challenging projects, where we can see our work having an impact on important use cases and making a difference.
If you are interested in learning how we can help you achieve your goals with open source products like GeoServer, MapStore, GeoNode and GeoNetwork through our Enterprise Support Services, Subscription Services or Professional Training Services offers, feel free to contact us!
The GeoSolutions Team,