The Forgotten Requirement of Adding Ping Operations to APIs

I have worked with many different solutions during my career as a software developer and architect, most of the time in the middleware domain and mostly with services of various kinds. These services are (hopefully) designed and built according to business requirements and business standards.

As soon as these services reach production, the need to check their operational state arises. In a monolithic system this can easily be covered by a single "ping service" deployed alongside the other services, since there is reason to believe that if one service is available, the whole application is available. But what happens when we move to a microservice architecture with services running in separate JVMs? Then a separate ping service is of little or no use, because it only tells you that its own process is alive.

In my opinion, a "ping operation" should be added to every microservice for monitoring purposes, and it should be part of the standard for every service you create. Ping operations should be really lightweight, requiring a minimum of processing power and time, and should simply say "Yes, the service is alive".
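As a minimal sketch of such a lightweight ping, the example below uses the HTTP server that ships with the JDK. The class name PingEndpoint, the /ping path and the "OK" body are assumptions made for illustration; any framework's equivalent would do.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: exposes one cheap ping operation over HTTP.
final class PingEndpoint {

    // Assumed path; pick whatever your service standard prescribes.
    static final String PING_PATH = "/ping";

    // Starts an HTTP server on the given port (0 = any free port).
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext(PING_PATH, exchange -> {
            // No database access, no downstream calls: just "yes, I am alive".
            byte[] body = "OK".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The handler deliberately does no real work, so a monitoring tool can call it often without adding noticeable load to the service.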

Sometimes there is also a need for a "deep ping". In that case you are monitoring not only whether the service is alive, but also whether the stack beneath it is up and running. One simple approach is to run a trivial query against the underlying database, such as SELECT 1 FROM DUAL (DUAL is an Oracle convention; many other databases accept a plain SELECT 1). Another is to call the ping operations of the underlying services, if those services provide them.

Doing this makes it possible to verify whether the whole application stack is up and running.

Adding a parameter to the ping operation that controls the depth of the check covers both cases: the simple, non-intrusive health check of the service and the deeper check of the application's status.
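One way the depth parameter could look is sketched below. The names DepthAwarePing, Depth and register are hypothetical, and each dependency check is just a function returning true or false; in a real service it would wrap the database query or the call to a downstream ping operation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical example class: one ping operation, two depths.
final class DepthAwarePing {

    enum Depth { SHALLOW, DEEP }

    // Named checks for underlying resources (database, other services, ...).
    private final Map<String, BooleanSupplier> dependencyChecks = new LinkedHashMap<>();

    DepthAwarePing register(String name, BooleanSupplier check) {
        dependencyChecks.put(name, check);
        return this;
    }

    // SHALLOW only reports that the service itself answers;
    // DEEP additionally runs every registered dependency check.
    Map<String, Boolean> ping(Depth depth) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        result.put("service", true);
        if (depth == Depth.DEEP) {
            dependencyChecks.forEach((name, check) -> result.put(name, safely(check)));
        }
        return result;
    }

    // A check that throws is reported as "down" instead of failing the ping.
    private static boolean safely(BooleanSupplier check) {
        try {
            return check.getAsBoolean();
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

With this shape, a frequent monitoring call can use SHALLOW and stay cheap, while DEEP is reserved for less frequent checks or for troubleshooting, since it puts load on the underlying resources.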

This kind of functionality is very rarely stated in the requirements, and in my opinion it does not need to be specified there. It should just be something that we do. This is one of my few exceptions to the YAGNI principle, simply because: "You WILL need it!"

One final tip: operational tools that monitor services through ping operations should also measure response times. If a simple ping operation starts to respond more slowly, that may be an indication of bigger problems arising in the runtime.
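A rough sketch of what such a measurement might look like on the monitoring side follows. The class TimedPing and the threshold parameter are assumptions for illustration; a real monitoring tool would typically track trends over many calls rather than judge a single one.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical example class: times one ping call and flags slow answers.
final class TimedPing {

    static final class Result {
        final boolean alive;     // did the ping succeed?
        final Duration elapsed;  // how long did it take?
        final boolean slow;      // did it exceed the threshold?

        Result(boolean alive, Duration elapsed, boolean slow) {
            this.alive = alive;
            this.elapsed = elapsed;
            this.slow = slow;
        }
    }

    // Runs the given ping call and compares its duration to slowThreshold.
    static Result measure(BooleanSupplier ping, Duration slowThreshold) {
        long start = System.nanoTime();
        boolean alive;
        try {
            alive = ping.getAsBoolean();
        } catch (RuntimeException e) {
            alive = false; // a failing ping is still timed
        }
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        return new Result(alive, elapsed, elapsed.compareTo(slowThreshold) > 0);
    }
}
```

Here the ping call is passed in as a function, so the same measurement works whether the ping is an HTTP request, a message on a queue or a local method call.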