Patch release: v7.4.1

15 May 2023

We're pleased to announce that the next patch release of sharedo, version 7.4.1, is in general availability as of this morning, 15th May 2023.

This release is a stability/patch release specifically for the event engine.

Event engine stability enhancements

This patch focuses purely on improving the stability of the event/execution engine in some rare edge cases that could leave the engine unavailable until someone manually intervened to restore service.

Role restarts with more reliability

Processing roles are designed to fail fast and restart. When a configurator creates an erroneous plan, for example one with a syntax error or a missing step, the role fails and exits. The service manager detects this and restarts the role.

In some scenarios the service manager could miss a failed role because of a race condition in the server's process eventing mechanisms. If the role termination event was hooked only after the process had fully started, and the process had already exited by then, the event was never attached and the service manager did not detect the failure of the role.

This has been mitigated by ensuring that the process termination events are always attached before the process starts, which makes role restarts more reliable and removes the need for manual intervention.
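To illustrate the ordering problem, here is a minimal sketch in Python. It is not the sharedo implementation: `RoleProcess`, `ServiceManager`, `launch_unsafe` and `launch_safe` are hypothetical names, and the process is simulated in memory so that the race is deterministic. The only point it makes is that a handler attached after the process has already exited never fires.

```python
from typing import Callable, List


class RoleProcess:
    """Hypothetical stand-in for a role process (not the sharedo API)."""

    def __init__(self, name: str, crash_on_start: bool = False) -> None:
        self.name = name
        self.crash_on_start = crash_on_start
        self.exited = False
        self._exit_handlers: List[Callable[["RoleProcess"], None]] = []

    def on_exit(self, handler: Callable[["RoleProcess"], None]) -> None:
        # Handlers only fire for exits that happen after they are attached.
        self._exit_handlers.append(handler)

    def start(self) -> None:
        # Simulates a role that fails fast, e.g. because of an erroneous plan.
        if self.crash_on_start:
            self._exit()

    def _exit(self) -> None:
        self.exited = True
        for handler in list(self._exit_handlers):
            handler(self)


class ServiceManager:
    """Hypothetical supervisor that records the failures it detects."""

    def __init__(self) -> None:
        self.detected_failures: List[str] = []

    def _role_exited(self, process: RoleProcess) -> None:
        self.detected_failures.append(process.name)

    def launch_unsafe(self, process: RoleProcess) -> None:
        # Start first, hook second: an immediate exit goes unnoticed.
        process.start()
        process.on_exit(self._role_exited)

    def launch_safe(self, process: RoleProcess) -> None:
        # Hook first, start second: every exit is observed.
        process.on_exit(self._role_exited)
        process.start()
```

With a role that crashes on start, `launch_unsafe` leaves `detected_failures` empty, while `launch_safe` records the failure so that a restart can be triggered.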

Always restart by default

Further to that, roles can be configured to detect crashes and restart within a given period of time. Originally this was designed to detect a role that constantly failed within a set time window and to keep it shut down, avoiding additional processing overhead. If the role crashed and restarted enough times within that window, the service manager would not attempt to restart it again.

Given recent improvements in role start up time and processing capacity for these roles, this has been changed to always restart roles, regardless of how many times they have shut down or within what time window. This avoids the scenario where an erroneous user script shuts down a role, and the role is restarted so quickly that it exhausts its retries. The event engine will now try to start failing roles indefinitely.
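The difference between the two behaviours can be sketched as a small restart policy. This is an illustration only: `RestartPolicy`, `max_restarts` and `window_seconds` are hypothetical names, and the values used in the usage note are made up, since the actual configuration settings are not described here.

```python
from collections import deque
from typing import Deque, Optional


class RestartPolicy:
    """Hypothetical restart policy (not the sharedo configuration model).

    max_restarts=None models the new default: always restart.
    An integer models the earlier behaviour: stop restarting once the role
    has crashed more than max_restarts times within window_seconds.
    """

    def __init__(self, max_restarts: Optional[int] = None,
                 window_seconds: float = 60.0) -> None:
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._crash_times: Deque[float] = deque()

    def should_restart(self, now: float) -> bool:
        """Record a crash at time `now` and decide whether to restart."""
        if self.max_restarts is None:
            return True  # new default: retry indefinitely
        self._crash_times.append(now)
        # Forget crashes that have fallen outside the sliding window.
        while self._crash_times and now - self._crash_times[0] > self.window_seconds:
            self._crash_times.popleft()
        return len(self._crash_times) <= self.max_restarts
```

For example, `RestartPolicy(max_restarts=3, window_seconds=60)` allows three restarts for crashes at 0, 1 and 2 seconds and refuses a fourth at 3 seconds, which is the kind of rapid exhaustion described above. `RestartPolicy()` returns `True` for every crash.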