Systems are built to be production ready, tolerant of failure, self-healing and monitorable.
Building a larger number of smaller systems and running them in cloud environments makes handling failure more important than ever. A self-healing system must still raise alerts that provide enough information to investigate the underlying problem.
- Release It! (Michael Nygard) will be used as the starting point for defining what production ready means.
- The system should be sufficiently monitored, and should understand and report on its own health and the health of the systems it depends upon in a consistent way.
- Production monitoring should be available to all!
- Teams and business owners will need to work together to identify relevant business and technical KPIs for their product.
- Dashboards and alerting will need to be implemented for the most important KPIs.
- Not all risks to production readiness can be analysed in advance, so in addition to checking KPIs, exploratory testing should be used to expose new information about software behaviour.
- Applications should be built with a diversity of stakeholders in mind. Operability and supportability are important in most contexts, but see [https://en.wikipedia.org/wiki/List_of_system_quality_attributes] for a list of other software 'ilities' that may need particular consideration in your context.
- The 1JL Category Service, Elasticsearch service and apps built on the digital platform have been built with monitoring and self-healing in mind, using Grafana and Prometheus.
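
The principle that a system should report on its own health and the health of its dependencies in a consistent way can be illustrated with a minimal sketch. This is not how any of the services named above are implemented. The `HealthReport` type, the `check_health` function, the `UP`/`DOWN` status values and the dependency names `database` and `search` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical status values; a real system would agree on these across teams.
UP = "UP"
DOWN = "DOWN"


@dataclass
class HealthReport:
    """Overall status plus the status of each named dependency."""
    status: str
    dependencies: Dict[str, str] = field(default_factory=dict)


def check_health(dependency_checks: Dict[str, Callable[[], bool]]) -> HealthReport:
    """Run every dependency check and fold the results into one report.

    Each check returns True when the dependency is healthy. A check that
    raises is treated as DOWN so the report itself never fails.
    """
    results: Dict[str, str] = {}
    for name, check in dependency_checks.items():
        try:
            results[name] = UP if check() else DOWN
        except Exception:
            results[name] = DOWN
    overall = UP if all(state == UP for state in results.values()) else DOWN
    return HealthReport(status=overall, dependencies=results)


if __name__ == "__main__":
    def search_check() -> bool:
        # Simulates an unreachable dependency (hypothetical).
        raise ConnectionError("search unreachable")

    report = check_health({"database": lambda: True, "search": search_check})
    print(report.status)        # DOWN
    print(report.dependencies)  # {'database': 'UP', 'search': 'DOWN'}
```

Returning the same report shape from every system is what would let dashboards and alerts treat all systems uniformly. Under this assumption, an alert on an overall `DOWN` status carries the per-dependency detail needed to start an investigation.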