To ensure websites and applications deliver consistently excellent speed and availability, some organizations are adopting Google’s Site Reliability Engineering (SRE) model. In this model, a Site Reliability Engineer (SRE), usually someone with both development and IT Ops experience, sets clear-cut metrics that determine when a website or application is production-ready from a user performance perspective. This helps reduce the friction that often exists between the “dev” and “ops” sides of an organization. More specifically, agreed metrics can defuse the conflict between developers’ desire to “Ship it!” and operations’ desire not to be paged while on call. If performance thresholds aren’t met, releases cannot move forward.
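As a rough illustration, the release gate described above could be sketched as follows. The metric names (`p95_latency_ms`, `availability_pct`), the limits, and the function names are hypothetical, chosen only for this example; they are not taken from Google's SRE practice or from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """One production-readiness criterion (hypothetical structure)."""
    name: str
    limit: float
    higher_is_better: bool

    def is_met(self, observed: float) -> bool:
        # Availability-style metrics must reach the limit;
        # latency-style metrics must stay at or below it.
        if self.higher_is_better:
            return observed >= self.limit
        return observed <= self.limit


# Assumed example thresholds; real values would be agreed by dev and ops.
THRESHOLDS = [
    Threshold("p95_latency_ms", 800.0, higher_is_better=False),
    Threshold("availability_pct", 99.9, higher_is_better=True),
]


def failed_thresholds(measurements, thresholds=THRESHOLDS):
    """Return the names of thresholds that are missing or not met."""
    failures = []
    for threshold in thresholds:
        observed = measurements.get(threshold.name)
        # A missing measurement counts as a failure, not a pass.
        if observed is None or not threshold.is_met(observed):
            failures.append(threshold.name)
    return failures


def release_can_proceed(measurements, thresholds=THRESHOLDS):
    """A release moves forward only if every threshold is met."""
    return not failed_thresholds(measurements, thresholds)
```

Calling `release_can_proceed({"p95_latency_ms": 650.0, "availability_pct": 99.95})` allows the release, while a measured p95 latency of 950 ms would block it. In this sketch a metric that was never measured blocks the release too, so a gate cannot be passed simply by leaving a number out.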
Sounds simple and straightforward enough, but you’d be surprised at how challenging the SRE role can be, given basic human psychological tendencies. Our desire to see ourselves and our teams in a positive light, and to avoid negative consequences, can lead us to subconsciously game, distort, and manipulate metrics.
Read more at SDTimes