Blog

Incident management insights, guides, and product updates from Rootly

Google’s State of DevOps 2021 Report: What SREs Need to Know

The four key takeaways for SREs from Google’s State of DevOps 2021 report

Quentin Rousseau

October 1, 2021
5 min read
SRE vs. DevOps: What are the Differences?

SRE and DevOps are closely related concepts, and many businesses can benefit from embracing both of them. Nonetheless, there are important distinctions between SRE and DevOps.

Mateus Gurgel

September 19, 2021
4 min read
What is an SRE?

A comprehensive definition of SREs and Site Reliability Engineering, including what SREs do and what makes SREs different from other roles.

JJ Tang

September 9, 2021
5 min read
The Role of SREs in Observability

Although conversations about observability often overlook SREs, SREs have a central role to play in observability success.

Quentin Rousseau

September 3, 2021
4 min read
You Do the Math: Reliability Issues Triggered by Math Errors

Even seemingly minor math bugs in software code can have outsize consequences.

Mateus Gurgel

August 26, 2021
5 min read
Making Your On-call and Incident Management Program Stick

Maintaining your incident management practice is as important as creating it. Find out what you can do to keep your engineering organization strong and consistent year over year.

JJ Tang

August 20, 2021
5 min read
How to Improve Upon Google’s Four Golden Signals of Monitoring

The Four Golden Signals of monitoring and observability get a lot of things right. But they could be even better.

JJ Tang

August 13, 2021
5 min read
Incident Management Goes to the Olympics

A look at outages and disruptions to the IT systems that power the Olympics, from 1996 to today.

Quentin Rousseau

August 5, 2021
5 min read
The Unique Reliability Engineering Requirements of Microservices

Although the fundamental concepts of site reliability engineering are the same in any environment, SREs must adapt practices to different technologies, like microservices.

JJ Tang

July 30, 2021
5 min read