SRE WEEKLY
SRE Weekly is a newsletter devoted to everything related to keeping a site or service available as consistently as possible. It's about a holistic view of reliability that takes into account everything from servers to human factors to processes to automation and more.
4d ago
A message from our sponsor, FireHydrant:
We’ve gone all out on our new integration with Microsoft Teams. If you’re a MS Teams user, FireHydrant now supports the most comprehensive integration for incident management. Run the entire IM process without ever leaving the chat.
https://firehydrant.com/blog/introducing-a-brand-new-microsoft-teams-integration/
Technical Details: Falcon Update for Windows Hosts
The big news this week, of course, is the CrowdStrike-related series of outages in airports, banks, and many other businesses. Here’s their statement on the situation ...
1w ago
5 Non-Technical Skills Every Site Reliability Engineer Should Master
This article covers five skills:
Ability to Lead
Taking Charge in Critical Situations
Expressing Opinions in a Non-Conflicting Way
Leading Initiati ...
2w ago
Investigating Mysterious Kafka Broker I/O When Using Confluent Tiered Storage
In this debugging story, an engineer wielded SystemTap to figure out why a Kafka broker was doing a ridiculous amount of reads.
3w ago
Cloudflare incident on June 20, 2024
This is a really thorny one. As individual subprocesses started infinitely looping, their system shifted load to other datacenters, masking the problem. A coinciding failure in the ...
1M ago
r/sre: Senior SRE looking for a resume review, out of work for 7+ months now and still struggling to get interviews
Lots of great tips in the comments if you’re looking to tune your resume.
u/goodolbluey an ...
1M ago
Virtualizing Our Storage Engine
Time to get down into the bits and bytes of how Honeycomb queries work with this look into a recent optimization in their data storage layer.
Hazel Edmands — Honeycomb
1M ago
The Reverse Red Herring
This article presents an incident theme that I’ve lived through many times but never had such a pithy name for.
Geoff Townsend — Blameless
Centralisation and distribution: When on ...
2M ago
Got any burning questions to ask an experienced SRE? I’m gathering your questions in this Google Form, and I’d love to hear from you. I’m hoping to use your questions to help inspire authors looking to write more great SRE-related content.
A message from our sponsor, FireHydrant:
FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates.
https://firehydrant.com/blog/ai-for-incident-management-is-here/
The Rule of 5 Errors
If your overall request volume is low, s ...
2M ago
My Availability Investment Playbook
Here’s an ultra-practical guide to pushing for reliability investments at your company, formatted as a runbook with a set of specific steps.
Ross Brodbeck
MemoryDB: Speed, Durability, and Composition.
A neat dive into how Amazon’s MemoryDB composes ...
2M ago
How to Fight Alert Fatigue with Synthetic Monitoring
This one’s full of great advice about making sure alerts are actionable, including alerting on flows that actually matter to customers.
Nočnica Mellifera — Checkly
What playing Magic: the Gathering taught me about incidents.
Here is a collection of thi ...