Reddit » Site Reliability Engineers
Reddit gives you the best of the internet in one place. Get a constantly updating feed of everything about site reliability engineering. A subreddit for Site Reliability Engineers.
2d ago
submitted by /u/serverlessmom
2d ago
I'm starting a new role as a Dynatrace SRE and would love some insights from this community. My company uses Dynatrace alongside Zabbix for monitoring, with data visualizations handled through Power BI. I'm keen on learning about any effective procedures, best practices, or tools that could help streamline my responsibilities and optimize our monitoring setup.
Any recommendations on integrating these platforms more effectively? Also, are there specific plugins or extensions for Dynatrace or Zabbix that you find indispensable? Tips on dashboard setups in Power BI that effectively harness data…
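A common way to connect these tools (an assumption; the post does not say how they are wired together today) is to export Dynatrace metrics into a flat table that Power BI can import. Dynatrace's Metrics API v2 is queried with `GET /api/v2/metrics/query?metricSelector=...` and an `Authorization: Api-Token <token>` header. The sketch below covers only the flattening step and assumes the `result[].data[]` response shape with `dimensionMap`, `timestamps`, and `values`; the function names are hypothetical, and the shape should be checked against a real response from your tenant.

```python
import csv
import io


def metrics_to_rows(response: dict) -> list:
    """Flatten a Metrics API v2 style response into one row per data point."""
    rows = []
    for result in response.get("result", []):
        metric_id = result.get("metricId")
        for series in result.get("data", []):
            dimensions = series.get("dimensionMap", {})
            timestamps = series.get("timestamps", [])
            values = series.get("values", [])
            for ts, value in zip(timestamps, values):
                if value is None:  # skip empty data points
                    continue
                rows.append({"metric": metric_id, "timestamp_ms": ts,
                             "value": value, **dimensions})
    return rows


def rows_to_csv(rows: list) -> str:
    """Render rows as CSV text; dimension columns are appended as they first appear."""
    fieldnames = ["metric", "timestamp_ms", "value"]
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing the returned string to a `.csv` file gives Power BI a simple import source. A similar step for Zabbix data would be needed to put both sources side by side; this sketch does not cover it.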
2d ago
My situation:
Several different sources of alerts (email from services, cloud monitoring, etc.)
A single target for alerts: Slack, via either email or webhook
Alerts repeat for as long as the issue is ongoing
I'd like some recommendations on how to aggregate these alerts. I know that I should remove toil or non-actionable alerts, but currently it's hard to categorize them.
My ideal state:
Export the alerts to a centralized DB so I can run analytics
Keep repeated updates to an ongoing alert in a single thread while the alert "heartbeat" continues; if there is a gap in alerts, start a new thread
We're currently al…
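The "ideal state" above amounts to grouping alerts by key with an inactivity timeout. The sketch below is one possible shape for it, under these assumptions: alerts are deduplicated on a hypothetical `(source, name)` key, they arrive in timestamp order, and 15 minutes of silence ends a thread. Every alert is also written to a SQLite table so the analytics can be plain SQL. `AlertThreader` and its methods are illustrative names, not an existing tool.

```python
import sqlite3
from dataclasses import dataclass

# Assumption: more than 15 minutes without a repeat means the incident ended.
HEARTBEAT_GAP_SECONDS = 15 * 60


@dataclass
class Alert:
    source: str  # e.g. "email", "cloud-monitoring" (illustrative values)
    name: str    # alert title used for deduplication
    ts: float    # event time, epoch seconds


@dataclass
class Thread:
    thread_id: int
    last_ts: float
    count: int = 0  # number of alerts grouped into this thread


class AlertThreader:
    """Groups repeated alerts into threads and logs every alert to SQLite."""

    def __init__(self, gap: float = HEARTBEAT_GAP_SECONDS, db_path: str = ":memory:"):
        self.gap = gap
        self.open_threads = {}  # (source, name) -> most recent Thread
        self.next_id = 0
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS alerts "
            "(source TEXT, name TEXT, ts REAL, thread_id INTEGER)"
        )

    def ingest(self, alert: Alert) -> tuple:
        """Return (thread_id, starts_new_thread); assumes alerts arrive in time order."""
        key = (alert.source, alert.name)
        thread = self.open_threads.get(key)
        # A new thread starts on the first alert for a key or after a heartbeat gap.
        starts_new = thread is None or alert.ts - thread.last_ts > self.gap
        if starts_new:
            thread = Thread(self.next_id, alert.ts)
            self.next_id += 1
            self.open_threads[key] = thread
        thread.last_ts = alert.ts
        thread.count += 1
        self.db.execute(
            "INSERT INTO alerts VALUES (?, ?, ?, ?)",
            (alert.source, alert.name, alert.ts, thread.thread_id),
        )
        self.db.commit()
        return thread.thread_id, starts_new
```

When `ingest` reports a new thread, a relay would post a fresh Slack message and remember its timestamp; otherwise it would reply in the existing thread by passing that timestamp as `thread_ts` to Slack's `chat.postMessage`. Questions such as "which alerts fire most often" then become queries like `SELECT name, COUNT(*) FROM alerts GROUP BY name`.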
2d ago
I've got a webinar coming up on how to turn visual regression tests supported by Playwright into monitoring tools with Checkly.
We all know that our site should only change visually at deploy time, but that's not always how it works in the real world. Wouldn't it be nice to get an alert when a third-party change or a rogue GTM edit causes something to shift by more than a few pixels? See a demo this Wednesday, April 25th, at 8AM PST/5PM CET.
Read more here; I'll also use the same page later to share a recording of the webinar.
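Alerting only when something shifts "by more than a few pixels" is a thresholded image diff. Playwright Test exposes this through options such as `maxDiffPixels` on `expect(page).toHaveScreenshot()`; the Python below is only a language-neutral sketch of the same idea on raw RGB pixel grids, not how Playwright or Checkly implement it. The function names and default thresholds are arbitrary assumptions.

```python
Pixel = tuple  # (R, G, B), each channel 0-255
Image = list   # list of rows, each row a list of Pixel tuples


def count_changed_pixels(baseline: Image, current: Image, tolerance: int = 0) -> int:
    """Count pixels whose largest per-channel difference exceeds `tolerance`."""
    if len(baseline) != len(current) or any(
        len(a) != len(b) for a, b in zip(baseline, current)
    ):
        raise ValueError("screenshots must have identical dimensions")
    changed = 0
    for row_a, row_b in zip(baseline, current):
        for pa, pb in zip(row_a, row_b):
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance:
                changed += 1
    return changed


def should_alert(baseline: Image, current: Image,
                 max_diff_pixels: int = 10, tolerance: int = 0) -> bool:
    """Alert only when more than `max_diff_pixels` pixels differ from the baseline."""
    return count_changed_pixels(baseline, current, tolerance) > max_diff_pixels
```

The per-channel `tolerance` absorbs tiny anti-aliasing differences, while `max_diff_pixels` decides how large a visual change is worth paging someone about.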
submitted by /u/serverlessmom
2d ago
Hello everyone, I'm building an open-source framework for automating investigations: any senior engineer can write an investigation for their service and automate it, making on-call better and reducing escalations.
We recently made our repo public after developing it with some early users, based on our past experience.
Github link: https://github.com/DrDroidLab/playbooks
Website: https://drdroid.io/
Since many of us here spend a significant share of our working hours troubleshooting, I'd love for the community to try it and share feedback and suggestions.
Thanks!
submitted by /u/siddharthnibjiya
2d ago
There are many tools and technologies across different areas of the SRE/DevOps world, e.g. public cloud products (AWS, Azure) and monitoring tools (ELK, Prometheus, Datadog). However, tech stacks vary widely between companies; for example, some companies use Azure instead of AWS.
To increase my odds of getting an interview, I always customize my resume in the following ways:
Collect the technologies mentioned in the job post
Put achievements involving a specific technology on the resume if the company emphasizes that technology.
Change the keywords to fit the job post, e.g. GitLab -> Gitlab if the job post says "G…
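The keyword steps in this list are mechanical enough to script. The sketch below assumes a hypothetical vocabulary of tool names (`KNOWN_TECH`); it finds which of them a job post mentions and how the post spells them, rewrites the resume to use that spelling, and lists terms the post mentions that the resume lacks. All names here are illustrative.

```python
import re

# Hypothetical vocabulary; replace with the tools you actually track.
KNOWN_TECH = ["AWS", "Azure", "GCP", "ELK", "Prometheus", "Datadog",
              "GitLab", "Kubernetes", "Terraform"]


def _term_pattern(term: str) -> re.Pattern:
    # Case-insensitive whole-word match; hyphens count as part of a word.
    return re.compile(rf"(?<![\w-]){re.escape(term)}(?![\w-])", re.IGNORECASE)


def techs_in_posting(posting: str, vocabulary=KNOWN_TECH) -> dict:
    """Map each known term found in the posting to the spelling the posting uses."""
    found = {}
    for term in vocabulary:
        match = _term_pattern(term).search(posting)
        if match:
            found[term] = match.group(0)
    return found


def align_spelling(resume: str, spellings: dict) -> str:
    """Rewrite each term in the resume using the posting's spelling (e.g. GitLab -> Gitlab)."""
    for term, spelling in spellings.items():
        # A function replacement avoids treating backslashes in `spelling` as escapes.
        resume = _term_pattern(term).sub(lambda _m, s=spelling: s, resume)
    return resume


def missing_from_resume(resume: str, spellings: dict) -> list:
    """Terms the posting mentions that the resume never does."""
    return [term for term in spellings if not _term_pattern(term).search(resume)]
```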
5d ago
submitted by /u/serverlessmom
5d ago
Hello everyone, I recently passed a technical interview (DSA) for an SRE - Infrastructure (entry-level) position with TikTok. I have been invited to another technical round (Linux/networking) next week on the HackerRank platform. Will it primarily be an open-ended conceptual interview, or will it involve practical exercises like DSA coding rounds, focusing on Linux use cases?
For those who have interviewed for an SRE position in the past, I would greatly appreciate your input. Could you please share your experiences with the interview? What types of questions were asked, and how d…
5d ago
We've had a few people who work at vendors approach us about custom user flair to increase transparency with their posts on this subreddit. If you, too, would like to be flaired like this, please contact us via modmail.
submitted by /u/thecal714
5d ago
I've been asked to build a script from scratch to validate Aerospike configuration changes. How do I build this? I have written a basic script that parses the file and checks basic parameters such as namespaces, but how do I build a script that tracks dynamic changes to a config file? I'm puzzled.
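One way to approach this, assuming "tracking dynamic changes" means comparing successive versions of the file, is to parse each version into a nested structure, flatten it into `stanza/key` paths, diff the two versions, and run policy checks on the new one. The parser below handles only the basic `name { ... }` stanza syntax with one `key value` per line and `#` comments; `REQUIRED_NAMESPACE_KEYS` is a hypothetical policy, and all function names are illustrative. It does not know which parameters Aerospike can change at runtime; that list would have to come from Aerospike's configuration reference.

```python
# Hypothetical policy: keys every namespace stanza must define.
REQUIRED_NAMESPACE_KEYS = {"replication-factor", "storage-engine"}


def parse_conf(text: str) -> dict:
    """Parse aerospike.conf-style stanzas into nested dicts (simplified)."""
    root: dict = {}
    stack = [root]
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.endswith("{"):
            name = " ".join(line[:-1].split())  # e.g. "namespace test"
            if not name:
                raise ValueError(f"line {lineno}: stanza without a name")
            child: dict = {}
            stack[-1][name] = child
            stack.append(child)
        elif line == "}":
            if len(stack) == 1:
                raise ValueError(f"line {lineno}: unexpected '}}'")
            stack.pop()
        else:
            key, *rest = line.split(None, 1)
            value = rest[0] if rest else ""
            section = stack[-1]
            if key in section:  # repeated keys (e.g. seed addresses) become lists
                previous = section[key]
                section[key] = (previous if isinstance(previous, list) else [previous]) + [value]
            else:
                section[key] = value
    if len(stack) != 1:
        raise ValueError("unclosed stanza at end of file")
    return root


def flatten(conf: dict, prefix: str = "") -> dict:
    """Turn nested stanzas into 'stanza/sub-stanza/key' paths."""
    flat = {}
    for key, value in conf.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_confs(old_text: str, new_text: str) -> list:
    """List (path, old_value, new_value) for every added, removed, or changed key."""
    old, new = flatten(parse_conf(old_text)), flatten(parse_conf(new_text))
    return [(path, old.get(path), new.get(path))
            for path in sorted(old.keys() | new.keys())
            if old.get(path) != new.get(path)]


def validate(conf: dict) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    namespaces = [name for name in conf if name.startswith("namespace ")]
    if not namespaces:
        problems.append("no namespace stanza defined")
    for ns in namespaces:
        # First token covers both "storage-engine memory" and "storage-engine device { ... }".
        present = {key.split()[0] for key in conf[ns]}
        for key in sorted(REQUIRED_NAMESPACE_KEYS - present):
            problems.append(f"{ns}: missing {key}")
    return problems
```

Running `diff_confs` on the last known-good file and the proposed one shows exactly what a change touches, and `validate` can gate the change before it is deployed.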
submitted by /u/iamnotshivanandp