Some questions to ask about what silencing alerts means
Wandering Thoughts Blog
by cks
11h ago
A commonly desired feature of an alert notification system is the ability to silence (some) alert notifications for a while. You might silence alerts about things that are under planned maintenance, or do it generally in the dead of night for things that aren't important enough to wake someone. This sounds straightforward, but in practice my simple description here is under-specified and raises some questions about how things behave (or should behave). The simplest implementation of silencing alert notifications is for the alerting system to go through all of its normal process for sending notifi ...
How I would automate monitoring DNS queries in basic Prometheus
Wandering Thoughts Blog
by cks
1d ago
Recently I wrote about the problem of using basic Prometheus to monitor DNS query results, which comes about primarily because the Blackbox exporter requires a configuration stanza (a module) for every DNS query you want to make and doesn't expose any labels for what the query type and name are. In a comment, Mike Kohne asked if I'd considered using a script to generate the various configurations needed for this, where you want to check N DNS queries across M different DNS servers. I hadn't really thought about it and we're unlikely to do it, but here is how I would if we did. The input for t ...
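As a rough illustration of the "generate it with a script" idea, here is a minimal Python sketch that emits one Blackbox `dns` prober module per query and lists the query/server combinations that would need scraping. The module naming scheme, the function names, and the example names and addresses are all assumptions made for this sketch; the excerpt is cut off before it describes the author's actual input format or approach.

```python
import re
from itertools import product


def module_name(qname, qtype):
    """Build a hypothetical module name such as dns_a_www_example_org."""
    safe = re.sub(r"[^a-z0-9]+", "_", qname.lower()).strip("_")
    return f"dns_{qtype.lower()}_{safe}"


def blackbox_modules(queries):
    """Render one Blackbox 'dns' prober module per (name, type) query."""
    lines = ["modules:"]
    for qname, qtype in queries:
        lines += [
            f"  {module_name(qname, qtype)}:",
            "    prober: dns",
            "    dns:",
            f'      query_name: "{qname}"',
            f'      query_type: "{qtype}"',
        ]
    return "\n".join(lines) + "\n"


def probe_pairs(queries, servers):
    """List every (module, server) combination that would be scraped."""
    return [(module_name(n, t), s) for (n, t), s in product(queries, servers)]


if __name__ == "__main__":
    # Example inputs only; the names and addresses are placeholders.
    queries = [("www.example.org", "A"), ("example.org", "MX")]
    servers = ["192.0.2.1", "192.0.2.2"]
    print(blackbox_modules(queries), end="")
    for module, server in probe_pairs(queries, servers):
        print(module, server)
```

Because the module name encodes the query type and name, N queries against M servers need only N modules; the N x M pairs would become scrape targets on the Prometheus side.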
Options for diverting alerts in Prometheus
Wandering Thoughts Blog
by cks
2d ago
Suppose, not hypothetically, that you have a collection of machines and some machines are less important than others or are of interest only to a particular person. Alerts about normal machines should go to everyone; alerts about the special machines should go elsewhere. There are a number of options to set this up in Prometheus and Alertmanager, so today I want to run down a collection of them for my own future use. First, you have to decide the approach you'll use in Alertmanager. One option is to specifically configure an early Alertmanager route that specifically knows the names of ...
The many possible results of turning an IP address into a 'hostname'
Wandering Thoughts Blog
by cks
4d ago
One of the things that you can do with the DNS is ask it to give you the DNS name for an IP address, in what is called a reverse DNS lookup. A full and careful reverse DNS lookup is more complex than it looks and has more possible results than you might expect. As a result, it's common for system administrators to talk about validated reverse DNS lookups versus plain or unvalidated reverse DNS lookups. If you care about the results of the reverse DNS lookup, you want to validate it, and this validation is where most of the extra results come into play. (To put the answer first, a validated r ...
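The usual meaning of a validated reverse lookup is a forward-confirmed one: look up the name for the IP address, then look up the addresses for that name and check that the original IP is among them. The Python sketch below shows that general technique using the standard `socket` module. It is not taken from the truncated post; the function names are made up here, and it collapses the many possible outcomes into "the list of names that validated".

```python
import socket


def reverse_names(ip):
    """Return the names the reverse lookup gives for ip, or [] on failure."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return []
    return [name, *aliases]


def forward_addresses(name):
    """Return the set of addresses the forward lookup gives for name."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def validated_reverse(ip, reverse=reverse_names, forward=forward_addresses):
    """Keep only reverse names whose forward lookup includes ip.

    Addresses are compared as plain strings, which is a simplification:
    differently formatted IPv6 addresses would not match.
    """
    return [name for name in reverse(ip) if ip in forward(name)]
```

The lookup functions are passed in as parameters so that the validation logic can be exercised without a network. An empty result can mean no reverse DNS at all, a name that does not resolve, or a name that resolves to other addresses; a more careful version would report these cases separately.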
The Linux kernel.task_delayacct sysctl and why you might care about it
Wandering Thoughts Blog
by cks
5d ago
If you run a recent enough version of iotop on a typical Linux system, it may nag at you to the effect of:

    CONFIG_TASK_DELAY_ACCT and kernel.task_delayacct sysctl not enabled in kernel, cannot determine SWAPIN and IO %

You might wonder whether you should turn on this sysctl, how much you care, and why it was defaulted to being disabled in the first place. This sysctl enables (Task) Delay accounting, which tracks things like how long things wait for the CPU or wait for their IO to complete on a per-task basis (which in Linux means 'thread', more or less). General system information will prov ...
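To see what state the sysctl is in without running iotop, the value can be read from its `/proc/sys` file, following the standard mapping of sysctl names to paths. This is a small sketch; the function name and the three result strings are inventions for illustration.

```python
from pathlib import Path

# kernel.task_delayacct maps to this file under the usual sysctl naming rule.
SYSCTL_PATH = Path("/proc/sys/kernel/task_delayacct")


def delayacct_state(path=SYSCTL_PATH):
    """Return 'enabled', 'disabled', or 'unavailable' for the sysctl."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        # The file is missing or unreadable, for example on a kernel
        # that does not provide this sysctl.
        return "unavailable"
    return "enabled" if value == "1" else "disabled"


if __name__ == "__main__":
    print(delayacct_state())
```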
Reading the Linux cpufreq sysfs interface is (deliberately) slow
Wandering Thoughts Blog
by cks
6d ago
The Linux kernel has a CPU frequency (management) system, called cpufreq. As part of this, Linux (on supported hardware) exposes various CPU frequency information under /sys/devices/system/cpu, as covered in Policy Interface in sysfs. Reading these files can provide you with some information about the state of your system's CPUs, especially their current frequency (more or less). This information is considered interesting enough that the Prometheus host agent collects (some) cpufreq information by default. However, there is a little caution, which is that apparently the kernel deliberately sl ...
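For a sense of what reading this interface looks like, the following Python sketch collects `scaling_cur_freq` (reported in kHz) for each CPU and times the whole pass. The function name is hypothetical and the timing is only a crude way to observe whether reads are slow on a given machine; it says nothing about why.

```python
import time
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def current_frequencies(root=CPU_ROOT):
    """Map 'cpuN' to scaling_cur_freq in kHz for CPUs that expose cpufreq."""
    freqs = {}
    for f in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        try:
            freqs[f.parent.parent.name] = int(f.read_text())
        except (OSError, ValueError):
            # Skip files that cannot be read or do not hold a number.
            continue
    return freqs


if __name__ == "__main__":
    start = time.perf_counter()
    freqs = current_frequencies()
    elapsed = time.perf_counter() - start
    print(f"read {len(freqs)} CPUs in {elapsed:.3f}s")
```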
Sorting out PIDs, Tgids, and tasks on Linux
Wandering Thoughts Blog
by cks
1w ago
In the beginning, Unix only had processes and processes had process IDs (PIDs), and life was simple. Then people added (kernel-supported) threads, so processes could be multi-threaded. When you add threads, you need to give them some user-visible identifier. There are many options for what this identifier is and how it works (and how threads themselves work inside the kernel). The choice Linux made was that threads were just processes (that shared more than usual with other processes), and so their identifier was a process ID, allocated from the same global space of process IDs as regular ind ...
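One way to see these identifiers on a live system is through `/proc`. In each `/proc/<pid>/task/<tid>/status` file, the `Pid` field is the per-thread ID and the `Tgid` field is the ID shared by the whole process. The Python sketch below (Linux only, with made-up function names) prints both for every thread of the current process.

```python
import os
import threading


def parse_ids(status_text):
    """Extract the Tgid and Pid fields from /proc/.../status content."""
    ids = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("Tgid", "Pid"):
            ids[key] = int(value)
    return ids


def thread_ids():
    """Return parse_ids() for every task (thread) of this process."""
    task_dir = f"/proc/{os.getpid()}/task"
    result = []
    for tid in sorted(os.listdir(task_dir), key=int):
        with open(f"{task_dir}/{tid}/status") as f:
            result.append(parse_ids(f.read()))
    return result


if __name__ == "__main__" and os.path.isdir("/proc/self/task"):
    # Start one extra thread so that there are two tasks to list.
    worker = threading.Thread(target=threading.Event().wait, args=(1,))
    worker.start()
    for ids in thread_ids():
        print(ids)
    worker.join()
```

Every entry should show the same `Tgid`, while `Pid` differs per thread; for the main thread the two values are equal.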
Disk write buffering and its interactions with write flushes
Wandering Thoughts Blog
by cks
1w ago
Pretty much every modern system defaults to having data you write to filesystems be buffered by the operating system and only written out asynchronously or when you specifically request that it be flushed to disk, which gives you general questions about how much write buffering you want. Now suppose, not hypothetically, that you're doing write IO that is pretty much always going to be specifically flushed to disk (with fsync() or the equivalent) before the programs doing it consider this write IO 'done'. You might get this situation where you're writing and rewriting mail folders, or where the ...
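For readers unfamiliar with the pattern, this is roughly what "write, then flush with fsync() before calling it done" looks like from a program's side. It is a generic Python sketch with a hypothetical function name, not code from the post.

```python
import os


def write_durably(path, data):
    """Write bytes to path and return only after asking for a flush to disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to write its buffers to storage
```

Without the `os.fsync()` call the data would normally sit in the operating system's write buffers and be written out asynchronously; with it, the program waits for the flush on every write.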
Some more notes on Linux's ionice and kernel IO priorities
Wandering Thoughts Blog
by cks
1w ago
In the long ago past, Linux gained some support for block IO priorities, with some limitations that I noticed the first time I looked into this. These days the Linux kernel has support for more IO scheduling and limitations, for example in cgroups v2 and its IO controller. However ionice is still there and now I want to note some more things, since I just looked at ionice again (for reasons outside the scope of this entry). First, ionice and the IO priorities it sets are specifically only for read IO and synchronous write IO, per ioprio_set(2) (this is the underlying system call that ionice u ...
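The value that ioprio_set(2) takes packs a scheduling class and a class-specific data field into one integer, with the class in the bits above a 13-bit shift. The Python sketch below only shows that encoding; it does not make the system call, and the helper names are invented here.

```python
# Constants as defined for ioprio_set(2) in the kernel's ioprio header.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_NONE = 0
IOPRIO_CLASS_RT = 1    # realtime
IOPRIO_CLASS_BE = 2    # best-effort
IOPRIO_CLASS_IDLE = 3


def ioprio_value(io_class, data):
    """Pack a class and its data field into a single ioprio value."""
    return (io_class << IOPRIO_CLASS_SHIFT) | data


def ioprio_split(value):
    """Unpack an ioprio value into (class, data)."""
    mask = (1 << IOPRIO_CLASS_SHIFT) - 1
    return value >> IOPRIO_CLASS_SHIFT, value & mask


if __name__ == "__main__":
    # Best-effort class with priority level 4.
    print(ioprio_value(IOPRIO_CLASS_BE, 4))
```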
The problem of using basic Prometheus to monitor DNS query results
Wandering Thoughts Blog
by cks
1w ago
Suppose that you want to make sure that your DNS servers are working correctly, for both your own zones and for outside DNS names that are important to you. If you have your own zones you may also care that outside people can properly resolve them, perhaps both within the organization and genuine outsiders using public DNS servers. The traditional answer to this is the Blackbox exporter, which can send the DNS queries of your choice to the DNS servers of your choice and validate the result. Well, more or less. What you specifically do with the Blackbox exporter is that you configure some modu ...