Taming the SNMP Beast: Custom Monitoring with Zabbix Discovery

Good morning everyone, and welcome back! It’s Dimitri Bellini here on Quadrata, your channel for open-source tech insights. Today, we’re diving back into the world of Zabbix, specifically tackling what many consider a ‘black beast’: SNMP monitoring.

SNMP (Simple Network Management Protocol) is incredibly common for monitoring network devices like routers and switches (think Mikrotik, Cisco), but it’s also used for applications and servers running SNMP daemons. Zabbix comes packed with pre-built templates for many SNMP devices, which makes life easy – apply the template, set a few parameters, and data starts flowing. But what happens when you have a device with no template, or you only have the manufacturer’s MIB file? That’s when things get trickier, and you need to build your monitoring from scratch.

In this post, I want to share my approach to creating new SNMP monitoring templates or discovering SNMP data when starting from zero. Let’s demystify this process together!

Understanding the SNMP Basics: MIBs and OIDs

Before jumping into Zabbix, we need to understand what we *can* monitor. This information lives in MIB (Management Information Base) files provided by the device vendor. These files define the structure of manageable data using OIDs (Object Identifiers) – unique numerical addresses for specific metrics or pieces of information.

Your Essential Toolkit: MIB Browser and snmpwalk

To explore these MIBs and test SNMP communication, I rely on a couple of key tools:

  • MIB Browser: I often use the iReasoning MIB Browser. It’s free, multi-platform (Java-based), and lets you load MIB files visually. You can navigate the OID tree, see descriptions, data types, and even potential values (which helps later with Zabbix Value Maps). For example, you can find the OID for interface operational status (ifOperStatus) and see that ‘1’ means ‘up’, ‘2’ means ‘down’, etc.
  • snmpwalk: This command-line utility (part of standard SNMP tools on Linux) lets you query a device directly. It’s crucial for verifying that the device responds and seeing the actual data returned for a specific OID.

Finding Your Way with OIDs

Let’s say we want to monitor network interfaces on a device (like the pfSense appliance I use in the video). Using the MIB browser, we find the OID for interface descriptions, often IF-MIB::ifDescr. We can then test this with snmpwalk:

snmpwalk -v2c -c public 192.168.1.1 IF-MIB::ifDescr

(Replace public with your device’s SNMP community string and 192.168.1.1 with its IP address. We’re using SNMP v2c here for simplicity, though v3 offers better security).

This command might return something like:


IF-MIB::ifDescr.1 = STRING: enc0
IF-MIB::ifDescr.2 = STRING: ovpns1
...

Sometimes, especially when Zabbix might not have the MIB loaded, it’s easier to work with the full numerical OID. Use the -On flag:

snmpwalk -v2c -c public -On 192.168.1.1 1.3.6.1.2.1.2.2.1.2

This will output the full numerical OIDs, like .1.3.6.1.2.1.2.2.1.2.1, .1.3.6.1.2.1.2.2.1.2.2, etc.

The Power of SNMP Indexes

Notice the numbers at the end of the OIDs (.1, .2)? These are **indexes**. SNMP often organizes data in tables. Think of it like a spreadsheet: each row represents an instance (like a specific interface or disk), identified by its index. Different columns represent different metrics (like description, status, speed, octets in/out) for that instance.

So, ifDescr.1 is the description for interface index 1, and ifOperStatus.1 (OID: .1.3.6.1.2.1.2.2.1.8.1) would be the operational status for that *same* interface index 1. This index is the key to correlating different pieces of information about the same logical entity.
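
To see this correlation in practice, you can walk the ifOperStatus column the same way. This is just an illustration against the same kind of device as before; your indexes and values will differ:

snmpwalk -v2c -c public -On 192.168.1.1 1.3.6.1.2.1.2.2.1.8

.1.3.6.1.2.1.2.2.1.8.1 = INTEGER: 1
.1.3.6.1.2.1.2.2.1.8.2 = INTEGER: 1
...

Index .1 here is the same enc0 interface we saw in the ifDescr walk, so a value of 1 means that interface is up.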

Automating with Zabbix Low-Level Discovery (LLD)

Manually creating an item in Zabbix for every single interface and every metric (status, traffic in, traffic out…) is tedious and static. If a new interface appears, you have to add it manually. This is where Zabbix’s Low-Level Discovery (LLD) shines for SNMP.

LLD allows Zabbix to automatically find entities (like interfaces, disks, processors) based on SNMP indexes and then create items, triggers, and graphs for them using prototypes.

Setting Up Your Discovery Rule

Let’s create a discovery rule for network interfaces:

  1. Go to Configuration -> Templates -> Your Template -> Discovery rules -> Create discovery rule.
  2. Name: Something descriptive, e.g., “Network Interface Discovery”.
  3. Type: SNMP agent.
  4. Key: A unique key you define, e.g., net.if.discovery.
  5. SNMP OID: This is the core. Use the Zabbix discovery syntax: discovery[{#MACRO_NAME1}, OID1, {#MACRO_NAME2}, OID2, ...].

    • Zabbix automatically provides {#SNMPINDEX} representing the index found.
    • We define custom macros to capture specific values. For interface names, we can use {#IFNAME}.

    So, to discover interface names based on their index, the OID field would look like this:
    discovery[{#IFNAME}, 1.3.6.1.2.1.2.2.1.2]
    (Using the numerical OID for ifDescr).

  6. Configure other settings like update interval as needed.

Zabbix will periodically run this rule, perform an SNMP walk on the specified OID (ifDescr), and generate a list mapping each {#SNMPINDEX} to its corresponding {#IFNAME} value.

Pro Tip: Use the “Test” button in the discovery rule configuration! It’s incredibly helpful to see the raw data Zabbix gets and the JSON output with your macros populated before saving.
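
For reference, the JSON you get from the Test button (and from the rule itself) looks roughly like this; the exact wrapping varies slightly between Zabbix versions, so treat it as an illustration:

[
    {"{#SNMPINDEX}": "1", "{#IFNAME}": "enc0"},
    {"{#SNMPINDEX}": "2", "{#IFNAME}": "ovpns1"}
]

Each object becomes one discovered entity, and the macros are what your item prototypes will reference.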

Creating Dynamic Items with Prototypes

Now that Zabbix can discover the interfaces, we need to tell it *what* to monitor for each one using Item Prototypes:

  1. Within your discovery rule, go to the “Item prototypes” tab -> Create item prototype.
  2. Name: Use the macros found by discovery for dynamic naming, e.g., Interface {#IFNAME}: Operational Status.
  3. Type: SNMP agent.
  4. Key: Must be unique per host. Use a macro to ensure this, e.g., net.if.status[{#SNMPINDEX}].
  5. SNMP OID: Specify the OID for the metric you want, appending the index macro. For operational status (ifOperStatus, numerical OID .1.3.6.1.2.1.2.2.1.8), use: 1.3.6.1.2.1.2.2.1.8.{#SNMPINDEX}. Zabbix will automatically replace {#SNMPINDEX} with the correct index (1, 2, 3…) for each discovered interface.
  6. Type of information: Numeric (unsigned) for status codes.
  7. Units: N/A for status.
  8. Value mapping: Select or create a value map that translates the numerical status (1, 2, 3…) into human-readable text (Up, Down, Testing…). This uses the information we found earlier in the MIB browser.
  9. Configure other settings like update interval, history storage, etc.

Once saved, Zabbix will use this prototype to create an actual item for each interface discovered by the LLD rule. If a new interface appears on the device, Zabbix will discover it and automatically create the corresponding status item!
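
As a further illustration (assuming your device exposes the standard 64-bit IF-MIB counters, which most modern gear does), a second prototype for inbound traffic could be sketched like this:

Name: Interface {#IFNAME}: Bits received
Key: net.if.in[{#SNMPINDEX}]
SNMP OID: 1.3.6.1.2.1.31.1.1.1.6.{#SNMPINDEX} (IF-MIB::ifHCInOctets)
Type of information: Numeric (unsigned)
Units: bps
Preprocessing: Change per second, then Custom multiplier 8 (octets to bits)

If a device only exposes the 32-bit ifInOctets counter (1.3.6.1.2.1.2.2.1.10), the same prototype works, but the counter wraps sooner on fast links.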

A Practical Example: Monitoring Disk Storage

We can apply the same logic to other SNMP data, like disk storage. In the video, I showed discovering disk types and capacities on my pfSense box.

Discovering Disk Types with Preprocessing

I created another discovery rule targeting the OID for storage types (e.g., from the HOST-RESOURCES-MIB or a vendor-specific MIB). This OID often returns numbers (like 3 for Hard Disk, 5 for Optical Disk).

To make the discovered macro more readable (e.g., {#DISKTYPE}), I used Zabbix’s **Preprocessing** feature within the discovery rule itself:

  • Add a preprocessing step of type “Replace”.
  • Find `^5$` (regex for the number 5) and replace with `Optical Disk`.
  • Add another step to find `^3$` and replace with `Hard Disk`.

Now, the {#DISKTYPE} macro will contain “Hard Disk” or “Optical Disk” instead of just a number.

Monitoring Disk Capacity with Unit Conversion

Then, I created an item prototype for disk capacity:

  • Name: `Disk {#DISKTYPE}: Capacity`
  • Key: `storage.capacity[{#SNMPINDEX}]`
  • SNMP OID: `[OID_for_storage_size].{#SNMPINDEX}`
  • Units: `B` (Bytes)
  • Preprocessing (in the Item Prototype): The SNMP device reported capacity in Kilobytes (or sometimes in allocation units * block size). To normalize it to Bytes, I added a “Custom multiplier” preprocessing step with a value of `1024`.
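
For reference, if your device exposes the standard HOST-RESOURCES-MIB (pfSense does), the storage table columns you would typically plug into the placeholders above are the ones below; verify them in your own MIB browser before building the template:

snmpwalk -v2c -c public -On 192.168.1.1 1.3.6.1.2.1.25.2.3.1.5   # hrStorageSize (in allocation units)
snmpwalk -v2c -c public -On 192.168.1.1 1.3.6.1.2.1.25.2.3.1.4   # hrStorageAllocationUnits (bytes per unit)

Capacity in bytes is hrStorageSize multiplied by hrStorageAllocationUnits, which is why a multiplier step (1024 in my case, or your device’s allocation unit size) is needed to normalize the value.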

Putting It All Together

By combining MIB exploration, `snmpwalk` testing, Zabbix LLD rules with custom macros, and item prototypes with appropriate OIDs and preprocessing, you can build powerful, dynamic SNMP monitoring for almost any device, even without off-the-shelf templates.

It might seem a bit daunting initially, especially understanding the OID structure and LLD syntax, but once you grasp the concept of indexes and macros, it becomes quite manageable. The key is to break it down: find the OIDs, test them, set up discovery, and then define what data to collect via prototypes.


I hope this walkthrough helps demystify custom SNMP monitoring in Zabbix! It’s a powerful skill to have when dealing with diverse infrastructure.

What are your biggest challenges with SNMP monitoring? Have you built custom LLD rules? Share your experiences, questions, or tips in the comments below! I’ll do my best to answer any doubts you might have.

And if you have more Zabbix questions, feel free to join the Italian Zabbix community on Telegram: Zabbix Italia.

If you found this post helpful, please give the original video a ‘Like’ on YouTube, share this post, and subscribe to Quadrata for more open-source and Zabbix content.

Thanks for reading, and see you next week!

– Dimitri Bellini


Building a Bulletproof PostgreSQL Cluster: My Go-To High Availability Setup

Good morning everyone! Dimitri Bellini here, back on Quadrata, my channel dedicated to the open-source world and the IT topics I love – and hopefully, you do too!

Thanks for tuning in each week. If you haven’t already, please hit that subscribe button and give this video a thumbs up – it really helps!

Today, we’re diving into a crucial topic for anyone running important applications, especially (but not only!) those using Zabbix: database resilience and performance. Databases are often the heart of our applications, but they can also be the source of major headaches – slow queries, crashes, data loss. Ensuring your database is robust and performs well is fundamental.

Why PostgreSQL and This Specific Architecture?

A few years back, we made a strategic decision to shift from MySQL to PostgreSQL. Why? Several reasons:

  • The community and development activity around Postgres seemed much more vibrant.
  • It felt like a more “serious,” robust database, even if maybe a bit more complex to configure initially compared to MySQL’s out-of-the-box readiness.
  • For applications like Zabbix, which heavily utilize the database, especially in complex setups, having a reliable and performant backend is non-negotiable. Avoiding database disasters and recovery nightmares is paramount!

The architecture I’m showcasing today isn’t just for Zabbix; it’s a solid foundation for many applications needing high availability. We have clients using this exact setup for various purposes.

The Core Components

The solution we’ve settled on combines several powerful open-source tools:

  • PostgreSQL: The core relational database.
  • Patroni: A fantastic template for creating a High Availability (HA) PostgreSQL cluster. It manages the Postgres instances and orchestrates failover.
  • etcd: A distributed, reliable key-value store. Patroni uses etcd for coordination and sharing state information between cluster nodes, ensuring consensus.
  • PGBackrest: A reliable, feature-rich backup and restore solution specifically designed for PostgreSQL.
  • HAProxy (Optional but Recommended): A load balancer to direct application traffic to the current primary node seamlessly.

How It Fits Together

Imagine a setup like this:

  • Multiple PostgreSQL Nodes: Typically, at least two nodes running PostgreSQL instances.
  • Patroni Control: Patroni runs on these nodes, monitoring the health of Postgres and managing roles (leader/replica).
  • etcd Cluster: An etcd cluster (minimum 3 nodes for quorum – one can even be the backup server) stores the cluster state. Patroni instances consult etcd to know the current leader and overall cluster health.
  • PGBackrest Node: Often one of the etcd nodes also serves as the PGBackrest repository server, storing backups and Write-Ahead Logs (WALs) for point-in-time recovery. Backups can be stored locally or, even better, pushed to an S3-compatible object store. A minimal repository configuration sketch follows this list.
  • Load Balancer: HAProxy (or similar) sits in front, checking an HTTP endpoint provided by Patroni on each node to determine which one is the current leader (primary) and directs all write traffic there.
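
To give an idea of the repository side, here is a minimal pgbackrest.conf sketch for the backup node. The stanza name and data path match the examples later in this post, but everything here is a placeholder, not a complete production configuration:

# /etc/pgbackrest/pgbackrest.conf on the repository node (minimal sketch)
[global]
# where backups and archived WALs are kept on the repository host
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
# for an S3-compatible object store you would set repo1-type=s3
# and add the corresponding repo1-s3-* options

[my_stanza]
# how the repository reaches the PostgreSQL node and its data directory
pg1-host=10.0.0.1
pg1-path=/var/lib/patroni/data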

This creates an active-standby (or active-passive) cluster. Your application connects to a single endpoint (the balancer), completely unaware of which physical node is currently active. HAProxy handles the redirection automatically during a switchover or failover.
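
To make the balancer piece concrete, here is a minimal HAProxy sketch. It assumes Patroni’s REST API is on its default port 8008 and PostgreSQL on 5432; the hosts and the listening port are placeholders:

# haproxy.cfg excerpt: send writes to whichever node answers 200 on Patroni's /primary endpoint
listen postgres_primary
    bind *:5000
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 10.0.0.1:5432 check port 8008
    server node2 10.0.0.2:5432 check port 8008

Your application connects to port 5000 on the balancer and always lands on the current leader.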

Key Advantages of This Approach

  • True High Availability: Provides a really bulletproof active-standby solution.
  • Easy Balancer Integration: Uses simple HTTP checks, avoiding the complexities of virtual IPs (VIPs) and Layer 2 network requirements often seen in traditional clustering (like Corosync/Pacemaker), making it great for modern Layer 3 or cloud environments.
  • “Simple” Configuration (Relatively!): Once you grasp the concepts, configuration is largely centralized in a single YAML file per node (patroni.yml); a minimal sketch follows just after this list.
  • Highly Resilient & Automated: Handles node failures, switchovers, and even node reintegration automatically.
  • Powerful Backup & Recovery: PGBackrest makes backups and, crucially, Point-in-Time Recovery (PITR) straightforward (again, “straightforward” for those familiar with database recovery!).
  • 100% Open Source: No licensing costs or vendor lock-in. Test it, deploy it freely.
  • Enterprise Ready & Supportable: These are mature projects. For production environments needing formal support, companies like Cybertec PostgreSQL (no affiliation, just an example we partner with) offer commercial support for this stack. We at Quadrata can also assist with first-level support and implementation.
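
As promised above, here is a minimal patroni.yml sketch for node1. It mirrors the lab used in the demo (cluster my_cluster, data directory /var/lib/patroni/data), but the addresses and passwords are placeholders and many production options are omitted:

# patroni.yml on node1 (minimal sketch, placeholder values)
scope: my_cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.1:8008

etcd3:
  hosts: 10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.1:5432
  data_dir: /var/lib/patroni/data
  authentication:
    superuser:
      username: postgres
      password: change_me
    replication:
      username: replicator
      password: change_me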

In my opinion, this architecture brings PostgreSQL very close to the robustness you might expect from expensive proprietary solutions like Oracle RAC, but using entirely open-source components.

Let’s See It In Action: A Practical Demo

Talk is cheap, right? Let’s walk through some common management tasks and failure scenarios. In my lab, I have three minimal VMs (2 vCPU, 4GB RAM, 50GB disk): two for PostgreSQL/Patroni (node1, node2) and one for PGBackrest/etcd (backup-node). Remember, 3 nodes is the minimum for a reliable etcd quorum.

1. Checking Cluster Status

The primary command is patronictl. Let’s see the cluster members:

$ patronictl -c /etc/patroni/patroni.yml list
+ Cluster: my_cluster (73...) ---+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------+--------+---------+----+-----------+
| node1 | 10.0.0.1| Leader | running | 10 | |
| node2 | 10.0.0.2| Replica| running | 10 | 0 |
+--------+---------+--------+---------+----+-----------+

Here, node1 is the current Leader (primary), and node2 is a Replica, perfectly in sync (Lag 0 MB) on the same timeline (TL 10).

2. Performing a Manual Switchover

Need to do maintenance on the primary? Let’s gracefully switch roles:

$ patronictl -c /etc/patroni/patroni.yml switchover
Current cluster leader is node1
Available candidates for switchover:
1. node2
Select candidate from list [1]: 1
When should the switchover take place (e.g. 2023-10-27T10:00:00+00:00) [now]: now
Are you sure you want to switchover cluster 'my_cluster', leader 'node1' to member 'node2'? [y/N]: y
Successfully switched over to "node2"
... (Check status again) ...
+ Cluster: my_cluster (73...) ---+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------+--------+---------+----+-----------+
| node1 | 10.0.0.1| Replica| running | 11 | 0 |
| node2 | 10.0.0.2| Leader | running | 11 | |
+--------+---------+--------+---------+----+-----------+

Patroni handled demoting the old leader, promoting the replica, and ensuring the old leader started following the new one. Notice the timeline (TL) incremented.

3. Simulating a Primary Node Failure

What if the primary node just dies? Let’s stop Patroni on node2 (the current leader):

# systemctl stop patroni (on node2)

Now, check the status from node1:

$ patronictl -c /etc/patroni/patroni.yml list
+ Cluster: my_cluster (73...) ---+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+---------+--------+---------+----+-----------+
| node1 | 10.0.0.1| Leader | running | 12 | |
| node2 | 10.0.0.2| | stopped | | unknown |
+--------+---------+--------+---------+----+-----------+

Patroni automatically detected the failure and promoted node1 to Leader. When node2 comes back online (systemctl start patroni), Patroni will automatically reintegrate it as a replica.

4. Recovering a Destroyed Node

What if a node is completely lost? Data disk corrupted, VM deleted? Let’s simulate this on node2 (assuming node1 is currently the leader):

# systemctl stop patroni (on node2)
# rm -rf /var/lib/patroni/data # Or wherever your PG data directory is
# systemctl start patroni (on node2)

Watching the Patroni logs on node2, you’ll see it detects it has no data and initiates a `pg_basebackup` (or uses PGBackrest if configured) from the current leader (node1) to rebuild itself from scratch. Checking `patronictl list` shows its state transitioning through `creating replica` to `running` as a replica again, all automatically!

5. Point-in-Time Recovery (PITR) – The Real Lifesaver!

This is why I made the video! Recently, a bad deployment caused data corruption. We needed to restore to a state just *before* the incident. Here’s how PGBackrest and Patroni help.

Scenario: I accidentally deleted all records from a critical table.

psql> SELECT COUNT(*) FROM my_table; -- Shows 1000 rows
psql> DELETE FROM my_table;
psql> SELECT COUNT(*) FROM my_table; -- Shows 0 rows! Disaster!

Recovery Steps:

  1. STOP PATRONI EVERYWHERE: This is critical. We need to prevent Patroni from interfering while we manipulate the database state manually.

    # systemctl stop patroni (on ALL nodes: node1, node2)

  2. Identify Target Time/Backup: Use PGBackrest to find the backup and approximate time *before* the data loss.

    $ pgbackrest --stanza=my_stanza info 
    ... (Find the latest FULL backup timestamp, e.g., '2023-10-27 11:30:00') ...

  3. Perform Restore on the (Ex-)Leader Node: Go to the node that *was* the leader (let’s say node1). Run the restore command, specifying the target time. The `--delta` option is efficient as it only restores changed files.

    $ pgbackrest --stanza=my_stanza --delta --type=time --target="2023-10-27 11:30:00" --target-action=pause restore

    (Note: `--target-action=pause` or `promote` might be needed depending on your exact recovery goal; for simplicity here, let’s assume we want to stop recovery at that point. Check the PGBackrest docs for specifics. The video used a slightly different target specification based on the backup label.)

    Correction based on Video: The video demonstrated restoring to the end time of a specific full backup. A more typical PITR might use `--type=time` and a specific timestamp like `YYYY-MM-DD HH:MM:SS`. Let’s assume we used the backup label as shown in the video’s logic:

    $ pgbackrest --stanza=my_stanza --delta --set=20231027-xxxxxxF --type=default --target-action=promote restore

    (Replace `20231027-xxxxxxF` with your actual backup label. Using `--target-action=promote` tells Postgres to finish recovery and become promotable immediately after reaching the target.)

  4. Start Postgres Manually (on the restored node): Start the database *without* Patroni first.

    # pg_ctl -D /var/lib/patroni/data start

    PostgreSQL will perform recovery using the restored files and WAL archives up to the specified target. Because we used `--target-action=promote`, it should finish recovery and be ready. If we had used `pause`, we would need `pg_ctl promote`.

  5. Verify Data: Connect via `psql` and check if your data is back!

    psql> SELECT COUNT(*) FROM my_table; -- Should show 1000 rows again!

  6. Restart Patroni: Now that the database is in the desired state, start Patroni on the restored node first, then on the other nodes.

    # systemctl start patroni (on node1)
    # systemctl start patroni (on node2)

    Patroni on `node1` will see it’s a valid database, assert leadership in etcd. Patroni on `node2` will detect it’s diverged (or has no data if we wiped it) and automatically re-sync from the now-restored leader (`node1`).

As you saw, we recovered from a potential disaster relatively quickly because the architecture and tools are designed for this.

Final Thoughts

Setting up this entire stack isn’t trivial – it requires understanding each component. That’s why I didn’t do a full step-by-step configuration in the video (it would be too long!). But I hope showing you *how it works* and its capabilities demonstrates *why* we chose this architecture.

It provides automation, resilience, and recovery options that are crucial for critical systems. Having an organized setup like this, combined with good documentation (please, write down your procedures!), turns stressful recovery scenarios into manageable tasks.

What do you think? Is PostgreSQL with Patroni something you’d consider? Are there comparable HA solutions in the MySQL/MariaDB world you think are as robust or easy to manage? Let me know your thoughts in the comments below!

Don’t forget to check out the Quadrata YouTube channel for more open-source and IT content, and join the discussion on the Zabbix Italia Telegram channel!

That’s all for this episode. A big greeting from me, Dimitri, and see you in the next one. Bye everyone!


Unlocking Zabbix Proxies: Monitoring Remote Networks Like a Pro

Hey everyone, Dimitri Bellini here, back with another episode on Quadrata (my YouTube channel, @quadrata)! This week, we’re diving deep into Zabbix proxies. I’ve been getting a lot of questions about how these things work, especially when it comes to discoveries and monitoring devices in remote networks. So, let’s get into it!

What is a Zabbix Proxy and Why Do You Need It?

Think of a Zabbix proxy as your monitoring agent in a segregated area. It’s a powerful tool within the Zabbix ecosystem that allows us to:

  • Monitor segregated areas or remote branches: It handles all the checks the Zabbix server would, but closer to the source.
  • Scale horizontally: It can offload work from your main Zabbix server in larger deployments.
  • Reduce bandwidth usage: It collects data locally and transmits it to the Zabbix server in a single, often compressed, transaction.
  • Simplify firewall configurations: You only need to configure one TCP port.
  • Buffer data during connectivity issues: The proxy stores collected data and forwards it when the connection to the Zabbix server is restored.

What Can a Zabbix Proxy Do?

A Zabbix proxy is surprisingly versatile. It can perform almost all the checks your Zabbix server can:

  • SNMP monitoring
  • IPMI checks
  • Zabbix agent monitoring
  • REST API checks
  • Remote command execution for auto-remediation

Key Improvements in Zabbix 7.0

The latest version of Zabbix (7.0) brings some significant enhancements to Zabbix proxies, including:

  • Proxy High Availability: Proxies can now be grouped so that another proxy takes over if one fails, improving overall stability.
  • Automatic Load Distribution: The Zabbix server intelligently distributes hosts across proxies based on various factors.

Configuring a Zabbix Proxy with SQLite

For smaller setups, SQLite is a fantastic option. Here’s the basic configuration:

  1. Modify zabbix_proxy.conf:

    • Set the Server directive to the IP or DNS name of your Zabbix server.
    • Define the Hostname. This is crucial and must match the proxy name in the Zabbix web interface.
    • Set DBName to the path and filename for your SQLite database (e.g., /var/lib/zabbix/proxy.db). Remember, this is a *file path*, not a database name.

  2. Configure Zabbix Agents: Point the Server or ServerActive directives in your agent configurations to the proxy’s IP address, not the Zabbix server’s.
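
Putting those directives together, a minimal zabbix_proxy.conf for a SQLite-backed proxy might look like this (the values are placeholders):

# /etc/zabbix/zabbix_proxy.conf (minimal sketch)
# IP or DNS name of the Zabbix server
Server=192.168.1.10
# Must match the proxy name defined in the web interface
Hostname=branch-proxy-01
# For SQLite this is a file path, not a database name
DBName=/var/lib/zabbix/proxy.db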

Remember to always consult the official Zabbix documentation for the most up-to-date and comprehensive information!

Zabbix Proxy Discovery in Action

Now, let’s talk about automatic host discovery using a Zabbix proxy. Here’s how I set it up:

  1. Create a Discovery Rule: In the Zabbix web interface, go to Data Collection -> Discovery and create a new rule.

    • Give it a descriptive name.
    • Set Discovery by to Proxy and select your proxy.
    • Define the IP range to scan. You can specify multiple ranges separated by commas.
    • Adjust the Update interval. Start with something reasonable (like an hour) to avoid network flooding. You can temporarily lower it for testing, but remember to change it back!
    • Configure the Checks. I used ICMP ping, SNMP (to get the system name), and Zabbix agent checks (system.hostname, system.uname).
    • Define Device unique criteria, typically IP address.
    • Specify Hostname and Visible name (I usually use the Zabbix agent’s hostname).

  2. Check Discovery Results: Go to Monitoring -> Discovery to see what the proxy has found.

Pro Tip: Debugging Discovery Issues with Runtime Commands

If you’re not seeing results immediately, don’t panic! Instead of guessing, SSH into your Zabbix proxy server and use the Zabbix proxy binary’s runtime commands:

zabbix_proxy -R help

This will show you available commands. The key one for debugging discovery is:

zabbix_proxy -R loglevel_increase="discovery manager"

This increases the logging level for the discovery manager process, providing much more verbose output in the Zabbix proxy’s log file. This is invaluable for troubleshooting!

Automating Host Onboarding with Discovery Actions

The real magic happens when you automate the process of adding discovered hosts. This is done through Configuration -> Actions -> Discovery actions.

  1. Enable the Default “Autodiscovery Linux Servers” Action (or create your own):

    • The key conditions are:

      • Discovery status equals Up (meaning the device was actually discovered and is reachable).
      • Received value like Linux. This checks if the Zabbix agent’s system.uname value contains “Linux”.

    • The key operations are:

      • Create a host.
      • Add the host to the “Linux servers” host group.
      • (Crucially!) Link a template (e.g., “Template OS Linux by Zabbix agent”).

You can create more sophisticated actions based on other discovered properties, like SNMP data, allowing you to automatically assign appropriate templates based on device type (e.g., Cisco routers, HP printers).

Wrapping Up

While my live demo didn’t go *exactly* as planned (as is the way with live demos!), I hope this has given you a solid understanding of how Zabbix proxies work and how to use them effectively for monitoring remote networks. The key takeaways are understanding the configuration, using discovery rules effectively, and leveraging discovery actions to automate host onboarding.

If you found this helpful, give me a thumbs up! If you have any questions, drop them in the comments below. Also, be sure to join the ZabbixItalia Telegram channel (ZabbixItalia) for more Zabbix discussions. I can’t always answer everything immediately, but I’ll do my best to help. Thanks for watching, and I’ll see you next week on Quadrata!


AI for Coding: A Revolution or Just a Buzzword?

Hello everyone, Dimitri Bellini here, and welcome back to my channel, Quadrata! It’s always a pleasure to share my thoughts on the open-source world and IT. If you haven’t already, please give this a like and subscribe to the channel. In this episode, I’m diving into a hot topic: artificial intelligence for coding. Is it truly the game-changer many claim it to be, or is it just another overhyped buzzword? Let’s find out.

The Promise of AI in Coding

The idea that AI can help us write code is incredibly appealing. Coding, whether it’s in Python or any other language, isn’t always straightforward. It involves working with libraries, understanding complex concepts, and debugging. So, the prospect of AI assistance to generate scripts or entire software is definitely something that excites many people, including me!

However, there’s a catch. Accessing these AI coding tools often comes at a cost. Many platforms require a subscription, or you need to pay for API usage, like with OpenAI’s ChatGPT. And, of course, you’ll need a computer, but the bulk of the processing is increasingly cloud-based.

Personally, I’ve experimented with AI for tasks like creating widgets in Zabbix and tuning parameters in Python scripts. The results? Mixed. Sometimes AI does a decent job, but other times, it falls short.

Popular AI Coding Tools

Let’s look at some of the popular tools in the AI coding space:

    • Cursor: One of the most well-known, Cursor is essentially a fork of Visual Studio Code. It provides a suite of AI models (OpenAI, Anthropic, Google) for a subscription fee, starting at around $20 per month. The pricing model, based on tokens, can be a bit complex. Initially focused on code creation, Cursor now seems to emphasize code suggestion and autocompletion.
    • Windsurf Editor: Another VS Code fork, Windsurf also integrates API calls to major AI models. It’s priced slightly lower, around $15 per month. Like Cursor, the actual cost can vary based on token usage.
    • Cline and Roocode: These are open-source VS Code extensions. Roocode is actually a fork of Cline. While they offer the advantage of being free, you’ll need to manage your subscriptions with AI providers separately. This approach can be cost-effective, especially if you want to use local AI engines.
    • Bolt DIY: Similar to Bolt.new, Bolt DIY is an open-source platform focused on code generation. While it can be useful for small tasks, I have doubts about its effectiveness for more complex projects. The hosted Bolt.new service, by contrast, runs on a subscription of around $20 per month, and its token allocation for AI models isn’t very clear.

In my own testing, I used the trial version of Windsurf. I attempted to create a widget for Zabbix and modify a Python script. In just two days, I exhausted the available credits. This highlights the importance of carefully evaluating the cost-effectiveness of these tools.

The Concept of AI Agents and Tools

To improve the output from AI, the concept of using specialized AI agents has emerged. Instead of giving an AI model a broad task, breaking it down into smaller, specialized tasks can lead to more efficient and sensible results.

This is where “tools” or “function calling” comes in. These techniques allow AI engines to use external tools. For example, if an AI model’s dataset is limited to 2023, it won’t be able to provide real-time information like today’s flight details. However, with tools, the AI can be instructed to use an external script (e.g., in Python) to fetch the information from the internet and then process the output.

This capability extends the functionality of AI models, enabling them to, for example, pull code snippets from documentation or connect to APIs.

Challenges and the Model Context Protocol (MCP)

Despite the promise, there are challenges. Not all AI models support tools or function calling, and even those that do may have different formats. This is where the Model Context Protocol (MCP) comes in.

Introduced by Anthropic, the company behind Claude, MCP aims to standardize communication between different tools and AI models. Think of it like a USB hub for AI. It provides a standard way for AI to discover available tools, understand their functions, and invoke them. This standardization could simplify development and reduce the complexity of integrating various services.

The MCP server, which can be hosted in your private cloud, exposes an API to allow AI or MCP clients to discover available tools and their capabilities. It also provides a standardized method for invoking these tools, addressing the current inconsistencies between AI models.

The Road Ahead

Despite these advancements, AI for coding still faces challenges. AI models often struggle to interpret the output from tools and use them effectively to produce satisfactory results. We are still in the early stages of this technology.

There are also concerns about the complexity introduced by MCP, such as the need for a server component and potential security issues like the lack of encryption. It’s a balancing act between the benefits and the added complexities.

Personally, I don’t believe AI is ready to handle serious coding tasks independently. However, it can be incredibly useful for simplifying repetitive tasks, like translations, text improvements, and reformatting. AI is excellent at repetitive tasks. While I may not be using it to its fullest potential, it certainly makes my daily tasks easier.

The future of AI in coding is promising, especially with the development of smaller, more efficient models that can run locally. Models in the 24-billion-parameter range, which claim capabilities comparable to DeepSeek R1 while requiring around 20GB of RAM, are a step in the right direction. If we can continue to refine these models, AI could become an even more integral part of our coding workflow.

Let’s Discuss!

I’m eager to hear your thoughts on AI for coding. Please share your experiences and opinions in the comments below. Let’s learn from each other! You can also join the conversation on the ZabbixItalia Telegram Channel.

Thank you for joining me today. This is Dimitri Bellini, and I’ll see you next week. Bye everyone!

Visit my channel: Quadrata

Join the Telegram Channel: ZabbixItalia


The Great Debate: Am I Ditching Linux for Mac OS? – Dimitri Bellini from Quadrata

Hey everyone, Dimitri Bellini here from Quadrata! This week, I’ve been wrestling with a decision that might surprise some of you: I’m seriously considering switching from Linux to Mac OS for my daily driver laptop. Yes, you read that right!

Why the Switch? A Long-Time Linux User’s Perspective

Now, before you brand me a heretic, let me explain. I’ve been a Linux user since the early 2000s – think Gentoo, Debian, Caldera, the early Fedora days. I’ve navigated the complexities, the driver issues, the customization rabbit holes. And I’ve loved it! But times change, and so do my needs.

The Linux Love Affair: A History

  • Early Days: Gentoo, Debian, Caldera
  • Mid-2000s Onward: Fedora (Desktop), CentOS/Rocky Linux/Alma Linux (Servers)
  • Current: Fedora on ThinkPad

My journey with Linux has been one of constant learning and problem-solving. From tweaking icons to optimizing desktops, I enjoyed the process of making Linux my own. But nowadays, I value different things.

The Allure of Mac OS: Productivity and Performance

The main reason I’m considering the switch is the need for a more reliable, out-of-the-box experience. My current ThinkPad T14S (Ryzen 7, 16GB RAM, 1TB SSD) is starting to show its age, especially during these hot summer months. I’m experiencing:

  • Thermal Throttling: Performance slowdowns when multitasking.
  • Webcall Issues: Cracking voice during video conferences.
  • Hardware Optimization: Ongoing driver challenges.

The new MacBook Air M4 is tempting. Here’s a quick comparison:

ThinkPad T14S (Current) vs. MacBook Air M4 (Potential)

  • Processor: AMD Ryzen 7 (X86) vs. Apple M4 (ARM)
  • Cores: 8 Cores (ThinkPad) vs 10 Cores (MacBook Air)
  • RAM: 16GB (ThinkPad) vs. 16GB (MacBook Air)
  • Storage: 1TB (ThinkPad) vs. 256GB base (MacBook Air, with larger options only at purchase time)
  • Ports: USB-A, USB-C, HDMI (ThinkPad) vs. 2x USB-C (MacBook Air)
  • Price: my ThinkPad as upgraded vs. the entry price of a MacBook Air (a similarly specced new ThinkPad would cost considerably more)

The thermal efficiency of the M4 chip is particularly appealing. The promise of 7-12 hours of battery life compared to my current 1-2 hours is a game-changer. Plus, Mac OS is a Unix-based system, which makes the move a little less daunting.

The Pros and Cons: A Balanced View

Mac OS Pros:

  • Thermal Efficiency: Runs much cooler and longer.
  • Software Ecosystem: Access to popular professional applications.
  • Build Quality: Aluminum build feels premium.
  • Unix Based: Unix underpinnings.

Mac OS Cons:

  • Virtualization Challenges: ARM architecture requires ARM-compiled VMs.
  • Expandability: Limited upgrade options.
  • Ecosystem: Can be expensive and less flexible.

The Virtualization Hurdle

One of my biggest concerns is virtualization. Switching to an ARM-based system means I’ll need ARM versions of Linux and Windows for my virtual machines. While most Linux distros offer ARM builds, older Windows software might not work seamlessly. I’m also worried about the cost of virtualization applications: on Linux I’m used to free tools, while on macOS I might need paid solutions like Parallels.

It’s also worth noting the difference in software ecosystems: on macOS, applications are a mix of free and paid, installed from the App Store or downloaded individually, whereas in the Linux ecosystem virtually everything comes from a software repository.

The Decision Looms: What Do I Do?

Ultimately, I’m at a crossroads. I need a reliable tool for productivity, not just a platform for endless tinkering. The MacBook Air M4 offers that promise, but the ecosystem concerns and virtualization hurdles are giving me pause. I also need to be mindful of costs for the required applications.

I want a solution that makes my daily work life simpler. I also want to move away from the growing commercialization of the Linux world, with Red Hat pushing everyone toward paid tiers.

Let’s Talk!

So, what do you think? Should I make the jump to Mac OS? Are any of you fellow Linux users who’ve made the switch (or switched back)? Let me know your thoughts in the comments below! Don’t forget to give this post a thumbs up and subscribe to my channel Quadrata for more open-source and IT discussions. Also, check out the ZabbixItalia Telegram Channel for awesome community discussions.


Demystifying AI in Zabbix: Can AI Really Correlate Events?

Good morning, everyone! Dimitri Bellini here, back with you on Quadrata, my YouTube channel dedicated to the open-source world and the IT topics I’m passionate about. This week, I wanted to tackle a question that I, and many members of the Zabbix community, get asked all the time: Why doesn’t Zabbix have more built-in AI?

It seems like every monitoring product out there is touting its AI capabilities, promising to solve all your problems with a touch of magic. But is it all hype? My colleagues and I have been digging deep into this, exploring whether an AI engine can truly correlate events within Zabbix and make our lives easier. This blog post, based on my recent video, will walk you through our thought process.

The AI Conundrum: Monitoring Tools and Artificial Intelligence

Let’s be honest: integrating AI into a monitoring tool isn’t a walk in the park. It requires time, patience, and a willingness to experiment with different technologies. More importantly, it demands a good dose of introspection to understand how all the pieces of your monitoring setup fit together. But why even bother?

Anyone who’s managed a complex IT environment knows the struggle. You can be bombarded with hundreds, even thousands, of alerts every single day. Identifying the root cause and prioritizing issues becomes a monumental task, even for seasoned experts. Severity levels help, but they often fall short.

Understanding the Challenges

Zabbix gives us a wealth of metrics – CPU usage, memory consumption, disk space, and more. We typically use these to create triggers and set alarm thresholds. However, these metrics, on their own, often don’t provide enough context when a problem arises. Here are some key challenges we face:

  • Limited Metadata: Event information and metadata, like host details, aren’t always comprehensive enough. We often need to manually enrich this data.
  • Lack of Visibility: Monitoring teams often lack a complete picture of what’s happening across the entire organization. They might not know the specific applications running on a host or the impact of a host failure on the broader ecosystem.
  • Siloed Information: In larger enterprises, different departments (e.g., operating systems, databases, networks) might operate in silos, hindering the ability to connect the dots.
  • Zabbix Context: While Zabbix excels at collecting metrics and generating events, it doesn’t automatically discover application dependencies. Creating custom solutions to address this is possible but can be complex.

Our Goals: Event Correlation and Noise Reduction

Our primary goal is to improve event correlation using AI. We want to:

  • Link related events together.
  • Reduce background noise by filtering out less important alerts.
  • Identify the true root cause of problems, even when buried beneath a mountain of alerts.

Possible AI Solutions for Zabbix

So, what tools can we leverage? Here are some solutions we considered:

  • Time Correlation: Analyzing the sequence of events within a specific timeframe to identify relationships.
  • Host and Host Group Proximity: Identifying correlations based on the physical or logical proximity of hosts and host groups.
  • Semantic Similarities: Analyzing the names of triggers, tags, and hosts to find connections based on their meaning.
  • Severity and Tag Patterns: Identifying correlations based on event severity and patterns in tags.
  • Metric Pattern Analysis: Analyzing how metrics evolve over time to identify patterns associated with specific problems.

Leveraging scikit-learn

One promising solution we explored involves using scikit-learn, an open-source machine learning library. Our proposed pipeline looks like this:

  1. Event Processing: Collect events from our Zabbix server using streaming capabilities.
  2. Encoding Events: Use machine learning techniques to vectorize and transform events into a usable format.
  3. Cluster Creation: Apply algorithms like DBSCAN to create clusters of related events (e.g., network problems, operating system problems).
  4. Merging Clusters: Merge clusters based on identified correlations.

A Simple Example

Imagine a scenario where a router interface goes down and host B becomes unreachable. It’s highly likely that the router issue is the root cause, and host B’s unreachability is a consequence.

Implementation Steps

To implement this solution, we suggest a phased approach:

  1. Temporal Regrouping: Start by grouping events based on their timing.
  2. Host and Group Context: Add context by incorporating host and host group information.
  3. Semantic Analysis: Include semantic analysis of problem names to identify connections.
  4. Tagging: Enrich events with tags to define roles and provide additional information.
  5. Iterative Feedback: Gather feedback from users to fine-tune the system and improve its accuracy.
  6. Scaling Considerations: Optimize data ingestion and temporal window size based on Zabbix load.

Improvements Using Existing Zabbix Features

We can also leverage existing Zabbix features:

  • Trigger Dependencies: Utilize trigger dependencies to define static relationships.
  • Low-Level Discovery: Use low-level discovery to gather detailed information about network interfaces and connected devices.
  • Enriched Tagging: Encourage users to add more informative tags to events.

The Reality Check: It’s Not So Simple

While the theory sounds great, real-world testing revealed significant challenges. The timing of events in Zabbix can be inconsistent due to update intervals and threshold configurations. This can create temporary discrepancies and make accurate correlation difficult.

Consider this scenario:

  • File system full
  • CRM down
  • DB instance down
  • Unreachable host

A human might intuitively understand that a full file system could cause a database instance to fail, which in turn could bring down a CRM application. However, a machine learning algorithm might struggle to make these connections without additional context.

Exploring Large Language Models (LLMs)

To address these limitations, we explored using Large Language Models (LLMs). LLMs have the potential to understand event descriptions and make connections based on their inherent knowledge. For example, an LLM might know that a CRM system typically relies on a database, which in turn requires a file system.

However, even with LLMs, challenges remain. Identifying the root cause versus the symptoms can be tricky, and LLMs might not always accurately correlate events. Additionally, using high-end LLMs in the cloud can be expensive, while local models might not provide sufficient accuracy.

Conclusion: The Complex Reality of AI in Monitoring

In conclusion, integrating AI into Zabbix for event correlation is a complex challenge. A one-size-fits-all solution is unlikely to be effective. Tailoring the solution to the specific needs of each client is crucial. While LLMs offer promise, the cost and complexity of using them effectively remain significant concerns.

We’re continuing to explore this topic and welcome your thoughts and ideas!

Let’s Discuss!

What are your thoughts on using AI in monitoring? Have you had any success with similar approaches? Share your insights in the comments below or join the conversation on the ZabbixItalia Telegram Channel! Let’s collaborate and find new directions for our reasoning.

Thanks for watching! See you next week!

Bye from Dimitri!

Watch the original video: Quadrata Youtube Channel


Automate Your Zabbix Reporting with Scheduled Reports: A Step-by-Step Guide

Hey everyone, Dimitri Bellini here from Quadrata, your go-to channel for open source and IT insights! It’s fantastic to have you back with me. If you’re enjoying the content and haven’t subscribed yet, now’s a great time to hit that button and help me bring you even more valuable videos. 😉

Today, we’re diving deep into a Zabbix feature that’s been around for a while but is now truly shining – Scheduled Reports. Recently, I’ve been getting a lot of questions about this from clients, and it made me realize it’s time to shed light on this often-overlooked functionality. So, let’s talk about automating those PDF reports from your Zabbix dashboards.

Why Scheduled Reports? The Power of Automated Insights

Scheduled reports might not be brand new to Zabbix (they’ve been around since version 5.4!), but honestly, I wasn’t completely sold on them until recently. In older versions, they felt a bit… incomplete. But with Zabbix 7 and especially 7.2, things have changed dramatically. Now, in my opinion, scheduled reports are becoming a genuinely useful tool.

What are we talking about exactly? Essentially, scheduled reports are a way to automatically generate PDFs of your Zabbix dashboards and have them emailed to stakeholders – think bosses, team leads, or anyone who needs a regular overview without logging into Zabbix directly. We all know that stakeholder, right? The one who wants to see a “green is good” PDF report every Monday morning (or Friday afternoon!). While dashboards are great for real-time monitoring, scheduled reports offer that convenient, digestible summary for those who need a quick status update.

Sure, everyone *could* log into Zabbix and check the dashboards themselves. But let’s be real, sometimes pushing the information directly to them in a clean, professional PDF format is just more efficient and impactful. And that’s where Zabbix Scheduled Reports come in!

Key Features of Zabbix Scheduled Reports

Let’s break down the main advantages of using scheduled reports in Zabbix:

    • Automation: Define parameters to automatically send specific dashboards on a schedule (daily, weekly, monthly) to designated users.
    • Customization: Leverage your existing Zabbix dashboards. The reports are generated directly from the dashboards you design with widgets.
    • PDF Format: Reports are generated in PDF, the universally readable and versatile format.
    • Access Control: Control who can create and manage scheduled reports using user roles and permissions within Zabbix (Admin and Super Admin roles with specific flags).

For more detailed information, I highly recommend checking out the official Zabbix documentation and the Zabbix blog post about scheduled reports. I’ll include links in the description below for your convenience!

Setting Up Zabbix Scheduled Reports: A Step-by-Step Guide

Ready to get started? Here’s how to set up scheduled reports in Zabbix. Keep in mind, this guide is based on a simplified installation for demonstration purposes. For production environments, always refer to the official Zabbix documentation for best practices and advanced configurations.

Prerequisites

Before we begin, make sure you have the following:

    • A running Zabbix server (version 7.0 or higher recommended, 7.2+ for the best experience).
    • Configured dashboards in Zabbix that you want to use for reports.
    • Email media type configured in Zabbix for sending reports.

Installation of Zabbix Web Service and Google Chrome

The magic behind Zabbix scheduled reports relies on a separate component: Zabbix Web Service. This service handles the PDF generation and needs to be installed separately. It also uses Google Chrome (or Chromium) in headless mode to take screenshots of your dashboards and convert them to PDF.

Here’s how to install them on a Red Hat-based system (like Rocky Linux) using YUM/DNF:

    1. Install Zabbix Web Service:
      sudo yum install zabbix-web-service

      Make sure you have the official Zabbix repository configured.

    2. Install Google Chrome Stable:
      sudo yum install google-chrome-stable

      This will install Google Chrome and its dependencies. Be aware that Chrome can pull in quite a few dependencies, which is why installing the web service on a separate, smaller machine can be a good idea for cleaner Zabbix server environments.

Configuring Zabbix Server

Next, we need to configure the Zabbix server to enable scheduled reports and point it to the web service.

    1. Edit the Zabbix Server Configuration File:
      sudo vi /etc/zabbix/zabbix_server.conf
    2. Modify the following parameters:
        • StartReportWriters=1 (Change from 0 to 1 or more, depending on your reporting needs. Start with 1 for testing.)
        • WebServiceURL=http://localhost:10053/report (Adjust the IP address and port if your web service runs on a different machine or port; 10053 is the default port for the Zabbix Web Service).
    3. Restart Zabbix Server:
      sudo systemctl restart zabbix-server
    4. Start Zabbix Web Service:
      sudo systemctl start zabbix-web-service
    5. Enable Zabbix Web Service to start on boot:
      sudo systemctl enable zabbix-web-service
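
Before moving on to the frontend, it’s worth a quick sanity check that both services are running and that the web service is listening; a minimal check, assuming the default port 10053:

sudo systemctl status zabbix-server zabbix-web-service
sudo ss -tlnp | grep 10053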

Configuring Zabbix Frontend

One last crucial configuration step in the Zabbix web interface!

    1. Navigate to Administration -> General -> GUI.
    2. Modify “Frontend URL”: Set this to the full URL of your Zabbix frontend (e.g., http://your_zabbix_server_ip/zabbix). This is essential for Chrome to access the dashboards correctly for PDF generation.
    3. Click “Update”.

Creating a Scheduled Report

Now for the fun part – creating your first scheduled report!

    1. Go to Reports -> Scheduled reports.
    2. Click “Create scheduled report”.
    3. Configure the report:
        • Name: Give your report a descriptive name (e.g., “Weekly Server Health Report”).
        • Dashboard: Select the dashboard you want to use for the report.
        • Period: Choose the time period for the report data (e.g., “Previous week”).
        • Schedule: Define the frequency (daily, weekly, monthly), time, and start/end dates for report generation.
        • Recipients: Add users or user groups who should receive the report via email. Make sure they have email media configured!
        • Generated report by: Choose if the report should be generated based on the permissions of the “Current user” (the admin creating the report) or the “Recipient” of the report.
        • Message: Customize the email message that accompanies the report (you can use Zabbix macros here).
    4. Click “Add”.

Testing and Troubleshooting

To test your setup, you can use the “Test” button next to your newly created scheduled report. If you encounter issues, double-check:

    • Email media configuration for recipients.
    • Zabbix Web Service and Google Chrome installation.
    • Zabbix server and web service configuration files.
    • Frontend URL setting.
    • Permissions: In the video, I encountered a permission issue related to the /var/lib/zabbix directory. You might need to create this directory and ensure the Zabbix user has write permissions if you face similar errors. sudo mkdir /var/lib/zabbix && sudo chown zabbix:zabbix /var/lib/zabbix

Why Zabbix 7.x Makes a Difference

I really started to appreciate scheduled reports with Zabbix 7.0 and 7.2. Why? Because these versions brought significant improvements:

    • Multi-page Reports: Finally, reports can span multiple pages, making them much more comprehensive.
    • Enhanced Dashboard Widgets: Zabbix 7.x introduced richer widgets like Top Hosts, Top Items, Pie charts, and Donut charts. These make dashboards (and therefore reports) far more visually appealing and informative.
    • Custom Widgets: With the ability to create custom widgets, you can tailor your dashboards and reports to very specific needs.

These enhancements make scheduled reports in Zabbix 7.x and above a truly valuable tool for delivering insightful and professional monitoring summaries.

Conclusion

Zabbix Scheduled Reports are a fantastic way to automate the delivery of key monitoring insights to stakeholders. While they’ve been around for a while, the improvements in Zabbix 7.x have made them significantly more powerful and user-friendly. Give them a try, experiment with your dashboards, and start delivering automated, professional PDF reports today!

I hope you found this guide helpful! If you did, please give this post a thumbs up (or share!) and let me know in the comments if you have any questions or experiences with Zabbix Scheduled Reports. Don’t forget to subscribe to Quadrata for more open source and IT tips and tricks.

And if you’re in the Zabbix community, be sure to join the ZabbixItalia Telegram channel – a great place to connect with other Zabbix users and get your questions answered. A big thank you for watching, and I’ll see you in the next video!

Bye from Dimitri!

P.S. Keep exploring Zabbix – there’s always something new and cool to discover!


Keywords: Zabbix, Scheduled Reports, PDF Reports, Automation, Dashboards, Monitoring, IT Reporting, Zabbix Web Service, Google Chrome, Tutorial, Guide, Dimitri Bellini, Quadrata, Zabbix 7.2, Zabbix 7.0, Open Source, IT Infrastructure, System Monitoring
