September 2023

Pi-Hole: Metrics from a Postgres database

I love Pi-Hole.

It is one of my favourite services that I have running on my home network.

I definitely take Pi-Hole for granted at home and only really appreciate how effective it is when I travel and connect to other networks.

While the function of Pi-Hole is brilliant, my data warehouse analyst and reporting background craves more power from the metrics behind Pi-Hole.

By default Pi-Hole offers a simple dashboard with a few catchy graphs and metrics.  This can be hypnotizing to watch when first installing Pi-Hole; watching in shock at how many DNS queries are blocked while visiting different websites.

Behind the scenes is a sqlite database located on the server for power users to query.
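
For instance, with nothing more than Python's built-in sqlite3 module you can pull numbers straight out of it.  A minimal sketch; the path is the default location, and the status codes are my reading of the "blocked" family, so treat both as assumptions to check against your Pi-Hole version:

import sqlite3

# Count today's blocked queries straight from the FTL database.
con = sqlite3.connect("/etc/pihole/pihole-FTL.db")
blocked = con.execute(
    "SELECT COUNT(*) FROM queries "
    "WHERE status IN (1, 4, 5, 6, 7, 8, 9, 10, 11) "  # "blocked" status codes (assumed)
    "AND timestamp >= strftime('%s', 'now', 'start of day')"
).fetchone()[0]
print(f"Blocked today: {blocked}")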

We need more power!

For me, I have a couple of problems with the Pi-Hole dashboard and crave more functionality and power than what is on offer.

Some issues I have:

  • On my home network – I have two Pi-Hole servers working as a redundant pair.  The main Pi-Hole server sits on a hypervisor while the backup one is on a Raspberry Pi.  The main rationale for this is that if a power outage knocks the hypervisor offline while I am not at home, the backup Pi-Hole server will take over and save an angry phone call from the wife on why she can’t access the internet.

    One of my wishes is to see the combined Pi-Hole metrics across both servers in one dashboard.

  • A couple of items on the network create a large amount of noise that skews the metrics.  For instance the security cameras used to query an NTP server every second until I routed them through an NTP server at home.  My work computer, when it drops off the VPN, will flood particular addresses with traffic.

    I would like to exclude these addresses from the metrics, but not exclude them from the Pi-Hole functionality (one way to do this is sketched after this list).

  • Access to better analytics from the Pi-Hole data. 
    • How about a heat map of blocked queries on devices? 
    • Or weekly trends with a moving average? 
    • Or the ability to drill into a time where there is a high amount of blocked traffic, to determine the device and application behind it? (Answer: My kid playing mobile app games)
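
On the exclusion wish: one way to get it is to leave Pi-Hole alone and filter at the reporting layer instead.  Assuming the queries eventually land in a Postgres table (as in the napkin solution below), a hypothetical view could carry the rule; the view name, schema, and addresses here are all placeholders:

import psycopg2

# Hypothetical client addresses to hide from the dashboards only;
# Pi-Hole itself keeps serving (and logging) them as normal.
with psycopg2.connect("dbname=pihole user=loader") as pg, pg.cursor() as cur:
    cur.execute("""
        CREATE OR REPLACE VIEW ods.queries_for_reporting AS
        SELECT *
        FROM ods.queries
        WHERE client NOT IN ('192.168.1.50', '192.168.1.51')  -- cameras, work laptop
    """)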

Napkin Solution

This is what I am working on at the moment to extract and load the data from my two Pi-Hole servers into a Postgres database I can query.

  1. An NFS share is set up on both Pi-Hole servers to share out the pihole-FTL.db to a Linux server
  2. A cron job runs every x minutes to run a Python script.  The Python script queries the pihole-FTL.db to get the records from the last timestamp it ran to the current timestamp in the Pi-Hole database, and lands the data into a “landing” schema on the Postgres database (a sketch follows this list)
  3. A stored procedure will run to transfer and normalise the data into an ods schema
  4. A reporting tool will then query off the ods schema.  In this case I am using Apache Superset, just because it is freely available and its web-based nature means I can share dashboards across multiple devices without installing client software.
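
As a rough illustration of step 2, here is a minimal sketch of the incremental extract.  It assumes psycopg2 on the Postgres side and the queries table layout documented for pihole-FTL.db; the mount path, DSN, and landing table columns are hypothetical placeholders:

import sqlite3
import psycopg2

FTL_DB = "/mnt/pihole-primary/pihole-FTL.db"  # NFS-mounted FTL database (placeholder path)
PG_DSN = "dbname=pihole user=loader"          # placeholder credentials

def load_increment():
    pg = psycopg2.connect(PG_DSN)
    with pg, pg.cursor() as cur:
        # High-water mark: the newest timestamp already landed.
        cur.execute("SELECT COALESCE(MAX(query_ts), 0) FROM landing.queries")
        last_ts = cur.fetchone()[0]

        # Open the FTL database read-only so the live Pi-Hole is never touched.
        ftl = sqlite3.connect(f"file:{FTL_DB}?mode=ro", uri=True)
        rows = ftl.execute(
            "SELECT id, timestamp, type, status, domain, client "
            "FROM queries WHERE timestamp > ?",
            (last_ts,),
        ).fetchall()
        ftl.close()

        # Land the increment; the transaction commits when the with-block exits.
        cur.executemany(
            "INSERT INTO landing.queries "
            "(query_id, query_ts, query_type, status, domain, client) "
            "VALUES (%s, %s, %s, %s, %s, %s)",
            rows,
        )
    pg.close()

if __name__ == "__main__":
    load_increment()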

Performance of querying sqlite through an NFS share

Initially I was worried about querying the pihole-FTL.db over the NFS share, thinking it would be too slow to be practical.

I was thinking of a few clunky solutions:

  1. Copy the entire DB over to the Postgres host and load it from there
  2. Write a script that runs on the Pi-Hole server to extract the incremental data from the pihole-FTL.db into files and load the files

Thankfully, with the volume of traffic we produce, it is not a concern.

The initial load of copying all the data out of the Pi-Hole database does take a long time if there is a large amount of history to transfer across.

But once the initial load is done, small incremental bites of data only take a few seconds.

Current state

I have a pilot running on my home network as a proof of concept.

Fundamentally it works fine and I can glean some interesting metrics from my Pi-Hole servers.

At the moment the security is atrocious, as I am just using a superuser username and password throughout the pipeline.

Once I have cleaned up the Postgres SQL and Python code, I can share the complete solution.

Qlik Replicate Expressions – Wait. That’s Illegal

A new team will be taking over out-of-hours on-call for Qlik Replicate, and I am tasked with training them up to diagnose and resolve problems with QR.

I thought it was a good idea to create some practice QR tasks in our development environment: intentionally put some problems in the tasks and let the new team diagnose them in safe, no-pressure scenarios.

I did the obvious ones, like putting an incorrect password in an endpoint and a typo in a filter criterion.

Looking for another scenario I thought – “Nothing better than a divide by zero error.”

Wait. That’s Illegal – Part 1

I created a simple table with fields CURR_BALANCE = 10000 and ASSET_VALUE = 0, and added a new field called LVR with the expression of:

$CURR_BALANCE / $ASSET_VALUE

Ran a full load expecting lots of nice and nasty error messages for the new team to investigate.

Hmm – no errors.

That’s not what I expected.

All the rows were transferred across and there were no errors or warnings in the log file, no data errors or anything.

I bumped up all the logging to “Trace” and reran the full load.  Once again, no error messages.

Redirecting the output from the task to a Kafka topic, I noticed the records that had an intentional div/0 scenario were coming through as null.

{
  "magic": "atMSG",
  "type": "DT",
  "headers": null,
  "messageSchemaId": null,
  "messageSchema": null,
  "message": {
    "data": {
      "ACCOUNT_ID": 5,
      "ACCOUNT_NAME": "Elissa",
      "CURR_BALANCE": "10000.00",
      "ASSET_VALUE": "0.00",
      "LVR": null
    },
    "beforeData": null,
    "headers": {
      "operation": "REFRESH",
      "changeSequence": "",
      "timestamp": "",
      "streamPosition": "",
      "transactionId": "",
      "changeMask": null,
      "columnMask": null,
      "transactionEventCounter": null,
      "transactionLastEvent": null
    }
  }
}

Then I remembered that CDC expressions are built on sqlite syntax.  Looking up how sqlite handles div/0 scenarios, it turns out that by default sqlite returns a null instead of failing the query on a div/0.

Quoting from SQLite – Difference Between Engines:

SQLite does not raise arithmetic exceptions (eg. divide by zero, 1/0). SQLite returns a NULL value for 1/0.
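
This is easy to confirm with Python's built-in sqlite3 module:

import sqlite3

con = sqlite3.connect(":memory:")
# Both integer and decimal division by zero come back as NULL, not an error.
print(con.execute("SELECT 1 / 0").fetchone())          # (None,)
print(con.execute("SELECT 1000.0 / 0.0").fetchone())   # (None,)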

Wait. That’s Illegal – Part 2

I got past my div/0 conundrum once I had an explanation of what was happening, and had a look at a “valid” record.

{
    "magic": "atMSG",
    "type": "DT",
    "headers": null,
    "messageSchemaId": null,
    "messageSchema": null,
    "message": {
        "data": {
            "ACCOUNT_ID": 1,
            "ACCOUNT_NAME": "Alice",
            "CURR_BALANCE": "12345.00",
            "ASSET_VALUE": "500000.00",
            "LVR": 0        <------------- Huh?
        },
        "beforeData": null,
        "headers": {
            "operation": "REFRESH",
            "changeSequence": "",
            "timestamp": "",
            "streamPosition": "",
            "transactionId": "",
            "changeMask": null,
            "columnMask": null,
            "transactionEventCounter": null,
            "transactionLastEvent": null
        }
    }
}

Again our LVR value was wrong.

12345.00 / 500000.00 should be 0.02469, not 0.

A couple of things popped to mind.

It looked like the classic integer division issue that always catches out new (and experienced) programmers.

But that didn’t make any sense since the source fields were both numeric(18,2).

I checked the destination field that I added.

It was set to numeric(18,2), which isn’t precise enough for the LVR value, but we still should have got 0.02 instead of 0.

OK – QR must not like the “custom” precision of numeric(18,2) for the LVR field. 

I tried forcing the LVR field data type to REAL(4).

Still got the incorrect value for LVR.

Maybe for some reason QR/sqlite treats the values coming in as integers, even though the source data type is numeric(18,2) on the database.

Let’s give this a go:

CAST($CURR_BALANCE AS NUMERIC) / $ASSET_VALUE

^^ Nope

CAST($CURR_BALANCE AS NUMERIC) / CAST($ASSET_VALUE AS NUMERIC) 

^^ Long shot but nope

CAST($CURR_BALANCE AS REAL) / $ASSET_VALUE 

^^ At last success

{
    "magic": "atMSG",
    "type": "DT",
    "headers": null,
    "messageSchemaId": null,
    "messageSchema": null,
    "message": {
        "data": {
            "ACCOUNT_ID": 1,
            "ACCOUNT_NAME": "Alice",
            "CURR_BALANCE": "12345.00",
            "ASSET_VALUE": "500000.00",
            "LVR": "0.0246"  <------------- Finally!
        },
        "beforeData": null,
        "headers": {
            "operation": "REFRESH",
            "changeSequence": "",
            "timestamp": "",
            "streamPosition": "",
            "transactionId": "",
            "changeMask": null,
            "columnMask": null,
            "transactionEventCounter": null,
            "transactionLastEvent": null
        }
    }
}
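
Replaying the progression against stock sqlite (which I assume is what the expression engine embeds) shows why only the REAL cast made a difference:

import sqlite3

con = sqlite3.connect(":memory:")

# If the operands arrive as integers, sqlite performs integer division.
print(con.execute("SELECT 12345 / 500000").fetchone()[0])                   # 0

# NUMERIC affinity keeps an integer an integer, so the cast changes nothing.
print(con.execute("SELECT CAST(12345 AS NUMERIC) / 500000").fetchone()[0])  # 0

# Casting to REAL forces floating-point division.
print(con.execute("SELECT CAST(12345 AS REAL) / 500000").fetchone()[0])     # 0.02469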

It works, but forcing a precise numeric value into a floating-point data type like REAL leaves me uneasy, especially in a finance world where precision is needed.

I haven’t researched or tested whether this risks generating an imprecise number through the division; something I will need to put on my todo list.
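
As a taste of why I am uneasy: assuming REAL(4) maps to an IEEE single-precision float, merely round-tripping a cent-precise balance through one already loses the cents:

import struct

# Hypothetical balance pushed through a 32-bit float, which is what I
# assume a REAL(4) column holds.
balance = 12345678.91
as_real4 = struct.unpack("f", struct.pack("f", balance))[0]
print(as_real4)  # 12345679.0 -- the cents are gone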

Conclusion

My design principle for Qlik Replicate tasks is to keep them as simple as possible.

Grab the data off the source and get it to the destination as quickly as possible.

As a design principle, I am against putting expressions in QR tasks:

  1. It is not as feature-rich as other programming languages
  2. It hides business rules, which adds a black box and complexity to the pipeline

Point 2 was the big one for me.  I didn’t want downstream users messaging me every time there was an abnormality in the data.  With no expressions in the QR tasks, I could rightfully say, “QR just grabs what it finds on the source database – go and talk to them.”

Now with the discovery of these little “features” that can come from expressions, and more importantly no errors or warnings about them, it gives me even more incentive not to program in expressions.

QR in my eyes is not an “ETL” tool; it is just an “EL” tool.

If your designers want complex expressions and transformations programmed into their data pipelines, it should be the downstream developers’ task.

That’s why they get paid the big bucks.