October 2024

Python & openpyxl – If the fraud team tells you to do something; do it quickly!

A member of our Fraud team contacted me unexpectedly at work.

Hey. I got this excel workbook of a customer’s statement with the data spread across multiple sheets. Do you know how to consolidate them into one sheet?

“OK,” I thought, “It can’t be that hard.”

She sent me the workbook. It had 447 sheets in it! The statement was initially in a PDF format and the Fraud team exported it to Excel. This resulted in a sheet for each page; looking like this:

The data that we were interested in was between A7 and E29 on each page. I visually spot checked half a dozen sheets across the workbook and they were in similar format

Ditching VBA for openpyxl

In yesteryear; I would have VBA inside Excel to consolidate the sheets. But I felt it was time to grow up from VBA and see what python had to offer.

After a quick Google; I came across Data Camp’s Python Excel Tutorial.

It was surprisingly quick and easy to program up. I honestly thought it would have taken me as long; if not longer in VBA.

Here is the scratch code for your perusal

import openpyxl 
import logging
from datetime import datetime
import csv

# Formats a row into a format for the delimited file
def format_row(in_record):

    try:
        in_record[0] = datetime.strftime(in_record[0], "%Y-%m-%d")  # Converts the python date to YYYY-MM-DD
    except TypeError as e:
        root_logger.warning("Cannot convert " + in_record[0] + " to date/time")

    in_record[1] = in_record[1].replace("\n", " \u21B5")    # Adds in a return arrow

    if in_record[2] == None:
        in_record[2] = ""

    if in_record[3] == None:
        in_record[3] = ""
    
    return in_record


if __name__ == '__main__':

    now = datetime.now()

    log_formatter = logging.Formatter("%(asctime)s [%(threadName)-12.12s] [%(levelname)-8.8s]  %(message)s")
    root_logger = logging.getLogger()
    root_logger.setLevel(logging.INFO)

    log_console_handler = logging.StreamHandler()
    log_console_handler.setFormatter(log_formatter)
    root_logger.addHandler(log_console_handler)

    final_array = []

    wb = openpyxl.load_workbook("CONSOLIDATEDExcel.xlsx") 

    for focus_sheet_name in wb.sheetnames:

        root_logger.info("Processing sheet: " + focus_sheet_name)
        
        focus_sheet = wb[focus_sheet_name]
        root_logger.debug("Total number of rows: " + str(focus_sheet.max_row) + '. And total number of columns: ' + str(focus_sheet.max_column))


        if focus_sheet["A6"].value == "Date":   # Checks for the key field "Date" in position A6

            my_list = []

            for sheet_row in focus_sheet.iter_rows(
                min_row=7, max_row=26, min_col=1, max_col=5,    # Gets specific section of the worksheet
                values_only=True):

                sheet_row_list = list(sheet_row)

                if sheet_row_list[0] != None:   # Ignore lines that are blank
                    final_array.append(format_row(sheet_row_list))
        else:
            root_logger.info("Cannot find key column \"DATE\".  Skipping sheet")


    # Output records to a tab delimited file
    with open("Consolidated_output.tab", 'w', newline='', encoding="utf-8") as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL, delimiter="\t")
        wr.writerows(final_array)
        

Also; it ran blindingly quick – as quick as I feel in VBA with Application.ScreenUpdating = False

Anyway; Fraud team was happy I turned their request around quickly. I’m guessing if the fraud team is involved – someone is about to have a bad day.

Docker, Qlik Replicate and Postres – stitching them together

So far – what do we have?

We have:

  1. Postges working in a docker container
  2. Qlik Replicate working in a docker container

Let’s see if we can get them talking to each other

Working on Postgres Config files

First modification is needed is to the postgresql.conf for the postgres docker image. If you were following the docker-compose.yml in my previous post; you can find the file under:

~/apps/postgres-plv8/postgresql.conf

The following changes need to be made in postgresql.conf as found in the Qlik documentation

wal_level = logical
max_replication_slots = 3       # max number of replication slots

Now to added the following lines to

~/apps/postgres-plv8/pg_hba.conf
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

Once done – the docker container needs to be restarted for the changes to take effect.

Setting up QR

Log into QR using the address:

https://127.0.0.1:3552/attunityreplicate

Create a new postgres connection

Fill out the following fields:

Field Value
Serverhost.docker.internal
Port9432
UsernameUsername from the docker-compose.yml file for postgres
PasswordPassword from the docker-compose.yml for postgres
DatabaseDatabase from docker-compose.yml for postgres

From here a QR task can be built to read from the postgres database