A member of our Fraud team contacted me unexpectedly at work.
"Hey. I got this Excel workbook of a customer's statement with the data spread across multiple sheets. Do you know how to consolidate them into one sheet?"
“OK,” I thought, “It can’t be that hard.”
She sent me the workbook. It had 447 sheets in it! The statement was originally in PDF format and the Fraud team had exported it to Excel, which resulted in a sheet for each page, looking like this:
The data we were interested in sat between A7 and E29 on each page. I visually spot-checked half a dozen sheets across the workbook and they were all in a similar format.
Ditching VBA for openpyxl
In yesteryear, I would have used VBA inside Excel to consolidate the sheets. But I felt it was time to grow up from VBA and see what Python had to offer.
It was surprisingly quick and easy to program. I honestly thought it would have taken as long, if not longer, in VBA.
Here is the scratch code for your perusal:
import csv
import logging
from datetime import datetime

import openpyxl


# Formats a row ready for the delimited file
def format_row(in_record):
    try:
        # Converts the Python date to YYYY-MM-DD
        in_record[0] = datetime.strftime(in_record[0], "%Y-%m-%d")
    except TypeError:
        root_logger.warning("Cannot convert " + str(in_record[0]) + " to date/time")
    in_record[1] = in_record[1].replace("\n", " \u21B5")  # Adds in a return arrow
    if in_record[2] is None:
        in_record[2] = ""
    if in_record[3] is None:
        in_record[3] = ""
    return in_record


if __name__ == '__main__':
    log_formatter = logging.Formatter("%(asctime)s [%(threadName)-12.12s] [%(levelname)-8.8s] %(message)s")
    root_logger = logging.getLogger()
    root_logger.setLevel(logging.INFO)
    log_console_handler = logging.StreamHandler()
    log_console_handler.setFormatter(log_formatter)
    root_logger.addHandler(log_console_handler)

    final_array = []
    wb = openpyxl.load_workbook("CONSOLIDATEDExcel.xlsx")
    for focus_sheet_name in wb.sheetnames:
        root_logger.info("Processing sheet: " + focus_sheet_name)
        focus_sheet = wb[focus_sheet_name]
        root_logger.debug("Total number of rows: " + str(focus_sheet.max_row)
                          + ". And total number of columns: " + str(focus_sheet.max_column))
        if focus_sheet["A6"].value == "Date":  # Checks for the key field "Date" in position A6
            for sheet_row in focus_sheet.iter_rows(
                    min_row=7, max_row=29, min_col=1, max_col=5,  # Gets the A7:E29 section of the sheet
                    values_only=True):
                sheet_row_list = list(sheet_row)
                if sheet_row_list[0] is not None:  # Ignore rows that are blank
                    final_array.append(format_row(sheet_row_list))
        else:
            root_logger.info("Cannot find key column \"Date\". Skipping sheet")

    # Output records to a tab-delimited file
    with open("Consolidated_output.tab", 'w', newline='', encoding="utf-8") as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL, delimiter="\t")
        wr.writerows(final_array)
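For what it's worth: assuming the script is saved as consolidate.py (a filename I'm assuming here) next to CONSOLIDATEDExcel.xlsx, running python consolidate.py writes the quoted, tab-separated rows to Consolidated_output.tab, ready for import elsewhere.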
Also, it ran blindingly fast, about as quick as VBA feels with Application.ScreenUpdating = False.
Anyway, the Fraud team was happy that I turned their request around quickly. I'm guessing that if the Fraud team is involved, someone is about to have a bad day.
Let's see if we can get them talking to each other.
Working on Postgres Config files
The first modification needed is to postgresql.conf for the Postgres Docker image. If you were following the docker-compose.yml in my previous post, you can find the file under:
~/apps/postgres-plv8/postgresql.conf
The following changes need to be made in postgresql.conf, as per the Qlik documentation:
wal_level = logical
max_replication_slots = 3 # max number of replication slots
Now add the following lines to
~/apps/postgres-plv8/pg_hba.conf
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
Once done, the Docker container needs to be restarted for the changes to take effect.
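After the restart, it's worth a quick sanity check that the new settings actually took effect. Here is a minimal sketch using psycopg2, assuming the container's port 9432 is published to localhost and using placeholder credentials; substitute the values from your docker-compose.yml:

import psycopg2

# Placeholder connection details -- substitute the values from your docker-compose.yml
conn = psycopg2.connect(
    host="localhost",      # assumes port 9432 is published to the host
    port=9432,
    user="postgres",       # placeholder
    password="changeme",   # placeholder
    dbname="postgres",     # placeholder
)
with conn.cursor() as cur:
    for setting in ("wal_level", "max_replication_slots"):
        cur.execute("SHOW " + setting)  # SHOW takes no bind parameters, so concatenation is fine
        print(setting, "=", cur.fetchone()[0])  # expect: logical and 3
conn.close()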
Setting up Qlik Replicate (QR)
Log into QR using the address:
https://127.0.0.1:3552/attunityreplicate
Create a new Postgres connection.
Fill out the following fields:
Field     Value
Server    host.docker.internal
Port      9432
Username  Username from the docker-compose.yml file for Postgres
Password  Password from the docker-compose.yml for Postgres
Database  Database from the docker-compose.yml for Postgres
From here, a QR task can be built to read from the Postgres database.
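Before building the task, one optional check (my own addition, not from the Qlik docs) is confirming that Postgres will actually hand out a logical replication slot. A minimal sketch with psycopg2 and the built-in test_decoding output plugin, assuming the same placeholder connection details as above:

import psycopg2

# Same placeholder connection details as the earlier check
conn = psycopg2.connect(host="localhost", port=9432, user="postgres",
                        password="changeme", dbname="postgres")
conn.autocommit = True  # keep the slot functions out of an explicit transaction
with conn.cursor() as cur:
    # Creating a logical slot fails unless wal_level = logical and a slot is free
    cur.execute("SELECT pg_create_logical_replication_slot('qr_smoke_test', 'test_decoding')")
    print(cur.fetchone())
    # Drop it again so it doesn't count against max_replication_slots
    cur.execute("SELECT pg_drop_replication_slot('qr_smoke_test')")
conn.close()

If the slot is created and dropped cleanly, QR should have no trouble attaching its own.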