Qlik Replicate: Stripping latency out of log files

We have been doing a bit of “Stress and Volume” testing in Qlik Replicate over the past few days, investigating how much latency is introduced to an MS SQL Server task when we run it through a log stream.

If you’re not aware – you can get regular latency readings from Qlik Replicate by turning the “Performance” logging component up to “Trace” or higher.

This results in messages like the following being written to the task’s log file:

00011044: 2025-08-04T14:57:56 [PERFORMANCE     ]T:  Source latency 1.23 seconds, Target latency 2.42 seconds, Handling latency 1.19 seconds  (replicationtask.c:3879)
00011044: 2025-08-04T14:58:26 [PERFORMANCE     ]T:  Source latency 1.10 seconds, Target latency 2.27 seconds, Handling latency 1.16 seconds  (replicationtask.c:3879)
00009024: 2025-08-04T14:58:30 [SOURCE_CAPTURE  ]I:  Throughput monitor: Last DB time scanned: 2025-08-04T14:58:30.700. Last LSN scanned: 008050a1:00002ce8:0004. #scanned events: 492815.   (sqlserver_log_utils.c:5000)
00011044: 2025-08-04T14:58:57 [PERFORMANCE     ]T:  Source latency 0.61 seconds, Target latency 1.87 seconds, Handling latency 1.25 seconds  (replicationtask.c:3879)
00011044: 2025-08-04T14:59:27 [PERFORMANCE     ]T:  Source latency 1.10 seconds, Target latency 1.55 seconds, Handling latency 0.45 seconds  (replicationtask.c:3879)

I created a simple Python script to parse a folder of log files and output the results to Excel. Since the data ends up in a pandas DataFrame, it is easy to manipulate and output in whatever shape you need:

import os
import pandas as pd
from datetime import datetime


def strip_seconds(in_string):

    # Pulls the leading float out of a string like "1.23 SECONDS, TARGET ..."
    index = in_string.find(" ")
    return float(in_string[0:index])


def format_timestamp(in_timestamp):
    
    # Capture dates like "2025-08-01T10:42:36" and add a microsecond section
    if len(in_timestamp) == 19:
        in_timestamp = in_timestamp + ":000000"

    date_format = "%Y-%m-%dT%H:%M:%S:%f"

    # Converts date string to a date object    
    date_obj = datetime.strptime(in_timestamp, date_format)

    return date_obj

        
def process_file(in_file_path):

    return_array = []
    
    with open(in_file_path, "r") as in_file:
    
        for line in in_file:

            upper_case = line.upper().strip()

            if upper_case.find('[PERFORMANCE     ]') >= 0:
                # The timestamp sits at a fixed offset after the thread id
                timestamp_temp = upper_case[10:37]
                timestamp = timestamp_temp.split(" ")[0]
                
                split_string = upper_case.split("LATENCY ")

                # "SOURCE LATENCY x ... TARGET LATENCY y ... HANDLING LATENCY z" splits into 4 parts
                if len(split_string) == 4:
                
                    source_latency = strip_seconds(split_string[1])
                    target_latency = strip_seconds(split_string[2])
                    handling_latency = strip_seconds(split_string[3])

                    # Makes the date compatible with Excel
                    date_obj = format_timestamp(timestamp)
                    excel_datetime = date_obj.strftime("%Y-%m-%d %H:%M:%S")
                    
                    # If you're outputting to standard out instead:
                    #print(f"{in_file_path}\t{timestamp}\t{source_latency}\t{target_latency}\t{handling_latency}")

                    return_array.append([in_file_path, timestamp, excel_datetime, source_latency, target_latency, handling_latency])
                     
    return return_array
    

if __name__ == '__main__':

    log_folder = "/path/to/logfile/dir"
    out_excel = "OutLatency.xlsx"
    
    latency_data = []

    # Loops through files in log_folder
    for file_name in os.listdir(log_folder):

        focus_file = os.path.join(log_folder, file_name)

        if os.path.isfile(focus_file):
            _, file_extension = os.path.splitext(focus_file)

            if file_extension.upper().endswith("LOG"):
                print(f"Processing file: {focus_file}")
                return_array = process_file(focus_file)

                latency_data += return_array
                

    df = pd.DataFrame(latency_data, columns=["File Name", "Time Stamp", "Excel Timestamp", "Source Latency", "Target Latency", "Handling Latency"])
    df.info()

    # Dump to Excel; pandas can also write other formats like CSV or plain text
    df.to_excel(out_excel)
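
As the comment above hints, pandas can dump the same DataFrame to other formats with a one-liner. A couple of examples (the output file names here are just placeholders):

df.to_csv("OutLatency.csv", index=False)             # plain CSV
df.to_csv("OutLatency.tab", sep="\t", index=False)   # tab-delimited text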

And voilà – we have an Excel output from which we can quickly build latency statistics for a given task (or tasks).

What statistics to use?

Latency is something you can analyse in different ways, depending on what you’re trying to answer. It is also important to pair latency statistics with the change volume that is coming through.

Has the latency jumped at a specific time because the source database is processing a daily batch? Is there a spike of latency around Christmas time where there are more financial transactions compared to a benign day in February?
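
For example, to see whether a spike lines up with a batch window, you can bucket the parsed data by hour. A quick sketch, assuming the df built by the script above (the script stores “Excel Timestamp” as a string, so it is parsed back into a datetime first):

df["Excel Timestamp"] = pd.to_datetime(df["Excel Timestamp"])
hourly = df.set_index("Excel Timestamp")["Target Latency"].resample("1h").mean()
print(hourly)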

Generally, when the testers process the latency data, they report back:

  • Average target latency
  • 90th percentile target latency
  • 95th percentile target latency
  • Min target latency
  • Max target latency

This is what we use to compare two runs against each other when the source load is consistent and we are changing a Qlik Replicate task.
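
Pandas makes these statistics one-liners. A minimal sketch against the df from the parsing script above:

target = df["Target Latency"]
print(f"Average: {target.mean():.2f}s")
print(f"90th percentile: {target.quantile(0.90):.2f}s")
print(f"95th percentile: {target.quantile(0.95):.2f}s")
print(f"Min: {target.min():.2f}s / Max: {target.max():.2f}s")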

Python & openpyxl – If the fraud team tells you to do something, do it quickly!

A member of our Fraud team contacted me unexpectedly at work.

Hey. I got this Excel workbook of a customer’s statement with the data spread across multiple sheets. Do you know how to consolidate them into one sheet?

“OK,” I thought, “It can’t be that hard.”

She sent me the workbook. It had 447 sheets in it! The statement was originally a PDF, and the Fraud team had exported it to Excel, which resulted in one sheet per page.

The data we were interested in sat between A7 and E29 on each page. I visually spot-checked half a dozen sheets across the workbook, and they were all in a similar format.

Ditching VBA for openpyxl

In yesteryear, I would have used VBA inside Excel to consolidate the sheets. But I felt it was time to grow up from VBA and see what Python had to offer.

After a quick Google, I came across DataCamp’s Python Excel Tutorial.

It was surprisingly quick and easy to program up. I honestly think it would have taken me as long, if not longer, in VBA.

Here is the scratch code for your perusal:

import openpyxl 
import logging
from datetime import datetime
import csv

# Formats a row for the tab-delimited output file
def format_row(in_record):

    try:
        in_record[0] = datetime.strftime(in_record[0], "%Y-%m-%d")  # Converts the Python date to YYYY-MM-DD
    except TypeError:
        root_logger.warning("Cannot convert " + str(in_record[0]) + " to date/time")

    in_record[1] = in_record[1].replace("\n", " \u21B5")    # Adds in a return arrow

    if in_record[2] is None:
        in_record[2] = ""

    if in_record[3] is None:
        in_record[3] = ""
    
    return in_record


if __name__ == '__main__':

    now = datetime.now()

    log_formatter = logging.Formatter("%(asctime)s [%(threadName)-12.12s] [%(levelname)-8.8s]  %(message)s")
    root_logger = logging.getLogger()
    root_logger.setLevel(logging.INFO)

    log_console_handler = logging.StreamHandler()
    log_console_handler.setFormatter(log_formatter)
    root_logger.addHandler(log_console_handler)

    final_array = []

    wb = openpyxl.load_workbook("CONSOLIDATEDExcel.xlsx") 

    for focus_sheet_name in wb.sheetnames:

        root_logger.info("Processing sheet: " + focus_sheet_name)
        
        focus_sheet = wb[focus_sheet_name]
        root_logger.debug("Total number of rows: " + str(focus_sheet.max_row) + '. And total number of columns: ' + str(focus_sheet.max_column))


        if focus_sheet["A6"].value == "Date":   # Checks for the key field "Date" in position A6

            my_list = []

            for sheet_row in focus_sheet.iter_rows(
                min_row=7, max_row=29, min_col=1, max_col=5,    # The A7:E29 block described above
                values_only=True):

                sheet_row_list = list(sheet_row)

                if sheet_row_list[0] is not None:   # Ignore blank lines
                    final_array.append(format_row(sheet_row_list))
        else:
            root_logger.info("Cannot find key column \"Date\". Skipping sheet")


    # Output records to a tab delimited file
    with open("Consolidated_output.tab", 'w', newline='', encoding="utf-8") as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL, delimiter="\t")
        wr.writerows(final_array)
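
Since the original ask was “consolidate them into one sheet”, you could also hand back an Excel workbook instead of a tab file. A minimal sketch using openpyxl’s writing API, assuming the final_array built above (the output file name is just a placeholder):

out_wb = openpyxl.Workbook()
out_ws = out_wb.active
out_ws.title = "Consolidated"

# Append each consolidated record as a new row on the single sheet
for record in final_array:
    out_ws.append(record)

out_wb.save("Consolidated_output.xlsx")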
        

Also, it ran blindingly fast – as fast as VBA feels with Application.ScreenUpdating = False.

Anyway, the Fraud team was happy that I turned their request around quickly. I’m guessing that if the fraud team is involved, someone is about to have a bad day.