Qlik Replicate

Now Qlik Replicate and MongoDB are stuck inside containers

That message from my Manager

I got a message from my manager unexpectedly.

“Hey. A team is converting a MSSQL database to MongoDB.  Can Qlik Replicate capture deletes from the MongoDB?”

I thought this was an odd question; why wouldn’t QR support deletes?

Not wanting to make an assumption that may be false and have profound consequences later in a conversion project, I looked up the Qlik documentation. 

Nothing in the documented “Limitations and considerations.”

Just as a safeguard I also did a google search and a forum search to see if anything else turned up.

Again nothing.

I pinged back to my manager.

“QR should capture deletes just fine.  Where did you hear that QR has problems with MongoDB deletes?”

“Oh.  The project manager heard from a Mongo guy that Qlik does not work.”

I bit my lip.  To me that seemed the same as getting health advice off a random guy at the pub who saw something on Telegram.

But I realised that even though I had official Qlik documentation backing me up, the burden of proof was back on me.

Back into the world of containers

Qlik Replicate in a docker container has been an extremely useful tool on my Linux development machine.  It enables me to quickly test various aspects of QR without interrupting the official dev environment, or fighting with the cloud team to get a proof-of-concept source system set up.

First, I had to set up a MongoDB docker container.  Specifically, the MongoDB had to have a replica set; a standalone wouldn’t work. Since this was a POC, I didn’t need a fully fledged, multi-replica, ultra-secure cluster; just something simple and disposable.

After fumbling around with a number of examples (Gemini – how could you fail me!), I came across a page by “Gink” titled “How to set up MongoDB with replica set via Docker Compose?”.

It was simple and had exactly what I needed. 

Dockerfile:

FROM mongo:7.0
RUN openssl rand -base64 756 > /etc/mongo-keyfile 
RUN chmod 400 /etc/mongo-keyfile 
RUN chown mongodb:mongodb /etc/mongo-keyfile 

docker-compose.yml:

# From https://ginkcode.com/post/how-to-set-up-mongodb-with-replica-set-via-docker-compose
version: '3.8'
 
services:
  mongo:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: mongodb-replicaset
    restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: root
    ports:
      - 27017:27017
    command: --replSet rs0 --keyFile /etc/mongo-keyfile --bind_ip_all --port 27017
    healthcheck:
      test: echo "try { rs.status() } catch (err) { rs.initiate({_id:'rs0',members:[{_id:0,host:'127.0.0.1:27017'}]}) }" | mongosh --port 27017 -u root -p root --authenticationDatabase admin
      interval: 5s
      timeout: 15s
      start_period: 15s
      retries: 10
    volumes:
      - data:/data/db
 
volumes:
  data: {}

Connection string

mongodb://root:root@localhost:27017/?authSource=admin

Please visit Gink’s page for the full write-up.
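Before pointing Qlik Replicate at the container, a quick way to confirm the replica set actually initiated is a couple of lines of pymongo. This is just a sketch, using the connection string above:

import pymongo

client = pymongo.MongoClient("mongodb://root:root@localhost:27017/?authSource=admin")

# "hello" reports the replica set name and whether this node is the writable primary
info = client.admin.command("hello")
print(info.get("setName"), "primary:", info.get("isWritablePrimary"))

# replSetGetStatus is the command behind rs.status(); myState 1 means PRIMARY
status = client.admin.command("replSetGetStatus")
print("myState:", status["myState"])

client.close()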

Connecting to Qlik Replicate

The great thing is that we don’t have to mess around installing extra ODBC drivers that can make the Qlik Replicate docker image complex.

So with the MongoDB and the Qlik Replicate docker containers up and running, we can now create a new endpoint with the following settings:

Field                          Variable
MongoDB Deployment             Standard
Hosts                          host.docker.internal
Authentication Method          SCRAM-SHA-256
Username                       root
Password                       root
Authentication database name   admin

A test connection confirmed that the connection was working as expected.

Proving that deletes are working

I was getting close to resolving the burden of proof.  I got our GenAI to create a simple python script to add, update, and delete some data from the Mongo database, and set it running:

import pymongo
import random
import string
import time

def generate_random_string(length=10):
    """Generate a random string of fixed length."""
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for i in range(length))

def add_random_data(collection):
    """Adds a new document with random data to the collection."""
    
    new_document = {
        "name": generate_random_string(),
        "age": random.randint(18, 99),
        "email": f"{generate_random_string(5)}@example.com"
    }
    
    result = collection.insert_one(new_document)

    print(f"Added: {result.inserted_id}")

def update_random_data(collection):
    """Updates a random document in the collection."""
    if collection.count_documents({}) > 0:
        # Get a random document using the $sample aggregation operator
        try:
            random_doc_cursor = collection.aggregate([{"$sample": {"size": 1}}])
            random_doc = next(random_doc_cursor)
            new_age = random.randint(18, 99)
            
            collection.update_one(
                {"_id": random_doc["_id"]},
                {"$set": {"age": new_age}}
            )

            print(f"Updated: {random_doc['_id']} with new age {new_age}")
        except StopIteration:
            print("Could not find a document to update.")
    else:
        print("No documents to update.")

def delete_random_data(collection):
    """Deletes a random document from the collection."""
    if collection.count_documents({}) > 0:
        # Get a random document using the $sample aggregation operator
        try:
            random_doc_cursor = collection.aggregate([{"$sample": {"size": 1}}])
            random_doc = next(random_doc_cursor)

            collection.delete_one({"_id": random_doc["_id"]})
            
            print(f"Deleted: {random_doc['_id']}")
        except StopIteration:
            print("Could not find a document to delete.")
    else:
        print("No documents to delete.")


if __name__ == "__main__":
    print("Starting script... Press Ctrl+C to stop.")

    try:
        client = pymongo.MongoClient("mongodb://root:root@localhost:27017/?authSource=admin")
        db = client["random_data_db"]
        collection = db["my_collection"]
        # The ismaster command is cheap and does not require auth.
        client.admin.command('ismaster')
        print("MongoDB connection successful.")
    except pymongo.errors.ConnectionFailure as e:
        print(f"Could not connect to MongoDB: {e}")
        exit()

    while True:
        try:
            add_random_data(collection)
            add_random_data(collection)
            add_random_data(collection)

            update_random_data(collection)
            delete_random_data(collection)
            
            # Wait for five seconds before the next cycle
            time.sleep(5)

        except KeyboardInterrupt:
            print("\nScript stopped by user.")
            break
        except Exception as e:
            print(f"An error occurred: {e}")
            break

    # Close the connection
    client.close()
    print("MongoDB connection closed.")

With the script running and adding, updating and deleting data in the MongoDB, a QR task can be created to read from the database “random_data_db” and collection “my_collection”.

And voilà. Data was flowing through Qlik Replicate from the MongoDB.

And more importantly, deletes were getting picked up as expected. Here is one of the captured delete records:

{
	"header__change_seq":"20260204052516000000000000000004085",
	"header__change_oper":"D","header__change_mask":"80",
	"header__stream_position":"6982d83c:00000005:00000000",
	"header__operation":"DELETE",
	"header__transaction_id":"4155544F434F4D4D4954000000000000",
	"header__timestamp":"2026-02-04 05:25:16.000000",
	"_id":"6982d8186b2cc5ee1a161d13",
	"_doc":"{\"_id\": {\"$oid\": \"6982d8186b2cc5ee1a161d13\"}}"
}
 

Conclusion

And voilà. Data is flowing through Qlik Replicate from the MongoDB.

And more importantly, deletes are getting picked up as expected.

If you have come to my humble website after googling “Can Qlik Replicate replicate deletes from a MongoDB?” well, the answer is “Yes – Yes it can.”

I can only speculate where that rumour came from.  Maybe the person in question was referring to an older version of QR or MongoDB?  Or some very unusual setup of a MongoDB cluster?

Or some complete misunderstanding.

Anyway, I have a meeting in the next hour or two with the project team; let us find out.

Qlik Replicate: And we’re back to EBCDIC problems again

EBCDIC – my perpetual nightmare

I have written about EBCDIC before, and the tribulations I have come across with it.

With the post-festive system haze still on, I spent a full day trying to work out another encoding problem that came up in our exception table.

Let me take you on a journey.

It starts with MS-SQL.

We have Qlik Replicate replicating data from a DB2/zOS system into an MS-SQL system.  I was investigating why there were inserts missing from the target table, and while exploring the dbo.attrep_apply_exceptions table I noticed a bunch of INSERTs from another table in the exception table:

/*
CREATE TABLE dbo.destination
(
	add_dte numeric(5, 0) NOT NULL,
	chg_dte numeric(5, 0) NOT NULL,
	chg_tme varchar(6) NOT NULL,
	argmt varbinary(10) NOT NULL,
	CONSTRAINT pk_destination PRIMARY KEY CLUSTERED
	(
		argmt
	) WITH (DATA_COMPRESSION = ROW) ON [PRIMARY]
) ON [PRIMARY]
*/

INSERT INTO [dbo].[destination] ( [add_dte],[chg_dte],[chg_tme],[argmt])  VALUES (46021,46021,'151721', 0x35333538C3A302070000);

I thought it would be a good place to start by substituting the failed insert into a temp table and using the temp table to investigate downstream data flows.

So:

DROP TABLE IF EXISTS ##temp_destination;

SELECT *
INTO ##temp_destination
FROM [dbo].[destination]
WHERE 1 = 0;

INSERT INTO ##temp_destination ( [add_dte],[chg_dte],[chg_tme],[argmt]) VALUES (46021,46021,'151721', 0x35333538C3A302070000);

But as I was substituting the temp table into objects downstream, I was just getting logical errors. Comparing the temp table against data that already existed in dbo.destination, it did not look right.

Working Upstream

Since we had just merged a system into our DB2/zOS system, my initial thought was that it was a corrupted record coming from the merge process. I contacted the upstream system owners and asked for help.

They came back and said the record was fine and the issue must be in the QR system converting the data wrong. This did not make sense – if there was an issue in the QR system, it would have been picked up in testing. Also, there would be lots of exceptions in dbo.attrep_apply_exceptions. I had confidence that QR was handling it right.

Not entirely trusting what the DB2/zOS team had told me, I created a temporary Qlik task to dump that table out into a test database.

Using the other values in the record, I could look the record up in the table and compare the two.

For the field argmt it came back as:

0xF5F3F5F846022F00000F

This looked nothing like what was in the exception table.

EBCDICy epiphany

Of course – it took a while, but it finally struck me what was going on.

The field argmt had sections encoded in EBCDIC. This was being brought across as bytes to MS-SQL. There was a downstream trigger that was failing in the MS-SQL database. But when the record was written out into the exception table, it was written in ASCII (or possibly Windows-Latin) encoding.

(Only showing start of bytes as this is what we use downstream)

By just copying the insert statement and using it as-is, I was treating the ASCII bytes as EBCDIC, causing all sorts of logical errors downstream.
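A quick sanity check from a Python prompt shows the two renderings of the leading character section of argmt (the bytes are taken from the values above):

# Leading characters of argmt as they appear in the DB2/zOS dump (EBCDIC 037)
# and as they were written into dbo.attrep_apply_exceptions.
ebcdic_section = bytes.fromhex("F5F3F5F8")
exception_section = bytes.fromhex("35333538")

print(ebcdic_section.decode("cp037"))     # -> 5358
print(exception_section.decode("ascii"))  # -> 5358

Both decode to the same characters, which is the giveaway: the exception table holds a re-encoded, human-readable rendering of the field, not the original EBCDIC bytes.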

Where to from here?

I’m going to raise this behaviour with Qlik. I am not sure if it is a “bug” or a “feature”, but it would be good to get clarification on that point.

I will also need to educate our users on how to interpret the values in dbo.attrep_apply_exceptions so they do not stumble into the same mistake I did.

As for fixing the data itself – since we have the full dump of data from the table and the data is slow moving, I can substitute the values from the data dump into the problem INSERT statement.

This should allow it to run just fine.

Qlik Replicate Schedules – a CRON with an extra field?

Introduction

To make our lives easier when migrating Qlik Replicate tasks from one environment to another, I am looking at how to migrate Qlik Replicate schedules.

So far, via GitLab pipelines, we can migrate tasks and endpoints; but schedules we must do manually.  Quite often we forget to manually create the schedules, as they are not as obvious a component compared to tasks and endpoints.  If you move to a new environment and your task is not there, well, that is a no-brainer.  If your endpoint is not there, your task will not migrate.

But schedules are something that lurk in the background, inconsistently created, rushed in when we realise the Qlik task did not run when it was meant to.

Plan

Using the Python API, we can export all the items from a Qlik Replicate server with the export_all method.  The returned JSON file will contain the schedule details, like this:

{
    "name": "DEV_S004_GCS_DLA",
    "command_id": 25,
    "schedule": "0 1 * * * *",
    "command_requests": {
        "cleanlogs_req": {
            "server": {},
            "task": {},
            "fts": {}
        },
        "logfilerollover_req": {},
        "execute_req": {
            "task": "",
            "operation": "EXECUTE_OPERATIONS_BOTH",
            "flags": "FRESH"
        },
        "stoptask_req": {
            "task": ""
        }
    },
    "task": "DEV_T222_DL_LENDING_DLA",
    "is_local_time": true
}

The idea is to modify the JSON for the schedule to match the environment we are migrating to, put the schedule into a blank server JSON template, and use the import_all method to upload the new schedules to the destination server.
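Below is a rough sketch of that idea, working purely on the exported JSON. The top-level "schedules" key, the file names and the DEV-to-TST renaming are assumptions for illustration only; adjust them to match your own exports and naming conventions.

import json

# Hypothetical mapping of task names between environments.
task_name_map = {
    "DEV_T222_DL_LENDING_DLA": "TST_T222_DL_LENDING_DLA",
}

# export_all output from the source server (assumed file name).
with open("source_server_export.json") as f:
    source_export = json.load(f)

# A blank export taken from the destination server, used as the template.
with open("blank_destination_template.json") as f:
    destination_export = json.load(f)

migrated_schedules = []
for schedule in source_export.get("schedules", []):
    task = schedule.get("task", "")
    if task in task_name_map:
        schedule["task"] = task_name_map[task]
        schedule["name"] = schedule["name"].replace("DEV_", "TST_")
        migrated_schedules.append(schedule)

destination_export["schedules"] = migrated_schedules

# This file is then fed to import_all against the destination server.
with open("destination_schedules.json", "w") as f:
    json.dump(destination_export, f, indent=4)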

The Cron does not look right…

Looking over the exported JSON, the cron syntax did not look right.  It had six fields instead of the expected five.

"schedule": "0 1 * * * *"

This confused me for a while as I haven’t come across a cron like this before.

After coming up with no results searching Qlik’s documentation, I created some test schedules to try to determine what the sixth field is used for.

Ahh. After a short while I found the answer.

If the schedule is a “Once off” run, the sixth field is used for the year.

Here is an example below:

{
	"name":	"Test",
	"command_id":	25,
	"schedule":	"30 15 7 1 * 2026",
	"command_requests":	{
		"cleanlogs_req":	{
			"server":	{
			},
			"task":	{
			},
			"fts":	{
			}
		},
		"logfilerollover_req":	{
		},
		"execute_req":	{
			"task":	"",
			"operation":	"EXECUTE_OPERATIONS_BOTH",
			"flags":	"FRESH"
		},
		"stoptask_req":	{
			"task":	""
		}
	},
	"task":	"EIT_T002_OB_RFSB_ACCT_02",
	"enabled":	false
}

Qlik Replicate: Is the record there? Or not? I’m confused!

Introduction

I think, as Qlik Replicate developers, we have all been here before.

The testers berating you through MS Teams saying, “We have made a change, and we cannot see it downstream! It is all Qlik Replicate’s fault!”

Opening the task and checking the monitoring screen, I can see the change against the table.  What are they going on about?  Qlik has picked up something; why can’t they see it? (Sixty percent of the time something downstream has failed, twenty percent of the time something upstream has failed, nineteen percent of the time they are looking in the wrong spot, and the remaining one percent of the time – well, we won’t mention that one percent.)

But looking at the monitoring screen – what do those numbers mean?  And if you look in the analytics database, what do those figures mean?

Hopefully, this article will help you understand the monitoring and analytics numbers with some simple examples.

Filters – why does it have to be filters?

We have two main types of Qlik Replicate tasks.  One type grabs all changes from a particular table and sends them to our Data Lake in fifteen-minute micro batches.

The other type is our speed tasks, which only grab changes on specific columns of a table.  To limit QR to only picking up those specific changes, we have filters like this:

($AR_H_OPERATION != 'UPDATE' AND $AR_H_OPERATION != 'DELETE') OR
(
  ($AR_H_OPERATION == 'UPDATE') AND 
  (
    ( $BI__FIELD_1 != $FIELD_1 ) OR ( $BI__FIELD_1 IS NULL AND $FIELD_1 IS NOT NULL ) OR ( $BI__FIELD_1 IS NOT NULL AND $FIELD_1 IS NULL ) OR
    ( $BI__FIELD_2 != $FIELD_2 ) OR ( $BI__FIELD_2 IS NULL AND $FIELD_2 IS NOT NULL ) OR ( $BI__FIELD_2 IS NOT NULL AND $FIELD_2 IS NULL ) OR
    ( $BI__FIELD_3 != $FIELD_3 ) OR ( $BI__FIELD_3 IS NULL AND $FIELD_3 IS NOT NULL ) OR ( $BI__FIELD_3 IS NOT NULL AND $FIELD_3 IS NULL )
 )
)

So, the question is – if we make an update to $FIELD_4, a change the filter will not pass downstream, what will the monitoring tab on the task display?

Let us create an example to find out.

An example

Here is a simple table, pre-populated with six records:

CREATE TABLE dbo.CDC_ANALYTICS_EXAMPLE
(
	RECORD_ID INT IDENTITY(1,1) PRIMARY KEY,
	DATA_INT INT,
	DATA_VARCHAR VARCHAR(100),
	INCLUDE_RECORD CHAR(1)
);

GO

INSERT INTO dbo.CDC_ANALYTICS_EXAMPLE VALUES(1, 'Record 1', 'Y');
INSERT INTO dbo.CDC_ANALYTICS_EXAMPLE VALUES(2, 'Record 2', 'Y');
INSERT INTO dbo.CDC_ANALYTICS_EXAMPLE VALUES(3, 'Record 3', 'Y');
INSERT INTO dbo.CDC_ANALYTICS_EXAMPLE VALUES(4, 'Record 4', 'N');
INSERT INTO dbo.CDC_ANALYTICS_EXAMPLE VALUES(5, 'Record 5', 'N');
INSERT INTO dbo.CDC_ANALYTICS_EXAMPLE VALUES(6, 'Record 6', 'N');

SELECT *
FROM dbo.CDC_ANALYTICS_EXAMPLE WITH (NOLOCK);

And we create a simple QR task with the following filters:

The first filter is the Full Load Passthru Filter. When the full load initially runs, only three of the six records will be brought across.

The second filter, the Record Selection Condition, should only pass data changes where $INCLUDE_RECORD has a value of ‘Y’.

Results after running the QR task

Full load

After running the full load, Qlik shows the following on the monitoring screen:

Three records – that’s what we expected out of the six.

Interestingly, under the “Total Completion” section, the effect of the filter is quite clear: three records brought across and three remaining.

Change Processing

Let’s make some changes to the data in our test table:

-- This will be captured
BEGIN TRANSACTION
	DECLARE @IDENTITY INT;

	INSERT INTO dbo.CDC_ANALYTICS_EXAMPLE VALUES(null, 'Record x', 'Y');
	SET @IDENTITY = @@IDENTITY;

	UPDATE dbo.CDC_ANALYTICS_EXAMPLE 
	SET DATA_INT = @IDENTITY, 
		DATA_VARCHAR = 'Record ' + CAST(@IDENTITY AS varchar)
	WHERE RECORD_ID = @IDENTITY;

	DELETE dbo.CDC_ANALYTICS_EXAMPLE WHERE RECORD_ID = @IDENTITY;

COMMIT;

GO

-- This will be excluded
BEGIN TRANSACTION
	DECLARE @IDENTITY INT;

	INSERT INTO dbo.CDC_ANALYTICS_EXAMPLE VALUES(null, 'Record x', 'N');
	SET @IDENTITY = @@IDENTITY;

	UPDATE dbo.CDC_ANALYTICS_EXAMPLE 
	SET DATA_INT = @IDENTITY, 
		DATA_VARCHAR = 'Record ' + CAST(@IDENTITY AS varchar)
	WHERE RECORD_ID = @IDENTITY;

	DELETE dbo.CDC_ANALYTICS_EXAMPLE WHERE RECORD_ID = @IDENTITY;

COMMIT;

This is what the monitoring screen shows:

Looking at the Recent Activity screen – hmm, this could lead to some confusion.

The metrics show the two blocks of changes, even though one block was filtered out. So this screen is counting changes detected on the table, no matter whether they were subsequently filtered out.

If those testers in the opening paragraph created a record that was filtered out in QR, we could give them false direction if we said “it should be downstream”.

If we look at the Aggregates screen, it paints a true picture of what was detected and transferred to the target:

We can see the three records in the “Total Changes Applied” column.

Analytics database

From the analytic database:

SELECT 
    *
FROM public.aem_taskv t
WHERE
    t.task_name IN ('TEST_ANALYTICS') 
ORDER BY
     t.server_name, t.task_name, t.retrieval_time

Field                                               Value
id                                                  110876201
retrieval_time                                      2025-12-03 12:02
server_id                                           5
task_name_id                                        914
task_state_id                                       3
task_stop_reason_id                                 1
task_profile_id                                     1
cdc_evt_applied_insert_count                        1
cdc_evt_applied_update_count                        1
cdc_evt_applied_delete_count                        1
cdc_evt_applied_ddl_count                           0
full_load_tables_completed_count                    1
full_load_tables_loading_count                      0
full_load_tables_queued_count                       0
full_load_tables_with_error_count                   0
full_load_total_records_transferred                 3
full_load_est_records_count_for_all_tables          6
full_load_completed                                 1
full_load_start                                     2025-12-03 10:57
full_load_finish                                    2025-12-03 10:57
full_load_thrput_src_thrput_records_count           0
full_load_thrput_src_thrput_volume                  0
full_load_thrput_trg_thrput_records_count           0
full_load_thrput_trg_thrput_volume                  0
cdc_thrput_src_thrput_records_count                 0
cdc_thrput_src_thrput_volume                        0
cdc_thrput_trg_thrput_records_count                 0
cdc_thrput_trg_thrput_volume                        0
cdc_trans_read_rollback_count                       0
cdc_trans_read_records_rollback_count               0
cdc_trans_rollback_change_volume                    0
cdc_trans_applied_transactions_in_progress_count    0
cdc_trans_applied_records_in_progress_count         0
cdc_trans_applied_comitted_transaction_count        2
cdc_trans_applied_records_comitted_count            6
cdc_trans_applied_volume_committed                  0
cdc_trans_read_memory_events_count                  0
cdc_trans_read_swapped_events_count                 0
cdc_trans_applied_memory_events_count               0
cdc_trans_applied_swap_events_count                 0
cdc_source_latency                                  0
cdc_apply_latency                                   0
memory_usage_kb                                     0
disk_usage_kb                                       0
cpu_percentage                                      0
data_error_count                                    0
task_option_full_load_enabled                       1
task_option_apply_changes_enabled                   0
task_option_store_changes_enabled                   1
task_option_audit_changes_enabled                   0
task_option_recovery_enabled                        0
cdc_trans_read_in_progress                          2
server_cpu_percentage                               0
machine_cpu_percentage                              16
tasks_cpu_percentage                                0
server_name                                         xxxx
task_name                                           TEST_ANALYTICS
server_with_task_name                               xxxx::::TEST_ANALYTICS

A couple of takeaways from the results:

  • Unfortunately, the records in the analytics database are grouped at a task level, not at an individual table level. This makes it harder to determine whether a change from a particular table was applied
  • cdc_trans_applied_records_comitted_count is a count of all the records read, but not necessarily applied
  • If you want the number of records flowing downstream, add together the fields cdc_evt_applied_insert_count, cdc_evt_applied_update_count and cdc_evt_applied_delete_count (see the sketch below)
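For the last point, a minimal sketch of pulling that figure out of the analytics database with psycopg2 (the connection details are placeholders; the view and column names are the ones used above):

import psycopg2

# Placeholder connection details - point these at your analytics database.
conn = psycopg2.connect(host="analytics-host", dbname="analytics",
                        user="user", password="password")

sql = """
SELECT t.retrieval_time,
       t.cdc_evt_applied_insert_count
     + t.cdc_evt_applied_update_count
     + t.cdc_evt_applied_delete_count AS records_applied_downstream
FROM public.aem_taskv t
WHERE t.task_name = 'TEST_ANALYTICS'
ORDER BY t.retrieval_time;
"""

with conn.cursor() as cur:
    cur.execute(sql)
    for retrieval_time, records_applied in cur.fetchall():
        print(retrieval_time, records_applied)

conn.close()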

Final thoughts

The monitoring and analytics screens are useful tools for diagnosing QR tasks – and general ASMR, watching the numbers bounce up as transactions flow through the system.

Understanding what the figures translate to – whether changes are being detected or applied downstream – is fundamental to diagnosing problems.

Nothing is better than getting access to the source and target systems and checking for yourself that things are working correctly, but the monitoring screen is a good place to start investigating.

Qlik Replicate: Stripping latency out of log files

We have been doing a bit of “Stress and Volume” testing in Qlik Replicate over the past few days, investigating how much latency is introduced to an MS SQL Server task if we run it through a log stream.

If you’re not aware – you can get regular latency readings from Qlik Replicate by turning the “Performance” logging up to “Trace” or higher:

This will result in messages getting produced in the task’s log file like:

00011044: 2025-08-04T14:57:56 [PERFORMANCE     ]T:  Source latency 1.23 seconds, Target latency 2.42 seconds, Handling latency 1.19 seconds  (replicationtask.c:3879)
00011044: 2025-08-04T14:58:26 [PERFORMANCE     ]T:  Source latency 1.10 seconds, Target latency 2.27 seconds, Handling latency 1.16 seconds  (replicationtask.c:3879)
00009024: 2025-08-04T14:58:30 [SOURCE_CAPTURE  ]I:  Throughput monitor: Last DB time scanned: 2025-08-04T14:58:30.700. Last LSN scanned: 008050a1:00002ce8:0004. #scanned events: 492815.   (sqlserver_log_utils.c:5000)
00011044: 2025-08-04T14:58:57 [PERFORMANCE     ]T:  Source latency 0.61 seconds, Target latency 1.87 seconds, Handling latency 1.25 seconds  (replicationtask.c:3879)
00011044: 2025-08-04T14:59:27 [PERFORMANCE     ]T:  Source latency 1.10 seconds, Target latency 1.55 seconds, Handling latency 0.45 seconds  (replicationtask.c:3879)

I created a simple python script to parse a folder of log files and output the results to Excel. Since the data ends up in a pandas data frame, it is easy to manipulate the data and output it in a specific way:

import os
import pandas as pd
from datetime import datetime


def strip_seconds(inString):
    
    index = inString.find(" ")  
    returnString = float(inString[0:index])
    return(returnString)


def format_timestamp(in_timestamp):
    
    # Capture dates like "2025-08-01T10:42:36" and add a microsecond section
    if len(in_timestamp) == 19:
        in_timestamp = in_timestamp + ":000000"

    date_format = "%Y-%m-%dT%H:%M:%S:%f"

    # Converts date string to a date object    
    date_obj = datetime.strptime(in_timestamp, date_format)

    return date_obj

        
def process_file(in_file_path):

    return_array = []
    
    with open(in_file_path, "r") as in_file:
    
        for line in in_file:

            upper_case = line.upper().strip()

            if upper_case.find('[PERFORMANCE     ]') >= 0:
                timestamp_temp = upper_case[10:37]
                timestamp = timestamp_temp.split(" ")[0]
                
                split_string = upper_case.split("LATENCY ")
                
                if len(split_string) == 4:
                
                    source_latency = strip_seconds(split_string[1])
                    target_latency = strip_seconds(split_string[2])
                    handling_latency = strip_seconds(split_string[3])

                    # Makes the date compatible with Excel
                    date_obj = format_timestamp(timestamp)
                    excel_datetime = date_obj.strftime("%Y-%m-%d %H:%M:%S")
                    
                    # If you're outputting to standard out
                    #print(f"{in_file_path}\t{timestamp}\t{source_latency}\t{target_latency}\t{handling_latency}\n")

                    return_array.append([in_file_path, timestamp, excel_datetime, source_latency, target_latency, handling_latency])
                     
    return return_array
    

if __name__ == '__main__':

    log_folder = "/path/to/logfile/dir"
    out_excel = "OutLatency.xlsx"
    
    latency_data = []

    # Loops through files in log_folder
    for file_name in os.listdir(log_folder):

        focus_file = os.path.join(log_folder, file_name )

        if os.path.isfile(focus_file):
            filename, file_extension = os.path.splitext(focus_file)

            if file_extension.upper().endswith("LOG"):
                print(f"Processing file: {focus_file}")
                return_array = process_file(focus_file)

                latency_data += return_array
                

    df = pd.DataFrame(latency_data, columns=["File Name", "Time Stamp", "Excel Timestamp", "Source Latency", "Target Latency", "Handling Latency"])
    df.info()

    # Dump file to Excel; but you can dump to other formats like text etc
    df.to_excel(out_excel)

And voilà – we have an Excel output to quickly create latency statistics for a given task (or tasks).

What statistics to use?

Latency is something you can analyse in different ways, depending on what you’re trying to answer. It is also important to pair latency statistics with the change volume that is coming through.

Has the latency jumped at a specific time because the source database is processing a daily batch? Is there a spike of latency around Christmas time where there are more financial transactions compared to a benign day in February?

Generally, if the testers are processing the latency data, these are the figures they provide back:

  • Average target latency
  • 90th percentile target latency
  • 95th percentile target latency
  • Min target latency
  • Max target latency

This is what we use to compare two runs to each other when the source load is consistent and we’re changing a Qlik Replicate task.
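If you want those figures straight out of the data frame produced by the script above, pandas gives them in a few lines (tacked onto the end of the script, for example):

# Summary statistics for target latency, matching the list above.
target = df["Target Latency"]

print("Average:         ", round(target.mean(), 2))
print("90th percentile: ", round(target.quantile(0.90), 2))
print("95th percentile: ", round(target.quantile(0.95), 2))
print("Min:             ", target.min())
print("Max:             ", target.max())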

Qlik Replicate – MS SQL dates to Confluent Avro

I am just writing a brief post about a conversation I was asked to join.

The question paraphrased:

For dates in a Microsoft SQL Server database, when they get passed through Qlik Replicate to Kafka Avro – how are they interpreted? Epoch in UTC? Or in the current time zone?

I didn’t know the answer so I created the following simple test:


CREATE TABLE dbo.JD_DATES_TEST
(
 ID INT IDENTITY(1,1) PRIMARY KEY,
 TEST_SMALLDATETIME SMALLDATETIME,
 TEST_DATE date,
 TEST_DATETIME datetime,
 TEST_DATETIME2 datetime2,
 TEST_DATETIMEOFFSET datetimeoffset
);

GO

INSERT INTO dbo.JD_DATES_TEST VALUES(current_timestamp, current_timestamp, current_timestamp, current_timestamp, current_timestamp);

SELECT * FROM dbo.JD_DATES_TEST;
/*
ID          TEST_SMALLDATETIME      TEST_DATE  TEST_DATETIME           TEST_DATETIME2              TEST_DATETIMEOFFSET
----------- ----------------------- ---------- ----------------------- --------------------------- ----------------------------------
1           2025-06-12 12:04:00     2025-06-12 2025-06-12 12:04:16.650 2025-06-12 12:04:16.6500000 2025-06-12 12:04:16.6500000 +00:00

(1 row(s) affected)


*/

On the other side, after passing through Qlik Replicate and on to Kafka in Avro format, we got:

{
  "data": {
    "ID": {
      "int": 1
    },
    "TEST_SMALLDATETIME": {
      "long": 1749729840000000
    },
    "TEST_DATE": {
      "int": 20251
    },
    "TEST_DATETIME": {
      "long": 1749729856650000
    },
    "TEST_DATETIME2": {
      "long": 1749729856650000
    },
    "TEST_DATETIMEOFFSET": {
      "string": "2025-06-12 12:04:16.6500000 +00:00"
    },
    "x_y": {
      "string": "1.0.0"
    }
  },
  "beforeData": null,
  "headers": {
    "operation": "REFRESH",
    "changeSequence": "",
    "timestamp": "",
    "streamPosition": "",
    "transactionId": "",
    "changeMask": null,
    "columnMask": null,
    "transactionEventCounter": null,
    "transactionLastEvent": null
  }
}

So 1749729856650000 equals Thursday, June 12, 2025 12:04:16.650 PM – which is the source’s local time, written out as-is rather than converted to UTC.
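A quick way to check that reading is to decode the Avro values by hand. Interpreting the long columns as microseconds since the Unix epoch and the date column as days since the epoch reproduces the source values exactly, with no shift to UTC:

from datetime import date, datetime, timedelta, timezone

# TEST_DATETIME / TEST_DATETIME2 arrive as microseconds since the Unix epoch.
micros = 1749729856650000
print(datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros))
# -> 2025-06-12 12:04:16.650000+00:00, the source's local wall-clock value

# TEST_DATE arrives as days since the Unix epoch.
print(date(1970, 1, 1) + timedelta(days=20251))
# -> 2025-06-12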

Qlik Replicate to AWS Postgres (Why are you so slow?!)

Wait – Performance problems writing to an AWS RDS Postgres database? 

Haven’t we been here before?

Yes, we have, and I wrote quite a few posts about the trials and tribulations that we went through.

But this is a new problem we came across, one that resulted in several messages back and forth between us and Qlik before we worked out the cause.

Core System Migration.

Our organisation has several core systems, making them expensive to maintain and holding us back from using their data in modern-day tools like advanced AI.  Over the past several years, various projects have been running to consolidate the systems together.

This is all fun and games for the downstream consumers – as they have lots of migration data coming down the pipelines.  For instance, shell accounts getting created on the main system from the sacrificial system.

One downstream system wanted to exclude migration data from their downstream feed and branch it into another database, so they could manipulate the migrated data to fit it into their pipeline.

I created the Qlik Replicate task to capture the migration data.  It was a simple task to create.  Unusually, the downstream users created their own tables that they wanted me to pipe the data into.  In the past, we have let Qlik Replicate create the table in the lower environment, copied the schema and used that schema going forwards.

Ready to go, we fired up the task in testing to capture the test run-through of the migration.

Slow.  So Slow.

The test migration ran on the main core system, and we were ready to capture the changes under the user account running the migration process.

It was running slow.  So slow.

We knew data was getting loaded, as we were periodically running a SELECT COUNT(*) on the destination table.  But we were running at less than 20 tps.

Things we checked:

  1. The source and target databases were not under duress.
  2. The QR server (although a busy server) was not maxed out on CPU or memory.
  3. There were no critical errors in the error log.
  4. Records were not getting written to the attrep_apply_exceptions table.
  5. There were no triggers built off the landing table that might be slowing down the process.
  6. We knew from previous testing that we could get a higher tps.

I bumped up the logging on “Target Apply” to see if we could capture more details on the problem.

One by One.

After searching the log files, we came across an interesting message:

00007508: 2025-02-20T08:33:24:595067 [TARGET_APPLY    ]I:  Bulk apply operation failed. Trying to execute bulk statements in 'one-by-one' mode  (bulk_apply.c:2430)

00007508: 2025-02-20T08:33:24:814779 [TARGET_APPLY    ]I:  Applying INSERTS one-by-one for table 'dbo'.'DESTINATION_TABLE' (4)  (bulk_apply.c:4849)

For some reason, instead of using a bulk load operation, QR was loading the records one by one.  This accounted for the slow performance.

But why was it switching to this one-by-one mode?  What caused the bulk insert to fail while one-by-one still worked?

Truncating data types

First, I suspected that the column names might be mismatched.  I got the source and destination schema out and compared the two. 

All the column names aligned correctly.

Turning up the logging, we got the following message:

00003568: 2025-02-19T15:24:50 [TARGET_APPLY    ]T:  Error code 3 is identified as a data error  (csv_target.c:1013)

00003568: 2025-02-19T15:24:50 [TARGET_APPLY    ]T:  Command failed to load data with exit error code 3, Command output: psql:C:/Program Files/Attunity/Replicate/data/tasks/MY_CDC_TASK_NAME/data_files/0/LOAD00000001.csv.sql:1: ERROR:  value too long for type character varying(20)

CONTEXT:  COPY attrep_changes472AD6934FE46504, line 1, column col12: "2014-08-10 18:33:52.883" [1020417]  (csv_target.c:1087)

All the varchar fields between the source and target aligned correctly.

Then I noticed the column MODIFIED_DATE.  On the source it is a datetime, while on the Postgres target it is just a date.

My theory was that the bulk copy could not handle the conversion, but in one-by-one mode QR could truncate the time component off the datetime and load it successfully.

The downstream team changed the field from a date to a timestamp and I reloaded the data.  With this fix the task went blindingly quickly: from hours for just a couple of thousand records to everything done within minutes.

Conclusion

I suppose the main conclusion from this exercise is that “Qlik Replicate Knows best.” 

Unless you have a very accurate mapping process from the source to the target, let QR create the destination table in a lower environment.  Use that table as the basis and build on it.

It will save a lot of time and heartache later on.

Postgres: EBCDIC decoding through a JavaScript Function

EBCDIC? Didn’t that die out with punch cards and the Dinosaurs?

EBCDIC (Extended Binary Coded Decimal Interchange Code) is an eight-bit character encoding that was created by IBM in the ’60s.

While the rest of the world went on with ASCII and UTF-8; we still find fields in our DB2 database encoded in EBCDIC 037 just to make our lives miserable.

When replicating these fields with its default settings, Qlik Replicate brings them across as a normal “string”, which becomes quite unusable when loaded into a destination system.

Decoding EBCDIC in Postgres

To have the flexibility to decode particular EBCDIC fields, we need to bring the fields across as BYTES instead of what QR suggests. This can be done in the Table Settings for the table in question:

On the destination Postgres database, load the table into a bytea field.

Now, with a UDF in Postgres, we can decode the EBCDIC byte fields into something readable:

CREATE OR REPLACE FUNCTION public.fn_convert_bytes2_037(
    in_bytes bytea)
    RETURNS character varying
    LANGUAGE 'plv8'
    COST 100
    VOLATILE PARALLEL UNSAFE
AS $BODY$
    const hex_037 = new Map([
        ["40", " ",],
        ["41", " ",],
        ["42", "â",],
        ["43", "ä",],
        ["44", "à",],
        ["45", "á",],
        ["46", "ã",],
        ["47", "å",],
        ["48", "ç",],
        ["49", "ñ",],
        ["4a", "¢",],
        ["4b", ".",],
        ["4c", "<",],
        ["4d", "(",],
        ["4e", "+",],
        ["4f", "|",],
        ["50", "&",],
        ["51", "é",],
        ["52", "ê",],
        ["53", "ë",],
        ["54", "è",],
        ["55", "í",],
        ["56", "î",],
        ["57", "ï",],
        ["58", "ì",],
        ["59", "ß",],
        ["5a", "!",],
        ["5b", "$",],
        ["5c", "*",],
        ["5d", ")",],
        ["5e", ";",],
        ["5f", "¬",],
        ["60", "-",],
        ["61", "/",],
        ["62", "Â",],
        ["63", "Ä",],
        ["64", "À",],
        ["65", "Á",],
        ["66", "Ã",],
        ["67", "Å",],
        ["68", "Ç",],
        ["69", "Ñ",],
        ["6a", "¦",],
        ["6b", ",",],
        ["6c", "%",],
        ["6d", "_",],
        ["6e", ">",],
        ["6f", "?",],
        ["70", "ø",],
        ["71", "É",],
        ["72", "Ê",],
        ["73", "Ë",],
        ["74", "È",],
        ["75", "Í",],
        ["76", "Î",],
        ["77", "Ï",],
        ["78", "Ì",],
        ["79", "`",],
        ["7a", ":",],
        ["7b", "#",],
        ["7c", "@",],
        ["7d", "'",],
        ["7e", "=",],
        ["7f", ","],
        ["80", "Ø",],
        ["81", "a",],
        ["82", "b",],
        ["83", "c",],
        ["84", "d",],
        ["85", "e",],
        ["86", "f",],
        ["87", "g",],
        ["88", "h",],
        ["89", "i",],
        ["8a", "«",],
        ["8b", "»",],
        ["8c", "ð",],
        ["8d", "ý",],
        ["8e", "þ",],
        ["8f", "±",],
        ["90", "°",],
        ["91", "j",],
        ["92", "k",],
        ["93", "l",],
        ["94", "m",],
        ["95", "n",],
        ["96", "o",],
        ["97", "p",],
        ["98", "q",],
        ["99", "r",],
        ["9a", "ª",],
        ["9b", "º",],
        ["9c", "æ",],
        ["9d", "¸",],
        ["9e", "Æ",],
        ["9f", "¤",],
        ["a0", "µ",],
        ["a1", "~",],
        ["a2", "s",],
        ["a3", "t",],
        ["a4", "u",],
        ["a5", "v",],
        ["a6", "w",],
        ["a7", "x",],
        ["a8", "y",],
        ["a9", "z",],
        ["aa", "¡",],
        ["ab", "¿",],
        ["ac", "Ð",],
        ["ad", "Ý",],
        ["ae", "Þ",],
        ["af", "®",],
        ["b0", "^",],
        ["b1", "£",],
        ["b2", "¥",],
        ["b3", "·",],
        ["b4", "©",],
        ["b5", "§",],
        ["b6", "¶",],
        ["b7", "¼",],
        ["b8", "½",],
        ["b9", "¾",],
        ["ba", "[",],
        ["bb", "]",],
        ["bc", "¯",],
        ["bd", "¨",],
        ["be", "´",],
        ["bf", "×",],
        ["c0", "{",],
        ["c1", "A",],
        ["c2", "B",],
        ["c3", "C",],
        ["c4", "D",],
        ["c5", "E",],
        ["c6", "F",],
        ["c7", "G",],
        ["c8", "H",],
        ["c9", "I",],
        ["ca", "­",],
        ["cb", "ô",],
        ["cc", "ö",],
        ["cd", "ò",],
        ["ce", "ó",],
        ["cf", "õ",],
        ["d0", "}",],
        ["d1", "J",],
        ["d2", "K",],
        ["d3", "L",],
        ["d4", "M",],
        ["d5", "N",],
        ["d6", "O",],
        ["d7", "P",],
        ["d8", "Q",],
        ["d9", "R",],
        ["da", "¹",],
        ["db", "û",],
        ["dc", "ü",],
        ["dd", "ù",],
        ["de", "ú",],
        ["df", "ÿ",],
        ["e0", "\\",],
        ["e1", "÷",],
        ["e2", "S",],
        ["e3", "T",],
        ["e4", "U",],
        ["e5", "V",],
        ["e6", "W",],
        ["e7", "X",],
        ["e8", "Y",],
        ["e9", "Z",],
        ["ea", "²",],
        ["eb", "Ô",],
        ["ec", "Ö",],
        ["ed", "Ò",],
        ["ee", "Ó",],
        ["ef", "Õ",],
        ["f0", "0",],
        ["f1", "1",],
        ["f2", "2",],
        ["f3", "3",],
        ["f4", "4",],
        ["f5", "5",],
        ["f6", "6",],
        ["f7", "7",],
        ["f8", "8",],
        ["f9", "9",],
        ["fa", "³",],
        ["fb", "Û",],
        ["fc", "Ü",],
        ["fd", "Ù",],
        ["fe", "Ú"]
    ]);
 
    let in_varchar = "";
    let build_string = "";
     
    for (var loop_bytes = 0; loop_bytes < in_bytes.length; loop_bytes++)
    {
        /* Converts a byte character to a hex representation*/
        let focus_char = ('0' + (in_bytes[loop_bytes] & 0xFF).toString(16)).slice(-2); 
        let return_value = hex_037.get(focus_char.toLowerCase());
 
        /* If no mapping found - replace the character with a space */
        if(return_value === undefined)
        {
            return_value = " ";
        }
 
        build_string = build_string.concat(return_value)
    }
 
    return build_string
$BODY$;

The function can now be used in SQL:

SELECT public.fn_convert_bytes2_037(my_EBCDIC_byte_column)
FROM public.foo;
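As a quick cross-check of the mapping outside the database, Python’s built-in cp037 codec should give the same result as the function. A small sketch with a made-up sample value:

# 0xC8C5D3D3D6 is "HELLO" in EBCDIC 037.
sample = bytes.fromhex("C8C5D3D3D6")
print(sample.decode("cp037"))  # -> HELLO

# Compare against the database function with the same bytes, e.g.:
# SELECT public.fn_convert_bytes2_037('\xC8C5D3D3D6'::bytea);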

Reference

JavaScript bytes to HEX string function: Code Shock – How to Convert Between Hexadecimal Strings and Byte Arrays in JavaScript

Qlik Replicate: Oh Oracle – you’re a fussy beast

It’s all fun and games – until Qlik Replicate must copy 6 billion rows from a very wide Oracle table to GCS…

…in a small time window

…with the project not wanting to perform Stress and Volume testing

Oh boy.

Our Dev environment had 108 million rows to play with, which ran “quick” relative to the amount of data it had to copy.  But being 33 times smaller, even if it takes an hour in Dev, extrapolating the time out equates to over 30 hours of run time.

The project forged ahead with the implementation, and QR had only processed 2% of the changes before we ran out of the time window.

The QR servers didn’t seem stressed performance-wise; they had plenty of CPU and RAM.  I suspect the bottleneck was in the bandwidth going out to GCS, but there was no way to monitor how much of the connection was being used.

When in doubt – change the file type

After the failed implementation, we tried to work out how we can improve the throughput to GCS in our dev environment.

I thought changing the destination’s file type might be a way forward.  JSON is a chunky file format, and my hypothesis was that if the JSON was compressed it would transfer to GCS quicker.  We tested a NULL connector, raw JSON, GZIP JSON and Parquet.  As a test using Dev, we let a test task run for 20 minutes to see how much data was copied across.

Full Load Tuning:

  • Transaction consistency timeout (seconds): 600
  • Commit rate during full load: 100000

Endpoint settings:

  • Maximum file size(KB): 1000000 KB (1GB)

Results

Unfortunately, my hypothesis on compressed JSON was incorrect.  We speculated that compressing the JSON might have been taking up as much time as transferring it.  I would have liked to test this theory on a quieter QR server, but time was of the essence.

Parquet seemed to be the winner in the limited testing, offering a nice little throughput boost over the JSON formats.  But it wasn’t the silver bullet for our throughput problems. On top of this, the downstream users would need to spend time modifying their ingestion pipelines.

Divide and conquer – until Oracle says no.

The next stage was to look at whether we could divide the table up into batches and transfer a section across at a time.  Looking at the primary key, it was an identity column with little meaningful structure to divide up into batches easily.

There was another indexed column called RUN_DATE, which is a date relating to when the record was entered.

OK – let’s turn on Passthrough filtering and test it out.

First of all, to test the syntax out in SQL Developer:

SELECT COUNT(*)
FROM xxxxx.TRANSACTIONS
WHERE
    RUN_DATE >= '01/Jan/2023' AND
    RUN_DATE < '01/Jan/2024';

The query ran fine, suggesting that the date syntax was right.

Looking good – let’s add the filter to the Full Load Passthru Filter.

But when running the task, it went into “recoverable error” mode.

Looking into the logs:

00014204: 2024-11-26T08:38:26:184700 [SOURCE_UNLOAD   ]T:  Select statement for UNLOAD is 'SELECT "PK_ID","RUN_DATE", "LOTS", "OF", "OTHER, "COLUMNS"  FROM "xxxxx"."TRANSACTIONS" WHERE (RUN_DATE >= '01/Jan/2023' AND RUN_DATE < '01/Jan/2024')'  (oracle_endpoint_utils.c:1941)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  ORA-01858: a non-numeric character was found where a numeric was expected  [1020417]  (oracle_endpoint_unload.c:175)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  Failed to init unloading table 'xxxxx'.'TRANSACTIONS' [1020417]  (oracle_endpoint_unload.c:385)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]E:  ORA-01858: a non-numeric character was found where a numeric was expected  [1020417]  (oracle_endpoint_unload.c:175)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]E:  Failed to init unloading table 'xxxxx'.'TRANSACTIONS' [1020417]  (oracle_endpoint_unload.c:385)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  Error executing source loop [1020417]  (streamcomponent.c:1942)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  Stream component 'st_1_SRC_DEV_B1_xxxxx' terminated [1020417]  (subtask.c:1643)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  Free component st_1_SRC_DEV_B1_xxxxx  (oracle_endpoint.c:51)
00011868: 2024-11-26T08:38:26:215961 [TASK_MANAGER    ]I:  Task error notification received from subtask 1, thread 0, status 1020417  (replicationtask.c:3603)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]E:  Error executing source loop [1020417]  (streamcomponent.c:1942)
00014204: 2024-11-26T08:38:26:215961 [TASK_MANAGER    ]E:  Stream component failed at subtask 1, component st_1_SRC_DEV_B1_xxxxx  [1020417]  (subtask.c:1474)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]E:  Stream component 'st_1_SRC_DEV_B1_xxxxx' terminated [1020417]  (subtask.c:1643)
00011868: 2024-11-26T08:38:26:231570 [TASK_MANAGER    ]W:  Task 'TEST_xxxxx' encountered a recoverable error  (repository.c:6200)

Error code ORA-01858 seemed to be the key to the problem. As an experiment, I copied out the SELECT statement and ran it in SQL Developer.

Works fine 🙁

OK – maybe it is a quirk of SQL Developer?

Using sqlplus I ran the same code from the command line.

Again works fine 🙁

Resorting to good old Google – I searched ORA-01858.

The top hit was this article from Stack Overflow that recommended confirming the format of the date with the TO_DATE function.

OK Oracle; if you want to be fussy with your dates – let’s explicitly define the date format with TO_DATE in the Full Load Passthru Filter.

RUN_DATE >= TO_DATE('01/Jan/2023','DD/Mon/YYYY') AND RUN_DATE < TO_DATE('01/Jan/2024','DD/Mon/YYYY')

Ahhhh – that works better and Qlik Replicate now runs successfully with the passthrough filter.
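Since the plan is to run the table across in yearly batches, a small helper can spit out one passthrough filter per year using the TO_DATE format that Oracle accepted. This is purely illustrative; the year range is an assumption:

def yearly_filters(first_year, last_year):
    """Generate one Full Load Passthru Filter per year batch."""
    template = (
        "RUN_DATE >= TO_DATE('01/Jan/{start}','DD/Mon/YYYY') "
        "AND RUN_DATE < TO_DATE('01/Jan/{end}','DD/Mon/YYYY')"
    )
    return [template.format(start=y, end=y + 1) for y in range(first_year, last_year + 1)]


for f in yearly_filters(2020, 2024):
    print(f)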

Conclusion

I tried a different set of date formats, including an ISO date format, and Oracle spat them all out. So using TO_DATE is the simplest way to avoid the ORA-01858 error. I can understand Oracle refusing to run on a date like 03/02/2024 – I mean, is it the 3rd of Feb 2024, or, for Americans, the 2nd of Mar 2024? But I am surprised that something very clear like an ISO date format, or 03/Feb/2024, did not work.

Maybe SQL Developer and SQL*Plus interact with the database differently from QR, which leads to the difference in behaviour of how filters on dates work.

Docker, Qlik Replicate and Postgres – stitching them together

So far – what do we have?

We have:

  1. Postgres working in a docker container
  2. Qlik Replicate working in a docker container

Let’s see if we can get them talking to each other.

Working on Postgres Config files

The first modification needed is to the postgresql.conf for the postgres docker image. If you were following the docker-compose.yml in my previous post, you can find the file under:

~/apps/postgres-plv8/postgresql.conf

The following changes need to be made in postgresql.conf, as per the Qlik documentation:

wal_level = logical
max_replication_slots = 3       # max number of replication slots

Now add the following lines to ~/apps/postgres-plv8/pg_hba.conf:

# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

Once done – the docker container needs to be restarted for the changes to take effect.
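Before wiring QR up, it is worth confirming the settings took effect. Here is a quick sketch using psycopg2 from the host; the port and credentials are assumptions based on the docker-compose values from the earlier post, so adjust them to match yours:

import psycopg2

conn = psycopg2.connect(
    host="localhost",   # the docker host; QR itself will use host.docker.internal
    port=9432,          # host port mapped to the postgres container
    dbname="postgres",
    user="postgres",
    password="postgres",
)

with conn.cursor() as cur:
    cur.execute("SHOW wal_level;")
    print("wal_level =", cur.fetchone()[0])               # expect 'logical'
    cur.execute("SHOW max_replication_slots;")
    print("max_replication_slots =", cur.fetchone()[0])   # expect 3 or more

conn.close()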

Setting up QR

Log into QR using the address:

https://127.0.0.1:3552/attunityreplicate

Create a new Postgres connection.

Fill out the following fields:

Field       Value
Server      host.docker.internal
Port        9432
Username    Username from the docker-compose.yml file for postgres
Password    Password from the docker-compose.yml file for postgres
Database    Database from the docker-compose.yml file for postgres

From here, a QR task can be built to read from the Postgres database.