
Qlik Replicate: Stripping latency out of log files

We have been doing a bit of “Stress and Volume” testing in Qlik Replicate over the past few days, investigating how much latency is introduced to an MS SQL Server task if we run it through a log stream.

If you’re not aware – you can get fine-grained latency readings from Qlik Replicate by turning the “Performance” logging up to “Trace” or higher:

This will result in messages getting produced in the task’s log file like:

00011044: 2025-08-04T14:57:56 [PERFORMANCE     ]T:  Source latency 1.23 seconds, Target latency 2.42 seconds, Handling latency 1.19 seconds  (replicationtask.c:3879)
00011044: 2025-08-04T14:58:26 [PERFORMANCE     ]T:  Source latency 1.10 seconds, Target latency 2.27 seconds, Handling latency 1.16 seconds  (replicationtask.c:3879)
00009024: 2025-08-04T14:58:30 [SOURCE_CAPTURE  ]I:  Throughput monitor: Last DB time scanned: 2025-08-04T14:58:30.700. Last LSN scanned: 008050a1:00002ce8:0004. #scanned events: 492815.   (sqlserver_log_utils.c:5000)
00011044: 2025-08-04T14:58:57 [PERFORMANCE     ]T:  Source latency 0.61 seconds, Target latency 1.87 seconds, Handling latency 1.25 seconds  (replicationtask.c:3879)
00011044: 2025-08-04T14:59:27 [PERFORMANCE     ]T:  Source latency 1.10 seconds, Target latency 1.55 seconds, Handling latency 0.45 seconds  (replicationtask.c:3879)

I created a simple Python script to parse a folder of log files and output the results to an Excel file. Since the data ends up in a pandas DataFrame, it is easy to manipulate the data and output it in a specific way:

import os
import pandas as pd
from datetime import datetime


def strip_seconds(inString):
    
    index = inString.find(" ")  
    returnString = float(inString[0:index])
    return(returnString)


def format_timestamp(in_timestamp):
    
    # Capture dates like "2025-08-01T10:42:36" and add a microsecond section
    if len(in_timestamp) == 19:
        in_timestamp = in_timestamp + ":000000"

    date_format = "%Y-%m-%dT%H:%M:%S:%f"

    # Converts date string to a date object    
    date_obj = datetime.strptime(in_timestamp, date_format)

    return date_obj

        
def process_file(in_file_path):

    return_array = []
    
    with open(in_file_path, "r") as in_file:
    
        for line in in_file:

            upper_case = line.upper().strip()

            if upper_case.find('[PERFORMANCE     ]') >= 0:
                timestamp_temp = upper_case[10:37]
                timestamp = timestamp_temp.split(" ")[0]
                
                split_string = upper_case.split("LATENCY ")
                
                if len(split_string) == 4:
                
                    source_latency = strip_seconds(split_string[1])
                    target_latency = strip_seconds(split_string[2])
                    handling_latency = strip_seconds(split_string[3])

                    # Makes the date compatible with Excel
                    date_obj = format_timestamp(timestamp)
                    excel_datetime = date_obj.strftime("%Y-%m-%d %H:%M:%S")
                    
                    # If you're outputting to standard out
                    #print(f"{in_file_path}\t{timestamp}\t{source_latency}\t{target_latency}\t{handling_latency}\n")

                    return_array.append([in_file_path, timestamp, excel_datetime, source_latency, target_latency, handling_latency])
                     
    return return_array
    

if __name__ == '__main__':

    log_folder = "/path/to/logfile/dir"
    out_excel = "OutLatency.xlsx"
    
    latency_data = []

    # Loops through files in log_folder
    for file_name in os.listdir(log_folder):

        focus_file = os.path.join(log_folder, file_name )

        if os.path.isfile(focus_file):
            filename, file_extension = os.path.splitext(focus_file)

            if file_extension.upper().endswith("LOG"):
                print(f"Processing file: {focus_file}")
                return_array = process_file(focus_file)

                latency_data += return_array
                

    df = pd.DataFrame(latency_data, columns=["File Name", "Time Stamp", "Excel Timestamp", "Source Latency", "Target Latency", "Handling Latency"])
    df.info()

    # Dump file to Excel; but you can dump to other formats like text etc
    df.to_excel(out_excel)

And voilà – we have an output to Excel to quickly create statistics on latency for a given task(s).

What statistics to use?

Latency is something you can analyse in different ways, depending on what you’re trying to answer. It is also important to pair latency statistics with the change volume that is coming through.

Has the latency jumped at a specific time because the source database is processing a daily batch? Is there a spike of latency around Christmas time where there are more financial transactions compared to a benign day in February?
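One way to pair latency with change volume is to pull the “Throughput monitor” lines out of the same log files. The following is a minimal sketch, assuming the SOURCE_CAPTURE line layout shown in the sample log above; the function name parse_throughput_line is hypothetical, and whether “#scanned events” is cumulative or per interval is not established here, so treat the number as a rough indicator only.

```python
import re
from datetime import datetime

# Pattern based on the sample "Throughput monitor" line shown earlier in this post.
THROUGHPUT_RE = re.compile(
    r"^\d+: (?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\S*\s+"
    r"\[SOURCE_CAPTURE\s*\]\w:\s+Throughput monitor:.*?"
    r"#scanned events: (?P<events>\d+)"
)


def parse_throughput_line(line):
    """Return (timestamp, scanned_events), or None if the line does not match."""
    match = THROUGHPUT_RE.search(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
    return timestamp, int(match.group("events"))


sample = (
    "00009024: 2025-08-04T14:58:30 [SOURCE_CAPTURE  ]I:  Throughput monitor: "
    "Last DB time scanned: 2025-08-04T14:58:30.700. Last LSN scanned: "
    "008050a1:00002ce8:0004. #scanned events: 492815.   (sqlserver_log_utils.c:5000)"
)
print(parse_throughput_line(sample))
```

The resulting timestamps can then be lined up against the latency rows to see whether a latency jump coincides with a burst of scanned events.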

Generally, when the testers process the latency data, they report back:

Average target latency
90th percentile target latency
95th percentile target latency
Min target latency
Max target latency

This is what we use to compare two runs to each other when the source load is consistent and we’re changing a Qlik Replicate task.
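The statistics listed above can be sketched in a few lines of Python. This is an illustration only: the helper names percentile and summarise_latency are hypothetical, the percentile uses the simple nearest-rank method (other tools may interpolate and give slightly different values), and the sample values are the target latencies from the log excerpt earlier in this post.

```python
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # integer ceiling of n * pct / 100
    return ordered[rank - 1]


def summarise_latency(latencies):
    """Return the summary figures used to compare two test runs."""
    return {
        "average": statistics.mean(latencies),
        "p90": percentile(latencies, 90),
        "p95": percentile(latencies, 95),
        "min": min(latencies),
        "max": max(latencies),
    }


# Target latency values from the sample log lines earlier in this post
target_latency = [2.42, 2.27, 1.87, 1.55]
print(summarise_latency(target_latency))
```

With the DataFrame produced by the script above, df["Target Latency"].describe(percentiles=[0.9, 0.95]) gives a comparable summary, although pandas interpolates between values by default.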

August 6, 2025 by jonny.donker@gmail.com Python Qlik Replicate 0

Qlik Replicate – MS SQL dates to Confluent Avro

I am just writing a brief post about a conversation I was asked to join.

The question paraphrased:

For dates in a Microsoft SQL Server database, when they get passed through Qlik Replicate to Kafka Avro – how are they interpreted? Epoch to UTC? Or to the current time zone?

I didn’t know the answer so I created the following simple test:


CREATE TABLE dbo.JD_DATES_TEST
(
 ID INT IDENTITY(1,1) PRIMARY KEY,
 TEST_SMALLDATETIME SMALLDATETIME,
 TEST_DATE date,
 TEST_DATETIME datetime,
 TEST_DATETIME2 datetime2,
 TEST_DATETIMEOFFSET datetimeoffset
);

GO

INSERT INTO dbo.JD_DATES_TEST VALUES(current_timestamp, current_timestamp, current_timestamp, current_timestamp, current_timestamp);

SELECT * FROM dbo.JD_DATES_TEST;
/*
ID          TEST_SMALLDATETIME      TEST_DATE  TEST_DATETIME           TEST_DATETIME2              TEST_DATETIMEOFFSET
----------- ----------------------- ---------- ----------------------- --------------------------- ----------------------------------
1           2025-06-12 12:04:00     2025-06-12 2025-06-12 12:04:16.650 2025-06-12 12:04:16.6500000 2025-06-12 12:04:16.6500000 +00:00

(1 row(s) affected)


*/

On the other side after passing through Qlik Replicate and then onto Kafka in Avro format; we got:

{
  "data": {
    "ID": {
      "int": 1
    },
    "TEST_SMALLDATETIME": {
      "long": 1749729840000000
    },
    "TEST_DATE": {
      "int": 20251
    },
    "TEST_DATETIME": {
      "long": 1749729856650000
    },
    "TEST_DATETIME2": {
      "long": 1749729856650000
    },
    "TEST_DATETIMEOFFSET": {
      "string": "2025-06-12 12:04:16.6500000 +00:00"
    },
    "x_y": {
      "string": "1.0.0"
    }
  },
  "beforeData": null,
  "headers": {
    "operation": "REFRESH",
    "changeSequence": "",
    "timestamp": "",
    "streamPosition": "",
    "transactionId": "",
    "changeMask": null,
    "columnMask": null,
    "transactionEventCounter": null,
    "transactionLastEvent": null
  }
}

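So 1749729856650000 (microseconds since the epoch) equals Thursday, June 12, 2025 12:04:16.650 PM when decoded as UTC, which matches the local time stored in SQL Server. In other words, the local time is written out as-is, with no time zone conversion.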
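The conversion can be checked with a few lines of Python. This is a minimal sketch that assumes the long values are microseconds since the Unix epoch and the int is days since 1970-01-01 (the conventions of Avro's timestamp-micros and date logical types); the schema itself is not shown in this post, and the helper names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_utc(micros):
    """Interpret a long as microseconds since the Unix epoch, in UTC."""
    return EPOCH + timedelta(microseconds=micros)


def days_to_date(days):
    """Interpret an int as days since 1970-01-01."""
    return date(1970, 1, 1) + timedelta(days=days)


print(micros_to_utc(1749729840000000))  # TEST_SMALLDATETIME -> 2025-06-12 12:04:00+00:00
print(micros_to_utc(1749729856650000))  # TEST_DATETIME      -> 2025-06-12 12:04:16.650000+00:00
print(days_to_date(20251))              # TEST_DATE          -> 2025-06-12
```

Decoded as UTC, the values reproduce the wall-clock values from the SELECT output exactly, so no offset has been applied on the way through.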

June 12, 2025 by jonny.donker@gmail.com Qlik Replicate SQL 0

Qlik Replicate to AWS Postgres (Why are you so slow?!)

Wait – Performance problems writing to an AWS RDS Postgres database?

Haven’t we been here before?

Yes, we have, and I wrote quite a few posts on the trials and tribulations that we went through.

But this is a new problem that we came across that resulted in several messages backwards and forwards between us and Qlik before we worked out the problem.

Core System Migration.

Our organisation has several core systems, making maintenance of them expensive and holding us back from using the data in these systems in modern-day tools like advanced AI. Over the past several years, various projects have been running to consolidate the systems.

This is all fun and games for the downstream consumers – as they have lots of migration data coming down the pipelines. For instance, shell accounts getting created on the main system from the sacrificial system.

One downstream system wanted to exclude migration data from their downstream data and branch the data into another database so they can manipulate the migrated data to fit it into their pipeline.

I created the Qlik Replicate task to capture the migration data. It was a simple task to create. Unusually, the downstream users created their own tables that they wanted me to pipe the data into. In the past, we let Qlik Replicate create the table in the lower environment, copied the schema and used that schema going forwards.

Ready to go, we fired up the task in testing to capture the test run-through of the migration.

Slow. So Slow.

The test migration ran on the main core system, and we were ready to capture the changes under the user account running the migration process.

It was running slow. So slow.

We knew data was getting loaded as we were periodically running a SELECT COUNT(*) on the destination table. But we were running at less than 20 tps.

Things we checked:

The source and target databases were not under duress.
The QR server's CPU and memory (although it is a busy server) weren’t maxed out.
There were no critical errors in the error log.
Records were not getting written to the attrep_apply_exceptions table.
There were no triggers built off the landing table that might be slowing down the process.
We knew from previous testing that we could get a higher tps.

I bumped up the logging on “Target Apply” to see if we can capture more details on the problem.

One by One.

After searching the log files, we came across an interesting message:

00007508: 2025-02-20T08:33:24:595067 [TARGET_APPLY    ]I:  Bulk apply operation failed. Trying to execute bulk statements in 'one-by-one' mode  (bulk_apply.c:2430)

00007508: 2025-02-20T08:33:24:814779 [TARGET_APPLY    ]I:  Applying INSERTS one-by-one for table 'dbo'.'DESTINATION_TABLE' (4)  (bulk_apply.c:4849)

For some reason instead of using a bulk load operation – QR was loading the records one by one. This accounted for the slow performance.
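Since the fallback is only reported at “Info” level, it is easy to miss. The following is a minimal sketch for scanning task log lines for it, assuming the TARGET_APPLY line layout shown above; the function name find_one_by_one_events is hypothetical.

```python
def find_one_by_one_events(log_lines):
    """Return (timestamp, message) for TARGET_APPLY lines that mention one-by-one mode."""
    events = []
    for line in log_lines:
        if "[TARGET_APPLY" in line and "one-by-one" in line:
            # Layout: "<thread id>: <timestamp> [COMPONENT ]<level>:  <message>"
            _thread_id, timestamp, rest = line.split(maxsplit=2)
            # Drop the component tag and the two-character level marker (e.g. "I:")
            message = rest.split("]", 1)[1][2:].strip()
            events.append((timestamp, message))
    return events


sample_lines = [
    "00007508: 2025-02-20T08:33:24:595067 [TARGET_APPLY    ]I:  Bulk apply operation failed. "
    "Trying to execute bulk statements in 'one-by-one' mode  (bulk_apply.c:2430)",
    "00007508: 2025-02-20T08:33:24:814779 [TARGET_APPLY    ]I:  Applying INSERTS one-by-one "
    "for table 'dbo'.'DESTINATION_TABLE' (4)  (bulk_apply.c:4849)",
]

for timestamp, message in find_one_by_one_events(sample_lines):
    print(timestamp, message)
```

Running something like this over a task's log folder gives a quick indication of whether a slow task is falling back to one-by-one mode.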

But why was it switching to this one-by-one mode? What caused the bulk insert to fail while one-by-one still worked?

Truncating data types

First, I suspected that the column names might be mismatched. I got the source and destination schema out and compared the two.

All the column names aligned correctly.

Turning up the logging we got the following message:

00003568: 2025-02-19T15:24:50 [TARGET_APPLY    ]T:  Error code 3 is identified as a data error  (csv_target.c:1013)

00003568: 2025-02-19T15:24:50 [TARGET_APPLY    ]T:  Command failed to load data with exit error code 3, Command output: psql:C:/Program Files/Attunity/Replicate/data/tasks/MY_CDC_TASK_NAME/data_files/0/LOAD00000001.csv.sql:1: ERROR:  value too long for type character varying(20)

CONTEXT:  COPY attrep_changes472AD6934FE46504, line 1, column col12: "2014-08-10 18:33:52.883" [1020417]  (csv_target.c:1087)

All the varchar fields between the source and target align correctly.

Then I noticed the column MODIFIED_DATE. On the source it is a datetime, while on the Postgres target it is just a date.

My theory was that the bulk copy could not handle the conversion, but in one-by-one mode it could truncate the time component off the date and load successfully.

The downstream team changed the field from a date to a timestamp and I reloaded the data. With this fix the task went blindingly quick: from hours for just a couple of thousand records to all done within minutes.

Conclusion

I suppose the main conclusion from this exercise is that “Qlik Replicate Knows best.”

Unless you have a very accurate mapping process from the source to the target; let QR create the destination table in a lower environment. Use this table as a source and build on it.

It will save a lot of time and heartache later on.

March 17, 2025 by jonny.donker@gmail.com Qlik Replicate 0

Postgres: EBCDIC decoding through a JavaScript Function

EBCDIC? Didn’t that die out with punch cards and the Dinosaurs?

EBCDIC (Extended Binary Coded Decimal Interchange Code) is an eight-bit character encoding that was created by IBM in the ’60s.

While the rest of the world went on with ASCII and UTF-8; we still find fields in our DB2 database encoded in EBCDIC 037 just to make our lives miserable.

With its default settings, Qlik Replicate brings these fields across as a normal “string”, which becomes quite unusable when loaded into a destination system.

Decoding EBCDIC in Postgres

To have the flexibility to decode particular fields in EBCDIC, we need to bring the fields across as BYTES instead of the type that QR suggests. This can be done in the Table Settings for the table in question:

On the destination Postgres database; load the table into a bytea field.

Now with a UDF in Postgres, we can decode the EBCDIC bytea fields into something readable:

CREATE OR REPLACE FUNCTION public.fn_convert_bytes2_037(
    in_bytes bytea)
    RETURNS character varying
    LANGUAGE 'plv8'
    COST 100
    VOLATILE PARALLEL UNSAFE
AS $BODY$
    const hex_037 = new Map([
        ["40", " ",],
        ["41", " ",],
        ["42", "â",],
        ["43", "ä",],
        ["44", "à",],
        ["45", "á",],
        ["46", "ã",],
        ["47", "å",],
        ["48", "ç",],
        ["49", "ñ",],
        ["4a", "¢",],
        ["4b", ".",],
        ["4c", "<",],
        ["4d", "(",],
        ["4e", "+",],
        ["4f", "|",],
        ["50", "&",],
        ["51", "é",],
        ["52", "ê",],
        ["53", "ë",],
        ["54", "è",],
        ["55", "í",],
        ["56", "î",],
        ["57", "ï",],
        ["58", "ì",],
        ["59", "ß",],
        ["5a", "!",],
        ["5b", "$",],
        ["5c", "*",],
        ["5d", ")",],
        ["5e", ";",],
        ["5f", "¬",],
        ["60", "-",],
        ["61", "/",],
        ["62", "Â",],
        ["63", "Ä",],
        ["64", "À",],
        ["65", "Á",],
        ["66", "Ã",],
        ["67", "Å",],
        ["68", "Ç",],
        ["69", "Ñ",],
        ["6a", "¦",],
        ["6b", ",",],
        ["6c", "%",],
        ["6d", "_",],
        ["6e", ">",],
        ["6f", "?",],
        ["70", "ø",],
        ["71", "É",],
        ["72", "Ê",],
        ["73", "Ë",],
        ["74", "È",],
        ["75", "Í",],
        ["76", "Î",],
        ["77", "Ï",],
        ["78", "Ì",],
        ["79", "`",],
        ["7a", ":",],
        ["7b", "#",],
        ["7c", "@",],
        ["7d", "'",],
        ["7e", "=",],
        ["7f", "\""],
        ["80", "Ø",],
        ["81", "a",],
        ["82", "b",],
        ["83", "c",],
        ["84", "d",],
        ["85", "e",],
        ["86", "f",],
        ["87", "g",],
        ["88", "h",],
        ["89", "i",],
        ["8a", "«",],
        ["8b", "»",],
        ["8c", "ð",],
        ["8d", "ý",],
        ["8e", "þ",],
        ["8f", "±",],
        ["90", "°",],
        ["91", "j",],
        ["92", "k",],
        ["93", "l",],
        ["94", "m",],
        ["95", "n",],
        ["96", "o",],
        ["97", "p",],
        ["98", "q",],
        ["99", "r",],
        ["9a", "ª",],
        ["9b", "º",],
        ["9c", "æ",],
        ["9d", "¸",],
        ["9e", "Æ",],
        ["9f", "¤",],
        ["a0", "µ",],
        ["a1", "~",],
        ["a2", "s",],
        ["a3", "t",],
        ["a4", "u",],
        ["a5", "v",],
        ["a6", "w",],
        ["a7", "x",],
        ["a8", "y",],
        ["a9", "z",],
        ["aa", "¡",],
        ["ab", "¿",],
        ["ac", "Ð",],
        ["ad", "Ý",],
        ["ae", "Þ",],
        ["af", "®",],
        ["b0", "^",],
        ["b1", "£",],
        ["b2", "¥",],
        ["b3", "·",],
        ["b4", "©",],
        ["b5", "§",],
        ["b6", "¶",],
        ["b7", "¼",],
        ["b8", "½",],
        ["b9", "¾",],
        ["ba", "[",],
        ["bb", "]",],
        ["bc", "¯",],
        ["bd", "¨",],
        ["be", "´",],
        ["bf", "×",],
        ["c0", "{",],
        ["c1", "A",],
        ["c2", "B",],
        ["c3", "C",],
        ["c4", "D",],
        ["c5", "E",],
        ["c6", "F",],
        ["c7", "G",],
        ["c8", "H",],
        ["c9", "I",],
        ["ca", "",],
        ["cb", "ô",],
        ["cc", "ö",],
        ["cd", "ò",],
        ["ce", "ó",],
        ["cf", "õ",],
        ["d0", "}",],
        ["d1", "J",],
        ["d2", "K",],
        ["d3", "L",],
        ["d4", "M",],
        ["d5", "N",],
        ["d6", "O",],
        ["d7", "P",],
        ["d8", "Q",],
        ["d9", "R",],
        ["da", "¹",],
        ["db", "û",],
        ["dc", "ü",],
        ["dd", "ù",],
        ["de", "ú",],
        ["df", "ÿ",],
        ["e0", "\\",],
        ["e1", "÷",],
        ["e2", "S",],
        ["e3", "T",],
        ["e4", "U",],
        ["e5", "V",],
        ["e6", "W",],
        ["e7", "X",],
        ["e8", "Y",],
        ["e9", "Z",],
        ["ea", "²",],
        ["eb", "Ô",],
        ["ec", "Ö",],
        ["ed", "Ò",],
        ["ee", "Ó",],
        ["ef", "Õ",],
        ["f0", "0",],
        ["f1", "1",],
        ["f2", "2",],
        ["f3", "3",],
        ["f4", "4",],
        ["f5", "5",],
        ["f6", "6",],
        ["f7", "7",],
        ["f8", "8",],
        ["f9", "9",],
        ["fa", "³",],
        ["fb", "Û",],
        ["fc", "Ü",],
        ["fd", "Ù",],
        ["fe", "Ú"]
    ]);
 
    let in_varchar = "";
    let build_string = "";
     
    for (var loop_bytes = 0; loop_bytes < in_bytes.length; loop_bytes++)
    {
        /* Converts a byte character to a hex representation*/
        let focus_char = ('0' + (in_bytes[loop_bytes] & 0xFF).toString(16)).slice(-2); 
        let return_value = hex_037.get(focus_char.toLowerCase());
 
        /* If no mapping found - replace the character with a space */
        if(return_value === undefined)
        {
            return_value = " ";
        }
 
        build_string = build_string.concat(return_value)
    }
 
    return build_string
$BODY$;

The function can now be used in SQL:

SELECT public.fn_convert_bytes2_037(my_EBCDIC_byte_column)
FROM public.foo;
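To sanity-check the mapping table outside the database, Python's standard library ships a "cp037" codec that can be used as a reference. This is only a cross-check sketch, not part of the Postgres solution; the function name decode_ebcdic_037 and the sample bytes are made up for illustration.

```python
def decode_ebcdic_037(raw):
    """Decode EBCDIC code page 037 bytes using Python's built-in cp037 codec."""
    return raw.decode("cp037")


# Hypothetical sample: the bytes C8 C5 D3 D3 D6 40 E6 D6 D9 D3 C4
sample_bytes = bytes.fromhex("c8c5d3d3d640e6d6d9d3c4")
print(decode_ebcdic_037(sample_bytes))  # HELLO WORLD
```

One difference to keep in mind: the plv8 function above replaces unmapped bytes with a space, whereas the Python codec maps control bytes to their control characters.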

Reference

JavaScript bytes to HEX string function: Code Shock – How to Convert Between Hexadecimal Strings and Byte Arrays in JavaScript

January 9, 2025 by jonny.donker@gmail.com Postgres Qlik Replicate 0

Qlik Replicate: Oh Oracle – you’re a fussy beast

It’s all fun and games – until Qlik Replicate must copy 6 billion rows from a very wide Oracle table to GCS…

…in a small time window

…with the project not wanting to perform Stress and Volume testing

Oh boy.

Our Dev environment had 108 million rows to play with, which ran “quick” relative to the amount of data it had to copy. But with Dev being 33 times smaller, even if it takes an hour in Dev, extrapolating the time out gives over 30 hours of run time.

The project forged ahead in the implementation and QR only processed 2% of the changes before we ran out of the time window.

The QR servers didn’t seem stressed performance-wise; they had plenty of CPU and RAM. I suspect the bottleneck was in the bandwidth going out to GCS, but there was no way to monitor how much of the connection was being used.

When in doubt – change the file type

After the failed implementation, we tried to work out how we can improve the throughput to GCS in our dev environment.

I thought changing the destination’s file type might be a way. JSON is a chunky file format, and my hypothesis was that if the JSON was compressed it would transfer to GCS quicker. We tested out a NULL connector, raw JSON, GZIP JSON and Parquet. As a test using Dev, we let a test task run for 20 minutes to see how much data was copied across.

Full Load Tuning:

Transaction consistency timeout (seconds): 600
Commit rate during full load: 100000

Endpoint settings:

Maximum file size(KB): 1000000 KB (1GB)

Results

Unfortunately, my hypothesis on compressed JSON was incorrect. We speculated that compressing the JSON might have been taking up as much time as transferring it. I would have liked to test this theory on a quieter QR server, but time was of the essence.

Parquet seemed to be the winner in the limited testing, offering a nice little throughput boost over the JSON formats. But it wasn’t the silver bullet to our throughput problems. On top of this, the downstream users would need to spend time modifying their ingestion pipelines.

Divide and conquer – until Oracle says no.

The next stage was to see if we could divide the table up into batches and transfer it across one section at a time. Looking at the primary key, it was an identity column with little meaning that would make it easy to divide up into batches.

There was another indexed column called RUN_DATE, which is a date relating to when the record was entered.

OK – let’s turn on Passthrough filtering and test it out.

First of all, to test the syntax out in SQL Developer:

SELECT COUNT(*)
FROM xxxxx.TRANSACTIONS
WHERE
    RUN_DATE >= '01/Jan/2023' AND
    RUN_DATE < '01/Jan/2024';

The query ran fine meaning that the date syntax was right.

Looking good – let’s add the filter to the Full Load Passthru Filter:

But when running the task; it goes into “recoverable error” mode.

Looking into the logs:

00014204: 2024-11-26T08:38:26:184700 [SOURCE_UNLOAD   ]T:  Select statement for UNLOAD is 'SELECT "PK_ID","RUN_DATE", "LOTS", "OF", "OTHER, "COLUMNS"  FROM "xxxxx"."TRANSACTIONS" WHERE (RUN_DATE >= '01/Jan/2023' AND RUN_DATE &lt; '01/Jan/2024')'  (oracle_endpoint_utils.c:1941)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  ORA-01858: a non-numeric character was found where a numeric was expected  [1020417]  (oracle_endpoint_unload.c:175)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  Failed to init unloading table 'xxxxx'.'TRANSACTIONS' [1020417]  (oracle_endpoint_unload.c:385)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]E:  ORA-01858: a non-numeric character was found where a numeric was expected  [1020417]  (oracle_endpoint_unload.c:175)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]E:  Failed to init unloading table 'xxxxx'.'TRANSACTIONS' [1020417]  (oracle_endpoint_unload.c:385)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  Error executing source loop [1020417]  (streamcomponent.c:1942)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  Stream component 'st_1_SRC_DEV_B1_xxxxx' terminated [1020417]  (subtask.c:1643)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]T:  Free component st_1_SRC_DEV_B1_xxxxx  (oracle_endpoint.c:51)
00011868: 2024-11-26T08:38:26:215961 [TASK_MANAGER    ]I:  Task error notification received from subtask 1, thread 0, status 1020417  (replicationtask.c:3603)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]E:  Error executing source loop [1020417]  (streamcomponent.c:1942)
00014204: 2024-11-26T08:38:26:215961 [TASK_MANAGER    ]E:  Stream component failed at subtask 1, component st_1_SRC_DEV_B1_xxxxx  [1020417]  (subtask.c:1474)
00014204: 2024-11-26T08:38:26:215961 [SOURCE_UNLOAD   ]E:  Stream component 'st_1_SRC_DEV_B1_xxxxx' terminated [1020417]  (subtask.c:1643)
00011868: 2024-11-26T08:38:26:231570 [TASK_MANAGER    ]W:  Task 'TEST_xxxxx' encountered a recoverable error  (repository.c:6200)

Error code ORA-01858 seems to be the key to the problem. As an experiment I copied out the select statement and ran it in SQL Developer.

Works fine 🙁

OK – maybe it is a quirk of SQL Developer?

Using sqlplus I ran the same code from the command line.

Again works fine 🙁

Resorting to good old Google – I searched ORA-01858.

The top hit was this article from Stack Overflow that recommended confirming the format of the date with the TO_DATE function.

OK Oracle; if you want to be fussy with your dates – let’s explicitly define the date format with TO_DATE in the Full Load Passthru Filter.

RUN_DATE >= TO_DATE('01/Jan/2023','DD/Mon/YYYY') AND RUN_DATE < TO_DATE('01/Jan/2024','DD/Mon/YYYY')

Ahhhh – that works better and Qlik Replicate now runs successfully with the passthrough filter.
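With the explicit TO_DATE form working, the filter text for each batch can be generated rather than typed by hand. The following is a minimal sketch that builds one filter per calendar year; the function name yearly_passthru_filters and the year range are assumptions for illustration, and each string would still need to be pasted into a task's Full Load Passthru Filter manually.

```python
def yearly_passthru_filters(column, first_year, last_year):
    """Build one filter string per calendar year, using explicit TO_DATE calls."""
    filters = []
    for year in range(first_year, last_year + 1):
        filters.append(
            f"{column} >= TO_DATE('01/Jan/{year}','DD/Mon/YYYY') AND "
            f"{column} < TO_DATE('01/Jan/{year + 1}','DD/Mon/YYYY')"
        )
    return filters


# Hypothetical range of years; adjust to the data actually in the table
for clause in yearly_passthru_filters("RUN_DATE", 2022, 2024):
    print(clause)
```

Using a half-open range (>= start, < next start) for each batch means no row is picked up twice and none falls between batches.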

Conclusion

I tried several different date formats, including an ISO date format, and Oracle spat them all out. So using TO_DATE is the simplest way to avoid the ORA-01858 error. I can understand Oracle refusing to run on a date like 03/02/2024; I mean, is it the 3rd of Feb 2024, or for Americans the 2nd of Mar 2024? But I am surprised that something very clear like an ISO date format, or 03/Feb/2024, did not work.

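Maybe SQL Developer and SQL*Plus interact with the database differently than QR does, which leads to the difference in how date filters behave. One plausible cause is the session's NLS_DATE_FORMAT setting, which governs how Oracle implicitly converts strings to dates and can differ between clients.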

November 25, 2024 by jonny.donker@gmail.com Oracle Qlik Replicate 0

Qlik Replicate: You’re trapped in a Docker container now!

In Qlik Replicate we have had tasks that were unable to resume after nasty server failures (for instance, the CrowdStrike outage in July 2024).

This only happens in tasks that replicate from an RDBMS to a cloud storage system like AWS S3 or GCS.

In the task log the error message takes the form of:

00002396: 2022-08-26T15:21:14 [AT_GLOBAL ]E: Json doesn't start with '{' [1003001] (at_cjson.c:1773)
00002396: 2022-08-26T15:21:14 [AT_GLOBAL ]E: Cannot parse json: [1000251] (at_protobuf.c:1420)

This error gives us problems; I can’t resume the task as the error reappears. I can’t even start it from the stream position and must rely on restarting the QR task from a timestamp, which is extremely dangerous given the chance of missing data in that split second.

I suspect the problem is that the “staging” file on the QR server gets corrupted mid-write when the server fails, and when the task resumes, QR can’t parse it.

But trying to recreate the problem in a safe environment to diagnose it is tricky. Our DTL environment doesn’t create enough traffic to trigger the issue. Also, I don’t want to be abruptly turning off our DTL QR servers and interrupting other people’s testing. As for trying to recreate the problem in production – the pain of all the red tape is not worth the effort.

I needed a safer space to work in: a space where I can pump large volumes of data through QR and kick the QR service around trying to provoke the error. Armed with my little Linux VM, Docker containers were the answer.

CentOS? Why CentOS?

My goal was to build a Docker container with Qlik Replicate and Postgres drivers so I can use it on my Linux VM.

Under Support articles, Qlik has a guide on how to run Qlik Replicate in a Docker container.

Following the instructions, I ran into some initial problems. The first major problem was using the CentOS Docker image. The issue was that I must use the packages in my company’s Artifactory and not external packages. Although the company had the CentOS image, there were no other packages available to update and install. Since my VM cannot reach http://vault.centos.org, the CentOS image was a lame duck.

With CentOS off the cards, I had to use the Red Hat image that my company provided. With Red Hat, the Artifactory had all the packages that I needed.

The second problem was that I wanted to use the 2023.11 image to match our environment. With 2023.11 there are some extra steps needed in the Dockerfile compared to 2024.05. The differences are noted in Qlik’s support article.

The Dockerfile

Here is the Dockerfile:

FROM my.companys.repo/redhat/ubi9


ENV QLIK_REPLICATE_BASE_DIR=/opt/attunity/replicate/
ENV ReplicateDataFolder=/replicate/data
ENV ReplicateAdminPassword=AB1gL0ngPa33w0rd
ENV ReplicateRestPort=3552
ENV LicenseFile=/tmp/replicate_license_exp2025-06-29_ser60038556.txt

# Copy across installation packages and licenses
ADD postgresql*.rpm /tmp/
ADD areplicate-*.rpm /tmp/
ADD systemctl /usr/sbin
ADD replicate_license_exp2025-06-29_ser60038556.txt /tmp/

# Update packages
RUN dnf -y update
RUN dnf makecache

# To get ps command
RUN dnf -y install procps-ng
RUN dnf -y install unixODBC unzip
RUN dnf -y install libicu.x86_64
RUN rm -f /etc/odbcinst.ini

# Installing postgres packages
RUN rpm -ivh /tmp/postgresql13-libs-13.9-1PGDG.rhel9.x86_64.rpm
RUN rpm -ivh /tmp/postgresql13-odbc-13.02.0000-2PGDG.rhel9.x86_64.rpm
RUN rpm -ivh /tmp/postgresql13-13.9-1PGDG.rhel9.x86_64.rpm

ADD odbcinst.ini /etc/

# Installing Qlik Replicate
RUN systemd=no yum -y install /tmp/areplicate-2023.11.0-468.x86_64.rpm
RUN yum clean all
RUN rm -f /tmp/areplicate-*.rpm

RUN export LD_LIBRARY_PATH=/opt/attunity/replicate/lib:\$LD_LIBRARY_PATH
RUN echo "export LD_LIBRARY_PATH=/usr/pgsql-13/lib:\$LD_LIBRARY_PATH" >> /opt/attunity/replicate/bin/site_arep_login.sh

ADD start_replicate.sh /opt/attunity/replicate/bin/start_replicate.sh
RUN chmod 775 /opt/attunity/replicate/bin/start_replicate.sh
RUN chown attunity:attunity /opt/attunity/replicate/bin/start_replicate.sh
RUN source $QLIK_REPLICATE_BASE_DIR/bin/arep_login.sh >>~attunity/.bash_profile
ENTRYPOINT /opt/attunity/replicate/bin/start_replicate.sh ${ReplicateDataFolder} ${ReplicateAdminPassword} ${ReplicateRestPort} ${LicenseFile} ; tail -f /dev/null

The postgres packages can be obtained from https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-9-x86_64/

The content of the odbcinst.ini file is:

[PostgreSQL]
Description = ODBC for PostgreSQL
Driver      = /usr/lib/psqlodbcw.so
Setup       = /usr/lib/libodbcpsqlS.so
Driver64    = /usr/pgsql-13/lib/psqlodbcw.so
Setup64     = /usr/lib64/libodbcpsqlS.so
FileUsage   = 1

The systemctl file is:

# Run LS command - remove this line 
ls

And of course you need the rpm for Qlik Replicate and your license file.

Once the Dockerfile and files are collated in a directory; build the container with:

docker build --no-cache -t ccc/replicate:2023.11 .

If all goes well – a Docker image will be built and ready to be used.

Docker Compose

To make running the docker images easier; create a docker compose file:

version: '3.3'

services:
  replicate:
    image: docker.io/ccc/replicate:2023.11
    container_name: replicate_2023_11
    ports: 
      - "3552:3552"

    environment:
      - ReplicateRestPort=3552
      - TZ=Australia/Melbourne

    volumes:
      - /dockermount/data/replicate/data:/replicate/data

    extra_hosts:
      - host.docker.internal:host-gateway

volumes:
  replicate:

Save the docker-compose.yml in a directory and from the directory start the container with the command:

docker-compose up -d

Run the docker ps command to verify that the container is running:

docker ps

So far looking good. Further confirmation can be had by connecting into the container and observing the QR processes running:

docker exec -it qr_container_id bash
ps -aux

There should be two main processes, plus a process for each individual QR task running:

With everything confirmed – QR console can be accessed from a browser.

https://127.0.0.1:3552/attunityreplicate/

September 4, 2024 by jonny.donker@gmail.com Docker Postgres Qlik Replicate 1

Qlik Replicate – Fight of the filters. Who will prevail?

Background

The business is doing their best to keep me on my toes with Qlik Replicate, finding new ways to bend and stretch the system and consequently my sanity.

The initial request was, “Can we overwrite this field in a Qlik Replicate task with a SOURCE_LOOKUP?”

OK – we can do this. I abhor putting ETL logic in Qlik Replicate tasks; I want to keep them as simple as possible and let the power and flexibility of the downstream systems manipulate the data.

But project timelines were pressing, and I complied with their request.

Later, they came back to me and requested that we add a filter to the derived field in question.

And that led to me and our Tech Business analyst scratching our heads.

If we apply a filter to our focus field, will it use the raw field that is in the table? Or will it use the new lookup field with the same name to base the filter on?

Testing the filters – setting up

To start with; some simple tables in MS-SQL:

CREATE TABLE dbo.TEST_LOOKUP
(
	ACCOUNT_ID INT PRIMARY KEY,
	FRUIT_ID INT,
	FRUIT_NAME VARCHAR(100),
	SOURCE_NAME VARCHAR(100)
);

GO

CREATE TABLE dbo.FRUITS
(
	FRUIT_ID INT PRIMARY KEY,
	NEW_FRUIT_NAME VARCHAR(100)
)

GO

INSERT INTO dbo.FRUITS VALUES(1, 'NEW APPLES');
INSERT INTO dbo.FRUITS VALUES(2, 'NEW ORANGES');

A simple Qlik Replicate task was created to replicate from the table dbo.TEST_LOOKUP.

All columns were brought across except FRUIT_NAME, which is overwritten with the source lookup:

source_lookup('NO_CACHING','dbo','FRUITS','NEW_FRUIT_NAME','FRUIT_ID =?',$FRUIT_ID)

To test, a simple insert was run to confirm that the source lookup is working correctly:

INSERT INTO dbo.TEST_LOOKUP VALUES(1, 1, 'OLD APPLES', 'Truck');

Result:

{
    "magic": "atMSG",
    "type": "DT",
    "headers": null,
    "messageSchemaId": null,
    "messageSchema": null,
    "message": {
        "data": {
            "ACCOUNT_ID": 1,
            "FRUIT_ID": 1,
            "SOURCE_NAME": "Truck",
            "FRUIT_NAME": "NEW APPLES"
        },
        "beforeData": null,
        "headers": {
            "operation": "INSERT",
            "changeSequence": "20240529060703760000000000000000005",
            "timestamp": "2024-05-29T06:07:03.767",
            "streamPosition": "0071a49f:000f8e09:001c",
            "transactionId": "6EDBA1FA0E0000000000000000000000",
            "changeMask": "0F",
            "columnMask": "0F",
            "transactionEventCounter": 1,
            "transactionLastEvent": true
        }
    }
}

Everything is working correctly; FRUIT_NAME was overwritten with “NEW APPLES” in the JSON output.

Testing the filters – placing bets

In the CDC task; a new filter was added:

$FRUIT_NAME == 'NEW ORANGES'

And the following SQL statement was run on the source system:

INSERT INTO dbo.TEST_LOOKUP VALUES(2, 2, 'OLD ORANGES', 'Fridge');

So – If Qlik Replicate filters on the base table’s field; the change WILL NOT be replicated through.

Likewise, if Qlik Replicate is using the new derived field for the filter; the change WILL come through.

And the results are…

{
    "magic": "atMSG",
    "type": "DT",
    "headers": null,
    "messageSchemaId": null,
    "messageSchema": null,
    "message": {
        "data": {
            "ACCOUNT_ID": 2,
            "FRUIT_ID": 2,
            "SOURCE_NAME": "Fridge",
            "FRUIT_NAME": "NEW ORANGES"
        },
        "beforeData": null,
        "headers": {
            "operation": "INSERT",
            "changeSequence": "20240529061434050000000000000000065",
            "timestamp": "2024-05-29T06:14:34.050",
            "streamPosition": "0071a49f:000f901a:0005",
            "transactionId": "28DCA1FA0E0000000000000000000000",    
            "changeMask": "17",
            "columnMask": "17",
            "transactionEventCounter": 1,
            "transactionLastEvent": true
        }
    }
}

The change came through, so Qlik Replicate filters on the derived source lookup field rather than the original field in the table.
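
To make the two competing hypotheses concrete, here is a toy model in Python. It is not Qlik Replicate code and says nothing about how the product is implemented internally; the function names are my own and the lookup table simply mirrors dbo.FRUITS. It only shows why the second insert is a clean test: the two orderings give different outcomes for that row.

```python
# Toy model only: plain Python standing in for the two hypotheses.

FRUITS = {1: "NEW APPLES", 2: "NEW ORANGES"}  # mirrors dbo.FRUITS


def apply_lookup(row: dict) -> dict:
    """Overwrite FRUIT_NAME with the looked-up value, as the transformation does."""
    out = dict(row)
    out["FRUIT_NAME"] = FRUITS[row["FRUIT_ID"]]
    return out


def passes_filter(row: dict) -> bool:
    """The task filter: $FRUIT_NAME == 'NEW ORANGES'."""
    return row["FRUIT_NAME"] == "NEW ORANGES"


def filter_on_raw(row: dict):
    """Hypothesis 1: the filter sees the base table's column."""
    return apply_lookup(row) if passes_filter(row) else None


def filter_on_derived(row: dict):
    """Hypothesis 2: the filter sees the derived column."""
    transformed = apply_lookup(row)
    return transformed if passes_filter(transformed) else None


change = {"ACCOUNT_ID": 2, "FRUIT_ID": 2,
          "FRUIT_NAME": "OLD ORANGES", "SOURCE_NAME": "Fridge"}

print(filter_on_raw(change))      # None: the change would be dropped
print(filter_on_derived(change))  # the row, with FRUIT_NAME = NEW ORANGES
```

The observed message matches the second function, which is consistent with the filter being evaluated against the derived value.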

Conclusion

Once again this highlights the danger of putting ETL code into Qlik Replicate tasks. It obscures business rules and can lead to confusion in operations like the scenario above.

It is best to use Qlik Replicate to get the data out of the source database as quickly and in as pure a form as possible and then use the power of the downstream systems to manipulate the data.

May 29, 2024 by jonny.donker@gmail.com Qlik Replicate 0

Qlik Replicate – The saga of replicating to AWS Part 5 – Is MS-SQL the answer?

For those who are following along at home…

We have been toiling on replicating to AWS Postgres RDS with Qlik Replicate for the past two months; trying to achieve a baseline of 300tps.

After many suggestions, tuning and tweaks, conference calls, benchmarks, learning experiences and prayers; we couldn’t get our tps close to our baseline.

You can read the main findings in the following pages:

Things that were suggested to us that I did not add to this blog series:

Try Async commits on RDS Postgres. Tried it and got negligible increases.
Shifting the QR server to AWS. Bad idea as there will be even more traffic from the busy DB2 database going across for QR to consume.
Use Amazon Aurora instead of RDS. The downstream developers did not have the appetite to try Aurora; especially with the issues leaning towards network speed.
Use GCP version of Postgres instead of AWS. The downstream developers did not want to commit to another cloud provider.

The problem is how the network connectivity behaves with the Postgres ODBC driver and the round trips it must do between our location and the AWS data centre. We can try, but we are bound by the laws of physics and the speed of light.

Decisions to be made.

All through our benchmarking and investigation, we had been replicating in parallel to a Development MS-SQL database in the same data centre as the DB2 database, to give us an idea of what speed we could potentially reach. Without triggers on the MS-SQL destination table; we were easily hitting 300tps. Ramping the changes up; we could hold at 1K tps with no creep in latency.

We were happy with these results; especially as the MS-SQL database was just a small underpowered shared Dev machine; not a full-blown dedicated server.

It took a brave solution architect to propose that we shift from AWS RDS Postgres to an on prem MS-SQL server; especially when our senior management strategy is to push everything to the cloud to reduce the number of on prem servers.

In the end with all our evidence on the performance and the project’s willingness to push on with the proposed solution, the solution stakeholders agreed to move the destination to an on prem database.

They initially wanted us to go with an on prem Postgres database; but since all our Database Administrators are either Oracle or MS-SQL experts and we have no Postgres experts, we went to good old MS-SQL.

It worked; but…damn triggers.

I volunteered to convert the Postgres SQL code into T-SQL as I have worked with T-SQL for the past decade. The conversion went smoothly, and I took the opportunity to optimise several sections of the code to make the solution more maintainable and to run faster.

With our new MS-SQL database all coded up and the triggers turned off; the SVT (stress and volume testing) ran at the TPS for which we were aiming.

But when we turned on the triggers; the performance absolutely crashed.

I was mortified. Was my coding shot, and had the additional changes that I made worsened the performance?

I checked the triggers. I checked the primary keys and the joins. I checked the horizontal partitioning. I checked the database server stats for CPU and memory usage.

Nothing – could not locate the performance problem.

I went back to Qlik Replicate and examined the log files.

Ahh – here is something. The log file was full of entries like this:

00012472: 2024-05-27T16:03:54 [TARGET_APPLY ]W: Source changes that would have had no impact were not applied to the target database. Refer to the 'attrep_apply_exceptions' table for details (endpointshell.c:7632)
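
To get a feel for how often this warning fires, the log lines can be counted per minute. This is a rough sketch that assumes the log line layout shown above (thread id, timestamp, bracketed component, severity letter, message); the regular expression and the function name are my own, so adjust them if your log format differs.

```python
import re
from collections import Counter

# Matches lines such as:
# "00012472: 2024-05-27T16:03:54 [TARGET_APPLY ]W: Source changes ..."
LINE_RE = re.compile(r"^\d+: (\S+)\s+\[(\w+)\s*\](\w):\s+(.*)$")


def count_apply_warnings(lines):
    """Count TARGET_APPLY warnings that mention attrep_apply_exceptions, per minute."""
    per_minute = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        timestamp, component, severity, message = m.groups()
        if (component == "TARGET_APPLY" and severity == "W"
                and "attrep_apply_exceptions" in message):
            per_minute[timestamp[:16]] += 1  # keep "YYYY-MM-DDTHH:MM"
    return per_minute


sample = [
    "00012472: 2024-05-27T16:03:54 [TARGET_APPLY ]W: Source changes that would "
    "have had no impact were not applied to the target database. Refer to the "
    "'attrep_apply_exceptions' table for details (endpointshell.c:7632)",
]
print(count_apply_warnings(sample))
```

Feeding it the lines of a task log gives a per-minute count that can be lined up against the latency figures.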

Looking inside the attrep_apply_exceptions table, there were corresponding entries like:

UPDATE [dbo].[TEST_DESTINATION] 
SET	[ACCOUNT_ID] = 2,
	[DATA_1] = 'Updated', 
	[DATA_2] = 'Data' 
WHERE 
	[ACCOUNT_ID] = 2;
	-- 0 rows affected

Which was confusing; I checked the destination table, and the update was applied. Why was this update deemed a failure and logged to the attrep_apply_exceptions table? It must be an error in the trigger.

The cause of the problem

Our code can be paraphrased like:

CREATE TABLE dbo.TEST_DESTINATION
(
	ACCOUNT_ID int NOT NULL,
	DATA_1 varchar(100) NULL,
	DATA_2 varchar(100) NULL,
	PRIMARY KEY CLUSTERED 
	(
		ACCOUNT_ID ASC
	)
);

GO

CREATE TABLE dbo.TEST_MERGE_TABLE
(
	ACCOUNT_ID int NOT NULL,
	DATA_1 varchar(100) NULL,
	DATA_2 varchar(100) NULL,
	DATA_3 varchar(100) NULL,
	PRIMARY KEY CLUSTERED 
	(
		ACCOUNT_ID ASC
	)
)

GO

CREATE OR ALTER TRIGGER dbo.TR_TEST_DESTINATION__INSERT
ON dbo.TEST_DESTINATION
AFTER INSERT 
AS
	INSERT INTO dbo.TEST_MERGE_TABLE
	SELECT
		ACCOUNT_ID,
		DATA_1,
		DATA_2,
		'TRIGGER INSERT' AS DATA_3
	FROM INSERTED;

GO

CREATE OR ALTER TRIGGER [dbo].[TR_TEST_DESTINATION__UPDATE]
ON [dbo].[TEST_DESTINATION]
AFTER UPDATE 
AS
	UPDATE dbo.TEST_MERGE_TABLE
	SET ACCOUNT_ID = X.ACCOUNT_ID,
		DATA_1 = X.DATA_1,
		DATA_2 = X.DATA_2,
		DATA_3 = 'TRIGGER DATA'
	FROM dbo.TEST_MERGE_TABLE T
	JOIN INSERTED X
	ON X.ACCOUNT_ID = T.ACCOUNT_ID
	WHERE
		1 = 0;  -- This predicate can be either true or false.  For example we set it false

GO

The problem is in how the trigger TR_TEST_DESTINATION__UPDATE behaves if its UPDATE affects 0 rows. This can be a legitimate occurrence depending on a join in the trigger.

If I run a simple update like:

UPDATE dbo.TEST_DESTINATION
SET DATA_1 = 'Trigger Update'
WHERE
	ACCOUNT_ID = 1;

The SQL engine returns:

(0 row(s) affected)    -- Returned from the trigger

(1 row(s) affected)    -- Returned from updating dbo.TEST_DESTINATION

My theory is that Qlik Replicate, when reading the results returned from executing the SQL statement on the destination server, only considers the first row count to determine whether the change was a success or not. Since the first row count is an output from the trigger with 0 rows affected; Qlik considers that the update was a failure and therefore logs it into the attrep_apply_exceptions table.

Apart from this being incorrect, as the trigger code is logically working correctly; Qlik Replicate must make another trip to write to the exception table. This resulted in drastically increased latency.
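
The theory can be illustrated with a toy model. This is only a sketch of my guess, not Qlik Replicate's actual result handling; the function names are hypothetical. The row counts are the ones shown in the SQL engine output above, in the order they were returned.

```python
# Toy model of the theory above, not a reproduction of Qlik Replicate.

def apply_succeeded_first_count(row_counts):
    """Judge success from the first row count only (the suspected behaviour)."""
    return bool(row_counts) and row_counts[0] > 0


def apply_succeeded_last_count(row_counts):
    """Judge success from the last row count, i.e. the outer UPDATE itself."""
    return bool(row_counts) and row_counts[-1] > 0


# Row counts as returned by the SQL engine in the example above:
# the trigger's "0 rows affected" arrives before the outer UPDATE's "1 row".
trigger_then_update = [0, 1]

print(apply_succeeded_first_count(trigger_then_update))  # False: logged as an exception
print(apply_succeeded_last_count(trigger_then_update))   # True: the update did apply
```

Under this model, anything that stops the trigger from reporting its own row count would leave only the outer statement's count for the client to read.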

Fixing the issue

The fix (once the problem is known) is relatively straightforward. Any row counts returned by the trigger need to be suppressed. For example:

CREATE OR ALTER TRIGGER [dbo].[TR_TEST_DESTINATION__UPDATE]
ON [dbo].[TEST_DESTINATION]
AFTER UPDATE 
AS
BEGIN
	SET NOCOUNT ON;  -- Suppress returning the row count

	UPDATE dbo.TEST_MERGE_TABLE
	SET ACCOUNT_ID = X.ACCOUNT_ID,
		DATA_1 = X.DATA_1,
		DATA_2 = X.DATA_2,
		DATA_3 = 'TRIGGER DATA'
	FROM dbo.TEST_MERGE_TABLE T
	JOIN INSERTED X
	ON X.ACCOUNT_ID = T.ACCOUNT_ID
	WHERE
		1 = 0;  -- This predicate can be either true or false
END

When the update statement is run again; the following is returned:

(1 row(s) affected)

Qlik Replicate will now consider the update as a success and not log it as an exception.

May 27, 2024 by jonny.donker@gmail.com Qlik Replicate 0

Qlik Replicate – The saga of replicating to AWS Part 4 – Does stream size matter?

Continuing our ongoing Qlik Replicate story of trying to replicate a DB2/zOS database to AWS RDS Postgres.

We made small improvements; but nothing substantial to reach the TPS for which we were aiming. I had reached the end of what I knew and decided to reach for professional help.

We have a support relationship with IBT; who helped us out with the initial set up of QR in our organisation. But recently we have been resolving our own problems and have not been using their help. Now was a suitable time to ask for it.

IBT has always been helpful when we have asked for assistance. Another handy aspect with the relationship is that IBT has a quick support relationship with Qlik. If they don’t know the answer; they can get the answer easily from Qlik.

IBT asked us to collect the usual data; diagnostic logs, source and target DB metrics and QR server core metrics. Nothing looked under duress, so IBT dived into the nitty-gritty details of the diagnostic packs.

Their techs noticed that our outgoing stream buffers were full. This means that the changes were coming in faster than they were being sent out to the destination. IBT suggested increasing the size of the outgoing stream.

Without going through the details of this step; here is a Qlik knowledge base article on Increasing the outgoing stream queue.

“She’s breaking up – I can’t hold her”

We started off with:

"stream_buffers_number" : 5,
"stream_buffer_size" : 10,

It was a marginal improvement. Measured in a thimble full of performance improvement. Still nowhere near the TPS we needed.

IBT asked us to increase the two variables in small increments of “stream_buffers_number” + 5 and “stream_buffer_size” + 10. With each increase there was a minuscule improvement.

More worrying, with each increase the QR task used more memory, to the point that increasing the buffer size was unsustainable with the resources on the server. Even if the relationship between the buffer variables and the gained TPS were linear; we would need a very beefy server to reach 300 TPS.

So again, it was a little gain; and with all the added “Little gains” over the past few fix iterations we were still no closer to our needed 300 TPS.

Increasing the buffer variables might be helpful if you are close to your TPS and trying to get over the last hurdle. But since we’re so far behind; we had to look for another solution.

May 15, 2024 by jonny.donker@gmail.com Qlik Replicate 0

Qlik Replicate – The saga of replicating to AWS Part 3 – Wireshark!

Continuing on the story

After concluding that the low TPS is not resulting from poor query performance; our attention was turned to the network latency between our OnPrem Qlik system and the AWS RDS database.

First, I asked the networks team if there were any suspect networking components between our on-premises Qlik server and the AWS DB. Anything like IPS, QOS, bandwidth limitation components that could explain the slowdown.

I also asked the cloud team if they could find anything.

It was a lot to hope that they would find anything; but since they are the SMEs in the area, it was worth asking the question.

As expected, they did not find anything.

But the Network team did come back with a couple of pieces of information:

The network bandwidth to the AWS was wide enough and we were not reaching its capacity.
It is a 16ms – 20ms round trip from our Data centre to the AWS data centre.

Location… Location…

Physically the distance to the AWS data centre is 700Km.

AWS set up a closer data centre in the past few years, which is only 130Km away. Unfortunately, we are not yet set up to use this new region.

The Network team gave me permission to install Wireshark on our OnPrem Qlik server and our AWS EC2 Qlik server.

From both servers with psql I connected to the AWS RDS database and updated one row; capturing the traffic using Wireshark.

I lined up the two results from the different servers to see if there was anything obvious.

Wireshark results

(ip.src == ip.of.qlik.server and ip.dst == ip.of.aws.rds) or (ip.src == ip.of.aws.rds and ip.dst == ip.of.qlik.server)

SEQ	Source	Destination	Protocol	Length	Info	On Prem to RDS (sec)	EC2 to RDS (sec)	Difference (sec)	% of total difference
1	Qlik server	RDS DB	TCP	66	58313 > 5432 [SYN, ECE, CWR] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM	0	0	0.000	0%
2	RDS DB	Qlik server	TCP	66	5432 > 58313 [SYN, ACK] Seq=0 Ack=1 Win=26883 Len=0 MSS=1460 SACK_PERM WS=8	0.019	0.001	0.018	10%
3	Qlik server	RDS DB	TCP	54	58313 > 5432 [ACK] Seq=1 Ack=1 Win=262656 Len=0	0.000	0.000	0.000	0%
4	Qlik server	RDS DB	PGSQL	62	>?	0.000	0.005	-0.005	-3%
5	RDS DB	Qlik server	TCP	60	5432 > 58313 [ACK] Seq=1 Ack=9 Win=26888 Len=0	0.018	0.000	0.018	10%
6	RDS DB	Qlik server	PGSQL	60	<	0.001	0.001	0.000	0%
7	Qlik server	RDS DB	TLSv1.3	343	Client Hello	0.004	0.004	0.001	0%
8	RDS DB	Qlik server	TLSv1.3	220	Hello Retry Request	0.021	0.001	0.021	12%
9	Qlik server	RDS DB	TLSv1.3	455	Change Cipher Spec, Client Hello	0.003	0.001	0.002	1%
10	RDS DB	Qlik server	TLSv1.3	566	Server Hello, Change Cipher Spec	0.023	0.005	0.019	11%
11	RDS DB	Qlik server	TCP	1514	5432 > 58313 [ACK] Seq=680 Ack=699 Win=29032 Len=1460 [TCP segment of a reassembled PDU]	0.000	0.000	0.000	0%
12	RDS DB	Qlik server	TCP	1514	5432 > 58313 [ACK] Seq=2140 Ack=699 Win=29032 Len=1460 [TCP segment of a reassembled PDU]	0.000	0.000	0.000	0%
13	RDS DB	Qlik server	TCP	1514	5432 > 58313 [ACK] Seq=3600 Ack=699 Win=29032 Len=1460 [TCP segment of a reassembled PDU]	0.000	0.000	0.000	0%
14	RDS DB	Qlik server	TLSv1.3	394	Application Data	0.000	0.000	0.000	0%
15	Qlik server	RDS DB	TCP	54	58313 > 5432 [ACK] Seq=699 Ack=5400 Win=262656 Len=0	0.000	0.000	0.000	0%
16	Qlik server	RDS DB	TLSv1.3	112	Application Data	0.003	0.002	0.001	1%
17	Qlik server	RDS DB	TLSv1.3	133	Application Data	0.000	0.000	0.000	0%
18	RDS DB	Qlik server	TCP	60	5432 > 58313 [ACK] Seq=5400 Ack=836 Win=29032 Len=0	0.018	0.000	0.018	10%
19	RDS DB	Qlik server	TLSv1.3	142	Application Data	0.001	0.008	-0.007	-4%
20	RDS DB	Qlik server	TLSv1.3	135	Application Data	0.006	0.003	0.003	2%
21	Qlik server	RDS DB	TCP	54	58313 > 5432 [ACK] Seq=836 Ack=5569 Win=262400 Len=0	0.000	0.001	-0.001	0%
22	Qlik server	RDS DB	TLSv1.3	157	Application Data	0.005	0.007	-0.002	-1%
23	RDS DB	Qlik server	TLSv1.3	179	Application Data	0.018	0.001	0.018	10%
24	Qlik server	RDS DB	TLSv1.3	251	Application Data	0.011	0.000	0.011	6%
25	RDS DB	Qlik server	TLSv1.3	147	Application Data	0.018	0.000	0.018	11%
26	RDS DB	Qlik server	TLSv1.3	433	Application Data, Application Data	0.000	0.000	0.000	0%
27	RDS DB	Qlik server	TLSv1.3	98	Application Data	0.000	0.000	0.000	0%
28	Qlik server	RDS DB	TCP	54	58313 > 5432 [ACK] Seq=1136 Ack=6210 Win=261888 Len=0	0.000	0.000	0.000	0%
29	Qlik server	RDS DB	TLSv1.3	93	Application Data	0.001	0.001	0.001	0%
30	RDS DB	Qlik server	TLSv1.3	148	Application Data	0.020	0.001	0.018	11%
31	RDS DB	Qlik server	TLSv1.3	98	Application Data	0.000	0.000	0.000	0%
32	Qlik server	RDS DB	TCP	54	58313 > 5432 [ACK] Seq=1175 Ack=6348 Win=261632 Len=0	0.000	0.000	0.000	0%
33	Qlik server	RDS DB	TLSv1.3	81	Application Data	0.000	0.000	0.000	0%
34	Qlik server	RDS DB	TLSv1.3	78	Application Data	0.000	0.000	0.000	0%
35	Qlik server	RDS DB	TCP	54	58313 > 5432 [FIN, ACK] Seq=1226 Ack=6348 Win=261632 Len=0	0.000	0.000	0.000	0%
36	RDS DB	Qlik server	TCP	60	5432 > 58313 [ACK] Seq=6348 Ack=1226 Win=30104 Len=0	0.019	0.000	0.018	11%
37	RDS DB	Qlik server	TCP	60	5432 > 58313 [FIN, ACK] Seq=6348 Ack=1227 Win=30104 Len=0	0.000	0.000	0.000	0%
38	Qlik server	RDS DB	TCP	54	58313 > 5432 [ACK] Seq=1227 Ack=6349 Win=261632 Len=0	0.000	0.000	0.000	0%

The data from the two captures showed a couple of things:

Firstly, both systems had the same number of events captured by Wireshark. This gives me an indication that no networking component between source and destination is dropping traffic or doing anything unexpected to the packets.

I cannot say for sure what is happening on the return trip, or whether anything is timing out on the way back from the AWS side.

Also, when taking the difference between the OnPrem and the EC2 server, I can see a difference of 18ms keeps popping up. I believe this is the round trip of the connection. Since this happens multiple times; our latency is compounded into quite a significant value.
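
The compounding can be put into numbers with a few lines of Python. The values below are the "Difference (sec)" column from the capture table, in SEQ order. The 0.015 threshold for what counts as a "round trip wait" and the max_serial_tps helper are my own assumptions for illustration; the helper assumes every change is sent strictly one after another and waits for the stated number of round trips, with no batching or pipelining.

```python
# "Difference (sec)" column from the capture table, SEQ 1 to 38.
DIFFERENCES = [
    0.000, 0.018, 0.000, -0.005, 0.018, 0.000, 0.001, 0.021, 0.002, 0.019,
    0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.000, 0.018, -0.007, 0.003,
    -0.001, -0.002, 0.018, 0.011, 0.018, 0.000, 0.000, 0.000, 0.001, 0.018,
    0.000, 0.000, 0.000, 0.000, 0.000, 0.018, 0.000, 0.000,
]

# Assumption: anything at or above 15ms is a wait on the long round trip.
RTT_THRESHOLD = 0.015

round_trip_waits = [d for d in DIFFERENCES if d >= RTT_THRESHOLD]
total_extra = sum(DIFFERENCES)

print(f"packets that waited a full round trip: {len(round_trip_waits)}")
print(f"time spent in those waits: {sum(round_trip_waits):.3f} s")
print(f"total extra time OnPrem vs EC2: {total_extra:.3f} s")


def max_serial_tps(rtt_seconds: float, round_trips_per_change: int = 1) -> float:
    """Upper bound on changes per second when each change waits for its round trips."""
    return 1.0 / (rtt_seconds * round_trips_per_change)


print(f"ceiling at 18ms, one round trip per change: {max_serial_tps(0.018):.1f} tps")
```

On the table's figures, nine of the 38 packets carry a wait of roughly one round trip, and together they account for nearly all of the extra time in the OnPrem capture. The last line is only a back-of-envelope ceiling under the stated assumptions, but it shows why the number of round trips per change matters so much against a 300 tps target.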

What’s next?

I am not a network engineer, so I do not have the knowledge to dive deeper into the Wireshark packets.

It would be interesting to try the closer AWS data centre to see if the physical distance can help the latency. But to do this will require effort from the cloud team and the project budget wouldn’t extend to this piece of work.

Our other option is to reduce the number of round trips from our OnPrem server to the AWS datacentre as much as possible.

May 6, 2024 by jonny.donker@gmail.com Qlik Replicate 0
