The Story so far:
For those who are following from home:
We deduced that replicating through the AWS RDS proxy was a bad idea for performance, but even going directly to the AWS RDS Postgres database from our on-prem system did not give us any noticeable performance gains.
It was time to look in a different area.
Raw SQL
In the squad investigating the problem, someone asked, “What SQL is Qlik running on the Postgres server? Is it poorly optimised?”
To find out what Qlik Replicate was running, I bumped the TARGET_APPLY logging level up to Verbose, let transactions run through for a couple of minutes, and then assessed the log file. Since 99% of the changes on the source database are updates (we know this from a production job running on the same source database), our logs were full of update statements like this:
UPDATE dbo.my_dest_table
SET key_1 = ?,
key_2 = ?,
key_3 = ?,
field_1 = ?,
field_2 = ?,
field_3 = ?
WHERE
key_1 = ? AND
key_2 = ? AND
key_3 = ?;
The first thing I noticed was that Qlik was updating ALL the columns, even if only one column's data had changed on the source database.
This also included the primary key.
Updating primary key columns is computationally expensive compared to your regular day-to-day data columns. As an example, I wrote a simple proof of concept in T-SQL:
/*
CREATE TABLE dbo.my_dest_table
(
key_1 SMALLDATETIME,
key_2 INT,
key_3 VARCHAR(10),
field_1 INT,
field_2 VARCHAR(256),
field_3 SMALLDATETIME,
PRIMARY KEY(key_1, key_2, key_3)
)
*/
/*
UPDATE dbo.my_dest_table
SET key_1 = ?,
key_2 = ?,
key_3 = ?,
field_1 = ?,
field_2 = ?,
field_3 = ?
WHERE
key_1 = ? AND
key_2 = ? AND
key_3 = ?;
*/
/*
INSERT INTO dbo.my_dest_table VALUES ('2024-01-01', 1, 'key 3', 1, 2, '2023-03-03');
INSERT INTO dbo.my_dest_table VALUES ('2024-01-01', 2, 'key 3', 2, 3, '2024-04-04');
*/
SET STATISTICS TIME ON;
UPDATE dbo.my_dest_table
SET key_1 = '2024-01-01',
key_2 = 1,
key_3 = 'key 3',
field_1 = 10,
field_2 = 2,
field_3 = '2023-03-03'
WHERE
key_1 = '2024-01-01' AND
key_2 = 1 AND
key_3 = 'key 3';
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 41 ms. <------- Oh dear
(1 row affected)
*/
UPDATE dbo.my_dest_table
SET field_1 = 10,
field_2 = 3,
field_3 = '2024-04-04'
WHERE
key_1 = '2024-01-01' AND
key_2 = 2 AND
key_3 = 'key 3';
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 6 ms.
(1 row affected)
*/
As demonstrated, updating the PK columns (even with the same values) is roughly seven times as expensive (41 ms vs 6 ms) as updating only the regular data columns.
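Part of that gap comes from how SQL Server maintains the clustered index: an update that touches key columns is generally processed internally as a delete plus an insert of the row, while a non-key update modifies the row in place. You can inspect the extra work on your own tables with STATISTICS IO; a minimal sketch reusing the POC table above (results will vary by environment, and whether the plan actually splits the key update depends on the optimiser, so treat this as a diagnostic, not a guaranteed reproduction):

```sql
SET STATISTICS IO ON;

-- Key-touching update: typically maintained as a delete + insert
-- on the clustered index, even when the key values are unchanged.
UPDATE dbo.my_dest_table
SET key_2 = 1,
    field_1 = 10
WHERE key_1 = '2024-01-01' AND key_2 = 1 AND key_3 = 'key 3';

-- Non-key update: the row is modified in place.
UPDATE dbo.my_dest_table
SET field_1 = 10
WHERE key_1 = '2024-01-01' AND key_2 = 1 AND key_3 = 'key 3';

SET STATISTICS IO OFF;
```

Comparing the logical reads reported for the two statements in the Messages tab gives a rough picture of the extra index maintenance the key update incurs.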
Asking for help
I asked for help on the Qlik Community forum. The specialists responded that there is a setting in Qlik’s endpoints that will stop it updating the primary key.
The setting is:
$info.query_syntax.pk_segments_not_updateable
Set to TRUE
I made this change in the endpoint, restarted the task, and captured a few more update SQL samples. With the setting in place, the updates no longer touched the PK columns.
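For illustration, with pk_segments_not_updateable set to TRUE the captured statements take the shape of the non-key variant from the POC; roughly along these lines (an illustrative shape based on the earlier template, not a verbatim Qlik log extract):

```sql
UPDATE dbo.my_dest_table
SET field_1 = ?,
    field_2 = ?,
    field_3 = ?
WHERE
    key_1 = ? AND
    key_2 = ? AND
    key_3 = ?;
```

The key columns now appear only in the WHERE clause, so the clustered index no longer has to be maintained on every update.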
BUT…
This might be a helpful setting, and one worth noting for the future, but in this case it did not really help us. After making the change we saw no noticeable improvement in our TPS throughput, further suggesting the bottleneck was the network.
I could also see other problems with this approach.
The source system we read from is exceptionally large, has been entrenched in our organisation for 30 years, and there is constant work going on in the background as other core systems are merged into it. As a downstream consumer, we cannot guarantee the rules around updating primary keys. Our gut feeling from day-to-day operation is that once the PKs are set, they are set in stone in the database. But basing the data integrity of critical reporting on a gut feeling is not a good data integrity model.
Who is to say that the owners of the source system won’t update a PK to fix an incident, or that a future system consolidation won’t include PK updates in its steps? Our organisation’s change management is not tight and detailed enough for us to be forewarned of changes like this.
If data integrity were not so critical, we could run a full refresh periodically to reset the data to a baseline and capture any PK updates. But that would require significant rework of the solution to manage these periodic refreshes.
So: we did learn. We did assess. But this didn’t help us out.