Qlik Replicate – Fight of the filters. Who will prevail?

Background

The business is doing their best to keep me on my toes with Qlik Replicate; finding new was to bend and stretch the system and consequently my sanity.

The initial request was, “Can we overwrite this field in a Qlik Replicate task with a SOURCE_LOOKUP?”

OK – we can do this.  I abhor putting ETL logic in Qlik Replicate tasks and wanting to keep them as simple as possible and allow the power and the flexibility of the downstream systems to manipulate data.

But project timelines were pressing, and I complied with their request.

Later, they came back to me and requested a to add a filter to the derived field in question.

And that led to me and our Tech Business analyst scratching our heads. 

If we apply a filter to our focus field; will it use the raw field that is in the table? Or will is use the new lookup field with the same name to base the filter on?

Testing the filters – setting up

To start with; some simple tables in MS-SQL:

CREATE TABLE dbo.TEST_LOOKUP
(
	ACCOUNT_ID INT PRIMARY KEY,
	FRUIT_ID INT,
	FRUIT_NAME VARCHAR(100),
	SOURCE_NAME VARCHAR(100)
);

GO

CREATE TABLE dbo.FRUITS
(
	FRUIT_ID INT PRIMARY KEY,
	NEW_FRUIT_NAME VARCHAR(100)
)

GO

INSERT INTO dbo.FRUITS VALUES(1, 'NEW APPLES');
INSERT INTO dbo.FRUITS VALUES(2, 'NEW ORANGES');

A simple Qlik Replicate task was created to replicate from the table dbo.TEST_LOOKUP.

All columns were brought across instead of FRUIT_NAME. FRUIT_NAME will be overwritten with the source lookup:

source_lookup('NO_CACHING','dbo','FRUITS','NEW_FRUIT_NAME','FRUIT_ID =?',$FRUIT_ID)

To test; a simple insert was added to ensure that the source lookup is working correctly:

INSERT INTO dbo.TEST_LOOKUP VALUES(1, 1, 'OLD APPLES', 'Truck');

Result:

{
    "magic": "atMSG",
    "type": "DT",
    "headers": null,
    "messageSchemaId": null,
    "messageSchema": null,
    "message": {
        "data": {
            "ACCOUNT_ID": 1,
            "FRUIT_ID": 1,
            "SOURCE_NAME": "Truck",
            "FRUIT_NAME": "NEW APPLES"
        },
        "beforeData": null,
        "headers": {
            "operation": "INSERT",
            "changeSequence": "20240529060703760000000000000000005",
            "timestamp": "2024-05-29T06:07:03.767",
            "streamPosition": "0071a49f:000f8e09:001c",
            "transactionId": "6EDBA1FA0E0000000000000000000000",
            "changeMask": "0F",
            "columnMask": "0F",
            "transactionEventCounter": 1,
            "transactionLastEvent": true
        }
    }
}

Everything is working correctly; FRUIT_NAMES got overwritten with “NEW APPLES” in the json output.

Testing the filters – placing bets

In the CDC task; a new filter was added:

$FRUIT_NAME == 'NEW ORANGES'

And the following SQL statement was run on the source system:

INSERT INTO dbo.TEST_LOOKUP VALUES(2, 2, 'OLD ORANGES', 'Fridge');

So – If Qlik Replicate filters on the base table’s field; the change WILL NOT be replicated through.

Likewise if Qlik Replicate is using the new derived field for filter; the change WILL come through.

And the results are…

{
    "magic": "atMSG",
    "type": "DT",
    "headers": null,
    "messageSchemaId": null,
    "messageSchema": null,
    "message": {
        "data": {
            "ACCOUNT_ID": 2,
            "FRUIT_ID": 2,
            "SOURCE_NAME": "Fridge",
            "FRUIT_NAME": "NEW ORANGES"
        },
        "beforeData": null,
        "headers": {
            "operation": "INSERT",
            "changeSequence": "20240529061434050000000000000000065",
            "timestamp": "2024-05-29T06:14:34.050",
            "streamPosition": "0071a49f:000f901a:0005",
            "transactionId": "28DCA1FA0E0000000000000000000000",    
            "changeMask": "17",
            "columnMask": "17",
            "transactionEventCounter": 1,
            "transactionLastEvent": true
        }
    }
}

Qlik will use the derived source lookup field over the original field in the table.

Conclusion

Once again this highlights the danger of putting ETL code into Qlik Replicate tasks. It obscures business rules and can lead to confusion in operations like the scenario above.

It is best to use Qlik Replicate to get the data out of the source database as quickly and as pure as possible and then use the power of the downstream systems to manipulate the data.