Qlik Replicate

Why can’t Qlik Replicate and Enterprise Manager be friends?

A nice quiet evening with the family was interrupted by a critical alert coming through on my phone.

Once again server patching had knocked off our Qlik Replicate node.

Logging in, I could see that Enterprise Manager could not reach one of our nodes, with the following error message in the log:

2022-12-15 19:15:40 [ServerDto ] [ERROR] Test connection failed for server:MY_QR_CLUSTER. Message:'Unable to connect to the remote serverA connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond xxx.xxx.xxx.xxx:443'.

I have seen this problem before and it is usually resolved by:

  1. Failing over the QR Windows cluster
  2. Restarting the new passive node

Our IT team is aware of the problem and has been researching a cause and a fix.  Their prevailing theory was that when the cluster is failed over during server patching, some residual connections to the previously active node remain.

But tonight, after multiple failovers and stopping and starting the QR roles, Enterprise Manager still couldn’t connect to that QR node.

I did the following checks:

  1. The repsrv.log file had no error messages and the Web UI service was active
  2. From the Enterprise Manager server, I could ping the QR cluster address and the active node successfully
  3. From a Chrome session on the Enterprise Manager server, I could not get to the QR Web console
  4. From a Chrome session on the QR server, I could get to the QR Web console

A senior IT member joined the troubleshooting call and suggested that we reboot the Enterprise Manager server.

So we did, and then we couldn’t access the Enterprise Manager console at all.

At this point I wanted to quit IT and become a nomad in Mongolia.

Then the senior IT member worked it out.

The Windows server was randomly turning the Windows Firewall back on.

This was blocking our inbound connections, making the console inaccessible from other locations, except when you were logged onto the server itself.

This also explains why, when this problem arose previously, restarting the server would eventually work: the server group policy would eventually be applied and turn off the Windows Firewall.

Lessons learnt

If you come across this problem in your environment, try accessing the QR console from multiple locations:

  • From the Enterprise Manager server
  • From within the QR server with a local address like: https://localhost/attunityreplicate/login/
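A quick way to tell a firewall block from a dead service is to probe the console port from each location. Here is a minimal sketch in Python, assuming the console listens on 443; the host list is a placeholder, so substitute your own QR cluster address:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run this from both the Enterprise Manager server and the QR node itself.
for host in ("localhost",):  # add your QR cluster address here
    print(host, can_reach(host, 443))
```

If the probe succeeds from the QR server itself but fails from the Enterprise Manager server, a host firewall is the likely culprit rather than the Replicate services.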

Good luck

Qlik Replicate – “Json doesn’t start with ‘{‘ [xxxxxxx] (at_cjson.c:1773)” error

Had a shocker of a week.

You know those weeks, where everything goes wrong.

Busy fixing other systems after operating system patching by our IT team, I didn’t look closely at our Qlik Replicate nodes, which had been running smoothly over the past year.

After all, there were no alerts, and at a quick glance all our tasks were in a running status; none were suspended or in error.

The next day, one of our junior admins pointed out some Qlik tasks using a high amount of memory.

I looked in and my stomach dropped.

Although the task was “green” and running, no changes were getting through to the destinations (AWS S3 and GCS).  The log file was filled with errors like:

00002396: YYYY-MM-DDT15:21:14 [AT_GLOBAL ]E: Json doesn't start with '{' [xxxxxxx] (at_cjson.c:1773)
00002396: YYYY-MM-DDT15:21:14 [AT_GLOBAL ]E: Cannot parse json: [xxxxxxx(at_protobuf.c:1420)

And hundreds of thousands of transactions were waiting to be written out.

The problem only existed on one QR cluster, and only on jobs writing to AWS S3 and GCS; the Kafka one was fine.  The other QR clusters were running fine.

The usual “turn it off and on again” didn’t work; neither stopping and resuming the task nor restarting the server helped.

In the end I contacted Qlik Support.

They hypothesised that the blanket patching caused the Qlik Replicate cluster to fail over and corrupt the captured changes stored up waiting to be written out in the next batch process.  When QR tried to read the captured changes, the JSON was corrupted.

Their fix strategy was:

  1. Stop the task
  2. Using the log file, find the last successful time or stream position that the task reached.  This is usually found at the end of the log file.
  3. Using the Run -> Advanced Run option, restart the task from the time last written out.

If this didn’t work, they recommended rebuilding the whole task and then following the above steps.

Luckily their first steps worked.  After finding the correct timestamps we could restart the QR tasks from the correct position.
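Finding that timestamp by hand in a large log is tedious. A small helper can pull out the last timestamped line; this is a rough sketch, assuming the log uses the ISO-style timestamps shown in the error lines above (adjust the regex for your Replicate version's format):

```python
import re

# Matches ISO-style timestamps such as 2022-12-15T15:21:14
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def last_timestamp(log_text):
    """Return the last timestamp found in the log text, or None if there are none."""
    matches = TS_PATTERN.findall(log_text)
    return matches[-1] if matches else None

# Typical use: read the tail of the task log and feed it in
# (the log file name below is hypothetical):
# with open("reptask_MY_TASK.log") as f:
#     print(last_timestamp(f.read()))
```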

I’m now looking into some alerting to stop this problem going unnoticed again.

Qlik Replicate – “SYS-E-HTTPFAIL, SYS-E-UNRECREQ, Unrecognized request pattern” problem

After a weekend of patching in our dev environment, we came in to discover that half of our Qlik Replicate nodes were offline in Enterprise Manager, with the following error message:

4 2022-08-08 13:19:24 [ServerDto      ] [ERROR] Test connection failed for server:MY_QRNODE. Message:'SYS-E-HTTPFAIL, SYS-E-UNRECREQ, Unrecognized request pattern 'GET: /servers/local/adusers/devdomain/qlik_user?directory_lookup=false'..'.

We could still access each node through its own web GUI, and the tasks were still happily running in the background.  The other half of our QR nodes were still online in Enterprise Manager.

The usual troubleshooting of restarting the services and restarting the server didn’t fix the problem, and the patching team washed their hands of it.  (I can’t list the patches they applied due to security concerns.)

A database administrator mentioned it might be a TLS problem, as the server team had been changing TLS settings in various places.  This led me to compare the connection settings between a working QR node and a non-working one (we also checked other things like patch levels).

Some servers had the username in domain\username format, while others didn’t.  Coincidentally, the ones with the domain\username format were the ones not working.

Removing the domain section of the username resolved the problem.

I’m still not sure what in the patching caused this issue, but it’s something we have to check on our prod servers before the patches are applied to them.

Qlik Replicate – The Wide Text file problem

A wide problem

We had a new business requirement for our team: create a new CSV file source through Qlik Replicate.

OK – simple enough: create the source connector and define the file’s schema in the table section of the connector.

But the file was 250 columns wide.

I have had nightmares before of defining file schemas in programs like SSIS, where one little mistake can lead to hours of debugging.

A stroke of luck

Luckily, we knew the CSV file was an output from a database view, so the first part of the problem was solved: we had the schema.  Now to import the schema into a Qlik Replicate task.

  1. Create a new File Source connector and prefill as much known information as you can
  2. In the Tables definition of the File Source; add a dummy field
  3. Create a NULL Target connector
  4. Create a dummy task with your new file source and the Null target and save it
  5. Export the task with endpoints to save out the JSON definition of the task

Creating the table definition in JSON

Knowing the view definition, a simple Python script can convert it to JSON.

import csv
import json

if __name__ == '__main__':
    source_file = "FileDef.txt"
    field_array = []

    # The schema file is tab-delimited: column name, data type, nullable, order
    with open(source_file, 'r') as f:
        reader = csv.reader(f, delimiter='\t')
        schema_items = list(reader)

    for item in schema_items:
        focus_item = {}
        focus_item["name"] = item[0]
        focus_item["nullable"] = True
        focus_string = item[1]

        if focus_string.find("NUMBER") != -1:
            focus_item["type"] = "kAR_DATA_TYPE_NUMERIC"
            # e.g. NUMBER(10,6) -> precision 10, scale 6
            tokens = focus_string[focus_string.find("(") + 1:-1].split(",")
            focus_item["precision"] = int(tokens[0])
            if len(tokens) == 2:
                focus_item["scale"] = int(tokens[1])

        elif focus_string.find("VARCHAR2") != -1:
            focus_item["type"] = "kAR_DATA_TYPE_STR"
            # e.g. VARCHAR2(200 BYTE) -> length 200
            length = focus_string[focus_string.find("(") + 1:focus_string.find("BYTE") - 1]
            focus_item["length"] = int(length)

        elif focus_string.find("DATE") != -1:
            focus_item["type"] = "kAR_DATA_TYPE_TIMESTAMP"

        field_array.append(focus_item)

    columns = {"columns": field_array}

    with open("out.json", "w") as out:
        out.write(json.dumps(columns, indent=4))

FileDef.txt

The following text file contains the schema of the view, with tab-delimited columns of:

  • Column Name
  • Data type
  • Nullable
  • Column order
UDF_VARCHAR1 VARCHAR2(200 BYTE) Yes 1
UDF_VARCHAR2 VARCHAR2(200 BYTE) Yes 2
UDF_VARCHAR3 VARCHAR2(200 BYTE) Yes 3
UDF_NUMERIC1 NUMBER(10,6) Yes 4
UDF_NUMERIC2 NUMBER(10,6) Yes 5
UDF_NUMERIC3 NUMBER(10,6) Yes 6
UDF_INT1 NUMBER(10,0) Yes 7
UDF_INT2 NUMBER(10,0) Yes 8
UDF_INT3 NUMBER(10,0) Yes 9
UDF_DATE1 DATE Yes 10
UDF_DATE2 DATE Yes 11
UDF_DATE3 DATE Yes 12

This particular view definition is from Oracle, so it uses Oracle data types; but it would be simple to change the code to use a definition from a MS SQL Server database or another source.

Sticking it all back together

  1. Open up the exported task file json
  2. Find the dummy field that you created in the table definition
  3. Overwrite the dummy field with the json created in the script file
  4. Save and reimport the job.

If all goes well, the file source connector will be overwritten with the full table definition.
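Step 3 of the splice can also be done programmatically instead of hand-editing the file. Because the exact layout of an exported task JSON varies between Replicate versions, this sketch avoids hard-coding a path and instead walks the document looking for the "columns" list that contains the dummy field (the dummy field name is whatever you chose in the table definition; the structure here is an assumption, not the definitive export format):

```python
def replace_columns(node, dummy_name, new_columns):
    """Recursively find a {"columns": [...]} dict whose list contains the
    dummy column, swap the list for new_columns, and return True if found."""
    if isinstance(node, dict):
        cols = node.get("columns")
        if isinstance(cols, list) and any(
            isinstance(c, dict) and c.get("name") == dummy_name for c in cols
        ):
            node["columns"] = new_columns
            return True
        return any(replace_columns(v, dummy_name, new_columns) for v in node.values())
    if isinstance(node, list):
        return any(replace_columns(v, dummy_name, new_columns) for v in node)
    return False
```

Typical use: load the exported task file and the generated out.json with `json.load`, call `replace_columns(task, "MY_DUMMY_FIELD", generated["columns"])`, then dump the task back out and reimport it.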