qlikreplicate

Qlik Replicate – The saga of replicating to AWS Part 1 – pg_bench

If you are new to this story; pick up the introduction from here 

50 TPS – that’s not good…

Our Stress and Volume Testing (SVT) did not turn out that great; which was an understatement. We were aiming for a minimum of 300tps, but we were only achieving 50tps.

Because we could not run the SVT process within business hours as the DB2 databases for production and development shares the same hardware; I had to write a script to simulate the changes on a MS-SQL server.

This means we could use an idle Dev MS-SQL box and run SVT testing whenever we needed it. With the new testing script, I tried various iterations of settings under Change Processing Tuning to see if there was an optimal setting. I could not get above 60tps.

Meanwhile the rest of the team contacted AWS support to see if they can see anything obvious.

AWS Support recommended using the Postgres utility “pg_bench” from various locations to work out if there is a bottle neck between On Prem and AWS RDS. 

They recommend running the benchmark for 15 minutes with different numbers of clients.

Armed with the pg_bench utility; I started performing benchmark runs against the AWS RDS database from various locations in our network, trying to work out where the bottle neck is.

Running the tests


Our network had several locations that we could run pg_bench on to assess the tps to AWS RDS.  To be more specific; to an AWS RDS Proxy.  

To start off we had to learn how to use the command first.  The benchmark database must be initialised with some tables and data.  This is done with the following command:

pgbench --host rds-database-proxy.aws.com --port 5432 --username qlikreplicate -i -s 50 dest_db

Once done; a 15min benchmark was run with the following command:

pgbench --host rds-database-proxy.aws.com --port 5432 --username qlikreplicate -l -c 2 -j 2 -T 900 dest_db

Parameters:

ParameterDescription
--hostAddress of the database/proxy
--portDatabase port
--usernameUsername to connect. Must have permissions to create objects and insert data. Since it was a Dev database I used the Qlik Replicate username
-lWrite information about each transaction to a log. not really useful and in future I would omit it
-cNumber of clients to simulate. I used 2 to replicate a couple of QR tasks
-jNumber of threads to use
-TTime in seconds to run the test for. 900 seconds = 15 minutes

Four tests were conducted:

  1. The OnPrem Qlik Replicate server to AWS RDS Proxy
  2. An EC2 machine to AWS RDS Proxy
  3. The OnPrem Qlik Replicate server to an OnPrem Postgres database.  (Actually to our Enterprise manager analytic database.  So it is not a fancy highly optimised database)
  4. An EC2 machine directly to the AWS RDS database

Why a AWS RDS Proxy?  I’m not sure of the story behind why there was a proxy between Qlik and RDS; rumours it was because the security team didn’t initially allow our On Prem services to connect directly to cloud database.  They insisted that a proxy between the services and databses.

Results

The pg_bench tests revealed several learnings to the team. But unfortunately, there were no answers to our QR latency problem.

It was a bit of grim relief it proved that QR was not the cause of our latency problem; there is a component between us and AWS RDS database that is causing the issue.

Even more unfortunate when I raise the issue to the network team thinking that there might be an IPS, QOS device that might be slowing us down; they could not find anything. I have been informed that the bandwidth between our On Prem and AWS is broad and fast and there should not be any obvious chocking the tps. Grumpily I mused that households can now stream multiple 4K videos at home without a frame dropped while we can’t send a few database changes from our datacentre to an AWS database at reasonable speeds

Interestingly the RDS proxy was a significant bottleneck when we started questioning its purpose in the pipeline. Even the AWS support team was scratching their heads of the purpose of it in the use case we had. Later on, I tried OnPrem to the direct RDS; but that made no difference to the tps. 

Lessons learnt

  • Validate poor latency through Qlik Replicate via another tool.  In this instance pg_bench took QR right out of the equation
  • Don’t use AWS RDS proxy for Qlik Replicate unless you have a critical use case.  Performance test thoroughly your design choices

Qlik Enterprise Manager – Upgrading from 2021.05 to 2022.11

As a business requirement of developers wanting to take advantage of new features in Qlik Enterprise Manager (QEM); we had to upgrade our system from 2021.05 to 2022.11.

This was not a smooth process and encountered several issues on the way.

First Attempt – Straight Upgrade

Simple to upgrade?

Just download the latest installer from Qlik’s website and kick it off?

Unfortunately when I gave developers keys to our dev QEM box to do the upgrade before Christmas as I was busy with other tasks.  

They gave it a go without a backout plan.  

Couple of days later I tried logging into the QEM console and couldn’t get on.

The AttunityEnterpriseManager service kept restarting every minute and the following errors were in the  C:\Program Files\Attunity\Enterprise Manager\data\logs\EnterpriseManager.log

[START: 2023-03-01 09:40:18 Attunity EnterpriseManager Version: 2022.11.0.335]
1 2023-01-02 09:40:18 [Command ] [INFO ] Executing ServiceRunCommand command.
6 2023-01-02 09:40:19 [Host ] [INFO ] Setting up web host
6 2023-01-02 09:40:20 [Host ] [INFO ] The server will listen on the following location: http://xxxxxxx.xxx.xxx/attunityenterprisemanager
6 2023-01-02 09:40:20 [Host ] [INFO ] The server will listen on the following location: https://xxxxxx.xxx.xxx/attunityenterprisemanager
6 2023-01-02 09:40:20 [Host ] [ERROR] Object reference not set to an instance of an object.
6 2023-01-02 09:40:20 [Host ] [INFO ] Stopping service on error...

I tried the usual desperation tricks of restarting the service and then restarting the box.

In the end I had to uninstall QEM and reinstall 2021.05 as there was no backout plan.

The developers were then subsequently banned from doing anymore work on the QEM upgrade

Second attempt – Two jump upgrade

Next attempt was upgrading to 2022.05 and then jumping to 2022.11.

This time I took a backup of C:Program FilesAttunityEnterprise Managerdata so I could easily backout the change if I had problems.

The upgrade from 2021.05 to 2022.05 went smoothly; but when jumping from 2022.05 to 2022.11 the same error occurred.

I then raised a case with Qlik and also posted on their forum asking if anyone else had come across this problem.

Final Success

With help from Qlik support we developed a upgrade work around.  Qlik said that there is a patch in the works as other users have had the same problem.

If you follow along – note that this method might differ for your system.  In the three upgrades I have done internally; they all differed slightly.

Make sure you TEST and HAVE A BACKOUT PLAN

Upgrade Perquisites

This upgrade was done on a Windows 2019 server.

You will need:

  • Admin rights to QEM’s server
  • Licenses for QEM and Analytics
  • Username/Password that AEM connects to the analytics database
  • Privilege user account for to the analytics database
  • Password for the user that you use to connect QR nodes to QEM
  • A SQLite database client
  • Handy to have – an editor that can validate json format

Method

  1. Log on to QEM console
  2. For each server under the server tab – Right click and go to “Stop Monitoring”
  3. Remote desktop to the QEM server
  4. Stop the AttunityEnterpriseManager service
  5. Back up C:\Program Files\Attunity\Enterprise Manager\data
  6. Run the upgrade to QEM 2022.05
  7. Once the upgrade is done; log onto QEM and ensure that version 2022.05 is working correctly
  8. Back up C:Program FilesAttunityEnterprise Managerdata again
  9. Stop the AttunityEnterpriseManager service again
  10. Run following command from the command line:
    cd "C:\Program Files\Attunity\Enterprise Manager\bin"
    aemctl repository export -f C:\temp\qem_repo_202205.json -w

  11. Script out the postgres database with the following commands:
    cd C:\Program Files\PostgreSQL10\bin
    pg_dump -U <username> -s -C -c <analyticdbname> > C:\temp\analyticdbname.sql

  12. Edit C:\temp\cdcanalyticsp.sql.  Find and replace “<analyticdbname>” with “cdcanalytics_2022_11_p”.  Save the file
  13. Run the following command to create a new temporary database on the postgres server:
    psql -d postgres -U <username> -f C:\temp\cdcanalyticsp.sql

  14. This is where the problem lies in our instance.  Open up C:tempqem_repo_202205.json
    Find any code blocks with “type”: “AttunityLicense” in it and remove the whole code block.
    1. On our dev system it was the section with “name”: “Replication Analytics”, but in prod it was section with “name”: “Replication Management”.  
    2. You may need to experiment to find the correct section
  15. Go to Add/Remove and uninstall Enterprise manager
  16. Once remove – delete the the folder C:\Program Files\Attunity\Enterprise Manager\data
  17. Install QEM 2022.11.  When prompted do not install Postgres
  18. Log into QEM and check that it is up and running correctly.
  19. Stop the AttunityEnterpriseManager service again
  20. Run the following command:
    aemctl repository import -f C:\temp\qem_repo_202205_Fix.json -w

  21. Start the AEM service and check that it is running correctly
  22. Re-add the QEM and Analytics license 
  23. Go to Settings -> Repository connections.
    Enter in username password set the database as cdcanalytics_2022_11_p
  24. Intialize the repository and start the connector
  25. Stop the AttunityEnterpriseManager service again
  26. Open up C:\Program Files\Attunity\Enterprise Manager\data\Java\GlobalRepository.sqlite in an SQLite editor
  27. Browse the table objects
    Look for name = “AnalyticsConfiguration” and modify the json value “cdcanalytics_2022_11_p” to the original analytics database name
  28. Save SQLite database
  29. Start the AEM service and check that it is running correctly
  30. Go to the server tab.  Edit each server and re-add the password.  Click on the button “Monitor this server’s tasks and messages”.  Test the connection and click OK
  31. PVT QEM

If things go wrong

Backout process

  1. Uninstall QEM
  2. Delete the folder C:\Program Files\Attunity\Enterprise Manager\data
  3. Reinstall either QEM 2021.05 or QEM 2022.05
  4. Stop the AttunityEnterpriseManager service
  5. Restore the appropriate backed up data folder to C:\Program Files\Attunity\Enterprise Manager\data
  6. Restart the AEM service