Qlik Replicate – The saga of replicating to AWS Part 3 – Wireshark!

Continuing the story

After concluding that the low TPS was not caused by poor query performance, we turned our attention to the network latency between our OnPrem Qlik system and the AWS RDS database.

First, I asked the network team whether there were any suspect networking components between our on-premises Qlik server and the AWS DB: anything like IPS, QoS, or bandwidth-limiting components that could explain the slowdown.

I also asked the cloud team whether they could find anything.

It was a long shot that they would find anything, but since they are the SMEs in the area, it was worth asking the question.

As expected, they did not find anything.

But the network team did come back with a couple of pieces of information:

  • The network bandwidth to AWS was sufficient and we were not reaching its capacity.
  • It is a 16 ms to 20 ms round trip from our data centre to the AWS data centre.

Location… Location…

Physically, the AWS data centre is 700 km away.

AWS has set up a closer data centre in the past few years, which is only 130 km away. Unfortunately, we are not yet set up to use this new region.
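To put the two distances in perspective, here is a rough sketch of the best-case round trip that distance alone allows. It assumes light travels through optical fibre at about 200,000 km/s and that the cable follows the straight-line distance, which real routes never do, so treat the numbers as a floor rather than a prediction.

```python
# Rough lower bound on round-trip time from distance alone.
# Assumption: signal speed in optical fibre is about 200,000 km/s
# (roughly two thirds of the speed of light in a vacuum), and the
# cable runs in a straight line between the two data centres.
FIBRE_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip in milliseconds for a one-way distance in km."""
    one_way_s = distance_km / FIBRE_SPEED_KM_PER_S
    return 2 * one_way_s * 1000


for label, km in [("current region", 700), ("closer region", 130)]:
    print(f"{label}: {km} km -> at least {min_round_trip_ms(km):.1f} ms round trip")
```

Under these assumptions the floor is about 7 ms for 700 km and about 1.3 ms for 130 km, so distance alone does not account for all of the 16 ms to 20 ms the network team measured.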

The network team gave me permission to install Wireshark on our OnPrem Qlik server and our AWS EC2 Qlik server.

From both servers I connected to the AWS RDS database with psql and updated one row, capturing the traffic with Wireshark.

I lined up the two results from the different servers to see if there was anything obvious.

Wireshark results

Display filter used on both captures (the host names are placeholders for the real IP addresses):

(ip.src == ip.of.qlik.server and ip.dst == ip.of.aws.rds) or (ip.src == ip.of.aws.rds and ip.dst == ip.of.qlik.server)

| SEQ | Source | Destination | Protocol | Length | Info | OnPrem to RDS (sec) | EC2 to RDS (sec) | Difference (sec) | % of difference |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Qlik server | RDS DB | TCP | 66 | 58313 > 5432 [SYN, ECE, CWR] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM | 0 | 0 | 0.000 | 0% |
| 2 | RDS DB | Qlik server | TCP | 66 | 5432 > 58313 [SYN, ACK] Seq=0 Ack=1 Win=26883 Len=0 MSS=1460 SACK_PERM WS=8 | 0.019 | 0.001 | 0.018 | 10% |
| 3 | Qlik server | RDS DB | TCP | 54 | 58313 > 5432 [ACK] Seq=1 Ack=1 Win=262656 Len=0 | 0.000 | 0.000 | 0.000 | 0% |
| 4 | Qlik server | RDS DB | PGSQL | 62 | >? | 0.000 | 0.005 | -0.005 | -3% |
| 5 | RDS DB | Qlik server | TCP | 60 | 5432 > 58313 [ACK] Seq=1 Ack=9 Win=26888 Len=0 | 0.018 | 0.000 | 0.018 | 10% |
| 6 | RDS DB | Qlik server | PGSQL | 60 | < | 0.001 | 0.001 | 0.000 | 0% |
| 7 | Qlik server | RDS DB | TLSv1.3 | 343 | Client Hello | 0.004 | 0.004 | 0.001 | 0% |
| 8 | RDS DB | Qlik server | TLSv1.3 | 220 | Hello Retry Request | 0.021 | 0.001 | 0.021 | 12% |
| 9 | Qlik server | RDS DB | TLSv1.3 | 455 | Change Cipher Spec, Client Hello | 0.003 | 0.001 | 0.002 | 1% |
| 10 | RDS DB | Qlik server | TLSv1.3 | 566 | Server Hello, Change Cipher Spec | 0.023 | 0.005 | 0.019 | 11% |
| 11 | RDS DB | Qlik server | TCP | 1514 | 5432 > 58313 [ACK] Seq=680 Ack=699 Win=29032 Len=1460 [TCP segment of a reassembled PDU] | 0.000 | 0.000 | 0.000 | 0% |
| 12 | RDS DB | Qlik server | TCP | 1514 | 5432 > 58313 [ACK] Seq=2140 Ack=699 Win=29032 Len=1460 [TCP segment of a reassembled PDU] | 0.000 | 0.000 | 0.000 | 0% |
| 13 | RDS DB | Qlik server | TCP | 1514 | 5432 > 58313 [ACK] Seq=3600 Ack=699 Win=29032 Len=1460 [TCP segment of a reassembled PDU] | 0.000 | 0.000 | 0.000 | 0% |
| 14 | RDS DB | Qlik server | TLSv1.3 | 394 | Application Data | 0.000 | 0.000 | 0.000 | 0% |
| 15 | Qlik server | RDS DB | TCP | 54 | 58313 > 5432 [ACK] Seq=699 Ack=5400 Win=262656 Len=0 | 0.000 | 0.000 | 0.000 | 0% |
| 16 | Qlik server | RDS DB | TLSv1.3 | 112 | Application Data | 0.003 | 0.002 | 0.001 | 1% |
| 17 | Qlik server | RDS DB | TLSv1.3 | 133 | Application Data | 0.000 | 0.000 | 0.000 | 0% |
| 18 | RDS DB | Qlik server | TCP | 60 | 5432 > 58313 [ACK] Seq=5400 Ack=836 Win=29032 Len=0 | 0.018 | 0.000 | 0.018 | 10% |
| 19 | RDS DB | Qlik server | TLSv1.3 | 142 | Application Data | 0.001 | 0.008 | -0.007 | -4% |
| 20 | RDS DB | Qlik server | TLSv1.3 | 135 | Application Data | 0.006 | 0.003 | 0.003 | 2% |
| 21 | Qlik server | RDS DB | TCP | 54 | 58313 > 5432 [ACK] Seq=836 Ack=5569 Win=262400 Len=0 | 0.000 | 0.001 | -0.001 | 0% |
| 22 | Qlik server | RDS DB | TLSv1.3 | 157 | Application Data | 0.005 | 0.007 | -0.002 | -1% |
| 23 | RDS DB | Qlik server | TLSv1.3 | 179 | Application Data | 0.018 | 0.001 | 0.018 | 10% |
| 24 | Qlik server | RDS DB | TLSv1.3 | 251 | Application Data | 0.011 | 0.000 | 0.011 | 6% |
| 25 | RDS DB | Qlik server | TLSv1.3 | 147 | Application Data | 0.018 | 0.000 | 0.018 | 11% |
| 26 | RDS DB | Qlik server | TLSv1.3 | 433 | Application Data, Application Data | 0.000 | 0.000 | 0.000 | 0% |
| 27 | RDS DB | Qlik server | TLSv1.3 | 98 | Application Data | 0.000 | 0.000 | 0.000 | 0% |
| 28 | Qlik server | RDS DB | TCP | 54 | 58313 > 5432 [ACK] Seq=1136 Ack=6210 Win=261888 Len=0 | 0.000 | 0.000 | 0.000 | 0% |
| 29 | Qlik server | RDS DB | TLSv1.3 | 93 | Application Data | 0.001 | 0.001 | 0.001 | 0% |
| 30 | RDS DB | Qlik server | TLSv1.3 | 148 | Application Data | 0.020 | 0.001 | 0.018 | 11% |
| 31 | RDS DB | Qlik server | TLSv1.3 | 98 | Application Data | 0.000 | 0.000 | 0.000 | 0% |
| 32 | Qlik server | RDS DB | TCP | 54 | 58313 > 5432 [ACK] Seq=1175 Ack=6348 Win=261632 Len=0 | 0.000 | 0.000 | 0.000 | 0% |
| 33 | Qlik server | RDS DB | TLSv1.3 | 81 | Application Data | 0.000 | 0.000 | 0.000 | 0% |
| 34 | Qlik server | RDS DB | TLSv1.3 | 78 | Application Data | 0.000 | 0.000 | 0.000 | 0% |
| 35 | Qlik server | RDS DB | TCP | 54 | 58313 > 5432 [FIN, ACK] Seq=1226 Ack=6348 Win=261632 Len=0 | 0.000 | 0.000 | 0.000 | 0% |
| 36 | RDS DB | Qlik server | TCP | 60 | 5432 > 58313 [ACK] Seq=6348 Ack=1226 Win=30104 Len=0 | 0.019 | 0.000 | 0.018 | 11% |
| 37 | RDS DB | Qlik server | TCP | 60 | 5432 > 58313 [FIN, ACK] Seq=6348 Ack=1227 Win=30104 Len=0 | 0.000 | 0.000 | 0.000 | 0% |
| 38 | Qlik server | RDS DB | TCP | 54 | 58313 > 5432 [ACK] Seq=1227 Ack=6349 Win=261632 Len=0 | 0.000 | 0.000 | 0.000 | 0% |

The data from the two captures showed a couple of things:

Firstly, both systems had the same number of events captured by Wireshark. This indicates that no networking component between source and destination is dropping traffic or doing anything unexpected to the packets.

I cannot say for sure what is happening on the return trip, or whether anything is timing out on the way back from the AWS side.

Also, when taking the difference between the OnPrem and EC2 servers, a difference of 18 ms keeps popping up. I believe this is the round trip of the connection. Since this happens multiple times, our latency compounds into quite a significant value.
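To see how much those repeated gaps add up to, the "Difference (sec)" column can be totalled with a short script. This is only a sketch: the values are the rounded figures from the table, and the 15 ms cut-off for what counts as a round-trip-sized gap is my own assumption.

```python
# Per-packet differences (OnPrem minus EC2, in seconds), copied from the
# "Difference (sec)" column of the capture comparison, rows 1 to 38.
differences = [
    0.000, 0.018, 0.000, -0.005, 0.018, 0.000, 0.001, 0.021, 0.002, 0.019,
    0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.000, 0.018, -0.007, 0.003,
    -0.001, -0.002, 0.018, 0.011, 0.018, 0.000, 0.000, 0.000, 0.001, 0.018,
    0.000, 0.000, 0.000, 0.000, 0.000, 0.018, 0.000, 0.000,
]

# Assumed cut-off: any gap of 15 ms or more is treated as one round trip.
ROUND_TRIP_THRESHOLD_S = 0.015

total = sum(differences)
round_trips = [d for d in differences if d >= ROUND_TRIP_THRESHOLD_S]

print(f"extra time OnPrem: {total:.3f} s")
print(f"round-trip-sized gaps: {len(round_trips)}")
print(f"time spent in those gaps: {sum(round_trips):.3f} s")
```

With the table values this comes to about 0.170 s of extra time for a single one-row update, and nine gaps of 15 ms or more account for about 0.166 s of it.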

What’s next?

I am not a network engineer, so I do not have the knowledge to dive deeper into the Wireshark packets. 

It would be interesting to try the closer AWS data centre to see whether the shorter physical distance reduces the latency. But this would require effort from the cloud team, and the project budget would not extend to this piece of work.

Our other option is to reduce the number of round trips from our OnPrem server to the AWS data centre as much as possible.
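A rough way to see why the number of round trips matters is to model a single connection that waits for each reply before sending the next request. This is a simplified sketch, not a description of how Qlik Replicate actually applies changes: the function name and the round-trip counts are hypothetical, and server processing time is ignored.

```python
def max_serial_tps(round_trip_s: float, round_trips_per_txn: int) -> float:
    """Upper bound on transactions per second for one connection that
    waits for every round trip to finish before starting the next.
    Ignores server processing time, so real throughput will be lower."""
    return 1.0 / (round_trip_s * round_trips_per_txn)


RTT_S = 0.018  # the recurring OnPrem delay seen in the capture

for trips in (1, 2, 4, 8):  # hypothetical round trips per transaction
    print(f"{trips} round trip(s) per transaction -> at most "
          f"{max_serial_tps(RTT_S, trips):.0f} TPS")
```

In this model, halving the round trips per transaction doubles the ceiling on throughput, which is why fewer round trips can help even when the latency itself stays the same.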