A nice quiet evening with the family was interrupted by a critical alert coming through on my phone.
Once again server patching had knocked off our Qlik Replicate node.
Logging in, I could see that Enterprise Manager could not reach one of our nodes, with the following error message in the log:
2022-12-15 19:15:40 [ServerDto ] [ERROR] Test connection failed for server:MY_QR_CLUSTER. Message:'Unable to connect to the remote serverA connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond xxx.xxx.xxx.xxx:443'.
I have seen this problem before and it is usually resolved by:
- Failing over the QR Windows cluster
- Restarting the new passive node
Our IT team is aware of the problem and has been researching a cause and a fix. Their prevailing theory was that when the cluster is failed over during server patching, some residual connections to the previously active node remain.
But tonight, after multiple failovers and stopping and starting the QR roles, Enterprise Manager still couldn’t connect to that QR node.
I did the following checks:
- The repsrv.log file had no error messages and the Web UI service was active
- From the Enterprise Manager server, I could ping the QR cluster address and the active node successfully
- From a Chrome session on the Enterprise Manager server, I could not get to the QR Web console
- From a Chrome session on the QR server, I could get to the QR Web console
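In hindsight, the checks above show a pattern worth testing directly: ping worked, but HTTPS on port 443 did not. A minimal Python sketch of that port check is below. It is not something we ran on the night, and the function name `can_connect` is my own; it only assumes Python is available on the machine you are testing from.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    A successful ping only proves ICMP gets through; this checks the
    actual port the web console listens on.
    """
    try:
        # create_connection resolves the host name and attempts the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False
```

Running `can_connect("<your QR cluster address>")` from the Enterprise Manager server and `can_connect("localhost")` on the QR server itself should show quickly whether the port is only reachable locally.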
A senior IT member joined the troubleshooting call and suggested that we reboot the Enterprise manager server.
So we did, and then we couldn’t access the Enterprise Manager console.
At this point I wanted to quit IT and become a nomad in Mongolia.
Then the senior IT member worked it out.
The Windows server was randomly turning on the Windows Firewall.
This was blocking our inbound connections, making the console inaccessible from other locations, except when you were logged onto the server itself.
This also explains why, when this problem arose previously, restarting the server would eventually work: the server group policy would eventually get applied and turn the Windows Firewall off again.
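If you want to confirm the firewall state without clicking through the GUI, something like the following rough sketch could help. It shells out to the built-in `netsh advfirewall show allprofiles state` command and parses the result. The function names are my own, and the parser assumes the English-language output layout (a "... Profile Settings:" header followed by a "State ON/OFF" line), so treat it as a starting point rather than a tested tool.

```python
import subprocess


def parse_firewall_states(netsh_output: str) -> dict:
    """Map each firewall profile name to True (ON) or False (OFF).

    Assumes the English output layout of
    `netsh advfirewall show allprofiles state`.
    """
    states = {}
    profile = None
    suffix = " Profile Settings:"
    for raw_line in netsh_output.splitlines():
        line = raw_line.strip()
        if line.endswith(suffix):
            # e.g. "Domain Profile Settings:" -> "Domain"
            profile = line[: -len(suffix)]
        elif line.startswith("State") and profile is not None:
            # e.g. "State                                 ON"
            states[profile] = line.split()[-1].upper() == "ON"
    return states


def firewall_states() -> dict:
    """Run netsh on the local Windows server and return the parsed states."""
    result = subprocess.run(
        ["netsh", "advfirewall", "show", "allprofiles", "state"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_firewall_states(result.stdout)
```

Calling `firewall_states()` on the affected server returns a dictionary keyed by profile name; any value of `True` means that profile's firewall is switched on.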
Lessons learnt
If you come across this problem in your environment, try accessing the QR console from multiple locations:
- From the Enterprise Manager server
- From within the QR server with a local address like: https://localhost/attunityreplicate/login/
Good luck