jonny.donker@gmail.com

Python – Unpacking a COMP-3 number

New business requirement today.  We have some old mainframe files we have to run through Qlik Replicate.

Three problems we have to overcome:

  1. The files are in fixed width format
  2. The files are in EBCDIC format; specifically Code 1047
  3. Inside the records there are comp-3 packed fields 

The mainframe team did kindly provide us with a schema file that showed us the how many bytes make up each field so we could divide up the fixed width file by reading in a certain number of bytes per a field.

Python provided a decode function to decode the fields read to a readable format:

focus_data_ascii = focus_data.decode("cp1047").rstrip()

The hard part now was the comp-3 packed fields. They are made up with some bit magic and working with bits and shifts is not my strongest suite  

I have been a bit spoilt so far working with python and most problems can be solved by “find the module to do the magic for you.”

But after ages of scouring for a module to handle the conversion for me; I had a lot of false leads – testing questionable code with no luck.

Eventually I stumbles upon:

zorchenhimer/cobol-packed-numbers.py

Thank goodness.

It still works with bits ‘n’ shift magic – but it works on the data that I have and now have readable text 

I extended the code to fulfil the business requirements:

# Source https://gist.github.com/zorchenhimer/fd4d4208312d4175d106
def unpack_number(field, no_decimals):
    """ Unpack a COMP-3 number. """
    a = array('B', field)
    value = float(0)

    # For all but last digit (half byte)
    for focus_half_byte in a[:-1]:
        value = (value * 100) + ( ( (focus_half_byte & 0xf0) >> 4) * 10) + (focus_half_byte & 0xf)

    # Last digit
    focus_half_byte = a[-1]
    value = (value * 10) + ((focus_half_byte & 0xf0) >> 4)

    # Negative/Positve check.  If 0xd; it is a negative value
    if (focus_half_byte & 0xf) == 0xd:
        value = value * -1

    # If no_decimals = 0; it is just an int
    if no_decimals == 0:
        return_int = int(value)
        return (return_int)
    else:
        return_float = value / pow(10, no_decimals)
        return (return_float)

Qlik Replicate – “SYS-E-HTTPFAIL, SYS-E-UNRECREQ, Unrecognized request pattern” problem

After a weekend of patching in our dev environment; we came in to discover that half of our Qlik Replicate Nodes were offline in Enterprise manager with the following error message

4 2022-08-08 13:19:24 [ServerDto      ] [ERROR] Test connection failed for server:MY_QRNODE. Message:'SYS-E-HTTPFAIL, SYS-E-UNRECREQ, Unrecognized request pattern 'GET: /servers/local/adusers/devdomain/qlik_user?directory_lookup=false'..'.

We could still access the node through their individual node’s web gui and the tasks were still happily running in the background.  The other half of our QR nodes were still online in Enterprise manager

The usual troubleshooting of restarting the services and restarting the server didn’t fix the problem; and the patching team washed their hands of the problem.  (I can’t really list the patches they applied due to security issues)

A database administrator mentioned it might be a TLS problem as the server team has been changing TLS settings in various places.  This lead me to comparing the connection settings between a QR node that was working to not working (we checked other things like patch levels etc).

Some servers had the username in domainusername format; while others didn’t.  Coincidentally the ones who had the  domainusername format were the ones not working,

Removing the domain section of the username resolved the problem.

So not sure what in the patching caused this issue; but something we have to check on our prod servers before the patches are applied to them.

The Fake LEFT OUTER JOIN

In my role I review a wide variety of SQL code.

Some is highly professional that is above my skill level so I absorb as much learning as I can from it.

On the other side is self taught SQLers from the business world who are trying to build weird and wonderful reports with SQL held together with sticky tape and don’t truly comprehend the code that is written.

An example – I see the following a lot with people who don’t understand the data and trying to program stability in their code like this:

SELECT 
    A.BDATE,
    A.ID,
    A.CUST_ID,
    C.CUSTOMER_NAME,
    C.PHONE
FROM dbo.ACCOUNTS A
LEFT OUTER JOIN dbo.CUSTOMER C
ON   C.BDATE = @DATE
     C.CUST_ID = A.CUST_ID 
WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX';

This means if for some reason if there is an integrity problem between ACCOUNTS and CUSTOMER; they will still get the account data on the report.

The problem arises when the business requirement of the report evolve and the developer needs to filters on the left joined table. 

For instance – “Report on Accounts on type XX but remove deceased customers so we don’t ring them” 

I quite often see this:

SELECT 
    A.BDATE,
    A.ID,
    A.CUST_ID,
    C.CUSTOMER_NAME,
    C.PHONE
FROM dbo.ACCOUNTS A
LEFT OUTER JOIN dbo.CUSTOMER C
ON   C.BDATE = @DATE
     C.CUST_ID = A.CUST_ID 
WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX' 

    AND

    C.DECEASED = 'N';

Without knowing it; the developer has broken their “Get all account XX report” business rule by turning their LEFT OUTER JOIN into effectively a INNER JOIN.

The correct way to write the query is:

SELECT 
    A.BDATE,
    A.ID,
    A.CUST_ID,
    C.CUSTOMER_NAME,
    C.PHONE
FROM dbo.ACCOUNTS A
LEFT OUTER JOIN dbo.CUSTOMER C
ON  C.BDATE = @DATE
    C.CUST_ID = A.CUST_ID 

    -------------------------------- 
    AND

    C.DECEASED = 'N'
    -------------------------------- 
WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX' ;

You can get a good gauge of the skill level of the developer by checking their LEFT OUTER JOINS.  If they using them on every join where you know there is integrity between the two data objects; you can have an educated guess that you’re reviewing a novice developer.

This is where you have to pay extra attention to their joins and where predicates and make them aware of the consequences of the code.

Qlik Replicate – The Wide Text file problem

A wide problem

We had a new business requirement for our team to create a new source for a CSV through Qlik Replicate.

OK – simple enough; create the Source Connector and define the file’s schema in the table section of the connector.

But the file was 250 columns wide.

I had nightmares before of defining file schemas before in programs like SSIS where one little mistake can lead to hours of debugging.

A stroke of Luck

Luckily we knew the source of the CSV file was an output from a view of a database; so the first part of the problem was solved – we had the schema.  Now to import the schema into a Qlik Replicate task.

  1. Create a new File Source connector and prefill as much known information as you can
  2. In the Tables definition of the File Source; add a dummy field
  3. Create a NULL Target connector
  4. Create a dummy task with your new file source and the Null target and save it
  5. Export task with End Points to save out the json definition of the task

Creating the table definition in JSON

Knowing the view definition; a simple python script can convert it to json.

import csv
import json

if __name__ == '__main__':
  final_dic = {}
  source_file = "FileDef.txt"
  field_array = []

    with open(source_file, 'r') as f:
      reader = csv.reader(f, delimiter='t')
      schema_items = list(reader)

        for item in schema_items:
          focus_item = {}
          focus_item["name"] = item[0]
          focus_item["nullable"] = True
          focus_string = item[1]

            if item[1].find("NUMBER") != -1:
              focus_item["type"] = "kAR_DATA_TYPE_NUMERIC"
              tokens = focus_string[focus_string.find("(") + 1: -1].split(",")

                precision = tokens[0]
              focus_item["precision"] = int(precision)

                if len(tokens) == 2:
                  scale = tokens[1]
                  focus_item["scale"] = int(scale)

            elif item[1].find("VARCHAR2") != -1:  #VARCHAR2
              focus_item["type"] = "kAR_DATA_TYPE_STR"
              length = focus_string[focus_string.find("(") + 1 : focus_string.find("BYTE") - 1]
              focus_item["length"] = int(length)

            elif item[1].find("DATE") != -1:
              focus_item["type"] = "kAR_DATA_TYPE_TIMESTAMP"

          field_array.append(focus_item)

        columns = {}
      columns["columns"] = field_array

    f = open("out.json", "w")
  f.write(json.dumps(columns, indent=4))
  f.close()

FileDef.txt

The following text file contains the schema of the view with the table delimited columns of:

  • Column Name
  • Data type
  • Nullable
  • Column order
UDF_VARCHAR1 VARCHAR2(200 BYTE) Yes 1
UDF_VARCHAR2 VARCHAR2(200 BYTE) Yes 2
UDF_VARCHAR3 VARCHAR2(200 BYTE) Yes 3
UDF_NUMERIC1 NUMBER(10,6) Yes 4
UDF_NUMERIC2 NUMBER(10,6) Yes 5
UDF_NUMERIC3 NUMBER(10,6) Yes 6
UDF_INT1 NUMBER(10,0) Yes 7
UDF_INT2 NUMBER(10,0) Yes 8
UDF_INT3 NUMBER(10,0) Yes 9
UDF_DATE1 DATE Yes 10
UDF_DATE2 DATE Yes 11
UDF_DATE3 DATE Yes 12

This particular view definition is from Oracle so uses oracle data types; but it would be simple to change the code over to use a definition from a MS SQL database or other source.

Sticking it all back together

  1. Open up the exported task file json
  2. Find the dummy field that you created in the table definition
  3. Overwrite the dummy field with the json created in the script file
  4. Save and reimport the job.

If all goes well; the File source connector will be overwritten with the full table definition.