New business requirement today. We have some old mainframe files we have to run through Qlik Replicate.
Three problems we have to overcome:
- The files are in fixed width format
- The files are EBCDIC encoded; specifically code page 1047
- Inside the records there are comp-3 packed fields
The mainframe team kindly provided us with a schema file showing how many bytes make up each field, so we could split each fixed width record by reading a set number of bytes per field.
Python's decode function converts the bytes read for each field into readable text:
focus_data_ascii = focus_data.decode("cp1047").rstrip()
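As a rough sketch of how the schema widths and the decode step fit together: the `SCHEMA` list, the field names and the `read_records` helper below are made up for illustration and are not from the mainframe team's schema file. The demo uses `cp037` only because it ships with every Python install and encodes letters, digits and spaces the same way as code page 1047; swap in the codec that matches your files.

```python
import io

# Hypothetical schema: (field name, width in bytes), in record order.
SCHEMA = [("cust_id", 5), ("name", 10)]


def read_records(stream, schema, codec="cp037"):
    """Yield one dict per fixed width record read from a binary stream."""
    record_len = sum(width for _, width in schema)
    while True:
        record = stream.read(record_len)
        if len(record) < record_len:
            break  # end of file (or a truncated trailing record)
        row, offset = {}, 0
        for name, width in schema:
            raw = record[offset:offset + width]
            # Text fields only; packed fields must be kept as raw bytes.
            row[name] = raw.decode(codec).rstrip()
            offset += width
        yield row


# Two 15-byte records built in memory so the sketch runs without a file.
text = "00042" + "ALICE".ljust(10) + "00043" + "BOB".ljust(10)
rows = list(read_records(io.BytesIO(text.encode("cp037")), SCHEMA))
print(rows)
```

Packed fields should skip the decode step and be handed on as raw bytes, since they do not contain character data.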
The hard part now was the COMP-3 packed fields. They are built with some bit magic, and working with bits and shifts is not my strongest suit.
I have been a bit spoilt so far working with Python, where most problems can be solved by “find the module to do the magic for you.”
But after ages of scouring for a module to handle the conversion for me, all I had was a lot of false leads and questionable code that did not work when tested.
Eventually I stumbled upon:
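To make the bit magic a little less magic: a packed field stores one decimal digit in each half byte (4 bits), and the very last half byte holds the sign instead of a digit. Commonly 0xC means positive, 0xD negative and 0xF unsigned. The `nibbles` helper below is just an illustration I have named myself, not part of any library.

```python
def nibbles(packed):
    """Split each byte into its high and low 4-bit halves."""
    out = []
    for byte in packed:
        out.append(byte >> 4)     # high half byte
        out.append(byte & 0x0F)   # low half byte
    return out


# Three bytes: 0x12 0x34 0x5D
packed = bytes([0x12, 0x34, 0x5D])
*digits, sign = nibbles(packed)
print(digits, hex(sign))  # [1, 2, 3, 4, 5] 0xd
```

So these three bytes carry the digits 12345 with a negative sign. Where the decimal point goes is not stored in the data at all; it comes from the schema.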
zorchenhimer/cobol-packed-numbers.py
Thank goodness.
It still works with bits 'n' shift magic, but it works on the data that I have and I now have readable text.
I extended the code to fulfil the business requirements:
# Source https://gist.github.com/zorchenhimer/fd4d4208312d4175d106
from array import array


def unpack_number(field, no_decimals):
    """ Unpack a COMP-3 number. """
    a = array('B', field)
    # Accumulate as an int so long fields do not lose precision
    value = 0
    # Every byte except the last holds two digits (one per half byte)
    for focus_half_byte in a[:-1]:
        value = (value * 100) + (((focus_half_byte & 0xf0) >> 4) * 10) + (focus_half_byte & 0xf)
    # Last byte: high half byte is the final digit, low half byte is the sign
    focus_half_byte = a[-1]
    value = (value * 10) + ((focus_half_byte & 0xf0) >> 4)
    # Negative/Positive check. If 0xd; it is a negative value
    if (focus_half_byte & 0xf) == 0xd:
        value = value * -1
    # If no_decimals = 0; it is just an int
    if no_decimals == 0:
        return value
    else:
        # Shift the decimal point left by no_decimals places
        return value / pow(10, no_decimals)
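One thing to watch: returning a float for fields with decimals can introduce small rounding differences, which matters for money amounts. Below is a sketch of an alternative that returns a `decimal.Decimal` instead. The function name `unpack_comp3_decimal` is my own, and like the code above it assumes 0xd is the only negative sign marker in the data.

```python
from decimal import Decimal


def unpack_comp3_decimal(field, no_decimals=0):
    """Unpack a COMP-3 field into an exact Decimal value."""
    digits = []
    for byte in field:
        digits.append(byte >> 4)     # high half byte
        digits.append(byte & 0x0F)   # low half byte
    # The final half byte is the sign, not a digit.
    sign_nibble = digits.pop()
    if any(d > 9 for d in digits):
        raise ValueError("not a valid packed decimal field")
    sign = 1 if sign_nibble == 0xD else 0  # Decimal uses 1 for negative
    # (sign, digits, exponent): the exponent places the decimal point.
    return Decimal((sign, tuple(digits), -no_decimals))


print(unpack_comp3_decimal(bytes([0x12, 0x34, 0x5D]), 2))  # -123.45
```

The `ValueError` check is a cheap way to catch a wrong offset or width in the schema, since a misaligned read usually produces half bytes above 9.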