A wide problem
Our team had a new business requirement: create a new source for a CSV file through Qlik Replicate.
OK – simple enough: create the Source Connector and define the file's schema in the table section of the connector.
But the file was 250 columns wide.
I'd had nightmares before of defining file schemas in tools like SSIS, where one little mistake can lead to hours of debugging.
A stroke of luck
Luckily, we knew the CSV file was the output of a database view, so the first part of the problem was solved – we had the schema. Now to import that schema into a Qlik Replicate task:
- Create a new File Source connector and prefill as much known information as you can
- In the Tables definition of the File Source, add a dummy field
- Create a NULL Target connector
- Create a dummy task with your new File Source and the NULL Target, and save it
- Export the task with endpoints to save out the JSON definition of the task
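The exact layout of the exported JSON can vary between Qlik Replicate versions, so treat the fragment below purely as an illustration (the dummy field's name and length here are made up). The point is that the dummy field gives you an unmistakable landmark in the export, sitting inside the same kind of columns array that the script below generates:

    "columns": [
        {
            "name": "DUMMY",
            "nullable": true,
            "type": "kAR_DATA_TYPE_STR",
            "length": 10
        }
    ]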
Creating the table definition in JSON
Knowing the view definition, a simple Python script can convert it to JSON.
import csv
import json

if __name__ == '__main__':
    source_file = "FileDef.txt"
    field_array = []

    # The schema file is tab-delimited: column name, data type, nullable, column order
    with open(source_file, 'r') as f:
        reader = csv.reader(f, delimiter='\t')
        schema_items = list(reader)

    for item in schema_items:
        focus_item = {}
        focus_item["name"] = item[0]
        focus_item["nullable"] = True
        focus_string = item[1]
        if focus_string.find("NUMBER") != -1:
            # NUMBER(precision) or NUMBER(precision,scale)
            focus_item["type"] = "kAR_DATA_TYPE_NUMERIC"
            tokens = focus_string[focus_string.find("(") + 1:-1].split(",")
            focus_item["precision"] = int(tokens[0])
            if len(tokens) == 2:
                focus_item["scale"] = int(tokens[1])
        elif focus_string.find("VARCHAR2") != -1:
            # VARCHAR2(n BYTE) -- the length sits between "(" and " BYTE"
            focus_item["type"] = "kAR_DATA_TYPE_STR"
            length = focus_string[focus_string.find("(") + 1:focus_string.find("BYTE") - 1]
            focus_item["length"] = int(length)
        elif focus_string.find("DATE") != -1:
            focus_item["type"] = "kAR_DATA_TYPE_TIMESTAMP"
        field_array.append(focus_item)

    columns = {"columns": field_array}
    with open("out.json", "w") as f:
        f.write(json.dumps(columns, indent=4))
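Run against the sample FileDef.txt in the next section, the first VARCHAR2, NUMBER and DATE rows produce an out.json like this (abridged to three columns):

    {
        "columns": [
            {
                "name": "UDF_VARCHAR1",
                "nullable": true,
                "type": "kAR_DATA_TYPE_STR",
                "length": 200
            },
            {
                "name": "UDF_NUMERIC1",
                "nullable": true,
                "type": "kAR_DATA_TYPE_NUMERIC",
                "precision": 10,
                "scale": 6
            },
            {
                "name": "UDF_DATE1",
                "nullable": true,
                "type": "kAR_DATA_TYPE_TIMESTAMP"
            }
        ]
    }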
FileDef.txt
The following text file contains the schema of the view, with tab-delimited columns of:
- Column Name
- Data type
- Nullable
- Column order
UDF_VARCHAR1 VARCHAR2(200 BYTE) Yes 1
UDF_VARCHAR2 VARCHAR2(200 BYTE) Yes 2
UDF_VARCHAR3 VARCHAR2(200 BYTE) Yes 3
UDF_NUMERIC1 NUMBER(10,6) Yes 4
UDF_NUMERIC2 NUMBER(10,6) Yes 5
UDF_NUMERIC3 NUMBER(10,6) Yes 6
UDF_INT1 NUMBER(10,0) Yes 7
UDF_INT2 NUMBER(10,0) Yes 8
UDF_INT3 NUMBER(10,0) Yes 9
UDF_DATE1 DATE Yes 10
UDF_DATE2 DATE Yes 11
UDF_DATE3 DATE Yes 12
This particular view definition is from Oracle, so it uses Oracle data types, but it would be simple to change the code over to use a definition from an MS SQL database or another source.
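To give a feel for that swap, only the type-parsing branches need to change. Here is a minimal sketch for SQL Server types – parse_mssql_type is a hypothetical helper, and mapping datetime to the timestamp type mirrors the DATE handling above rather than anything Qlik-documented:

def parse_mssql_type(type_string):
    # Sketch only: map an MS SQL type declaration (e.g. "decimal(10,6)",
    # "varchar(200)", "datetime") to the same fragment shape as above.
    # A varchar(max) would need its own branch.
    fragment = {}
    if type_string.find("decimal") != -1 or type_string.find("numeric") != -1:
        fragment["type"] = "kAR_DATA_TYPE_NUMERIC"
        tokens = type_string[type_string.find("(") + 1:type_string.find(")")].split(",")
        fragment["precision"] = int(tokens[0])
        if len(tokens) == 2:
            fragment["scale"] = int(tokens[1])
    elif type_string.find("varchar") != -1:
        # varchar(n) -- no " BYTE" suffix to strip, unlike Oracle
        fragment["type"] = "kAR_DATA_TYPE_STR"
        fragment["length"] = int(type_string[type_string.find("(") + 1:type_string.find(")")])
    elif type_string.find("datetime") != -1:
        fragment["type"] = "kAR_DATA_TYPE_TIMESTAMP"
    return fragment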
Sticking it all back together
- Open the exported task JSON file
- Find the dummy field that you created in the table definition
- Overwrite the dummy field with the JSON created by the script
- Save and reimport the task
If all goes well, the File Source connector's table definition will be replaced with the full 250-column schema.
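If hand-editing a 250-column JSON file sounds error-prone, the splice itself can be scripted too. A minimal sketch, assuming the dummy field was named DUMMY; the two file names are placeholders, and the search deliberately walks the whole document rather than assuming a fixed path to the table definition inside the export:

import json

def replace_dummy_columns(node, new_columns, dummy_name="DUMMY"):
    # Walk the exported task JSON looking for a list of column dicts
    # that contains the dummy field, and swap it for the generated list.
    if isinstance(node, dict):
        for key, value in node.items():
            if (isinstance(value, list)
                    and any(isinstance(c, dict) and c.get("name") == dummy_name
                            for c in value)):
                node[key] = new_columns
                return True
            if replace_dummy_columns(value, new_columns, dummy_name):
                return True
    elif isinstance(node, list):
        for child in node:
            if replace_dummy_columns(child, new_columns, dummy_name):
                return True
    return False

if __name__ == '__main__':
    with open("exported_task.json") as f:    # the task exported from Replicate
        task = json.load(f)
    with open("out.json") as f:              # produced by the script above
        new_columns = json.load(f)["columns"]
    if replace_dummy_columns(task, new_columns):
        with open("exported_task_full.json", "w") as f:
            json.dump(task, f, indent=4)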