
The Fake LEFT OUTER JOIN

In my role I review a wide variety of SQL code.

Some of it is highly professional and above my skill level, so I absorb as much learning from it as I can.

On the other side are self-taught SQLers from the business world, building weird and wonderful reports held together with sticky tape, who don’t truly comprehend the code they have written.

An example – I see the following a lot from people who don’t fully understand the data and are trying to program stability into their code:

SELECT 
    A.BDATE,
    A.ID,
    A.CUST_ID,
    C.CUSTOMER_NAME,
    C.PHONE
FROM dbo.ACCOUNTS A
LEFT OUTER JOIN dbo.CUSTOMER C
ON   C.BDATE = @DATE
AND  C.CUST_ID = A.CUST_ID
WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX';

This means that if for some reason there is an integrity problem between ACCOUNTS and CUSTOMER, they will still get the account data on the report.

The problem arises when the business requirements of the report evolve and the developer needs to filter on the left-joined table.

For instance – “Report on accounts of type XX, but remove deceased customers so we don’t ring them”.

I quite often see this:

SELECT 
    A.BDATE,
    A.ID,
    A.CUST_ID,
    C.CUSTOMER_NAME,
    C.PHONE
FROM dbo.ACCOUNTS A
LEFT OUTER JOIN dbo.CUSTOMER C
ON   C.BDATE = @DATE
AND  C.CUST_ID = A.CUST_ID
WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX' 

    AND

    C.DECEASED = 'N';

Without knowing it, the developer has broken their “get all type XX accounts” business rule by turning their LEFT OUTER JOIN into, effectively, an INNER JOIN. For any account with no matching customer row, C.DECEASED is NULL, so the predicate C.DECEASED = 'N' in the WHERE clause is never true and those accounts drop off the report.

The correct way to write the query is:

SELECT 
    A.BDATE,
    A.ID,
    A.CUST_ID,
    C.CUSTOMER_NAME,
    C.PHONE
FROM dbo.ACCOUNTS A
LEFT OUTER JOIN dbo.CUSTOMER C
ON  C.BDATE = @DATE
AND C.CUST_ID = A.CUST_ID

    -------------------------------- 
    AND

    C.DECEASED = 'N'
    -------------------------------- 
WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX' ;
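
An alternative you sometimes see, which also preserves the outer join, is to leave the join conditions alone and instead let the unmatched accounts through explicitly in the WHERE clause:

WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX' AND
    (C.DECEASED = 'N' OR C.CUST_ID IS NULL);

Both versions keep every type XX account on the report; moving the predicate into the ON clause is usually the tidier option, and the two behave slightly differently if C.DECEASED can itself be NULL.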

You can get a good gauge of a developer’s skill level by checking their LEFT OUTER JOINs.  If they are using them on every join, even where you know there is integrity between the two data objects, you can make an educated guess that you’re reviewing a novice developer.

This is where you have to pay extra attention to their joins and WHERE predicates, and make them aware of the consequences of the code.

Qlik Replicate – The Wide Text file problem

A wide problem

Our team had a new business requirement: create a new source for a CSV file through Qlik Replicate.

OK – simple enough: create the Source Connector and define the file’s schema in the table section of the connector.

But the file was 250 columns wide.

I have had nightmares before when defining file schemas in programs like SSIS, where one little mistake can lead to hours of debugging.

A stroke of Luck

Luckily we knew the CSV file was the output of a database view, so the first part of the problem was solved – we had the schema.  Now to import that schema into a Qlik Replicate task.

  1. Create a new File Source connector and prefill as much known information as you can
  2. In the Tables definition of the File Source; add a dummy field
  3. Create a NULL Target connector
  4. Create a dummy task with your new file source and the Null target and save it
  5. Export the task with endpoints to save out the JSON definition of the task

Creating the table definition in JSON

Knowing the view definition, a simple Python script can convert it to JSON.

import csv
import json

if __name__ == '__main__':
    source_file = "FileDef.txt"
    field_array = []

    # The schema export is tab-delimited: column name, data type, nullable, column order
    with open(source_file, 'r') as f:
        reader = csv.reader(f, delimiter='\t')
        schema_items = list(reader)

    for item in schema_items:
        focus_item = {}
        focus_item["name"] = item[0]
        focus_item["nullable"] = True
        focus_string = item[1]

        # Map each Oracle data type onto the matching Qlik Replicate data type
        if focus_string.find("NUMBER") != -1:
            focus_item["type"] = "kAR_DATA_TYPE_NUMERIC"
            # NUMBER(precision[,scale]) - pull the bracketed tokens apart
            tokens = focus_string[focus_string.find("(") + 1: -1].split(",")

            precision = tokens[0]
            focus_item["precision"] = int(precision)

            if len(tokens) == 2:
                scale = tokens[1]
                focus_item["scale"] = int(scale)

        elif focus_string.find("VARCHAR2") != -1:
            focus_item["type"] = "kAR_DATA_TYPE_STR"
            # VARCHAR2(n BYTE) - pull out the length n
            length = focus_string[focus_string.find("(") + 1: focus_string.find("BYTE") - 1]
            focus_item["length"] = int(length)

        elif focus_string.find("DATE") != -1:
            focus_item["type"] = "kAR_DATA_TYPE_TIMESTAMP"

        field_array.append(focus_item)

    columns = {}
    columns["columns"] = field_array

    with open("out.json", "w") as f:
        f.write(json.dumps(columns, indent=4))

FileDef.txt

The following text file contains the schema of the view, with the tab-delimited columns being:

  • Column Name
  • Data type
  • Nullable
  • Column order
UDF_VARCHAR1 VARCHAR2(200 BYTE) Yes 1
UDF_VARCHAR2 VARCHAR2(200 BYTE) Yes 2
UDF_VARCHAR3 VARCHAR2(200 BYTE) Yes 3
UDF_NUMERIC1 NUMBER(10,6) Yes 4
UDF_NUMERIC2 NUMBER(10,6) Yes 5
UDF_NUMERIC3 NUMBER(10,6) Yes 6
UDF_INT1 NUMBER(10,0) Yes 7
UDF_INT2 NUMBER(10,0) Yes 8
UDF_INT3 NUMBER(10,0) Yes 9
UDF_DATE1 DATE Yes 10
UDF_DATE2 DATE Yes 11
UDF_DATE3 DATE Yes 12
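
For reference, given rows like the above, the script writes something along these lines to out.json (a trimmed sample showing one column of each type; key order assumes Python 3.7+ dictionary ordering):

{
    "columns": [
        {
            "name": "UDF_VARCHAR1",
            "nullable": true,
            "type": "kAR_DATA_TYPE_STR",
            "length": 200
        },
        {
            "name": "UDF_NUMERIC1",
            "nullable": true,
            "type": "kAR_DATA_TYPE_NUMERIC",
            "precision": 10,
            "scale": 6
        },
        {
            "name": "UDF_DATE1",
            "nullable": true,
            "type": "kAR_DATA_TYPE_TIMESTAMP"
        }
    ]
}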

This particular view definition is from Oracle, so it uses Oracle data types, but it would be simple to change the code over to use a definition from an MS SQL database or another source.
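
As a rough sketch of that change – assuming the SQL Server schema export lists types like varchar(200), decimal(10,6) and datetime – the type-mapping branch might become something like the function below. The kAR_DATA_TYPE names are the same ones used in the script above; everything else is illustrative.

# Illustrative only: map a SQL Server type string onto the same Qlik Replicate
# types used in the script above. Adjust to match your actual schema export.
def map_mssql_type(type_string):
    focus_item = {}

    if type_string.startswith("decimal") or type_string.startswith("numeric"):
        focus_item["type"] = "kAR_DATA_TYPE_NUMERIC"
        # decimal(precision[,scale]) - pull the bracketed tokens apart
        tokens = type_string[type_string.find("(") + 1: type_string.find(")")].split(",")
        focus_item["precision"] = int(tokens[0])
        if len(tokens) == 2:
            focus_item["scale"] = int(tokens[1])

    elif type_string.startswith("varchar") or type_string.startswith("nvarchar"):
        focus_item["type"] = "kAR_DATA_TYPE_STR"
        # varchar(n) - note that varchar(max) would need its own handling
        focus_item["length"] = int(type_string[type_string.find("(") + 1: type_string.find(")")])

    elif type_string.startswith("datetime") or type_string == "date":
        focus_item["type"] = "kAR_DATA_TYPE_TIMESTAMP"

    return focus_item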

Sticking it all back together

  1. Open up the exported task JSON file
  2. Find the dummy field that you created in the table definition
  3. Overwrite the dummy field with the JSON created by the script
  4. Save and reimport the task.

If all goes well, the File Source connector will be overwritten with the full table definition.
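
If you end up doing this regularly, step 3 above can also be scripted rather than done in a text editor. The sketch below is just that – a sketch: the file names and the dummy column name (DUMMY_FIELD) are my own assumptions, and it makes no assumptions about the task export layout other than that the dummy column sits somewhere in a list of column definitions.

import json

# Assumed file names and dummy column name - adjust to your own task export
TASK_FILE = "exported_task.json"
COLUMNS_FILE = "out.json"
DUMMY_NAME = "DUMMY_FIELD"

def replace_dummy_columns(node, new_columns):
    """Recursively find the list of column dicts containing the dummy field
    and replace that list with the generated column definitions."""
    if isinstance(node, dict):
        for key, value in node.items():
            if (isinstance(value, list)
                    and any(isinstance(c, dict) and c.get("name") == DUMMY_NAME for c in value)):
                node[key] = new_columns
                return True
            if replace_dummy_columns(value, new_columns):
                return True
    elif isinstance(node, list):
        for item in node:
            if replace_dummy_columns(item, new_columns):
                return True
    return False

if __name__ == '__main__':
    with open(TASK_FILE) as f:
        task = json.load(f)
    with open(COLUMNS_FILE) as f:
        new_columns = json.load(f)["columns"]

    if replace_dummy_columns(task, new_columns):
        with open("patched_task.json", "w") as f:
            f.write(json.dumps(task, indent=4))
    else:
        print("Dummy column not found - check the task export by hand.")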