SQL

SQL: Quick and dirty comparing result sets

When doing regression testing – I need to prove that the changes that I have done to two queries returned the same result sets.

If result sets are on the same database then you can do a MINUS operator. 

If not – it can be a pain to shift result sets from database to database; especially if they are large.

For quick and dirty testing; I like using the tsql function CHECKSUM and pl sql function ORA_HASH.  It can give you some quick results to test whether a column of data has a high probability of been the same as another column.

To test a result set; I have the following excel formulas saved in my “Handy formulas” notebook:

TSQL
=",SUM(CAST(CHECKSUM("&A1&") AS BIGINT)) AS "&A1
PL/SQL
=",SUM(ORA_HASH("&A1&")) AS "&A1

With a list of column name you can then make an checksum sql statement from Excel

Running this query on both result sets; differences can be found by seeing if there is a difference in the retuned values

Power BI and the lost tnsnames.ora file

A distraught user contacted me today with a problem that her Power BI dashboard was not refreshing and returning the following error:

OLE DB:ORA-12154: TNS:could not resolve the connect identifier specified

The problem was not just isolated to her; but common across all her team.

I’m not an expert on Oracle and Power BI; and I kindly suggest for her to contact the Helpdesk.  She came back a few minutes later saying that the Helpdesk just passed her off unceremoniously to another team.

Deciding to give her a hand – I had a think about the error message.

TNS files seem to end up in all sorts of weird and wonderful locations on our organisation’s systems.

I also remembered that there was a patching of the Oracle client just done last week.

I asked her to do a search for tnsnames.ora on her C drive.

Meanwhile I ran the following command through the command line on my own computer; which should be a similar configuration:

echo %TNS_ADMIN%

She got back:

C:\app\product\12.1.0\client_1\network\admin

And I got back:

C:\app\product\19.3.0\client_1\network\admin

My hypothesis that IT installed a new version of the Oracle client and this process updated the %TNS_ADMIN% field.

What the patching didn’t do was to shift the tnsnames.ora file to the new directory of C:\app\product\19.3.0\client_1\network\admin.

When Power BI tried to run an Oracle query; it was using the new client installation and hence not finding the tnsnames.ora file.

After copying the TNS file into the correct directory; right away she was able to rerun her Power BI dashboard.

The Fake LEFT OUTER JOIN

In my role I review a wide variety of SQL code.

Some is highly professional that is above my skill level so I absorb as much learning as I can from it.

On the other side is self taught SQLers from the business world who are trying to build weird and wonderful reports with SQL held together with sticky tape and don’t truly comprehend the code that is written.

An example – I see the following a lot with people who don’t understand the data and trying to program stability in their code like this:

SELECT 
    A.BDATE,
    A.ID,
    A.CUST_ID,
    C.CUSTOMER_NAME,
    C.PHONE
FROM dbo.ACCOUNTS A
LEFT OUTER JOIN dbo.CUSTOMER C
ON   C.BDATE = @DATE
     C.CUST_ID = A.CUST_ID 
WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX';

This means if for some reason if there is an integrity problem between ACCOUNTS and CUSTOMER; they will still get the account data on the report.

The problem arises when the business requirement of the report evolve and the developer needs to filters on the left joined table. 

For instance – “Report on Accounts on type XX but remove deceased customers so we don’t ring them” 

I quite often see this:

SELECT 
    A.BDATE,
    A.ID,
    A.CUST_ID,
    C.CUSTOMER_NAME,
    C.PHONE
FROM dbo.ACCOUNTS A
LEFT OUTER JOIN dbo.CUSTOMER C
ON   C.BDATE = @DATE
     C.CUST_ID = A.CUST_ID 
WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX' 

    AND

    C.DECEASED = 'N';

Without knowing it; the developer has broken their “Get all account XX report” business rule by turning their LEFT OUTER JOIN into effectively a INNER JOIN.

The correct way to write the query is:

SELECT 
    A.BDATE,
    A.ID,
    A.CUST_ID,
    C.CUSTOMER_NAME,
    C.PHONE
FROM dbo.ACCOUNTS A
LEFT OUTER JOIN dbo.CUSTOMER C
ON  C.BDATE = @DATE
    C.CUST_ID = A.CUST_ID 

    -------------------------------- 
    AND

    C.DECEASED = 'N'
    -------------------------------- 
WHERE
    A.BDATE = @DATE AND
    A.ACCOUNT_TYPE = 'XX' ;

You can get a good gauge of the skill level of the developer by checking their LEFT OUTER JOINS.  If they using them on every join where you know there is integrity between the two data objects; you can have an educated guess that you’re reviewing a novice developer.

This is where you have to pay extra attention to their joins and where predicates and make them aware of the consequences of the code.