{"id":1613,"date":"2025-06-02T05:53:02","date_gmt":"2025-06-02T05:53:02","guid":{"rendered":"https:\/\/craftcookcode.com\/?p=1613"},"modified":"2025-06-02T05:55:48","modified_gmt":"2025-06-02T05:55:48","slug":"idiots-doing-idiot-things-the-top-seven-worst-sql-queries-ever","status":"publish","type":"post","link":"https:\/\/craftcookcode.com\/?p=1613","title":{"rendered":"&#8220;Idiots doing Idiot things&#8221; &#8211; The Top Eight Worst SQL Queries Ever"},"content":{"rendered":"\n<p>For over fifteen years I have worked with a Microsoft SQL Data Warehouse that is over twenty-five years old.&nbsp; Although it is ancient in this world of huge cloud-based Data Vaults \u2013 it churns out an abundant of value to the organisation I work for.&nbsp;<\/p>\n\n\n\n<p>The problem with this value \u2013 it is opened to anyone who wants to access to build their queries off.&nbsp; This means we get a range of SQL queries running on the database.&nbsp; We have queries ranging from precise and efficient built queries that uses every trick in optimisation \u2013 all the way to \u201cWTF are you trying to do?!\u201d queries.<\/p>\n\n\n\n<p>The second category is the one we have the most problem with.&nbsp; With limited resources of an OnPrem SQL server; some uses write atrocious queries without a perceived care for other people using the database.&nbsp; There is nothing more frustrating getting a support call out because of a failed data load; only to find the table is locked by someone running a query over the past six hours.<\/p>\n\n\n\n<p>To counter this problem; I created a python script that checks what is running on the database every fifteen minutes.&nbsp; If a user has a query running for over half an hour, I get an alert to show what they are running.<\/p>\n\n\n\n<p>This allows our team to:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Work out if someone is running a query that is blocking or taking resources away from our critical processes \u2013 resulting in timeouts<\/li>\n\n\n\n<li>With our experience help users optimise poorly running queries so they have a better outcome<\/li>\n\n\n\n<li>To stop any stupid queries from running<\/li>\n<\/ol>\n\n\n\n<p>Since the queries are logged into a DB \u2013 I have four years of history of 14,000 unique queries running.&nbsp; This gives me a lot of learning experiences of what users are running and what problems they face.<\/p>\n\n\n\n<p>From this learning \u2013 this is the Top 10 most horrible queries I see running on our database.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">8. The &#8220;Discovery&#8221; Query<\/h2>\n\n\n\n<p>This query come from new and experience users exploring the data of the database.&nbsp; They will run something like this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT *\nFROM dbo.A_VERY_LARGE_TABLE;\n<\/pre><\/div>\n\n\n<p>Look at the data, grab what they need and then just leave the query running in the background in their Server Management Studio.&nbsp; When we contact them an hour later; they act surprise that the query is still running in the background \u2013 returning millions upon millions of rows to their client.<\/p>\n\n\n\n<p>This is an easy fix with training.&nbsp; Using the TOP command to limit the number of rows returned, use <a href=\"https:\/\/learn.microsoft.com\/en-us\/sql\/relational-databases\/system-stored-procedures\/sp-help-transact-sql?view=sql-server-ver16\" target=\"_blank\" rel=\"noopener\" title=\"\">SP_HELP<\/a> to return a schema of an object, provide a sandbox for people to explore in are all relatively simple fixes for this problem<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">7. The &#8220;Application Preview&#8221; Query<\/h2>\n\n\n\n<p>Our SAS application have a nasty default setting that if a user previews a data in a table; it will try and return the whole table \u2013 with the user completely unaware that this is happening.&nbsp;<\/p>\n\n\n\n<p>Even worse; if the application locks up returning the data; the user kills the application from task manager, open it back up and performs the same steps.&nbsp; Since the first query was not killed, it will still be running on the database.&nbsp; So quite often you will see half a dozen of the same queries running on the database, all starting at different times.<\/p>\n\n\n\n<p>To prevent this \u2013 it is important to assess new DB client software to see what it is trying to do in the background of the database.&nbsp; What does \u201cpreview\u201d mean?&nbsp; Return 10, 100 or all the rows?&nbsp; Is there a setting that restricts the number of rows returned; or a timeout function that prevents complex views running forever just returning a small set of data?<\/p>\n\n\n\n<p>If these features are available \u2013 it is important to either set this as default in the application; or part of the initial user setup \/ training program.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">6. The &#8220;I am going to copy this to Excel and Analyse the Data&#8221; Query<\/h2>\n\n\n\n<p>This is like exploratory queries that come up.&nbsp; The user is running a query like:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT *\nFROM dbo.A_VERY_LARGE_TABLE;\n<\/pre><\/div>\n\n\n<blockquote class=\"wp-block-quote is-style-plain is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cWhy are you running this query?\u201d I politely enquire.<br>\u201cOh \u2013 I am going to copy the data into excel and search\/pivot\/do fancy things with it\u201d<br>\u201cUmmm &#8211; this table has 8 billion rows in it\u2026\u201d<\/p>\n\n\n\n<p><\/p>\n<\/blockquote>\n\n\n\n<p>I can sympathise with queries like these.<\/p>\n\n\n\n<p>The user thinks, &#8220;Why run the same query over and over again to get the data that I want to explore and present in different ways.&nbsp; Let just get one large cut and then manipulate it on the client.\u201d<\/p>\n\n\n\n<p>What users don\u2019t realise that dealing with a huge amount of data on the client side is difficult.&nbsp; Exporting data out of the query client, into an application like excel can be frustrating with memory and CPU limitations compared to a spec\u2019d-out server.&nbsp; Plus, Excel is not going to handle 8 billion rows.<\/p>\n\n\n\n<p>The requirements of queries like these needs to be analysed \u2013 do the users need that atomic level data?&nbsp; Can strategic aggregation tables be provided to the user, so they have more manageable data to return.&nbsp; Can different technology like OLAP cubes be built on top of large tables to answer the user\u2019s needs?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">5. The \u201cI got duplicates so I am going to get rid of them with a DISTINCT\u201d Query<\/h2>\n\n\n\n<p>This is a pet peeve of mine.<\/p>\n\n\n\n<p>The user has duplicate in their results and don&#8217;t know why. Instead of investigating why there is duplication (usually result of a poor join), they just slap a DISTINCT at the top of the query and call it done.&nbsp; Here is a simplified example that I regularly come across:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nDECLARE @ACCOUNTS TABLE\n(\n\tORG INT,\n\tACCOUNT_ID INT,\n\tBRANCH INT,\n\tBALANCE NUMERIC(18,2),\n\tPRIMARY KEY (ORG, ACCOUNT_ID)\n);\n\nINSERT INTO @ACCOUNTS VALUES(1, 18, 60, 50);\nINSERT INTO @ACCOUNTS VALUES(1, 19, 60, 150);\n\nDECLARE @BRANCHES TABLE\n(\n\tORG INT,\n\tBRANCH INT,\n\tBRANCH_NAME VARCHAR(20),\n\tBRANCH_STATE VARCHAR(3),\n\tPRIMARY KEY (ORG, BRANCH)\n);\n\nINSERT INTO @BRANCHES VALUES(1, 60, &#039;Branch of ORG 1&#039;, &#039;VIC&#039;);\nINSERT INTO @BRANCHES VALUES(2, 60, &#039;Branch of ORG 2&#039;, &#039;VIC&#039;);\n\n\nSELECT \n\tA.ORG,\n\tA.ACCOUNT_ID, \n\tA.BRANCH,\n\tA.BALANCE,\n\tB.BRANCH_STATE\nFROM @ACCOUNTS A\nJOIN @BRANCHES B\nON\t-- B.ORG = A.ORG\t-- User forgot this predicate in the join\n\tB.BRANCH = A.BRANCH;\n<\/pre><\/div>\n\n\n<p>Results:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td class=\"has-text-align-left\" data-align=\"left\"><strong>ORG<\/strong><\/td><td><strong>ACCOUNT_ID<\/strong><\/td><td><strong>BRANCH<\/strong><\/td><td><strong>BALANCE<\/strong><\/td><td><strong>BRANCH_STATE<\/strong><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">1<\/td><td>18<\/td><td>60<\/td><td>50.00<\/td><td>VIC<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">1<\/td><td>18<\/td><td>60<\/td><td>50.00<\/td><td>VIC<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">1<\/td><td>19<\/td><td>60<\/td><td>150.00<\/td><td>VIC<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">1<\/td><td>18<\/td><td>60<\/td><td>150.00<\/td><td>VIC<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The user looks &#8211; &#8220;Oh Dear &#8211; I&#8217;ve got duplicates!  Let&#8217;s get rid of them.&#8221;<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT DISTINCT\n\tA.ORG,\n\tA.ACCOUNT_ID, \n\tA.BRANCH,\n\tA.BALANCE,\n\tB.BRANCH_STATE\nFROM @ACCOUNTS A\nJOIN @BRANCHES B\nON\t-- B.ORG = A.ORG\t-- User forgot this predicate in the join\n\tB.BRANCH = A.BRANCH;\n<\/pre><\/div>\n\n\n<p>This poses many problems:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The Database must work harder using incomplete joins on indexes to bring across the data; and then work harder supressing the duplicates<\/li>\n\n\n\n<li>If the field list change; then the distinct might not work anymore.<\/li>\n<\/ol>\n\n\n\n<p>For example, with the above query \u2013 if the user brings in the column \u201cBRANCH_NAME\u201d then the duplicates return:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT DISTINCT\n\tA.ORG,\n\tA.ACCOUNT_ID, \n\tA.BRANCH,\n\tA.BALANCE,\n\t--------- Add in new column ---------\n\tB.BRANCH_NAME,\n\t-------------------------------------\n\tB.BRANCH_STATE\nFROM @ACCOUNTS A\nJOIN @BRANCHES B\nON\t-- B.ORG = A.ORG\t-- User forgot this predicate in the join\n\tB.BRANCH = A.BRANCH;\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td class=\"has-text-align-left\" data-align=\"left\"><strong>ORG<\/strong><\/td><td><strong>ACCOUNT_ID<\/strong><\/td><td><strong>BRANCH<\/strong><\/td><td><strong>BALANCE<\/strong><\/td><td><strong>BRANCH_NAME<\/strong><\/td><td><strong>BRANCH_STATE<\/strong><\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">1<\/td><td>18<\/td><td>60<\/td><td>50.00<\/td><td>Branch of ORG 1<\/td><td>VIC<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">1<\/td><td>18<\/td><td>60<\/td><td>50.00<\/td><td>Branch of ORG 2<\/td><td>VIC<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">1<\/td><td>19<\/td><td>60<\/td><td>150.00<\/td><td>Branch of ORG 1<\/td><td>VIC<\/td><\/tr><tr><td class=\"has-text-align-left\" data-align=\"left\">1<\/td><td>18<\/td><td>60<\/td><td>150.00<\/td><td>Branch of ORG 2<\/td><td>VIC<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Preventing queries like this comes down to user experience and training.&nbsp; If they see duplicates in their data \u2013 their initial thoughts should be \u201cWhy do I have duplicates?\u201d &nbsp;&nbsp;<\/p>\n\n\n\n<p>Duplicates might be legitimate and a DISINCT might be OK \u2013 but quite often it is because of a bad join.&nbsp; For an inexperience user \u2013 it might be overwhelming to break down a large table to find where the duplicates are coming from, and they might need help from a more experienced user.&nbsp; This is where internal data forums are useful \u2013 where people can help each other with their problems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">4. The &#8220;Ever lengthening&#8221; Query<\/h2>\n\n\n\n<p>This is a query I see scheduled daily for users getting trends over time for a BI tool:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT *\nFROM dbo.A_TABLE A\nJOIN dbo.B_TABLE B\nON\tB.KEY_1 = A.KEY_1\nWHERE\n\t--------------- Start Date ---------------\n\tA.BUSINESS_DATE &gt;= &#039;2025-01-01&#039; AND\n\t------------------------------------------\n\tA.PREDICATE_1 = &#039;X&#039; AND\n\tB.PREDICATE_2 = &#039;Y&#039;;\n\n<\/pre><\/div>\n\n\n<p>Initially it starts off OK \u2013 running fast and the user is happy.&nbsp; But as time goes by; the query gets slower and slower; returning more and more data.&nbsp; Eventually it gets to the point the query goes off into the nether and never returns.  Since it is always starting the from the same point; it is constantly re-querying the same data over an over again<\/p>\n\n\n\n<p>There are a couple of options you can handle queries like these:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Really examine the requirements of the user&#8217;s needs.  How much data is really relevant for their trending report?  In 2027; will data from 2025 be useful for the observer of the report?<\/li>\n\n\n\n<li>Create a table specific for this report and append a timeframe of data (eg daily) onto it.  The report than can just query this specific table and not having to regenerate complex joins over and over again for previous timeframes.<\/li>\n<\/ol>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3. The &#8220;Linked Database&#8221; Query<\/h2>\n\n\n\n<p>With the growth of our Data Warehouse, other database on different servers started building process flows off our database; sometimes without out knowledge that they are doing this until we see a linked server connection coming up.<\/p>\n\n\n\n<p>What pattern they use our database is where the problems lie.&nbsp; One downstream database runs this query every day:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nSELECT *\nFROM REMOTE_SERVER.SOME_DATABASE.dbo.A_BIG_HISTORICAL_TABLE;\n<\/pre><\/div>\n\n\n<p>So, they are truncating a landing table on their database and grabbing the whole table.&nbsp; Hundreds of millions of rows transferring across a slow linked server; taking hours and hours.&nbsp; And with time; this will get slower and slower as the source table grows its history.<\/p>\n\n\n\n<p>This is a tricky one to tackle with the downstream users.&nbsp; Odds are if they design a cumbersome process like this their appetite for change to a faster (yet more complex solution) might not be high; especially if their process \u2018works\u2019.<\/p>\n\n\n\n<p>If you can get around the political hurdle and convince them the process must change; there are many options to improve the performance.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Use a ETL tool to transfer the data across.&nbsp; Even a simple python script to copy batch by bath files across will be quicker than using a linked server.<\/li>\n\n\n\n<li>Assess the downstream use cases for the data.&nbsp; Do they need the whole table with the complete history to satisfy their requirements?&nbsp; Do they need all the columns; or can some be trimmed off to reduce the amount of data getting transferred?<\/li>\n\n\n\n<li>If the source table a historical table; only bring across the period deltas.&nbsp; With the downstream process I am having trouble with now; they are bringing across twenty-one million rows daily. &nbsp;If they only bring across the deltas from the night loads it brings across a whopping five thousand rows.&nbsp; That is 0.02% of the total row count.&nbsp; Add in a row count check between the two systems to have confidence that the two tables are in sync and the process will be astronomically quicker.<\/li>\n<\/ol>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">2. The \u201cI don\u2019t know where that query comes from\u201d Query.<\/h2>\n\n\n\n<p>We have a continuing problem from a Microsoft Power BI report that runs a query that looks at data from the beginning of the month to the current date.&nbsp; It was a poorly written query and therefore as the month went along its performance got worse and worse.<\/p>\n\n\n\n<p>Since the username associated with the query was from the Power BI server\u2019s account \u2013 we had no idea who the query belonged to as a couple of simple fixes could drastically improve the performance.<\/p>\n\n\n\n<p>We contacted the team that manages Power BI server and asked them to find the query and update the sql code.&nbsp;<\/p>\n\n\n\n<p>They said it was completed, and the report sql was updated.<\/p>\n\n\n\n<p>But soon the sql was back \u2013 running for hours and hours.<\/p>\n\n\n\n<p>We contacted the team again \u2013 <\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cHey that query is back.\u201d<\/p>\n<\/blockquote>\n\n\n\n<p>They try updating the report again; but soon the query was back again.<\/p>\n\n\n\n<p>So, either someone was constantly uploading an original copy \u2013 or another copy of the report was buried somewhere on the server that the admins could not find.&nbsp;<\/p>\n\n\n\n<p>Since we do not have access to their system, it hard to determine what is happening.<\/p>\n\n\n\n<p>The barbaric solution would be to block the Power BI user from our system; but goodness knows how many critical business processes that will disrupt.&nbsp;<\/p>\n\n\n\n<p>The best solution at the minute we are doing is just constantly killing the running SQL code on the database with the hope that someone will identify the constantly failing report.&nbsp; This can be tricky as well if the Power BI server automatically tries rerunning failed reports.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. The \u201cI don\u2019t listen to your advice\u201d Query.<\/h2>\n\n\n\n<p>Disclaimer \u2013 this is a frustration rant that I have to get off my chest.<\/p>\n\n\n\n<p>We have a user that constantly runs a long running crook query.&nbsp; As in when I collated all the long running queries in research for this post \u2013 his was at the top by a significant margin.<\/p>\n\n\n\n<p>The problem with the query itself is a join using an uncommonly used business key in a table that is not indexed.&nbsp; The fix itself is quite simple fix; use the primary keys in join.<\/p>\n\n\n\n<p>But he has been running the same code for months \u2013 locked our nightly loads several times and caused incidents.<\/p>\n\n\n\n<p>We tried the carrot approach.&nbsp; \u201cHey here is optimised code.&nbsp; It will make your query run quicker and be less burden on our database.\u201d Got indifferent replies.&nbsp;<\/p>\n\n\n\n<p>More locked loads we included his manager into correspondence, but she did not seem to want to be involved.<\/p>\n\n\n\n<p>With my carrot approach communication, I gained the impression that he had a self-righteous personality and thinks his role and report is above question.<\/p>\n\n\n\n<p>Enough was enough \u2013 it was the stick approach time.<\/p>\n\n\n\n<p>We raised an operational risk against his activity.<\/p>\n\n\n\n<p>The Ops risk manager asks, \u201cDo you have evidence that you attempted to reason and help?\u201d<\/p>\n\n\n\n<p>\u201cYep,\u201d I replied, \u201cHere is a zipped-up folder of dozens of emails and chat messages that I sent him.\u201d<\/p>\n\n\n\n<p>Senior managers were involved; with my senior manager commenting to my manager in the background \u201cIs that guy all there?\u201d<\/p>\n\n\n\n<p>Anyway, my manager wrote a very terse email saying to correct their query and if they crash the loads again; they will get their access removed from the database.<\/p>\n\n\n\n<p>No committal reply.&nbsp; No apology.&nbsp; And to this day they are still running the same query; but just under the radar that it is not locking our loads.&nbsp;<\/p>\n\n\n\n<p>I am watching him like a hawk \u2013 waiting for the day that I can cut his access from the database.<\/p>\n\n\n\n<p>It is a pity that our Database does not have a charge back capability of processes used.&nbsp; I bet if his manager got a bill of $$$ from one person running one query; she would be more proactive in this problem.&nbsp;<\/p>\n\n\n\n<p>In retrospect, I would have campaigned to have his access cut right away and make him justify why he should have it back.&nbsp; When there is hundreds of other people doing the right thing on the database; it is not fair that their deliverables are getting impacted by the indifference attitude of one user.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For over fifteen years I have worked with a Microsoft SQL Data Warehouse that is over twenty-five years old.&nbsp; Although it is ancient in this world&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":1621,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[13,17],"tags":[],"class_list":["post-1613","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-code","category-sql"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/posts\/1613","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1613"}],"version-history":[{"count":12,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/posts\/1613\/revisions"}],"predecessor-version":[{"id":1629,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/posts\/1613\/revisions\/1629"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/media\/1621"}],"wp:attachment":[{"href":"https:\/\/craftcookcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1613"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1613"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1613"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}