{"id":1550,"date":"2024-10-25T05:42:10","date_gmt":"2024-10-25T05:42:10","guid":{"rendered":"https:\/\/craftcookcode.com\/?p=1550"},"modified":"2024-10-25T05:42:10","modified_gmt":"2024-10-25T05:42:10","slug":"python-openpyxl-if-the-fraud-team-tells-you-to-do-something-do-it-quickly","status":"publish","type":"post","link":"https:\/\/craftcookcode.com\/?p=1550","title":{"rendered":"Python &amp; openpyxl &#8211;  If the fraud team tells you to do something; do it quickly!"},"content":{"rendered":"\n<p>A member of our Fraud team contacted me unexpectedly at work.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Hey.  I got this excel workbook of a customer&#8217;s statement with the data spread across multiple sheets.  Do you know how to consolidate them into one sheet?<\/p>\n\n\n\n<p>&#8220;OK,&#8221; I thought, &#8220;It can&#8217;t be that hard.&#8221;<\/p>\n\n\n\n<p>She sent me the workbook.  It had <em>447<\/em> sheets in it!  The statement was initially in a PDF format and the Fraud team exported it to Excel.  This resulted in a sheet for each page; looking like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"730\" src=\"https:\/\/craftcookcode.com\/wp-content\/uploads\/2024\/10\/code_fraud-1024x730.png\" alt=\"\" class=\"wp-image-1553\" srcset=\"https:\/\/craftcookcode.com\/wp-content\/uploads\/2024\/10\/code_fraud-1024x730.png 1024w, https:\/\/craftcookcode.com\/wp-content\/uploads\/2024\/10\/code_fraud-300x214.png 300w, https:\/\/craftcookcode.com\/wp-content\/uploads\/2024\/10\/code_fraud-768x547.png 768w, https:\/\/craftcookcode.com\/wp-content\/uploads\/2024\/10\/code_fraud-1536x1095.png 1536w, https:\/\/craftcookcode.com\/wp-content\/uploads\/2024\/10\/code_fraud.png 1584w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/blockquote>\n\n\n\n<p>The data that we were interested in was between A7 and E29 on each page.  I visually spot checked half a dozen sheets across the workbook and they were in similar format<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Ditching VBA for openpyxl<\/h2>\n\n\n\n<p>In yesteryear; I would have VBA inside Excel to consolidate the sheets.  But I felt it was time to grow up from VBA and see what python had to offer.<\/p>\n\n\n\n<p>After a quick Google; I came across <a href=\"https:\/\/www.datacamp.com\/tutorial\/python-excel-tutorial\" target=\"_blank\" rel=\"noopener\" title=\"\">Data Camp&#8217;s Python Excel Tutorial<\/a>.<\/p>\n\n\n\n<p>It was surprisingly quick and easy to program up.  I honestly thought it would have taken me as long; if not longer in VBA.<\/p>\n\n\n\n<p>Here is the scratch code for your perusal <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport openpyxl \nimport logging\nfrom datetime import datetime\nimport csv\n\n# Formats a row into a format for the delimited file\ndef format_row(in_record):\n\n    try:\n        in_record&#x5B;0] = datetime.strftime(in_record&#x5B;0], &quot;%Y-%m-%d&quot;)  # Converts the python date to YYYY-MM-DD\n    except TypeError as e:\n        root_logger.warning(&quot;Cannot convert &quot; + in_record&#x5B;0] + &quot; to date\/time&quot;)\n\n    in_record&#x5B;1] = in_record&#x5B;1].replace(&quot;\\n&quot;, &quot; \\u21B5&quot;)    # Adds in a return arrow\n\n    if in_record&#x5B;2] == None:\n        in_record&#x5B;2] = &quot;&quot;\n\n    if in_record&#x5B;3] == None:\n        in_record&#x5B;3] = &quot;&quot;\n    \n    return in_record\n\n\nif __name__ == &#039;__main__&#039;:\n\n    now = datetime.now()\n\n    log_formatter = logging.Formatter(&quot;%(asctime)s &#x5B;%(threadName)-12.12s] &#x5B;%(levelname)-8.8s]  %(message)s&quot;)\n    root_logger = logging.getLogger()\n    root_logger.setLevel(logging.INFO)\n\n    log_console_handler = logging.StreamHandler()\n    log_console_handler.setFormatter(log_formatter)\n    root_logger.addHandler(log_console_handler)\n\n    final_array = &#x5B;]\n\n    wb = openpyxl.load_workbook(&quot;CONSOLIDATEDExcel.xlsx&quot;) \n\n    for focus_sheet_name in wb.sheetnames:\n\n        root_logger.info(&quot;Processing sheet: &quot; + focus_sheet_name)\n        \n        focus_sheet = wb&#x5B;focus_sheet_name]\n        root_logger.debug(&quot;Total number of rows: &quot; + str(focus_sheet.max_row) + &#039;. And total number of columns: &#039; + str(focus_sheet.max_column))\n\n\n        if focus_sheet&#x5B;&quot;A6&quot;].value == &quot;Date&quot;:   # Checks for the key field &quot;Date&quot; in position A6\n\n            my_list = &#x5B;]\n\n            for sheet_row in focus_sheet.iter_rows(\n                min_row=7, max_row=26, min_col=1, max_col=5,    # Gets specific section of the worksheet\n                values_only=True):\n\n                sheet_row_list = list(sheet_row)\n\n                if sheet_row_list&#x5B;0] != None:   # Ignore lines that are blank\n                    final_array.append(format_row(sheet_row_list))\n        else:\n            root_logger.info(&quot;Cannot find key column \\&quot;DATE\\&quot;.  Skipping sheet&quot;)\n\n\n    # Output records to a tab delimited file\n    with open(&quot;Consolidated_output.tab&quot;, &#039;w&#039;, newline=&#039;&#039;, encoding=&quot;utf-8&quot;) as myfile:\n        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL, delimiter=&quot;\\t&quot;)\n        wr.writerows(final_array)\n        \n<\/pre><\/div>\n\n\n<p>Also; it ran blindingly quick &#8211; as quick as I feel in VBA with Application.ScreenUpdating = False<\/p>\n\n\n\n<p>Anyway; Fraud team was happy I turned their request around quickly.  I&#8217;m guessing if the fraud team is involved &#8211; someone is about to have a bad day.  <\/p>\n","protected":false},"excerpt":{"rendered":"<p>A member of our Fraud team contacted me unexpectedly at work. The data that we were interested in was between A7 and E29 on each page.&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":1554,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[18,52],"tags":[71,69,70],"class_list":["post-1550","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","category-vba-vbs","tag-openpyxl","tag-python","tag-vba"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/posts\/1550","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1550"}],"version-history":[{"count":3,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/posts\/1550\/revisions"}],"predecessor-version":[{"id":1555,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/posts\/1550\/revisions\/1555"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=\/wp\/v2\/media\/1554"}],"wp:attachment":[{"href":"https:\/\/craftcookcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1550"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1550"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/craftcookcode.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1550"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}