VBA / VBS

Python & openpyxl – If the fraud team tells you to do something; do it quickly!

A member of our Fraud team contacted me unexpectedly at work.

Hey. I got this excel workbook of a customer’s statement with the data spread across multiple sheets. Do you know how to consolidate them into one sheet?

“OK,” I thought, “It can’t be that hard.”

She sent me the workbook. It had 447 sheets in it! The statement was initially in a PDF format and the Fraud team exported it to Excel. This resulted in a sheet for each page; looking like this:

The data that we were interested in was between A7 and E29 on each page. I visually spot checked half a dozen sheets across the workbook and they were in similar format

Ditching VBA for openpyxl

In yesteryear; I would have VBA inside Excel to consolidate the sheets. But I felt it was time to grow up from VBA and see what python had to offer.

After a quick Google; I came across Data Camp’s Python Excel Tutorial.

It was surprisingly quick and easy to program up. I honestly thought it would have taken me as long; if not longer in VBA.

Here is the scratch code for your perusal

import openpyxl 
import logging
from datetime import datetime
import csv

# Formats a row into a format for the delimited file
def format_row(in_record):

    try:
        in_record[0] = datetime.strftime(in_record[0], "%Y-%m-%d")  # Converts the python date to YYYY-MM-DD
    except TypeError as e:
        root_logger.warning("Cannot convert " + in_record[0] + " to date/time")

    in_record[1] = in_record[1].replace("\n", " \u21B5")    # Adds in a return arrow

    if in_record[2] == None:
        in_record[2] = ""

    if in_record[3] == None:
        in_record[3] = ""
    
    return in_record


if __name__ == '__main__':

    now = datetime.now()

    log_formatter = logging.Formatter("%(asctime)s [%(threadName)-12.12s] [%(levelname)-8.8s]  %(message)s")
    root_logger = logging.getLogger()
    root_logger.setLevel(logging.INFO)

    log_console_handler = logging.StreamHandler()
    log_console_handler.setFormatter(log_formatter)
    root_logger.addHandler(log_console_handler)

    final_array = []

    wb = openpyxl.load_workbook("CONSOLIDATEDExcel.xlsx") 

    for focus_sheet_name in wb.sheetnames:

        root_logger.info("Processing sheet: " + focus_sheet_name)
        
        focus_sheet = wb[focus_sheet_name]
        root_logger.debug("Total number of rows: " + str(focus_sheet.max_row) + '. And total number of columns: ' + str(focus_sheet.max_column))


        if focus_sheet["A6"].value == "Date":   # Checks for the key field "Date" in position A6

            my_list = []

            for sheet_row in focus_sheet.iter_rows(
                min_row=7, max_row=26, min_col=1, max_col=5,    # Gets specific section of the worksheet
                values_only=True):

                sheet_row_list = list(sheet_row)

                if sheet_row_list[0] != None:   # Ignore lines that are blank
                    final_array.append(format_row(sheet_row_list))
        else:
            root_logger.info("Cannot find key column \"DATE\".  Skipping sheet")


    # Output records to a tab delimited file
    with open("Consolidated_output.tab", 'w', newline='', encoding="utf-8") as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL, delimiter="\t")
        wr.writerows(final_array)
        

Also; it ran blindingly quick – as quick as I feel in VBA with Application.ScreenUpdating = False

Anyway; Fraud team was happy I turned their request around quickly. I’m guessing if the fraud team is involved – someone is about to have a bad day.

Excel & VBA – Concatenating two fields together with formatting

A work colleague asked me to do a “Mate’s rate” job for his team yesterday.

They had a MS Word document with a huge table in it that they needed to convert to Excel to load into an application.

Paraphrased  it looked like this:

My first thought was to copy this table out and paste it into excel to see how it looks:

Two problems:

  1. The cell “Some data” is merged across three rows
  2. Line 1, Line 2, Line 3 are now split across three rows instead of being in the one cell

I wrote a macro to split out the merge cells; but when it came to combining cells for Line 1, Line 2 and Line 3 I ran into a problem.

None of the usual commands and from my quick google searching could combine the cells and retained the formatting.

So commands like:

=CONCAT(B2:B4)

=B2 & B3 & B4

Would not retain the formatting; let alone add in line feed characters to make it a multiline cell

I’m sure there is a better way – but strapped for time I programmed the following subroutine in VBA

Public Sub combineCells(cell1 As Range, cell2 As Range, output As Range)
    Dim formatIndex As Long
    Dim formatOffset As Long
    Dim hyperlink1 As String
    Dim hyperlink2 As String

    hyperlink1 = ""
    hyperlink2 = ""

    ' Check if the cell has a hyperlin; but don't have a text version of the hyperlink in the Value 2
    If cell1.Hyperlinks.Count <> 0 And InStr(cell1.Value2, "https:") < 1 Then
        hyperlink1 = " [" & cell1.Hyperlinks(1).Address & "]"
    End If

    If cell2.Hyperlinks.Count <> 0 And InStr(cell2.Value2, "https:") < 1 Then
        hyperlink2 = " [" & cell2.Hyperlinks(1).Address & "]"
    End If

 
    ' Handling if the first cell is blank.  If so we don't want a LF at the top of the cell
    If Trim(cell1.Value2) <> "" Then
        output.Value2 = cell1.Value2 & hyperlink1 & vbLf & cell2.Value2 & hyperlink2
        formatOffset = Len(cell1.Value2) + Len(hyperlink1) + 1
    Else
        output.Value2 = cell2.Value2 & hyperlink2
        formatOffset = Len(cell1.Value2)
    End If

  
    ' Copies the formatting from cell1 to the final cell
    ' You can add more options to transfer over different formatting
    For formatIndex = 1 To Len(cell1.Value2)
        output.Characters(formatIndex, 1).Font.Bold = cell1.Characters(formatIndex, 1).Font.Bold
        output.Characters(formatIndex, 1).Font.Italic = cell1.Characters(formatIndex, 1).Font.Italic
        'output.Characters(formatIndex, 1).Font.Underline = cell1.Characters(formatIndex, 1).Font.Underline
    Next

  
    ' Copies the formatting from cell2 to the final cell
    For formatIndex = 1 To Len(cell2.Value2)
        output.Characters(formatIndex + formatOffset, 1).Font.Bold = cell2.Characters(formatIndex, 1).Font.Bold
        output.Characters(formatIndex + formatOffset, 1).Font.Italic = cell2.Characters(formatIndex, 1).Font.Italic
        'output.Characters(formatIndex + formatOffset, 1).Font.Underline = cell2.Characters(formatIndex, 1).Font.Underline
    Next

End Sub

Oh boy it runs slow – to combine a couple of thousands cells took half an hour and I had the typical worries that Excel crashed because everything was locked up.

But it worked for the quick task that I was doing