Monday, May 29, 2017

Mores snakes and Excel

In the previous post I listed the code to find external links in Excel WorkBooks.  In practice though I want loop thru and collect every Excel WorkBook in a directory, the Last Modification Date for the file, whether the WorkBook performs a specific function, the total # of and names of all WorkSheets in each WorkBook, and the total # of and names of all Defined Names in the WorkBooks.

This is what I ended up with:

# Find worksheets, defined names and CheckBox1 value in Excel WorkBooks.
# Loop thru directory and write each WorkBook's name, Last Modification Date,
# CheckBox1 value (to see if uncertainties are calculated), Total # of
# WorkSheets, the names of all Worksheets, Total # of Defined Names,and a
# list of all Defined Names into CSV
# In the CSV file:
#    Column 1 = WorkBook name
#    Column 2 = Last Modified Date
#    Column 3 = CheckBox1 value
#    Column 4 = Total # of WorkSheets in WorkBook
#    Column 5 = All Worksheet names
#    Column 6 = Total # of Defined Names in WorkBook
#    Column 7 = All WorkBook Defined Names
import win32com.client as win32
import os
import glob
import csv
import datetime
from collections import Counter

wb_def_names = Counter()
ws_names = Counter()

xlfiles = sorted(glob.glob("*.xls"))
outputFile = open('output.csv', 'w', newline='')
outputWriter = csv.writer(outputFile)

xlApp = win32.gencache.EnsureDispatch('Excel.Application')
xlApp.EnableEvents = False
#xlApp.ScreenUpdating = False
cwd = os.getcwd()
this_file = 0
for xlfile in xlfiles:
    showunc_val = 0
    unccalc_val = bool(False)
    showunc = 0
    found = 0
    checkbox1_caption = "None"
    this_file = this_file + 1
    if this_file%100 == 0:
        print("Processing ",this_file)
    if this_file >= 1:
        open_ok = 1
            xlApp.DisplayAlerts = False
            xlwb = xlApp.Workbooks.Open(cwd+"\\"+ xlfile)
            open_ok = -1
            print("error in open xlfile")
            xlwb_name = xlApp.ActiveWorkbook.Name
            open_ok = -1
            print("error in open ActiveWorkbook.Name")
        if open_ok:
            cntr = 1
                 for n in xlApp.ActiveWorkbook.Names:
                    wb_def_names[(n.Name)] += cntr
                    found = (str.find(n.Name, "showunc"))
                    found2 = (str.find(n.Name, "unccalc"))
                    if found == 0 or found2 == 0:
                        this_n = str(n)
                        dummy = this_n.split("!")
                        dummy2 = str(dummy[0]).replace("=", "")
                        dummy2 = str(dummy2).replace("'", "")
                        if found == 0:
                            dummy3 = str(dummy[1])
                            showunc_val = xlApp.Sheets(dummy2).Range(dummy3).Value
                        if found2 == 0:
                            checkbox1_caption = xlApp.Sheets(dummy2).OLEObjects("CheckBox1").Object.Caption
                            unccalc_val = xlApp.Sheets(dummy2).OLEObjects("CheckBox1").Object.Value
                    cntr = cntr + 1
                 list_wb_def_names = [v for v in wb_def_names.keys()]
                open_ok = -1
                print("error in ActiveWorkbook.Names",n,n.Name,cntr)
            cntr2 = 1
                for s in xlApp.ActiveWorkbook.Sheets:
                    ws_names[(s.Name)] +=cntr2
                    cntr2 = cntr2 + 1
                list_ws_names = [v for v in ws_names.keys()]
                open_ok = -1
                xlApp.ActiveWorkbook.Saved = True
                open_ok = -1
                print("error in closing workbook")
                last_modified_date = datetime.datetime.fromtimestamp(os.path.getctime(cwd+"\\"+ xlfile))
            except OSError:
                last_modified_date = 0
            last_modified_date = last_modified_date.strftime("%d-%m-%Y")
                outputWriter.writerow([xlwb_name, last_modified_date,unccalc_val,cntr2-1,list_ws_names,cntr-1,list_wb_def_names])
                print("error writing to outputfile")
            print("Workbook = ", xlwb_name)
            checkbox1_caption = "None"
            print("Unable to process ",xlfile)
        dummy5 = 0
#assert isinstance(xlApp.Application, object)

I found that everything worked better if Excel was already open before I ran the program - not sure why. I'm retrieving the value for CheckBox1 and not validating it only because I did when testing the code and using the CheckBox Caption property showed that - in my instance - they were all what I thought they were - thank you Dennis Green :)

It doesn't work flawlessly. During runtime I get a quite a few (2 or 3% ??) of the files reporting "error in ActiveWorkbook.Names " from one of my error handlers.  It doesn't seem to affect the output, so I haven't bothered to figure out why some files generate the error.  I will eventually store just the filenames that create the error and see why.

A more significant error is that certain poorly written Excel macros in the WorkBook generate errors on opening the WorkBook.  As I suspected when writing the previous post, moving the statement xlApp.DisplayAlerts=False  helped a lot.  I added xlApp.EnableEvents=False to try and stop macros from causing a problem, but it too may need to be moved inside the main loop to be effective.

All in all, I know my code isn't up to 'professional' standards, but it works, does what I set out for it to do, and gives me enough knowledge to use it for many other similar projects.  Eventually, if I use Python often enough, I'll start building class libraries and user-defined functions, but I'm happy with the results as it is.

No comments:

Post a Comment