Monday, May 29, 2017

Mores snakes and Excel

In the previous post I listed the code to find external links in Excel WorkBooks.  In practice though I want loop thru and collect every Excel WorkBook in a directory, the Last Modification Date for the file, whether the WorkBook performs a specific function, the total # of and names of all WorkSheets in each WorkBook, and the total # of and names of all Defined Names in the WorkBooks.

This is what I ended up with:

#
# excel_info.py
# Find worksheets, defined names and CheckBox1 value in Excel WorkBooks.
# Loop thru directory and write each WorkBook's name, Last Modification Date,
# CheckBox1 value (to see if uncertainties are calculated), Total # of
# WorkSheets, the names of all Worksheets, Total # of Defined Names,and a
# list of all Defined Names into CSV
#
# In the CSV file:
#    Column 1 = WorkBook name
#    Column 2 = Last Modified Date
#    Column 3 = CheckBox1 value
#    Column 4 = Total # of WorkSheets in WorkBook
#    Column 5 = All Worksheet names
#    Column 6 = Total # of Defined Names in WorkBook
#    Column 7 = All WorkBook Defined Names
#
import win32com.client as win32
import os
import glob
import csv
import datetime
from collections import Counter

wb_def_names = Counter()
ws_names = Counter()

os.chdir('C:\origin')
xlfiles = sorted(glob.glob("*.xls"))
outputFile = open('output.csv', 'w', newline='')
outputWriter = csv.writer(outputFile)

xlApp = win32.gencache.EnsureDispatch('Excel.Application')
xlApp.EnableEvents = False
#xlApp.ScreenUpdating = False
cwd = os.getcwd()
this_file = 0
for xlfile in xlfiles:
    showunc_val = 0
    unccalc_val = bool(False)
    showunc = 0
    found = 0
    checkbox1_caption = "None"
    this_file = this_file + 1
    if this_file%100 == 0:
        print("Processing ",this_file)
    if this_file >= 1:
        open_ok = 1
        try:
            xlApp.DisplayAlerts = False
            xlwb = xlApp.Workbooks.Open(cwd+"\\"+ xlfile)
        except:
            open_ok = -1
            print("error in open xlfile")
        try:
            xlwb_name = xlApp.ActiveWorkbook.Name
        except:
            open_ok = -1
            print("error in open ActiveWorkbook.Name")
        if open_ok:
            cntr = 1
            try:
                 for n in xlApp.ActiveWorkbook.Names:
                    wb_def_names[(n.Name)] += cntr
                    found = (str.find(n.Name, "showunc"))
                    found2 = (str.find(n.Name, "unccalc"))
                    if found == 0 or found2 == 0:
                        this_n = str(n)
                        dummy = this_n.split("!")
                        dummy2 = str(dummy[0]).replace("=", "")
                        dummy2 = str(dummy2).replace("'", "")
                        if found == 0:
                            dummy3 = str(dummy[1])
                            showunc_val = xlApp.Sheets(dummy2).Range(dummy3).Value
                        if found2 == 0:
                            checkbox1_caption = xlApp.Sheets(dummy2).OLEObjects("CheckBox1").Object.Caption
                            unccalc_val = xlApp.Sheets(dummy2).OLEObjects("CheckBox1").Object.Value
                    cntr = cntr + 1
                 list_wb_def_names = [v for v in wb_def_names.keys()]
            except:
                open_ok = -1
                print("error in ActiveWorkbook.Names",n,n.Name,cntr)
            cntr2 = 1
            try:
                for s in xlApp.ActiveWorkbook.Sheets:
                    ws_names[(s.Name)] +=cntr2
                    cntr2 = cntr2 + 1
                list_ws_names = [v for v in ws_names.keys()]
            except:
                open_ok = -1
            try:
                xlApp.ActiveWorkbook.Saved = True
                xlApp.ActiveWorkbook.Close()
            except:
                open_ok = -1
                print("error in closing workbook")
            try:
                last_modified_date = datetime.datetime.fromtimestamp(os.path.getctime(cwd+"\\"+ xlfile))
            except OSError:
                last_modified_date = 0
            last_modified_date = last_modified_date.strftime("%d-%m-%Y")
            try:
                outputWriter.writerow([xlwb_name, last_modified_date,unccalc_val,cntr2-1,list_ws_names,cntr-1,list_wb_def_names])
            except:
                print("error writing to outputfile")
            print("Workbook = ", xlwb_name)
            wb_def_names.clear()
            ws_names.clear()
            checkbox1_caption = "None"
        else:
            print("Unable to process ",xlfile)
    else:
        dummy5 = 0
#assert isinstance(xlApp.Application, object)
#xlApp.Application.Quit()
outputFile.close()

I found that everything worked better if Excel was already open before I ran the program - not sure why. I'm retrieving the value for CheckBox1 and not validating it only because I did when testing the code and using the CheckBox Caption property showed that - in my instance - they were all what I thought they were - thank you Dennis Green :)


It doesn't work flawlessly. During runtime I get a quite a few (2 or 3% ??) of the files reporting "error in ActiveWorkbook.Names " from one of my error handlers.  It doesn't seem to affect the output, so I haven't bothered to figure out why some files generate the error.  I will eventually store just the filenames that create the error and see why.

A more significant error is that certain poorly written Excel macros in the WorkBook generate errors on opening the WorkBook.  As I suspected when writing the previous post, moving the statement xlApp.DisplayAlerts=False  helped a lot.  I added xlApp.EnableEvents=False to try and stop macros from causing a problem, but it too may need to be moved inside the main loop to be effective.

All in all, I know my code isn't up to 'professional' standards, but it works, does what I set out for it to do, and gives me enough knowledge to use it for many other similar projects.  Eventually, if I use Python often enough, I'll start building class libraries and user-defined functions, but I'm happy with the results as it is.


Sunday, May 28, 2017

Snakes on a computer

Wikipedia has a lot to say about Python, in part:
Since 2003, Python has consistently ranked in the top ten most popular programming languages as measured by the TIOBE Programming Community Index. As of March 2017, it is the fifth most popular language. It was ranked as Programming Language of the Year for the year 2007 and 2010. It is the third most popular language whose grammatical syntax is not predominantly based on C...

So, why has it taken me so long to play around with it?  Yesterday I remedied my ignorance.

My project was to collect information on 8000 Excel Workbooks.  Doing it manually was simply out of the question.  In the past I've used GUI macros to automate opening and inspecting 1000s of files, but those were text files.  Excel files are - obviously - a different critter altogether. Hence the need for a different approach.  Greg Laden has posted some articles regarding Python and those spurred me to investigate.

Here's the newbie code I wrote to inspect each file for external links:

1:  #  
2:  # links.py  
3:  # Find all exteral links referenced in Excel WorkBook  
4:  # Loop thru directory and write each WorkBook's name, and  
5:  # the names of all external links.  
6:  #  
7:  import win32com.client as win32  
8:  import os  
9:  import glob  
10:  import csv  
11:  from collections import Counter  
12:  os.chdir('C:\origin\XLSM')  
13:  xlfiles = sorted(glob.glob("*.xlsm"))  
14:  outputFile = open('output.csv', 'w', newline='')  
15:  outputWriter = csv.writer(outputFile)  
16:  xlApp = win32.gencache.EnsureDispatch('Excel.Application')  
17:  xlApp.ScreenUpdating = False  
18:  xlApp.DisplayAlerts = False  
19:  cwd = os.getcwd()  
20:  wb_links = []  
21:  this_file = 0  
22:  for xlfile in xlfiles:  
23:    this_file = this_file + 1  
24:    if (this_file >= 1):  # Mostly for debug purposes
25:      open_ok = 1  
26:      try:  
27:        xlwb = xlApp.Workbooks.Open(cwd+"\\"+ xlfile)  
28:      except:  
29:        open_ok = -1  
30:      try:  
31:        xlwb_name = xlApp.ActiveWorkbook.Name  
32:      except:  
33:        open_ok = -1  
34:      if open_ok:  
35:        try:  
36:          wb_links = xlApp.ActiveWorkbook.LinkSources()  
37:          print("Workbook = ", xlwb_name, wb_links)  
38:        except:  
39:          open_ok = -1  
40:          print("Unable to process ",xlfile)  
41:        try:  
42:          xlApp.ActiveWorkbook.Saved = True  
43:        except:  
44:          open_ok = -1  
45:        try:  
46:          xlApp.ActiveWorkbook.Close()  
47:        except:  
48:          open_ok = -1  
49:        outputWriter.writerow([xlwb_name, wb_links])  
50:      else:  
51:        print("Unable to process ",xlfile)  
52:    else:  
53:      dummy = 0  
54:  assert isinstance(xlApp.Application, object)  
55:  xlApp.Application.Quit()  
56:  outputFile.close()  

While this code just collects the external links, in the end my program collected the filename of each Workbook, file size, last modification date, names of all Worksheets, names and ranges of all Defined Names, external links and the cell value for a specific Defined Name of interest and wrote it all to a CSV file.

The only real problem is 18:  xlApp.DisplayAlerts = False  When the program is first run this seems to be working fine, but inevitably there is a file that causes an error in Excel that isn't covered by the 'DisplayAlerts" function.  When that happens it seems to turn 'DisplayAlerts' back on.

I might try to move Line 18 into the main loop to see if that works better, but otherwise I collected all the information I was interested in.  Total runtime (from within PyCharm) was close to two hours.

Saturday, May 20, 2017

Clueless

I think the woman on the left may be grandma - and possibly aunt Kelly in the middle with the glasses.  But that's just a WAG.  It could be grandma with her brothers and sisters - but I don't know how many she actually had.

The men in the 2nd photo must be some relation on mom's side, but not the faintest idea who they are.

These were scanned in from old negatives, palate inverted, then brightness and contrast adjusted to get something viewable.



More old photos

Impossible to say if that's Blanche or Tudy with (I assume) their dad - George Hammond.

Love the glasses on that state ID photo.  Surprised it took her 4 months after her 18th birthday to get an ID.


Thursday, May 18, 2017

Thursday, April 20, 2017

Wednesday, February 8, 2017

Means and mean error

Three sets of synthetic data:

Set L is a linear function increasing in value with each year.  Set R is a random normal distribution, and Set S is a sine function.  Do they RMSEs in and of themselves tell us anything about these functions? No.  Only by looking at the actual data would we be able to see this.



Tuesday, February 7, 2017

Someone forgot to tell the arctic to wait for Karl to bust the pause.



I was going to add text and arrow for "Curry jumps shark" but there would have been too many arrows and the graph too cluttered.

Saturday, February 4, 2017

Annual FDDs per DMI N80

Annual Total Accumulated Freezing Degree Days
(August 1st thru May 31st)

Does not include this freezing season as it is not yet complete.

So, what's happening during the summer months of June & July?


Odd, melt season temperatures are actually *decreasing* - now why would that be?




Thursday, February 2, 2017

N80 Freezing Degree Days thru Feb 2, 2017

Freezing Degree Days thru Feb 2, 2017
(Climatology = 1958 to 2002)

Accumulated FDDs 
Climatology: 3095.9
2016-7:          1833.0
Anomaly:     -1262.9


Implied new ice thickness to date:
Per Lebedev:
Climo:   1.755 m
2016-7:  1.288 m

Per Bilello:
Climo:   1.408 m
2016-7:  1.039 m

Thursday, January 26, 2017

Accumulated Freezing Degree Days N80 thru Jan 26, 2017

Accumulated Freezing Degree Days N80 thru Jan 26, 2017:
Climatology: 2899.4
2016:            1683.9
Anomaly:     -1215.4



Implied new ice thickness to date:
Per Lebedev:
Climo: 1.688 m
2016:   1.225 m

Per Billelo
Climo:  1.355 m
2016:    0.989 m


Note: Anyone claiming a "recovery" in extent is ignoring new ice thickness this year is 27% less than climatology and 10 to 15% less than the average of the past decade.  The accumulated FDD anomaly is now greater than at *any* time in the DMI N80 dataset with 40% of the freezing season still to go.