Sunday, May 28, 2017

Snakes on a computer

Wikipedia has a lot to say about Python, in part:
Since 2003, Python has consistently ranked in the top ten most popular programming languages as measured by the TIOBE Programming Community Index. As of March 2017, it is the fifth most popular language. It was ranked as Programming Language of the Year for the year 2007 and 2010. It is the third most popular language whose grammatical syntax is not predominantly based on C...

So, why has it taken me so long to play around with it?  Yesterday I remedied my ignorance.

My project was to collect information on 8000 Excel Workbooks.  Doing it manually was simply out of the question.  In the past I've used GUI macros to automate opening and inspecting 1000s of files, but those were text files.  Excel files are - obviously - a different critter altogether. Hence the need for a different approach.  Greg Laden has posted some articles regarding Python and those spurred me to investigate.

Here's the newbie code I wrote to inspect each file for external links:

1:  #  
2:  # links.py  
3:  # Find all exteral links referenced in Excel WorkBook  
4:  # Loop thru directory and write each WorkBook's name, and  
5:  # the names of all external links.  
6:  #  
7:  import win32com.client as win32  
8:  import os  
9:  import glob  
10:  import csv  
11:  from collections import Counter  
12:  os.chdir('C:\origin\XLSM')  
13:  xlfiles = sorted(glob.glob("*.xlsm"))  
14:  outputFile = open('output.csv', 'w', newline='')  
15:  outputWriter = csv.writer(outputFile)  
16:  xlApp = win32.gencache.EnsureDispatch('Excel.Application')  
17:  xlApp.ScreenUpdating = False  
18:  xlApp.DisplayAlerts = False  
19:  cwd = os.getcwd()  
20:  wb_links = []  
21:  this_file = 0  
22:  for xlfile in xlfiles:  
23:    this_file = this_file + 1  
24:    if (this_file >= 1):  # Mostly for debug purposes
25:      open_ok = 1  
26:      try:  
27:        xlwb = xlApp.Workbooks.Open(cwd+"\\"+ xlfile)  
28:      except:  
29:        open_ok = -1  
30:      try:  
31:        xlwb_name = xlApp.ActiveWorkbook.Name  
32:      except:  
33:        open_ok = -1  
34:      if open_ok:  
35:        try:  
36:          wb_links = xlApp.ActiveWorkbook.LinkSources()  
37:          print("Workbook = ", xlwb_name, wb_links)  
38:        except:  
39:          open_ok = -1  
40:          print("Unable to process ",xlfile)  
41:        try:  
42:          xlApp.ActiveWorkbook.Saved = True  
43:        except:  
44:          open_ok = -1  
45:        try:  
46:          xlApp.ActiveWorkbook.Close()  
47:        except:  
48:          open_ok = -1  
49:        outputWriter.writerow([xlwb_name, wb_links])  
50:      else:  
51:        print("Unable to process ",xlfile)  
52:    else:  
53:      dummy = 0  
54:  assert isinstance(xlApp.Application, object)  
55:  xlApp.Application.Quit()  
56:  outputFile.close()  

While this code just collects the external links, in the end my program collected the filename of each Workbook, file size, last modification date, names of all Worksheets, names and ranges of all Defined Names, external links and the cell value for a specific Defined Name of interest and wrote it all to a CSV file.

The only real problem is 18:  xlApp.DisplayAlerts = False  When the program is first run this seems to be working fine, but inevitably there is a file that causes an error in Excel that isn't covered by the 'DisplayAlerts" function.  When that happens it seems to turn 'DisplayAlerts' back on.

I might try to move Line 18 into the main loop to see if that works better, but otherwise I collected all the information I was interested in.  Total runtime (from within PyCharm) was close to two hours.

No comments:

Post a Comment