Book author and xlwings creator here. Thanks for posting this link! Funny enough, about 10 days after sending the book to the printer (March 11), Microsoft updated the feature request ("Python as en Excel scripting language") on their UserVoice page with a comment that looks like official support could eventually happen: "Thank you for the continued interest in this space. While we don’t have specific plans to announce at this time, we have been researching the topic, conducting customer interviews, and are working with the Python team at Microsoft so we can build a plan that we think can address the scenarios you told us about and ensure it can run wherever Excel runs." Fingers crossed! See https://excel.uservoice.com/forums/304921-excel-for-windows-...
Thank you soooo much! Your library saved me 100's of hours when i was working at Tesla. Showed the whole CapEx Team and they loved it. Will definitely be buying this book.
Putting Python in Excel can potentially lift more analysts to proper developers or at least make the code base more robust. A lot of analysts' first language is VBA and then they move on to more serious development and land on a developer job.
I'm one of those examples. My first language was VBA and I dig a bit deeper into it (e.g. reading about moving pointers and general coding best practice, downloading rubberduck extension for VBA, etc.) and taught myself C++ and Python. Eventually I landed a BI developer job.
I have seen many move to VB by following that path on life science research labs that tend to be Windows based, as per reading devices and laboratory robots.
So they went Excel => .NET, and were served by .NET and COM APIs in Excel.
In modern Excel you also have Lambda, JavaScript and PowerQuery as an inbox option.
What's even more surprising to me is that (unless I missed something) MS hasn't pushed anything significant in terms of R integration to Excel. I just assumed that was one of their end-goals after they bought up Revolution Analytics.
I know they've done some R integration with PowerBI, but it's mind boggling how they ignore the two most popular programming languages that heavy Excel users would also be interested in.
Thanks for writing this book. I’m on the fence about going down this road or learning Power BI instead. Can you recommend a beginner resources on using Python in Excel? I’d rather use Excel and learn Python, for sure!
What do some of these tools do to the portability of an Excel doc? Will some things not work if I send the file to someone else who doesn't have python or xlwings?
Much of the Excel target audience - office workers - don't have much control over their work computer let alone ability to install stuff like pip allowing downloads from an unvetted repo
Agree. This is why there is so much MS Access stuff out there. It came as part of the MS Office suite so office workers had it on their desktop and could use it without having to IT and work their way up the priority queue.
VBA is terrible, but it is the only programming-like option for most office workers outside of IT/Engineering
I've had that same conversation, but subsititute FileMaker for Access (no access to Access on a Mac). I can run anything I want on my own Mac, but there's no way to make it shareable with the rest of the office.
you can install python and https://pypi.org/project/pywin32/ as a limited user. pywin32 allows you to automate ms office (among which excel) from within python.
if you can't disable the corporate proxy have a look at cntlm.
You're right, it's the biggest challenge. But a ton of companies do now have something like Anaconda in their official software catalogue and sometimes even an internal mirror of PyPI or anaconda.org for the packages.
> Does the book use any commercial software like xlwings PRO? No, the book uses exclusively software that is open source and free (except for Excel, of course).
This is no longer technically true of Anaconda. They’ve adopted a new licensing approach that prevents employees of companies from using the free edition. Please consider adding alternative install instructions, based on PyPI and pipenv or poetry, to future versions of your book. This would help a ton of data scientists out!
I’m looking forward to reading your book via O’Reilly!
Edit: Here’s an interesting comment on the Reddit discussion from the Anaconda CEO suggesting that Miniconda can still be used with conda forge: “You can download Miniconda, and change your conda config to use conda forge, and the Terms of Service do not apply to that. The ToS only applies to commercial usage of the package repository of packages we build, at repo.anaconda.com; it does not apply to community-built and uploaded packages at anaconda.org.”, https://www.reddit.com/r/Python/comments/iqsk3y/anaconda_is_...
While Anaconda Inc.’s licensing approach is currently confusing/frustrating, they’re correct that open source sustainability is a challenge. Projects need financial support as well as volunteers.
Considering the inclusion of Pandas in the book, I doubt manager types and econ guys are the target group. IMO, engineers would be the ones interested in applying Python to Excel work (because they realize how ridiculous VBScript is), and they are pretty likely to have install permissions on their machine.
Yeah, there is a lot of ridiculous usage of Excel in corporate America, but the idea that econ/finance/MBA types are not aware of Python/pandas these days is a farfetched.
In non tech companies it's super rare to have install permissions in my experience (Manufacturing/Automotive/Aerospace etc.) most companies are running proprietary software from a few key engineering software players, and very rarely there's some team writing legacy but hyper specialised FORTRAN / C / C++ engineering analysis tools.
It's a big issue to build tools for others to use - an excel file on a shared drive somewhere might be developed by one person with python installed and elevated permissions but used by 50 with no python installation etc.
IT arn't going to adjust their policies for a small group unless they have significant organizational sway.
I’m sure there are corporate users who would like to use this tool but won’t be able to install i. However, I do not think there is any call for posting dismissive comments solely based on that. It does not detract from the value of the tool, for those who can get it.
I'm sharing my experience of this exact problem we have right now. Users cannot install python on their machines, in an org of 60k engineers, asking for changes to a small group of users comes with heavy resistance from IT.
On these machines (most computers in most offices in every western organization) no software can be installed from online, and running any form of executable requires IT/Admin elevation.
It's like you guys have never worked in an office in the real world that isn't dev/tech.
I think your vastly overstating how locked down most companies workstations are. Sure if you work for a fortune 500 they might be but there are 10,000 other companies where that probably isn't the case.
Really? I work for a <50 employee manufacturer/distributor, and we're pretty firmly locked down. If people want to _update Java_, they need an admin password.
corporate IT in the companies I am talking about is significantly more locked down, python is not install-able by going to https://www.python.org/ and downloading the binary there. Downloading executables is locked down.
Not really sure how this is a rebuttal, maybe what you say is true on a default windows installation, not on an IT-dept managed system.
Is it slower than Python, though? It seems intuitively like it should be faster for a lot of things, since it has non-boxed numeric types, early binding, etc.
I'm not sure, selecting data from cells was slow. There might be ways to get around it. I did all sorts of optimization, but looking back on it, I probably shouldn't have saved anything in cells.
Somehow the really large MegaCorp I work at has Python, Anaconda3, and Pycharm. I was surprised and have been using it to automate some mundane work I have to do.
This isn't embedded though,this is running a python script side by side and using python to query libreoffice as opposed to enhancing a workbook via python.
I used xlwings in a variety of settings over the years, and it was my go-to when building the first two prototypes of our Excel commodity price sync functionality. We hit a performance wall pretty quickly due to the nature of our use case so we moved to Excel's C API and built our own add-in. This move took a lot of effort to get right, the C API has a tendency to punish you hard (crash Excel) for the smallest of transgressions. xlwings allowed us to prove the concept pretty quickly and without fear of crashing Excel before committing to C.
Don't mean for this to sound like a testimonial for xlwings (I guess, it kinda is) - it's been very useful for us so I had to share! It's a great library, has great documentation, and I can't recommend it enough. However, if you're after low latency* for real-time interactions across a large dataset, you'll need to look elsewhere, or roll your own.
* under 2 second response; you're dealing with Excel after all!
I guess with your tool collaboration with a few people across one excel sheet would work more smoothly than with an Excel file on OneDrive?
We work in a team of only 3 people with an excel file on OneDrive and what annoys me the most is that it happens quite often that excel "says" that your changes have been saved (e.g. it says it's saved in the header, also the blue onedrive symbol shows no circle. The best indicator is the file in the explorer: If that shows the green checkmark then it's good) while in reality it's not and you only get a notification (that it cannot save because it cannot updates from someone else with yours) when you try to close the file. Annoying. Anyone with solutions is very welcome :)
Absolutely, the people using us seem think so! We target a niche: real time sharing of commodity pricing. The linked data can live in different files and layouts, can be updated via web or mobile, and full change history is stored as well.
We explored a few options when trying to solve the sharing problem, including OneDrive, Google Sheets, and some 3rd parties. Couldn’t find anything that fit the bill.
FWIW- What you describe could be down to OneDrive’s data quotas. If you’re dealing with pricing data or any numerical data then CT Grid solves that issue. Even if it’s not a perfect fit, I’d be happy to share the experience if it helps you.
PyXLL (www.pyxll.com) is written using the Excel C API and that’s partly how it achieves its excellent performance. It’s not free software, but it ‘properly’ integrates Python into the Excel process so doesn’t suffer from the performance problems of interacting with Excel from a separate process via COM and VBA.
> Packages to allow Python to create/manipulate Excel files have been around for a long time.
Xlwings (1) has been around for a while, itself, and more to the point (2) isn’t just a “package to allow Python to create/manipulate Excel files”. It includes an Excel add-in to allow embedding Python in Excel, using it for purposes similar to VBA.
> Like MeinBlutIsBlau, I thought this was going to be Microsoft finally ditching VBA for Python as its macro language.
Well, it allows users to do that without Microsoft doing it, so that's closer to accurate than the description implied by your dismissal.
You can use any language that supports COM by creating an Excel add-in. But then you have to figure out how to distribute it to the users.
WSH is something quite different. It runs VBScript, which is a different language from VBA. VBA is the same language as Visual Basic 6 and uses the same IDE.
I wrote a handy wrapper for xlrd in 2014 while making my project Ninhursag.
I just checked and it still works with Python 2 – here is an example usage:
The Ninhursag project is visualisation and modeling of USGS data series 140, which include world production of almost all minerals starting from the year 1900:
I thought this was going to be about python INSIDE Excel. If MS ditched VBA for python, that would be incredible. VBA needs to honestly die cause it is just a horrible syntax to read. It's like staring at a SQL Procedure.
> I thought this was going to be about python INSIDE Excel.
Yes, xlwings (one of the packagess covered by the book, and the one on whose site the page is hosted) absolutely allows this via an Excel add-in: https://docs.xlwings.org/en/stable/addin.html
Not every company uses the latest versions of everything. We finally switched everyone to Windows 10 this year. We were paying for Windows 7 support while making the transition.
Python is a nicer language to use than C or C++ but that is not a very high bar to pass.
The main reason VBA is horrible is because it allows dynamic typing by default and Python would not help there at all.
VB syntax is actually nicer than Python syntax in at least 3 aspects: 1) VB is not case sensitive 2) VB allows specifying data types 3) VB uses If Then Else End If style syntax that makes it clear where things start and stop.
If Excel actually had the SQL based query interface inside the spreadsheet, it would make most of VBA unnecessary and probably improve the performance of Excel macros by several orders of magnitude.
> VB syntax is actually nicer than Python syntax in at least 3 aspects: 1) VB is not case sensitive
VB is not case sensitive, but case insensitivity isn't nice; it reduces clarity and forces circumlocution in cases where having case distinctions would not.
> 2) VB allows specifying data types
Python allows specifying data types, and the major python typecheckers support a more robust type system than VBA.
> 3) VB uses If Then Else End If style syntax that makes it clear where things start and stop
Python uses indentation-based syntax that makes it more immediately visually clear where things start and stop with less visible noise.
Case insensitivity might be fine on a language level, but the mandatory IDE makes it annoying. It tries to normalize the case of identifiers, which would be a nice feature except for the fact that it doesn't know about scopes and can't tell variables from properties.
The result is that if you declare a variable named "count" anywhere in your project, every property (and every unrelated variable) everywhere gets renamed from "Count" to "count". Makes for annoying diffs.
I really wish Resolver One had caught on, it was such an amazing product and really did find a nice middle ground between spreadsheet and code.
Here's a one-minute intro from their CTO showing off some of the basic features. The video is from 2009 and I'm amazed there's been nothing comparable released, open source or otherwise, in the intervening years. If there has and I've missed it, I'd love to know!
Does Excel online still have row and column limits?
It's always been a mystery to me why Microsoft never addressed this - they could have prevented a whole host of competitors in the "big data" space from ever existing if they'd found a way of abstracting this limit away from end users.
Being working on big data for a few years and I'm glad that Excel is mostly a place to paste some data for debugging. But I'm sure it's still useful for people who can leverage power pivot and all those tools for medium size data. I used xlwings occasionally some years ago and it was not a pleasant experience back then. I heard it improved a lot these years though.
It's fine - pretty easy to import data into a DataFrame and do whatever you want. The API seems relatively simple - you can marshal data into a DataFrame and vice versa.
Take it for a spin if you have any interesting spreadsheets in need of some love!
Pandas doesn't do formatting very well. Using pandas to generate non tabular data is also a non starter. So using pandas to fill out an existing Excel file, say by entering data in specific cells, won't work.
Libraries like xlread and xlwrite can and would be one way to automate such processes. Xlwings can too. I think that Xlwings can do macros where the others can't though
Clever! I've written routines to use xlsxwriter for things like exporting a table to a sheet. It was a fair bit of effort the first time, but now its an easy function call.
Excel has become the de facto report writer. Look at how many products have Excel plug ins. This allows them to skip the UI development and the report writing portion.
This would probably make a great Christmas gift for some analyst you know. Even if they don't use Excel now, it'll make them feel like their professional career is coming around full circle.