Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Python for Excel: A Modern Environment for Automation and Data Analysis (xlwings.org)
196 points by teleforce on July 14, 2021 | hide | past | favorite | 90 comments


Book author and xlwings creator here. Thanks for posting this link! Funny enough, about 10 days after sending the book to the printer (March 11), Microsoft updated the feature request ("Python as en Excel scripting language") on their UserVoice page with a comment that looks like official support could eventually happen: "Thank you for the continued interest in this space. While we don’t have specific plans to announce at this time, we have been researching the topic, conducting customer interviews, and are working with the Python team at Microsoft so we can build a plan that we think can address the scenarios you told us about and ensure it can run wherever Excel runs." Fingers crossed! See https://excel.uservoice.com/forums/304921-excel-for-windows-...


Thank you soooo much! Your library saved me 100's of hours when i was working at Tesla. Showed the whole CapEx Team and they loved it. Will definitely be buying this book.


Haven't they been saying that for years?

Seem to recall a ~2017 article about it...


Yes, that is standard Microsoft speak for "thanks, we will think about it, just not now".


I guess they really get a lot of pressure there.

Putting Python in Excel can potentially lift more analysts to proper developers or at least make the code base more robust. A lot of analysts' first language is VBA and then they move on to more serious development and land on a developer job.

I'm one of those examples. My first language was VBA and I dig a bit deeper into it (e.g. reading about moving pointers and general coding best practice, downloading rubberduck extension for VBA, etc.) and taught myself C++ and Python. Eventually I landed a BI developer job.


I have seen many move to VB by following that path on life science research labs that tend to be Windows based, as per reading devices and laboratory robots.

So they went Excel => .NET, and were served by .NET and COM APIs in Excel.

In modern Excel you also have Lambda, JavaScript and PowerQuery as an inbox option.


I thought about moving to C# but we didn't really need the power so I didn't bother.

PowerQuery is also very good for data cleaning. Learned a few tricks here and there.


What's even more surprising to me is that (unless I missed something) MS hasn't pushed anything significant in terms of R integration to Excel. I just assumed that was one of their end-goals after they bought up Revolution Analytics.

I know they've done some R integration with PowerBI, but it's mind boggling how they ignore the two most popular programming languages that heavy Excel users would also be interested in.


Thanks for writing this book. I’m on the fence about going down this road or learning Power BI instead. Can you recommend a beginner resources on using Python in Excel? I’d rather use Excel and learn Python, for sure!


I have used Power BI and let's say Python (or any other programming language that has a graphic lib) gives you way more freedom.

Power BI and Tableau are favoured by analysts who do not necessarily want to put down a lot of efforts to learn programming (which is fine).


That was a very helpful distinction, thanks!


What do some of these tools do to the portability of an Excel doc? Will some things not work if I send the file to someone else who doesn't have python or xlwings?


Hi man I hope they can buy you out.


Past related threads:

The new dynamic arrays in Excel with Python and xlwings - https://news.ycombinator.com/item?id=20381083 - July 2019 (9 comments)

Call Python functions from VBA in all Microsoft Office apps - https://news.ycombinator.com/item?id=18166421 - Oct 2018 (25 comments)

Xlwings: Replace Excel VBA with Python - https://news.ycombinator.com/item?id=10304402 - Sept 2015 (1 comment)

Use Python in Excel without add-ins - https://news.ycombinator.com/item?id=8372329 - Sept 2014 (70 comments)


Missing a key component: Ability to install it.

Much of the Excel target audience - office workers - don't have much control over their work computer let alone ability to install stuff like pip allowing downloads from an unvetted repo


Agree. This is why there is so much MS Access stuff out there. It came as part of the MS Office suite so office workers had it on their desktop and could use it without having to IT and work their way up the priority queue.

VBA is terrible, but it is the only programming-like option for most office workers outside of IT/Engineering


Quite relatable.

At work, we want to build a simple database and app to interface against it. Asked our boss, asked the IT guys, got the same answer:

A : "Sure, but it has to be MS Access"

Q : "Why though, PostgreSQL / SQLite/ etc. works fine"

A : "Because we already have MS Access"

Q : "OK - could we at least get [programming language + frameworks] for the interface part"

A : "Use Excel and VBA. We already have that"


wait you asked IT to use something other than Access and they turned you down :rolleyes:

I am on a mission to eliminate Access... the day i can distribute office with out Access will be a glorious day indeed

No IT Dept should be telling users to use Access.....


I've had that same conversation, but subsititute FileMaker for Access (no access to Access on a Mac). I can run anything I want on my own Mac, but there's no way to make it shareable with the rest of the office.


you can install python and https://pypi.org/project/pywin32/ as a limited user. pywin32 allows you to automate ms office (among which excel) from within python. if you can't disable the corporate proxy have a look at cntlm.


You're right, it's the biggest challenge. But a ton of companies do now have something like Anaconda in their official software catalogue and sometimes even an internal mirror of PyPI or anaconda.org for the packages.


> Does the book use any commercial software like xlwings PRO? No, the book uses exclusively software that is open source and free (except for Excel, of course).

This is no longer technically true of Anaconda. They’ve adopted a new licensing approach that prevents employees of companies from using the free edition. Please consider adding alternative install instructions, based on PyPI and pipenv or poetry, to future versions of your book. This would help a ton of data scientists out!

I’m looking forward to reading your book via O’Reilly!

References:

Discussion on r/Python with comments from the Anaconda CEO and team, https://www.reddit.com/r/Python/comments/iqsk3y/anaconda_is_...

Anaconda license change press release, https://www.anaconda.com/blog/anaconda-commercial-edition-fa...

Anaconda Terms of Service, https://www.anaconda.com/terms-of-service

Edit: Here’s an interesting comment on the Reddit discussion from the Anaconda CEO suggesting that Miniconda can still be used with conda forge: “You can download Miniconda, and change your conda config to use conda forge, and the Terms of Service do not apply to that. The ToS only applies to commercial usage of the package repository of packages we build, at repo.anaconda.com; it does not apply to community-built and uploaded packages at anaconda.org.”, https://www.reddit.com/r/Python/comments/iqsk3y/anaconda_is_...


Side note…

While Anaconda Inc.’s licensing approach is currently confusing/frustrating, they’re correct that open source sustainability is a challenge. Projects need financial support as well as volunteers.

SciPy.org maintains Pandas, NumPy, Jupyter, iPython, Matplotlib, etc. They’ve buried a donation page on their website at… https://www.scipy.org/scipylib/donations.html


Thanks for pointing this out and I'll update the homepage accordingly. You're right about the sustainability challenge though!


miniconda remains an option though.


Considering the inclusion of Pandas in the book, I doubt manager types and econ guys are the target group. IMO, engineers would be the ones interested in applying Python to Excel work (because they realize how ridiculous VBScript is), and they are pretty likely to have install permissions on their machine.


I wouldn't sell the financial analysts short. Pandas was developed by financial analysts to assist in their routine analytics work.


Yeah, there is a lot of ridiculous usage of Excel in corporate America, but the idea that econ/finance/MBA types are not aware of Python/pandas these days is a farfetched.

There are tons of resources like this. https://quantecon.org/


In non tech companies it's super rare to have install permissions in my experience (Manufacturing/Automotive/Aerospace etc.) most companies are running proprietary software from a few key engineering software players, and very rarely there's some team writing legacy but hyper specialised FORTRAN / C / C++ engineering analysis tools.

It's a big issue to build tools for others to use - an excel file on a shared drive somewhere might be developed by one person with python installed and elevated permissions but used by 50 with no python installation etc.

IT arn't going to adjust their policies for a small group unless they have significant organizational sway.


I’m sure there are corporate users who would like to use this tool but won’t be able to install i. However, I do not think there is any call for posting dismissive comments solely based on that. It does not detract from the value of the tool, for those who can get it.


How is my comment dismissive?

I'm sharing my experience of this exact problem we have right now. Users cannot install python on their machines, in an org of 60k engineers, asking for changes to a small group of users comes with heavy resistance from IT.

On these machines (most computers in most offices in every western organization) no software can be installed from online, and running any form of executable requires IT/Admin elevation.

It's like you guys have never worked in an office in the real world that isn't dev/tech.


I think your vastly overstating how locked down most companies workstations are. Sure if you work for a fortune 500 they might be but there are 10,000 other companies where that probably isn't the case.


Really? I work for a <50 employee manufacturer/distributor, and we're pretty firmly locked down. If people want to _update Java_, they need an admin password.


Installing python does not require admin permissions in Windows, just choose $HOME/... as install path.

It also does not require admin permissions in Linux, although most distros do not offer a happy path for that.


corporate IT in the companies I am talking about is significantly more locked down, python is not install-able by going to https://www.python.org/ and downloading the binary there. Downloading executables is locked down.

Not really sure how this is a rebuttal, maybe what you say is true on a default windows installation, not on an IT-dept managed system.


You can install Anaconda without admin.


I wrote Random Forest in VBA. Programming is what you make of it.

That said, VBA is slow.


Is it slower than Python, though? It seems intuitively like it should be faster for a lot of things, since it has non-boxed numeric types, early binding, etc.


I'm not sure, selecting data from cells was slow. There might be ways to get around it. I did all sorts of optimization, but looking back on it, I probably shouldn't have saved anything in cells.


Somehow the really large MegaCorp I work at has Python, Anaconda3, and Pycharm. I was surprised and have been using it to automate some mundane work I have to do.


Virtualenv has been pretty handy, but you speak truth


Slightly offtopic, I'm why has python not been integrated into LibreOffice yet?



This isn't embedded though,this is running a python script side by side and using python to query libreoffice as opposed to enhancing a workbook via python.


Office compatibility perhaps?


I used xlwings in a variety of settings over the years, and it was my go-to when building the first two prototypes of our Excel commodity price sync functionality. We hit a performance wall pretty quickly due to the nature of our use case so we moved to Excel's C API and built our own add-in. This move took a lot of effort to get right, the C API has a tendency to punish you hard (crash Excel) for the smallest of transgressions. xlwings allowed us to prove the concept pretty quickly and without fear of crashing Excel before committing to C.

Don't mean for this to sound like a testimonial for xlwings (I guess, it kinda is) - it's been very useful for us so I had to share! It's a great library, has great documentation, and I can't recommend it enough. However, if you're after low latency* for real-time interactions across a large dataset, you'll need to look elsewhere, or roll your own.

* under 2 second response; you're dealing with Excel after all!

Shameless plug for our product: https://www.thectgrid.com/


I guess with your tool collaboration with a few people across one excel sheet would work more smoothly than with an Excel file on OneDrive?

We work in a team of only 3 people with an excel file on OneDrive and what annoys me the most is that it happens quite often that excel "says" that your changes have been saved (e.g. it says it's saved in the header, also the blue onedrive symbol shows no circle. The best indicator is the file in the explorer: If that shows the green checkmark then it's good) while in reality it's not and you only get a notification (that it cannot save because it cannot updates from someone else with yours) when you try to close the file. Annoying. Anyone with solutions is very welcome :)


Absolutely, the people using us seem think so! We target a niche: real time sharing of commodity pricing. The linked data can live in different files and layouts, can be updated via web or mobile, and full change history is stored as well.

We explored a few options when trying to solve the sharing problem, including OneDrive, Google Sheets, and some 3rd parties. Couldn’t find anything that fit the bill.

FWIW- What you describe could be down to OneDrive’s data quotas. If you’re dealing with pricing data or any numerical data then CT Grid solves that issue. Even if it’s not a perfect fit, I’d be happy to share the experience if it helps you.


PyXLL (www.pyxll.com) is written using the Excel C API and that’s partly how it achieves its excellent performance. It’s not free software, but it ‘properly’ integrates Python into the Excel process so doesn’t suffer from the performance problems of interacting with Excel from a separate process via COM and VBA.


What about in-process COM? Is that still too slow compared to the C API, or does it have other issues?


Terribly misleading title. Packages to allow Python to create/manipulate Excel files have been around for a long time.

Like MeinBlutIsBlau, I thought this was going to be Microsoft finally ditching VBA for Python as its macro language.


> Terribly misleading title.

No, its not.

> Packages to allow Python to create/manipulate Excel files have been around for a long time.

Xlwings (1) has been around for a while, itself, and more to the point (2) isn’t just a “package to allow Python to create/manipulate Excel files”. It includes an Excel add-in to allow embedding Python in Excel, using it for purposes similar to VBA.

> Like MeinBlutIsBlau, I thought this was going to be Microsoft finally ditching VBA for Python as its macro language.

Well, it allows users to do that without Microsoft doing it, so that's closer to accurate than the description implied by your dismissal.



> Microsoft finally ditching VBA for Python as its macro language.

Can't you use any language that 'integrates with' Microsoft Scripting Host?

https://en.wikipedia.org/wiki/Windows_Script_Host


You can use any language that supports COM by creating an Excel add-in. But then you have to figure out how to distribute it to the users.

WSH is something quite different. It runs VBScript, which is a different language from VBA. VBA is the same language as Visual Basic 6 and uses the same IDE.


I wrote a handy wrapper for xlrd in 2014 while making my project Ninhursag. I just checked and it still works with Python 2 – here is an example usage:

  pip2 install excelsior
  curl -O https://prd-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/atoms/files/ds140-2017-bauxi.xlsx
  excelsior -f tsv ds140-2017-bauxi.xlsx
Source code: https://github.com/peterhil/excelsior

The Ninhursag project is visualisation and modeling of USGS data series 140, which include world production of almost all minerals starting from the year 1900:

https://ninhursag.herokuapp.com/mineral/statistics (Heroku free tier – might take a few seconds to load)


I thought this was going to be about python INSIDE Excel. If MS ditched VBA for python, that would be incredible. VBA needs to honestly die cause it is just a horrible syntax to read. It's like staring at a SQL Procedure.


> I thought this was going to be about python INSIDE Excel.

Yes, xlwings (one of the packagess covered by the book, and the one on whose site the page is hosted) absolutely allows this via an Excel add-in: https://docs.xlwings.org/en/stable/addin.html


Unfortunately for all the corporate jobs I've worked at, even getting an add-on for notepad++ has more red tape than a FOIA request...


You can use lambda, JavaScript, any .NET language, any language that knows COM.

Those that only use VBA only have themselves to blame.


Back to permissions. Most office workers are not authorized to use those tools. They need the capability built into Excel.


While I agree permissions are an issue:

- lambda and JavaScript, out of the box in latest versions

- any .NET or COM aware language, PowerShell fits the bill, there is even a mini-IDE installed by default


>out of the box in latest versions

Not every company uses the latest versions of everything. We finally switched everyone to Windows 10 this year. We were paying for Windows 7 support while making the transition.


Thank for the clarification. I did see the lambda announcement, but have not looked at it yet. Will move that up the priority list.


While you are at it, have a look at Power Query as well, https://docs.microsoft.com/en-us/powerquery-m/


I don’t think one can make UDF with COM.


Python is a nicer language to use than C or C++ but that is not a very high bar to pass.

The main reason VBA is horrible is because it allows dynamic typing by default and Python would not help there at all.

VB syntax is actually nicer than Python syntax in at least 3 aspects: 1) VB is not case sensitive 2) VB allows specifying data types 3) VB uses If Then Else End If style syntax that makes it clear where things start and stop.

If Excel actually had the SQL based query interface inside the spreadsheet, it would make most of VBA unnecessary and probably improve the performance of Excel macros by several orders of magnitude.


> VB syntax is actually nicer than Python syntax in at least 3 aspects: 1) VB is not case sensitive

VB is not case sensitive, but case insensitivity isn't nice; it reduces clarity and forces circumlocution in cases where having case distinctions would not.

> 2) VB allows specifying data types

Python allows specifying data types, and the major python typecheckers support a more robust type system than VBA.

> 3) VB uses If Then Else End If style syntax that makes it clear where things start and stop

Python uses indentation-based syntax that makes it more immediately visually clear where things start and stop with less visible noise.


I'm not quite sure what my opinion on case sensitivity should be in general, but I feel like it's actually dangerous in dynamically typed languages.


Case insensitivity might be fine on a language level, but the mandatory IDE makes it annoying. It tries to normalize the case of identifiers, which would be a nice feature except for the fact that it doesn't know about scopes and can't tell variables from properties.

The result is that if you declare a variable named "count" anywhere in your project, every property (and every unrelated variable) everywhere gets renamed from "Count" to "count". Makes for annoying diffs.


I don't know - C++ has a sane static type system, value semantics and isn't ridiculously slow.

Also case insensitivity is a terrible idea. Everywhere it is used it causes problems.


On Error Resume Next


I really wish Resolver One had caught on, it was such an amazing product and really did find a nice middle ground between spreadsheet and code.

Here's a one-minute intro from their CTO showing off some of the basic features. The video is from 2009 and I'm amazed there's been nothing comparable released, open source or otherwise, in the intervening years. If there has and I've missed it, I'd love to know!

https://www.youtube.com/watch?v=u6EV2jiKRfc


There is https://gridstudio.io/ but I am not sure what the current project status is...


Ah thanks. It definitely looks like it has some similarities, hoping development picks up on it again.


I think Excel online with Office Script https://docs.microsoft.com/en-us/office/dev/scripts/overview... is awesome. Combine it with OneDrive & Power Automate & you can share & automate many boring stuff.


Does Excel online still have row and column limits?

It's always been a mystery to me why Microsoft never addressed this - they could have prevented a whole host of competitors in the "big data" space from ever existing if they'd found a way of abstracting this limit away from end users.


Most likely, because for doing those use cases you should buy into Power BI.


"big Data" and excel should never be used in the same sentence, This like using a screwdriver to drive a nail... Get a Hammer

Too many people believe excel should do everything... no, just no


Is it a limit in Excel or OOXML?


Being working on big data for a few years and I'm glad that Excel is mostly a place to paste some data for debugging. But I'm sure it's still useful for people who can leverage power pivot and all those tools for medium size data. I used xlwings occasionally some years ago and it was not a pleasant experience back then. I heard it improved a lot these years though.


It's fine - pretty easy to import data into a DataFrame and do whatever you want. The API seems relatively simple - you can marshal data into a DataFrame and vice versa.

Take it for a spin if you have any interesting spreadsheets in need of some love!


I still like Excel for certain things but I don't know why you wouldn't just use Pandas instead of this and cut out Excel.


Pandas doesn't do formatting very well. Using pandas to generate non tabular data is also a non starter. So using pandas to fill out an existing Excel file, say by entering data in specific cells, won't work.

Libraries like xlread and xlwrite can and would be one way to automate such processes. Xlwings can too. I think that Xlwings can do macros where the others can't though


Excel is just xml file. What I do is design the excel file how I need it. Export/Save as XML then use jinja2 and template it.

First sheet is usually raw data in standard rows/columns, second sheet would have the cells I need, etc a cell for sum of x columns from sheet 1.

Then use panda’s to get the data, and feed it through jinja2’s template render function.


Clever! I've written routines to use xlsxwriter for things like exporting a table to a sheet. It was a fair bit of effort the first time, but now its an easy function call.


Excel has become the de facto report writer. Look at how many products have Excel plug ins. This allows them to skip the UI development and the report writing portion.


What apt timing. I'm trying to do the same thing with Ruby; anyone knows a good Ruby gem dealing with Excel?


This would probably make a great Christmas gift for some analyst you know. Even if they don't use Excel now, it'll make them feel like their professional career is coming around full circle.


Is there something similar for Google Sheets? Perhaps as a browser extension?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: