Hacker News | stevepike's comments

It looks like this is a community continuation of the axlsx gem which was maintained back in the day by Randy Morgan (randym) over at https://github.com/randym/axlsx. One of my earliest open source contributions was adding support so that you could output spreadsheets with "conditional" formatting (color something red if it is below some value, for instance). I remember Randy being extremely supportive of new contributors and it made me want to be a part of the ruby community.

Thanks for continuing the work!


Good memories of the open source world. I couldn't find my commits in either repo, and I'm afraid I might have been too shy to upstream them and just kept them in a fork. It was tiny: there was an issue with a number-formatting variation involving currencies in multiple locales. We needed xlsx for its ability to do some nice formatting, but I really wanted to have a need for things like generating charts and embedding scripts, just for the sheer nerdiness.

I'm a bit surprised it gets this question wrong (ChatGPT gets it right, even on instant). All the pre-reasoning models failed this question, but it's seemed solved since o1, and Sonnet 4.5 got it right.

https://claude.ai/share/876e160a-7483-4788-8112-0bb4490192af

This was sonnet 4.6 with extended thinking.


Interesting, my sonnet 4.6 starts with the following:

The classic puzzle actually uses *eight 8s*, not nine. The unique solution is: 888+88+8+8+8=1000. Count: 3+2+1+1+1=8 eights.

It then proves that there is no solution for nine 8s.

https://claude.ai/share/9a6ee7cb-bcd6-4a09-9dc6-efcf0df6096b (for whatever reason the LaTeX rendering is messed up in the shared chat, but it looks fine for me).


Yeah, earlier in the GPT days I felt like this was a good example of LLMs being "a blurry jpeg of the web": you could give them something very close to an existing puzzle that's common on the web, and they'd regurgitate an answer from their training set. It was neat to see the question get solved consistently by the reasoning models (though often only after churning through a bunch of tokens trying to count 888 + 88 + 8 + 8 + 8 as nine 8s).

I wonder if it's a temperature thing or if things are being throttled up/down on time of day. I was signed in to a paid claude account when I ran the test.


Chatgpt doesn't get it right: https://chatgpt.com/share/6994c312-d7dc-800f-976a-5e4fbec0ae...

```
Use digit concatenation plus addition: 888 + 88 + 8 + 8 + 8 = 1000
Digit count:
  888 → three 8s
  88 → two 8s
  8 + 8 + 8 → three 8s
Total: 3 + 2 + 3 = 9 eights
Operation used: addition only
```

Love the 3 + 2 + 3 = 9


ChatGPT gets it right. Maybe you are using the free or non-thinking version?

https://chatgpt.com/share/6994d25e-c174-800b-987e-9d32c94d95...


My locally running nemotron-3-nano quantized to Q4_K_M gets this right. (although it used 20k thought tokens before answering the question)


Off-by-one errors are one of the hardest problems in computer science.


That is not an off-by-one error in a computer science sense, nor is it "one of the hardest problems in computer science".


This was in reference to a well-known joke, see here: https://martinfowler.com/bliki/TwoHardThings.html



This is hard for lots of companies. Some ignore the problem entirely until there's a fire drill (which can be a huge risk if you end up on an old major version that won't get patched). Some keep everything up to date, and then taking a new security patch is trivial. It's always a risk/reward tradeoff between the risk of breaking production with an upgrade and the value an org sees from staying up to date. We work on this problem at Infield (https://www.infield.ai/post/introducing-infield), where we tackle both sides of the problem: "Which dependencies should I prioritize upgrading?" and "How difficult and likely to break production is this upgrade?"

To your specific points

> 1. How do you decide what's actually urgent? CVSS? EPSS? Manual assessment?

The risk factors we track are open CVEs, abandonment (is this package supported by the maintainer?), and staleness (how deep in the hole am I?).

We also look at the libyear metric as an overall indication of dependency health.

> 2. Do you treat "outdated but not vulnerable" dependencies differently from "has CVEs"?

We group upgrades into three general swimlanes:

  - "trivial" upgrades (minor/patch versions of packages that respect semantic versioning, dev/test only packages). We batch these together for our customers regardless of priority.

  - "could break". These deserve standalone PRs and an engineer triaging when these become worth tackling, if ever.

  - "major frameworks". Think something like Rails. These are critical to keep on supported versions of because the rest of the ecosystem moves with them, and vulnerabilities in them tend to have a large blast radius. Upgrading these can be hard. You'll definitely need to upgrade someday to stay supported, and getting there has follow-on benefits on all your other dependencies, so these are high priority.

> 3. For those using Dependabot/Renovate/Snyk - what's your workflow? Do you review every alert or have you found a good filtering system?

We offer a Github app that integrates with alerts from Dependabot. While security teams are happy with just a scanner, the engineering teams that actually do this upgrade work need to mash that up with all the other data we're talking about here.


Sounds like a very interesting solution! Do you support all the popular programming languages? Do you also offer prioritization of the "issues"?


Thanks! We support Python, JS, and Ruby right now (started with dynamic languages).

I'm not sure what you mean by prioritization on the issues, but generally we are trying to help you figure out what to upgrade next, and to actually do it too.


Yeah, that's exactly what I meant by issue prioritization, thanks! Do you plan to support PHP, or is it totally out of scope?


PHP would definitely be in scope; either that or Java is likely to be next for us. If you're familiar with PHP's ecosystem, I'd be interested in your take on what's most important / problematic there.


This is cool, it looks to me like you're integrating static analysis on the user's codebase and the underlying dependency. Very curious to see where it goes.

We've found dependency upgrades to be deceptively complex to evaluate safety for. Often you need context that's difficult or impossible to determine statically in a dynamically typed language. An example I use for Ruby is the kwarg migration from Ruby 2.7 -> 3 (https://www.ruby-lang.org/en/news/2019/12/12/separation-of-p...). It's trivial to profile for impacted call sites at runtime but basically impossible to do statically without adopting something like Sorbet. Do you have any benchmarks on how reliable your evaluations are on plain JS vs. TypeScript codebases?
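For anyone who hasn't hit that migration, here's a minimal sketch of the behavior change (method and variable names are my own invention, not from any real codebase):

```ruby
# Ruby 2.7 -> 3.0 keyword-argument separation: a trailing Hash is no
# longer silently converted into keyword arguments.
def configure(name, retries: 1)
  "#{name} retries=#{retries}"
end

opts = { retries: 5 }

configure("job", **opts)  # works the same on 2.7 and 3.x

# On Ruby 2.x this also worked (2.7 printed a deprecation warning),
# but on Ruby 3 it raises ArgumentError (given 2, expected 1):
#
#   configure("job", opts)
```

This is exactly the case where runtime profiling tells you what static analysis can't: whether a given hash argument was meant as keywords.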

We ended up embracing runtime profiling for deprecation warnings / breaking changes as part of upgrading dependencies for our customers and have found that context to unlock more reliable code transformations. But you're stuck building an SDK for every language you want to support, and it's more friction than installing a github app.


This is very true. It can be a real fire drill if it turns out you need to go up a major version in some other dependency in order to get a security fix. It can get even worse in JS if you're on some abandoned package that's pinned to an old version of some transitive dependency which turns out to be vulnerable. Then you're scrambling to migrate to some alternate package with no clear upgrade path.

On the flipside sometimes you get lucky and being on an old version of a package means you don't have the vulnerability in the first place.

libyear is a helpful metric for tracking how much of this debt you might have.


I have been in the position of having to contend with very old (4+ year) transitive dependencies brought in by contemporary dependencies, where npm and Node versions complain about deprecations and associated security issues. I get into icky package.json `overrides` to force these transitive dependencies to upgrade.
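For reference, the npm `overrides` mechanism looks roughly like this (package names here are hypothetical):

```json
{
  "overrides": {
    "abandoned-pkg": {
      "vulnerable-transitive-dep": "^2.1.0"
    }
  }
}
```

The nested form pins the transitive dependency only where it's pulled in by the named parent, which limits the blast radius of the override.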


This seems to show the power of the reasoning models over interacting with a prompted chat-tuned LLM directly. If I navigate backwards on your link, Sonnet 4 gets it right.

I've used a similar prompt - "How can you make 1000 with exactly nine 8s using only addition?"

Here's GPT 4.5 getting it wrong: https://chatgpt.com/share/683f3aca-8fbc-8000-91e4-717f5d81bc...

It tricks it because it's a slight variation of an existing puzzle (making 1000 with 8 8s and addition only).

The reasoning models seem to reliably figure it out, though. Some of them even come up with a proof of why it's impossible to do with 9 8s. Here's o4 getting it right: https://chatgpt.com/share/683f3bc2-70b8-8000-9675-4d96e72b58...
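Those impossibility proofs boil down to a small search. Here's a toy brute force (my own sketch, not from any of the linked chats) over how many 8 / 88 / 888 terms you use:

```ruby
# Can n eights, combined with digit concatenation and addition only,
# sum to 1000? Any term of four or more 8s (8888, ...) already exceeds
# 1000, so a solution is just counts (a, b, c) of 8 / 88 / 888 terms
# with a + 2b + 3c = n and 8a + 88b + 888c = 1000.
def solutions(n_eights, target = 1000)
  results = []
  (0..n_eights / 3).each do |c|
    (0..n_eights / 2).each do |b|
      a = n_eights - 3 * c - 2 * b
      next if a.negative?
      results << [a, b, c] if 8 * a + 88 * b + 888 * c == target
    end
  end
  results
end

solutions(8)  # => [[3, 1, 1]], i.e. 888 + 88 + 8 + 8 + 8
solutions(9)  # => [] — no way to do it with nine 8s
```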


I think the kind of application here matters a lot, specifically whether you're trying to make a change to a web app or if you're hacking on library code.

In ruby, for example, I can pretty trivially clone any open source gem and run the specs in < 5 minutes. Patching something and opening a PR in under an hour is definitely standard.

On the other hand, getting a development environment running for a company's proprietary web app is often a hassle. Mostly though this isn't because of the language or dependencies, it's because of:

  - Getting all the dependent services up and running (postgres version X, redis Y, whatever else) with appropriate seed data. 
  - Getting access to development secrets

My company (infield.ai) upgrades legacy apps, so we deal with setting up a lot of these. We run them in individual siloed remote developer environments using devcontainers. It works OK once we've configured the service containers.
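As a rough illustration (service names and commands here are hypothetical, not our actual config), a minimal devcontainer setup for that kind of app looks something like:

```json
{
  "name": "legacy-app",
  "dockerComposeFile": "docker-compose.yml",
  "service": "app",
  "workspaceFolder": "/workspace",
  "forwardPorts": [3000, 5432, 6379],
  "postCreateCommand": "bundle install && bin/rails db:setup"
}
```

The docker-compose file is where the dependent services (postgres, redis, etc.) get pinned to the versions the legacy app actually expects.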


It doesn't do this for me. I've got side-loaded and Amazon store books on the same device, no problem.


It also deleted all my side-loaded books. That was the last straw for me. I only buy DRM-free media from now on and only use respectful hardware. I use my Remarkable 2 primarily for e-reading now, though I concede fully it's not the best user experience for reading. But I don't have to worry that it will delete my books! I can also now "write in the margins" which I've found to be a powerful way to take notes. I can't bring myself to write on physical books, but with Remarkable you can have a copy that is stock and a copy with your notes on it. Best of both worlds!


They're tracking different related things. I run a startup in this space and we track: aggregate libyear of your direct dependencies; total # of direct dependencies with libyear > 2; # of direct dependencies at least one major version behind; dependencies that have been abandoned by the maintainer.

I think the top-line aggregate libyear number is helpful to monitor over time to get a general sense of the slope of your technical debt. If the number is trending upwards then your situation is getting worse and you're increasing the chance you find yourself in an emergency (i.e., a CVE comes out and you're on an unsupported release line and need to go up major versions to take the patch).

Tracking total # of major versions behind gets at the same thing but it's less informative. If you're on v1 of some package that has a v2 but is actively releasing patches for the v1 line that should be a lower priority upgrade than some other package where your v1 line hasn't had a release in 5 years.
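For anyone unfamiliar with the metric, aggregate libyear just sums how far behind each dependency's release date you are. A toy sketch (names and dates invented, not our actual implementation):

```ruby
require "date"

# Aggregate libyear: for each direct dependency, the gap in years between
# the release date of the version you're on and the latest release date.
def libyear(deps)
  deps.sum do |dep|
    (dep[:latest_released_on] - dep[:current_released_on]) / 365.25
  end
end

deps = [
  { name: "rails", current_released_on: Date.new(2021, 5, 1),
                   latest_released_on:  Date.new(2024, 5, 1) },
  { name: "pg",    current_released_on: Date.new(2023, 1, 1),
                   latest_released_on:  Date.new(2024, 1, 1) },
]

libyear(deps).round(1)  # => 4.0 libyears of aggregate drift
```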


It feels like it just has so many weird edge cases. A stable 2.3 branch that hasn't changed in years, while the 1.2 branch keeps shipping security fixes, punishes you for not being on the 1.x line even though you're on the better version.

A 1.x branch that gets regular docs/security releases makes it look like you're doing fine even though the project is on 3.x and 1.x will be deprecated soon.

Perhaps as a vague guide to point to potential issues, sure.

