> It seems like the main thing that bun does to stay ahead is cache the manifest responses. PNPM, for example, resolves all package versions when installing (without a lockfile), which is slower.
This isn't the main optimization. The main optimization is the system calls used to copy/link files. To see the difference, compare `bun install --backend=copyfile` with `bun install --backend=hardlink` (hardlink should be the default). The other big optimization is the binary formats for both the lockfile and the manifest. npm clients waste a lot of time parsing JSON.
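To make the copy-vs-hardlink difference concrete, here's a toy sketch (not Bun's implementation): copying a cached package rewrites every byte, while hardlinking only creates a new directory entry pointing at the same inode.

```python
import os
import shutil
import tempfile

cache = tempfile.mkdtemp()
src = os.path.join(cache, "package.tgz")
with open(src, "wb") as f:
    f.write(b"x" * 1024 * 1024)  # pretend this is a cached tarball

copied = os.path.join(cache, "copied.tgz")
shutil.copyfile(src, copied)  # reads and rewrites every byte

linked = os.path.join(cache, "linked.tgz")
os.link(src, linked)  # a single link(2) call; no file data is moved

# The hardlink shares the source's inode; the copy gets its own.
same_inode_as_link = os.stat(src).st_ino == os.stat(linked).st_ino
same_inode_as_copy = os.stat(src).st_ino == os.stat(copied).st_ino
```

The cost of the copy scales with package size; the cost of the hardlink doesn't, which is why the backend choice dominates install time.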
The more minor optimizations have to do with reducing memory usage. The binary lockfile format interns strings (they're very repetitive). However, many of these strings are tiny, so it's actually more expensive to store a hash and a length separately from the string itself. Instead, Bun stores the string as 8 bytes, and one bit says whether the entire string is contained inside those 8 bytes or whether it's an offset into the lockfile's string buffer (this works because 64-bit pointers don't use the full 64-bit address space, and bun currently only targets 64-bit CPUs).
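A toy sketch of that tagging idea (the bit layout here is invented for the example and doesn't match Bun's real format): the high bit of a 64-bit word says whether the string is stored inline or as a (length, offset) pair into a shared buffer.

```python
INLINE_FLAG = 1 << 63  # high bit set = string lives inside the word itself

def intern(s: str, buf: bytearray) -> int:
    """Pack a string into one 64-bit word (toy layout, invented for this example)."""
    data = s.encode()
    if len(data) <= 7:
        # Inline: length in bits 56..58, the bytes themselves in the low 56 bits.
        return INLINE_FLAG | (len(data) << 56) | int.from_bytes(data.ljust(7, b"\0"), "little")
    off = len(buf)
    buf += data  # long strings go into the shared string buffer
    return (len(data) << 32) | off  # high bit clear => (length, offset) pair

def resolve(word: int, buf: bytearray) -> str:
    if word & INLINE_FLAG:
        n = (word >> 56) & 0x7F
        return (word & ((1 << 56) - 1)).to_bytes(7, "little")[:n].decode()
    off, n = word & 0xFFFFFFFF, word >> 32
    return buf[off:off + n].decode()

buf = bytearray()
short = intern("react", buf)
long_ = intern("@babel/plugin-transform-runtime", buf)
```

Short names like `react` never touch the buffer at all, so the common case costs one word and zero pointer chasing.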
yarn also caches the manifest responses.
> If I install a simple nextjs app, then remove node_modules, the lockfile, and the ~/.bun/install/cache/.npm files (i.e. keep the contents, remove the manifests) and then install, bun takes around ~3-4s. PNPM is consistently faster for me at around ~2-3s.
This sounds like a concurrency bug with scheduling tasks from the main thread to the HTTP thread. I would love someone to help review the code for the thread pool & async io.
> One piece of feedback, having the lockfile be binary is a HUGE turn off for me. Impossible to diff. Is there another format?
If you do `bun install -y`, it will output as a yarn v1 lockfile.
Of course, I can't say for sure that he looked at the fastest possible way to parse json here, but my intuition would be that if he didn't, it's because he had an educated guess that it'd still be slower.
You don't need to go straight to simdjson et al; something like Rust's serde, which deserializes to typed structs with data like strings borrowed from the input, can be very fast.
Nobody is arguing that JSON is as performant as binary formats. What the others are saying is that the amount of JSON in your average lock file should be small enough that parsing it is negligible.
If you were dealing with a multi-gigabyte lock file then it would be a different matter but frankly I agree with their point that parsing a lock file which is only a few KB shouldn’t be a differentiator (and if it is, then the JSON parser is the issue, and fixing that should be the priority rather than changing to a binary format).
Moreover the earlier comment about lock files needing to be human readable is correct. Being able to read, diff and edit them is absolutely a feature worth preserving even if it costs you a fraction of a second in execution time.
> I agree with their point that parsing a lock file which is only a few KB
You mean a few MB? NPM projects typically have thousands of dependencies. A 10MB lock file wouldn't be atypical and parse time for a 10MB JSON file can absolutely be significant. Especially if you have to do it multiple times.
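As a rough sanity check (fully synthetic data; the structure below is invented, not a real lockfile format), a lockfile-sized JSON blob is measurable work to parse:

```python
import json
import time

# Build a synthetic lockfile-shaped document of thousands of entries,
# just to reach a realistic size of a few MB.
entries = {
    f"node_modules/pkg-{i}": {
        "version": "1.0.0",
        "resolved": f"https://registry.example/pkg-{i}/-/pkg-{i}-1.0.0.tgz",
        "integrity": "sha512-" + "a" * 64,
    }
    for i in range(20_000)
}
blob = json.dumps({"packages": entries})
size_mb = len(blob) / 1e6

start = time.perf_counter()
parsed = json.loads(blob)
elapsed = time.perf_counter() - start
```

Actual numbers depend on the parser and machine, but the point stands: at multi-MB sizes, parse time is no longer free, especially if it happens on every invocation.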
> Being able to read, diff and edit them is absolutely a feature worth preserving even if it costs you a fraction of a second in execution time.
You can read and edit a SQLite file way easier than a huge JSON file.
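A minimal illustration of that point using Python's stdlib `sqlite3` (the schema is invented for the example): once the data is in SQLite, "reading" it becomes a query instead of scrolling through a giant JSON object.

```python
import sqlite3

# Invented schema for illustration; nothing here matches a real lockfile.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE deps (name TEXT PRIMARY KEY, version TEXT)")
con.executemany(
    "INSERT INTO deps VALUES (?, ?)",
    [("react", "18.2.0"), ("next", "13.4.1"), ("left-pad", "1.3.0")],
)

# Filter and sort with SQL rather than eyeballing nested JSON.
rows = con.execute(
    "SELECT name, version FROM deps WHERE name LIKE 're%' ORDER BY name"
).fetchall()
```

The same applies to editing: an `UPDATE` statement is a more targeted tool than hand-editing a line buried in a 10MB JSON file.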
GitHub (disclosure: where I work) does respect some directives in a repo’s .gitattributes file. For example, you can use them to override language detection or mark files as generated or vendored to change diff presentation. You can also improve the diff hunk headers we generate by default by specifying e.g. `*.rb diff=ruby` (although come to think of it I don’t know why that’s necessary since we already know the filetype — I’ll look into it)
In principle there’s no reason we couldn’t extend our existing rich diff support used for diffing things like images to enhance the presentation of lockfile diffs. There’s not a huge benefit for text-based lock files, but for binary ones (if such a scheme were to take off) it would be a lot more useful.
Any way to use `.gitattributes` to specify that a file is _not_ generated? I work on a repo whose build/ directory contains build scripts, which is unfortunately excluded by default from GitHub's file search and quick-file selection (T).
Does this really work for jump to file? (We're not talking about language statistics or suppressing diffs on PRs, which is mostly what the linguist readme covers.)
> File finder results exclude some directories like build, log, tmp, and vendor. To search for files within these directories, use the filename code search qualifier.
(The inability of quick jumping to files from /build/ folder with `T` has been driving me crazy for YEARS!)
Correct me if I'm wrong, but checking those two files:
I don't see `/build` matching anything there. So this `/build` suppression from search results seems to be controlled by some other piece of software at GitHub :/
I checked and you're right: The endpoint that returns the file list has a hardcoded set of excludes and pays no attention to `.gitattributes`.
I think it's reasonable to respect the linguist overrides here so I'll open a PR to remove entries from the exclude if the repo has a `-linguist-generated` or `-linguist-vendored` gitattribute for that directory [1]. So in your case you can add
    build/** -linguist-generated
to `.gitattributes` and once my PR lands files under `build` should be findable in file-finder.
Thanks for pointing this out! Feel free to DM me on twitter (@cbrasic) if you have more questions.
On Linux, not yet. I don't have a machine that supports reflinks right now and I am hesitant to push code for this without manually testing it works. That being said, it does use copy_file_range if --backend=copyfile, which can use reflinks.
Still don't understand why we even need all these inodes. The repo is centrally accessible (and should be read-only, btw). Resolving that shouldn't be a problem. It's been more than a decade and npm is still a mess.
If you add this to your .gitattributes:
It will print the diff as a yarn lockfile.
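For reference, this is a sketch of the setup Bun's documentation describes for diffing the lockfile (worth double-checking against the current docs):

```shell
# .gitattributes — mark the lockfile as binary and route it through a
# custom diff driver named "lockb":
#
#   *.lockb binary diff=lockb
#
# Then tell git how to render it; bun itself acts as the textconv tool,
# printing the lockfile as yarn-v1-format text:
git config diff.lockb.textconv bun
git config diff.lockb.binary true
```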