Hacker News | messe's comments

Now I'm really curious. What field are you in that ndjson files of that size are common?

I'm sure there are reasons against switching to something more efficient (we've all been there); I'm just surprised.


> Now I'm really curious. What field are you in that ndjson files of that size are common?

I'm not OP, but structured JSON logs can easily result in humongous ndjson files, even with a modest fleet of servers over a not-very-long period of time.


So what's the use case for keeping them in that format rather than something more easily indexed and queryable?

I'd probably just shove it all into Postgres, but even a multi-terabyte SQLite database seems more reasonable.


Replying here because the other comment is too deeply nested to reply.

Even if it's a once-off, some people handle a lot of once-offs, and that's exactly where you need good CLI tooling to support them.

Sure, jq isn't exactly super slow, but I have also avoided it in pipelines where I just need more throughput.

rg was insanely useful in a project I once got where they had about 5 GB of source files, a lot of them auto-generated, and you needed to find stuff in there. People were using Notepad++ and waiting minutes for a search to find something in the haystack; rg returned results in seconds.


You make some good points. I've worked in support before, so I shouldn't have discounted how frequent "once-offs" can be.

The use case could be exactly that: processing an old trove of logs into something more easily indexed and queryable, and you might want to use jq as part of that processing pipeline.
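A minimal sketch of that kind of conversion, assuming jq and the sqlite3 command-line shell (3.32 or newer, for `.import --csv`) are installed. The log fields `ts`, `level` and `msg`, and the file names, are made up for illustration: jq flattens each ndjson line to CSV, and SQLite turns the result into an indexed table.

```shell
#!/bin/sh
# Sketch: ndjson logs -> indexed SQLite table, with jq doing the flattening.
# Assumes jq and sqlite3 >= 3.32 are on PATH.
# Field names ts/level/msg are hypothetical; adjust to your log schema.
set -eu

work=$(mktemp -d)

# Tiny stand-in for a large ndjson log: one JSON object per line.
cat > "$work/app.ndjson" <<'EOF'
{"ts":"2024-05-01T12:00:00Z","level":"info","msg":"started"}
{"ts":"2024-05-01T12:00:05Z","level":"error","msg":"disk full, retrying"}
{"ts":"2024-05-01T12:00:09Z","level":"error","msg":"gave up"}
EOF

# Without --slurp, jq handles one JSON value at a time, so the whole file
# never has to fit in memory. @csv quotes every field, so commas and
# double quotes inside messages survive the trip.
jq -r '[.ts, .level, .msg] | @csv' "$work/app.ndjson" > "$work/app.csv"

# Create the table first so .import treats every CSV row as data
# (into a missing table, the first row would become the column names).
sqlite3 "$work/logs.db" <<EOF
CREATE TABLE logs(ts TEXT, level TEXT, msg TEXT);
.import --csv $work/app.csv logs
CREATE INDEX logs_level ON logs(level);
EOF

# From here on, questions are indexed queries, not another full-file scan.
sqlite3 "$work/logs.db" "SELECT count(*) FROM logs WHERE level = 'error';"
```

The jq pass happens once; after that, a question like "how many errors were there" is an SQL query against an index rather than another pass over the raw logs. With the sample input above, the final query prints 2.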

Fair, but for a once-off thing performance isn't usually a major factor.

The comment I was replying to implied this was something more regular.

EDIT: why is this being downvoted? I didn't think I was rude. The person I responded to made a good point, I was just clarifying that it wasn't quite the situation I was asking about.


At scale, low performance can very easily mean "longer than the lifetime of the universe to execute." The question isn't how quickly something will get done, but whether it can be done at all.

Good point. I said it above, but I'll repeat it here: I shouldn't have discounted how frequent once-offs can be. I've worked in support before, so I really should've known better.

Certain people/businesses deal with one-off things every day. Even for something truly one-off, if one tool is too slow it might still be the difference between being able to do it once or not at all.

> No one cares at amateur level

Except people clearly fucking do for some reason, and all it's going to do is make life worse for women, both cis and trans. Trans women will get excluded, and cis women who are "too good" or don't fit societal ideals of femininity will be accused of being trans. This is already happening to children.

> If you chose to identify as another sex

When did you choose to identify as the gender you were born with?


Because they're probably a native speaker.

EDIT: this is exactly the kind of mistake that native speakers make and ESL speakers don't.


If you can rsync from the other system, and likely have an SSH connection between them, why don't you just add it as an additional remote and git pull from it directly?

I probably could. How does that work with uncommitted changes on the host? Would that be a problem?

You cannot git push something that is not committed. The solution is to commit often (and do it over ssh if you forget on a remote system). It doesn't need to be a presentable commit. That can be cleaned up later. I use `git commit -amwip` all the time.

Sure, you might neglect to add a file to your commit, or commit at all, but that's a problem whether you're pushing to a central public git forge or not.
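The "add it as an additional remote" suggestion could look something like this sketch. It assumes POSIX sh and git 2.28+ (for `git init -b`); two local directories named `laptop` and `other` (made-up names) stand in for the two machines. Over SSH the remote URL would instead be something like `user@otherhost:path/to/repo` (hypothetical host and path).

```shell
#!/bin/sh
# Sketch: pull committed work straight from another machine's working tree.
# Local directories stand in for the two machines.
set -eu

# Throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# The local repository, with some history.
git init -q -b main "$work/laptop"
echo "first" > "$work/laptop/notes.txt"
git -C "$work/laptop" add notes.txt
git -C "$work/laptop" commit -q -m "initial"

# The copy on the "other machine" (cloned at some earlier point).
git clone -q "$work/laptop" "$work/other"

# Work done over there. Fetch/pull only sees commits, so the edits are
# committed first, even as a throwaway "wip" commit.
echo "second" >> "$work/other/notes.txt"
git -C "$work/other" commit -q -am wip

# Back on the local side: add the other tree as an extra remote and pull.
git -C "$work/laptop" remote add other "$work/other"
git -C "$work/laptop" pull -q --ff-only other main
```

Pulling from a non-bare working tree like this is fine. The direction git refuses by default is pushing into a branch that is currently checked out on the other side.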


You'd create a bare git repo (just the contents of .git) on the host with git init --bare, separate from your usual working tree, and set it as a remote for your working trees, to which you can push and pull using ssh or even a path from the same machine.
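A rough sketch of that bare-repo setup, assuming POSIX sh and git 2.28+. The directory names `project.git`, `tree-a` and `tree-b` and the remote name `hub` are illustrative; plain paths are used so it runs on one machine, and over SSH the URL would be something like `user@host:path/to/project.git` (hypothetical).

```shell
#!/bin/sh
# Sketch: a bare repository as the hub between two working trees.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# The bare repo: just what would normally live in .git, no checked-out files.
git init -q --bare -b main "$work/project.git"

# First working tree: commit and push to the hub via a filesystem path.
git init -q -b main "$work/tree-a"
git -C "$work/tree-a" remote add hub "$work/project.git"
echo "hello" > "$work/tree-a/README"
git -C "$work/tree-a" add README
git -C "$work/tree-a" commit -q -m "add README"
git -C "$work/tree-a" push -q hub main

# Second working tree: clone from the hub, naming the remote "hub" too.
git clone -q --origin hub "$work/project.git" "$work/tree-b"

# Round trip: change in tree-b, push to the hub, pull into tree-a.
echo "more" >> "$work/tree-b/README"
git -C "$work/tree-b" commit -q -am "extend README"
git -C "$work/tree-b" push -q hub main
git -C "$work/tree-a" pull -q --ff-only hub main
```

Because the hub has no checked-out branch, both trees can push to it freely, which sidesteps the refusal you would get pushing into another working tree's current branch.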

If you have ssh access to the remote machine to set up a git remote, you can login to the remote machine and commit the changes that you forgot to commit.

Roughly:

`ssh remote "cd $src/repo ; git diff" | git apply`

(You'll need to season to taste: what to do with staged changes, how to make sure both trees are in the same HEAD, etc)
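One way to season that one-liner, sketched under some assumptions: POSIX sh, git 2.28+, and a hypothetical `remote_git` helper that runs git in a second local clone so the example is self-contained. On real machines you would replace the helper's body with the ssh form from the one-liner.

```shell
#!/bin/sh
# Sketch: carry uncommitted changes from one working tree to another as a patch.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical stand-in for `ssh remote "cd .../repo && git ..."`.
remote_git() { git -C "$work/remote" "$@"; }

# Setup: two clones sitting at the same commit.
git init -q -b main "$work/local"
echo "v1" > "$work/local/app.conf"
git -C "$work/local" add app.conf
git -C "$work/local" commit -q -m initial
git clone -q "$work/local" "$work/remote"

# Uncommitted work on the "remote" side: one staged edit, one unstaged.
echo "v2" > "$work/remote/app.conf"
git -C "$work/remote" add app.conf
echo "v3" >> "$work/remote/app.conf"

# Refuse if the trees are on different commits; a patch taken against a
# different HEAD may not apply cleanly.
if [ "$(remote_git rev-parse HEAD)" != "$(git -C "$work/local" rev-parse HEAD)" ]; then
  echo "HEADs differ; sync commits first" >&2
  exit 1
fi

# `git diff HEAD` covers staged and unstaged changes to tracked files.
# --binary keeps binary edits intact in the patch.
remote_git diff --binary HEAD | git -C "$work/local" apply
```

Untracked files on the remote side are not part of `git diff HEAD`, so they still need a `git add` (or a wip commit) over there first.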


> It's not about bypassing access restrictions.

Yes. It is. You've just made an arbitrary choice not to define it as such.


I will add a PR to enforce robots.txt before the actual scraping.

Or just follow web standards and define and publish your User-Agent header, so that people can block that as needed.

You're creating the wrong kind of value. I really hope your company fails, as its success implies a failure of the web in general.

I wish you the best success outside of your current endeavour.


LLM written comments are not permitted on this site.

1. What makes you think it is written by an LLM

2. Where is that rule, could you cite it?

3. How do I know you did not use an LLM for your comment?


1. Word choice, phrasing, and sentence structure make it seem likely. Ironically, one has to go on vibes. One gets a feel for the voice and tone used by LLMs after a while. It's also a new account with one comment.

2. "Don't post generated comments or AI-edited comments. HN is for conversation between humans." From https://news.ycombinator.com/newsguidelines.html

3. You don't.


1 and 3 contradict each other. Last thing people need is anti-AI hysteria.

For fucks sake.

I've been considering it for a while, but I'm definitely now pitching a move away from GitHub at our organization.


My fucking god the American exceptionalism arrogance runs strong.


It's fine if you think the American approach to free speech is bad - you don't have to live here - but please justify that rather than just name-calling.


Can you remind me when those actually passed? I can pull up equally ridiculous bills from the US that never came to fruition.


I am not saying they passed, but it seems there is a large group of politicians (supposedly backed by voters?) pushing such initiatives who are not some alt-right fascist outliers?

(I am not from the US, so please keep that strawman out.)


Don't discount that it could be both. It's still early in some parts of the US, they might not have had their coffee yet.

