
They already have human segmenters segmenting existing scrolls, which presumably is used to train the program in much the same way.


You haven't substantiated why nobody else could make use of WebGPU. Are Google the only ones who can understand Beacons because they make $300B/year? GPU is hard, but it doesn't take billions to figure out.


Interestingly, FedEx barcodes make heavy use of the ASCII separator characters; you'll see them if you ever scan one. It's about the only place I've seen them used in any significant way.
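For anyone curious what that looks like, here's a toy sketch of splitting a payload on those separators (the field layout here is invented for illustration; it's not FedEx's actual format):

```python
# ASCII separator characters: FS=0x1C, GS=0x1D, RS=0x1E, US=0x1F
GS = "\x1d"  # group separator: splits fields within a record
RS = "\x1e"  # record separator: splits records

# Hypothetical payload, NOT the real FedEx barcode layout
payload = f"TRACK123{GS}US{GS}TN38120{RS}REF{GS}PO-789"
for record in payload.split(RS):
    print(record.split(GS))
# ['TRACK123', 'US', 'TN38120']
# ['REF', 'PO-789']
```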


I'm working on my Master's in Statistics, so I feel I can comment on some of what's going on here (although there are others more experienced than me in the comments as well, and I generally agree with their assessments). I'm going to look only at the diabetes example paper for now, mostly because I have finals tomorrow. I find it to be the equivalent of a STA261 final project at our university, with some extra fluff and nicer formatting. It's certainly not close to something I could submit to a journal.

The whole paper is "we took an existing dataset and ran the simplest reasonable model (a logistic regression) on it". That's about 5-10 minutes in R (or Python, or SAS, or whatever else). It's a very well-understood process, and it's a good starting point for understanding the data, but it can't be the only thing in your paper; this isn't the '80s anymore.

The overall style is verbose and flowery, typical of LLMs. Good research papers should be straightforward and to the point. There's also strange mixing of "we" and "I" throughout.

We learn in the introduction that interaction effects were tested. That's fine, but I'd want it set up earlier why these interaction effects are posited to be interesting. The paper claims that "a comprehensive investigation considering a multitude of diabetes-influencing lifestyle factors concurrently in relation to obesity remains to be fully considered", but quite frankly, I don't believe that. Diabetes is remarkably well-studied, especially in observational studies like this one, due to its prevalence. I haven't searched the literature, but I really doubt that no similar analysis has been done. This is one of the hardest parts of a research paper, finding existing research and the gaps in it, and I don't think an LLM will be sufficiently capable of that any time soon.

There's a complete lack of EDA in the paper. I don't need much (the whole analysis of this paper could be part of the EDA for a proper paper), but I'd expect some basic distributional statistics for the variables. How many respondents in the dataset were diabetic? Is there a sex bias? What about the age distribution? Are any values missing? These are really important for observational studies, because if there are any issues, they should be addressed in some way. As it is, the paper is basically saying "trust us, our data is perfect", which is a huge ask. It's really weird that a bunch of this is in the appendix (which is way too long to be included in the paper and would need to be supplementary materials, but that's fine; it's also poorly formatted) but not mentioned anywhere in the paper itself. Looking at the appendix, my main concern is that only 14% of the dataset is diabetic. This means that models will be biased towards predicting non-diabetic (if you just predict non-diabetic all of the time, you're already 86% accurate!). It's not as big an issue for logistic regression, or for observational modeling like this, but I would have preferred an adjustment for it.
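To make that concrete, here's roughly the check and one possible adjustment, sketched in Python (I'm guessing at the schema: the filename, the `Diabetes_binary` column name, and that all predictors are numeric are assumptions on my part):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("diabetes.csv")      # hypothetical filename
print(df["Diabetes_binary"].mean())   # should print ~0.14 per the appendix

X = df.drop(columns="Diabetes_binary")
y = df["Diabetes_binary"]

# class_weight="balanced" reweights observations inversely to class
# frequency, so the minority (diabetic) class isn't drowned out.
model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```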

In the results, I'm disappointed by the over-reliance on p-values. This is something the statistics field is trying to move away from, for a multitude of reasons, one of which is demonstrated quite nicely here: p-values are (almost) always minuscule with large n, and in this case n=253,680 is very large. Standard errors and CIs have the same issue. The Z-value is the most useful measure of confidence here in my eyes. Effect sizes are typically the more interesting metric for such studies. On that note, I would have liked to see the predictors normalized so that coefficients can be compared directly. BMI, for example, has a small coefficient, but that's likely just because it has a large range and variance.
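Continuing the hypothetical sketch above, the fix is one line: z-score the predictors before fitting, so each coefficient measures the effect of a one-standard-deviation change:

```python
# Standardize each predictor to mean 0, SD 1, then refit.
X_std = (X - X.mean()) / X.std()
model_std = LogisticRegression(max_iter=1000).fit(X_std, y)

# Now coefficients are directly comparable: 0.5 on BMI vs 0.1 on Age
# would mean one SD of BMI moves the log-odds five times as much as
# one SD of Age does.
```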

It's claimed that the AIC shows improved fit for the second model, but the change is only ~0.5%, which isn't especially convincing. In fact, it could be much less, because we don't have enough significant figures to see how the rounding went. The p-value is basically meaningless here, as previously stated.

The methods section says almost nothing that isn't already stated at least once elsewhere. I'd like to know something about the tools that were used, which is completely lacking from this section. I do want to highlight this quote: "Both models employed a method to adjust for all possible confounders in the analysis." What??? All possible confounders? If you know what that means, you know that's BS. "A method"? What is this magic tool that removes all variance not reflected in the dataset? I need to know! I certainly don't see it reflected in the code.

The code itself seems fine, maybe a little over-complicated, but that might be necessary for how it interfaces with the LLM. The actual analysis is equivalent to 3 basic lines of R (read the CSV, fit basic logistic regression 1 with default parameters, fit basic logistic regression 2 with default parameters).
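Something like this, sketched in Python since that's the mentioned alternative I can write most reliably (the formula terms are placeholders; I'm guessing at what the paper's two models actually contained):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("diabetes.csv")  # hypothetical filename

# Model 1: main effects only (placeholder predictors)
m1 = smf.logit("Diabetes_binary ~ BMI + Age + Sex", data=df).fit()

# Model 2: same, plus an interaction ('*' expands to both main
# effects plus their product term)
m2 = smf.logit("Diabetes_binary ~ BMI * PhysActivity + Age + Sex", data=df).fit()

print(m1.summary())
print(m2.summary())
```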

This paper would probably get about a B+ in 261, but shouldn't pass a 400-level class. The analysis is very simple and unimpressive for a few reasons. For one, the questions asked of the dataset are very light. More interesting, for example, might have been to do variable selection on all interaction terms and find which are important. More models should have been compared. The dataset is also extremely simple and doesn't demand complex analysis. An experimental design, or messy data with errors and missing values, or something requiring multiple datasets, would be a more serious challenge. It's quite possible that one of the other papers addresses this though.


Thanks so much for these thorough comments.

You suggested some directions for more complex analysis that could be done on this data. I would be very curious to see what you get if you could take the time to run data-to-paper as a co-pilot on your own: you can give it directions and feedback on where to go. It will be fascinating to see where you take it!

We also must look ahead: complexity and novelty will rapidly increase as ChatGPT-5, ChatGPT-6, etc. are rolled out. The key with data-to-paper is to build a platform that harnesses these tools in a structured way that creates transparent and well-traceable papers. Your ability to read, understand, and follow all the analysis in these manuscripts so quickly speaks to your talent of course, but also to the way these papers are structured. Speaking from experience, it is much harder to review human-created papers at such speed and accuracy...

As for your comment that "it's certainly not close to something I could submit to a journal" - please look at the examples where we show reproducing peer-reviewed publications (published in a completely reasonable Q1 journal, PLOS One). See this original paper by Saint-Fleur et al: https://journals.plos.org/plosone/article?id=10.1371/journal...

And here are 10 different independent data-to-paper runs in which we gave it the raw data and the research goal of the original publication and asked it to do the analysis, reach conclusions, and write the paper: https://github.com/rkishony/data-to-paper-supplementary/tree... (look for the 10 manuscripts designated "manuscriptC1.pdf" - "manuscriptC10.pdf").

See our own analysis of these manuscripts and their reliability in our arXiv preprint: https://arxiv.org/abs/2404.17605

Note that the original paper was published after the training horizon of the LLM that we used, and also that we programmatically removed the original paper from the results of the literature search that data-to-paper performs, so that it cannot see it.

Thanks so much again, and good luck on the exam tomorrow!


I'm actually actively researching the Grimms' tales - I'm glad you linked to Zipes's translation! I think he has the most accurate and complete English translation out there, but it's still frequently overlooked in favor of lower-quality public-domain translations.


Nvidia is a great example of why a flat rule like this wouldn't work. Nvidia does essentially one specialized thing (GPUs), and trying to break it up into >10 pieces worth <$20B each (approximately the median GDP of African nations per the IMF) would be completely unnecessary. Their gaming GPUs alone had ~$6B in profit in the last year, and we know their market cap comes much more from the AI market. We could definitely use stronger anti-monopoly laws, but market cap limits aren't the way to do it.


You can still be sued. At that point, you either:

a) defend yourself as normal, while attempting to maintain anonymity (difficult & doesn't solve anything), or

b) let the case go to default judgment (likely for an enormous amount of money), at which point Nintendo has every reason to try to track you down in real life, and can use the courts to help them do so. Hope you were really, really thorough about your anonymity.


How do you sue someone that is anonymous?


File a lawsuit against John/Jane Doe defendants and describe who they are, e.g., the people who code and maintain the emulator. Maybe a little harder than if you had the names, but not by a lot.


I should say: the real difficulty is service, but if there's any way to get in contact with you, there's at least a chance it will be accepted as good service. There was famously a recent case where a defendant known only by a Bitcoin address was served by sending a transaction with an attached message to that address.


OpSec is hard. The Silk Road guy was anonymous, up until he wasn't.


Bad evidence is not the same thing as inadmissible evidence. Evidence is admitted, and then the fact finder determines whether to consider it, and how much weight to give it. It is likely that surveillance video will be slightly less credible now, but can still be part of a large, convincing body of evidence.


This is basically just a compression dictionary made up of random noise, which is... a thing that you can do.
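For anyone who hasn't seen the trick, here's a minimal sketch using zlib's preset-dictionary support from the Python standard library (the specifics are illustrative, not whatever the linked project actually does). Both sides must share the same dictionary bytes, and compression only wins when the data overlaps the dictionary:

```python
import os, zlib

dictionary = os.urandom(32 * 1024)  # the shared "random noise" (deflate's window is 32 KB)

# Contrived data that happens to contain chunks of the dictionary
data = dictionary[1000:3000] + b"some payload" + dictionary[8000:9000]

c = zlib.compressobj(zdict=dictionary)
compressed = c.compress(data) + c.flush()

d = zlib.decompressobj(zdict=dictionary)
assert d.decompress(compressed) == data
print(len(data), "->", len(compressed))  # large win: matches back-reference into the dict
```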


Part of Liebowitz's problem was that he did ~0 research before filing a lawsuit. Often the only thing he did was send the photographer a list of URLs and ask for "GO" or "NO GO" for each one. In at least some cases there was a license that he knew about, or probably should have known about, but he didn't even bother to check. He probably would've been fine, though, if he hadn't lied to the judge(s) so, so much.


In cases like these, you need to rely on the photographer; they are the ones who will know whether there is a license. So him relying on the client, who might make a mistake, won't necessarily be the end of him, as you point out, and I agree. These photographers do not have agents, and to the extent they did, it'd be Getty, who pays them less than $5 for perpetual web-use licenses. That's why so many of them leave Getty. Getty then continues to offer new licenses for the photographs despite their contract being terminated. The article seems to get that wrong as well. I'm a copyright attorney, and we take the opposite approach to Liebowitz's.

