
Having gone through the explanations of the Transformer Explainer [1], I now have a good intuition for GPT-2. Is there a resource that gives intuition on what has changed since then to improve things like approaching a problem more conceptually, being better at coding, suggesting next steps if wanted, etc.? I have a feeling this is a result of more than just increasing transformer blocks, heads, and embedding dimension.

[1] https://poloclub.github.io/transformer-explainer/


Most improvements like this don't come from the architecture itself, scale aside. They come down to training, which is a hair away from being black magic.

The exceptions are improvements in context length and inference efficiency, as well as modality support. Those are architectural. But behavioral changes are almost always down to: scale, pretraining data, SFT, RLHF, RLVR.


I once bought an Office 2016 license, and when I installed it this year on a new laptop, it turned itself into a trimmed-down O365. After the first Office update, I got a non-closable ad next to my Excel spreadsheet telling me to upgrade to a full O365. Even worse, I was only able to save files to OneDrive and not locally. That was not what I originally paid for!


It's fraud. Plain and simple.


Software as a Service is fraud


> I was only able to save files to OneDrive and not locally.

I find this very infuriating, and I stopped using MS more than 10 years ago. They used to be a proper software company, with their flaws, of course, but quite professional in the grand scheme of things. But what you're describing goes against everything that I've valued as a computer programmer since I entered this field of work ~20 years ago.


> Our liveable breathable atmosphere is razor thin compared to the size of earth.

If earth were a grapefruit, our atmosphere would be ~1mm thick!
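
Rough numbers behind that, taking ~100 km (roughly the Kármán line) as the top of the atmosphere and a ~12 cm grapefruit; the breathable part is thinner still:

    # back-of-the-envelope scaling; all figures are assumptions for illustration
    earth_radius_km = 6371
    atmosphere_km = 100          # roughly the Karman line
    grapefruit_radius_mm = 60    # ~12 cm diameter grapefruit
    print(atmosphere_km / earth_radius_km * grapefruit_radius_mm)  # ~0.9 mm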


The problem for me isn't getting my DNA sequenced; it's getting it sequenced without having to trust a third party with my genetic information. As written in the article, they only achieved 13% coverage (effectively even less, because you have to assume that not all base calls are correct), which is not useful for any sort of genetic analysis. So the title is really misleading.


Terence Tao uses a trick he calls, I think, "structured procrastination": when there is a thing he doesn't want to do, he recalls another thing he wants to do even less. That way, he procrastinates on the thing he dreads more by doing the less dreaded one.


I think that sounds like productive procrastination; it won an Ig Nobel Prize. As you say, it's basically finding something you don't want to do even more than the thing you need to do, so you instead procrastinate productively by doing the needful.


The consensus here seems to be that Python is missing a pipe operator. That was one of the things I quickly learned to appreciate when transitioning from Mathematica to R. It makes writing data science code, where the data are transformed by a series of different steps, so much more readable and intuitive.

I know that Python is used for many more things than just data science, so I'd love to hear if in these other contexts, a pipe would also make sense. Just trying to understand why the pipe hasn't made it into Python already.


The next step after pipe operators would be reverse assignment statements to capture the results.

I find myself increasingly frustrated at seeing code like 'let foo = many lines of code'. Let me write something like 'many lines of code =: foo'.


> reverse assignment statements to capture the results

Interesting idea! However, I'm not sure I would prefer

"Mix water, flour [...] and finally you'll get a pie"

to

"To make a pie: mix water, flour [...]"


R does have a right assign operator, namely ->

Its use is discouraged in most style guides. I do not use it in scripts, but I use it heavily in console/terminal workflows where I'm experimenting.

    df |> filter() |> summarise() -> x
    x |> mutate() -> y
    plot(y)


The pipe operator in R (really, tidyverse R, which might as well be its own language) is one of its "killer apps" for me. Working with data is so, so pleasant and easy. I remember a textbook that showed two ways of "coding" a cookie recipe:

    bake(divide(add(knead(mix(flour, water, sugar, butter)), eggs), 12), 450, 12)

versus

    mix(flour, water, sugar, butter) %>% knead() %>% add(eggs) %>% divide(12) %>% bake(temp=450, minutes=12)

So much easier!


You'd never write that ugly one-liner. Just write the recipe imperatively:

    dough = mix(flour, water, sugar, butter)
    dough.knead()
    dough = dough.add(eggs)
    cookies = dough.divide(12)
    cookies = bake(cookies, temp=450, minutes=12)
Might be more verbose, but definitely readable.


Meh, it is really annoying to define all those rarely-used variables (and pray you don't have more than one kind of "dough" in your program... otherwise you begin to have cookie_dough/CookieDough/cookie-dough/cookieDough and friends everywhere, and refactors become annoying fast).


pandas and polars both have pipe methods available on dataframes. You can method chain to the same effect. It's considered best practice in pandas, as you're hopefully not mutating the initial df.
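
For instance, a minimal sketch of that chaining style (toy data and a made-up transformation step; `.pipe` just passes the frame on to the function):

    import pandas as pd

    df = pd.DataFrame({'x': [1, 2, 3, 4], 'group': ['a', 'a', 'b', 'b']})

    def add_double(d):
        # return a new frame instead of mutating the input
        return d.assign(y=d['x'] * 2)

    result = (df
        .pipe(add_double)
        .groupby('group', as_index=False)
        .agg(total=('y', 'sum'))
    )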


I don't know if

   result = (df
      .pipe(fun1, arg1=1)
      .pipe(fun2, arg2=2)
   )
is much less readable than

   result <- df |> 
      fun1(., arg1=1) |> 
      fun2(., arg2=2)
but I guess the R thing also works beyond dataframes which is pretty cool


The pipe operator uses what comes before as the first argument of the function. This means in R it would be:

    result <- df
      |> fun1(arg1=1)
      |> fun2(arg2=2)
Python doesn't have a pipe operator, but if it did it would have similar syntax:

    result = df
      |> fun1(arg1=1)
      |> fun2(arg2=2)
In existing Python, this might look something like:

    result = pipe(df, [
      (fun1, 1),
      (fun2, 2)
    ])
(Implementing `pipe` would be fun, but I'll leave it as an exercise for the reader.)

Edit: Realized my last example won't work with named arguments like you've given. You'd need a function for that, which starts looking awfully similar to what you've written:

    result = pipe(df, [
      step(fun1, arg1=1),
      step(fun2, arg2=2)
    ])
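
For what it's worth, one minimal way to write those hypothetical `pipe`/`step` helpers (just a sketch, not a real library):

    from functools import reduce

    def step(func, *args, **kwargs):
        # bind the extra arguments; the piped value becomes the first argument
        return lambda value: func(value, *args, **kwargs)

    def pipe(value, steps):
        # thread `value` through each callable in `steps`, left to right
        return reduce(lambda acc, f: f(acc), steps, value)

    # result = pipe(df, [step(fun1, arg1=1), step(fun2, arg2=2)])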


>Implementing `pipe` would be fun, but I'll leave it as an exercise for the reader.

I like exercise:

https://gist.github.com/stuarteberg/6bcbe3feb7fba4dc2574a989...


Neat!


Python supports a syntax like your first example by implementing the appropriate magic method for the desired operator and starting the chain with that special object. For example, using just a single pipe: https://flexiple.com/python/python-pipeline-operator

The functions with extra arguments could be curried, or done ad-hoc like lambda v: fun1(v, arg1=1)
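
A bare-bones sketch of that idea (a hypothetical wrapper class, not the one from the linked page): start the chain with a special object whose `|` applies the next callable.

    class Pipeline:
        def __init__(self, value):
            self.value = value

        def __or__(self, func):
            # `pipeline | f` applies f and wraps the result again
            return Pipeline(func(self.value))

    # result = (Pipeline(df) | (lambda v: fun1(v, arg1=1))
    #                        | (lambda v: fun2(v, arg2=2))).value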


I find this (hypothetical) syntax *very* elegant.

    result = df
      |> fun1(arg1=1)
      |> fun2(arg2=2)


Thanks!


I haven't used R in forever, but is your `.` placeholder actually necessary? From my recollection of the pipe operator, the value being piped is automatically passed as the first argument to the next function. That may have been a different implementation of a pipe operator, though.


Probably not, I didn’t use R much during the last decade …


It seems like creative use of the map function and some iterators would provide the same functionality as a pipe does.
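
Something like that could look as follows (a toy illustration only: wrap the value in a one-element iterator and chain `map` calls):

    value = 3
    add_one = lambda x: x + 1
    square = lambda x: x * x

    # each map() is lazy; next() pulls the value through the whole chain
    result = next(map(square, map(add_one, iter([value]))))
    print(result)  # (3 + 1) ** 2 == 16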


> This sort of thing is exactly like preventative whole body MRI scans. It's very noisy, very overwhelming data that is only statistically useful in cases we're not even sure about yet. To use it in a treatment program is witchcraft at this moment, probably doing more harm than good.

The child of a friend of mine has PTEN hamartoma tumor syndrome, a tendency to develop tumors throughout life due to a mutation in the PTEN gene. The poor child gets whole-body MRIs and other check-ups every half year. As someone in biological data science, I always tell the parents how difficult it will be to prevent false positives, because we don't have a lot of data on routine full-body check-ups of healthy people. We barely know the huge spectrum of what healthy/OK tissue can look like.


Hopefully gene therapy can fix this sort of problem.


is it even possible for gene therapy to just rewrite all the existing DNA in a body? can't you only do that to cells that are dividing or whatever?


They've managed to treat sickle cell.

>CRISPR/Cas9 can be directed to cut DNA in targeted areas, enabling the ability to accurately edit (remove, add, or replace) DNA where it was cut. The modified blood stem cells are transplanted back into the patient where they engraft (attach and multiply) within the bone marrow...

https://www.fda.gov/news-events/press-announcements/fda-appr...


They did it by killing all the normal blood stem cells in the body. That is difficult, to say the least, for something that is completely systemic.


I would plot the destination matrix as a heatmap where each row is a departure and each column an arrival, and the color is the number of trips. Additionally, you could cluster the rows and columns of this heatmap.
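
Something along these lines, assuming a table of trips with departure, arrival and count columns (the column names and data here are made up):

    import pandas as pd
    import seaborn as sns

    trips = pd.DataFrame({           # toy trip records
        'departure': ['A', 'A', 'B', 'C'],
        'arrival':   ['B', 'C', 'C', 'A'],
        'count':     [10, 3, 7, 5],
    })
    matrix = trips.pivot_table(index='departure', columns='arrival',
                               values='count', aggfunc='sum', fill_value=0)
    sns.clustermap(matrix)  # clusters rows and columns; colour = number of trips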


On a related note: Transporting a human in a car is (in relation to weight and size) like using a standard shopping cart to transport two 1L bottles of water. So the next time you walk through a pedestrian area, imagine everyone carrying a bag would use a shopping cart instead. That would be a huge traffic jam -- exactly like what you see on the road!


I've been pretty aware of this ever since I became a cyclist. I will ride down to the corner store to pick up a six pack and some chips, throw them in a backpack and ride back. It's easy. I see people driving their cars to do the same thing. All that weight and space for 6 bottles of beer. There is massive waste all around us.


A long time ago in San Antonio TX I was pulled over by the cops while biking back to my little apartment with a bunch of groceries. They were unwilling to believe that an adult would leave the car home to get groceries by bicycle.

(I'm from Italy originally).


Context: We're from the UK.

My wife and I left a meeting in a business park in Phoenix and decided to walk the 5 mins to the local shopping mall, have a look around and then get a taxi back to the apartment in which we were staying (We'd taken a taxi to the meeting).

We were about 2 minutes into our walk when a car pulled up and it was one of the people from the meeting. People in the office had spotted us walking and assumed there was some kind of emergency or our car had broken down.

We had to be very politely insistent that we didn't need a lift to the mall and were perfectly fine.


We experienced the same when we walked down the hill to go shopping in Laguna Niguel, CA. Stopped by cops for walking to the store. Nothing more happened.


There is also the time component. Off peak and with a decent sized backpack (change of clothes, laptop, food etc) it takes me the same time to go 6km as it does to drive it.

At peak it’s 1/4 to 1/3rd the time.

Cars are slow around town.


The time component has to factor in both the traffic while driving, and the extra time required to find available parking. I bought an electric scooter a few weeks ago, and I have come to realize that my travel time is pretty much purely a function of distance. I just roll up past traffic if there is any, lock up on any bolted down object, do my business, unlock, and roll out.

If anything, I feel like traveling at rush hour is actually strictly better for me. Cars being slow doesn't slow me down, but with the average speed being so much lower during rush hour, it seems like it makes it so if a driver hits me, it would be at a lower speed.


It's a reasonable solution, but let's not forget that simply walking is often at least as good a solution in many countries.


Getting a trailer (burley cargo in this case, but applies more generally) has been a game changer. I can even bike to ikea and bring back flat packed furniture with it. Or do the weekly groceries. The trailer can carry up to 100lbs iirc (I have an e-bike)

Short errands are much nicer with a bike: less effort than walking, much faster than walking, no parking headache at destination, cool breeze in your hair, and free (no gas, insurance, parking, tickets…)


Those people could be driving from 20 minutes away or on their way home from work, or running other errands or picking kids up from school or any number of things. Good for you though.


Fortunately trains and buses exist.


Not where I live. Not where a lot of people live.


Okay, but for the 80+% of people living in cities[1] in a country not particularly known for its density or its public transportation, buses and trains exist.

[1] https://www.census.gov/newsroom/press-releases/2022/urban-ru...


This is so absurd.

1 year ago, I lived in San Rafael (Marin county, Bay Area). I occasionally needed to go to Palo Alto for work meetings. The fastest public transit option was to take a 40 minute bus to Larkspur Landing, then a 30 minute ferry to the SF Ferry building, walk for 20 minutes, and then take Caltrain for 45 minutes or more and then walk from there. With transfers, at minimum it was a 2.5h journey, but typically 3+h

All to cover a 60 mile / 100 km distance


Fortunately bikes (and even e-bikes!) exist.

Edit: Also Google Maps says San Rafael to Palo Alto will take 2 hours give or take a few minutes on public transit, with 3 buses, but the middle one you could easily cut out with a bike or a 4 block walk. That doesn't really seem absurd at all for an occasional trip. People do 2 hour drives for an occasional trip and no one bats an eye.


This is just wrong. FWIW, I owned a bike, and this is wrong under both "bike" and "non-bike" conditions.

If you live directly next to the San Rafael central station, that'd be easiest/fastest. But San Rafael is much bigger than that. I'll get into that in a second. There are 2 basic options to do this trip:

Fastest option 1 was to go to San Rafael Station (I'll call it SR from here on out), then bus to SF, then bike/walk to the Caltrain station, which was about a 25 minute walk. The buses from SR to SF often ran as rarely as once per hour, and occasionally they just didn't show up at all. The ride took 30-60 minutes depending on traffic. There weren't always bike spaces on the bus, so sometimes you needed to lock your bike up in SR, and then you were going to be walking to Caltrain in SF. Because of the variability in traffic times, you had to leave incredibly early if you wanted to catch the fastest train to Palo Alto. And if you're going to California Avenue (which is where I was going), the express option basically doesn't exist.

Here's how that plays out: a 10 minute bike ride to SR station (or a 30-40 minute walk, depending on your walking speed); you've hopefully timed things right to catch a bus that leaves only once every 30-60 minutes, and you hope it actually shows up: otherwise, you're waiting 30-60 minutes for the next one. Then a 30-60 minute ride into SF. Then a ~5 minute bike ride or a 15-20 minute walk to Caltrain. Then a 45-60 minute ride to Palo Alto, and again the transfers aren't timed (they couldn't be, given where the bus dropoff is).

The second real alternative is replacing the first bus leg with a ferry leg by going to Larkspur Landing. There is the SMART train that goes there, but for some wild reason drops people off a 15 minute walk from the ferry and then has no timed transfer.

I did the journey dozens of times and never completed it in less than 2h 30m; it was more commonly 4h, and on more than one occasion it took much longer than that.


Could someone maybe give a high-level explanation of why commercial ILP solvers (e.g. Gurobi) are that much better than free/open-source ones? Is it because ILP is inherently that difficult to solve (I know it's NP-hard), so that the best solvers are just a large ensemble of heuristics for very specific sub-problems and thus no general "good" strategy has made its way into the public domain?


It’s mostly that they work closely with clients in a very hands on way to implement problem-specific speedups. And they’ve been doing this for 10-20 years. In the MILP world this means good heuristics (to find good starting points for branch & bound, and to effectively prune the B&B tree), as well as custom cuts (to cut off fractional solutions in a way that effectively improves the objective and solution integrality).

It’s common that when researchers in Operations Research pick a problem, they can often beat Gurobi and other solvers pretty easily by writing their own cuts & heuristics. The solver companies just do this consistently (by hiring teams of PhDs and researchers) and have a battery of client problems to track improvements and watch for regressions.


> the best solvers are just a large ensemble of heuristics for very specific sub-problems

The big commercial solvers have the resources (and the clients interested in helping) to have invested a lot of time in tuning everything in their solvers to real-world problems. Heuristics are part of that; recognizing simpler sub-problems or approximations that can be fed back into the full problem is also part of it.

I think a big part is that the OSS solvers are somewhat hamstrung by the combination of several issues: (1) the barrier to entry in SoTA optimizer development is very high, meaning that there are very few researchers/developers capable of usefully contributing both the mathematical and programming work needed in the first place, (2) if you are capable of (1), the career paths that make lots of money lead you away from OSS contribution, and (3) the nature of OSS projects means that "customers" are unlikely to contribute back the kind of examples, performance data, and/or profiling that is really needed to improve the solvers.

There are some exceptions to (2), although being outside of traditional commercial solver development doesn't guarantee being OSS (e.g. SNOPT, developed at Stanford, is still commercially licensed). A lot of academic solver work happens in the context of particular applications (e.g. Clarabel) and so tends to be more narrowly focused on particular problem classes. A lot of other fields have gotten past this bottleneck by having a large tech company acquire an existing commercial project (e.g. Mujoco) or fund an OSS project as a means of undercutting competitors. There are narrow examples of this for solvers (e.g. Ceres) but I suspect the investment to develop an entire general-purpose solver stack from scratch has been considered prohibitive.


Commercial solvers have a huge bag of tricks & good pattern detection mechanisms to detect which tricks will likely help the problem at hand.

If you know your problem structure then you can exploit it and it is possible to surpass commercial solver performance. But for a random problem, we stand 0 chance.


> solvers are just a large ensemble of heuristics for very specific sub-problems

Isn't that statement trivially applicable to anything NP-Hard (which ILP is, since it's equivalent to SAT)?
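
For what it's worth, the SAT-to-ILP direction of that equivalence is one linear constraint per clause over 0/1 variables, e.g. (x1 ∨ ¬x2) becomes x1 + (1 - x2) >= 1. A toy sketch of that encoding with SciPy's open-source MILP solver (purely illustrative):

    import numpy as np
    from scipy.optimize import milp, LinearConstraint, Bounds

    # clauses: (x1 v ~x2) -> x1 - x2 >= 0;  (x2 v x3) -> x2 + x3 >= 1
    A = np.array([[1, -1, 0],
                  [0,  1, 1]])
    res = milp(c=np.zeros(3),                            # feasibility only
               constraints=LinearConstraint(A, lb=[0, 1], ub=np.inf),
               integrality=np.ones(3),                   # all variables integer
               bounds=Bounds(0, 1))                      # i.e. binary
    print(res.x)  # a satisfying assignment, if one exists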


NP-hard is really hard, but it is hard (a) for polynomial running time, (b) for exact solutions, (c) on worst-case problems.

One might suspect that being fast enough on specific problems, for approximate solutions that still make/save a lot of money, might also be welcome. Ah, perhaps not!

E.g., in NYC, two guys had a marketing resource allocation problem, tried simulated annealing, and ran for days before giving up.

They sent me the problem statement via email, and in one week I had the software written; in the next week I used the IBM OSL (Optimization Subroutine Library) and some Lagrangian relaxation. In 500 primal-dual iterations, on a problem with 600,000 variables and 40,000 constraints, it found a feasible solution within 0.025% of optimality.

So, I'd solved their problem (for practical purposes, the 0.025% has to count as solving it) for free.

They were so embarrassed they wanted nothing to do with me. We never got to where I set a price for my work.

The problem those two guys had was likely that, if they worked with me, then I would understand their customers and, then, beat the guys and take their customers. There in NYC, that happened a second time.

If a guy is in, say, the auto business, and needs a lawyer, the guy might want the best lawyer but will not fear that the lawyer will enter the auto business as a powerful competitor. Similarly for a good medical doctor.

For an optimization guy saving, say, 5% of the operating costs of a big business, say, $billion in revenue a year, all the management suite will be afraid of the guy getting too much power and work to get him out -- Goal Subordination 101 or just fighting to retain position in the tribe.

After having some grand successes in applied math, where other people had the problem but then became afraid that I would be too powerful, I formulated:

If some technical, computing, math, etc. idea you have is so valuable, then start your own business exploiting that idea -- of course, you need a suitable business for the idea to be powerful.


> If a guy is in, say, the auto business, and needs a lawyer, the guy might want the best lawyer but will not fear that the lawyer will enter the auto business as a powerful competitor. Similarly for a good medical doctor.

> For an optimization guy saving, say, 5% of the operating costs of a big business, say, $billion in revenue a year, all the management suite will be afraid of the guy getting too much power and work to get him out -- Goal Subordination 101 or just fighting to retain position in the tribe.

The optimization guy will also not have the infrastructure to compete with the big business. Additionally, the optimization guy will likely not fight for a management position (not every great applied mathematician is a great manager; in my opinion, in particular because leading employees and office politics are very different skills).

So, there is no competition: simply pay the optimization guy a great salary and somewhat isolate him from the gory office politics - problem solved, everybody will live in peace.

But this is not what happens in your example; so the only reason that I can imagine is the usual, irrational bullying of nerds that many nerds know from the schoolyard.


I would argue that not spending the money, and showing upper management that the folks they already hired can actually get the job done, often contributes to an external contractor not getting hired.

You might want to ask: why didn't you ask the guys for money before starting to work for them? True, but I guess they were of the "show me results and then maybe we move on" kind. On the other hand, a 30-40k pilot project in this area is not difficult to negotiate if you're patient.

It takes so much more to run a business than lowering costs with clever math, so this step often comes at a later stage, when larger companies look for ways to stay competitive; that's when they start to take a look at their accounts and figure out that certain costs really stack up. Then you come in. The only real power you would have gotten over that company would have been those guys getting fired and replaced by a vendor - ideally that's you!


No, good algorithms for NP hard problems can be more than just heuristics.

Modern SAT solvers are a good example of this. CDCL is elegant.


A SAT solver without any preprocessing won't be competitive with a SoTA SAT solver.

CDCL is core to the problem, but it is not sufficient. You even have SAT solvers like CryptoMiniSAT that try to detect clauses that encode xor gates so they can use Gaussian Elimination.

This is also true of ILP solvers. Simplex + Branch & Cut is elegant. But that's not how you get to the top.


Scale and speed. For example, most quant trading firms run huge optimizations as often as possible. Open-source solvers often cannot even solve the problems (OOM exceptions, etc.).


In most MILP domains, the underlying engineering know-how is more critical than mathematical formulations or CS coding (that's why most OR groups operate independently of math or CS departments).

OSS never took off among professional engineers because they have "skin in the game", unlike math and CS folks who just reboot and pretend nothing is wrong.

