
Ha, after writing this post I found a Wikipedia article on this effect: https://en.wikipedia.org/wiki/Display_motion_blur


From the blog post: "more than 99% of them had no activity in the last month" https://developers.googleblog.com/en/google-url-shortener-li...

This is a classic product data decision-making fallacy. The right question is "how much total value do all of the links provide", not "what percent are used".


> The right question is "how much total value do all of the links provide", not "what percent are used".

Yes, but it doesn't bring home the sweet promotion, unfortunately. Ironically, if 99% of them don't see any traffic, you can scale back the infra, run it on 2 VMs, and make sure a single person can keep it up as a side quest, just for fun (but, of course, pay them for their work).

This beancounting really makes me sad.


Configuring a static set of redirects would take a couple of hours to set up and require literally zero maintenance forever.

Amazon should volunteer a free-tier EC2 instance to help Google in their time of economic struggles.


This is what I mean, actually.

If they’re so inclined, Oracle has an always free tier with ample resources. They can use that one, too.


If they wanted the sweet promotion they could add an interstitial. Yes, people would complain, but at least the old links would not stop working.


> just for fun (but, of course, pay them for their work).

Doing things for fun isn't in Google's remit.


Then they shouldn't have offered it as a free service in the first place. It's like that discussion about how Google, in all its 2-ton ADHD gorilla glory, will enter an industry, offer a (near) free service or product, decimate all competition, then decide it's not worth it and shut down, leaving behind a desolate crater of ruined businesses and angry, abandoned users.


I’m still sore about reader. Gap has never been filled for me.


Alas, it was, once upon a time.


It used to be. AdSense came from 20% time!


Indeed. I've probably looked at less than 1% of my family photos this month but I still want to keep them.


I bet 99% of URLs that exist on the public web had no activity last month. Might as well delete the entire WWW because it's obviously worthless.


Where'd all my porn go!?


From Google's perspective, the question is "How many ads are we selling on these links" and if it's near zero, that's the value to them.


Don't be confused! That's not how they made the decision; it's how they're selling it.


So how did they decide?


A new person got hired after the old person left. The new person says "we can save x% by shutting down these links; 99% aren't used" and the new boss, who's only been there for 6 months, says "yeah, sure".

Why does Google kill any project? The people who made it moved on, and the new people don't care because it doesn't make their resumes look any better.

Basically, nobody wants to own this service, and it requires upkeep to keep it running alongside other Google services.

Google's history shows a clear choice to reward new projects, not old ones.

https://killedbygoogle.com/


I expect it showed up as a cost on a budget sheet, and then an analysis was done about the impact of shutting it down.


You can't get promoted at Google for not changing anything.


They launched Firebase Dynamic Links and someone didn't like the overlap.


> "more than 99% of them had no activity in the last month"

Better to have a short URL and not need it, than need a short URL and not have it IMO.


What fraction of indexed Google sites, Youtube videos, or Google Photos were retrieved in the last month? Think of the cost savings!


YouTube already does this, to some extent, by slowly reducing the quality of your videos if they're not accessed frequently enough.

Many videos I uploaded in 4K are now only available in 480p, after about a decade.


I don’t think they’re actually that dumb. I think the dirty secret behind “data driven decision making” is managers don’t want data to tell them what to do, they want “data” to make even the idea of disagreeing with them look objectively wrong and stupid.


It's a bit like the difference between "rule of law" and "rule by law" (aka legalism).

It's less "data-driven decisions", more "how to lie with statistics".


"Data-driven decision making"


There may be tricks that I don't know about. One quick experimental answer I can give: if I change to looping over the sums and rerun Benchmark 3, my time in the aten::sum CUDA kernel increases from 0.779s (before) to 0.840s (after). So CUDA doesn't seem to automagically handle this.

I will note that these grouped operations occasionally cause a net loss in performance compared to "naive" looping, since it involves calling PyTorch's "x.view(...)" which is usually ~instant but sometimes adds some extra CUDA operations on the backward pass. It always reduces the time spent in aten::add, but adds these extra ops. A really smart vectorizer would use heuristics to decide how/whether to group operations according to the target hardware; my current vectorizer just does the grouping every time.
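As a purely illustrative sketch (pure-Python stand-ins, not Vexpr's or PyTorch's actual code), the grouping transformation above replaces a loop of per-group reductions with one reduction over a stacked batch; in PyTorch the stacked version becomes a single batched aten::sum launch instead of N separate ones:

```python
# Conceptual sketch of "grouping" equal-length reductions. The names
# and shapes here are hypothetical; they only show that the transform
# preserves results while trading N small calls for one batched pass.

def naive_sums(groups):
    # one reduction call per group, like a Python loop of torch.sum
    return [sum(g) for g in groups]

def grouped_sums(groups):
    # "stack" equal-length groups and reduce along the inner axis at once,
    # analogous to x.view(n, k).sum(dim=1) on a concatenated tensor
    out = [0] * len(groups)
    for col in zip(*groups):  # transpose: iterate columns of the stack
        out = [acc + x for acc, x in zip(out, col)]
    return out

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert naive_sums(groups) == grouped_sums(groups) == [6, 15, 24]
```

In the real tensor setting the stacking itself (view/cat) is the overhead the comment above describes, which is why the grouped form can occasionally lose to the naive loop.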


Yeah, one unspoken theme of this blog post is "look how nice torch.compile is" :)

Fun fact, I had to put in extra work to get torch.compile working with my code, for understandable reasons. My library, Vexpr, literally runs an interpreter inside of Python, reading a big tree-like namedtuple-of-namedtuples "expression" data structure and evaluating it recursively. That data structure was way too fancy for torch.compile's guards, so I actually wrote code [1] that converts a Vexpr expression into a big Python code string and evals it, factoring the interpreter out of the code, then I pass that eval'd string into torch.compile.
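For illustration only (hypothetical mini-expression type and names, not Vexpr's actual API), the trick of factoring the interpreter out into a generated code string might look like:

```python
from collections import namedtuple

# Hypothetical tree node; Vexpr's real namedtuple-of-namedtuples
# structures are much richer than this.
Expr = namedtuple("Expr", ["op", "args"])

def to_source(e):
    """Recursively render an expression tree as a Python source string."""
    if not isinstance(e, Expr):
        return repr(e)
    if e.op == "var":
        return e.args[0]
    args = ", ".join(to_source(a) for a in e.args)
    return f"{e.op}({args})"

def compile_expr(e, arg_names, env):
    """Eval a generated lambda so a tracer like torch.compile sees plain
    Python code instead of the tree-walking interpreter."""
    src = f"lambda {', '.join(arg_names)}: {to_source(e)}"
    return eval(src, env)

# usage: compile max(x, y) + 1 against a small primitive environment
expr = Expr("add", (Expr("max", (Expr("var", ("x",)), Expr("var", ("y",)))), 1))
f = compile_expr(expr, ["x", "y"], {"add": lambda a, b: a + b, "max": max})
assert f(2, 5) == 6
```

The generated string contains only ordinary calls, so guard machinery never has to reason about the fancy expression data structure itself.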

One torch.compile capability I would be excited to see is compatibility with torch.vmap. One selling point of Vexpr is that you can use vmap with it, so I was sad when I found I couldn't use vmap and still support torch.compile. This made me convert a bunch of my GP kernels [2] to be batch-aware. (This missing capability is also understandable -- both vmap and compile are new.)

Anyway, I'm a fan of what y'all are doing!

[1] https://github.com/outergroup/vexpr/blob/e732e034768443386f9... [2] https://github.com/outergroup/outer-loop-cookbook/blob/5d94c...


I spend a lot of sweat on the guards - I am very interested in how it failed! Can you say more? Did guard creation fail? Or did guard check_fn perf overhead destroy it?

> One torch.compile capability I would be excited to see is compatibility with torch.vmap

We added support for torch.func.vmap, iirc - check out test_higher_order_ops.py, grep for vmap.


Glad to hear :)

Yes, I'm off doing my own thing now. Deep Learning went so much further than I ever expected, and now I'm drawn to all the things that can be built today. Who knows, maybe I'll swing back into neuroscience in a few years. (Still friends with my old coworkers / bosses.)


I wondered about this same thing. Your logic about cache/registers is certainly true on CPUs, but what about GPUs? Hence this blurb:

> I studied the CUDA traces closely and found that vectorization does indeed reduce many aspects of the GPU workload, greatly reducing the number of operations and decreasing the total amount of time spent on the fundamental computations of the algorithm. However it also introduces overhead (mentioned above) by interspersing operations that permute and reorder the tensors, or splitting them into groups then concatenating results. Sometimes the reduced “fundamental” time outweighs the additional overhead, while other times the overhead outweighs the reduction in fundamental time.

Here are some examples not included in the blog post:

- Total time spent in aten::cdist kernel

  - Baseline: 2.834s (4900 calls)
  - Vectorized: 2.686s (500 calls)
- Total time spent in aten::mul kernel

  - Baseline: 5.745s (80700 calls)
  - Vectorized: 5.555s (8100 calls)
This nice little win applies to tons of other kernels, almost across the board. As you point out, CPU intuition suggests this should have been slower, so this was an interesting outcome.

On the other hand, some specific increases occur:

- Total time spent in aten::cat kernel

  - Baseline: 0.680s
  - Vectorized: 1.849s
So working in fewer, larger batches doesn't only enable outrunning the GPU. It decreases the total GPU workload... then adds some overhead. But some of this overhead could be removed with custom CUDA kernels, so I think this is an interesting direction even if you solve the CPU problem some other way.

(The pow(x, 2) is only there in the toy code, not my actual kernel, so I didn't performance-tune it.)


Aha, I was hoping to learn about something like this, thanks for sharing. I'll try this some time. PyTorch does use different threads for the forward and backward pass, so as you suggest, setting that flag might only improve the forward pass.


The CUDA Runtime and Driver APIs have per-thread state, so using threads would unfortunately bypass our trick here to set the flag. Assuming you're on Linux, I might suggest creating a shared library to intercept calls to the Driver API, as all Runtime functions are implemented as wrappers around Driver functions. You'd have to intercept all calls to context creation and flag setting:

  * `cuCtxCreate`

  * `cuCtxCreate_v3`

  * `cuCtxSetFlags`

  * `cuDevicePrimaryCtxRetain`

  * `cuDevicePrimaryCtxSetFlags`
... and make sure that the three least significant bits of any `flags` variable are set to `CU_CTX_SCHED_BLOCKING_SYNC`.
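Concretely, the fix-up applied inside each intercepted call is just a bit mask. A small sketch of that logic (constant values as defined in cuda.h; shown in Python for clarity, though the interposer itself would be C):

```python
# From cuda.h: the three least significant bits of the context flags
# select the scheduling policy (the CU_CTX_SCHED_* values).
CU_CTX_SCHED_BLOCKING_SYNC = 0x04
CU_CTX_SCHED_MASK = 0x07

def force_blocking_sync(flags: int) -> int:
    """Clear the scheduling bits, then set blocking sync,
    leaving any other flag bits untouched."""
    return (flags & ~CU_CTX_SCHED_MASK) | CU_CTX_SCHED_BLOCKING_SYNC

assert force_blocking_sync(0x00) == 0x04          # AUTO -> blocking sync
assert force_blocking_sync(0x01) == 0x04          # SPIN -> blocking sync
assert force_blocking_sync(0x10 | 0x02) == 0x14   # non-sched bits preserved
```

The C interposer would apply this same masking to the `flags` argument before forwarding each call to the real Driver API function resolved via `dlsym(RTLD_NEXT, ...)`.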

cuDevicePrimaryCtxSetFlags: https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PR...

dlsym(3): https://man.archlinux.org/man/dlsym.3.en

ld.so(8): https://man.archlinux.org/man/ld.so.8.en


One of my favorite photos from space is a photo of Alan Bean: https://upload.wikimedia.org/wikipedia/commons/9/97/Apollo12...


This is my iPhone case. It's an amazing photo by Pete Conrad.

https://society6.com/product/apollo-12-face-of-an-astronaut_...


One nitpick:

> That some of the most heavily used and reliable software in the world is built on C is proof that the flaws are overblown, and easy to detect and fix.

Easy to detect? Multiple times per year we find out about a security bug in Linux or Windows that has existed since the 90s.


The full index: http://history.nasa.gov/computers/contents.html

(it's just the "Index Page" button from the bottom)

