> where he argued against the other search leads that Google should use less machine-learning
This better echoes my personal experience with the decline of Google search than TFA: it seems to be connected to the increasing use of ML in that the more of it Google put in, the worse the results I got were.
It's also a good lesson for the new AI cycle we're in now. Often inserting ML subsystems into your broader system just makes it go from "deterministically but fixably bad" to "mysteriously and unfixably bad".
I think that’ll define the industry for the coming decades. I used to work in machine translation and it was the same. The older rules-based engines that were carefully crafted by humans worked well on the test suite and if a new case was found, a human could fix it. When machine learning came on the scene, more “impressive” models that were built quicker came out - but when a translation was bad no one knew how to fix it other than retraining and crossing one’s fingers.
As someone who worked in rules-based ML before the recent transformer (and unsupervised learning in general) hype, rules-based approaches were laughably bad. Only now are nondeterministic approaches to ML surpassing human-level performance on tasks, something which would not have been feasible, perhaps not even possible in a finite amount of human development time, via human-crafted rules.
The thing is that AI is completely unpredictable without human-curated results. Stable Diffusion made me relent and admit that AI is here now for real, but I no longer think so. It's more like artificial schizophrenia. It does have some results, often plausible-seeming results, but it's not real.
Yes, but I think the other lesson might be that those black box machine translations have ended up being more valuable? It sucks when things don't always work, but that is also kind of life, and if the AI version works more often, that is usually OK (as long as the occasional failures aren't so catastrophic as to ruin everything).
> Yes, but I think the other lesson might be that those black box machine translations have ended up being more valuable?
The key difference is how tolerant the specific use case is of a probably-correct answer.
The things recent-AI excels at now (generative, translation, etc.) are very tolerant of "usually correct." If a model can do more, and is right most of the time, then it's more valuable.
A case in point is the ubiquity of Pleco in the Chinese/English space. It’s a dictionary, not a translator, and pretty much every non-native speaker who learns or needs to speak Chinese uses it. It has no ML features and hasn’t changed much in the past decade (or even two). People love it because it does one specific task extremely well.
On the other hand ML has absolutely revolutionised translation (of longer text), where having a model containing prior knowledge about the world is essential.
Can’t help but read that and think of Tesla’s Autopilot and “Full Self Driving”. For some comparisons they claim to be safer per mile than human drivers … just don’t think too much about the error modes where the occasional stationary object isn’t detected and you plow into it at highway speed.
Relevant to the grandparent’s point: I am demoing FSD in my Tesla and what I find really annoying is that the old Autopilot allowed you to select a maximum speed that the car will drive. Well, on “FSD” apparently you have no choice but to hand full longitudinal control over to the model.
I am probably in the 0.01% of Tesla drivers who have the computer chime when I exceed the speed limit by some offset. Very regularly, even when FSD is in “chill” mode, the model will speed by +7-9 mph on most roads. (I gotta think that the young 20-somethings who make up Tesla's audience also contributed their poor driving habits to Tesla's training data set.) This results in constant beeps, even as the FSD software violates my own criteria for a speed warning.
So somehow the FSD feature becomes "more capable" while becoming much less legible to the human controller. I think this is a bad thing generally but it seems to be the fad today.
I have no experience with Tesla and their self-driving features. When you wrote "chill" mode, I assume it means the lowest level of aggressiveness. Did you contact Tesla to complain the car is still too aggressive? There should be a mode that tries to drive exactly the speed limit, where reasonable -- not over or under.
Yes, there is a “chill” mode that refers to maximum allowed acceleration and a “chill mode” that refers to the level of aggressiveness with Autopilot. With both turned on, the car still exceeds the speed limit by quite a bit. I am sure Tesla is aware.
> For some comparisons they claim to be safer per mile than human drivers
They are lying with statistics: in the more challenging locations and conditions, the AI will give up and let the human take over, or the human notices something bad and takes over. So Tesla's miles are cherry-picked, and their data is not open, so no third party can produce real statistics and compare apples to apples.
Tesla's driver assist, from the very beginning to now, seems to not possess object/decision permanence.
Here you can see it detected an obstacle (as evidenced by info on screen) and made a decision to stop; however, it failed to detect the existence of the object right in front of the car, promptly forgot about the object and the decision to stop, and happily accelerated over the obstacle. When tackling a more complex intersection it can happily change its mind about the exit lane multiple times, e.g. it will plan to exit on one side of a divider, replan to exit into oncoming traffic, then replan again.
Well, Tesla might be the single worst actor in the entire AI space, but I do somewhat understand your point. The lack of predictable failures is a huge problem with AI; I'm not sure that understandability by itself is. I will never understand the brain of an Uber driver, for example.
My guess: They are hoping user feedback will help them to fix the bugs later -- iterate to 99%. Plus, they are probably under unrealistic deadlines to deliver _something_.
But rule-based machine translation, from what I've seen, is just so bad. ChatGPT (and other LLMs) are miles ahead. After seeing what ChatGPT does, I can't even call rule-based machine translation "translation".
*Disclaimer: as someone who's not an AI researcher but did quite a lot of human translation work before.
Rules could never work for translation unless the incoming text was formatted in a specific way. Eg, you just couldn't translate a conversation transcript in a pro-drop language like Japanese into English sentence-by-sentence, because the original text just wouldn't have sentences in it. So you need some "intelligence" to know who is saying what.
I think - I hope, rather - that technically minded people who are advocating for the use of ML understand the shortcomings and hallucinations... but we need to be frank about the fact that the business layer above us (with a few rare exceptions) absolutely does not understand the limitations of AI and views it as a magic box where they type in "Write me a story about a bunny" and get twelve paragraphs of text out. As someone working in a healthcare-adjacent field, I've seen the glint in executives' eyes when talking about AI, and it can provide real benefits in data summarization and annotation assistance... but there are limits to what you should trust it with, and if it's something big-I Important then you'll always want to have a human vetting step.
> I hope, rather - that technically minded people who are advocating for the use of ML understand the shortcomings and hallucinations.
The people I see who are most excited about ML are business types who just see it as a black box that makes stock valuations go vroom.
The people that deeply love building things, who really enjoy the process of making itself, are profoundly sceptical.
I look at generative AI as sort of like an army of free interns. If your idea of a fun way to make a thing is to dictate orders to a horde of well-meaning but untrained highly-caffeinated interns, then using generative AI to make your thing is probably thrilling. You get to feel like an executive producer who can make a lot of stuff happen by simply prompting someone/something to do your bidding.
But if you actually care about the grit and texture of actual creation, then that workflow isn't exactly appealing.
We get it, you're skeptical of the current hype bubble. But that's one helluva no true Scotsman you've got going on there. Because a true builder, one that deeply loves building things wouldn't want to use text to create an image. Anyone who does is a business type or an executive producer. A true builder wouldn't think about what they want to do in such nasty thing as words. Creation comes from the soul, which we all know machines, and business people, don't have.
Using English, instead of C, to get a computer to do something doesn't turn you into a bureaucrat any more than using Python or Javascript instead does.
Only a person that truly loves building things, far deeper than you'll ever know, someone that's never programmed in a compiled language, would get that.
> Using English, instead of C, to get a computer to do something doesn't turn you into a bureaucrat any more than using Python or Javascript instead does.
If one uses English in as precise a way as one crafts code, sure.
Most people do not (cannot?) use English that precisely.
There's little technical difference between using English and using code to create...
... but there is a huge difference on the other side of the keyboard, as lots of people know English, including people who aren't used to fully thinking through a problem and tackling all the corner cases.
> Most people do not (cannot?) use English that precisely.
No one can, which is why any place human interaction needs anything anywhere close to the determinacy of code, normal natural language is abandoned for domain-specific constructed languages, built from pieces of natural language with meanings crafted especially for the particular domain, as the interface language between the people (and often formalized domain-specific human-to-human communication protocols with specs as detailed as you’d see from the IETF).
Yeah, I was also reading their response and was confused. "Creation comes from the soul, which we all know machines, and business people, don't have" ... "far deeper than you'll ever know", I mean, come on.
I’m not optimistic on that point: the executive class is very openly salivating at the prospect of mass layoffs, and that means a lot of technical staff aren’t quick to inject some reality – if Gartner is saying it’s rainbows and unicorns, saying they’re exaggerating can be taken as volunteering to be laid off first even if you’re right.
Yeah but what comes after the mass layoffs? Getting hired to clean up the mess that AI eventually creates? Depending on the business it could end up becoming more expensive than if they had never adopted GenAI at all. Think about how many companies hopped on the Big Data Bandwagon when they had nothing even coming close to what "Big Data" actually meant. That wasn't as catastrophic as what AI would do but it still was throwing money in the wrong direction.
I’m sure we’re going to see plenty of that but from the perspective of a person who isn’t rich enough to laugh off unemployment, how does that help? If speaking up got you fired, you won’t get your old job back or compensation for the stress of looking in a bad market. If you stick around, you’re under more pressure to bail out the business from the added stress of those bad calls and you’re far more likely to see retribution than thanks for having disagreed with your CEO: it takes a very rare person to appreciate criticism and the people who don’t aren’t going to get in the situation of making such a huge bet on a fad to begin with – they’d have been more careful to find something it’s actually good for.
> technically minded people who are advocating for the use of ML understand the shortcomings and hallucinations
really, my impression is the opposite. They are driven by doing cool tech things and building fresh product, while getting rid of "antiquated, old" product. Very little thought is given to the long-term impact of their work. Criticism of the use cases is often hand-waved away because you are messing with their bread and butter.
> but we need to be frank about the fact that the business layer above us (with a few rare exceptions) absolutely does not understand the limitations of AI and views it as a magic box where they type in
I think we also need to be aware that this business layer above us often sees __computers__ as a magic box where they type in. There's definitely a large spectrum of how magical this seems to that layer, but the issue remains that there are subtleties that are often important but difficult to explain without detailed technical knowledge. I think there's a lot of good ML can do (being an ML researcher myself), but I often find it ham-fisted into projects simply to say that the project has ML. I think the clearest flag to any engineer that the layer above them has limited domain knowledge is how much importance they place on KPIs/metrics. Are they targets or are they guides? Because I can assure you, all metrics are flawed -- but some metrics are less flawed than others (and benchmark hacking is unfortunately the norm in ML research[0]).
[0] There's just too much happening so fast and too many papers to reasonably review in a timely manner. It's a competitive environment, where gatekeepers are competitors, and where everyone is absolutely crunched for time and pressured to feel like they need to move even faster. You bet reviews get lazy. The problems aren't "posting preprints on twitter" or "LLMs giving summaries", it's that the traditional peer review system (especially in conference settings) poorly scales and is significantly affected by hype. Unfortunately I think this ends up railroading us in research directions and makes it significantly challenging for graduate students to publish without being connected to big labs (aka, requiring big compute) (tuning is another common way to escape compute constraints, but that falls under "railroading"). There's still some pretty big and fundamental questions that need to be chipped away at but are difficult to publish given the environment. /rant
So... obviously SOAP was dumb[1], and lots of people saw that at the time. But SOAP was dumb in obvious ways, and it failed for obvious reasons, and really no one was surprised at all.
ML isn't like that. It's new. It's different. It may not succeed in the ways we expect; it may even look dumb in hindsight. But it absolutely represents a genuinely new paradigm for computing and is worth studying and understanding on that basis. We look back to SOAP and see something that might as well be forgotten. We'll never look back to the dawn of AI and forget what it was about.
[1] For anyone who missed that particular long-sunken boat, SOAP was a RPC protocol like any other. Yes, that's really all it was. It did nothing special, or well, or that you couldn't do via trivially accessible alternative means. All it had was the right adjective ("XML" in this case) for the moment. It's otherwise forgettable, and forgotten.
ML has already succeeded to the point that it is ubiquitous and taken for granted. OCR, voice recognition, spam filters, and many other now boring technologies are all based on ML.
Anyone claiming it’s some sort of snake oil shouldn’t be taken seriously. Certainly the current hype around it has given rise to many inappropriate applications, but it’s a wildly successful and ubiquitous technology class that has no replacement.
Yeah, I'm staring at my use of chatgpt to write a 50-line Python program that connected to a local sqlite db and ran a query; for each element returned, made an API call or ran a query against a remote postgres db; depending on the results of that API call, made another API call; saved the results to a file; and presented results in a table.
Chatgpt generated the entirety of the above w/ me tweaking one line of code and putting creds in. I could have written all of the above, but it probably would have taken 20-30 minutes. With chatgpt I banged it out in under a minute, helped a colleague out, and went on my way.
Chatgpt absolutely is a real advancement. Before they released gpt4, there was no tech in the world that could do what it did.
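For a sense of what that looks like, here's a minimal sketch of that kind of glue script; the endpoint URL, credentials, and table/column names are all made-up placeholders, not the actual program that got generated:

```python
# A minimal sketch of the kind of glue script described above; the database
# files, endpoint URL, credentials, and table/column names are all
# hypothetical placeholders.
import csv
import sqlite3

import psycopg2  # assumed available for the remote Postgres queries
import requests

LOCAL_DB = "local.db"                                  # hypothetical
API_URL = "https://api.example.com/items"              # hypothetical
PG_DSN = "dbname=remote user=me host=db.example.com"   # hypothetical

rows_out = []

# 1. Query the local sqlite database.
with sqlite3.connect(LOCAL_DB) as local:
    items = local.execute("SELECT id, name FROM items").fetchall()

# 2. For each element, call an API, falling back to the remote Postgres db.
pg = psycopg2.connect(PG_DSN)
for item_id, name in items:
    resp = requests.get(f"{API_URL}/{item_id}", timeout=10)
    if resp.ok:
        status = resp.json().get("status", "unknown")
    else:
        with pg.cursor() as cur:
            cur.execute("SELECT status FROM items WHERE id = %s", (item_id,))
            row = cur.fetchone()
            status = row[0] if row else "missing"

    # 3. Depending on the result, make a follow-up API call.
    if status == "stale":
        requests.post(f"{API_URL}/{item_id}/refresh", timeout=10)

    rows_out.append((item_id, name, status))
pg.close()

# 4. Save the results to a file.
with open("results.csv", "w", newline="") as f:
    csv.writer(f).writerows([("id", "name", "status"), *rows_out])

# 5. Present the results as a simple table.
print(f"{'id':<8}{'name':<24}status")
for item_id, name, status in rows_out:
    print(f"{item_id:<8}{name:<24}{status}")
```

Nothing in there is hard; the point is that it's pure boilerplate glue, which is exactly the kind of thing the model bangs out in seconds.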
For what it's worth, I do not remember a time when YouTube's suggestions or search results were good. Absurdities like that happened 10 and 15 years ago as well.
These days my biggest gripe is that they put unrelated ragebait or clickbait videos in search results that I very clearly did not search for - often about American politics.
15 years ago, I used to keep many tabs of youtube videos open just because the "related" section was full of interesting videos. Then each of those videos had interesting relations. There was so much to explore before hitting a dead-end and starting somewhere else.
Now the "related" section is gone in favor of "recommended" samey clickbait garbage. The relations between human interests are too esoteric for current ML classifiers to understand. The old Markov-chain style works with the human, and lets them recognize what kind of space they've gotten themselves into, and make intelligent decisions, which ultimately benefit the system.
If you judge the system by the presence of negative outliers, rather than positive, then I can understand seeing no difference.
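To make the "Markov-chain style" point concrete, here's a minimal sketch (with made-up session data) of item-to-item related videos as plain observed transition frequencies - the kind of relation a viewer can actually reason about:

```python
# A minimal sketch of "Markov-chain style" related videos: candidates are
# ranked purely by how often viewers went from the current video to each
# other video, with no per-user personalization. Session data is made up.
from collections import Counter, defaultdict

sessions = [
    ["intro-to-sorting", "quicksort-visualized", "heap-deep-dive"],
    ["intro-to-sorting", "quicksort-visualized", "big-o-explained"],
    ["quicksort-visualized", "heap-deep-dive"],
]

# Count observed transitions: video -> next video watched.
transitions = defaultdict(Counter)
for watched in sessions:
    for current, nxt in zip(watched, watched[1:]):
        transitions[current][nxt] += 1

def related(video, k=5):
    """Top-k 'related' videos: the most common next steps from this one."""
    counts = transitions[video]
    if not counts:
        return []
    total = sum(counts.values())
    return [(nxt, n / total) for nxt, n in counts.most_common(k)]

print(related("quicksort-visualized"))
# [('heap-deep-dive', 0.666...), ('big-o-explained', 0.333...)]
```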
>The relations between human interests are too esoteric for current ML classifiers to understand.
I would go further and say that it is impossible. Human interests are contextual and change over time, sometimes in the span of minutes.
Imagine that all the videos on the internet would be on one big video website. You would watch car videos, movie trailers, listen to music, and watch porn in one place. Could the algorithm correctly predict when you're in the mood for porn and when you aren't? No, it couldn't.
The website might know what kind of cars, what kind of music, and what kind of porn you like, but it wouldn't be able to tell which of these categories you would currently be interested in.
I think current YouTube (and other recommendation-heavy services) have this problem. Sometimes I want to watch videos about programming, but sometimes I don't. But the algorithm doesn't know that. It can't know that without being able to track me outside of the website.
>I would go further and say that it is impossible. Human interests are contextual and change over time, sometimes in the span of minutes.
There's a general problem in the tech world where people seem to inexplicably disregard the issue of non-reducibility. The point about the algorithm lacking access to necessary external information is good.
A dictionary app obviously can't predict what word I want to look up without simulating my mind-state. A set of probabilistic state transitions is at least a tangible shadow of typical human mind-states who make those transitions.
I think there are things they could do, and ML could maybe help (a rough sketch of a couple of these ideas follows the list):
* They could let me directly enter my interests instead of guessing
* They could classify videos by expertise (tags or ML) and stop recommending beginner videos to someone who expresses an interest in expert videos.
* They could let me opt out of recommending videos I've already watched
* They could separate the site into larger categories and stop recommending things not in that category. For me personally, when I go to youtube.com I don't want music, but 30-70% of the recommendations are for music. If they split into 2 categories (videos.youtube.com - no music) and (music.youtube.com - only music), they'd end up recommending far more that I'm actually interested in at the time. They could add other broad categories (gaming.youtube.com, documentaries.youtube.com, science.youtube.com, cooking.youtube.com, ...., as deep as they want). Classifying a video could be ML- or creator-decided. If you're only allowed one category, there would be an incentive not to mis-classify. If they need more incentive, they could dis-recommend your videos if you mis-classify too many/too often.
* They could let me mark videos as watched and actually track that, the same as read/unread email. As it is, if you click "not interested -> already watched" they don't mark the video as visibly watched (the red bar under the video). Further, if you start watching again you lose the red bar (it gets reset to your current position). I get that tracking where you are in a video is different from email, but at the same time, (1) if I made it to 90% of the way through then, for me at least, that's "watched" - same as "read" for email - and I'd like it "archived" (don't recommend this to me again) even if I start watching it again (same as reading an email marked as "read").
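As promised above, a rough sketch of a couple of these ideas (the category split and the already-watched filter); everything here - the data, categories, and the 90% threshold - is hypothetical:

```python
# A minimal sketch of the category-bucket and "already watched" ideas above.
# The Video fields, the 90% threshold, and the sample data are hypothetical.
from dataclasses import dataclass

@dataclass
class Video:
    video_id: str
    category: str          # e.g. "music", "gaming", "science"
    watch_fraction: float  # how much of the video this user has already seen

WATCHED_THRESHOLD = 0.9  # assume >=90% watched counts as "watched"

def filter_candidates(candidates, allowed_categories,
                      watched_threshold=WATCHED_THRESHOLD):
    """Drop candidates outside the chosen categories or already watched."""
    return [
        v for v in candidates
        if v.category in allowed_categories
        and v.watch_fraction < watched_threshold
    ]

candidates = [
    Video("a1", "music", 0.0),
    Video("b2", "science", 0.95),  # effectively watched; should be dropped
    Video("c3", "science", 0.1),
    Video("d4", "cooking", 0.0),
]

# A user browsing a hypothetical "videos.youtube.com" who opted out of music:
print(filter_candidates(candidates, allowed_categories={"science", "cooking"}))
# -> [Video(video_id='c3', ...), Video(video_id='d4', ...)]
```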
They probably optimize your engagement NOW - with clickbaity videos. So their KPIs show big increases. But in the long term you realize that what you watch is garbage and stop watching altogether.
Someone probably changed the engine that shows videos for you - exactly as with search.
> Or when they didn't show 3 unskippable ads in a 5 minute video.
On desktop Chrome, a modern ad-blocking browser extension will block 100% of YouTube adverts. I haven't watched one, literally, in years. I don't watch YouTube from a mobile phone, but I think the situation is different. (Can anyone else comment about the mobile experience?)
> I do remember when Youtube would show more than 2 search results per page on my 23" display.
Wait what?! You "Consume Content" on a COMPUTER? What are you some kinda grandpa? Why aren't you consuming content from your phone like everyone else? Or casting it from your phone to your SMART TV! Great way to CONSUME CONTENT!
Lol, Youtube on Apple TV is great. Mostly because I either need to find something fast or I switch it off because the remote is not conducive to skipping. But the only time I watch Youtube on my computer is for a specific video. The waste of space is horrendous. Same with Twitter (rarely visited), just a 3/4 inches wide column of posts on my 24 inch screen.
I'm not consuming the content on my phone, because the user experience of using these services on my phone sucks. Just the app vs website difference with urls is a difference in behavior I hate let alone all the UI differences that make the mobile experience awkward.
YouTube seems to treat popular videos as their own interest category and it’s very aggressive about recommending them if you show any interest at all. If you watch even one or two popular videos (like in the millions of views), suddenly the quality of the recommendations drops off a cliff, since it is suggesting things that aren’t relevant to your interest categories, it’s just suggesting popular things.
If I entirely avoid watching any popular videos, the recommendations are quite good and don’t seem to include anything like what you are seeing. If I don’t entirely avoid them, then I do get what you are seeing (among other nonsense).
A long, long time ago, youtube "staff" would manually put certain videos at the top of the front page when they started. I'm sure there were biases and prioritization of marketing dollars, but at least there was a human recommending it, compared to poorly recorded early Family Guy clips. I don't know when they stopped manually adding "editors/staff choice" videos, but I recall some of my favorite early youtubers like CGP Grey claiming that those recommendations built their careers.
See this >15-year-old video "How to get featured on YouTube" - https://www.youtube.com/watch?v=-uzXeP4g_qA, which I remember as being originally uploaded to the official Youtube channel but looks like it's been removed now, this reupload is from October 2008.
It all depends on your use case but a lot of people seem to be in agreement it fell off in the mid to late 10s and the suggestions became noticeably worse.
YT Shorts recommendations are a joke.
I'm an atheist and very rarely watch anything related to religion, and even so Shorts has put me into 3 or 4 live prayers/scams (not sure which) over the last few months.
Similarly, Google News. The "For You" section shows me articles about astrology because I'm interested in astronomy. I get suggestions for articles about I-80 because I search for I-80 traffic cams to get traffic cam info for Tahoe, but it shows me I-80 news all the way across the country; suggestions about Mountain View because I worked there (for google!) over 3 years ago; commanders being fired from the Navy (because I read a couple of articles once); it goes on and on. From what I can tell, there are no News Quality people actually paying attention to their recommendations (and "Show Fewer" doesn't actually work. I filed a bug and was told that while the desktop version of the site shows Show Fewer for Google News, it doesn't actually have an effect).
Part of the reason I switched from google to duckduckgo for searching was I didn't WANT "personalization" I want my search results to be deterministic. If I am in Seattle and search for "ducks" I want the exact fucking same search results as if I travel to Rio de Janeiro and search for "ducks".
Honestly, I'd prefer my voice assistant (siri mostly) to be like that as well. It was at first, and I think everyone hated that lol.
YT Shorts itself is kind of a mystery to me. It's an objective degradation of the interface; why on earth would I want to use it? It doesn't even allow adjustment of the playback speed or scrubbing!
So, there's a few ways to explain it. From a business strategy level, TikTok exists, and is a threat to YouTube, so we need to compete with it.
From a user perspective, Shorts highlights a specific format of YouTube that happened to have been around for a lot longer than people realize. TikTok isn't anything new, Vine was doing exactly the same thing TikTok was a decade prior. It was shut down for what I can only assume was really dumb reasons. A lot of Viners moved to YouTube, but they had to change their creative process to fit what the YouTube algorithm valued at the time: longer videos.
Pre-Shorts, there really wasn't a good place on YouTube for short videos. Animators were getting screwed by the algorithm because you really can't do daily uploads of animation[0] and whatever you upload is going to be a few minutes max. A video essayist can rack up hundreds of thousands of hours of watch time while you get maybe a thousand.
(Fun fact: YouTube Shorts status was applied retroactively to old short videos, so there's actually Shorts that are decades old. AFAIK, some of the Petscop creator's old videos are Shorts now.)
But that's why users or creators would want to use Shorts. A lot of the UX problems with Shorts boils down to YouTube building TikTok inside of YouTube out of sheer corporate envy. To be clear, they could have used the existing player and added short-video features on top (e.g. swipe-to-skip). In fact, any Short can be opened in the standard player by just changing the URL! There's literally no difference other than a worse UI because SOMEONE wanted "launched a new YouTube vertical" on their promo packet!
FWIW the Shorts player is gradually getting its missing features back but it's still got several pain points for me. One in particular that I think exemplifies Shorts: if I watch Shorts on a portrait 1080p monitor - i.e. the perfect thing to watch vertical video on - you can't see comments. When you open the comments drawer it doesn't move over enough and the comments get cut off. The desktop experience is also really bad; occasionally scrolling just stops working, or it skips two videos per mousewheel event, or one video will just never play no matter how much I scroll back and forth.
If you’re watching a single subject-of-interest video on your phone (TikTok type of content), it’s great. But landscape video is more pleasant, and there’s a reason we moved away from 4:3 for media. That does mean actually watching the videos, though, and what I see is a lot of skipping.
I only get those when it's new content with <20 likes and they are testing it out. Doesn't bother me, I like to receive some untested content - even though 99% of it is pure crap (like some random non-sense film with a trendy music on top).
Just because you're an atheist doesn't mean you won't engage with religious content though. YT rewards all kinds of engagement not just positive ones. I.e. if you leave a snide remark or just a dislike on a religious short that still counts as engagement.
Yes I know, not the case, and before you ask, I also don't engage with atheist videos. But that's only one example: the recommendations are really bad in a lot of ways for me.
But I associate YouTube promotions with garbage anyhow. The few things I might buy, like Tide laundry detergent, I buy entirely despite the occasional YouTube promotion.
Lmao. I'm very positive that the conversion rate for placing an atheist in a live mass out of the blue is very very very low. Because I never stayed for more than 3 seconds, I'm not sure if it's real religious content or a scam, though - and they don't even let me report live shorts :(
I think it's probably pushing patterns it sees in other users.
There are videos I'll watch multiple times - music videos are the obvious kind - but others I'm just not absorbing/understanding the first time and will go back and rewatch later.
But I guess youtube has no way to understand which ones I'll rewatch and which I don't want to see ever again, and if my behavior is used as training data for other users like you, they're probably screwed.
A simple "rewatch?" line along the top would make this problem not so brain dead bad, imho. Without it you just think the algorithm is bad (although maybe it is? I don't know).
This is happening to me too, but from the kind of videos it's suggested for, I suspect that people actually do tend to rewatch those particular videos, hence the recommendation.