Show HN: YouTube Summaries Using GPT (eightify.app)
147 points by ax8080 on Jan 27, 2023 | hide | past | favorite | 120 comments
Hi, I'm Alex. I created Eightify to take my mind off things during a weekend, but I was surprised that my friends were genuinely interested in it. I kept going, and now it's been nine weeks since I started.

I got the idea to summarize videos when a friend sent me yet another lengthy video. This happens to me often: the title is so enticing, and then the video turns out to be nothing. By then I had been working with GPT for six months, so everything looked like a nail to me.

It's a Chrome extension, and I'm offering 5 free tries for videos under an hour. After that, you have to buy a package. I'm not making money yet, but it pays for GPT, which can be pricey for long texts. And some of Lex Fridman's podcasts are incredibly long.

I'm one of those overly optimistic people when it comes to GPT. So many people tell me, "Oh, it doesn't solve this problem yet; let's wait for GPT-4". The real issue is that their prompts are usually inadequate, and it takes anywhere from two days to two weeks to make them work: testing and debugging, preferably with automated tests. I believe you can solve many problems with GPT-3 already.

I would love to answer any questions you have about the product and GPT in general. I've invested at least 500 hours into prompt engineering. And I enjoy watching other people's prompts too!



I think it's funny that in the near future people are going to use ChatGPT to pad their content, and then consumers are going to use it in reverse to get a summary.


>I think it's funny that in the near future people are going to use ChatGPT to pad their content, and then consumers are going to use it in reverse to get a summary.

Brilliant observation! It's sort of like if you took the most extreme lossy data expansion algorithm -- and then fed the output of that through the most extreme lossy data compression algorithm...

User: "Computer, turn binary 1 -- into everything in the Universe..."

Computer: "OK, here it is..." (spits out a result which is [exa|zetta|yotta|ronna|quetta|???]bytes long...)

User: "Computer, now turn everything in the Universe back into 1..."

Computer: "Processing... this may take some time... please wait..." (puts up progress bar that increments so slowly that it appears not to move...)

<g>

Related quote by Douglas Adams:

"There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened."

- Douglas Adams, The Restaurant at the End of the Universe

Anyway, I could definitely see content creators padding with AI -- and consumers summarizing the content with it...


At some point people won't be needed in this crazy cycle.


How much energy will the chain of AIs and counter-AIs between content production and content consumption/discovery burn? More or less than Bitcoin does? Because we complain about that, but this crap would be even less beneficial than Bitcoin is. But how can one opt out of this game? If you do it unilaterally, it just means you lose. Government regulation, the usual solution to this kind of coordination problem? What would that even look like?


>But how can one opt out of this game?

Move to alternative webs such as Gopher or Gemini, which make it impractical / impossible to monetize based on clicks / engagement / amount of content. The issue isn't that you can generate spam. The issue is that you can make money by doing so.


This saves you from nothing.

While making money is a big game, it's not the only game. Your thoughts and actions lead to behaviors off your computer. This is why we see all kinds of political spam and propaganda trying to affect your choice at the ballot box.


As long as we assume that the HTML web doesn't completely die and remains significantly larger, it will always be the biggest target for those kinds of activities. Better to have a potential target of millions than a target of hundreds or thousands.


Depends what you're targeting. For example, targeting a much smaller, self-selected group of 'computer experts' for a watering-hole attack of one type or another is a great way to get into some juicy systems.

If a system is machine-readable and writable (and AI is rapidly expanding what counts as that), you must assume not only that you can be attacked, but that you are being attacked.


Gemini is an anti-intellectual artistic project that forbids interactivity and rich media like inline images, which massively increase learning and comprehension. If you're looking to learn, you'd want to avoid Gemini (and Gopher, which has similar restrictions).


Gemini has interactivity, it just needs to be server side. Of course it is limited but that is the point. I'll take "anti-intellectual artistic project" over "cesspool of lowest common denominator" any day.


Server-side interactivity is clearly not what anyone means by "interactivity" - you're intentionally twisting the usage of words.

> I'll take "anti-intellectual artistic project" over "cesspool of lowest common denominator" any day.

If everyone else thought like you, humanity would be set back millennia.

Not to mention that it's just flat-out insane to discard the massive amounts of educational content and knowledge present on the internet just because it happens to also contain some undesirable content.

This mindset is not one that we want to spread throughout our society, full-stop. Gemini is a threat to human progress.


> Government regulation, the usual solution to this kind of coordination problem? What would that even look like?

Which government is the question you should be asking.

For example, let's say I'm an authoritarian government that is hostile to other nations. In general there is a strong imperative to block GPT use in my own country, except by approved users making propaganda. At the same time, it is a very useful weapon to use against my enemies: I can turn the 'bullshit asymmetry principle' against them and drown their population in conflicting propaganda.

But unlike nuclear weapons, we really don't need multibillion-dollar facilities to make these things; midsized companies can build them easily enough. There is no opting out, unless of course you want a worldwide police state to ensure you're not making anything naughty with your computer. And you should realize your government, along with everyone else's, is making its own naughty versions of this kind of application anyway.

So, yeah, some smaller EU countries won't make one because of the law, but the US, China, and likely Russia will do it anyway, out of the same 'screw you, we have nukes' attitude that drives their other actions.


> How much energy will the chain of AIs and counter-AIs between content production and content consumption/discovery burn?

While I have plenty of cynicism about this and also expect it to at least partially play out like this, let me offer a more optimistic perspective on the same thing.

People come into media with different amounts of background knowledge and context. Currently, this is basically handled by a tiered system of knowledge distribution. As an example (though I think something similar exists outside the sciences too): scientists write papers, which are read by science communicators, who put out press releases, which are read by journalists, who write articles, all of which is read by various content creators who remix it into content tailored to their specific following. Part of that tailoring is knowing what context and knowledge your audience already has, and giving them enough new information for the content to be digestible without their needing to seek out other sources. So when ChatGPT-N is re-encoding the content for you, it can personalize it to your level of knowledge, neither wasting your time rehashing stuff you're already aware of nor skipping context you wouldn't necessarily have known you were missing.

This of course means that ChatGPT will need to know what you do/don't know...


> Because we complain about that, but this crap would be even less beneficial than Bitcoin is.

This is an issue of capital allocation. Enormous amounts of private money have been wasted chasing dreams of monopolizing currency (crypto), monopolizing taxis (Uber/Lyft), and of course monopolizing food delivery. There is little difference between government picking winners and private industry doing it. How many cumulative billions has Uber lost so far? Likely far more than the Chilean government lost in its futile attempt to develop a domestic automobile in the early '70s.

Now even more of that monopoly money is going to be shoveled into consumer AI plays that will continue to waste energy, not to mention pollute the internet further with half-baked 'content' with the stench of Wikipedia and ArtStation all over it.


>Government regulation, the usual solution to this kind of coordination problem? What would that even look like?

Maybe we should get an AI to write it.


Someone has to pay to run the AI though. With Bitcoin it's a bit different: you mine and you get money directly. Even if you have an AI burning GPU cycles all day generating content, you still gotta turn it into cash.


The motivation to pay is to generate better spam on the one side, and to spot that spam on the other. Plausibly, a lot of content will go through multiple iterations of both of those before reaching any actual human. Same game that's going on now, just cheaper (even more automated) and at higher volume.


In the US, billions of dollars are spent each presidential election cycle. If you think advertisements are the only use for AI, you're underestimating the scope of the problem.


This is exactly what the philosopher Nick Land has said.


We just need "Keywords" to help ChatGPT decide what to include in summaries. This will lead to keyword stuffing. And eventually it comes full-circle with SEO-like word soup so that ChatGPT is forced to output recipe blog spam. It's perfect.


> consumers are going to use it reversely to get a summary.

Isn't that pretty much why short videos are getting popular? Who has time for a 15-minute recipe video scripted for YouTube ads when you can find a one-minute CliffsNotes version on TikTok or YouTube Shorts? Back to the days of eHow, but with more personality and filters.


If you think about it, humans can take years to write a book, then someone makes CliffsNotes from it, and from there it gets boiled down into a summary of a sentence or two. We're constantly baking and then boiling down everything.


This app shows how dangerous it is to trust ChatGPT for anything even somewhat important.

I just threw in a simple example news video, and while it did summarize the video somewhat accurately (it got Senator Joe Manchin's name wrong), it missed complete segments! (the "Goodbye Toyota e-TNGA" segment)

[1]:https://www.youtube.com/watch?v=UM4n1Isfr6E

[2]: https://i.imgur.com/RyFStVm.png


Yes, it's still far from perfect, sorry about that. And it's not all GPT's fault; I'm sure I can greatly increase the quality on my side.


If you have access to chapter markers, that might be a solid way to segment


Your app is slick and works great. How can you increase the quality, though? I thought you just pass the subtitles to ChatGPT and return the results?


It's not that straightforward. I would say that the most important part of pre-processing is how to break the transcript into parts.

But there are many more things to improve. It's a pipeline: you can add more models, you can train them, you can change prompts, you can post-process the results. But it takes time, and each step takes more.


Ah ok then maybe I was too harsh. Your app is pretty cool. Great job!


Oh, it's okay! In the end, what matters is how much value it gives the user, not how complicated it is inside! thank you!


This is the death of all those tutorial videos that stretch 1 minute of information into a 20-minute video.


You know, I'm generally a fan of being brief. So many people use GPT to create more content. I am on the opposite side and I will reduce content.


Reducing content is more valuable for humans. Unfortunately all platforms incentivize the opposite.


If the AI can turn a video back into the image and text article it should've been in the first place, I will count it as a win.


Which is why sponsorblock is a thing


It would be nice if you didn't have to log in with your google account to use it, or at least that it didn't request excessive PII, such as name and profile picture.

There is no reason to request more than email.


I hastily connected google auth and seemed to choose the minimum of data. I'll see if I can use just email (I definitely don't need anything else). I need auth to limit the use, otherwise I could wake up bankrupt one day :)


No payment limits?


Well, I set the limit in OpenAI, yes. But if the service stops working I will also be very upset.


But then you can't monetize it as easily.


This is a neat project, but it also makes me wonder how much useful information is actually provided by an average YouTube video.

Taking the headphone review on the landing page as an example, the generated summary is "The Sony XM5s offer improved audio and call quality, but may not be worth the extra cost compared to the XM4s."

Like, duh? You could probably deduce that even without watching the video; 14 minutes of your life saved.


Most YouTube influencer guides recommend at least 10-minute videos, and by god you will watch 10 minutes of someone describing information that could fit in a short table, while they add sound effects, animations, jokes, and pogfaces.

When people tell me they don't watch TV, I ask them if they watch YouTube or Twitch, because these kinds of services are what TV used to be: something with a low density of information, better used as a distraction while you do something else.

Nevertheless, I'm uncomfortable with playing a video in the background, because that might give the platforms the impression that wasting my time is fine if it improves metrics.

These days, our attention is constantly being pulled in several directions at once, so I praise projects like this one, who try to wrestle control back.


I believe this might be due to YouTube being able to insert two advertisements in videos surpassing ~7 minutes in length, with "engagement" (not clicking away) being a potential metric related to monetization.

If you can get the headline click with 30 seconds' worth of insight, then your payday depends on padding to reach the appropriate length. Imho.


I think it's perfect for videos where you need an answer. Product reviews are the most extreme case: what headphones should I buy? Many videos, though, we watch for fun, enjoying the picture and the voice. I use it myself daily and have realized that I don't watch all videos for information and answers.

For me it's 30% for information, 70% for fun.


I see that as a good feature. Some of these review videos are just a guy going over the product's own marketing spiel and you don't notice until halfway through the video.

If the summary spits out a bunch of useless info, you can find a better one.


On a similar angle, does anyone use this tech to automate "saved you a click" answers to clickbait headlines?

i.e. "You won't believe which beloved celebrity just died" and it just tells you the vital info.


The answer is: very little, and that's why the OP wanted something to summarize it.


TIL: I was gonna ask how you get the transcript, but YouTube actually provides it; it's hidden in the "…" menu next to "Clip".


I use this library to do that: https://pypi.org/project/youtube-transcript-api/
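For anyone curious, fetching and flattening a transcript with that library looks roughly like this. This is a sketch: the live call needs network access and the package installed, and "VIDEO_ID" is a placeholder, not a real video.

```python
from typing import Dict, List

def flatten_transcript(entries: List[Dict]) -> str:
    """Join the per-caption snippets that youtube-transcript-api returns
    (a list of {"text", "start", "duration"} dicts) into one string."""
    return " ".join(entry["text"] for entry in entries)

# Live usage (requires network and `pip install youtube-transcript-api`):
#   from youtube_transcript_api import YouTubeTranscriptApi
#   entries = YouTubeTranscriptApi.get_transcript("VIDEO_ID")  # placeholder ID
#   print(flatten_transcript(entries))
```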


Hey Alex, this is really cool. I've used ChatGPT quite a bit and Stable Diffusion in the past but still feel like I've only scratched the surface of what's possible. Great to see lots of projects popping up using the tech in new innovative ways!

Please could you give an overview of how this actually works? I have some ideas of where the tech could be useful, but I'm not sure how I'd actually go about implementing it. Do you have a GPT model on a server and code to transcribe the video, then summarise the transcription? Or do you use one of the APIs from OpenAI?

If you use their APIs:

* How costly has it been to run your service? (If you don't mind answering)

* Is it customisable? If you wanted to run a chat bot for example, would you be able to make it understand the request (I'd assume something similar to an 'intent' when developing Alexa skills) and give it data so it knows the answer?


Thank you!!

> Please could you give an overview of how this actually works?

1. I download YouTube subtitles. (It doesn't work for videos without YouTube subtitles yet; my analytics show that 15% of videos don't have subtitles. I tried OpenAI Whisper, but it takes several minutes to transcribe a video, so I've put that off for now.)

2. Then I break the transcript into parts.

3. Then I summarize each part with GPT → and then I summarize the summaries to get chapter names → and then I summarize again to get the title.

Yes, I use the OpenAI GPT API. I pay their standard pricing for text-davinci-003, and the cost for one video is between $0.10 and $0.90 depending on the video length (actually, the transcript length). I have a hard limit to prevent abuse.

Yep, it's fully customizable. Yes, you can provide data to it. It would take 1 hour of coding to make a prototype of a chat bot. And then 500 hours to make it work well.
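A rough sketch of that split-then-summarize pipeline (my own toy version, not the author's actual code; `summarize` is a stub where a real GPT API call would go):

```python
def split_transcript(text: str, max_chars: int = 6000) -> list:
    """Greedily pack sentences into chunks no longer than max_chars."""
    chunks, current = [], ""
    for sentence in text.split(". "):
        if current and len(current) + len(sentence) + 2 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current}. {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks

def summarize(text: str) -> str:
    # Placeholder: the real version would send `text` to the GPT API
    # with a summarization prompt.
    return text[:100]

def summarize_video(transcript: str) -> str:
    """Summarize each part, then summarize the summaries."""
    part_summaries = [summarize(part) for part in split_transcript(transcript)]
    return summarize(" ".join(part_summaries))
```

The real splitting step is where most of the work hides (the author mentions it took 80% of the effort): naive fixed-size chunks cut ideas in half, so splitting on sentence or chapter boundaries matters.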


I built something similar; I've got code links up if you want to cargo-cult it.

https://news.ycombinator.com/item?id=34549402


That's interesting. I really like watching other people's prompts. I'll look into it this weekend!


This new level of content understanding & summarizing is huge.

My young son uses YouTube for tutorials to learn programming and 3D apps. But I really struggle because he's come across objectionable content as well, and the tools YouTube provides for moderation or filtering are completely worthless. They don't care. I'm only left to think they want our kids to see controversial and even radicalizing content because it increases engagement metrics.

AI that can prescreen videos?! Regaining some feeling of control and confidence about the content that comes into my house?! I AM SO IN!

I have no interest in censoring anyone else or limiting access for others. I just want to have some agency over what my kids are exposed to without removing the actual knowledge share advantages that the internet can and does provide.


Interesting application! It hadn't occurred to me. To my shame, the best I've come up with is to cut ads from influencers videos (make a rewind button).


I'll chime in with hearty agreement. Being able to let my son watch what he finds without my having to watch it all first (as I do now)? I hope this product/service is available in about 4-5 years, when he's at an age where his interests will hopefully start diverging from ours.

There are probably loads of other parents who would love this.


Cool project and great presentation, but do you have any plans for a Safari extension?

Apple has supported Chrome extension porting for at least 1 year now, and a conversion tool is built into Xcode: https://developer.apple.com/documentation/safariservices/saf...

------------

For videos less than 1 hour in length, I prefer https://youtubetranscript.com , then scroll to about 1/2 - 3/4 way through the transcript where youtubers generally hide their nuggets of info.

Eightify does seem better suited for long Lex Fridman/etc. type content though.


Thank you! Yep. Safari and Edge are on my list. It should be easy enough to port there.


I hope you make one for firefox too



Yeah, I've seen it, sure. Thank you. When it went into public release, it made me nervous and I hurried to finish mine.

The author uses ChatGPT, and that limits it a bit, because preprocessing the transcript before sending it to GPT took 80% of my time and effort. (I mean, if you just feed the transcript to GPT and ask it to summarize, the result sucks.)

But his product is free, that's cool. Because he uses ChatGPT on the client. And I have to pay for GPT.


It would be really great if there was a URL generated with the results of the summary. Then I could save it to Pocket / Reader and it would pass through their integrations to Obsidian.


I will definitely do that!


Playing around last weekend, I extracted the audio using yt-dlp and ran that through Whisper (I found its quality better than the YT subs/transcript).

However, then I ran into the 2048 token limit for longer videos. Because it doesn’t hold the full context, it wasn’t good enough at summarizing or providing insights.

The solution is to do smaller summaries of 2048-token chunks recursively until you have a single one.

This felt and worked… meh.

We’re you able to get around this in some other clever way?


Amazing product. Something I will gladly be paying for!

You mention that you have invested 500 hours into prompt engineering. Are there any specific resources you would suggest to get maximum value out of GPT? Any videos, websites, podcasts, ebooks, books, anything that really stands out?

I have been playing with it for a while now and am getting good at having it spit out what I'm looking for, but it usually takes an extra 3-4 prompts to rearrange the responses the way I want.

Thanks and again, great execution on a cool idea!



When I first got to know it, there weren't many good resources. Now there are 100 times more articles, but I haven't read them, so I can't recommend anything yet.

Oh yes, I remembered that I saw this - it's good advice: https://help.openai.com/en/articles/6654000-best-practices-f...


Target Video: https://www.youtube.com/watch?v=gdl1GKIbaWo

"Video has less than 30K views. Free plan is limited to videos less than 1 hour long and with more than 30K views."

This is a very lame limitation in regard to views. I was honestly trying to see if my own videos deliver enough value to justify their titles.


I'm guessing OP is caching results so he only has to process each video once (which I would do as well). The limitation makes more sense when you consider that the percentage of cache hits will be much higher with it, which means lower cost for running the free tier.


It makes sense though, because you might be the only one interested in a summary of those long-tail videos, whereas the cost of the summary of more popular videos can be amortized across many users.


Yes, as the commenters above have said, this way I increase the likelihood of repeat requests. If I do a summary for free, at least there's a chance someone else will need the same one.

It's a hypothesis, and with my current traffic I can't say it's worked. Maybe I'll change the limits when I have more data.


This is amazing.

I thought of something much more primitive recently: a Reddit bot that transcribes videos behind YouTube links (since I hate watching videos but do like reading).

I won't use this right now (since I don't do Chrome) but will gladly pay for this service when I can throw a YouTube link at it and get back a wall of text (assuming costs are reasonable).


Thank you! It would be very easy for me to make such a service, but then I would lose a lot of users who would otherwise install the extension.

By the way, what browser do you use?


Safari Tech Preview!


Have you compared GPT with prompting to other models like https://huggingface.co/sshleifer/distilbart-cnn-12-6? Do you know what the state of the art is in summarization technology?


I have tried many (all the popular) Hugging Face summarization models; they don't compare to GPT. Basically, they highlight the parts they think are important, but they can't "understand" and generalize.

Frankly, I haven't read the papers on summarization. But I will have to when I work on reducing costs.


I see, thank you for the insight. The OpenAI paper about summarization is an easy read and very similar to your approach, if you are looking for somewhere to start: https://openai.com/blog/summarizing-books/.


That’s commonly referred to as abstractive vs extractive summarisation


Can you state the price of the "packages" on the landing page?

Also, does it always give me 8 bullet points from each video? It would make sense for videos that have chapters to give summaries to each chapter instead.


Yeah, sorry. I'm changing prices every two days right now, and I also want to try subscriptions instead of packages. I'll put prices on the site in 1-2 weeks!

Currently the prices are:

$8.60 for 20 summaries ($0.43 each)

$19.80 for 60 summaries ($0.33 each)

$48.60 for 180 summaries ($0.27 each)

> Also, does it always give me 8 bullet points from each video? It would make sense for videos that have chapters to give summaries to each chapter instead.

Yes, it's now a fixed number of 8 parts. I tested different numbers, and it seemed like a universally convenient, simple solution for videos of any length. But yes, I will take chapters into account when splitting; right now I ignore that information.

At the same time, the product is no less useful when there are no chapters at all.


Really cool, Alex! I think it would also be interesting to pull the most-rewatched data[1] from each video and use it as an additional weight. That is how I usually skip to the "important" parts without a summary.

[1] https://stackoverflow.com/questions/72610552/most-replayed-d...


Thank you! Yes, I'll definitely do that. That's on the list!


I get the point of charging since the compute resources cost you money, but how are the video rights holders not able to make a claim against any profits?


I'll contemplate this issue later. I deem this risk to be lower than not discovering a reliable acquisition channel or not being able to attain high margins.

Instinctively it appears as though I'm creating original content. However, all this legal stuff is often counterintuitive.


Now I need an AI to make a video of the summary.


It's an interesting idea, but there, copyright infringement is a more pressing issue.


If someone else has already summarized the video will I be able to see the summary for free?


No, sorry. If I did that, I'd have to make the plans even more expensive. And users would randomly get free summaries, which would confuse them: "Why should I pay if sometimes I get them for free for no reason?"


Perhaps you could make the summaries of the example videos on your website accessible for free, without signing in, so people can see what the output looks like.


I wonder why there's no app that capitalizes on YT comments (I don't know if there's an API for that or not). Final decisions often come from reading the comments; they're sort of a review plus a debate from both sides, and a bit more authentic.


I'm really thinking about doing summaries for comments as well.

It's a more interesting task when there are two sides. Even now, my app doesn't work well for debates: it tries to bring the points of view together. I want to work on separating them.


Awesome stuff! What were your biggest learnings from iterations on prompt engineering?


Someday, I will make a separate post about it. Although today there seem to be hundreds of people selling "how to work with GPT".

This is what comes to mind immediately:

1. Don't solve more than one problem with one prompt. Decompose it into different tasks and make a separate prompt for each one.

2. Put instructions at the beginning. Very short and unambiguous. You have to understand exactly what you want and mean; "Answer me as a philosopher" is an example of an unspecific instruction.

3. If the instructions don't work, show the concept by providing an example. Examples are more expensive than instructions because they take up more tokens.

4. The best way to debug prompts is with a dataset and autotests. I used GPT to evaluate the results.

5. Temperature 0 is fine 99% of the time. (Btw, I was surprised to find that it does not guarantee a deterministic result; OpenAI support confirmed that.)
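To make points 1 and 2 concrete, here's a toy illustration: one template per task, each with a short, unambiguous instruction at the top. The prompt wording is my own invention, not the author's actual prompts.

```python
# One template per task (point 1), instruction first (point 2).
SUMMARIZE_PROMPT = (
    "Summarize the transcript below in 3 bullet points.\n"
    "\n"
    "Transcript:\n"
    "{transcript}"
)

TITLE_PROMPT = (
    "Write a 5-10 word title for the summary below.\n"
    "\n"
    "Summary:\n"
    "{summary}"
)

def build_prompt(template: str, **fields) -> str:
    """Fill a single-task template; each task becomes its own request."""
    return template.format(**fields)

# Each built prompt is then sent as a separate completion request,
# typically with temperature=0 (point 5).
```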


I understand that language changes, but "learnings" as a noun just drives me up the wall. Management jargon making its way into everyday speech is too much for me.

Just use lessons, or “what did you learn?”

This isn’t directed at you personally.


Hey if you want a lesson, I'll learn you at least one more way you can use the word.


Firefox version planned?

Also, please, less Elon on the front page. It's a little nauseating.


Nope, Firefox is not planned :( Maybe after Safari, Edge and mobile. Sorry about that.

Yes, I'll probably remove Elon :) Fun fact: Twitter banned my ads account when I tried to create ads with this video.


This is great! Can you use the same tech to do for podcasts/audio?


Sure I can. Some podcasts are already posted on YouTube. If I see demand for it, I'll do it, of course.


Hello Alex, I've built a similar app for my own usage; I don't plan to ship it online. How do you handle the token limit when summarizing lengthy videos?


I do many requests. Then I summarize the summaries. This approach is well described in this OpenAI article: https://openai.com/blog/summarizing-books/


Useful link, thanks. This approach is not cost-effective, though; I guess you're working on a solution to optimize your cost. Fine-tuning? Embeddings?


Honestly, I haven't worked on cost reduction yet. I have a backlog, but I haven't done anything from there.

One idea is to use other models to shorten the text by throwing out meaningless words. I estimate this would reduce the length of the text (and thus the GPT cost) by 30%.
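A minimal sketch of that idea, assuming a simple stopword filter (the stopword list here is a tiny hand-picked sample for illustration; a real version might use a fuller list or a dedicated model):

```python
# Strip filler words before sending text to GPT to cut token costs.
STOPWORDS = {
    "the", "a", "an", "and", "or", "so", "well", "um", "uh",
    "like", "you", "know", "just", "really", "very",
}

def compress(text: str) -> str:
    """Drop stopwords, keeping the remaining words in order."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)
```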


> One idea is to use other models to shorten text by throwing out meaningless words.

Does this mean GPT does not need coherent sentences to understand?


My feeling is that it understands much better than people do. If you can understand a sentence from which some of the conjunctions have been taken out, it will understand 100%. And I think this all gets smoothed out over a transcript of 150,000 characters (the average size of a podcast).


Hey jcq3

Is anything in particular stopping you from shipping it online? Or is it just not necessary for your use?

(Context: I've been working on `Heroku for LLM apps` and trying to understand where the value/frictions are)


I want to search YouTube transcripts. Is that possible? Like Google search, but instead of articles it searches the transcripts of all YouTube videos.


That would require parsing all of YouTube. I thought I heard that YouTube already did that, but I can't seem to find it now.


So the way it works is: you get the transcript that Google provides, ship it to GPT-3, and ask it to summarize? Cool tool.


Yep. That's how it works in general. Thank you!


Awesome idea! Timesavers should be moneymakers in this time.

Also, did you get 5000 downloads in a single day?

Or when did you launch it?


I launched it on Product Hunt last week. I've been tweaking and fixing bugs since then and have now published it here.

Thank you!


I'm ready to sign up for a subscription. Hurry up with that! :)


You can buy the "package" for now :) Subscriptions will be ready by Monday or Tuesday!


How do I sign in with google to use it?


After installing the extension, go to YouTube. There is a "Sign in with Google" button where "related videos" are usually located. If something goes wrong, please send a screenshot to me at support@eightify.app


I see it now, had to turn off adblock


Have you built any GPTs yourself?


Oh no. I think it takes a lot of resources to do that. Definitely can't be done in a weekend :)


this is great, keep on hacking!


thanks, that's inspiring



