Yes, I do. For the past three years, I have done nothing but work on this project nonstop. I've been working on major improvements (some of which people have pointed out in this thread) that I've been stuck on for the past several months, but I'm close to finishing them.
I don't feel comfortable publishing or releasing anything until I know for a fact that I can make no further improvements. It's not out of corporate greed or anything like that - I'm just really paranoid about getting out the best work possible.
Respectfully, the perfect is the enemy of the good, and it’s entirely reasonable to publish what you have now. If later you make further improvements, you can simply publish again.
You're completely correct, but I'm afraid this is more of a personal problem. I know I'd never be able to forgive myself if I figured out a solution to one of the more obvious problems with the model after I'd already published it. I'd just be far more comfortable releasing it into the wild once I'm happy with my own work. I know that this is selfish, and I apologize.
This entire thread is honestly so disturbing, this comment especially. Not only is it rife with misinformation (using copyrighted material for training is totally legal, and the whole project is paid for out of pocket), but is it really that big of a deal to want credit for the work they’ve done? The developer has had their work stolen by companies, influencers, and grifters, and people here are getting pissy that they have to wait 10 seconds for a popup.
I don’t know why, but I honestly expected more from HN.
You're right about the compute part being wrong. I never said it wasn't legal, just that they took someone else's work to train it. I would hope that voice synthesis is illegal without permission from the voice's owner, but I imagine it is untested so far.
But it's not just about the popup. It's more that when your work is fundamentally about reusing someone else's character, it feels pretty hypocritical to be so focused on making sure you get credit.
If they are used in a tool that lets you generate someone's likeness as part of user-specified new content, yes. But unlike with 15.ai, that isn't their core purpose, and no such tool exists.
The problem is that after having to wait 10 seconds to reject their terms of service (which you should be able to reject right away) before even being able to see what the site is about, they rickroll you, effectively giving you the finger for not wanting to agree to their terms without context. That’s quite unprofessional, counterproductive, and antagonistic.
I share this sentiment entirely. There seems to be a growing trend on HN that negativity is popular. A project like this, to me at least, would seem to be right up HN's street.
Shame to see the toxicity over a passion project whose creator generously went out of his way to answer questions, even the ridiculous comments.
Making things up out of thin air, like “the creator used someone else’s compute”, goes beyond the negativity of someone who thinks the project is in a grey area. That is just straight-up disinformation.
MIT doesn’t own the model; where did you get that idea from? If you read through the website, it says that the developer alone owns everything related to the project, and the only funding he received from MIT was a small amount at the beginning.
It’s really strange reading these ignorant comments from HN…
“Make money”? The creator loses several thousands of dollars a month hosting the site, and it’s done for free. The Patreon donations are all voluntary and only offer a pittance to the developer.
I highly suggest reading into the project first. The Wiki article I linked before (https://en.wikipedia.org/wiki/15.ai) answers all of your questions about copyright infringement.
Feel free to replace "make money" with "collect revenue". This is currently a research project (with funding). However, its long-term goal is to achieve commercial-quality voice acting and dubbing. It could be given away for free, sold directly, sold downstream, sold indirectly, or otherwise generate commercial value.
In terms of copyright infringement, your wiki link answers nothing. A court ruled that Google could use copyrighted book text to train an algorithm to improve search results because the copying was highly transformative and did not serve as a market substitute to the original work.
Meanwhile, 15.ai is using copyrighted voice recordings to train an algorithm to synthesize new voice recordings that sound like they came from the original speaker. This is radically different from the Google case. Just because one instance of using copyrighted material to train an algorithm qualifies as fair use does not mean that all use of copyrighted material to train any algorithm also qualifies as fair use.
There is absolutely nothing about this that is settled law. In the next 20 years there are going to be lots of lawsuits, lots of settlements, possibly a few rulings, and maybe even a few new laws. I find the whole topic very interesting. YMMV.
Like you say, the law is not settled on this, but I assume if the author got a takedown request they would probably comply.
In many instances a policy of "ask for forgiveness rather than permission" can get you further, faster. While Nickelodeon are unlikely to grant you a license to the SpongeBob voice because that has broader licensing and IP repercussions, they are likely to tolerate a research project using their characters (e.g. just as they have to date tolerated The SpongeBob SquarePants Movie Rehydrated, which was a fan re-creation of one of their actual movies).
It is indeed several thousands of dollars a month. I can show you AWS invoices, if you're skeptical. Just send me an email and I'd be happy to show proof.
The whole history behind the project is fascinating: 4chan played a huge role in its development, and the project's work was stolen by an NFT company that a famous voice actor endorsed not too long ago.
Ah, I was wondering why they were so concerned about attribution.
The truth is that, today, if I were going to use a tool to generate voices (say, for YouTube), I wouldn't necessarily pick a small SaaS tool. I'd use Amazon Polly or a similar voice creation service from a big cloud platform such as GCP. There are already a few products in the space, and their costs are so low as to be almost negligible (for example, Polly gives you 5 million characters free). For a commercial project, I could probably stay on a free tier for a whole year.
With DALL-E, it seems like the only option, and it's such a superior option that a website could abuse it for commercial profit. But voice synthesis is already dirt cheap and commercially available without limitations.