This is super impressive and something that I didn't think would be possible without someone very skilled in photoshop going over the images.
As a photo enthusiast, I am very excited about this, but also a little worried that soon very simple apps will be capable of doing the craziest of edits through the power of neural nets. Imagine the next 'deep beauty transfer', able to copy perfect skin from a model onto everyone, making everything a little more fake and less genuine.
The engineer in me now wants to understand how to build something like this from scratch but I think I'm probably lacking the math skills necessary.
I think you need a little more information on the Gram matrix (maybe even ditch some of the elementary "this is gradient descent" stuff and just assume the reader knows a bit more about convnets so that you can dive deeper into the style transfer specifics -- there are plenty of other sources that cover the former already).
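To give a flavor of what the Gram matrix part involves: it's just the channel-by-channel correlations of a conv layer's activations, which throw away spatial layout and keep texture statistics. A minimal NumPy sketch (shapes and normalization are one common convention, not the only one):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a conv feature map.

    features: array of shape (channels, height, width), e.g. one
    layer's activations from a convnet. The Gram matrix captures
    which channels co-activate, discarding spatial layout -- that
    is why it works as a "style" summary.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)   # flatten spatial dimensions
    return f @ f.T / (h * w)         # (c, c) channel correlations

# toy example: 8 channels on a 4x4 feature map
g = gram_matrix(np.random.rand(8, 4, 4))
print(g.shape)  # (8, 8)
```

Style transfer then minimizes the squared difference between the Gram matrices of the generated image and the style image at several layers.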
> I didn't think would be possible without someone very skilled in photoshop going over the images
And this is similar to how deep learning will likely erode the need for programming (IMHO).
Deep learning won't necessarily write programs (any more than this AI manipulates images via photoshop). Folks who say, "don't worry, we can't write programs easily with AI," are missing the vector. Writing programs isn't necessary for there to be widespread disruption.
Most programming is essentially hooking up I/O (of which UIs are a subset) to APIs, data stores and data manipulation. The "goal" of programming is not the code, but the functionality it provides.
AI's don't need to learn to code any more than they need to learn to use photoshop. They need to learn to provide functionality (or in this case manipulate image data).
> "AI's don't need to learn to code any more than they need to learn to use photoshop. They need to learn to provide functionality (or in this case manipulate image data)."
This is interesting. My counterpoint would be that if you rely on AI over programs you lose human-editability and determinism. So fixing a bug or adding a new feature might mean diving into some opaque model rather than adding a few lines of code. You couldn't do anything where consistency is important, like security, manipulating a database with important information, or GUI design. I think that at least protects large swaths of software development.
Even this example seems less like a replacement for Photoshop and more like a cool new feature Photoshop could add
In the real world we rely on humans for lots of stuff, and humans aren't actually deterministic either. Sure, if you train someone to perform a task they'll probably do a good job, or they might suddenly come into work distracted and cause a problem. Diagnosing problems with people is often similarly hard, and we've had all of civilization to work on it.
This hasn't caused the sky to fall, yet. So, perhaps we'll just learn to make AI behave properly under most circumstances, and deal with failures and glitches as we always have with people.
_Coding_ is more about understanding what your boss/clients want and turning it into something more concrete, so it's merely an NLP problem. This will, I think, see adoption in "app/website builder" tools like Squarespace.
Then there is real programming, which IMHO will get automated in the far future.
It's an NLP problem if your boss/client wants to talk with you. At the end of the day, they don't really "want" to TALK WITH YOU, they want the functionality they get as a result of talking with you.
If there are other ways for them to efficiently get the functionality, they are good with that (as much as they might like you).
Similar to you wanting a pizza. You could call and talk with someone (which you don't really want, it was a necessary step), or you fill out the right form/app.
Either way, you want the result, not the process.
Your boss/client wants the result of your work, not necessarily the work process required to get it.
It seems likely to me that modern deep learning enabled tools will make it easier for your boss/client to get the result they want directly.
Deep learning + more graphically oriented data flow UIs seems like it will heavily erode the need for traditional programming as users will be able to more directly achieve the functionality they are looking for.
Any planning an AI took over from existing elevator controllers would already go through constrained access to the motors. Nothing about AI demands stupid system design.
No, but AIs will override all elevator scheduling code. They just need to keep tuning all the knobs until everyone gets to their floor as fast as possible.
Not everyone would agree with you :) Although I think even most of us who enjoy programming would be happy to have some form of automation as an option.
Feed it a million or so porn images of people with all kinds of different body types. Then have it guess the closest match. Finally run this. Presto change-o! It's those x-rays glasses kids everywhere wanted for years.
I could easily imagine it being done live-motion and 3-d. Run the whole thing on a set of AR glasses.
Or morph a face onto existing footage. I'm surprised this doesn't more obviously exist already, although I guess I haven't been actively looking for it.
The tech is there to do a rough approximation of a dozen combinations of this. I could imagine an intermediate step where a wire-frame mesh is constructed around the image. As I understand it (and I know nothing of this stuff), there was already an app to take a picture of breasts and jiggle them.
I think it's just that nobody has put all of the pieces together yet. (Or if they have, the mainstream media hasn't heard about it)
I can't help but think that DiscoGAN (https://arxiv.org/abs/1703.05192 "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks", Kim et al 2017) would be perfect for this. Simply feed it a ton of photos of regular clothed people and other naked people (don't have to be the same people), and it'll learn a mapping on its own. Scale that up, refine it for a few years...
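The key trick in DiscoGAN-style models is cycle consistency: two generators map between the domains, and translating A to B and back to A should reconstruct the input. A toy sketch of that loss, with linear maps standing in for the real convnet generators (everything here is illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two generators: g_ab maps domain A -> B,
# g_ba maps B -> A. Real DiscoGAN generators are convnets trained
# with adversarial losses; invertible linear maps keep the sketch tiny.
W_ab = rng.normal(size=(16, 16))
W_ba = np.linalg.inv(W_ab)      # pretend training found the inverse mapping

def g_ab(x): return x @ W_ab
def g_ba(x): return x @ W_ba

x_a = rng.normal(size=(4, 16))  # a batch of flattened "domain A" images

# Cycle-consistency loss: A -> B -> A should reconstruct the input.
recon = g_ba(g_ab(x_a))
cycle_loss = np.mean((x_a - recon) ** 2)
print(cycle_loss)  # ~0 here, since the toy mappings invert each other
```

In training, this reconstruction term is what lets the model learn the cross-domain mapping from unpaired examples, which is why the two photo sets wouldn't have to show the same people.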
Looks like I can't edit this to avoid gaining more down votes. I'm not sure if I'm losing points for asking to be informed of technological advancements or that I'm okay with being creepy. Any insight is appreciated.
Yes but autotune is a style choice for artists. It's not something every musician uses. It hasn't taken over music, and it's not like anyone can become a musician because autotune exists.
Disclaimer: I'm not a music producer, and my ear isn't that great, but I did write software for one of the major pitch-correction vendors for many years, and I think I have a better-than-average ear for it after listening to it for many years.
Pitch correction is something which is used on many, many professionally-produced tracks, and often without the knowledge or consent of the performers. Whether you can hear it or not is a stylistic choice (provided adequate skill from the production team: see [1]). But just because the pitch correction isn't in your face, T-Pain- or Cher-style, doesn't mean it isn't there. The software is better than that, and in the right hands, it just makes people sound more skilled than they are, and you can't hear it.
Producers generally are pretty quiet about where they use it to mask blemishes in the performance, probably because they don't want to embarrass anyone. But the producers we sold to would certainly say how much they used it, without naming artists or tracks.
I was involved in the recording and production for a top 40 producer, and can confirm that there was autotune on every single vocal track that left the studio.
Here are a few that I was in the room when the artist was recording, and can confirm pitch correction:
The first one has that metallic sound that is a dead giveaway. The first falsetto is quieter than the second one, and you can really hear the metallic quality kick in as he increases the loudness of his voice https://youtu.be/450p7goxZqg?t=1m27s
Second one has a "Cher moment" almost straight away, just after "wandering the desert a thousand days" the following "mmmm" has a glissando between two notes where we clearly hear the hard edge on what I assume is an auto tune lookahead. I don't actually know how they work, I just assume there's a lookahead for the next note approximation which makes glissandos sound funny. https://youtu.be/M8uPvX2te0I?t=31s
The last one I can't really fault for too much autotune, more a lack of it. The bridge is especially intense https://youtu.be/E0oyglKjbFQ?t=1m51s
Say the singer loses the pitch slightly for half a second on a held note. If that fluctuation is corrected, what auditory information could be left for you to detect the modification?
I believe I hear pitch correction when it's obviously used, and it's a lot. Pretty much most of the "top forty" pop pablum from the last 20 years. I believe there is pitch correction that I don't spot: the "dark matter" of pitch correction that is done less cheesily.
The worst of it sounds almost like packet loss concealment in a G.722 voice stream: the sustained part of a vocal note basically sounding synthesized.
>Yes but autotune is a style choice for artists. It's not something every musician uses. It hasn't taken over music, and it's not like anyone can become a musician because autotune exists.
You'd be surprised on both fronts :-)
On the first, because autotune is prevalent regardless of genre and style choices (even in rock, country, etc). It's just the Cher/T-Pain effect that has been toned down, but autotune is very much in use in the industry for vocal correction.
On the second, because almost any crap singer pop idol with nice looks can pretend to be in tune and put out bearable results because of autotune.
We disagree on that, and that's OK. To me there is something special about a live performance. Even more so when it's challenging for the performer. When a singer demonstrates range, or a musician displays technical excellence or provides emotional depth through expression, it adds a LOT in my opinion. Knowing this is all faked in recorded music takes something away from it.
Ditto for photography. To take an image and retouch it, or to artificially saturate colors can make a great picture. But with a raw photo it's even more interesting to think that scene actually existed and someone captured it for us to look at.
In either case, I can enjoy the work but will only be impressed if I know that it's authentic. This is more true than ever today.
But then again, where do you draw the line between what is authentic and what isn't?
In music are you allowed to use amplifiers and speakers? They can add a lot of color and distortion. How about reverb? Rooms that aren't there. EQ to remove unwanted frequencies? Synthesizers? Digital effects? At what point is it not authentic anymore?
Same for photography. Are you allowed to touch the aperture? ISO? Shutter speed? Flash? Digital camera? At what point is it not authentic?
The thing is, the subject of a photo (the scene and the people or things in it) is usually not the artist. The photographer is the artist. Photography is processing from the get-go: how the film or CCD responds to light and so on. The grain from shooting low light on sensitive film can be part of the art.
If you mess with the colors of a scene, you're not taking away artistic control from that scene.
You also don't put limitations on the post processing art; you're not doing it to fool anyone.
There is post processing in music that is obvious art in an analogous way, like taking sampled sounds and re-mixing them to create new stuff. There are effects that are obviously effects. I'm not going to scoff at a great studio reverb, or some echo applied to a vocal or whatever. Nobody is saying that this was recorded in some fjord in Sweden with real echo bouncing off a distant ice wall; there is no lie.
In that case, we are just transferring artistic control from one human into another. In the past recording audio had fairly little artistic control and the subjects of the recording most of the control. Now with better audio manipulation software the person doing the recording has artistic tools at their disposal. They are the 'photographers' of the scene, while the singer is the subject.
That's right, and the subject might as well be a dog, or any other audio signal source, just like anything that reflects light can be a suitable one for the photographer's creative process. Cute puppies; water lilies; sunsets ...
The thing is, I somehow don't hear the studio's creative input either when I hear the latest auto-tuned Fido or Bowser. They're just applying some automatic something that's supposed to make the dog sound like a more able dog.
This is like when people just batch apply the same color enhancement and sharpening of their Florida vacation pictures. I've seen one instance, I've seen them all.
This is where auto-tune actually can fall short. Singing is not exactly all about hitting the "in equal temperament tune" note all the time.
Take the fantastic singer with great technical skill. Most pitch correction algorithms, as far as I know, are strictly based on equal temperament / 12TET. Fantastic singers are capable of hitting the right harmonics, some of which are not 12TET. Fantastic singers slide into notes, they use vibrato, they add "blue notes" (https://en.wikipedia.org/wiki/Blue_note). If you over-apply pitch correction, in other words, you could easily make a fantastic singer sound worse.
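To make the 12TET point concrete, here is what naive pitch quantization amounts to: snap every detected frequency to the nearest equal-temperament semitone relative to a reference pitch. Any expressive deviation, whether a blue note or a slide, lands on the grid. (A4 = 440 Hz is an assumption; real correctors are far more sophisticated about timing and transitions.)

```python
import math

A4 = 440.0  # reference pitch (assuming standard concert tuning)

def snap_to_12tet(freq_hz):
    """Snap a frequency to the nearest 12-tone equal temperament note."""
    semitones = 12 * math.log2(freq_hz / A4)   # distance from A4 in semitones
    return A4 * 2 ** (round(semitones) / 12)   # quantize and convert back

# A slightly flat A4 (435 Hz) gets pulled to 440 Hz...
print(snap_to_12tet(435.0))  # 440.0
# ...and so would a deliberate quarter-tone blue note: the grid has no slot for it.
```

This is exactly why over-applying correction flattens a fantastic singer's intentional non-12TET choices along with the actual mistakes.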
Let's also take the singer who is technically a bit pitch deficient, but has "character" that makes up for it. You don't want to make this type of singer too in-tune, either. Too much tuning might remove the "character".
I understand in the industry there are some engineers who are good enough to selectively apply auto-tune, to only fix obvious issues, and avoid the pitfalls. There are also some productions that just apply auto-tune to everything with no consideration of the content. The latter will probably work for glossy pop productions, but if I was a really good singer (or a singer with "character") I probably wouldn't like the results.
Thanks for the information, that's one product I'm not too familiar with. I've demoed the Antares product and a couple of freebies. (It seems like there is a couple of newer plugins out there as well, eg Synchro Arts Revoice Pro).
The problem is, I'm not sure that even a pure alternate tuning can work for all examples. E.g., for blue notes, what is "correct" varies with performers and style.
Now, I'm more talking about the "automatic modes"; I understand Melodyne offers a pretty impressive level of editing control (Antares did too from what I remember). So it would certainly be possible to get a really great take, and then hand-correct any truly off notes to whatever frequency you wanted.
As in many things (see: Photoshop and model photos), a lot of the reaction to the tool is less on how it could be used, and more on how it is being used in glossy "crap singer pop idol" productions.