Do you think it's possible 90% of this is due to the studio-quality lighting, large high-quality screen, good mic/speakers, and low-latency network? It seems like those factors alone would get most of the way there and the 3D aspect is just a bonus. Obviously I haven't used it in person, but this was just a thought since most people are used to video calls on their small phone/laptop with poor lighting, mics, etc.
What I think you're trying to describe already exists as a product from Cisco called "telepresence". It is/was insanely expensive, was a permanent installation that only Cisco contracted techs could install, and did what you describe: It is a series of large, curved HD displays with desks at an appropriate distance from the screens/cameras, and copious amounts of indirect lighting from behind the setup to make each party look good.
It seems like the imaging/rendering technology that Google is using is much more advanced.
I’ve used such a Cisco system. Compared to regular video calls, the latency and quality were light years ahead, and much more natural conversations were possible. By which I mean it was possible to laugh, interject, and generally have a realistic conversation with a colleague in another country without having to compensate for video lag in that very careful way I find necessary on Meet and Zoom.
That said, there was no “emotional connection” like the Google one is described as offering. It was still a video call. There was no forgetting that. I suspect the 3D and the apparent physical closeness to the display add a lot.
Wow, I forgot about Telepresence. I used it a decade ago at a Fortune 500 company. With all of the cameras and displays perfectly positioned, everyone was life-sized on video, and it felt like you were sitting around a roundtable. Now I'm imagining that with higher resolution and a 3D light field display, wow.
Low latency is more important than all the other bits. I worked at Bell Communications Research in the nineties and they had an experimental video conferencing system that used analog circuit-switched video and it worked really well, mainly because the latency was only a little more than the speed-of-light propagation delay.
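To put a rough number on that speed-of-light floor, here is a minimal sketch. It borrows the 1200-mile distance mentioned elsewhere in this thread as an example, and it assumes signals in fiber travel at about two thirds of the vacuum speed of light; the function name and that fraction are illustrative assumptions, not details of the Bellcore system.

```python
# Rough one-way propagation delay over a given distance.
C_KM_PER_S = 299_792.458   # speed of light in vacuum
KM_PER_MILE = 1.609344
FIBER_FRACTION = 2 / 3     # assumption: typical signal speed in optical fiber


def one_way_delay_ms(distance_miles: float, fraction_of_c: float = 1.0) -> float:
    """Return the one-way propagation delay in milliseconds."""
    distance_km = distance_miles * KM_PER_MILE
    return distance_km / (C_KM_PER_S * fraction_of_c) * 1000


if __name__ == "__main__":
    print(f"vacuum: {one_way_delay_ms(1200):.2f} ms")
    print(f"fiber:  {one_way_delay_ms(1200, FIBER_FRACTION):.2f} ms")
```

Either way the physical floor at that distance is on the order of 10 ms or less one way, so most of the lag people notice on ordinary video calls would have to come from encoding, buffering, and routing rather than from distance.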
I spent a year using telepresence a few times a week. It was genuinely amazing. Lifesize people 1200 miles away with audio so crisp that one of the guys was idly rubbing the edge of some papers with his thumb and I could hear it.
I've spent a decent amount of time and money to make this happen and it helps less than you would hope.
Part of the problem is affordances like eye contact, which is physically out of alignment unless you start using two-way mirrors [1].
There are technologies that can automatically adjust videos for eye contact today, so I imagine something similar could be implemented for this later on.
It looks like it came out of the high-fidelity Immersive Light Field Video presented at SIGGRAPH 2020. Quite impressive that within a year it's now a consumer product
WebAssembly SIMD is coming to Chrome as well. 2D images and video that only consisted of RGB and Alpha channels may appear downright primitive to future generations as depth camera rigs gain distribution ;)
I am sure that the immersion of the experience is higher. My question (and perhaps that of GP) is: is this greater immersion actually beneficial to communication?
I think this is cool tech, and valuable. I'm just not sure that it offers a communication benefit over well-lit, well-miced, wired, low latency, 8K videoconferencing.
Maybe there's some 3D emotional perception face processing stuff that we have deep in our brains that can immensely benefit from this, but I'm skeptical. I think simply doing 4k or 8k low latency high quality videoconferencing might be a 90 or 95% solution without needing special cameras/displays.
I think you might be underestimating the value of viewing a 3D model on a no-glasses 3D display. This is one of the basic aspects of in-person communication we take for granted that current 2D technology can't replicate. You can move your head and actually see a different angle of the person in front of you. Even when the movement is subtle, our brain will still pick up the effect, and it takes the experience beyond what we usually consider "immersive".
Yes, having low latencies and high definition video is an important aspect of this, but the 3D part is no gimmick. Once the technology improves and gets affordable, this is a game changer for how we communicate online. The step after that is holographic displays, and since we'd be used to 3D models and smart displays, it probably won't feel like such a big jump.
I'm _super_ excited about this project. Hopefully Google doesn't axe it. (:
C'mon, they're showcasing prototypes or early 1st gen products here. There is some artifacting, true, though not nearly as much as I expected. Kudos to them for choosing to show objects difficult to scan/model accurately and doing a pretty impressive job at it. Under ideal conditions to be sure, but still. It's certain this will improve with future advancements, probably by the time general consumers get to use it. Unless it never gets a widespread release and ends up as another Google research project à la Google Glass, Project Ara, etc. Hopefully not, but if nothing else it will have served as inspiration for other companies to step in now that we know what's possible.
I thought that as well watching the video, but have you used a PSVR?
I've got one and the first minute is always spent noticing how low-res the eye screens are, then as soon as the game starts, I've forgotten and I'm _there_. The 3D part makes up for the low quality.
Being able to feel like another person is in the room is enough for me to reconsider working from home. As of right now I have a strong preference for in person, but I do acknowledge most people prefer the commute and cost benefits over productivity.
The state of video conferencing today is a poor one and I'm very excited for something that can change the industry like this.
I'm right there with you, and I use a 4k camera and a boom mic and headphones and wired ethernet to videoconference now: I have been regularly complaining about the low resolution and framerates of current videoconferencing systems (10-15fps, 720p, low bitrate - and that's the highest quality setting available!).
If Google wanted to make me believe they care about videoconferencing quality, they'd have a 4k 60fps option that auto-enables in Meet if it detects everyone on the call is on wired gigabit with a 4k camera.
A lot of residential areas in the US have gigabit options, in some cases symmetric. There are lots that have 1000 Mbps down/40 Mbps up cable.
Even 100 Mbps is sufficient for a 1-on-1 4k video call, as high-bitrate 4k is 30-40 Mbps. Most commercial office buildings in business districts have it available. Even Starlink (20 Mbps up) should be sufficient for 1-on-1 30fps 4k videoconferencing with a lower bitrate.
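The uplink arithmetic above can be checked with a small sketch. The link speeds and the 30-40 Mbps bitrate range come from the comment; the 75% headroom factor and the function name are assumptions added for illustration, since real links rarely sustain their full advertised rate.

```python
# Does a video stream of a given bitrate fit in a given uplink?
def uplink_fits(uplink_mbps: float, stream_mbps: float, headroom: float = 0.75) -> bool:
    """Return True if the stream fits within `headroom` of the uplink capacity."""
    return stream_mbps <= uplink_mbps * headroom


if __name__ == "__main__":
    links = {
        "100 Mbps symmetric": 100,
        "cable, 40 Mbps up": 40,
        "Starlink, 20 Mbps up": 20,
    }
    for name, uplink in links.items():
        for bitrate in (30, 40):
            verdict = "fits" if uplink_fits(uplink, bitrate) else "does not fit"
            print(f"{name}: {bitrate} Mbps 4k stream {verdict}")
```

Under that headroom assumption the 100 Mbps link handles either bitrate, the 40 Mbps cable uplink only handles the lower one, and the 20 Mbps uplink needs a bitrate below the 30-40 Mbps range, which matches the "with a lower bitrate" caveat.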
>I think this is cool tech, and valuable. I'm just not sure that it offers a communication benefit over well-lit, well-miced, wired, low latency, 8K videoconferencing.
>Maybe there's some 3D emotional perception face processing stuff that we have deep in our brains that can immensely benefit from this, but I'm skeptical.
>I think simply doing 4k or 8k low latency high quality videoconferencing might be a 90 or 95% solution without needing special cameras/displays.
From my experience, 4k or 8k doesn't matter. Sound quality actually matters most: really clear low-latency audio alone will give you a surprisingly strong sense of presence.
Video quality is important, but 1080p is enough. Beyond that, the lighting and latency matter more.
Equally important from my personal POV is video size, meaning physical size. Take a cheap 65 inch TV, turn it vertically, and talk to someone on that. When you're talking to someone who is actually life size, the sense of presence is vastly improved, even at the exact same video quality. And TVs are so cheap this doesn't seem like much of a technical barrier.
If you just screen share from your cell phone to your 65 inch TV and video chat -- holding everything else equal for audio and video quality -- it's SO MUCH BETTER.
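For a sense of how big that is, here is a quick sketch of the panel dimensions. It assumes the usual 16:9 aspect ratio, which the comment doesn't state, and the function name is just illustrative.

```python
import math


def panel_size_inches(diagonal_in: float, aspect_w: int = 16, aspect_h: int = 9):
    """Return (long side, short side) in inches for a panel of the given diagonal."""
    scale = diagonal_in / math.hypot(aspect_w, aspect_h)
    return aspect_w * scale, aspect_h * scale


if __name__ == "__main__":
    long_side, short_side = panel_size_inches(65)
    # Turned vertically, the long side becomes the height.
    print(f"height: {long_side:.1f} in ({long_side * 2.54:.0f} cm)")
    print(f"width:  {short_side:.1f} in ({short_side * 2.54:.0f} cm)")
```

Turned on its side, a 65 inch panel comes out to roughly 57 inches tall and 32 inches wide, which is enough to show a seated person's head and torso at about real scale.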
My intuition is that a great lighting+microphone+speaker setup is necessary, but not sufficient, for this demo.
Even from viewing the short demo, the stereo display alone is an entirely new dimension that no amount of studio lighting will recreate. While better lighting and audio setup would certainly improve the average person's videoconferencing experience, this looks to be a genuine step beyond.
That said, we've been seeing holographic-display prototypes for the better part of a decade, and it'll be interesting to see whether this actually pans out or fizzles.
Eye tracking is the core feature here--the rendered "hologram" is correct from every possible angle. The things you mention are probably closer to 2% of the final result.