InterDigital Inc (IDCC): Realism in Augmented Reality Motivati...

Gamco

Wednesday, 06/07/2017 3:14:10 PM

Realism in Augmented Reality

Motivation / background
The promise of augmented reality (AR) is the ability to embed virtual information directly into our
physical environment, so realistically rendered that virtual elements become indistinguishable from real
ones. Using AR to add realistic virtual elements to our physical surroundings will allow for new kinds of
experiences where the world is improved by virtual content blending seamlessly with the real, all tailored
to meet our individual needs and desires.
Early head-mounted display (HMD) devices, such as the Microsoft HoloLens, do a fine job of overlaying
virtual information directly on top of our normal view. As they become more widely adopted
by consumers, the level of realism and the quality of the illusion will become significantly more
important. With the advent of these devices, we are in the early days of a completely new form
of media. The transition is imminent, as HMDs are almost at the level of usability expected by the
average consumer, and the computing power enabling real-time high-end graphics, limited to desktop
computers only a few years ago, is rapidly becoming commonplace on mobile computing platforms.
Despite the relatively recent commercial availability of consumer devices, various market forecasts are
expecting a rapid increase in the number of users and the market value of AR content in the coming
years. Indeed, Pokémon Go, a mobile game developed by Niantic, became a worldwide phenomenon
in a matter of weeks. Its success has demonstrated that even more primitive forms of mobile AR
can take off quickly when the content and usability are nailed down correctly. This underscores a
significant point – when it comes to media consumption, the quality of the experience is the major
driving factor for success.
In the near future, AR will be used as an ingredient for completely new kinds of experiences, new
types of entertainment, and productive applications. However, when crafting novel AR experiences,
the details will be extremely important, as they can mean the difference between being just one app
among thousands or a killer app – the next Pokémon Go.
With future AR experiences, the realism of the content will be a key factor in determining the quality of
the overall experience. This white paper will highlight some key aspects of realism in the context of AR
and introduce related solutions developed by InterDigital’s Innovation Partners open innovation initiative.

Varieties of realism
Ignoring optical filtering and scattering in the eye, physical realism means that the image produced
by the computer has to be an accurate point-by-point representation of the spectral irradiance values
at a particular viewpoint in a real world scene. A computer-generated image should contain identical
representations of all objects in all light energy spectral and intensity ranges that the real world
scene features.
Photorealism means that the produced image needs to be photometrically realistic. The image
produced by the computer has to evoke the same visual response as the real world scene, even if
the physical energies between the two vary in spectral and intensity ranges.
The criterion for functional realism suggests that the image produced by the computer has the same
visual information as the real world scene. Information helps the user to understand meaningful
properties of objects in a scene, such as shapes, sizes, positions, motions, and materials, and
therefore enables the user to perform useful visual tasks.

The emphasis of academic research on computer graphics has traditionally been on photorealism, which has led to the steadily
increasing quality of real-time rendered 3D graphics. Despite having a close connection with computer graphics, the realism
in AR needs to be considered from a wider perspective. AR is not only about the visual appearance. AR experiences are meant
to be consumed as a part of the real physical environment and, as such, they are interactive, reacting to both the user and the
environment. To address the interactive and physical aspects of AR, we propose criteria for rating AR experiences that are similar
to Ferwerda’s three varieties of realism for computer graphics described above, with some subtle, and some not-so-subtle, modifications:
1. Physical realism in AR: Virtual and physical elements are indistinguishable from each other; all elements look
and behave as if they were physically part of the reality.
2. Photorealism in AR: All virtual elements look as if they are physically part of the real world scene that the
user is observing.
3. Functional realism in AR: All virtual elements behave according to the physical reality. They represent the virtual
object correctly in relation to the physical space and help the user perform tasks assisted by AR.
With a closer examination of each of these aspects of realism in AR, we can determine how close we are to achieving full
realism, and what novel solutions have recently been developed to move us closer to that goal.

Physical realism in AR
Full physical realism with true Holodeck-like experiences will remain a distant ultimate goal for AR. Steps towards increasing
physical realism in AR involve developing not only how virtual elements appear to the viewer, but also interactivity - how a
viewer can literally touch, feel, and manipulate virtual objects directly with their bare hands.
Moving from the traditional point-and-click style of interaction, lifted from Windows-based user interfaces and prevalent in early AR
applications, towards fully natural and intuitive interaction demands far greater complexity and richness of interaction. Advanced AR
applications require completely new methods for input and output, as well as a different approach to
crafting the user experience.
From an input and output technologies perspective, to enable a new level of interaction, there is a need for accurate detection
and structural understanding of the physical environment and the users within it. Methods for enabling users to directly
manipulate virtual elements by touching them are also needed. This not only requires accurate detection of user actions and
gestures, but a means for providing the realistic haptic sensation of virtual objects so that the user “feels” what they would if
they were touching real physical objects.
An accurate structural understanding of the physical environment requires the use of various depth-sensing technologies,
which solve the biggest challenges – to an extent. Depth cameras, or RGB-D sensors as they are often called, consist of a camera
sensor providing regular 2D RGB information and a depth sensor operating on a structured-light or time-of-flight principle,
providing depth information in the form of depth map images. Data collected by such RGB-D sensors enable reconstruction
of full 3D models of the local physical environment surrounding the user. These RGB-D sensors are now being integrated with
many of the AR HMD devices in development. We are also seeing mobile phones with embedded RGB-D sensors entering
consumer markets, led by Google’s Tango (formerly Project Tango), which paved the way for RGB-D mobile device sensor
integration, allowing the development of new applications and uses.
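As a minimal sketch of how a depth map becomes 3D geometry, the snippet below back-projects depth pixels into camera-space points using an ideal pinhole camera model. The intrinsics (fx, fy, cx, cy), the function name, and the tiny depth map are illustrative assumptions, not values from any particular RGB-D sensor.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(depth_m: List[List[float]],
                      fx: float, fy: float,
                      cx: float, cy: float) -> List[Point3D]:
    """Convert a row-major depth map (metres) into camera-space 3D points.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy) in pixels. A depth of 0 or less is treated
    as a missing reading and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # no valid depth at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map with one missing reading (0.0).
depth = [[1.0, 2.0],
         [0.0, 4.0]]
cloud = backproject_depth(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

A full reconstruction pipeline would additionally fuse many such point sets captured from different, tracked viewpoints; that step is outside the scope of this sketch.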
RGB-D sensors can also be used for the detection of user hand poses and gestures, which form a control input for AR
applications. Despite the relatively good accuracy of these solutions, there is room for new innovations to better support natural
input. There are a limited number of hand gestures that can be accurately recognized from RGB-D data today. Robust detection
of individual physical objects, user interaction with them and context of use cases are areas of study still in their infancy. Speech
recognition has often been used as an alternative or supplementary input method for cases where hand gesture and direct
manipulation do not suffice. Traditional input methods, such as a keyboard, do not work well with AR.

Compared to other relevant interaction technology areas, haptic feedback is both the single most important area and one that
is severely lacking in easy-to-deploy solutions. Haptic feedback, the sensation of touch that we get when we are in contact with
physical objects, is an essential input channel we use when carrying out tasks in everyday life. At this time, there are a number
of AR haptic feedback approaches under consideration, which range from ultrasonic to mechanical haptic force feedback. These
approaches tend to be experimental in nature. The haptic approach used is very application-specific, as no single approach is
suitable for all generic, easy-to-deploy applications. Until radically new approaches exist, the use of physical proxy objects and
the replacement of haptic feedback using other sensory channels such as audio will often need to be used.
Even with these shortcomings, goals for increasing physical
realism in AR are worthwhile. Any solutions, even with limited
use cases and poor interaction fidelity, that increase the
physical realism can result in a dramatic change in the nature
of the experience. Often, when using several sub-optimal input
and output methods in combination, a user’s brain will fuse the
feedback from different sensory systems, resulting in a jump to
a completely new level of immersion as compared to the use of
a single sensory channel such as vision.
One example of increasing physical realism with a combination of feedback channels is a solution developed by InterDigital to
provide haptic feedback with relatively flat physical proxy objects. AR is used to generate visual feedback and track the user’s
hand for input. In this solution, proxy objects can be compressed so that instead of the full 3D shape matching the virtual
object, the proxy objects are compressed along the depth axis, resulting in a physical proxy that is easy to produce with current
3D or 2.5D printing methods and is easier to handle than full 3D objects matching the virtual elements. When a user touches
the physical proxy object, the point of touch is detected and a simultaneous AR visualization of the virtual object is aligned
with the touch point. The visualization thus compensates for the depth-axis compression of the proxy object, and the user feels as if they are really
touching the full 3D shape of the virtual object.
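The relationship between a touch on the flattened proxy and the matching point on the full virtual shape can be sketched as follows. This only illustrates the depth-axis idea, assuming a uniform compression factor along z and a shared base plane at z = 0; it is not a description of InterDigital's implementation, and all names and values are hypothetical.

```python
def proxy_to_virtual(touch, compression):
    """Map a touch point on a depth-compressed proxy to the virtual object.

    touch: (x, y, z) on the physical proxy, with z measured from the base plane.
    compression: factor in (0, 1]; proxy depth = virtual depth * compression.
    """
    if not 0.0 < compression <= 1.0:
        raise ValueError("compression must be in (0, 1]")
    x, y, z = touch
    return (x, y, z / compression)


def render_offset_z(touch, compression):
    """Shift along z that places the virtual surface point under the finger."""
    virtual = proxy_to_virtual(touch, compression)
    return touch[2] - virtual[2]


# Hypothetical touch at 2 mm height on a proxy flattened to 25% of full depth.
touch_mm = (10.0, 20.0, 2.0)
virtual_mm = proxy_to_virtual(touch_mm, 0.25)
offset_mm = render_offset_z(touch_mm, 0.25)
```

Under these assumptions, the renderer would shift the virtual object by the computed offset so that the touched surface point appears exactly where the finger meets the proxy.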

Photorealism in AR
From the photorealistic point of view, the focus is on the image quality that the AR system is capable of producing. The goal of high
photorealism is to combine virtual elements with a real world view so that the virtual elements are indistinguishable from real
ones using visual inspection alone.
To achieve such photorealistic visual quality, virtual elements need to be lit with lighting identical to that present in the
environment, and the visual implications of the virtual elements on the physical environment need to be considered. This is
easier said than done. Lighting solutions simulating lighting effects that capture the look of the real world are required. A detailed
understanding of the environment’s structure and materials, the simulation of light interaction between the virtual and real
elements, and the seamless embedding of the computer-generated synthetic elements with the real world view make this a
challenging problem to solve. For some of these tasks, solutions already exist, but others, such as seamless composition, remain
hard to solve and require novel approaches.
Seamless composition of virtual elements blended with the real world view has specific challenges. For example, with optical see-through
AR, the viewer sees an unobstructed view of the real world, and virtual elements need to be overlaid on top of it. With
any of the currently available display technologies used for existing AR HMDs, the mismatch of image quality between the virtual
and real view causes virtual elements to stand out clearly from the composited view. This is often even further highlighted by
mismatched lighting, unwanted transparency, and erroneous occlusions featured on the virtual elements.
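Erroneous occlusions are commonly avoided with a per-pixel depth comparison between the virtual content and a depth map of the real scene. The sketch below assumes both depth maps are already registered to the same viewpoint; the function name and sample values are illustrative only.

```python
def occlusion_mask(virtual_depth, real_depth):
    """Return a mask that is True where a virtual pixel should be drawn.

    A virtual pixel is visible only if it exists (depth is not None) and
    lies closer to the viewer than the real surface at that pixel. A real
    depth of None means the sensor had no reading there, in which case
    the virtual pixel is kept.
    """
    mask = []
    for v_row, r_row in zip(virtual_depth, real_depth):
        mask.append([
            v is not None and (r is None or v < r)
            for v, r in zip(v_row, r_row)
        ])
    return mask


# One row of three pixels, depths in metres (hypothetical values):
# in front of the wall, behind a real object, and no virtual content.
virtual = [[1.5, 1.5, None]]
real = [[2.0, 1.0, 3.0]]
visible = occlusion_mask(virtual, real)
```

In practice, noisy or incomplete depth readings along object edges are a typical source of the visible occlusion errors mentioned above.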

The dynamic range of the human visual system and the accommodation of visual perception to prevailing conditions further
complicate the reproduction of virtual objects. Existing displays can reproduce only a fraction of the dynamic range of the human
visual system, and optical see-through display systems generally suffer from image transparency, which causes virtual elements to
appear semi-transparent with a ghost image of the background environment bleeding through. Many of these shortcomings need
to be solved by the display hardware, but solutions that enable better matching of lighting conditions between virtual and real
elements, correct handling of occlusions, and better tone range matching can significantly improve
the photorealism of AR.
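The ghosting effect can be illustrated with a simple additive model, assuming that an optical see-through display can only add light on top of whatever passes through from the scene. The function names, the transmission parameter, and the luminance values (in cd/m²) are assumptions made for this sketch.

```python
def perceived_luminance(background, virtual, transmission=1.0):
    """Additive model: scene light passing the optics plus display light."""
    return transmission * background + virtual


def virtual_contrast(background, virtual, transmission=1.0):
    """Ratio of a lit virtual pixel to the see-through background beside it."""
    seen_background = transmission * background
    if seen_background <= 0.0:
        return float("inf")  # nothing shows through a fully dark scene
    return perceived_luminance(background, virtual, transmission) / seen_background


# A black virtual pixel adds no light, so the background shows through unchanged.
black_pixel = perceived_luminance(100.0, 0.0)

# The same virtual brightness stands out far less against a bright scene.
bright_scene = virtual_contrast(1000.0, 200.0)
dim_scene = virtual_contrast(10.0, 200.0)
```

Under this model, dark virtual content cannot hide a bright background, which is one reason tone range matching between virtual and real elements matters.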
InterDigital has developed solutions that automatically adjust virtual content elements so that the content as a
whole features uniform lighting conditions, thus improving the photorealism of the AR, as well as methods that enable full
re-lighting of real-time captured 3D content. These enable use cases where the visual appearance of a user is captured in real time
and augmented at photorealistic quality into another physical space with a significantly different lighting setup.

Functional realism in AR
Functional realism is the least rigid variety of AR realism. When looking at realism from this point of view, the quality comes
from how accurately the AR content manages to convey meaningful information. As an example, in an AR-assisted maintenance
use case where the goal for the AR system is to help a maintenance worker see which parts of the machinery need to be
worked on, it is essential that the AR system can point out the correct parts and effectively illustrate operational details with
an easy-to-understand graphical style. Functional realism is a measure of how effectively the AR system helps the user perform
the assigned task.
When considering functional realism, the augmented elements can be rendered with any rendering style, be it cel shading, line
drawing or photorealistic textured materials. For the user to be able to carry out the task with the help of AR, the rendering style is
irrelevant as long as it helps the AR system provide information necessary to execute the task. When focusing on the effectiveness
of information transfer, a much wider range of content types can be considered and compared. The realism here is not so much
dependent on the details of the rendering, but rather on underlying technology. Accuracy of 3D tracking (i.e., knowing where the
AR device is in relation to the environment), understanding of the environment, object recognition, and context recognition are
some of the key elements that enable an AR application to display correct information at the correct moment for the user, thus
providing needed information efficiently.
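To make the role of 3D tracking concrete, the sketch below expresses a world-anchored point in the device's local frame once the device pose is known. It assumes a deliberately simplified pose (position plus yaw about the vertical y axis, right-handed coordinates); real trackers estimate a full six-degree-of-freedom pose, and the function name is hypothetical.

```python
import math


def world_to_device(point_w, device_pos, device_yaw_rad):
    """Express a world-space point in the device's local coordinate frame.

    Pose convention assumed here: p_world = R_y(yaw) * p_device + device_pos,
    so the inverse applies the transposed rotation to the offset.
    """
    dx = point_w[0] - device_pos[0]
    dy = point_w[1] - device_pos[1]
    dz = point_w[2] - device_pos[2]
    c = math.cos(device_yaw_rad)
    s = math.sin(device_yaw_rad)
    # Transpose of the y-axis rotation matrix [[c, 0, s], [0, 1, 0], [-s, 0, c]].
    return (c * dx - s * dz, dy, s * dx + c * dz)


# Device translated by 1 m along x with no rotation.
shifted = world_to_device((3.0, 1.0, 2.0), (1.0, 0.0, 0.0), 0.0)

# Device at the origin, turned 90 degrees about the vertical axis.
turned = world_to_device((0.0, 0.0, -1.0), (0.0, 0.0, 0.0), math.pi / 2)
```

Any error in the estimated pose shifts where the annotation lands relative to the real part it refers to, which is why tracking accuracy directly affects functional realism.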
If the aim of AR technology is to move towards a ubiquitous
computing dream, where users wear AR HMDs throughout
their daily lives, then functional realism becomes a very
important consideration. AR devices need to be able to adjust
operation according to many different contexts, always aiming
for the most efficient way of delivering needed information to
the user at the moment when it’s needed.
In addition to core technologies, such as robust 3D tracking
and context recognition, solutions aimed at improving
functional realism can include ways to extract richer
information from the environment, which in turn can
be used to tailor content delivered back to the user.

InterDigital has been developing a concept system for collecting and refining information from different
user environments, which can then be used to improve functional realism. With this solution, the
data from device sensors (such as RGB-D and motion sensors) embedded in the
AR HMDs is collected in a continuous and centralized manner in order to create an extremely rich
information repository. From this information repository, deeper insight into the environment, users,
and relationships between them can be extracted with further analysis. Such extracted information
enables the improvement of functional realism for various AR experiences and services, as well as
the development of completely new kinds of services that can identify situations where users need
assistance, match virtual experiences with physical environments, and offer the potential to perform
virtual product placement for advertisers.

Conclusions
We have discussed some views on the realism of AR experiences and the differences as compared to
realism for computer graphics. Clearly, computer graphics is an essential component of AR
and, as such, has much in common with AR when viewed from the perspective of realism. However,
we have also pointed out that there are significant differences between the two, especially as AR has a
much more profound connection with the physical environment and the interaction between virtual,
real, and user. Based on these observations, we have extended the viewpoint of realism from purely
visual aspects addressed by computer graphics to a wider perspective that also considers the interaction
between virtual content, the real physical environment, and the user. We see that with this wider
perspective, the visual aspects of realism are not necessarily the ones most severely lacking solutions,
and we believe solutions that add physical realism in AR such as haptic feedback and natural interaction
are very much in demand, since they carry the potential of elevating the realism and immersion of AR to
the next level. We have also introduced recent solutions developed by InterDigital aimed at improving
the realism of AR, and we want to conclude this white paper by encouraging everyone to continue to
push the limits of realism in AR.

Innovation Partners
www.innovation-partners.com
