a user speech volume indicator (yellow = too soft, green =
OK, red = clipping). These indicators are provided to make
status information immediately available to the user without
using precious screen space, or requiring the user to search
the screen for it.
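As an illustration only, the indicator can be thought of as a simple thresholding of the input level; the threshold values in the sketch below are assumptions, not measurements taken from the PSA hardware.

    # Sketch of the speech-volume indicator logic (thresholds are assumed values).
    def volume_indicator(peak_level, soft_threshold=0.15, clip_threshold=0.98):
        """Map a normalized input peak level (0.0 to 1.0) to an indicator color."""
        if peak_level >= clip_threshold:
            return "red"      # clipping
        if peak_level < soft_threshold:
            return "yellow"   # too soft
        return "green"        # OK

    print(volume_indicator(0.5))  # a comfortable speaking level -> "green"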
3. THE SPOKEN LANGUAGE SOFTWARE STACK
The PSA software stack comprises an operating system (VxWorks
version 1.01), a collection of “engines” providing
spoken language, communication and other services, and a
dialog manager. The role of the dialog manager is primarily
construction of the user interface by coordinating the operation
of the service engines and utilizing data from the user
interface file set.
Two engines are central to the operation of the software
stack. These are the speech recognizer and the text-to-
speech encoder; these engines are currently available as
the IBM Embedded ViaVoice product. Other engines were
written specifically to meet project requirements. All project
code components are designed to be portable and can be
adapted to new hardware or operating systems by modifying
a hardware portability layer and recompiling.
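A minimal sketch of such a portability layer is given below; the class and method names are illustrative assumptions rather than the project's actual interface, but they show how engine code can be kept independent of the audio hardware.

    # Sketch of a hardware portability layer (names and operations are assumed).
    from abc import ABC, abstractmethod

    class AudioHAL(ABC):
        """Engines call only this interface; porting to new hardware means
        supplying a new concrete subclass and recompiling."""

        @abstractmethod
        def read_frames(self, n): ...

        @abstractmethod
        def play_samples(self, samples): ...

    class VxWorksAudioHAL(AudioHAL):
        def read_frames(self, n):
            return b"\x00" * n      # placeholder for a real driver call

        def play_samples(self, samples):
            pass                    # placeholder for a real driver call

    hal = VxWorksAudioHAL()
    frames = hal.read_frames(160)   # fetch a small buffer of audio samples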
3.1. The Embedded Dialog Manager: Philosophy
The Embedded Dialog Manager (EDM) is successful to the
extent that it makes the user comfortable with the spoken
language interface. The conditions for this are easily recognized
from personal experience with conversations that
have failed. Human parties to a conversation must feel that
they are being paid attention to, that they are understood and
elicit a response when they speak, that there is meaning in
what is spoken to them, that they can express the same request
in several ways, that if understanding fails the other
party will cooperate in a mutual effort to restore the conversation,
and that the conversation itself may be discussed and
its rules changed dynamically. Moreover, the behavior of
the conversation must be varied and take into account both
recent and long-term conversational history. These properties
are supported in the EDM by means of its built-in
properties and its collection of user interface data.
Either party, in the course of a conversation, may create
utterances in four domains of discourse. The first three are
(1) the content of the conversation (“Tom pitched
a curve”), (2) the subject of the conversation (“Let’s stop
talking about baseball”), and (3) the conversational conditions
(“Please speak a little louder, I’m having trouble hearing
you over the game”). In a conversation between a person
and a device, these correspond to addressing an application,
addressing the operating system under the application (navigation),
and addressing the dialog manager itself. Utterances
in the fourth domain occur when two people converse
and a third listens; in this case, some portion of the utterances
can be intended to influence the listener. In a conversation
between a person and a device, these utterances correspond
to addressing or launching background applications.
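One way to picture this arrangement is as a dispatch from a decoded utterance to one of the four targets; the vocabularies below are hypothetical and serve only to make the routing concrete.

    # Sketch of routing an utterance to one of the four discourse domains
    # (the phrases and domain names are hypothetical).
    DOMAINS = {
        "application":    {"read my mail"},         # content of the conversation
        "navigation":     {"go to the calendar"},   # subject of the conversation
        "dialog_manager": {"speak louder"},         # conversational conditions
        "background":     {"start recording"},      # background applications
    }

    def route(utterance):
        for domain, phrases in DOMAINS.items():
            if utterance in phrases:
                return domain
        return None   # unrecognized; the dialog manager would prompt for help

    print(route("speak louder"))   # -> "dialog_manager"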
3.2. The Embedded Dialog Manager: Design
The user interface (UI) data collection is structured around
these four domains. All applications addressed by the spoken
language interface software stack provide UI data files
for the EDM. UI files can be created and manipulated with
any text editor. Specifying behaviors through data files that
govern the dialog permits developers to build applications
with a conversational interface without any specific knowledge
of the APIs of the supporting engines.
A minimal UI file set contains a VOC (vocabulary) file;
a typical set of such files will include at least one VOC file,
a PMT (prompt) file and a PRF (profile) file. Multiple files
of each kind can be provided for each application in order to
provide both default and application-state-specific vocabularies,
prompts and hardware properties.
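A hypothetical file set for a single application might be organized as sketched below; the file names and application states are assumptions used only to illustrate the split between default and state-specific data.

    # Hypothetical UI file set for a calendar application.
    ui_file_set = {
        "default":   ["calendar.voc", "calendar.pmt", "calendar.prf"],
        "day_view":  ["calendar_day.voc", "calendar_day.pmt"],
        "alarm_set": ["calendar_alarm.voc"],
    }

    def active_files(state):
        """Default files are always active; state-specific files extend them."""
        return ui_file_set["default"] + ui_file_set.get(state, [])

    print(active_files("day_view"))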
VOC files map user utterances into events that may be
processed in the event loops of the VOC target. When the
EDM accepts a spoken utterance for processing, the decoded
string returned by the recognition engine is used as
a search key in the set of active VOC files. The search order
(dialog manager VOC, target platform VOC, application
VOC, background VOC) prevents any application developer
from overriding the default vocabularies and damaging
default functionality such as inter-application navigation.
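A sketch of this lookup, assuming the VOC files have already been loaded into phrase-to-event tables (the actual file syntax is not reproduced here), is shown below.

    # Sketch of VOC lookup in the fixed search order; vocabularies are
    # hypothetical and shown as in-memory tables.
    voc_search_order = [
        ("dialog_manager", {"speak louder": "EDM_VOLUME_UP"}),
        ("platform",       {"go to the calendar": "NAV_OPEN_CALENDAR"}),
        ("application",    {"read my mail": "APP_READ_MAIL"}),
        ("background",     {"start recording": "BG_START_RECORDER"}),
    ]

    def lookup(decoded_string):
        """Return the first matching event; earlier VOC files win, so an
        application cannot shadow dialog-manager or navigation commands."""
        for _name, vocabulary in voc_search_order:
            if decoded_string in vocabulary:
                return vocabulary[decoded_string]
        return None

    print(lookup("read my mail"))   # -> "APP_READ_MAIL"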
PMT (prompt) files provide a set of useful system responses
for programmed events such as error conditions or
acknowledgments. Well-designed prompts play an important
role in creating the illusion of conversation. A PMT
file is a list of two-element entries, each comprising a prompt
key and a prompt string. Prompt strings, which may contain
references to environment variables (such as the name
of the user) and to sound files, are played after composition.
Any process capable of sending a message to the EDM can
issue a command to play a prompt, so the same mechanism
can be used by non-application functions such as low battery
warnings or appointment notifications.
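As a sketch, composition can be thought of as substituting environment variables into the stored string before it is handed to the text-to-speech engine; the prompt keys, strings and the $NAME notation below are assumptions, not the PMT file's actual syntax.

    # Sketch of prompt composition with environment-variable substitution.
    prompts = {
        "GREETING":    "Good morning, $USERNAME.",
        "LOW_BATTERY": "The battery is low, $USERNAME. Please recharge soon.",
    }
    environment = {"USERNAME": "Alex"}

    def compose(prompt_key):
        text = prompts[prompt_key]
        for name, value in environment.items():
            text = text.replace("$" + name, value)
        return text   # in the PSA this string would be spoken by the TTS engine

    print(compose("LOW_BATTERY"))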
In a minimal prompt file, only one string is specified
for each index. If one prompt is spoken to the user several
times in a row, the user will perceive this as “mechanical”
and grow frustrated. Three conditions arise in which more
than one prompt should be specified. If the user needs to be
alerted to the same condition several times, a collection of
prompts of similar content may be used. A set of prompts of
the same information value is called a “rotation.” The EDM
can also select prompt complexity on the basis of conversational
history. Repeated errors cause prompts of successively
increased content (taper up) [2]. Growing user experience
reduces feature prompts, down to “Ready” or a sound
icon (taper down).
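The sketch below combines rotation with tapering; the table layout and the selection rule are assumptions that illustrate the idea rather than the EDM's actual policy.

    import random

    # Hypothetical prompt table: each key maps a verbosity level to a rotation
    # of equivalent prompts, from verbose (level 0) to terse (level 2).
    prompt_table = {
        "CONFIRM_SAVE": {
            0: ["Your note has been saved. Say 'read it back' to review it.",
                "I have saved your note. You can say 'read it back' to hear it."],
            1: ["Note saved.", "Saved your note."],
            2: ["Ready."],
        }
    }

    def select_prompt(key, error_count, experience):
        """Repeated errors push toward verbose prompts (taper up), experience
        pushes toward terse ones (taper down); choosing randomly within the
        rotation keeps repeated prompts from sounding mechanical."""
        levels = prompt_table[key]
        level = max(0, min(max(levels), experience - error_count))
        return random.choice(levels[level])

    print(select_prompt("CONFIRM_SAVE", error_count=0, experience=2))  # -> "Ready."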
Settings such as voicing, power management and button
properties also play a role in the user experience. These
aspects of the interface are controlled by environment variables.
A PRF (profile) file contains a list of variable names
and values. When active, these are treated as environment
variables and are used to control services, settings, substitutions
in prompts, and to pass values between applications.
Values in these files may be defaults or application-specific.
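A sketch of loading such a profile, assuming a simple NAME=value layout (the PRF file's actual syntax may differ), is shown below.

    # Sketch of loading a PRF (profile) file; the layout and variable names
    # are assumptions for illustration.
    sample_prf = """
    USERNAME=Alex
    TTS_VOICE=female
    POWER_IDLE_TIMEOUT=120
    """

    def load_profile(text):
        profile = {}
        for line in text.splitlines():
            line = line.strip()
            if line and "=" in line:
                name, value = line.split("=", 1)
                profile[name] = value
        return profile

    env = load_profile(sample_prf)
    print(env["TTS_VOICE"])   # engines and prompt substitution read these values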