Tinroad

Re: None

Monday, 06/11/2001 1:30:23 PM


The Power of Speech - part 2

Back-End block
The back-end block (an ARM7 general-purpose processor, for example) performs all other code functions, including topic management and search. All code, data models, and dictionaries are loaded into ROM or flash memory. The main processor subsystem must provide enough processing power for real-time voice recognition; to achieve this responsiveness, you must factor in system kernel overhead, memory access times, and processor clock speed. A typical implementation uses fewer than two wait states for RAM access and fewer than two wait states for ROM access.

CASSI components and resource requirements
Because the implementation is dual-processor, Conversay delivers separate object modules for the DSP and the main processor. Thanks to its modular design, CASSI is available in feature-scalable implementations. These different implementations let system designers match hardware capabilities (memory) against a specific, desired feature set. The following are example configurations:
•Slim: Limited word set (e.g. digits and command and control), unlimited vocabulary, pre-built output prompts (no TTS)
•Mid-Tier: Dynamic word list, unlimited vocabulary, pre-built output prompts (no TTS)
•Robust: Dynamic word list, unlimited vocabulary, full synthetic TTS, browser ready

BENEFITS
CASSI provides a number of unique features and capabilities for the system developer. This section highlights some of the key features available for the developer who uses CASSI.

Speaker independent
Speaker-independent speech recognition enables developers to support a robust recognition experience. Users do not need to undergo any speaker training; their speech is recognized immediately.

Continuous speech recognition
Continuous speech recognition provides the ability for a user to speak naturally at a normal speech rate (cadence). Pauses are not required between words or digits. This feature is especially important in a situation that involves digit dialing, for example. When digit dialing, users can speak the digits as quickly as they like without pausing between digits or phrases, and the recognition engine will keep up.

Portable code
The core speech recognition engine is written in ANSI C and is highly portable. The CASSI libraries are typically provided as object files for a specific processor. Because the underlying code is portable, porting to a new processor requires minimal effort and can usually be accomplished within three or four weeks. The rest of the project time can then be spent on optimization and integration for the specific hardware implementation and on application development.

Small kernel
As illustrated earlier, the overall speech recognition engine is small and requires very few resources. This small code size makes the engine viable for a large variety of devices, such as:
•Cellular phones
•Smart Phones/Communicators
•PDAs
•Televisions and set top boxes
•Internet appliances
•MP3 players
•Other handheld computing devices

Shared resources
The design of the speech recognition engine allows implementation of both speech recognition and text-to-speech with little additional memory overhead. Both of these modules use the same dictionary and language models to conserve resources. Because of this capability, developers can add text-to-speech capabilities with minimal additional memory or processor requirements.

No vocabulary limits
With its unique speech-to-pronunciation (STP) capabilities, CASSI has the ability to determine pronunciation rules for words that do not exist in its dictionary set. This powerful feature is critical for applications in which name spellings vary and are not typically located in a dictionary. With STP it is possible to perform real-time recognition on unknown words and names without additional ROM or code changes. Without STP, applications such as voice-able email, voice-activated Internet surfing, or spoken name dialing are nearly impossible to implement. The STP feature is the key to Conversay’s competitive advantage of an unlimited vocabulary.

Ambient noise robustness
The CASSI engine performs audio pre-processing that guarantees robust speech recognition in noisy environments. This pre-processing removes ambient noise from the audio signal in order to improve the effective signal-to-noise ratio (SNR). Speech recognition accuracy improves significantly under a variety of noisy environmental conditions when noise reduction is used.

Channel adaptation
To improve overall speech recognition accuracy, CASSI performs continuous adaptation for the selected input audio channel. This capability continually improves recognition accuracy for a given speaker within a short time period. Instead of training a system to recognize a specific user's audio characteristics, channel adaptation provides fine-tuning of recognition performance. In addition, the CASSI API allows the developer to pre-load (or retrieve) a specific adaptation setting prior to a given session.
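
The paragraph above does not say how CASSI adapts to the channel, so the following is only a minimal sketch of one common technique: subtracting a running mean from each feature vector (mean normalization). The `ChannelState` struct, the `channel_adapt` function, the feature count, and the smoothing factor are all hypothetical and are not part of the CASSI API. The sketch also illustrates why pre-loading a saved adaptation setting helps: the residual channel offset is already zero on the first frames of a session.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FEATURES 2   /* tiny for illustration; real front ends use more */

/* Hypothetical "adaptation setting": the running mean of each feature. */
typedef struct {
    float mean[NUM_FEATURES];
} ChannelState;

/* Update the running mean with smoothing factor alpha, then remove it
   from the frame in place. A constant channel offset fades out over time. */
static void channel_adapt(ChannelState *s, float *frame, float alpha)
{
    assert(s != NULL && frame != NULL);
    for (size_t i = 0; i < NUM_FEATURES; i++) {
        s->mean[i] += alpha * (frame[i] - s->mean[i]);
        frame[i] -= s->mean[i];
    }
}

/* Usage: feed three frames that carry a constant channel offset and return
   the offset left in feature 0 of the third frame. With preload != 0 the
   state starts from a setting saved in an earlier (pretend) session. */
static float demo_residual(int preload)
{
    ChannelState s = { { 0.0f, 0.0f } };
    const ChannelState saved = { { 4.0f, -2.0f } };  /* assumed saved setting */
    float frame[NUM_FEATURES] = { 0.0f, 0.0f };

    if (preload)
        s = saved;                 /* "pre-load" is a plain struct copy here */

    for (int n = 0; n < 3; n++) {
        frame[0] = 4.0f;           /* constant offset on both features */
        frame[1] = -2.0f;
        channel_adapt(&s, frame, 0.5f);
    }
    return frame[0];
}
```

With a cold start the offset is still only partly removed after three frames, while the pre-loaded state removes it from the first frame onward.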

DESIGN CONSIDERATIONS
Unlike PC-based applications, embedded systems have limited resources and must cope with adverse noise environments. Therefore, certain inherent limitations can affect recognition capability.

Active grammar
The CASSI engine supports an unlimited vocabulary of words that can be compiled by the engine in real time. The active grammar defines the words and phrases that the CASSI engine will recognize at any moment. The size of this grammar affects the RAM requirements for the CASSI engine. As the grammar size increases, processing time increases. Recognition accuracy may also suffer if the active grammar becomes too large. Generally, grammars of 100 words or fewer are easily handled and are more than sufficient for typical embedded applications. The total system grammar only affects ROM/RAM requirements, as active grammars may be dynamically loaded and unloaded at run-time.
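
As a rough illustration of loading and unloading active grammars at run-time, here is a minimal sketch in C. Every name in it (`Topic`, `ActiveGrammar`, `grammar_load`, `grammar_unload`) is hypothetical and not taken from the CASSI API. The 100-word cap simply reuses the guideline from the paragraph above, and the table size of 8 topics is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIVE_WORDS  100u  /* guideline from the text, not a CASSI constant */
#define MAX_ACTIVE_TOPICS 8u    /* arbitrary table size for this sketch */

typedef struct {
    const char *name;
    size_t word_count;          /* words this topic adds to the active grammar */
} Topic;

typedef struct {
    const Topic *topics[MAX_ACTIVE_TOPICS];
    size_t topic_count;
    size_t active_words;
} ActiveGrammar;

/* Activate a topic. Returns 0 on success, or -1 if the topic table is full
   or the word budget would be exceeded. */
static int grammar_load(ActiveGrammar *g, const Topic *t)
{
    assert(g != NULL && t != NULL);
    if (g->topic_count == MAX_ACTIVE_TOPICS ||
        g->active_words + t->word_count > MAX_ACTIVE_WORDS)
        return -1;
    g->topics[g->topic_count++] = t;
    g->active_words += t->word_count;
    return 0;
}

/* Deactivate a topic. Returns 0 on success, or -1 if it was not active. */
static int grammar_unload(ActiveGrammar *g, const Topic *t)
{
    assert(g != NULL && t != NULL);
    for (size_t i = 0; i < g->topic_count; i++) {
        if (g->topics[i] == t) {
            g->topics[i] = g->topics[--g->topic_count];  /* fill the gap */
            g->active_words -= t->word_count;
            return 0;
        }
    }
    return -1;
}

/* Usage: when a new topic does not fit, swap a large one out first.
   Returns the number of active words at the end. */
static size_t demo_topic_swap(void)
{
    static const Topic digits   = { "digits",   10 };
    static const Topic contacts = { "contacts", 80 };
    static const Topic menu     = { "menu",     25 };
    ActiveGrammar g = { { 0 }, 0, 0 };

    grammar_load(&g, &digits);            /* 10 active words */
    grammar_load(&g, &contacts);          /* 90 active words */
    if (grammar_load(&g, &menu) != 0) {   /* 115 would exceed the budget */
        grammar_unload(&g, &contacts);    /* back to 10 */
        grammar_load(&g, &menu);          /* 35 active words */
    }
    return g.active_words;
}
```

Keeping the budget check inside the load function means the application finds out that a grammar is too large before recognition speed or accuracy degrades.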

Available MIPS
The percentage of processor capacity that CASSI requires varies depending on several factors. These factors include simultaneous use of recognition and TTS, and the complexity and size of the active vocabulary. It is not possible to factor in all the needs of the various system components in advance. Conversay will assist device manufacturers in determining whether they have enough processing power to support real-time voice recognition.

RAM resources
The RAM usage of CASSI at any given moment varies depending upon the topics in use (active vocabulary), the size of the topics being compiled, and the length and complexity of the text being sent to the TTS engine. All of these processes can happen concurrently, affecting the size of the scratch RAM that each function uses.

Math capability
The CASSI front-end processor block is highly math intensive. Various filters and vector analyses are performed that use the vector math operations found in standard DSPs. Modern DSPs are optimized to perform such operations many times faster than general purpose processor architectures (such as the ARM7).
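
To make "vector math operations" concrete, here is a minimal sketch of a fixed-point dot product, the multiply-accumulate loop that underlies FIR filtering. It is a generic illustration, not CASSI code; the function name `dot_q15` and the Q15 number format (16-bit values representing the range -1 to just under 1) are assumptions. A DSP typically executes each loop iteration in a single cycle, whereas a general-purpose core needs several instructions per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two Q15 vectors with a wide accumulator and saturation. */
static int16_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;   /* wide accumulator avoids intermediate overflow */

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];     /* Q15 * Q15 = Q30 product */

    acc /= 32768;                        /* scale Q30 back to Q15 */
    if (acc > INT16_MAX) acc = INT16_MAX;   /* saturate instead of wrapping */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}

/* Usage: 0.5 * 0.5 + 0.5 * (-0.25) = 0.125, which is 4096 in Q15. */
static int16_t demo_dot(void)
{
    static const int16_t x[2] = { 16384, 16384 };   /* 0.5, 0.5   */
    static const int16_t h[2] = { 16384, -8192 };   /* 0.5, -0.25 */
    return dot_q15(x, h, 2);
}
```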

RAM/ROM access time
A fast processor may be used for implementation, but if the memory access times are slow, the effective speed is reduced. CASSI requires random access to large external tables, particularly in its search routines. Therefore, RAM/ROM access times and specific types of implementations can severely affect the performance of this component.
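
The effect of wait states on effective speed can be estimated with a back-of-the-envelope model. The sketch below assumes, purely for illustration, that each instruction takes one core cycle and that a given percentage of instructions also makes one external memory access that stalls for the configured number of wait states. Real processors, caches, and buses are more complicated, and the function name and example figures are hypothetical.

```c
#include <assert.h>

/* Estimate effective throughput in thousands of instructions per second.
   clock_khz          core clock in kHz
   mem_access_percent share of instructions that access external memory
   wait_states        extra stall cycles per external access */
static unsigned long effective_kips(unsigned long clock_khz,
                                    unsigned mem_access_percent,
                                    unsigned wait_states)
{
    unsigned long cpi_x100;

    assert(mem_access_percent <= 100);
    /* Average cycles per instruction, scaled by 100 to stay in integers. */
    cpi_x100 = 100UL + (unsigned long)mem_access_percent * wait_states;
    return clock_khz * 100UL / cpi_x100;
}
```

Under these assumptions, a 40 MHz core in which half of all instructions touch external memory drops from 40,000 to 20,000 thousand instructions per second when that memory needs two wait states, which is why slow tables can erase the benefit of a faster clock.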

SUMMARY
CASSI provides a highly portable, fast, efficient, Internet-optimized speech recognition engine that supports a wide range of device applications. The CASSI embedded speech engine provides a new device interface method that goes beyond techniques such as handwriting analysis, DTMF input, or chiclet keypads. Devices that need to support screenless operation or that provide voiceable content (such as Internet and e-mail access) are only a few of the environments that would benefit from a voice recognition interface. Many system designers can add speech input to existing devices while avoiding the need for major hardware changes. This ease of implementation reduces development time and lowers hardware component costs, two factors that are important for most manufacturers.

Always tell the truth. Then you'll never have to remember what you said the last time.
