Monday, February 18, 2002 1:40:25 PM
Interview: Wavemakers
Vincent Lau speaks to Richard Sones, Director, Sales and Marketing (2/18/2002)
Wavemakers develops voice optimization software that improves the accuracy and performance of voice user interfaces driving telematics systems, PDAs, PCs, tablets and consumer electronics. Wavemakers' Audio Intelligence(tm) system focuses on enhancing and reconstructing speech and has been proven to reduce the error rate of speech recognition systems by as much as 85%.
Wavemakers offers a suite of software solutions (ClearStream, VoiceTrigger, WaveBeam and EchoBlock), as well as voice system expertise and experience, for acoustical environments, microphones, ASRs and wireless communications. The company is a third-party developer for the Intel® StrongARM® processing platform, a member of the Intel PCA Developer Network, and Texas Instruments eXpressDSP(tm) compliant.
What is the vision of Wavemakers?
Our vision is that in the near future, using a computer voice user interface will feel as natural as speaking to another person. As users, our expectations are high so there are quite a number of components that need to come together to reach this goal. Wavemakers’ component is what we would call the front end of a speech interface. This is for everything from an acoustical environment to echo cancellation to microphones to the front end of an ASR engine.
How do you see this vision applied in the telematic space?
The telematic space is of course a very exciting area and one that we have worked in for several years. From an acoustical perspective, these are very complex environments and are likely to become even more complex. To start, there is a large variety of noisy situations and environments. In addition to that, there are the specifics of the cabin space and, going forward, more and more electronic devices. So to say the least, we are very excited about the value that our expertise and experience can bring to these challenges.
Can you give me an overview of the technology?
Sure. Our technology is grounded around an understanding of what is unique about human speech. For example, if you or I were to go to Japan or Russia we would probably not understand what people are saying but we would be able to recognize that it is human speech. Our software works in a similar way; it can recognize what is important and what is not important, all in real time processing. As a second step, we enhance this signal by removing the noise and, more importantly, by reconstructing and enhancing what has been masked or lost due to noise. Then, as a last step, we optimize the speech for a specific speech engine or for human listening.
How is this technology different from your competitors?
There are a number of things that give us a good advantage. The first one is that we started out from the very beginning to solve the problem of improving machine listening, or ASR accuracy. This is important as accuracy is a much harder problem to solve than just making the voice signal sound better to the human ear (which Wavemakers now does as well). Second, as I mentioned earlier, is that we are able to reconstruct speech that is lost due to noise, even with single microphone input, and we do it in a way that improves speech recognition accuracy. The third reason would be the strength of the team that we have assembled. Approximately 50% of our staff has a master's degree or PhD, as well as a variety of backgrounds. In the end it means we have the expertise and experience to bring the best solution possible to the customer.
What do you mean by “targeted speech”?
Targeted speech is something that you and I have learned to do very naturally. Specifically, it means listening to one person in the context of other people’s voices. An example of this in an automotive environment is where the driver is speaking and does not want the voices from the backseat or passenger seat to disrupt the system or his or her communications. We have been awarded a $4.4 million investment to accelerate our work in solving exactly this problem. The program has well-defined milestones that will allow us to commercialize our technology along the path towards the end goal – Targeted Speech.
What are the current product offerings?
Our Waveware suite of products consists of 3 modules centered around ClearStream™, our core software module. They are VoiceTrigger™, a sophisticated voice activity detection algorithm; EchoBlock™, an acoustical and line-echo cancellation software that is optimized for speech recognition; and finally, WaveBeam™, which is our multi-element technology that takes advantage of the extra information that is available when using more than one microphone.
So the space you compete in is not so much speech recognition as it is speech enhancement and noise/echo cancellation?
That’s right. We optimize the performance of speech recognition systems. To date we have worked with over 12 different speech recognition engines and have been proven to reduce the error rate of these speech recognition systems by as much as 85%.
Which Voice Recognition systems have you conducted tests with?
Some companies have several different engines. The list would include companies like IBM, Philips, ART, L&H, Dragon, Conversay, Microsoft, Nuance and Speechworks. The interesting thing is that they are all different. Each engine has its own strengths and weaknesses, and each requires a differently tuned input signal to reach its best accuracy.
What type of customers do you seek? Do you have any now?
We tend to work with companies that are creating the end product; this tends to be the tier 1-type companies in the automotive space. An example of this would be our work with Johnson Controls. As a core technology company our goal is to work with a small number of strategic partners that can use our technology and expertise to provide a better product and ultimately differentiate their products in the market.
Can you tell me more about the recent partnership with Johnson Controls?
We are providing both speech system expertise and our full Waveware suite of software. The end product is Johnson Controls’ voice-controlled Bluetooth wireless-networking solutions and is a concerted effort of a number of companies such as Intel, Gentex, IBM, and QNX.
What are some of the technological and business barriers you consider for a successful speech system?
That is a very interesting question and I presume you mean from a broader perspective of speech in the automotive industry. Technologically, I would say the key is making sure you bring together the right components to solve the problem and then work very clearly within the scope of the available technology but, at the same time, anticipate advancements in the technology every six months.
As for business barriers, price-point pressure on these systems will always be key. The lower the price, the more ubiquitous the products.
How do you expect to overcome these barriers?
On the technical level, we can work together with our strategic partners in the evaluation of technologies and then work hard at integrating and optimizing the different components to yield a robust speech system.
Managing technological advancements in the auto industry is of course a much broader question. Being a software approach (rather than a hardware approach) does make it easier for us to adapt our technology and bring it up to date, but the broader processes need to be in place.
In terms of cost, our value proposition is exactly that: better performance and lower cost. As we have seen in so many other industries, replacing hardware with software components results in lower overall system costs.
What is your prediction on how the speech technology industry will evolve in the short & long term?
I think it truly is an exciting time for speech as a user interface and it will definitely become a widespread reality sooner rather than later. The main reason for this is that technological advances are coming quicker and cost per MIPS is dropping - a very exciting convergence. The industry will definitely grow with this and I think we will see further consolidations.