Apple event fuels speculation
By Ian Fried and Joe Wilcox
Staff Writers, CNET News.com
April 20, 2001, 1:20 p.m. PT
update Apple Computer on Friday scheduled a press event for May 1 with CEO Steve Jobs, sparking speculation that he will unveil new Macs or announce plans for retail stores.
An Apple representative wouldn't say what will be announced at the event. But sources have told CNET News.com that Cupertino, Calif.-based Apple is working on a slimmer version of its iBook consumer notebook. There has also been speculation that Apple will introduce new dual-processor PowerMacs.
Both of these products are expected to come no later than July at Macworld Expo in New York, but some sources have said they will come sooner.
"Apple CEO Steve Jobs and senior management will make some exciting announcements and host a Q&A," Apple said in its invitation to reporters. Apple press events are relatively infrequent and, in the past, have served as launches for new products.
Distributors have been running low on iBooks, which is often a harbinger of new Apple models. As of Thursday, distribution giant Ingram Micro had no 366MHz iBooks and only a few of the faster 466MHz machines, sources said. Ingram Micro listed no date as to when more systems might be available. Apple recently stopped selling its 667MHz single-processor PowerMac.
"If Apple has run the iBook inventory to zero and there are no backorders, then that means new models are coming," Gartner analyst Chris LeTocq said.
Apple does not comment on its future products. However, Jobs said at Wednesday's annual shareholders meeting that the company will introduce products this year in events other than keynote speeches at industry trade shows.
"I think you will see some major products announced this year--not at keynotes," he told shareholders.
The company has also been making plans to open a series of retail outlets, with locations said to be set in Chicago, Minneapolis, Palo Alto, Calif., and Littleton, Colo., among other locales. Apple has repeatedly refused to comment on those plans.
"It's about time Apple discussed their retail store plan," Technology Business Research analyst Tim Deal said. At the same time, Deal has reservations about Apple opening the stores.
"In light of Gateway--the Wintel equivalent of Apple--and the recent closing of their stores, I don't think it's a good time for Apple to promote a retail presence," he said. With more than 34 percent of sales coming directly from Apple's online store and the recent departure of Mac dealers, he added, "it's clear more and more of their customers are buying online."
The only way a retail strategy makes sense is "Apple getting in the face of non-Mac users and converting some of them," Deal said.
If Apple does introduce new products at the event, it would be a step in the right direction, NPD Intelect analyst Stephen Baker said.
"The things that Apple needs to do to improve their performance is to become more like (other PC makers) in terms of how they manage their product lifecycles," he said. "When you only have a couple of product refreshes and nothing in between, the pace of the market can get ahead of you."
Typically, Apple releases new products at trade shows, such as Macworld, which occur only a few times a year. Critics have said that such an approach limits Apple's flexibility in reacting quickly to market trends.
For example, last summer the company began offering Macs with DVD drives instead of machines with hot-selling CD-rewritable drives. Not only did Apple miss the trend, but the company didn't respond with CD-RW iMacs until January's Macworld Expo in San Francisco.
"Certainly (with) the whole CD-RW thing, the pace of the market got ahead of them," Baker said. "It took them too long to catch back up again."
RioPort Service Delivers Music to Devices
Digital audio platform developer RioPort is testing a service that delivers secure music directly to audio players and cell phones. RioPort will provide the service to consumer electronics and Internet appliance makers, and online retailers as part of its downloadable music platform. The service would prevent unauthorized redistribution of music by delivering tracks to devices; the tracks bypass the user's PC and cannot be passed along to others. Labels including A&M Records, Dreamworks Records, Priority Records and Moonshine Music are providing promotional tracks for trials of the service. RioPort has digital distribution deals in place with all five major label groups; MTV is integrating RioPort's platform to sell downloads through its streaming audio sites.
More Than The Sum Of The Parts -- In a recent issue discussing how data is rapidly moving into our pockets (http://www.compaq.com/rcfoc/20010319.html#_Toc509133222), I mentioned my surprise at a Yankelovich Partners survey that found that 53% of people would prefer to carry several devices, rather than an all-in-one combo. Thinking that my cell phone and PDA have already exceeded my free pockets, I mused, "Go figure..."
Well, RCFoC reader Don Lyle figures that there will be some very good reasons for carrying multiple devices, especially as each device gets much smaller -- it's that new technologies such as Bluetooth will make the functional "whole" of the active devices we carry around, MUCH greater than the mere sum of the parts:
"In your latest RCFoC you seemed surprised that a significant percentage of users said they’d carry multiple devices... I think it’s inevitable that we’ll carry multiple devices, and desirable to do so.
A major problem with PCs today is that they’re general-purpose devices... An analogy I use is one of a general-purpose carpenter’s tool. Want to drive a nail, saw a board, square-up an upright? Just use our handy-dandy carpenter’s tool with a 300-page instruction book and an industry’s worth of supporting literature, like “Carpentry for Dummies.”
Bluetooth, for one, promises that we can carry our “aura” with us and perform ad-hoc network formation, adding devices that come within the scope of our aura, and then dropping off those that “leave” the field. I can visualize carrying or wearing a voice-activated cellphone module that interfaces with the earpiece module (an earring?), the microphone module (a necklace or tie tack?), and the database module where ALL of our personal data resides, including all the PIM information. Blackberrys, Palm Pilots, etc. will be nothing more than template-driven user interfaces to the database module.
My wristwatch module will adjust itself as necessary to the correct time obtained from the cellular network, or from overhead GPS satellites (assuming I’m carrying a GPS module, or I walk a path such that my aura encompasses one). Where is some of this [much smaller] stuff carried? Your wallet might be nice.
With these very-special-purpose devices, form will follow function, and thick instruction books won’t be necessary. The instruction book that comes with a watch doesn’t contain a single paragraph about how to tell time -- just instructions on how to figure out all of the ancillary bells and whistles that “designers” have crammed into the shell..."
Indeed, the unintended consequences of all of our devices playing in a constantly evolving ad-hoc network will be quite amazing. And very un-planned!
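The "aura" idea above can be sketched as a toy simulation. All names and the range figure here are hypothetical, and nothing below reflects the actual Bluetooth discovery protocol -- it just models devices joining the personal network when they come within range and dropping off when they leave:

```python
from dataclasses import dataclass

RANGE_METERS = 10.0  # assumed "aura" radius; real Bluetooth range varies widely

@dataclass
class Device:
    name: str
    distance_m: float  # current distance from the wearer

def aura_members(devices, radius=RANGE_METERS):
    """Return the names of devices currently inside the personal 'aura'."""
    return [d.name for d in devices if d.distance_m <= radius]

devices = [
    Device("earpiece", 0.1),
    Device("watch", 0.5),
    Device("gps-module", 3.0),
    Device("stray-printer", 40.0),  # out of range, so not part of the aura
]

print(aura_members(devices))  # the printer stays out; walk closer and it joins
```

As Don Lyle suggests, the interesting part is that membership is recomputed continuously as distances change, with no central configuration at all.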
There's Speed, and Then There's Speed.
As 3G ("Third Generation") wireless data networks are preparing to spring up in Japan very shortly, and on U.S. shores before the end of this year (http://www.compaq.com/rcfoc/20010326.html#_Toc509819394), it's time to wonder just how much bandwidth our pockets will actually see. Some of the claims are startling -- everywhere from ISDN-like 144 kilobits/second, up through 2-3 megabits/second. But because there are different technologies involved, from multiple vendors, and since we're still waiting for real-world installations, the reality has been rather hard to nail down. (There are many technical issues that will mitigate the bandwidth that any given pocket might experience at any given time. For example, all data users of a cell site will share its bandwidth at any instant, noise to and from a handset will impact the data rate, etc.)
Which is why I was pleased to see what sounds like some realistic results from tests conducted by wireless carriers themselves, described in the April 10 Computerworld (http://www.pcworld.com/news/article/0,aid,46824,00.asp). Essentially (and unsurprisingly):
"Carriers acknowledged last week that the average throughput on third-generation mobile wireless networks will be in the range of only one-third to one-half of the peak speeds they hyped in announcements."
What this means, is that a typical pocket might see real-world throughput of 50-70 kilobits/second when using Verizon's initial "144 kilobits/second" service. A Gartner Group analyst expects "somewhere between 28 kbps and 64 kbps." A year later, as Verizon rolls out "2.4 megabit/second" service, a pocket phone might see 500-600 kilobits/second (a fixed installation using this network might actually see a full 2.4 megabits/second of bandwidth.) Of course, any of these would surely beat today's common 19.2 kilobits/second or slower...
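The carriers' one-third to one-half rule of thumb is easy to sanity-check against the figures quoted above (the helper function is illustrative, not anyone's official formula):

```python
def expected_throughput(peak_kbps, low=1/3, high=1/2):
    """Rough real-world range, per the carriers' one-third to one-half estimate."""
    return peak_kbps * low, peak_kbps * high

lo, hi = expected_throughput(144)
print(f"144 kbps service: roughly {lo:.0f}-{hi:.0f} kbps in practice")
# 144/3 = 48 and 144/2 = 72, which brackets the 50-70 kbps cited above

lo, hi = expected_throughput(2400)
print(f"2.4 Mbps service: roughly {lo:.0f}-{hi:.0f} kbps in practice")
# note the 500-600 kbps pocket-phone figure is lower still, since mobility
# and shared cells cut deeper than the fixed-installation case
```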
It's not (exactly) that the carriers have been trying to pull the wireless wool over our eyes (we all do have our "hype filters" turned ON, don't we?), but there are very real differences between the optimum, single-packet speed in an uncongested test network, compared to what happens when phones move away from and between antennas, encounter noise, and have to share bandwidth with other users.
Personally, I find these more realistic figures comforting -- I can't forget what happened to the way-over-hyped Apple Newton -- it was a rather nice PDA well before its time, yet the hype leading up to its introduction set expectations so unreasonably high that, since it didn't actually cure world hunger, it could never satisfy its buyers. But if our expectations for wireless data are set realistically, then applications can be designed that actually delight, rather than disappoint... And that will be a good thing for carriers and users alike!
Technology And Economy.
I'm hardly an economist, and the economy isn't the focus of this journal. But given the tie-in between the economy and the possible technology trends that we just explored, it's interesting to gain some perspective on the economic side of things from Alvin Toffler, the author of Future Shock. (I thank RCFoC reader Cindy Blake for bringing Toffler's comments to our attention.)
According to Toffler, in an excellent article well worth reading in-full in the March 29 Wall Street Journal (http://interactive.wsj.com/archive/retrieve.cgi?id=SB985829231298844352.djm),
"Yes, Virginia, there is a new economy, and it's just getting ready to launch its next phase."
Mr. Toffler makes the case that during the journey from an agrarian culture's emphasis on muscle power, to the Industrial Age's early value on electricity and fossil-fuel power, and later its value on mind-power (innovation), each transition changed the "right stuff" that made businesses profitable. Changes, yes -- including the demise of some businesses; it didn't matter if you made the best, most technologically advanced, most cost-effective buggy whip... But these disruptions hardly banished profits, overall.
Similarly, as we now transition into the Knowledge Age, Mr. Toffler believes that,
"Revolutions, by definition, are marked by surprises, reversals, upsets, wildly volatile swings, and a heightened role for chance... [Now,] on an even bigger, faster scale, a new economic and social system is taking form. It, too, will transform just about everything else... "
"There are today more than three million digital switches for every human being alive on the planet... There are nearly half a billion PCs on the planet -- one for every 13 human beings... Are hundreds of millions of mobile phone users going to throw their phones away?"
What are the elements that will take us beyond the transistor, which lifted us out of the Industrial Age and is now thrusting us, headlong, into the Knowledge Age? Toffler suggests that,
"What comes after the first digital revolution?... It is, of course, in genetics and biotechnology that the most powerful effects are about to be unleashed... Many [of the effects] have implications that will feed back into, and change, the future of information technology itself, whether in the form of biochips or DNA-based computing, and, who knows, new communications technologies based on DNA models and biochips...
If you think the revolution is over, get ready to be shocked again as information technology fully converges with, and is in turn remade by, the biological revolution."
What Might Tomorrow Bring?
That, of course, is a perennial question, when it comes to something moving as fast as technology. And while crystal balls are always of necessity somewhat cloudy, it's interesting to occasionally kick back and explore where the experts expect things to go. This view, though, isn't directly about technology per se -- it's about what the results of various technologies' explosive growth are likely to mean to how we work, live, and play. And those are the most interesting results of innovation, after all...
The March 9 CNET put this question to several high-profile research labs, including MIT's Artificial Intelligence Laboratory, Belgium's Starlab, and the famed Xerox PARC. Collectively, they feel that by the end of this year, toy robots such as the Sony Aibo that we explored at COMDEX (http://www.compaq.com/rcfoc/20001127.html#_Toc499300003) will be all the rage. And just about any toy or appliance that might benefit from being able to "talk," will do so. (Playrooms may become even noisier...) Also, we might (finally) find the first practical attempts at a "Rosie the Robot" home cleaning robot, while "smart clothing" might warn us if we leave the office without our car keys!
Voice will see a resurgence over the next few years, with keyboard-driven chat rooms and online support services using Voice Over IP (VoIP) to remove the impersonal keyboard and text element, according to the March 19 Fortune (http://www.fortune.com/indexw.jhtml?channel=artcol.jhtml&doc_id=200846&page=1&_DARGS=%2F...). And "privacy" may become a thing of the past, with a great deal of our business and personal information becoming available (internationally or otherwise) over the Web.
(Indeed, the recent posting of an ICQ chat log allegedly detailing conversations between a company's CEO and his top executives, has resulted in wholesale resignations and significant corporate problems. Don't EVER assume that unencrypted communications of any sort on the Internet are secure, or that an off-hand comment will "go away." Someone may well be able to dredge it up for all to see - http://news.cnet.com/news/0-1005-200-5148422.html).
By the end of 2005, the research labs expect that Web-controlled robotic cleaning devices will be affordably priced, and will begin to become common. Biology and electronics will continue to come together to allow prosthetic limbs to be directly controlled by the brain, and we may gain the ability to transplant additional organs, such as eyes.
(Don't think this is possible? Reader Don McArthur points us to preliminary experiments at Northwestern University, where a lamprey's brain tissue has been fused with electronics to create an experimental cyborg -- it guides itself towards a flashing light, and can "learn"! See http://washingtonpost.com/wp-dyn/nation/A24800-2001Apr16.html .)
And remember those toys and appliances that we just noted will soon begin to speak? By 2006, they'll begin to listen to what we have to say -- not so much to the words, but to our emotions -- and then react accordingly.
By the end of 2010, we'll see C-3PO-like "advanced personal robots," and the groundbreaking robot-assisted surgery of today will become common. Continuing work on the human genome, and on tiny chips that dramatically improve gene and drug research, will lead to sophisticated home testing kits, and to medicine specifically created for our individual bodies, targeted directly toward exactly what ails us. And digital art, displayed on larger displays, will become common (extending a contemporary experiment in huge displays, at http://www.parc.xerox.com/red/projects/xfr/readingwall.html).
Finally, in the unimaginably distant year of 2050, these experts believe that we'll "have precise digital control of cells," and biologically-grown robotic add-ons will be available as "upgrades" to "Human, V1.0." Nanotechnology will be moving forward nicely, with billionth-of-a-meter machines beginning to do our bidding. With both good and with troubling implications, they expect that "genetic engineering will take hold; expect the creation and replication of creatures large and small."
Details begin at (http://www.cnet.com/techtrends/0-6014-8-4962347-1.html).
Of course, this is a very high-level overview of the things that these folks expect will escape from the research labs into our society. In some cases, we've heard similar predictions before (the household cleaning robot, a case in point). But given the rate of technology growth, and the fact that each year's technology builds on the shoulders of the growth before it, fascinating things, even if different from these predictions, are sure to emerge.
Xybernaut to collaborate with IBM, Texas Instruments
COMPUTER HARDWARE
By Nora Macaluso, LocalBusiness.com
Apr 19, 2001 08:01 AM ET
UPDATED FAIRFAX, Va., April 19 (LocalBusiness.com) -- Xybernaut Corp., IBM Corp. and Texas Instruments Inc. are reportedly preparing to introduce a jointly developed wearable computer.
A spokesman for Xybernaut (Nasdaq: XYBR) would not comment on the report, except to say that the company and IBM are planning a joint press conference at the International Conference on Wearable Computing, to be held here May 30 and 31.
Earlier this month, Xybernaut said it received a contract worth more than $1 million to equip FedEx Corp. (NYSE: FDX) flight line maintenance technicians with its MA IV wearable computers, enabling them to check key information while working on aircraft.
Xybernaut has been testing the devices with a number of major companies, and many of those participating in the pilot programs are preparing to sign on to purchase them, President and Chief Operating Officer Todd Rehm told LocalBusiness.com at the time.
Other companies currently testing Xybernaut computers are General Electric Co., Hilton Hotels Corp. and British Airways Plc.
Big screen, small package
The MA IV, which is about the size of a clock radio, is capable of running all standard operating systems, including Windows and Linux. The main unit is worn on a belt buckle, and a mirror attached to a headset lets the wearer view a video console that gives the appearance of a 17-inch computer monitor.
Xybernaut, www.xybernaut.com, says it's the No. 1 maker of wearable computers, which sell for $4,000 to $5,000 each.
In January, Xybernaut said it would collaborate with Hitachi Ltd. and Shimadzu Corp. to market a line of wearable Internet appliances, with Hitachi systems powering the control units, Shimadzu providing the displays and Xybernaut exploring business opportunities for the products.
Xybernaut shares closed Wednesday at $2.39, up 37 cents. The shares have traded as high as $12.25 and as low as $1.16 over the past year.
Military study begins
Meanwhile, Xybernaut today announced it had completed an initial shipment of wearable computers to the U.S. Navy and National Guard. The services are conducting feasibility studies testing whether the technology can trim repair time for aircraft and weapons systems.
OT:Handhelds: Tagger's Best Friend?
by Aparna Kumar
2:00 a.m. Apr. 20, 2001 PDT
Think of a cell phone or Palm handheld as a can of spray paint.
But instead of tagging a building with it, imagine using it to leave a floating digital message in front of the building, so that people walking by would see your comments on their phone or PDA.
A San Francisco company has developed a wireless content application called HaikuHaiku that will let people with Web-enabled Palm or WAP-enabled mobile phones leave trails of digital messages wherever they go.
At Neoku's site, users can select a neighborhood-level location (such as Manhattan's Upper West Side) and either view or post 3-line haikus specific to that location. There are very few restrictions: haikus don't have to follow the traditional 5-7-5 syllabic pattern, and they can be about whatever you want, whether it's your impression of El Capitan at sunset or a romantic tribute to a stranger on the bus.
"My original intent was to turn mobile devices into art objects and get people to write whatever they want," said Jay Bain, co-founder of Neoku, the two-person company that created HaikuHaiku. "But now we're trying to make a legitimate business out of it."
Though Bain admits that HaikuHaiku could be used in "potentially underground and subversive" ways, the company plans to license the application to other mobile-content companies as a way to capture on-the-fly customer feedback, such as restaurant or movie reviews.
Still, "wireless graffiti" remains an ahead-of-its-time concept.
"The challenge we face is how to build a location-based service when location-based technology is not yet available," said Ori Neidich, Neoku's wireless engineer.
At the moment, Neoku's service is simply a global index of messages that can be accessed wirelessly but must be input on the World Wide Web. Users have to register on Neoku's site in order to view or add haikus.
In order to "beam" targeted messages -- for instance, a review of a movie when a user is in the vicinity of a theater where it's playing -- Neoku would have to go through the user's carrier to pinpoint the exact location, which isn't logistically possible yet.
Carriers and handset manufacturers are expected to introduce location-tracking systems in the United States as early as this fall. Once the standards for location-tracking technology are set, Neoku plans to automatically map and dispatch targeted messages to registered users based on where they're standing -- whether it's a Taco Bell in downtown Cleveland or a riverbank in Ireland.
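Once a carrier can supply coordinates, the dispatch step Neoku describes amounts to a proximity query: find every posted message within some radius of the user. A minimal sketch of that matching logic, using the standard haversine great-circle distance (the message data and radius are made up for illustration; this is not Neoku's actual code):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

def nearby_messages(user_lat, user_lon, messages, radius_km=0.5):
    """Return the texts of messages posted within radius_km of the user."""
    return [m["text"] for m in messages
            if haversine_km(user_lat, user_lon, m["lat"], m["lon"]) <= radius_km]

# Hypothetical posts: one near 23rd and Mission, one across the country
messages = [
    {"text": "Dot com wireless...", "lat": 37.7536, "lon": -122.4186},
    {"text": "NYC haiku",           "lat": 40.7128, "lon": -74.0060},
]

print(nearby_messages(37.7540, -122.4190, messages))  # only the Mission post
```

A production service would index messages spatially rather than scan them all, but the matching criterion is the same.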
"The big question for the industry is, 'What is the role of community in wireless networks?'" Bain said. "Are we all just subscribers on the same virtual network who don't interact with each other, or are we neighbors living together in real-world communities?"
But like any community application, the success of Neoku's service will depend on attracting a critical mass of users to take advantage of it.
Critics of the wireless Web say the tiny keypads and screens of handheld devices make them unsuitable for text-based communication.
And what's more, allowing users to post whatever they wish raises pressing liability questions for Neoku and for any company that might license its service.
Two years ago, a company called Third Voice launched a Web plug-in that allowed users to post their comments on the face of any Web page, provoking angry resistance from an army of Web hosts who called it "Web graffiti."
On Monday, the company finally discontinued its service because of a failure to generate ad revenue.
"You've got to be kidding about that," Eng-Siong Tan, founder of Third Voice, said about Neoku's wireless Web graffiti plan.
"It's all well and good to promote free speech, but if you're going to run a business, you need to figure out how you're going to pay for this," Tan added.
Tan also said that in a community application like Neoku's, the signal-to-noise ratio could be a problem, with users spamming each other with inane comments. "If you leave it open, it's going to go to the lowest common denominator," Tan said.
But Bain and Neidich are unfazed, and say they hope people will use HaikuHaiku in ways its developers never anticipated.
During the beta test of HaikuHaiku, a user named "arcana" at 23rd and Mission streets in San Francisco's Mission District was inspired to write:
"Dot com wireless
internet startup etailer
blah blah blah blah blah"
Telematics Set for Big Euro Growth
allNetDevices
04/20/2001
In-car communications, or telematics, capabilities are emerging as a major differentiator in the battle for market share by European automobile makers, according to a study released Friday by research analysts Frost & Sullivan.
The study said that the market for telematics hardware, applications and services was just over one billion euros in 2000. However, that will grow to about 8.55 billion euros in Europe by 2007, the study said.
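The jump from just over one billion to 8.55 billion euros over seven years implies a steep compound growth rate. A quick back-of-the-envelope check, using the figures from the study:

```python
# Implied compound annual growth rate from the Frost & Sullivan figures
start, end, years = 1.0, 8.55, 7  # billions of euros, 2000 -> 2007

cagr = (end / start) ** (1 / years) - 1
print(f"Implied growth: {cagr:.1%} per year")  # roughly 36% annually
```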
The study predicted that most automakers will have affordable telematics systems available in most of their cars by 2004. However, Frost & Sullivan senior automotive analyst Tif Awan said that some significant challenges must be overcome before European carmakers will see those sorts of revenues.
"For example, the safety and security applications favoured by U.S. consumers do not appear to have gone down well in Europe where the navigation applications seem to have a higher value proposition," he said.
In addition, on-board navigation systems should be made more attractive to consumers and navigation and safety applications should merge into a single modular unit.
Awan said that the U.K., Germany, Italy and France will dominate the European telematics market.
Toshiba eyes July launch for Pocket PC PDA
4/19/01
The first PDA from Toshiba
Martyn Williams, IDG News Service
Toshiba Corp. is planning its long-awaited entry into the personal digital assistant (PDA) market for July, a company source has told IDG News Service.
The company is nearing design completion of its PDA, the first from the notebook market leader, which will be based on Microsoft Corp.'s Pocket PC platform and Windows CE OS (operating system), the source said.
Further details, such as target price and international launch details, are still under consideration.
Toshiba introduced the world to the concept of mobile computing in June 1989 when it launched the world's first notebook computer, but since then, although its machines have gotten smaller, it has never deviated from the basic notebook PC form factor. The new PDA will be a test of whether Toshiba can translate its mobile computing leadership into new markets.
It will face stiff competition from a handful of companies, some already established and others that have more recently entered the market. Among them are Sharp Corp., Compaq Computer Corp., Casio Computer Co. Ltd., Palm Inc., Handspring Inc. and Sony Corp. The latter three companies all have machines based on Palm's proprietary OS, while Compaq and Casio, like Toshiba, use Microsoft's Pocket PC and Windows CE. Sharp, which has its own OS, is planning to capture a greater market share by offering machines based on the open-source Linux OS.
Toshiba, in Tokyo, can be contacted at +81-3-3457-2105 or found online at http://www.toshiba.co.jp/.
Mitsubishi dials up Pocket PC with phone-handheld combo
By Richard Shim
Special to CNET News.com
January 10, 2001, 2:40 p.m. PT
Mitsubishi killed two birds with one device Wednesday.
The Japanese company introduced the Trium Mondo, a combination phone and handheld computer that uses Microsoft's Pocket PC operating system.
This is not the first such combination product to come from the Pocket PC ranks. But it is Mitsubishi's first foray into handheld computers.
The Mondo is also part of Microsoft's ongoing efforts to piggyback onto the lucrative mobile phone market, a move ARS analyst Matt Sargent says has been a long time coming. Mobile phones now ship by the hundreds of millions each year.
"Microsoft is trying to provide a device that integrates wireless because 'Stinger' isn't ready yet and won't be for a while," Sargent explained.
Stinger is the code name for Microsoft's upcoming operating system for so-called smart phones, which combine the functions of a cell phone and a personal digital assistant.
Mondo differs from Stinger, which will look like a cell phone, because its design is more reminiscent of a PDA.
"Form factor is really the distinguishing feature," said Mary Starman, Microsoft's product manager for mobile devices. "This is for those who want a PDA with voice capabilities...Stinger is for those who want a phone first."
Mondo is not alone
The Mondo won't be the only phone-PDA combo out there.
In November, Sagem announced a similar Pocket PC device, the WA3050. It will cost about $780 and soon be available in Europe and Asia.
Handspring recently released the $299 VisorPhone module. The VisorPhone fits into the Springboard expansion slot on Handspring Visors, which are based on the Palm operating system.
In addition, Palm and Hong Kong-based RealVision announced last year that they are working on a device that will attach to a Palm V and add voice capabilities.
Mondo's price has not been set yet and will vary depending on the carrier. It will ship in Europe in the first quarter, Starman said. In Europe, Mitsubishi will sell the product under the Trium brand name. The device could eventually come to the United States as well.
The Mondo uses Global System for Mobile Communications (GSM) and General Packet Radio Service (GPRS) networks and is based on a 166-MHz Intel processor. Mondo owners will be able to use a headset with the device and to make calls directly by selecting a name from the address book.
According to Mitsubishi, the company is working with applications providers to bring full Web browsing and video-clip viewing to the Mondo.
Mitsubishi Electric Chooses Microsoft Smart Phone Software Platform to Power Next Generation of Trium Mobile Phones
CANNES, France, and NEW YORK -- Feb. 19, 2001 -- Today at the 3GSM World Congress and Internet World Wireless, Mitsubishi Electric Corp. and Microsoft Corp. announced that Mitsubishi Electric has licensed and plans to develop next-generation phones based on Microsoft's smart phone platform, currently code-named "Stinger." Microsoft® smart phone software sets a new standard for the form factor and functionality of smart phones, enabling a richer user experience for new data services that are optimized through 2.5G and 3G broadband wireless networks, including secure corporate and Web access, access to e-mail, and continually updated personal information manager (PIM) information. Following the success of its Mondo voice-enabled Pocket PC, Mitsubishi Electric plans to release smart phones running Microsoft's smart phone platform under the Trium phone brand in late 2001 for GSM and GPRS networks.
"We looked at many options to power our future Trium smart phones and believe Microsoft's smart phone software is the best choice to help us build the great phones we're known for coupled with functionality that can't be matched -- from Microsoft's mobility services and Mobile Outlook® capabilities to WAP services and secure Web access," said Yasuhide Iwata, president director general, Mitsubishi Electric Telecom Europe (Trium). "We're thrilled to be working with Microsoft and look forward to bringing innovative smart phone solutions to market together."
"Microsoft is excited to be working with Trium to bring next-generation smart phones to market based on our 'Stinger' platform," said Ben Waldman, vice president of the Mobile Devices Division at Microsoft. "Microsoft's vision is to enable our customers to access their information any time, anywhere and on any device -- this announcement moves us closer to making this vision a reality."
About Microsoft and Trium Solutions
Trium is working with Microsoft to bring two types of mobile devices to market: Trium smart phones based on Microsoft's smart phone software and the Trium Mondo, a voice-enabled personal digital assistant based on the Microsoft Pocket PC platform. The Trium Mondo is currently shipping in Europe, with Trium smart phone solutions coming to market later this year.
About Microsoft Phone Platforms
Microsoft is delivering software to handset manufacturers and mobile operators for two phone solutions. Microsoft Mobile Explorer™ technology provides a lightweight microbrowser-based solution for mobile phones offering online services. The Microsoft smart phone platform, code-named "Stinger," is built on a version of the Microsoft Windows® CE 3.0 operating system specifically optimized for mobile phones to extend battery life and reduce memory requirements.
About Mitsubishi Electric Telecom Europe
In Europe, Mitsubishi Electric Corp.'s telecommunications business is handled by Mitsubishi Electric Telecom Europe (METE). METE is responsible for the manufacture, research, development, design and marketing of all mobile handsets and accessories, which are sold under the brand name Trium. METE's research and development center is one of the world's most advanced, dedicated to developing GSM and 3G/UMTS technology.
About Microsoft
Founded in 1975, Microsoft (Nasdaq "MSFT") is the worldwide leader in software, services and Internet technologies for personal and business computing. The company offers a wide range of products and services designed to empower people through great software -- any time, any place and on any device.
Microsoft, Outlook, Mobile Explorer and Windows are either registered trademarks or trademarks of Microsoft Corp. in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
For more information about this event:
More information about Microsoft announcements and products highlighted at 3GSM World Congress and Internet World Wireless is available at: http://www.microsoft.com/presspass/events/gsm01/default.asp.
Note to editors: If you are interested in viewing additional information on Microsoft, please visit the Microsoft Web page at http://www.microsoft.com/presspass/ on Microsoft's corporate information pages.
Intel Means Business with Communication LSI Development for Mobile Phones
April 20, 2001 (TOKYO) -- Intel plans to put serious effort into developing LSIs for communication devices, Ronald J. Smith, senior vice president of Intel Corp., said in a keynote speech at the Intel Developer Forum 2001 Spring Japan held in Tokyo.
He also announced the following: (1) Intel has verified operation of its new "MSA" processor for next-generation cellular phones at a higher clock speed, (2) it has started shipping the "PDCharm 2.0" processor for Java-capable PDC mobile phones, and (3) it has started selling LSIs that can connect to optical-fiber networks.
Intel's move into LSIs for optical-fiber networks suggests the company will supply LSI products for a wide range of network-related devices, from communications-infrastructure equipment and network servers to client devices such as PCs and mobile phones.
Intel has proposed a new communications-processor architecture for next-generation cellular phones, the "micro signal architecture" (MSA), and is now developing LSI products based on it. Under the plan, the MSA processor will operate at 400MHz; Smith demonstrated steady progress toward that goal by showing the processor running at 340MHz. Because of its processing power, the MSA processor can be programmed in high-level languages such as C or C++.
Intel calls its new next-generation cellular phone system the "Intel Personal Internet Client Architecture." It comprises an MSA processor, the "StrongARM" CPU for mobile devices, the "XScale" CPU chip, and Intel-made flash memory. Intel projects shipment of the MSA processors within 2001. Mitsubishi Electric Corp. plans to adopt the processor in its next-generation cellular phones.
KDDI to Launch Bluetooth Mobile Phone within 2001
April 20, 2001 (TOKYO) -- KDDI Corp. revealed that it will launch a mobile phone that uses Bluetooth wireless communication technology within 2001.
No further details have been disclosed by the company as to the launch date or shipment plan.
Industry observers say that possible handset manufacturers include Sony Corp. and Toshiba Corp., given that these companies have supplied KDDI's mobile phones in the past.
A KDDI group company, DDI Pocket Inc., operator of personal handyphone system (PHS) services, has already revealed that a Bluetooth adapter for its "feel H"" (feel edge) PHS phones will launch on April 28.
From the Intel website, written the month before the Intel/EDIG announcement PR:
An Overview of Speech Technology and Applications
About this whitepaper
This paper is designed to give Independent Software Vendors (ISVs) and solution providers an overview of speech technology, speech applications (including design requirements), and development tools for speech-related business applications on the Intel architecture. It is not intended as a developer guide; rather, it will provide enough information for product development managers to understand how speech technology can add value to their business application. To help developers move to the next step, it will also provide summaries and contact information for the leading speech technology products and tools available in the marketplace today.
Overview
Speech technology is ready to become a critical element of the PC user interface - and for good reason. Speech adds tremendous value to a wide variety of applications; for example, it can help businesses reduce training and customer support costs, raise employee productivity, and support international, web-based electronic commerce.
Speech is happening now because of technology breakthroughs enabled by the rising performance and falling cost of desktop computing. The latest generation of Intel® Pentium® II processors provides an even more powerful platform for speech-related applications. It is clear that speech technologies are advancing rapidly and moving into the mainstream to make PCs more functional and user friendly:
Business Week describes the market for speech applications in the year 2000 and beyond as "astronomic" and says speech technology "may become ubiquitous." (February 23, 1998)
Information Week reports, "By the end of 1998, computer inputs will be the keyboard, the mouse, and the voice..." (January 1998)
Jackie Fenn, VP and research director of advanced technologies at Gartner Group Inc., said in an interview with Information Week (November 1997) that "Continuous speech is the true beginning of speech for the desktop" and that "More than 30% of general office workers will use some form of voice recognition by 2001."
For applications ranging from desktop financial applications to web-based customer support, speech can help you offer a more compelling solution to customer needs. Engines and tools are available for application development and have been optimized for performance on the latest Intel Architecture based PCs. Application developers and solution providers who commit to speech technology today stand to gain a significant competitive edge.
CONTENTS:
Introduction and Market Overview
Intel Architecture: The Platform for Speech
Speech Technology Basics
Technology Concepts
How Speech Recognition Works
Speech-Enabled Applications
Desktop
Overview
Design Considerations
Hardware and Software Requirements
Telephony
Overview
Design Considerations
Hardware and Software Requirements
Handheld
Overview
Design Considerations
Hardware and Software Requirements
Conclusion: The Time Is Now
Appendix A: Glossary of Application Criteria
Appendix B: Development Tools (ISV and SDK Summary)
Appendix C: Handheld recorder support for speech
Appendix D: Resources for More Information
Appendix E: References
--------------------------------------------------------------------------------
Market Overview
Why Speech?
The rapid pace of business today requires employees and customers to have fast, constant access to information. Technology has provided a mechanism for efficient communication and information storage - but it has also introduced new levels of complexity into the business environment. Long learning curves impact productivity, and complex interfaces make software difficult to use. In a highly competitive marketplace, companies need to simultaneously cut operational costs and increase customer loyalty by improving the quality of service.
Speech-enabled interfaces to computers can help solve important business problems:
Cost control. Speech-enabled applications can help reduce the training costs of rapidly changing software products by providing a more intuitive user interface, allowing users to replace complex drop-down menu commands with simple spoken commands. Customer service departments can also replace thousands of operators with automated access to information and services.
Productivity enhancement. Speech gives many categories of users increased mobility. Accountants can dictate and enter data without keeping their hands on a keyboard. Mobile users can dictate notes into a handheld recorder, keeping their hands free for driving; later, the notes are transferred to the desktop computer and automatically converted to text. Speech can also reduce the risk of repetitive stress injuries, cutting the work time lost to them.
Improved customer service. Speech-enabled applications can eliminate the constraints of the telephone keypad and allow easier-to-use automated systems to provide more economical service. A speech-enabled application can replace frustrating "Press 1 for choice x" interfaces with simpler prompts such as "Say the name of the person or department you'd like to reach."
Taking advantage of business opportunities on the web and internationally. The Internet and electronic business are spurring more companies to become global players. Speech-enabled applications can provide language translation services, guide users through on-line help menus, and simplify data entry in situations that require numerous forms to be filled out. Speech can also be used as an alternative to complex Asian keyboards or phonetic spelling.
There is an opportunity today for leading edge software solution providers to use speech technologies creatively to build a competitive advantage for their solutions.
Why Now?
Speech technology has been an ongoing research topic since the 1950s, but it's only now that a convergence of technology breakthroughs is making speech ready for broader use. These include:
A steady increase in desktop computing power, most recently the Pentium® II processor
Development of the overall PC platform such as USB and faster RAM technologies
Improvements in speech algorithms
Advances in signal processing
As a result of breakthroughs in these and other technologies, the marketplace has seen rapid improvements in the quality and affordability of speech applications:
Just a few years ago, these applications required specialized hardware and were priced in the thousands of dollars. They had a highly limited vocabulary or required a significant "enrollment" period to train the product to recognize the user's speech patterns.
Now, commercial speech applications have broken the $50 price barrier, offer extremely high accuracy and a vocabulary of over 100,000 words, and require no special-purpose, dedicated hardware.
Further improvements in the next few years in core technologies and natural language processing will further increase the cost effectiveness of speech applications as well as their ability to accurately process natural language.
Momentum Is Building Now
Because of these trends, many vendors are concluding that the time is right to investigate building speech into their user interface. The market for speech-enabled applications is expected to grow rapidly and early developments are happening:
Voice Information Associates (Lexington, MA) reports that the market was $245M in 1997, rising to $335M in 1998 and a projected $810M in 2001.
More businesses are using automated attendants for mission-critical customer services: Charles Schwab has deployed VoiceBroker, which allows brokers to obtain quotes and make trades; United Airlines and American Airlines are deploying automated attendants for airline reservations.
Speech applications are on the market or in development in areas as diverse as travel, financial services, telephony, law, education, medicine, government, manufacturing and small business.
Mainstream applications including Lotus Smartsuite* and Corel Wordperfect* are using speech technologies in their most recent products to improve ease of use.
Industry giants including IBM*, Intel, and Microsoft* are devoting research dollars to speech technologies:
IBM has over 200 people dedicated to research, product development, and marketing of speech technology.
Microsoft announced plans to incorporate speech interfaces into future versions of the Windows NT* operating system as well as its application software. They also invested $45 million in L&H, a company with over 1,000 people dedicated to development of speech technology.
Intel, Microsoft*, leading speech technology providers including IBM*, Dragon*, and L&H*, as well as leading audio hardware vendors have worked with the industry to develop specifications such as the Audio Codec '97 and PC' 99 System Design Guide that optimize PC platforms for speech technology.
Intel is developing a new processor, the Intel® Pentium® III processor, which will enhance the performance of speech recognition algorithms, resulting in reduced error rates and a shortened response time. The Pentium® III processor is scheduled to ship in 1999.
Speech technology is still evolving, with improvements occurring in areas such as accuracy, vocabulary size, recognition speed and natural language processing. Even so, speech today is a viable technology that offers an exciting way for developers to add value to their software products and expand into new arenas.
--------------------------------------------------------------------------------
Intel Architecture: The Platform for Speech
The PC is the platform of choice for a variety of speech applications, and the steadily growing power of the desktop is a major enabler of today's speech applications.
Speech processing is both data and computationally intensive, so a more powerful PC can improve the processing of the speech engine, enable a larger vocabulary and so forth. Intel's high-end Pentium® II processor based systems provide the performance, memory and cache capability to ensure your speech-enabled application provides the level of performance customers expect.
In addition, because of the strong market position of the Intel architecture, designing your application for the Pentium® II processor family ensures not only a large potential audience for your products but a wealth of available speech engines, development tools and speech-related peripherals. The installed base of PCs capable of handling speech recognition processing is increasing rapidly; Pentium® II processor shipments are expected to rise to 120 million in 2000, up from 80 million units today.
Pentium® II processor
The Pentium® II processor brings several advantages to speech technologies. With the introduction of the P6 microarchitecture and MMX™ Technology, speech vendors can now optimize their speech engines to take advantage of the more powerful platform. Clock speeds of 400 MHz and higher also improve speech-engine response time, which in turn can reduce error rates.
Speech engines are also very sensitive to memory and memory bandwidth. While an increase in system memory (RAM) is beneficial, even more benefit comes from the larger cache memory of Pentium® II and Pentium® III (see below) processors. More efficient use of memory increases the speed and performance of the speech engine. Adding system memory (RAM) helps to a certain extent, but at some point the cost outweighs the return. The faster front-side bus of the 440BX chipset also improves data access between system memory and the processor, reducing some data-access overhead.
Pentium® III Processor
Intel is scheduled to introduce a new processor, the Pentium® III processor, in the first half of 1999. This new processor will include technology that extends the power of the Pentium® II processor. Speech engines will benefit from this new processor with several improvements including response time and accuracy. Additional information will be made available later this fall.
PC99
The PC 99 Design Guide has been developed after extensive consultation with speech vendors. Pentium® II and Pentium® III processor based systems that are PC 99 compliant will provide significant added value for speech applications. Peripherals such as microphones will be more efficient and robust. Higher quality microphones will provide less distortion and make it possible to deliver cleaner acoustic samples (i.e., with less noise) to the speech engine.
--------------------------------------------------------------------------------
Speech Technology Basics
Technology Concepts
Speech Recognition (Speech to Text)
Speech recognition or speech to text captures human speech and translates it to a written format. In command and control applications, the captured speech commands can be used to trigger an action such as launching an application or dialing a phone. In dictation applications, the captured speech data can be transcribed and stored as a text file, edited by the user, and used like any other file.
Speech to text includes continuous-speech recognition, in which the user can talk in a natural fashion and doesn't need to pause between words, and discrete-speech recognition, which requires the user to pause after each word or phrase.
Command and control applications may be structured (the user speaks from a limited list of commands, e.g., "Make table", "3 rows", "2 columns"), or may use natural language ("OK, I want a 2x3 table at the top of page 4").
Applications may be speaker-dependent, requiring an initial enrollment period of "training" the product to accommodate variations in the speaker's pronunciation, or speaker-independent, requiring no such training period.
An application's vocabulary may be limited to a few terms needed for answering specific questions (yes, sure, yep, OK, no, nope) in a voice response system; a few thousand to correspond to a structured application such as an air traffic control system; or extensive enough to recognize 85-95% of native language speech regardless of the speaker's accent or idiosyncrasies.
Most dictation applications today combine continuous speech recognition with command and control to make correction and editing as natural as possible.
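A minimal sketch of this combination in Python: recognized text that matches a small command grammar triggers an editing action, while everything else is treated as dictated text. The command phrases and action names are hypothetical, not taken from any particular engine.

```python
# Route recognizer output either to a command handler or to the document.
# The grammar below is a hypothetical example, not a real engine's command set.
COMMANDS = {
    "scratch that": "delete_last_phrase",
    "new paragraph": "insert_paragraph_break",
    "select previous word": "select_prev_word",
}

def dispatch(recognized_text):
    """Return ('command', action) for grammar matches, else ('dictation', text)."""
    key = recognized_text.strip().lower()
    if key in COMMANDS:
        return ("command", COMMANDS[key])
    return ("dictation", recognized_text)

results = [dispatch("Dear Ms. Smith,"), dispatch("new paragraph")]
```

A real application would also handle near-matches and allow the user to escape the grammar (e.g., to dictate the literal words "new paragraph").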
Note that the terms "speech recognition" and "voice recognition" are sometimes used interchangeably. However, voice recognition is primarily the task of determining the identity of the speaker, as in a security application.
Speech Synthesis (Text to Speech)
The inverse of speech recognition is speech synthesis or text to speech, in which the computer converts ASCII text to computer-generated speech. The words may be formed by piecing together recorded sounds (concatenative speech), by using models, or by using pre-recorded words. The latter technique currently has a fairly limited vocabulary but the most realistic sound. Clearly, the larger the vocabulary and the closer the application is to using natural language, the greater the sophistication of the software and the more processing power and memory are needed.
Speech-enabled applications may run on desktop PCs, or may involve telephony (items such as digital assistants, voice dialers or interactive voice response (IVR) systems) or handheld devices such as portable digital voice recorders.
How Speech Recognition Works
The following steps provide an overview of how speech recognition algorithms work.
Speech in. Users can talk directly into the PC microphone for (almost) real-time processing. They can also record speech on a handheld recordable device or PDA to be processed later on the PC, or use a telephone over a network to a server that runs the speech processing software.
Prefiltering/echo cancellation. Speech consists of sound waves. Once they've been captured, these audio signals are converted from analog to digital and filtered to eliminate background noise from the signal. Then they're compressed, using techniques similar to those employed in video compression. This reduces the bandwidth and storage needed to process the data and transport it through the computer.
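As an illustrative sketch (not any vendor's actual filter chain), the pre-emphasis and noise-removal ideas above can be approximated in a few lines of Python; the filter coefficient and gate threshold are assumed values:

```python
# Toy pre-filtering stage: a first-order pre-emphasis filter (boosts high
# frequencies) followed by a crude noise gate that zeroes samples below an
# amplitude threshold. Real engines use far more sophisticated spectral
# subtraction and echo cancellation.

def pre_emphasis(samples, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1], a standard pre-emphasis step."""
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]

def noise_gate(samples, threshold=0.01):
    """Zero out low-amplitude samples, a crude background-noise filter."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]

signal = [0.0, 0.005, 0.4, -0.3, 0.002, 0.25]
filtered = noise_gate(pre_emphasis(signal))
```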
Feature extraction. To further reduce the bandwidth requirement, the resulting audio clip is divided into shorter clips or samples; for instance, a 100ms acoustic signal is broken into 5ms samples. These are mathematically processed to produce a series of vectors that are compared to a pre-defined database of sounds to determine the probability that the user uttered a particular sound.
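The framing step can be sketched as follows; the frame and hop sizes, and the toy features (frame energy and zero-crossing rate), are illustrative stand-ins for the vector math a real engine performs:

```python
# Split an audio clip into short overlapping frames and compute a simple
# feature vector per frame. Production engines extract richer vectors
# (e.g., mel-frequency cepstral coefficients); these sizes and features
# are assumptions for illustration.

def frames(samples, frame_len, hop):
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def features(frame):
    energy = sum(s * s for s in frame)
    zero_crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return (energy, zero_crossings)

clip = [0.1, -0.2, 0.3, -0.1, 0.05, -0.05, 0.2, -0.3]
vectors = [features(f) for f in frames(clip, frame_len=4, hop=2)]
```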
HMM. The hidden Markov model (HMM) is the most frequently used method of speech pattern recognition today. In contrast to earlier template matching approaches, HMM uses statistical modeling and libraries of word and grammar rules to select the highest probability outcome from a sequence of samples. With the increase in capabilities of the Pentium® II processor, speech vendors can optimize their HMM speech engines to improve speech recognition. With the extra headroom, larger vocabulary sets and drastic improvements in response time can be realized.
Word models and grammar. The HMM uses a vocabulary database (around 70,000 words in today's most advanced applications) and grammar. This vocabulary and language model data depends on the language being spoken (English, French, Mandarin, etc.) and the application (is the user dictating legal reports or making travel arrangements?). These help the HMM identify words and choose the most likely speech pattern (did Jimi Hendrix say "kiss the sky" or "kiss this guy"?).
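The role of the language model in the "kiss the sky" example can be sketched with a tiny bigram model; the training "corpus" and smoothing constant are invented toy data:

```python
# Score candidate transcriptions with a bigram language model and pick the
# most probable word sequence. A real model is trained on large corpora;
# this corpus and the add-k smoothing constant are toy values.
from collections import Counter

corpus = ("kiss the sky kiss the sky kiss the hand "
          "meet this guy kiss the sky").split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def score(sentence, smoothing=0.1):
    words = sentence.split()
    p = 1.0
    for a, b in zip(words, words[1:]):
        # Add-k smoothed conditional probability P(b | a)
        p *= (bigrams[(a, b)] + smoothing) / (unigrams[a] + smoothing * len(unigrams))
    return p

candidates = ["kiss the sky", "kiss this guy"]
best = max(candidates, key=score)
```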
Text or command out. The result of a speech recognition application is a file of text (dictation) or an executed command (command and control).
--------------------------------------------------------------------------------
Speech-Enabled Applications
Speech applications are possible in a very wide range of applications, including industrial, embedded devices and consumer devices. Our focus here is on business applications suitable for powerful personal computers. In this market space, typical applications include desktop command and control, dictation, handheld speech recording, and telephony.
Desktop Command and Control, Dictation
Overview
On the desktop, speech offers an intuitive, efficient interface:
For command and control applications, speech can replace complex menu commands with simple spoken words.
For dictation applications, speech allows faster data entry for those unfamiliar or uncomfortable with keyboards.
Speech technologies from companies such as IBM*, Dragon*, and L&H* are being integrated into widely used desktop products such as Microsoft Word*, Lotus WordPro* and Worldbook*, and IMSI's TurboCAD* software. Speech also allows people to interface with their PC even when their eyes or hands are busy; for example, they can dictate expense reports into an application like Lotus* 1-2-3 instead of trying to read and type simultaneously.
Design Considerations
When the user sits at the PC, dictation and desktop command and control have to provide an end-user benefit beyond what the keyboard and mouse already offer. Voice is a natural way for people to interact with a computer, and when integrated well with the application (and the mouse and keyboard) provides a more efficient user interface. Therefore, it's important to ensure that the end-user works faster, more efficiently and more naturally with the application when it is speech enabled than when it is not. Speech extends and complements the current modes of interaction with computers.
To achieve this, the following speech recognition features are important (see Appendix A, Glossary of Application Criteria, for definitions):
For dictation:
Accuracy. Speech recognition engines in recent years claim to achieve accuracy of 95% or more. Dictation provides productivity gains only when data entry is faster than with a keyboard. The quality of the speech engine and appropriate hardware such as high quality microphones and the latest Pentium® II processor based PCs (see Hardware Requirements, below) will all improve accuracy. The new seamless and powerful editing/correction tools (see below) will also reduce the dependence on high accuracy.
Response time. The user can get distracted when words do not appear on the screen in real time. Response time also affects accuracy (see "Response Time" in Appendix A). A speech engine that has been optimized for the Pentium® II processor will process the acoustic samples faster and with more word models, thereby increasing accuracy and response time.
Speaker independence. Most speech recognition engines for dictation require 30 to 60 minutes of training time to learn the end-user's voice. For speaker independence, the engine must distinguish between differences in tone, speed, and accents. Efforts are underway to incorporate speaker independence; for example, with adaptive learning, the engine gradually learns the speaker's voice over time without the initial enrollment.
International language support. Required if the application will be localized for various geographical areas. As shown in Appendix B, many leading speech vendors offer speech recognition technology in multiple languages.
Support for multiple topics. Some engines allow multiple "topic" vocabularies to be used, in addition to a base vocabulary. For example, a finance topic might be used to augment a general English base vocabulary. This can improve accuracy.
Support for delegated correction. This allows a dictated draft document to be handed off to an assistant for final correction. This document will generally contain the dictated text as well as the corresponding audio and engine-specific data required for correction and updating of the user models.
Naturalness. Speech recognition engines are allowing more natural dictation of numbers, punctuation, and spelling. For example, it is more intuitive for users to say "fifty-two dollars and thirty-three cents" than "dollar sign, fifty-two, point, thirty-three"; it is also easier to say "a-b-c" than "Alpha, Bravo, Charlie."
For command and control:
Error correction capability. A well-designed error correction interface can compensate for less than 100% accurate speech recognition engines.
Integration. Efficiency improves when the end-user doesn't have to walk through drop-down menu commands ("File, Open, Insert"). The user interface should be designed with speech in mind ("Insert a 2x3 table after this paragraph").
Naturalness. When end-users can issue commands naturally, they are more likely to achieve productivity gains; the learning curve is reduced, and issuing commands becomes faster than using a mouse or keyboard. The goal is to move away from discrete vocabularies toward a more relaxed, conversational style. The application designer should consider the range of options for how the end-user will issue a command. For example, command options for opening a document titled temp.doc could include "Open Word with last edited document" or "Open temp.doc".
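The naturalness criteria above can be illustrated with a toy normalizer that converts a spoken currency phrase into its written form; it handles only values under one hundred and is purely illustrative of what a dictation engine does across the whole language:

```python
# Turn a spoken phrase like "fifty-two dollars and thirty-three cents"
# into the written form "$52.33". Only values under one hundred are
# supported; this is an illustrative sketch, not a production formatter.

UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def words_to_int(phrase):
    """Sum tens and units words, e.g. 'fifty-two' -> 52."""
    total = 0
    for word in phrase.replace("-", " ").split():
        total += TENS.get(word, UNITS.get(word, 0))
    return total

def normalize_currency(spoken):
    dollars_part, cents_part = spoken.split(" dollars and ")
    cents_part = cents_part.replace(" cents", "")
    return "$%d.%02d" % (words_to_int(dollars_part), words_to_int(cents_part))

written = normalize_currency("fifty-two dollars and thirty-three cents")
```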
Hardware Requirements
A variety of speech software vendors recommend the following platform hardware as the minimum configuration for command and control and dictation. These recommendations may change as new revisions of software come to the market.
Processor. Speech vendors recommend a minimum 166 MHz Pentium® processor with MMX™ Technology. However, to ensure that accuracy and response time meet end-user expectations, systems based on Intel's latest Pentium® II or Pentium® III processors are recommended.
Memory. Continuous speech recognition requires a minimum of 32MB of RAM and 130MB of available hard drive space for the speech engine and phoneme (sound) and vocabulary databases. The high-end Intel processors also improve performance since speech engines require frequent access to memory for pattern matching; the Pentium® II processor has a large cache and provides faster access to memory.
Audio. High quality audio input is critical to ensure that the speech engine operates on the cleanest possible input signal. Current engines require a 16-bit sound card such as the Creative Labs Sound Blaster 16 or a 100% Sound Blaster 16-compatible card. Pentium® III processor based systems that are PC '99 compliant will meet minimum requirements for speech recognition.
Microphone. A high quality, noise canceling microphone is one of the most important requirements for accurate speech recognition. Microphones from Shure Brothers, VXI, Andrea Electronics, Telex, Knowles Electronics and others comply with speech standards. Generally, speech recognition applications are bundled with a high quality microphone. With the PC '99 specification, microphone quality will continue to improve.
Software Requirements
Operating system. Windows* 98, Windows NT*, IBM OS/2 Warp
Speech recognition, command and control engines. To develop and run a speech-based application, the developer will require an SDK as well as the speech engine runtime. The SDK allows the developer to make API calls to the engine for recognition processing; without the engine, these API hooks are non-functional. The runtimes can also be licensed for distribution with the application. IBM, Dragon, and Lernout & Hauspie (L&H) are the market leaders; all three offer SDKs and engines for continuous speech recognition and command and control. See Appendix B for a summary of SDK features and contact information.
Text to speech engine. This technology is optional and could be used for proofreading or for reading responses from animated 3D characters (see below). While text to speech is not covered in detail in this whitepaper, a list of vendors offering text to speech technology is included in Appendix D. IBM, Dragon, Microsoft, and L&H also include text to speech in their tool kits.
3D animated characters. Several vendors offer tools to integrate speech-enabled 3D animated characters into applications. These characters could be used to guide users through an application or web solution using text to speech technology, and could allow basic command and control responses. This technology is not covered in detail in the whitepaper, however a list of vendors offering animated characters is included in Appendix D. IBM includes this technology in their speech SDK.
Telephony
Overview
In the telephony market space, speech technology offers mobile users continuous access to database records, voice mail, email and information through the telephone by eliminating the constraints of the touch-tone interface. Products like Wildfire's Personal Assistant and Registry Magic's automated attendant allow remote access to voice messaging systems, and call management using spoken commands.
Speech also provides customers with 24-hour access to help agents and reduced hold times, while saving the service provider significantly in labor costs. Nuance's VoiceBroker, developed for Charles Schwab, allows brokers to perform stock transactions through an artificial agent. ALTech's Travel Reservations System allows employees of United Airlines to check flight times and make reservations. Companies deploying such systems report major cost savings. An automated voice attendant reportedly is saving AT&T Network Systems $300M annually in labor costs (WebBuilder, July 1998). And according to Nuance Communications, automated phone transfer has allowed Sears to reassign almost 3,000 human operators. Automated attendants can be particularly beneficial to small businesses, allowing these companies to provide a higher level of service with fewer employees.
Design Considerations
In telephony applications, the customer will expect the same level of service they would get from a human operator. The system has to be robust enough to accept a wide range of responses, and has to provide an intuitive menu structure. Service providers can eliminate the sometimes-frustrating "Press 1 for x, press 2 for y" interface by allowing natural language responses, e.g., "Show me the flights from Chicago to Boston."
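The "flights from Chicago to Boston" example above boils down to slot-filling: extracting an origin and a destination from a free-form utterance instead of walking the caller through a touch-tone menu. A minimal sketch, assuming a toy grammar and city list (the function and names here are illustrative, not any vendor's API):

```python
import re

# Illustrative city list; a real system would use the airline's route database.
CITIES = {"chicago", "boston", "denver", "seattle"}

def parse_flight_query(utterance):
    """Extract origin/destination slots from phrases like
    'show me the flights from Chicago to Boston'."""
    match = re.search(r"from (\w+) to (\w+)", utterance.lower())
    if not match:
        return None
    origin, dest = match.groups()
    # Only accept slot values the back end actually knows about.
    if origin in CITIES and dest in CITIES:
        return {"origin": origin, "destination": dest}
    return None

print(parse_flight_query("Show me the flights from Chicago to Boston"))
```

A production grammar would of course cover many more phrasings and confirm low-confidence recognitions back to the caller, as the usability note below suggests.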
The following speech recognition features are important for telephony applications (see Appendix A, Glossary of Application Criteria, for definitions):
User interface/usability design. Since an automated attendant is replacing a human operator, the (telephony) user interface design is the most critical factor. It must account for variations in responses, accents, slang, pauses, etc. and should verify that it recognized the command correctly without slowing the transaction time.
Speaker independence. Because multiple users can access the system and may not use the system frequently, most users will demand speaker independence.
Noise robustness/noise cancellation. Telephone connections and varying environmental conditions (e.g., cellular phone calls, high background noise) result in a noisy input signal; the computer has to filter this out and recognize enough of the signal to understand the commands.
Response time. Near immediate response is expected.
Hardware and Software Requirements
Engines and tools to build speech-driven telephony solutions are rapidly being developed by companies such as Nuance, ALTech, L&H, IBM, and Voice Control Systems (VCS). For example, ALTech offers SpeechWorks*, which allows companies to build speech-enabled telephony applications based on IBM's ViaVoice Telephony RunTime engine. IBM is also working with VCS to deliver technologies for development of interactive voice response (IVR) and messaging/voice mail applications.
In addition to tools and engines, all of these companies offer training and application development support.
Contact Nuance Communications, ALTech, Voice Control Systems, IBM Speech Systems or L&H for more information on hardware and software requirements for telephony based applications.
Handheld
Overview
Handheld devices such as digital recorders with speech technology enable people to be productive when their eyes and hands are busy or when they are away from their PC. For example, an Olympus digital recorder can be used to record speech on a flash memory card; the voice files can later be transferred to a PC using IBM ViaVoice for transcription. Other handheld device manufacturers, including Sony, Dictaphone, Norcom, and Voice-it, are also supporting high quality recording for speech recognition.
Design Considerations
Unlike desktop applications, where speech complements the keyboard and mouse, speech applications using a handheld device give the end-user an entirely new capability.
The following features are important to consider in designing applications and solutions that utilize handheld recorders (see Appendix A, Glossary of Application Criteria, for definitions):
Selecting the speech engine:
Noise cancellation. Handhelds may be used in a range of noisy environments (in the car, airport, etc.). The speech engine should be robust enough to perform recognition on a noisy voice file.
Speaker independence. Similarly, the end-user should not have to re-train the engine for each environment in which the recorder will be used.
International language support. Required if the application will be localized for each area.
Designing the application:
Error correction capability. Since the user will record once and transcribe later, the speech file should be linked to the text to allow end-users to play back what they said for convenient error correction.
Multi-tasking. The transcription should occur in the background so that the end-user is free to run other applications while the transcription occurs. Intel® Pentium® II and Pentium® III processors enable this to occur without slowing down other applications.
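The two application-design points above, audio-linked error correction and background transcription, can be sketched together. This is a hypothetical illustration only: the recognizer below is a stand-in for a real speech engine, and the point is that transcription runs on a worker thread while each recognized word keeps its audio offset so a correction UI could replay exactly what was said.

```python
import queue
import threading

def fake_recognizer(audio_chunk):
    # Stand-in for the speech engine: pretend each chunk yields
    # (word, start_offset_ms) pairs at 500 ms intervals.
    return [(w, i * 500) for i, w in enumerate(audio_chunk.split())]

def transcriber(jobs, results):
    # Background worker: drains the job queue so the foreground
    # application stays responsive during transcription.
    while True:
        chunk = jobs.get()
        if chunk is None:          # sentinel: no more audio
            break
        results.extend(fake_recognizer(chunk))

jobs = queue.Queue()
results = []                       # list of (word, audio_offset_ms)
worker = threading.Thread(target=transcriber, args=(jobs, results))
worker.start()

jobs.put("send the quarterly report")   # foreground enqueues and moves on
jobs.put(None)
worker.join()
print(results)
```

Because every word carries its audio offset, an error-correction dialog can seek the voice file to that offset and play back the original utterance before the user retypes it.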
Hardware Requirements
For the PC:
See "Hardware Requirements" in the section on "Desktop Command and Control, Dictation."
The PC should be pre-configured to receive the audio input from the handheld, whether it is via USB, audio-in or another device. The application should minimize or eliminate special hardware requirements and should be easy to install.
For the handheld unit:
Currently, Olympus*, Voice-it*, Dictaphone*, Norcom*, and Sony* are marketing handhelds with voice recognition software. See Appendix C for more information on how the different manufacturers support the requirements listed below, as well as contact information.
Sampling rate. Speech recognition requires a higher sampling rate than that found on most digital recorders. Current engines require 11.025 kHz for reasonable accuracy, while IBM ViaVoice '98 supports 8 kHz, 11 kHz, and 22 kHz.
Compression. The amount of compression applied to the speech signal can significantly affect accuracy. Compression is typically measured by the number of minutes stored per megabyte. Higher compression factors will allow longer recordings for a given memory size, but usually with an increase in recognition errors. Speech engines can be optimized, however, to compensate for some of the effects of compression.
Time and date. Time and date stamps on voice files, with the ability to search by time or date.
Folders. For organizing recordings by category, with a file naming scheme and a high-speed playback or scanning function within a folder.
Recording/editing functions. Insertion within a file, appending to the end of a file, recording over a section of a file, deletion and compaction of a section within a file, restoring erased files, and write protection.
Connection to PC. This should be as easy to use and set up as possible, with minimal additional hardware requirements. Connection via Universal Serial Bus (USB) is preferred; other options include Parallel Port, Serial Port, removable PCMCIA cards, or audio in through a sound card.
Voice storage. The handheld unit must be able to store an adequate amount of digital audio. This will vary from vendor to vendor.
Other features to consider. An LCD screen; backlighting is optional.
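The sampling-rate and compression requirements above translate into a simple recording-time calculation. A back-of-the-envelope sketch, assuming uncompressed 16-bit mono audio and an illustrative 4:1 compression ratio (the function and numbers are for illustration, not any vendor's specification):

```python
def minutes_per_megabyte(sample_rate_hz, bytes_per_sample=2, compression=1.0):
    # 16-bit mono audio: sample_rate * 2 bytes per second, reduced by
    # whatever compression factor the recorder applies.
    bytes_per_second = sample_rate_hz * bytes_per_sample / compression
    return (1024 * 1024) / bytes_per_second / 60

# At 11.025 kHz, uncompressed speech fills roughly 0.8 minutes per MB;
# 4:1 compression stretches that to roughly 3.2 minutes, at some cost
# in recognition accuracy, as noted above.
print(round(minutes_per_megabyte(11025), 2))
print(round(minutes_per_megabyte(11025, compression=4.0), 2))
```

This is why a recorder's advertised "minutes per megabyte" figure is a useful proxy for how aggressively it compresses, and hence for how well its files will transcribe.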
Software requirements
See "Software Requirements" in the section on "Desktop Command and Control, Dictation."
PC-based playback software.
Conclusion: The Time is Now for Building Speech into Your Application
Speech technology is ready for use in applications and solutions today, thanks to the combination of:
rapid advances in speech technology
a powerful new generation of Pentium® II processor based desktops
the availability of engines and tools for speech application development
As with the Internet, ISVs who adopt speech technology early can gain a competitive advantage over those who wait. By developing speech-enabled applications and interfaces for high-end Pentium® II processor based desktops, ISVs can add value to their business applications, enabling business users to reduce costs while improving productivity and customer service.
Intel and the leading speech vendors are working to ensure that speech technology and the computing platform continue to evolve, allowing for better recognition, faster performance, enhanced usability, and improved natural language processing.
Last update: 11 September 98
IBM has developed a hardware jacket that fits Palm devices by using a serial port. IBM showed the prototype jacket that works in tandem with its own line of WorkPad handheld computers. The jacket itself has a 133-MHz processor and extra flash memory for speech processing. The sleeve also has a microphone and a speaker and uses IBM's ViaVoice speech recognition technology.
Using its translation application, you speak an English phrase into the device's microphone and a synthesized voice reads it back in your choice of five languages. Using voice commands, you can also navigate any WorkPad application and have text read back to you.
The prototype stores 30 minutes of audio files in 4MB of flash memory. Then when you sync the handheld with your desktop PC, IBM's ViaVoice engine on your desktop automatically transcribes the audio clip and uploads the transcript to the handheld. However, IBM representatives say a beefier model could be built that would handle speech-to-text dictation on the device itself.
"It really depends on what someone wants to build and how much a consumer is willing to spend on hardware," says Michael Buss, IBM Voice Evangelist.
IBM says it isn't planning to sell the voice-recognition sleeve itself, but is working with several unnamed manufacturers to build the jackets and possibly cobrand the hardware.
IBM voice-enables consumer appliances
13 November 2000
IBM announced recently that it is teaming with Canon to drive the implementation of certain voice-enabled consumer devices, such as kitchen appliances, toys and game consoles.
PHILIPPINES (Manila Bulletin) - The introduction of IBM Embedded ViaVoice, Mobile Device Edition allows consumers to simply use their voices to speak to a variety of workplace and home devices. The new IBM Embedded ViaVoice, Mobile Device Edition joins the existing offerings of embedded ViaVoice products. These include:
IBM Embedded ViaVoice, Multiplatform Edition which enables developers to create voice-enabled mobile solutions.
IBM ViaVoice SDK for Linux which enables Linux developers to incorporate voice recognition technology into their next generation of applications.
"The IBM Embedded ViaVoice, Mobile Device Edition is ideal for a hands-free, eyes-free environment, where consumers don't need to rely on their hands or eyes to interact with an electronic device; they just need their voice," said W. S. Osborne, general manager, IBM Voice Systems.
Customers will be able to interact with even the smallest devices and consumer appliances. Instead of pushing a button to begin brewing a cup of coffee, a customer will simply speak a command, opening up a world of possibilities for voice interaction between humans and machines.
"Creating voice interfaces for these small devices marks the success of voice recognition technology," said Dr. Kazuya Matsumoto, chairman of Canon Research Centre Europe Ltd. "Canon looks forward to working with IBM to bring a voice interface to a variety of devices and appliances."
Requiring only 5-10 MIPS (million instructions per second) for voice recognition processing, IBM Embedded ViaVoice, Mobile Device Edition provides an active vocabulary of up to 50 words for consumers who personalize their own devices, such as cell phones and PDAs. It also offers the option of pre-loading 10-20 active words for user-independent vocabularies, ideal for electronic devices that need only simplified commands, such as kitchen appliances and toys.
IBM Embedded ViaVoice, Mobile Device Edition's vocabulary can expand to the amount of flash memory in the device. Applications can dynamically switch between active vocabularies or share a single vocabulary.
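The dynamic vocabulary switching described above can be sketched as follows. The class and method names are invented for illustration and are not IBM's actual embedded API; the point is that several small vocabularies can reside in flash, with only one active at a time.

```python
class EmbeddedRecognizer:
    """Toy model of an embedded engine with switchable vocabularies."""

    def __init__(self):
        self.vocabularies = {}   # name -> set of command words in flash
        self.active = None       # only one vocabulary is active at a time

    def load_vocabulary(self, name, words):
        self.vocabularies[name] = set(w.lower() for w in words)

    def activate(self, name):
        self.active = name

    def recognize(self, word):
        # A word is accepted only if it is in the active vocabulary.
        return word.lower() in self.vocabularies.get(self.active, set())

rec = EmbeddedRecognizer()
rec.load_vocabulary("coffee", ["brew", "stop", "stronger"])
rec.load_vocabulary("phone", ["dial", "hang up", "redial"])

rec.activate("coffee")
print(rec.recognize("brew"))    # accepted in the coffee context
rec.activate("phone")
print(rec.recognize("brew"))    # rejected once the phone vocabulary is active
```

Keeping the active set small is what makes the tiny MIPS budget quoted above plausible: the engine only ever matches against a handful of context-relevant commands.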
Part of the IBM Voice Systems product line, IBM Embedded ViaVoice, Mobile Device Edition offers a low cost solution for incorporating voice capability into a number of mobile devices. Customers will find entry level command and control features, high accuracy rates, vocabulary development and compatibility, telephone quality microphones, as well as worldwide support from IBM and Canon.
IBM Embedded ViaVoice, Mobile Device Edition vocabularies may be customized by the user or manufacturer to operate in any language and will be available by the end of this year.
February 20, 2001 mitsubishi voice PDA
Microsoft wins more souls for Stinger smart phones
By Joris Evers
Microsoft has signed agreements with Japan's Mitsubishi Electric and the U.K.'s Sendo to manufacture mobile phone handsets using its Stinger operating system, the company has announced.
Sendo will unveil a prototype of its Z100 smart phone today at the GSM World Congress in Cannes. The device has a colour display and uses Microsoft's software platform for such devices, code-named Stinger. Mitsubishi's European division Mitsubishi Electric Telecom Europe will develop a smart phone using Stinger.
Microsoft last year announced an agreement with South Korea's Samsung Electronics to produce Stinger-enabled cellphones. Samsung will demonstrate a phone running Stinger at the mobile phone conference, Microsoft said. The Korean company will also launch a phone that incorporates Microsoft's Mobile Explorer 3.0.
"Our engineers will work together with Microsoft to develop a handset that will work on Stinger," said Nicolas Kenedi, spokesman for Mitsubishi, adding that a prototype of the phone will be displayed at the CeBIT tradeshow in Hanover, Germany in March.
The Mitsubishi device, which will combine the features of a PDA (personal digital assistant) and a mobile telephone, has yet to be given a name. It is scheduled to be released in late 2001 for GSM and GPRS networks and will be sold under Mitsubishi's Trium brand.
With Monday's announcement, Mitsubishi and Microsoft are expanding their existing relationship. Mitsubishi's Mondo, a voice-enabled PDA based on Microsoft's Pocket PC platform, is scheduled to be on store shelves in March, Kenedi said.
According to Microsoft, Sendo's Z100 weighs a mere 99g, which makes the prototype smaller and lighter than GPRS phones currently on the market. It features a 65,000-colour TFT display with a resolution of 208x240 pixels. The Z100 plays music files in the MP3 and Microsoft's WMA (Windows Media Audio) formats. The device can be interconnected using USB, infrared or serial connections, and has a slot for expansion cards.
Sendo, headquartered in Birmingham, in the UK, was founded in August 1999. It will start selling phones in Europe and Asia during the course of this year, according to a Microsoft release. Sendo also provides hardware to mobile phone operators, who sell the phones under their own brand.
Microsoft said the phones using Stinger will go into trials with operators in the coming months. Among the testing operators are Germany's T-Mobil International, Spain's Telefónica and the UK's Vodafone Group. The software giant expects to see the first handsets on the market this year.
Microsoft is locked in a battle with Symbian to gain first-mover advantage in this new market for advanced cellphones running embedded operating systems. The news comes one day before the GSM World Congress, a major cellular telephone conference, is scheduled to open.
Symbian, with its EPOC operating system, has the support of a host of big-name players such as L.M. Ericsson Telephone, Kenwood, Matsushita Electric Industrial, Motorola, Nokia, Philips Consumer Electronics, Psion, Sanyo Electric and Sony.
Intel and OradNet Change the Face of Interactive Sports
New Technology Immerses Viewers Into Soccer Games
SANTA CLARA, Calif.--(BUSINESS WIRE)--April 19, 2001--Intel Corporation and OradNet Inc. today announced the release of OradNet's TOPlay(TM) Soccer, the first immersive sports application to utilize the new Intel® Interactive Sports software and the OradNet sport tracking platform. TOPlay(TM) Soccer creates an interactive 3-D graphics representation of actual soccer matches that can be viewed by fans online. Fans have control of how they view a soccer match, from zooming in and out of plays to watching the game through the eyes of their favorite player. They can also view replays in slow motion from any angle.
TOPlay(TM) Soccer represents a new category of immersive sports media. Immersive sports not only lets fans enjoy the game in a new way, but also creates valuable new marketing opportunities for sports content providers, including new ``virtual'' advertising and e-commerce possibilities that do not compete with traditional broadcast revenue models.
TOPlay(TM) Soccer, for example, allows any element of a scene on the computer screen to be `clickable,' enabling sports Web sites to link to e-commerce opportunities, such as the ability to click on a player and buy that player's jersey online. Sports Web sites can also use TOPlay(TM) Soccer as an intuitive search engine for the site's more traditional content, such as player statistics, articles, photos, etc.
``Immersive sports technology will increase the financial value of sports media assets,'' said Zack Keinan, President and CEO, OradNet Inc. ``This innovative content represents the changing face of the business of sports, as well as the enhancement of the fan's relationship with their favorite team.''
How the Technology Works
To create the immersive 3-D images for TOPlay(TM) Soccer, the soccer match is analyzed using proprietary tracking technology from OradNet, Inc. The TOPlay(TM) production suite then uses the Intel Interactive Sports software to `translate' the tracking data of the game into an XML database that contains player and ball movements as well as other data, such as goals. The XML database is then used by the TOPlay(TM) application to create Web-based 3-D animations of the game as well as game statistics.
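The tracking-to-XML step described above might look something like this sketch. The element and attribute names are assumptions for illustration, not Intel's or OradNet's actual schema; the idea is that per-frame object positions become a queryable XML record of the match.

```python
import xml.etree.ElementTree as ET

def tracking_to_xml(frames):
    """Serialize (time, {object: (x, y)}) tracking frames to XML."""
    match = ET.Element("match")
    for t, positions in frames:
        frame = ET.SubElement(match, "frame", time=str(t))
        for obj, (x, y) in positions.items():
            # One element per tracked object (ball, players, ...).
            ET.SubElement(frame, "object", id=obj, x=str(x), y=str(y))
    return ET.tostring(match, encoding="unicode")

# Two illustrative frames, 40 ms apart (25 fps).
frames = [
    (0.0, {"ball": (50.0, 30.0), "player7": (48.5, 29.0)}),
    (0.04, {"ball": (51.2, 30.4), "player7": (48.9, 29.2)}),
]
print(tracking_to_xml(frames))
```

A database of this shape is what lets the client render the game from any viewpoint and also answer statistical queries (distance covered, goal events) from the same data.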
To create the 3-D graphics representation of the game, TOPlay(TM) Soccer uses Intel® Internet 3-D Graphics technology, which Intel and Macromedia recently announced as a part of the new Macromedia Director(a) and Shockwave(a) Player products. This technology, designed to produce and render high quality 3-D graphics on the Web even over a 56K modem, can also intelligently tune the graphics quality to the performance of a user's computer -- increasing the graphics resolution and detail for high-performance computers and automatically decreasing resolution for slower systems. The technology is also optimized for the Pentium® 4 processor, making such systems the best way to experience interactive sports on the Web.
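The performance-adaptive tuning described above can be sketched as a simple feedback loop: measure how long frames take and step a detail level up or down toward a target frame time. Everything here (the function, the 40 ms target, the 1-10 detail scale) is an illustrative assumption, not the actual Shockwave mechanism.

```python
def tune_detail(frame_times_ms, detail=5, target_ms=40, lo=1, hi=10):
    """Step rendering detail toward a target frame time (~25 fps)."""
    for t in frame_times_ms:
        if t > target_ms and detail > lo:
            detail -= 1            # slow machine: reduce resolution/detail
        elif t < target_ms * 0.5 and detail < hi:
            detail += 1            # fast machine: headroom, raise detail
    return detail

# A fast machine (10 ms frames) climbs toward maximum detail,
# while a slow one (80 ms frames) backs off to the floor.
print(tune_detail([10] * 8))   # 10
print(tune_detail([80] * 8))   # 1
```

The hysteresis (only raising detail when frames are well under target) keeps the loop from oscillating between two levels on a borderline machine.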
``Intel Interactive Sports software is another example of how technologies from Intel are changing the face of the Internet and making it more useable and fun,'' said Steve Spina, director of marketing of Intel Architecture Labs. ``We believe that immersive sports media will enhance the way fans interact with their favorite teams and we see TOPlay(TM) Soccer as a great example of the experiences that can be delivered using our technology.''
Availability
TOPlay(TM) Soccer will be available for download on April 30 at www.oradnet.com.
About OradNet Inc.
OradNet Inc. is a wholly owned subsidiary of Orad Hi-Tec Systems Ltd. (Neuer Markt:OHT.f). OradNet is dedicated to the development, marketing and commercialisation of immersive sports Webcasting technologies which enable sports fans to watch and play live or delayed sports events on their computer. OradNet's application will provide the fast-growing Internet market with the high impact tools for increasing revenue streams and for delivering an exciting immersive sports experience. Leading sports sites such as Sports.com, Jumpy.it, Canal+ in Spain, and more, are already using OradNet's current Webcasting application -- VirtuaLive(TM), for covering their local leagues.
Founded in 1993, Orad is a world leader in virtual sets, virtual advertising and virtual sports broadcasting. Orad's mission is to realize the potential of state-of-the-art tracking technologies, real-time image processing and 3D graphics for the worldwide TV broadcast, sports sponsorship and Internet markets. Orad's principal investors include ISL, the world's largest sports sponsorship organization; and the Ormat Group, a publicly-traded Israeli alternative energy company.
About Intel
Intel, the world's largest chip maker, is also a leading manufacturer of computer, networking and communications products. Additional information about Intel is available at www.intel.com/pressroom.
(a) All trademarks and brands remain the property of their respective owners.
Japan's NEC mulls chip partner for European phones
TOKYO, April 19 (Reuters) - Japanese electronics giant NEC Corp will partner with a U.S. based chip maker to build Web-connected mobile phones in Europe within the year, and Agere Systems Inc. (NYSE:AGRa - news) is the top candidate, industry sources said.
Besides Agere, Texas Instruments Inc. (NYSE:TXN - news) and Intel Corp. (NasdaqNM:INTC - news) are among the potential partners NEC has been considering in Europe, sources said.
An industry source confirmed that Agere will partner with NEC by selling crucial chips and providing the technological know-how for NEC to break into the European market.
Ben Nakamura, senior vice president for mobile terminals, said NEC will eventually displace the entrenched incumbents Nokia Oyj and Motorola Inc. (NYSE:MOT - news) as the world's first- and second-largest phone handset makers.
``We will have GPRS phones in Europe by the end of this year,'' Nakamura told Reuters in an interview on Thursday. GPRS technology sends data back and forth in short bursts to allow Web browsing without the high cost of a constant connection.
However, Nakamura declined to comment on any potential partnership connected with the company's European phone move.
A spokesman for Allentown, Pa.-based Agere was not available to comment. Santa Clara, Calif.-based Intel Corp. was not immediately available to comment.
Tom Engibous, Texas Instruments' chairman and chief executive, declined to comment on any partnership with NEC. ``I think you need to talk to NEC about that,'' he told Reuters after the company's annual shareholder meeting on Thursday in Dallas.
The Texas Instruments executive said there was no inherent conflict between the possibility of supplying NEC with chips for European phones and its existing ties to Nokia, Motorola and Ericsson, all of which use TI chips in their own phones.
``We supply 70 percent of baseband chipsets for GSM phones ... that involves a lot of different companies and that's something we have learned to live with,'' Engibous said.
Several industry analysts said Agere made the most sense. ``Agere would be a good fit for NEC,'' said Gartner analyst Stan Bruederle of San Jose, Calif.
NTT DoCoMo Inc. has selected GPRS as the system its i-mode Internet-enabled service will use in Europe as DoCoMo moves to extend its successful Japanese service into the European region.
``We have found the way to get GSM (Global System for Mobile Communications) and GPRS technology,'' Nakamura said of the complex task of combining existing GSM digital technology with newer GPRS techniques, a task that has plagued many handset makers.
NEC stands a good chance with DoCoMo in Europe, since its i-mode cell phones are consistently the top-selling models for DoCoMo, which has managed to get more than 22 million users surfing the Web in a little more than two years.
``The way to go in is by leveraging mobile Internet technology,'' Nakamura said of the Japanese phone maker's bid to crack open the European market.
The phones would be developed in Japan and Europe and be built by electronic contract manufacturer Arima Communications, a unit of Taiwan's Arima Computer Corp. , Nakamura said.
The GPRS phones will use the foldable, or 'clamshell', design NEC pioneered in Japan, and will also feature colour screens.
Agere shares rose 86 cents, or 13.25 percent, to $7.20 in active trading on the New York Stock Exchange on Thursday.
HTC Collaborates With TI to Adopt Texas Instruments OMAP(TM) Technology In HTC's Next Generation Wireless Devices
DALLAS and TAIWAN, April 19 /PRNewswire/ -- Texas Instruments Incorporated's (NYSE: TXN - news; TI) family of high performance, low power OMAP(TM) processors will serve as the engines for High Tech Computer Corporation's (HTC) 2.5 and 3G smart phones and advanced mobile Internet appliances. The collaboration leverages HTC's design and manufacturing expertise in mobile devices, and TI's high performance, low power OMAP processors, to deliver 2.5 and 3G mobile devices with sleek form factors and wireless functionality including broadband Internet access, mobile commerce, personal digital assistant (PDA) functionality and e-mail access. This further extends TI's position as the de facto standard processing platform for the next generation of broadband-enabled wireless devices. See http://www.ti.com/sc/omap.
``Wireless consumers have spoken loud and clear on their requirements for next generation smart phones -- high performance for new wireless applications; long battery life for extended use; and flexibility to personalize these devices. TI's OMAP platform makes this a reality today,'' said Peter Chou, vice president, Wireless Mobile Division, HTC. ``We have been working closely with TI's OMAP development team through design reviews and architecture discussions. Undoubtedly, we believe that TI's family of high performance, low power OMAP processor solutions offers HTC the best capabilities for delivering leading-edge products to the market. We plan to further this collaboration well into 3G and other future wireless solutions.''
HTC (http://www.htc.com.tw), based in Taiwan and founded in 1997, specializes in designing and manufacturing world-class mobile computing and communication solutions for customers including original equipment manufacturers (OEMs) and Original Design Manufacturers (ODMs). Since its establishment, HTC has become one of the most respected manufacturers of devices based on the Microsoft Windows CE(TM) operating system, including the Compaq iPAQ(TM) Pocket PC.
``TI has worked closely with HTC for several years to deliver a complete family of solutions that take advantage of TI's OMAP platform,'' said Alain Mutricy, general manager of TI's OMAP platform. ``HTC is considered one of the most respected manufacturers of mobile Internet devices based on Microsoft Windows CE operating system. We look forward to collaborating with HTC in the rapidly growing mobile Internet appliance market, and we're excited to assist HTC in providing 2.5 and 3G wireless solutions to its customer base.''
Unveiled in May 1999, TI's programmable DSP-based OMAP architecture delivers advanced wireless Internet and multimedia functionality, without compromising battery life essential to wireless communications devices. TI's OMAP platform is not only architected to support all 2G, 2.5G and 3G wireless standards, but also delivers an open software environment that enables wireless software developers to provide new applications for wired and wireless downloading.
TI began shipping OMAP processor prototypes in 4Q2000, and is shipping samples of its OMAP1510 product today. The OMAP1510 is scheduled to be available in volume production quantities in 3Q2001.
Texas Instruments Incorporated is the world leader in digital signal processing and analog technologies, the semiconductor engines of the Internet age. The company's businesses also include sensors and controls, and educational and productivity solutions. TI is headquartered in Dallas, Texas, and has manufacturing or sales operations in more than 25 countries.
Texas Instruments is traded on the New York Stock Exchange under the symbol TXN. More information is located on the World Wide Web at: http://www.ti.com.
Trademarks:
OMAP is a trademark of Texas Instruments Incorporated.
Microsoft Windows CE is a trademark of Microsoft Corporation.
iPAQ is a trademark of Compaq Computer Corporation.
SOURCE: Texas Instruments Incorporated
Techs Offer Solution to Streaming Radio Problem
By Sue Zeidler
LOS ANGELES (Reuters) - Internet radio is a new medium with an old problem: keeping the cost of doing business down given the demands of a strong union.
Many major radio stations have retreated from the Web to avoid paying higher fees for streaming traditional radio commercials online. But with access to a growing audience at stake, the union demanding the higher fees is in danger of finding itself shut out by technology that inserts Web-specific advertising spots in lieu of the established commercials.
That could provide a boost to companies marketing Internet ad-insertion technology, including RealNetworks Inc, Hiwire Inc, Lightningcast Inc. and StreamAudio, analysts said.
Stations owned by Clear Channel Communications Inc. (NYSE:CCU - news), Emmis Communications Corp. (NasdaqNM:EMMS - news), ABC/Disney (NYSE:DIS - news) and others recently pulled the plug on Web streaming, citing the American Federation of Television and Radio Artists' (AFTRA) contract.
The contract requires advertisers to pay union talent 300 percent of the normal session fee if a spot originally recorded for radio is streamed online.
Ad-insertion technology providers hope to provide these stations with solutions that replace terrestrial radio ads with Internet-only ads, which are exempt from the fees.
''Streaming media overall will account for $1.4 billion in advertising sales in 2005, and ad-insertion technologies should be a crucial ingredient in reaching those numbers,'' said Aram Sinnreich, senior analyst at Jupiter Media Metrix.
``This is definitely an opportunity for ad-insertion technology companies to demonstrate the value of their services,'' said Sinnreich.
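The ad-insertion idea itself is straightforward to sketch: when the stream reaches a terrestrial ad break, splice in a web-only spot instead of the union-scale radio ad. The break-marker convention and segment names below are invented for illustration; real systems work on cue points in the audio stream rather than labeled strings.

```python
def insert_web_ads(stream_segments, web_ads):
    """Replace terrestrial ad breaks with Internet-only spots."""
    ad_iter = iter(web_ads)
    out = []
    for segment in stream_segments:
        if segment.startswith("AD:"):             # terrestrial ad break marker
            out.append(next(ad_iter, "web-psa"))  # fall back to filler spot
        else:
            out.append(segment)                   # music/programming untouched
    return out

broadcast = ["song1", "AD:radio-spot-1", "song2", "AD:radio-spot-2"]
print(insert_web_ads(broadcast, ["web-ad-A"]))
```

Since the inserted spots never aired terrestrially, they fall outside the 300 percent session-fee clause, which is the whole commercial premise described above.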
Union Says It Embraces Technology
AFTRA representatives said the contract fee is ``nominal'' and that other issues were behind the radio stations' actions.
``We think broadcasters are using the AFTRA contract as an excuse to stop streaming,'' said Mathis Dunn, assistant national executive director of commercials and non-broadcast for AFTRA.
He said that paying union talent 300 percent of a normal session fee would amount to about $660 for an ad that could be used on the Internet for a maximum of 21 months.
``Advertisers urged broadcasters to pull spots that were not authorized to be streamed and then the Webcasters just pulled out of streaming altogether,'' Dunn said, noting that broadcasters' exposure to online music royalty fees is a far greater threat than exposure to commercial fees.
Tensions regarding potentially hundreds of millions of dollars in royalty rates owed by online radio companies to recording companies have also been simmering for months.
The two main streaming formats are RealNetworks Inc's (NasdaqNM:RNWK - news) Real Player and Microsoft Corp's(NasdaqNM:MSFT - news) Windows Media Player. RealNetworks is offering ad-insertion technology straight from its own server and is in talks to convert several major radio stations to its service.
Nagesh Pabbisetty, vice president of RealNetworks' Real Broadcast Network division, hopes to clinch new deals soon.
``We can offer a big advantage for broadcasters by helping them sell streaming advertisements and by providing them with our service infrastructure,'' Pabbisetty said.
He declined to identify which companies RealNetworks was in talks with, but the sector is clearly jumping right now.
Last week, Clear Channel Internet said it was in the process of selecting and deploying ad-insertion technology. ``It is our intention to put the streams back up when it makes legal and financial sense,'' said Kevin Mayer, chief executive officer of Clear Channel Internet Group in a statement last week.
While still in its early stages, more and more listeners, particularly music fans, are turning to Web radio.
The streaming industry's revenues last year totaled $90 million, with 80 percent derived from audio. This year, they are forecast to hit $214 million, with 80 percent from audio, said Larry Gerbrandt, chief operating officer of Kagan World Media.
Some say the AFTRA fall-out could be a boost for stations who may now get added revenue from sale of Internet-only ads.
``Rather than just streaming with dead air, the stations are now motivated to sell to Internet-only advertisers,'' said Darren Harle, chief operating officer of StreamAudio, which hosts 730 terrestrial radio stations and has also developed Internet ad-replacement technology.
``The advantage is that we can help broadcasters generate revenues with ad targeting regardless of the stream, and we can target it down to the individual listener,'' said a spokesman for HiWire, which makes ad-insertion technology.
``If someone else benefits from all this, that's the way the market works. We're not against new technologies. In fact we embrace new technologies,'' said Dunn of AFTRA.
triples- see my prior post #974 and also recall DaBoss email rec'd re NDA between edig and PP.
triples- you are presuming edig not involved with Tango; such a presumption may make an.....
REPOST: You say you want a revolution?
If you think digital audio players have already shaken things up, just wait for the next generation.
Nicholas Cravotta, Contributing Editor
Think of the development of digital audio players (DAPs) as a symphony that began a few years ago. The orchestra just now reached the end of the first movement. We've heard one long crescendo, as the DAP market developed from silence to a near-deafening roar. And we've come to know a familiar melody, sung by PC-dependent, digital gadgets that resemble yesterday's Walkmans and portable CD players.
Now, as the conductor's baton comes down to commence the second movement, what kinds of variations on the DAP theme will we hear?
Digital audio is an evolutionary innovation. The DAP is the first significant Internet appliance that isn't trying to be a mini-PC. Yet it's not an entirely new product type either. DAPs improve upon the portable CD player by allowing you to better choose the music you want to listen to, letting you carry more music with you, and giving you the DJ-like power to create your own playlists. Now, the time is approaching when DAPs take the next evolutionary step, freeing themselves from our PCs and managing content for themselves.
Current portable players compete, for the most part, on the battlefield of features—radio tuners, larger displays, equalizers, and special effects. Next-generation features will include the ability to rip (record in digital form) tracks without a PC, voice recording, and smaller form factors, such as wristwatch and neck-pendant players. DAPs are also starting to appear inside cell phones and other products, although important issues like battery life will be barriers for some time.
Ho hum
What's happened is that it isn't particularly difficult to build a DAP anymore. Companies like Cirrus Logic, Micronas, Samsung, STMicroelectronics, Texas Instruments, and Toshiba offer various chip and software products that remove most of the technical challenge.
For example, PortalPlayer has announced perhaps the most integrated digital audio chip to date, Tango. The company claims that Tango has everything a portable device needs. From an engineering perspective, Tango does come fairly complete, offering a real-time operating system, content-management software, seven digital audio decoders, several encoders, several DRM (digital rights management) tools, special effects like an equalizer and 3D sound, and support for serial or USB ports.
Based on an ARM7 microprocessor, the Tango chip runs at 140 MIPS (millions of instructions per second). To give you an idea of what 140 MIPS means, typical MP3 decoding consumes about 35 MIPS. Extras add up quickly. An equalizer takes 5 to 6 MIPS and watermarking for DRM eats up another 8 to 12 MIPS. Encoding, which allows a player to rip on its own, takes between 80 and 120 MIPS.
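As a rough illustration of how that budget divides up, the article's figures can be tallied. The numbers below are the cited estimates (taking the upper bound of each range), and the task names are made up for the sketch:

```python
# Hypothetical MIPS budget for a 140-MIPS DAP chip, using the
# article's cited estimates (upper bounds of the quoted ranges).
BUDGET_MIPS = 140

tasks = {
    "mp3_decode": 35,     # typical MP3 decoding
    "equalizer": 6,       # 5 to 6 MIPS
    "drm_watermark": 12,  # 8 to 12 MIPS
    "encode": 120,        # on-device ripping: 80 to 120 MIPS
}

def headroom(active):
    """Remaining MIPS after the named tasks run concurrently."""
    return BUDGET_MIPS - sum(tasks[name] for name in active)

# Decode plus the extras fits comfortably...
print(headroom(["mp3_decode", "equalizer", "drm_watermark"]))  # 87
# ...but worst-case encoding alone nearly consumes the chip.
print(headroom(["encode"]))  # 20
```

This makes the article's point concrete: playback with extras leaves plenty of headroom, but adding on-device encoding pushes the chip close to its ceiling.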
Significantly, more than 70 percent of PortalPlayer's 115 employees work on software. This follows a growing chip-industry trend that sees IC vendors no longer selling chips but instead hawking "platforms" or "solutions." The lesson should be obvious: Software is where vendors differentiate their products. OEMs add personalization not so much at the interface level but by adding applications such as address-book functions or voice recognition.
And product differentiation is the chief factor driving innovation. For example, when Cirrus Logic offers a chip with graphic-equalizer functions, that means every OEM using a Cirrus Logic product can offer a graphic equalizer too. Jim Long, president and CEO of Rioport, a software spin-off from Rio creator Diamond Multimedia, describes the danger. "We're in a feature war, where it's not clear anyone cares," he says. "Suddenly you end up with a product that is too hard to use. I'm not sure they add to the cause. Fancy stuff on players will take a back seat to making them easier to use. The simplicity angle here is critical."
Features aren't going to go away, but in some respects, they need to disappear, to become transparent. Quality, given that decoders are fairly standard, isn't really an issue anymore. It's far more important that the device be straightforward to use. And silicon isn't driving digital audio right now—the Internet and software are. "Software is still a lot tougher than the silicon," Long says. "There's an explosion of companies writing codecs. But software isn't just about writing software: you need an end-to-end system." In this light, next-gen battles concern connectivity and ease of use.
Cutting the umbilical
The biggest change we're going to see with the next generation of digital audio players is their progressive movement away from the PC. Current players rely on PCs to feed them with music, either through a wired connection like USB or via media such as removable flash memory cards. And users still need a PC to procure the music in the first place, by downloading it or ripping it from CDs.
Digital audio can be so much more than just a player you can fit in your pocket. With the emergence of the Auto PC, your car will become a platform to play your favorite MP3 rips. You might run an Ethernet cable out to your garage, have a wireless connection, or drop in a flash card or a CD. In this scenario, you're still tied to a PC-based player, but it now weighs half a ton and can go 100 miles an hour.
The living-room entertainment center provides another fertile environment for sowing digital audio. A digital stereo, for example, could house a disk drive, the television could act as the display interface, and a cable set-top-box could serve as the high-speed Internet pipe.
And Bluetooth—if it ever gets here—provides the means for music to flow wirelessly among all these platforms—portable, PC, living room, and car.
Is digital audio ready to be independent? Let's take a hard look at the key technological issues involved in designing next-generation DAPs.
Start with storage. The physical side of the storage question is pretty much answered. Today's flash memories offer enough storage for more than an hour of music. And Moore's law says they'll only get denser, faster, and cheaper. Removable flash comes in a variety of formats, and the only real difficulty a DAP manufacturer faces today is finding a supplier who can provide enough flash to meet the rising demand.
However, flash isn't the only medium in the universe. Startup DataPlay recently announced a coin-sized optical disc (see "Would-be king"). And for stationary devices such as digital stereos, hard-disk drives provide an attractive cost-vs.-capacity ratio (a single gigabyte can hold roughly 16 hours of music encoded at 1 Mbyte/minute).
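The arithmetic behind that capacity claim is simple. The 1 Mbyte/minute figure (roughly 133 kbps) is the article's assumed encoding rate:

```python
# Back-of-the-envelope capacity math from the article's figures.
# Assumes audio encoded at roughly 1 Mbyte per minute (~133 kbps).
def hours_of_music(capacity_mbytes, mbytes_per_minute=1.0):
    return capacity_mbytes / mbytes_per_minute / 60

print(hours_of_music(1000))  # a 1-GB hard disk: roughly 16-17 hours
print(hours_of_music(64))    # a 64-MB flash card: about an hour
```

The same formula explains why today's flash parts land at "more than an hour" while disk drives jump straight to day-long libraries.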
Location, location, location
Moreover, DAP vendors are already thinking outside of the memory box. After all, an audio file doesn't have to reside locally if you have Internet access. With a large enough pipe and streaming technology, your jukebox can encompass the whole Internet. Just look at the success of Napster, the free program that lets people make audio files on their hard drives accessible to others.
While Napster and MP3.com's "Insert your CD" idea (see "My way") may have crossed legal lines, the idea of an online repository is enticing. Take, for example, a music service provider. Instead of storing a distinct copy of a music file for each user who has legal access to it, a single copy could be stored to serve everyone.
For such visions to come true, music-manager software is crucial. Whether the player manufacturer creates a proprietary manager or teams up with an existing vendor to build a customized front-end, the manager must provide a seamless interface. The user should be able to manage the content easily and transparently. A competent content manager will have to keep track of where files are stored, what format they are in, and what legal license the user owns for them. The user shouldn't be burdened with these chores.
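As a sketch of what such a manager must track, here is a minimal, hypothetical record type. The field names and license values are illustrative, not from any real product:

```python
# A minimal sketch of the per-track record a content manager might
# keep: where the file lives, its format, and the user's license.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrackRecord:
    title: str
    location: str       # local path or remote URL
    audio_format: str   # e.g. "mp3", "wma", "aac"
    license_type: str   # e.g. "purchased", "30-day-trial"
    expires_days: Optional[int] = None  # None means a perpetual license

def playable(track: TrackRecord, days_owned: int) -> bool:
    """True if the user's license still covers playback."""
    if track.expires_days is None:
        return True
    return days_owned <= track.expires_days

trial = TrackRecord("Demo", "/flash/demo.mp3", "mp3", "30-day-trial", 30)
print(playable(trial, 30))  # True: still inside the trial window
print(playable(trial, 31))  # False: the track has expired
```

The point of the sketch is that the bookkeeping lives in the manager, not with the user: the player consults the record and simply plays or declines.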
This model faces tall hurdles. The record companies hope to bind content to a particular PC. But that presents some serious logistical problems. For example, users will want to be able to play music across a home network. And if content can only reside on a single hard drive, what happens if that drive fails or if you upgrade your computer? How do rights transfer? In the old days, when a couple split up, he took his records and she took hers. But how do you divide up digital files?
Declaration of independence
The hardest problem to solve for next-gen players is independent Internet connectivity—freeing players from the mandatory PC link. Several companies are looking for their players to go straight to the Web through an internal modem. We may see a few this year, but design cycles, and cost, will probably push such introductions into next year.
In the PC space, the most cost-effective modem is a "soft" modem. Instead of paying for modem hardware, which only gets used part of the time, PC makers take advantage of the powerful PC processor to run a software-based modem. A Pentium-grade processor can run a V.90 modem without much degradation of overall performance.
Embedded devices like DAPs have no Pentium on which to piggyback. However, a powerful enough DSP (digital signal processor) could act as both a modem and a digital audio decoder. Except in streaming applications, the DSP could act as either a modem or decoder, but not both at once. Utilizing the same silicon this way keeps cost down.
Several companies, such as Hitachi, are actively targeting the embedded Internet space. Hitachi's SH3-DSP combines the functions of a regular processor and a DSP, giving it an edge in this kind of application. Such chips still carry price tags too high for DAPs, but the technology is in place for the convergence to occur.
But a player with an internal modem still faces the problem of having a slow connection to the Internet. Sure, you could dial into your favorite MP3 site from a pay phone, but you'd find yourself standing there for a very long time. For independent DAPs to survive, they'll need broadband connectivity.
Broadband technologies such as DSL and cable modems are now seeing widespread deployment. However, DSL chipsets cost too much to consider adding one to a DAP, and software-based DSL is pie-in-the-sky even for PCs. In any case, where would you find a DSL line for your player in the first place? If you do happen to come across an accessible line, there's probably a computer with a USB port there anyway.
Perhaps what makes the most sense for an independent portable player is a shared broadband connection to the Internet, through either Ethernet, USB, or a wireless link. With home gateways on their way, a portable player could plug right into a home network to access the Internet via the gateway's DSL or cable connection.
Don't get awkward
Which brings us back to what makes a DAP independent of a PC. From one perspective, adding a modem will free a player from the hold of a PC today. On the other hand, many players already have USB, which is enough to hook them into a home gateway and DSL. Technically, these DAPs are already potentially free of the PC, but only if you can actually locate a gateway to plug into.
Once the PC is out of the picture, DAPs will need access to large amounts of storage to manage digital rights and archive digital files. This could be an Internet-accessible drive or even a personal PC acting as a server.
However, small size and simplicity rank as the portable player's most appealing features; throw a screen on the player so a person can surf the Web for files and the player becomes unwieldy, expensive, and difficult to use.
DAPs will need to be able to surf themselves, with only limited human guidance. For example, when you press the player's Internet button, the DAP could hook into a dedicated site for DAPs only. But some interface will be necessary. What will users tolerate? A bulky screen and a bunch of buttons? Voice recognition?
Perhaps the biggest stumbling block for digital audio today is copyright protection and the digital-rights management (DRM) schemes that are popping up to provide it. "Protective measures," in this issue, discusses DRM schemes in detail. Here, we'll limit ourselves to how the current DRM situation impacts player evolution.
New DRM concepts are appearing with great regularity, and the industry has yet to show any sign of slowing this pace down, never mind coalescing around a single standard. Yet DRM has to work right out of the gate. People won't tolerate paying for a song and then not being able to play it because their player doesn't support that standard.
For example, a track might come with a 30-day trial period before you have to pay for it. "But what happens on the 31st day when it stops playing?" asks Michael Maia, marketing VP at PortalPlayer. Will consumers be angry and confused when they pick up their player to go out for a run or a long trip and find that songs have expired? "We're still not sure what the psychological aspects of that will be," Maia says.
So player vendors have to be flexible when supporting DRM. Imagine the backlash from angry users if the adoption of a new standard made their $300 players obsolete overnight. The SDMI (Secure Digital Music Initiative), a group trying to lead digital content protection forward, only offers defining guidelines, leaving significant room for interpretation and implementation. Therefore players must accept firmware upgrades so they can support multiple DRM formats, as well as new audio formats, as they appear.
Supporting multiple formats should have little impact on processing performance (unless the DRM employs complex keys for streaming decryption) or memory (the player can steal a little flash space for the DRM code, reducing maximum play time by a few seconds at worst).
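The flash cost is easy to quantify. Assuming, for illustration, 32 Kbytes of DRM code and music stored at 128 kbps:

```python
# Rough cost of storing DRM code in the player's flash, expressed
# as lost play time. The 32-KB code size is an assumed figure.
def playtime_lost_seconds(drm_code_kbytes, bitrate_kbps=128):
    kbytes_per_second = bitrate_kbps / 8  # 128 kbps -> 16 KB/s
    return drm_code_kbytes / kbytes_per_second

print(playtime_lost_seconds(32))  # 2.0 seconds of audio given up
```

Two seconds of music per DRM scheme is noise next to an hour of capacity, which is why the flash overhead argument holds up.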
Of course, as soon as you open a system to firmware upgrades, you open the system to spoofing. The SDMI mandates that security should keep honest listeners honest. Quite frankly, that may not be a strong enough principle. You can break into a system for two reasons—for yourself or for everyone. Cracking a single key provides access to a single track or a single player. But cracking a player's firmware and, say, inserting a new version that ignores all rights checks, has the potential to break copy protection for every similar player. Suddenly a lot of honest people may be very tempted. If such firmware cracks can be created, it's safe to assume that they will be; the digital audio culture has already demonstrated that it includes hackers with plenty of time on their hands.
This places a burden on player manufacturers to not only protect content but also to protect the players themselves from casual break-in. Players will require encryption protection; the firmware should be stored safely in encrypted form, only to be decrypted inside the player.
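The principle can be sketched as a firmware image shipped with an authentication tag that the player verifies before accepting an update. The key, tag scheme, and image contents below are all assumptions for illustration; a real player would keep the key in protected hardware and decrypt the image on-chip:

```python
# Sketch of tamper-resistant firmware updates: the player only
# accepts images carrying a valid authentication tag, so a cracked
# image that skips rights checks is rejected. Illustrative only.
import hashlib
import hmac

DEVICE_KEY = b"secret-burned-into-the-player"  # assumed, for the sketch

def sign_firmware(image: bytes) -> bytes:
    return hmac.new(DEVICE_KEY, image, hashlib.sha256).digest()

def accept_firmware(image: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(sign_firmware(image), tag)

official = b"\x01\x02firmware-v2"
tag = sign_firmware(official)
print(accept_firmware(official, tag))                    # True
print(accept_firmware(b"hacked-no-rights-checks", tag))  # False
```

The scheme only helps, of course, if the key itself stays out of reach, which is why the article argues the firmware must be decrypted inside the player rather than on a PC.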
A flexible design is important not only to future-proof a player against evolving DRM and audio standards, but also to provide malleability in the user interface. As DAPs become independent of a PC, they must be flexible enough to support new kinds of services. For example, a DAP released today should be able to interface with the vendor's future DAP-only web site. Additionally, different users will favor different features and options. A programmable or customizable interface could open the door to new markets.
Having to support multiple formats and accommodate formats that aren't yet in existence doesn't exactly make design easier. "There's a lot of chaos in the market," says PortalPlayer's Maia. Chaos, of course, equals opportunity.
Serve me
As digital audio players evolve, services will evolve along with them. The first and most important of the Web services will be gateways to content, both to purchase and to store. Consumers need to be able to access their legal content from anywhere, not just one PC. And today's PC players need to move from simply accessing content sites to working directly with them, understanding how songs are protected and in which format they are encoded.
Digital rights need to be as portable as the players that support them. This opens the door to services, such as digital rights management, that mirror your rights on the Internet. The money isn't in giving away free players for PCs, but in selling content and services that encourage the sale of more content.
Rioport is an example of a company already looking ahead. It was spun off as the developer of the software that interfaces the Rio to a PC, but it is now devoting substantial resources to work behind the scenes. For example, it has struck a deal to provide the download factory for MTV, serving up the music and handling the rights and commercial transactions. This becomes the platform for other services.
One barrier to new services, however, is that consumers are already stuck in the free model. PC music management software is free, upgrades are free, and if you aren't squeamish, content is free as well. Consumers may only be willing to pay for services they can't get any other way. Though many current Web companies depend on revenue from advertisements, we'll see new revenue streams emerge. Digital audio lets providers charge for premium services, such as managing files and playlists. A music club like BMG, for example, could let you download the album of the month to try out; if you decide you like it, you'd press a button on your player, and the next time you logged in, the player would buy the album for you.
Even better, you might be able to give copies of songs to your friends for temporary use. If the friend bought the song, you'd see a referral kickback. All of this will require some real complexity under the sheets, so to speak. And as more sites and services become available, ensuring interoperability and avoiding consumer annoyance will become big challenges.
Hear no evil
The actions of the Recording Industry Association of America (RIAA), the group that has brought litigation against MP3.com and Napster, suggest a complete lack of understanding about the reality of digital media.
The RIAA does not want to release copyrighted content until sufficient copy protection is in place. But a protected format will only protect files that were ripped (taken from a CD and put into a digital file) by a compliant company. Consumers will still want to rip their own tracks (and should have the right to do so), so any successful protected format will need to support unprotected modes. Or perhaps the RIAA thinks consumers who pay full price for a CD won't mind paying a second time to hear the music on a portable digital player.
There's an absolute absurdity in the record companies' position that they must wait for "secure digital content protection" before they embrace digital distribution: They continue to release perfect digital masters (otherwise known as CDs) of the very content they seek to protect. As long as the music industry continues to distribute open masters, clean rips will continue to drift freely on the waves of the Internet.
In addition, history tells us a different story than the one touted by the labels. VCRs did not destroy the movie industry. As noted elsewhere in this issue (see "Protective measures"), copy machines didn't doom the book industry. And digital audio, many now agree, should help music companies. Jim Long, president and CEO of Rioport, says the paranoia of record labels is akin to refusing to use your credit card number on the Web because it might get stolen, while not caring when you drop your credit-card receipt on the floor of a restaurant. "The music industry is showing the classic 'We don't understand this' and putting up a lot of smoke," Long says.
Delays in deploying DRM have already had a tremendous impact on the digital audio market. "People are pirating today because they don't have a choice," Long says. He posits that when content is available and "dirt-ball simple" to download, many people will slip back into paying for it. The labels have delayed deployment in the hopes of finding a secure format. "But they're taking so long to put everything into place," Long says. "There's a whole generation of kids who don't know any different than ripping tracks they haven't paid for."
Wearable PCs: A new type of fashion statement
FAIRFAX, Va. — In 1979, when Sony created the Walkman, its inventors realized that most people would feel like absolute imbeciles walking around in public wearing, of all things, headphones.
So they famously put two headphone jacks in the first Walkman, thinking the device would only be used by two people sitting in a room listening to music together.
Obviously, they were wrong about all this, to the extent that these days, children are born wearing headphones acquired in utero, never taking them off until they get jobs in corporate management, and perhaps not even then.
The lesson: Never underestimate how stupid we're willing to look to accommodate technology.
I'm trying to keep this in mind as I'm standing in the boardroom of wearable computer maker Xybernaut, wearing — well — a wearable computer.
I'm doing this because wearable computers are just now entering the consciousness of the general public — kind of the place where portable music stood in 1979. And I'm wondering just how stupid I'm willing to look.
The breakthrough for wearables really started last year when IBM aired a commercial showing a manic young guy on a bench in a pigeon-filled piazza. The guy watches stock market information on a wireless screen attached to his eyeglasses and startles the pigeons as he shouts trades into his speech-recognition microphone.
Everybody wanted one of those gadgets, a quest complicated mainly by the fact that they didn't yet exist.
Xybernaut does sell a wearable, called the MA4. In May, Xybernaut, IBM and Texas Instruments will unveil a jointly developed wearable computer for commercial use. Perhaps by Christmas, Xybernaut and Hitachi might introduce a consumer wearable computer.
Xybernaut is pretty much considered the leader in this space, in part because it has locked up a big chunk of the patents.
But current models aren't much like the one in the IBM commercial. It's hard to imagine anyone walking around with what I'm wearing and not causing parents of small children to dial 911.
Around my waist is a fat belt that looks like Batman's utility belt. The computer — about the size of 2 pounds of ground chuck — nestles in one compartment on my right. In a back compartment is a battery, about the same size as the computer. In a left pouch is a handheld screen.
The belt is one thing. What's on my head is quite another. It's a headset built on a big cushiony set of earphones. An arm sticks out from it, hovering in front of my right eye. In the arm are a microphone and a screen about the width of a quarter.
Looking at the screen is like taking one of those eye tests where the optometrist puts the big lens wheel on your nose and asks you to say when things look blurry. Except when I peer into the Xybernaut screen, I see — Windows!
So I'm here speaking with the Xybernaut folks and feeling like I'm some combination of an airport tarmac worker and RoboCop.
As I alternately talk to the Xybernauts and use a tiny mouse to click through things on my tiny screen, I feel as rude as someone who talks to you in a bar while never taking her eyes off the TV in the corner. If I were seen in my local grocery store in this get-up, the neighbors would avoid me for months.
But Xybernaut isn't deterred by the apparent ridiculousness of wearing a computer.
"It's taken 8 or 10 years for the market to catch on," says CEO Ed Newman, a sparky, balding guy who seems like he might've been an Army colonel, but wasn't.
"Every bone in my body says we are on the verge."
One reason his bones are talking, he says, is that wearables are getting out there in business. It seems that it's easier to persuade somebody to slap on a big ol' headset when his or her boss deems it good for productivity.
Federal Express this month ordered $1 million worth of wearables from Xybernaut. FedEx plans to give the devices to its aircraft maintenance workers.
That way, they can have digital manuals and diagrams with them as they crawl around an aircraft, using the microphone to guide the computer by speech recognition.
Bell Canada is trying out wearables for workers who climb poles or go down manholes. That allows the workers to have their computers right on them. They used to have a laptop in the truck, which meant they'd have to go back to the truck to look up anything.
I imagine workers who climb poles in Canada are really happy about a development that spares them from ducking into their nice warm trucks several times an hour.
The Xybernaut crew is full of interesting ideas. "Gate agents could just descend on passengers at the airport," Newman says. This would cut check-in time, he says.
My guess is that first you'd see a big clump of travelers milling around a gate. Then you'd see a squad of uniformed agents striding toward them with these contraptions on their heads. Then you'd see half the passengers run away, making it possible to complete check-in procedures much more quickly.
Once consumer versions come out, one big use might be manuals, says Xybernaut chief technology officer Mike Jenkins.
"Like for putting together a swing set," he says. Wear the computer and it could walk you through the process, leaving your hands free to hammer and turn screws and otherwise gesture angrily, all of which are essential to swing-set assembly.
As for social acceptance, Newman points out that, like Walkman headphones, wearable computer displays will get smaller and cooler.
Already, companies are working on building them into normal-looking eyeglasses or wristwatches.
"I prefer a watch display," Newman says. "Eventually, we'll all have Lasik, so we won't need glasses."
And the computers themselves, like everything electronic, will get smaller and lighter and cheaper. In years to come, wearing a computer — or more likely a combination computer and cell phone — will be as common as wearing a stereo is today. I'm looking forward to walking the aisles at the grocery store while flipping through Alice Cooper Web sites, and having it seem perfectly normal.
Kevin Maney covers technology for USA TODAY. His Technology column appears Wednesdays. E-mail Kevin at kmaney@usatoday.com.
New Generation MP3 Encoding Player!
NAPA PA-15 supports MP3 encoding. The PA-15 can record an audio signal directly from its line-in socket. Other devices, such as a portable CD player, can be connected to the PA-15 to transfer selected songs directly into the player's memory. This interface saves the fuss and time of downloading songs to a computer and then transferring them to the player.
The Napa PA-15 is a next-generation digital audio player that does not require cassette tapes or CDs. MP3 files are stored in the player's flash memory or on a SmartMedia card (memory extension card).
In addition to playing your favorite songs, the PA-15 comes with a voice recording feature (via microphone) that can be used for recording interviews, lectures, or other audio clips, with up to 10 hours of recording time.
The NAPA PA-15 supports the AAC and WMA audio formats, giving you the freedom to choose the compression and sound quality you desire. AAC and WMA are new industry audio formats established by Dolby and Microsoft, respectively.
The NAPA PA-15 supports USB connection and works with any PC or Macintosh that has built-in USB support. The USB connector delivers blazing-fast downloads - 5 times faster than a parallel port.
The NAPA PA-15 supports SDMI (Secure Digital Music Initiative). SDMI is a forum of more than 160 companies and organizations, representing a broad spectrum of information technology and consumer electronics businesses, Internet service providers, security technology companies, and members of the worldwide recording industry, working to develop voluntary, open standards for digital music. As an SDMI-compliant product, the PA-15 meets the security needs of record companies while giving users broad, legitimate access to music and other audio resources.
Unlike portable CD players, the PA-15 will not skip during heavy exercise or while being jostled, and there is no degradation of sound quality even after repeated playback.
Key Features
Multi-sampling-rate MP3 encoding (records from the line-in audio signal)
On-board microphone for memo recording (>10 hours of recording time via MGSM technology)
32 MB on-board flash
SmartMedia card slot for memory extension up to 128 MB
LCD display with 2 lines of 7 characters plus 1 line of icons
USB interface
Plays back MP3 files at bit rates from 32 kbps to 320 kbps
Equalizer functions (Classic, Pop, Rock, Jazz, User Setting)
A-to-B repeat function
SDMI (Secure Digital Music Initiative) support
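The voice-memo figure can be sanity-checked against the 32 MB of on-board flash: ten hours of recording implies a very low bit rate, consistent with heavy voice compression (decimal megabytes assumed):

```python
# What bit rate does ">10 hours of voice memos in 32 MB" imply?
def implied_bitrate_kbps(capacity_mbytes, hours):
    kbits = capacity_mbytes * 8 * 1000  # MB -> kbit, decimal megabytes
    return kbits / (hours * 3600)

print(round(implied_bitrate_kbps(32, 10), 1))  # ~7.1 kbps
```

Roughly 7 kbps is an order of magnitude below music-grade MP3, so the claim only works with a dedicated low-rate speech codec.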
Specifications
SUPPORTED FILE TYPES MP3 (WMA or AAC playback via future software upgrade)
USB INTERFACE Rev. 1.1 (12 Mb/s)
AUDIO FREQUENCY RANGE 20 Hz ~ 20 kHz
MAXIMUM OUTPUT > 5 mW (32 Ohms)
SIGNAL TO NOISE RATIO > 90 dB
TOTAL HARMONIC DISTORTION < 0.1%
AUDIO OUTPUT Earphone jack
LINE OUT Line-out jack
OPERATING TIME 7 hours (MP3 at 128 kbps, alkaline batteries)
Box Contents
One NAPA PA15 portable MP3 Encoding player
Stereo Jack to Stereo Jack cable
Stereo Headphones
User's Manual
2 x AAA batteries
http://www.amaxhk.com/products/napa/pa15/pa15.htm
Napa DP600 DataPlay enabled Portable
You are now in: Home - Electronics - MP3 Players - Napa DP600
4.12.01 By Dereck Willis
(PREVIEW)
The people who brought us the DAV310 and, most recently, the DAV311 are now displaying one of the first DataPlay-enabled portable music players. Don't get too excited...these units and DataPlay disks will not be available in the US until October 2001. So why tempt us?
A-Max Technology (maker of Napa devices) wants to be known as a cutting-edge manufacturer. They introduced the first MP3/VCD portable in the U.S. over a year ago (the DAV309). In these days of high-tech portable devices, someone must step up to the plate and deliver what end users want. Actually, it is very simple...we want really small portable MP3 devices with enough storage to hold hours of music. Well, the 500MB DataPlay disks should give you 8-10 hours of MP3 tunes, depending on the encoded bit rate. And the size? The DataPlay disks are about the size of a quarter, so these players can be very small, indeed.
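The "8-10 hours on a 500MB disk" figure is easy to sanity-check: playing time is just capacity divided by the encoded bit rate. Here's a quick back-of-the-envelope calculation in Python (the function name is illustrative, not vendor data):

```python
# Playing time = storage capacity (in bits) / encoded bit rate (bits/sec).
# Illustrative arithmetic only; actual capacity after formatting may differ.

def playing_hours(capacity_mb: float, bitrate_kbps: float) -> float:
    bits = capacity_mb * 1024 * 1024 * 8   # capacity expressed in bits
    return bits / (bitrate_kbps * 1000) / 3600

if __name__ == "__main__":
    for rate in (96, 128, 160):
        print(f"{rate} kbps: {playing_hours(500, rate):.1f} hours")
```

At 128 kbps this works out to roughly 9 hours, which matches the quoted range; lower bit rates stretch it further.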
Napa has limited information on the DP600, but I'll tell you what I know. It will use 2 AA batteries and a USB (1.1) interface for transferring files. It also has a voice-recording mode for those important meetings and an EQ to boost the sound of lower-quality recordings. All the music artists out there will appreciate the SDMI (Secure Digital Music Initiative) support, which we believe most DataPlay-enabled devices will include.
The large LCD screen appears to display ID3 tags, but we will not confirm this assumption until we receive further details. A-Max did not provide the unit's size or weight, but given the DataPlay's size, it should fit in the palm of your hand.
Get ready for more DataPlay enabled devices to be introduced in the near future. Everything from digital cameras to portable audio and video players. Trust us, these little disks are gonna give CDs a run for their money. According to a DataPlay rep we spoke with at CES, these disks have a theoretical limit of 2GB of storage! First to be introduced are the 250 and 500MB blanks (and many pre-recorded titles from popular music artists), and will retail from $10-15 each, depending on quantity. Is it October yet?
Here is a list of key features provided by A-Max:
Built-in DataPlay engine supporting DataPlay discs
Sound-effect EQ modes
USB Rev. 1.1 interface
SDMI (Secure Digital Music Initiative) support
Voice recording
Storage data format: same as DVD
Power supply: 2 x AA cells
USB transfer rate: 12 Mb/s
MSRP and availability date not confirmed
Bob Orban (Exchange) radio-tech@broadcast.net
Mon, 28 Aug 2000 12:39:28 -0700
Eureka uses an obsolete codec, MPEG-1 Layer 2, which is about half as
bandwidth-efficient as MPEG AAC. Layer 2 at 96 kbps sounds absolutely awful.
96-kbps AAC is not bad, but "CD quality" AAC requires 128 kbps.
As part of the USADR/Lucent IBOC merger, the codec used in Ibiquity will be
PAC. I believe that PAC is approximately as efficient as AAC, although I
chatted with one of its chief designers about a year ago, and he felt at that
time that PAC could still be improved significantly. I don't know how far
they have come since that conversation.
> From: "Wes Keene" <wes@brick.net>
> Subject: RE: [RT] Eureka?
>
> Thank you! I've been trying to make that point for over a year now. People
> make comments like 'What you're saying doesn't reflect the state of the art
> in codec technology.' The quality IS worse, even in an all digital plant
> with no analog conversions. This is what happens when business meets
> engineering...lowered quality standards to get more 'channels'. 96
> Kbits/sec...what the hell were they thinking? How many digital cell phone
> owners do we have on the group? How many would say the quality is better?
> Not me.
>
> As to the issue of 'increased' coverage...HAH! Again get out those digital
> cell phones!
>
> Now some of you may say 'Wes, IBOC is a totally different ball of wax from
> cell phones.' Really, last time I checked IBOC's biggest proponent is a
> phone manufacturer!
>
> Thanks for letting me rant a little, hopefully I will still be on this group
> tomorrow :))
>
>
> Wes Keene
>
MP3 Soon to be Superseded - The so-called MP3 community will soon be the MP4 community. Advanced Audio Coding (AAC) - developed by the same Fraunhofer group in Germany that came up with MP3 - is a superior data compression scheme. It is the basis for Liquid Audio's integration of the encryption keys into the audio files - so that they have to be purchased to be unlocked and played. AAC is also used in AT&T's A2B format and will be a major part of MP4. The bottom line here is that MP4 allows for more compression than ever (meaning faster and more reliable delivery even with narrow bandwidth) together with much better audio quality than MP3. With the encryption built in there will be no legal hassles concerning bypassing the copyright laws. MP4 will be a multimedia format for both audio and video, functioning similarly to RealPlayer and the Windows Media Player, and will be set up for simplified commercial downloading and sale on the Net.
Digital audio gets an audition. Part two: lossy compression
Lossless compression can shrink file sizes down a fair amount, but for serious weight loss, you need to permanently discard some of the data. Find out how the lossy codecs work and whether you can hear the differences between them.
Brian Dipert, Technical Editor
With all the advantages of lossless compression that part one of this article series cites (Jan 4, 2001), what then is the role of lossy compression? Consider a lossy codec's ability to compress audio files not to half their size but to a guaranteed 1/12 (MP3) or even 1/24 (WMA) compression ratio, with little-to-no degradation perceived in playback quality. Lossy-format conversion is necessary to cost-effectively listen to audio on a portable player that employs semiconductor storage, stream it to Internet listeners from a Web server or across a LAN, or digitally broadcast it. And if your audio device can directly read and decode MP3 or other lossy-codec-format files burned on a CD-R, you give users the ability to store as much as a few dozen hours of music on one disc. (For a list of acronyms that appear in this story, see "Acronyms" sidebar.)
Compressed file sizes aren't meaningful comparison points for lossy-compression algorithms, because the objective is always to encode to a specific bit rate (Table 1). The time it takes to encode to that bit rate for a given µP type and speed differs from one algorithm to another, though, as does the quality at that bit rate. Quality is a user-, environment-, and application-dependent metric. Last year, my neighbor swore that he was unable to hear any differences when he listened to an original audio CD, a 128-kbps MP3 stream, a 64-kbps WMA stream, and a 16-kbps RealAudio stream through his PC's low-cost speakers. However, when I brought him to my house and played the same files on my PC's higher-end speaker set, he immediately understood the need for those files "that take so long to download." I generally resist the urge, therefore, to make quality comments on various codecs, though if one stands out as sounding particularly good or bad to my less-than-golden ears, I'll mention it.
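The fixed-bit-rate point above can be made concrete with a few lines of arithmetic (function names here are illustrative, not from any real tool): at a constant bit rate, compressed size depends only on duration, so every encoder targeting, say, 128 kbps yields essentially the same file size, and size tells you nothing about the encoder.

```python
# At a constant bit rate, compressed size = bit rate * duration / 8,
# regardless of which encoder produced the stream.

def compressed_bytes(bitrate_kbps: float, duration_sec: float) -> int:
    """Size in bytes of a constant-bit-rate stream."""
    return int(bitrate_kbps * 1000 * duration_sec / 8)

def wav_bytes(duration_sec: float, sample_rate: int = 44100,
              channels: int = 2, bits_per_sample: int = 16) -> int:
    """Uncompressed PCM size for comparison (CD audio by default)."""
    return int(duration_sec * sample_rate * channels * bits_per_sample / 8)

if __name__ == "__main__":
    dur = 240.0  # a four-minute song
    original = wav_bytes(dur)
    mp3_128 = compressed_bytes(128, dur)
    print(f"WAV: {original / 1e6:.1f} MB, 128-kbps MP3: {mp3_128 / 1e6:.1f} MB")
    print(f"ratio: {original / mp3_128:.1f}:1")
```

For CD-format source material, 128-kbps encoding gives an approximately 11:1 reduction, consistent with the "1/12" figure cited earlier.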
Even my "slow" 533-MHz CPU can rapidly encode and decode a 30-sec test-tone clip. Therefore, for the performance-analysis portions of this lossy-compression project, I employ the same 19 songs used to evaluate lossless-compression algorithms (Table 2 from part one of this series). In addition to measuring performance, I also hope to reveal the presence of various lossy-compression techniques and their artifacts. The list of things I am looking for includes:
lowpass filtering, or removal of all audio information above a certain frequency;
stereo-to-mono conversion of the original two audio channels, completely or above a certain frequency;
phase collapse, or elimination of phase differences between the two channels, completely or above a certain frequency;
frequency masking, in which a loud tone masks lower-volume information in nearby frequencies;
temporal masking, in which a loud tone masks lower-volume information that both precedes and follows the masking tone in time; and
echo, or the insertion of unwanted audio information both before and after a sharp transient, such as a percussion-instrument sound.
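The first item in the list above, lowpass filtering, is the easiest to detect programmatically. Here is a minimal sketch (using NumPy and a synthetic clip rather than a real decoded file; the function name and threshold are my own choices) of estimating a codec's lowpass cutoff by finding where the decoded spectrum falls below a noise floor:

```python
# Estimate the highest frequency with meaningful content in a clip by
# inspecting its FFT magnitude spectrum relative to the spectral peak.

import numpy as np

def cutoff_frequency(samples: np.ndarray, sample_rate: int,
                     floor_db: float = -80.0) -> float:
    """Return the highest frequency whose magnitude exceeds floor_db
    relative to the spectrum's peak."""
    spectrum = np.abs(np.fft.rfft(samples))
    db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    above = freqs[db > floor_db]
    return float(above.max()) if above.size else 0.0

if __name__ == "__main__":
    sr = 44100
    t = np.arange(sr) / sr
    # Simulate a "decoded" clip whose encoder discarded everything above 10 kHz:
    clip = sum(np.sin(2 * np.pi * f * t) for f in (440, 5000, 9800))
    print(f"estimated cutoff: {cutoff_frequency(clip, sr):.0f} Hz")
```

Running the same check on the original and decoded versions of a clip and comparing the two cutoffs reveals how much of the upper spectrum the encoder threw away.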
More on test tones
The white- and pink-noise clips I use in the lossless-compression study are also useful in my lossy-compression work. Equal-intensity noise channels, converted to a frequency-domain display via a spectrum analyzer, enable me to identify any lowpass, bandpass, or highpass filtering that a codec performs. Channels of differing intensities provide additional details—specifically if the encoder is converting the source material from stereo to mono within certain frequency ranges. The human auditory system groups its detection of incoming audio information into a number of critical frequency bands, with most of the bands residing at less than 5 kHz (reference 1 and reference 2). Note that in Table 2, the bands' widths increase as the corresponding center frequencies rise. A structure in the inner ear called the organ of Corti translates incoming audio waves into nerve impulses. Its basal-membrane width, thickness, stiffness, and hair-cell clustering define the critical band-frequency ranges and endpoints.
What better way to continue my test-clip development, then, than by combining tones at the midpoints of each critical band? Syntrillium Software's Cool Edit Pro, which costs roughly the same as Sonic Foundry's Sound Forge, includes a 64-track mixer that I use extensively. Cool Edit Pro enables me to create and combine precisely defined audio tones, as well as generate white, pink, and brown noise. Its time-based (oscilloscope) and frequency-based (spectrum- analyzer) output displays are more informative and have more robust features than those in Sound Forge.
All of my critical-band-derived sound clips have one channel 180° out of phase from the other, to give the encoder one more challenge to surmount and to enable me to look for phase collapse in the subsequent decoding. As with the pink- and white-noise clips, I created two versions of each file: one with both channels at equivalent amplitude and the other with the left channel 20 dB "louder" than the right.
To generate each file, I first created a number of 32-bit per-sample and per-channel, 44.1-kHz-sampled, single-tone sources, then mixed them together at 32-bit resolution in Cool Edit and attenuated the result to the desired maximum amplitude. I then needed to convert them to 16-bit equivalents. After discussions with both Syntrillium Software and with audio consultant Arny Krueger, I chose the following sample-type-conversion settings:
dither on,
0.5-bit dither depth,
triangular probability-distribution function, and
no noise shaping.
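The conversion settings above can be sketched in code. This is an assumed implementation, not Cool Edit's actual algorithm: triangular-PDF dither is conventionally generated as the sum of two uniform draws, and I interpret "0.5-bit dither depth" as a total span of half an LSB, which real tools may scale differently.

```python
# Quantize 32-bit float samples to 16 bits with triangular-PDF dither
# and no noise shaping. Interpretation of "0.5-bit depth" is an assumption.

import numpy as np

def to_int16_tpdf(samples: np.ndarray, depth_lsb: float = 0.5) -> np.ndarray:
    """samples: float array in [-1.0, 1.0). Returns dithered int16."""
    scaled = samples * 32768.0
    # A triangular PDF is the sum of two independent uniform distributions.
    half = depth_lsb / 2.0
    dither = (np.random.uniform(-half, half, samples.shape)
              + np.random.uniform(-half, half, samples.shape))
    return np.clip(np.round(scaled + dither), -32768, 32767).astype(np.int16)

if __name__ == "__main__":
    t = np.arange(44100) / 44100.0
    tone = 0.25 * np.sin(2 * np.pi * 1000 * t)  # -12 dBFS 1-kHz tone
    out = to_int16_tpdf(tone)
    print(out.dtype, int(out.min()), int(out.max()))
```

The point of the dither is to decorrelate the quantization error from the signal, trading a hair of added noise for the elimination of harmonic distortion at low levels.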
Next, to test for frequency masking, I regenerated my critical-band midpoint mix, but this time mixed in 50 additional coincident tones, half of them at the one-quarter point across each critical band and the other half at the three-quarter point. One- and three-quarter-point test tones were 20 dB quieter than their midpoint neighbors. To test for temporal masking, I first determined that the pre-tone masking duration extended no further than 50 msec ahead of the masking tone, and the post-tone duration extended no more than 200 msec beyond the masking tone (Reference 3). Therefore, I again created my 30-sec midpoint tone combination. But this time, I preceded it by 50 msec of the same tonal mix, but 20 dB quieter, and followed it with 200 msec of the same 20-dB-quieter mix. Finally, to find pre- and post-echo noise around sharp audio transients, I turned to three tracks on the EBU SQAM disc: track 27 (castanets), track 32 (triangle) and track 35 (glockenspiel).
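The temporal-masking clip construction just described can be sketched directly (tone frequencies and amplitudes here are placeholders; the real clips use the critical-band midpoints): a loud tone mix bracketed by a 50-msec leading copy and a 200-msec trailing copy, each 20 dB quieter.

```python
# Build a temporal-masking test clip: quiet mix (50 msec), loud mix,
# quiet mix (200 msec), with "quiet" meaning 20 dB below "loud".

import numpy as np

SR = 44100

def tone_mix(freqs, duration_sec, amplitude=0.5):
    t = np.arange(int(SR * duration_sec)) / SR
    mix = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    return amplitude * mix / len(freqs)   # normalize so the sum can't clip

def masking_clip(freqs, loud_sec=30.0, pre_sec=0.05, post_sec=0.2):
    quiet = 10 ** (-20 / 20)  # 20 dB down = one-tenth the amplitude
    pre = quiet * tone_mix(freqs, pre_sec)
    loud = tone_mix(freqs, loud_sec)
    post = quiet * tone_mix(freqs, post_sec)
    return np.concatenate([pre, loud, post])

if __name__ == "__main__":
    clip = masking_clip([250, 1000, 4000], loud_sec=1.0)
    print(f"{len(clip) / SR * 1000:.0f} msec total")
```

If a codec applies temporal masking, the quiet leading and trailing segments should come back attenuated or silent after an encode/decode round trip.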
Although I created the noise and test tones in Cool Edit Pro, I switched to Sound Forge for the lossy-compression process because of its more comprehensive format support. Sound Forge version 4.5h can encode MP3, RealAudio G2, and Windows Media Audio 7 files. It can also decode MP3 files back to WAV, but licensing restrictions preclude it from supporting RealAudio and WMA decoding, forcing me to rely on RealNetworks Real Jukebox and Microsoft's command-line decoder, respectively. By encoding from the same WAV file to each of the three formats within an otherwise-identical software environment, I hope to be most accurately measuring the speed of the encoding algorithm, with other system overheads canceled out.
The need for speed
I ran 19 song clips and 13 test tones through MP3 encoding 10 times, with each iteration a combination of one of five compressed target bit rates, in conjunction with either a quality- or performance-optimized encoder configuration. I ran them through WMA encoding four times and through RealAudio encoding twice. I also ran each resultant MP3 file through the decoder built into Sound Forge. That's 512 total encoder runs, 320 decoder runs, a whole lot of mouse clicks, and a whole lot of time spent staring at a computer monitor. Fortunately, Sound Forge supports batch-mode capability and gives you the option to create a log file that captures time to encode and decode.
For Windows Media Audio 7 decoding, I used a DOS-command-line utility that Microsoft supplied me. I was unable to figure out how to capture to a file the time-to-decode message displayed on-screen, so I manually logged each displayed value as the batch file ran. RealAudio G2 decoding uses RealJukebox's convert-to-WAV capability, and I referenced the "created" and "modified" time/date stamps, which are viewable through Windows Explorer, to determine decode time.
In analyzing the encode- and decode-performance-testing results, several trends are evident (Table 3). (You can view the results for all 19 music genres in either PDF or Excel format. These results reside on the Web site of EDN 's sister publication, CommVerge.) Look at the disparity in encoding times between MP3's "fastest encode" and "highest quality" settings, even at the same bit rate; 64 kbps deviates from the general trend, but a good reason for this anomaly exists. The MP3 encoder, when set to 64 kbps, down-samples the original 44.1-kHz material to 22.05 kHz and severely lowpass filters out the upper portion of the frequency spectrum. These alterations ensure that the encoder has less source data to work with and at least partially explain why the "highest quality" and "fastest encode" results are more alike at this bit rate.
Also, notice that MP3 "highest quality" encoding to 192 kbps is actually faster than encoding to 160 kbps. Although the encoder is generating more compressed data at the higher bit rate, this trade-off gives an overall performance benefit: The encoder need not work as hard at 192 kbps to squeeze the data down while maintaining quality. This result also suggests that, thanks to a fast hard drive and DRAM, the additional system overhead that my PC needs to store the larger compressed bit stream is an insignificant factor in the results; I was actually measuring the encode speed.
Table 2 in part one of this article lists the songs I used for each music genre, their duration, and their uncompressed WAV sizes. Match this information with that of Table 3 in this article, and you'll find that, as with lossless compression, songs of similar duration but different genres sometimes have significantly different encoding delays. This trend indicates that some types of music are "harder" to compress to a given bit rate and quality than others, and it validates the hunch that prompted me to do all this work in the first place! The results make sense: Compare a techno track to spoken word, for example, and you'll find that the techno track has a broader meaningful frequency spectrum, increased high-frequency content, greater channel-to-channel variation in both amplitude and phase, and more abrupt transients.
Evaluate WMA against MP3, particularly in the context of the quality results that follow, and WMA will probably impress you. As a general rule (with a few exceptions), the WMA encoder performance approximates that of the MP3 encoder set to "fastest encode," while its quality at least matches (and, at lower bit rates, exceeds) that of MP3 files created using the "highest quality" setting. RealAudio's encoder speed is approximately the same as that of WMA and MP3 set to "fastest encode," but the quality news isn't so good. On both test tones and music tracks, RealAudio files consistently sounded the worst and contained the largest number of lossy compression artifacts.
And what about decoders? In all three cases, their speed scaled with the bit rate of the file they were decoding. (More bits to decode means a slower decoding speed, all other factors being equal.) At 64 kbps, MP3 decoding runs much faster than the other two decoders, but remember that the encoder had previously halved the sample rate, halving the size of the resulting decoded WAV file and giving the MP3 decoder a significant built-in speed advantage. At greater than 64 kbps, MP3 and WMA decoder speeds were comparable. Poor RealAudio, though, was consistently slower than its peers, roughly twice as slow on average. Be careful when drawing definitive conclusions here. I used three decoding-software packages, so some of these differences may be the result of factors other than the decoding algorithms themselves.
Now let's see how well the test tones unveil the secrets behind the lossy codecs' magic. First, look at a spectrum-analyzer (frequency-sweep) plot of the original sound clip 2 (Figure 1a), along with its 64-kbit MP3 (Figure 1b), RealAudio (Figure 1c), and WMA (Figure 1d) counterparts. This diagram, and all subsequent MP3 diagrams, show the output of the encoder set to its "fastest encode" setting. As you examine the data that follows, as well as the additional information in this article's Web site addendum (www.ednmag.com/ednmag/extras/01csaddendum.asp), compare the trade-offs that codec developers made at each compressed bit rate, such as encode and decode speed, noise floor versus frequency, overall frequency range, and type and amount of various artifacts.
As expected, the original file shows content extending to 22.05 kHz; the summation of the left channel's frequency components is 20 dB "louder" than the right. (The left channel appears in aqua, and the right channel appears in violet). Also, notice the negative slope of both channels' amplitude-versus-frequency plots. This negative slope occurs because pink noise, which contains equivalent audio energy in each octave frequency, proportionally places a greater amount of content in low frequencies than it does in high frequencies. A white-noise graph, in contrast, would show a flat amplitude slope versus frequency.
Now, compare the original plot to the MP3 graph. Two things are immediately evident. First, the upper end of the MP3-encoded frequency range terminates at just greater than 10 kHz, meaning that the encoder has lowpass filtered and discarded all information above this point. One reason the encoder does this filtering and discarding is that it makes the highly dubious assumption (at this chosen cutoff frequency) that many of us would be unable to hear high-frequency content above this point even if it existed. The second reason is that compression algorithms work best if they can reduce sample-to-sample variation, and for audio, this variation is most significant at high frequencies.
Next, notice that the amplitude deviation between the two channels is much less pronounced in MP3 than in the original, particularly at high frequencies. This trend indicates that the encoder is doing a frequency-dependent stereo-to-mono partial conversion to reduce channel-to-channel differences and consequently simplify its job.
Compared with MP3, RealAudio looks pretty good. The frequency response extends quite a bit higher, past 16 kHz, and the channel-to-channel amplitude difference is better preserved across the entire frequency range. Finally, take a look at WMA. Of the three lossy codecs, WMA delivers the widest frequency response, and the left channel looks pretty good. But what about that right channel? Much of the frequency detail has been altered and discarded.
Noise files provide useful data on how the compression algorithm works, but their results don't necessarily correlate with how real-life compressed audio sounds, so don't reject WMA quite yet. Also, because the files contain random noise, they tend to obscure the subtle alterations that the codecs make. So, next analyze sound clip 6.
First look at the original file (Figure 2a). As expected, it contains 25 distinct tones; both the left and right channels are at uniform amplitudes across frequency, and the right channel is 20 dB below the left. No tone information exists between the 25 critical band midpoints; the noise floor is –120 dB.
The MP3 file looks ugly (Figure 2b). Notice again the lowpass filtering: The last three tones in the original file (10,750; 13,750; and 18,775 Hz) are now missing. Also notice the suppressed amplitude difference between the left and right channels even at low frequencies and how this difference further diminishes as frequency increases. Finally, and perhaps most obviously, look at all the added noise clustered around each of the original tones. Pragmatically, it looks worse than it is; at –80 dB it's not very audible, particularly outside the human auditory system's 2- to 5-kHz "sweet spot."
As the prior pink-noise results predicted, RealAudio has better-preserved channel separation and frequency response (Figure 2c); the 13,750-Hz tone survived compression, though the 18,775-Hz tone did not. But at what trade-off? Here, the noise floor at times extends above –60 dB, just a few decibels below the "real" right-channel information. WMA compression (Figure 2d), in contrast, delivers clean stereo separation and wide frequency response; even the 18,775-Hz tone made it through. Its noise floor, at no greater than –80 dB, is comfortably below the levels of even the right-channel tones. WMA seems to like critical-band midpoints much more than pink noise.
The mask
Next, look at test tone 8. First, a spectrum-analyzer display of the original file clearly shows the quarter-, mid- and three-quarter-band tones, with the quarter- and three-quarter-band info 20 dB down (in both channels) from the mid-point tones (Figure 3a). Now look at the MP3 version, and you'll see little evidence that the algorithm has done any frequency masking (Figure 3b). The encoder algorithm did not eliminate any of the quarter- and three-quarter tones, at least the ones that survived the lowpass filter. Note that the 10,125-Hz quarter tone made it through the lowpass filter, but the corresponding 10,750-Hz midpoint tone and 11,375-Hz three-quarter-band tone did not.
The RealAudio graph is a mess (Figure 3c). From the frequency plot, you can't distinguish a distorted quarter- or three-quarter-tone from unwanted noise. In test tone 8, I intentionally set the amplitude of the original file's left-channel quarter- and three-quarter-tones identical to the amplitude of the right channel's midtones. I suspect this amplitude and tone combination didn't simplify the encoder's job, although it appears that, as with test tone 6, the midtones in both channels survived the encoding process pretty well. The additional and altered stuff in between the midtones causes the problems.
What about WMA (Figure 3d)? Remember that with the pink-noise file, the left channel survived pretty much unscathed, but the right channel came out looking very different from its original state. A similar phenomenon happened here. The quarter- and three-quarter tones of the left (louder) channel remain intact. But right-channel quarter- and three-quarter-tones, particularly below critical band 18, are nonexistent; the disappearing act is most obvious with critical band 0 data. Keep in mind that this artifact, as is the case with many artifacts I find in this study, isn't necessarily "bad"; frequency-masking theory dictates that even if the quarter- and three-quarter-tone data remains, you might be unable to hear it.
I mentioned earlier that the original quarter- and three-quarter-tone data seemed to survive MP3 encoding. But the MP3 compression algorithm wasn't immune from sound-altering behavior with test clips 7 and 8. Look at the additional, slowly decaying amplitude in both channels in the first few hundred milliseconds of the MP3-compressed version of test-tone clip 7 (Figure 4a), representing increased volume absent from the original WAV file (the left channel is on top, with the right channel below it) (Figure 4b). For an even stranger oscilloscope plot, look at the results from test-tone clip 8 (Figure 4c and Figure 4d), in which the increased amplitude in the left channel corresponds with decreased amplitude in the right channel. Similar MP3 behavior occurred with some of the other test tones, although not to this extreme. Neither RealAudio nor WMA exhibited similar behavior.
My initial attempts to uncover temporal masking were unsuccessful, but they did reveal other strange encoder and decoder behavior. Take a look at the first 200 msec of test tone 9 (Figure 5a). If temporal masking had occurred as part of lossy compression, you would see an interval with a reduced amplitude or a completely silent interval in the lossy-compression clips just prior to the onset of the "normal" audio material (at the 50-msec point in the original WAV file). Neither the MP3 (Figure 5b), RealAudio (Figure 5c), nor WMA (Figure 5d) versions of the test tone exhibit such masking evidence. Also note that all three lossy codecs appear to have preserved at least some of the channel-to-channel phase differences present in the original; one channel is a mirror image of the other.
You should, however, notice a couple of odd occurrences in Figure 5. First, see how the MP3 algorithm significantly attenuates the original signal, whereas the RealAudio and WMA clips are as "loud" as the original version. Also, notice that MP3 inserts into its compressed version of the test tone a 55-msec initial silent gap created by filter-bank delay, and WMA inserts a 45-msec gap. These gaps are not present in the original, nor does RealAudio insert them.
RealAudio's gap addition occurs at the tail end of the sound clip. Compare Figure 6a with Figure 6c: RealAudio inserts 1.385 sec of silence at the end of the test tone. The 50-msec gap added at the back end of the MP3 version (Figure 6b) is smaller than RealAudio's but still present, and WMA (Figure 6d) sticks an even smaller 30-msec gap at the end of the test tone.
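Measuring these inserted gaps by eyeballing oscilloscope plots is tedious; they can also be found programmatically. Here's a sketch (my own helper, with an arbitrary silence threshold) that scans a decoded sample array for the first and last samples above a threshold:

```python
# Measure leading and trailing silence in a decoded clip, in milliseconds.

import numpy as np

def gap_ms(samples: np.ndarray, sample_rate: int,
           threshold: float = 1e-4) -> tuple:
    """Return (leading, trailing) silence durations in milliseconds."""
    loud = np.flatnonzero(np.abs(samples) > threshold)
    if loud.size == 0:                      # the whole clip is silent
        total = len(samples) / sample_rate * 1000
        return total, total
    lead = loud[0] / sample_rate * 1000
    trail = (len(samples) - 1 - loud[-1]) / sample_rate * 1000
    return lead, trail

if __name__ == "__main__":
    sr = 44100
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
    # Simulate a decoder that pads 55 msec of silence before the audio:
    padded = np.concatenate([np.zeros(int(0.055 * sr)), tone])
    lead, trail = gap_ms(padded, sr)
    print(f"leading gap: {lead:.0f} msec")
```

Comparing the gap measurements of the original WAV against each codec's decoded output isolates the encoder- or decoder-inserted padding.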
Why didn't I find temporal masking? Keep in mind that with all of these test tones, I chose specific frequencies, as well as specific masked- and masking-tone amplitudes. Changes in any of these source variables can trigger temporal masking or any other lossy-compression technique, as can compressing to a different bit rate or compressing with a different encoder setting combination. For example, versions of the Fraunhofer MP3 "engine" in some software packages enable you to select whether to allow the encoder to use channel-combining joint stereo techniques; Fraunhofer MP3 encoder versions in other products don't give you this customization option.
Finally, let's look for echo artifacts. First, a quick review about what causes echo in the first place might be helpful. One of the first steps that nearly all lossy-compression audio algorithms (as well as lossy codecs for still images, such as JPEG, and video, such as MPEG) take involves converting a group of contiguous samples (called a frame) from their time-domain representation to the frequency domain. This process is analogous to the algorithm my computer uses to create the spectrum-analyzer plots in this article.
Once in the frequency domain, the encoder decides which portions of the frame's data are inaudible and, therefore, appropriate to diminish in importance or even discard. This culling process can inject into the frame quantization noise and other undesirable data. The corresponding frequency-to-time retransformation within the decoder spreads this noise throughout all of the frame's samples.
Ordinarily, the noise isn't a big deal; the "real" audio data covers it up. Similarly, temporal masking can hide noise injected after a sharp audio transient (a tap on a cymbal or a handclap, for example). But prior to a transient, the "real" audio information is subdued, or worst-case, silent. Pre-echo not only smears transients, it also injects annoying hiss into the previously quiet gaps ahead of transients, hiss which temporal masking only partially hides.
My wife, a Cuban music aficionado, listened to the uncompressed and lossy-compressed versions of the castanets in EBU SQAM test tone 27. Even with my PC's low-quality speakers and without my prompting, she immediately pointed out the pre-echo noise in the 64-kbit RealAudio file (Figure 7c). This echo doesn't exist in the original WAV (Figure 7a), lossy-compressed MP3 (Figure 7b), and WMA (Figure 7d) files.
The added noise prior to the onset of the transient in the RealAudio file should be obvious to your eyes. And it'll be obvious to most ears, too. In fairness to RealAudio, different codecs use different frame sizes for their time-to-frequency transforms, so other types of transients, or those occurring at other points in time, might cause the other codecs problems, too. More advanced codecs minimize pre-echo effects by supporting multiple frame sizes. They use less efficient, smaller frames when the encoder detects a transient and longer frames for more conventional material.
Acronyms
AAC: Advanced Audio Coding
CBR: constant bit rate
CD-R: CD-recordable
codec: coder/decoder, also sometimes used to define a single-chip A/D-plus-D/A converter
DAT: digital audio tape
EBU: European Broadcast Union
(e)PAC: (enhanced) Perceptual Audio Coder
MMX: multimedia extension
SQAM: Sound Quality Assessment Material
TwinVQ: Transform-Domain Weighted Interleaved Vector Quantization
VBR: variable bit rate
WMA: Windows Media Audio
The never-ending sonic story
My digital audio analysis work is by no means done, so periodically visit the Web-site addendum to this article series (www.ednmag.com/ednmag/extras/01csaddendum.asp) for any updates. Because the lossless-compression results in part one of this study are so similar, I don't plan to evaluate any of the other lossless codecs. (If any of you would like to do so, I'd be happy to post your results.) My efforts will focus on lossy compression. Looking first at MP3, I'd like to recompress some of my test tones using VBR encoding to see whether VBR significantly reduces the presence and magnitude of artifacts.
Other versions of the Fraunhofer encoder might provide additional flexibility to enable, disable, and otherwise adjust the operation of various compression options. And although Fraunhofer is the most popular MP3 encoder, it's not the only game in town. RealNetworks uses the Xing encoder. QDesign also sells one. And a number of independently developed encoders exist: Blade, Gogo, Lame, and Radium, just to name a few. The choice of an MP3 decoder might even affect the results, as University of Essex doctoral candidate David Robinson's recent study suggests (http://privatewww.essex.ac.uk/~djmrob/mp3decoders).
I'd like to look more closely for evidence of pre-echo in MP3 and WMA, as well as phase collapse and temporal masking in all the codecs. I'd also like to string together a series of test tones to see whether I can replicate the behavior PCABX Web-site audio consultant Arny Krueger found when he evaluated WMA (see sidebar "Music mysteries"). And I'd like to perform critical listening tests of the lossy-compressed music tracks and EBU SQAM test tones; I strongly suspect that RealAudio isn't the only codec making audible alterations.
I haven't yet begun to evaluate a plethora of additional codecs. First on the list is AAC; Fraunhofer is creating batch-mode-capable versions of its v3 encoder and decoder for me, and Dolby Labs has already supplied me with its AAC professional encoder/decoder software. Sony's dominant position in consumer electronics is motivating me to look at 132-kbps ATRAC3. (The older 292-kbps ATRAC is of less interest.) Sony uses ATRAC3 in the Music Clip and other solid-state player/recorders; its latest MiniDisc Long Play units also support it.
A few of the other algorithms in Table 1 piquing my intellectual curiosity include open-source perceptual coders MPEGplus and Ogg Vorbis; vector-quantization pioneer TwinVQ; and QDesign's codec, which employs parametric encoding techniques and is used in Apple's QuickTime. I'll probably skip ePAC, though; Vedalabs indicates that it's fallen out of favor in consumer electronics, and iBiquity Digital's compression derives from the original PAC algorithm (Reference 1).
If I determine that High Criteria's Total Recorder doesn't alter amplitude or otherwise mangle the audio's characteristics when intercepting and capturing the digital bit stream on its way to a PC sound card, I'll use it to convert lossy-compressed formats back to WAV if the formats' players won't natively accomplish this task. Otherwise, I'll route the players' signals to the digital outputs of the PC sound card, capture them with a DAT deck, then send them digitally back into the PC and capture them as a WAV in Sound Forge or Cool Edit Pro. Ego-Sys' USB-based Waveterminal U2A is ideal for this task, because I don't need greater than 16-bit or 44.1-kHz-sampled audio and because it doesn't require me to open up the PC and swap out sound cards. User feedback suggests that the U2A is a more robust performer than Opcode Systems' Sonicport Optical (Reference 2).
--------------------------------------------------------------------------------
REFERENCES
1. Dipert, Brian, "Digital-radio combatants make peace, agree on codec," EDN, Nov 9, 2000, pg 32.
2. Dipert, Brian, "The high-end PC looks for a home," EDN, Nov 24, 1999, pg 145.
Music mysteries
Microsoft steadfastly refuses to reveal the implementation details behind its WMA compression algorithm. Reviewers generally rate the codec's quality as quite good, particularly at low bit rates, and it also delivers uncharacteristically fast encoding performance. With no official word from the company, audio aficionados are doing their best to figure out how WMA works its compression magic.
Reference 2 suggests that Microsoft might be employing a variant of the vector-quantization technique that a lossy codec called TwinVQ uses. The TwinVQ encoder and decoder both contain a prefabricated set of data coefficients called a codebook that represents what the algorithm's developers believe are the most common sets of per-frame frequency combinations in audio. The TwinVQ encoder, after completing a time-to-frequency transformation and subsequent compression of each frame of audio data, finds the closest match in its codebook, and instead of sending the actual coefficient data, it sends the codebook index. The decoder uses this index to pull the matching data approximation from its codebook, which it then outputs.
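The codebook-lookup step can be sketched roughly as follows. This is a minimal illustration of basic vector quantization, not TwinVQ itself: the codebook entries, frame values, and nearest-match metric (squared Euclidean distance) are all invented for illustration.

```python
# Illustrative sketch of basic vector quantization, not TwinVQ itself.
# Codebook entries and frame data are made-up examples.

def nearest_index(frame, codebook):
    """Return the index of the codebook entry closest to `frame`,
    measured by squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(frame, codebook[i]))

# Prefabricated codebook shared by encoder and decoder.
codebook = [
    (0.0, 0.0, 0.0),   # silence
    (1.0, 0.5, 0.25),  # falling spectrum
    (0.25, 0.5, 1.0),  # rising spectrum
]

frame = (0.9, 0.6, 0.2)                 # frequency coefficients for one frame
index = nearest_index(frame, codebook)  # encoder transmits only this index
approx = codebook[index]                # decoder's reconstruction

print(index, approx)  # -> 1 (1.0, 0.5, 0.25)
```

The compression win is that only `index` crosses the channel; the cost is that `approx` is only as faithful as the codebook allows.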
Because each codebook index (analogous to a book page, paragraph, and line-number combination) is much smaller than the actual data, vector quantization can produce impressive reductions in file sizes. But such reduction comes with a trade-off. If the codec developer poorly constructs the codebook or if the encoder doesn't do a good job of matching the actual data to a codebook entry, the compressed audio can sound terrible. Therefore, several variations on the basic vector-quantization technique are possible. Instead of using a preassembled codebook, the encoder might create it on the fly, based on the characteristics of the audio it's compressing.
The advantage of on-the-fly custom-codebook creation is that you're more likely to get good matches between data samples and codebook entries. But now you have to transmit the codebook along with the compressed audio, because the decoder won't already have it. The bigger the codebook (and, theoretically, the better its quality), the larger the compressed bit stream the encoder creates and the less efficient the compression. On-the-fly codebook creation is also a computing-intensive operation, which leads to slow encoding speeds. As an interim step, therefore, a vector-quantization algorithm may rely mostly on a prefabricated codebook, supplementing it with a smaller, unique codebook appendix that the encoder creates on the fly.
In response to Reference 2, Sean Alexander, product manager for Microsoft's digital media division, countered that Microsoft doesn't "do vector quantization." However, some reported characteristics of WMA lead me to suspect that although Microsoft might not be employing a strictly defined vector-quantization approach, its algorithms might be analogous to or derived from TwinVQ-like techniques. Several sources say, for example, that snippets of WMA-encoded audio sound better at a given bit rate than entire encoded songs. This feedback wouldn't make sense if, like MP3 and many other perceptual coders, WMA simply transformed and compressed multisample frames of several milliseconds one at a time.
Quality degradation with increasing audio duration could, however, occur if the encoder creates a codebook on the fly. Longer audio sequences tend to contain more randomness than shorter clips—randomness that a fixed-size codebook less accurately approximates. A discussion on Internet newsgroup rec.audio.pro (search on topic "Sound & Vision's Download Showdown: MP3 versus AAC & Windows Media") between PCABX Web-site audio consultant Arny Krueger and Microsoft's general manager for digital media, Amir Majidimehr, also hints at this phenomenon. Majidimehr was unable to replicate Krueger's results, in which WMA-encoded versions of test clips exhibited pre-echo, quantization noise, missing information, and other artifacts. Majidimehr and Krueger determined that whereas Majidimehr separately encoded each test clip, Krueger combined all of his test clips into one big file before running them through the WMA encoder.
Majidimehr wrote in one of the newsgroup postings that a combined input file of all the samples does indeed generate different results than encoding each clip separately. He warns that if you want to gang encode a lot of samples, you must leave sufficient space between them so that the samples are truly independent. (Later in that posting, he recommends leaving 5 sec between samples.) Otherwise, he warns, your results will represent only a composite, as the previous clip may change the outcome for some codecs, such as those that Microsoft produces. Still, I hesitate to conclude that WMA is using vector-quantization-like techniques, for two reasons. The first is Sean Alexander's aforementioned "no-vector-quantization" comment, although something peculiar is definitely going on.
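Majidimehr's gang-encoding precaution can be sketched as follows, assuming uncompressed clips represented as lists of samples. The 5-second gap follows his newsgroup recommendation; the sample rate and clip contents are illustrative dummies.

```python
# Sketch of the gang-encoding precaution: separate test clips with enough
# silence (Majidimehr suggests 5 seconds) that one clip cannot influence
# the encoder's handling of the next. Sample values here are dummies.

SAMPLE_RATE = 44100   # 44.1-kHz mono, for illustration
GAP_SECONDS = 5

def join_with_gaps(clips, sample_rate=SAMPLE_RATE, gap_seconds=GAP_SECONDS):
    """Concatenate clips (lists of samples), inserting silence between them."""
    silence = [0.0] * (sample_rate * gap_seconds)
    joined = []
    for i, clip in enumerate(clips):
        if i > 0:
            joined.extend(silence)   # gap goes between clips, not at the ends
        joined.extend(clip)
    return joined

clip_a = [0.1] * 1000
clip_b = [-0.1] * 1000
combined = join_with_gaps([clip_a, clip_b])
print(len(combined))  # 1000 + 5 * 44100 + 1000 = 222500 samples
```

Whether 5 seconds of digital silence is truly enough to reset any given encoder's internal state is, of course, exactly the kind of question these tests are meant to probe.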
WMA also exhibits a curious discrepancy from TwinVQ and other vector-quantization compression approaches targeting multimedia. (Similar techniques are possible with still- and video-image compression.) Vector quantization has a reputation for extreme slowness, particularly when the algorithm incorporates on-the-fly codebook creation. WMA encoding, however, is reputed to be faster than other comparable-quality lossy-compression routines, and my results bear out this reputation. Perhaps Microsoft's engineers have just written tight code that takes advantage of processor-acceleration hardware, such as MMX instructions. Or maybe WMA doesn't use vector quantization at all; both Krueger and Ken Gundry from Dolby Labs have hypothesized that some kind of moving, large-window Huffman coding might explain the reported time-dependent effects. I'll continue to dig into the algorithm and report on the EDN Web site any interesting results that I encounter. Microsoft also recently announced version 8 of WMA, tempting me to rerun my tests on this new codec version.
Digital broadcast radio tunes in at CES
By Junko Yoshida and George Leopold
EE Times
(01/03/00, 11:04 a.m. EST)
LAS VEGAS — Behind the scenes at this week's Consumer Electronics Show, chip and system makers will be discussing partnerships, displaying prototypes and talking about standards for what could be the next wave of digital consumer technology: digital broadcast radio. The products are still months away from the market, but key regulatory, business and technical decisions will be hotly debated at CES this week.
Digital radio promises to merge digital audio and data broadcast streams for a host of receivers, from car radios, alarm clocks and Walkman products to PCMCIA cards, Palm Pilots and cellular phones. As the Federal Communications Commission gears up for U.S. digital radio deployment, key technology developers hoping to capitalize on the nascent market opportunity are rushing to ready approaches that could vie for status as the domestic standard.
Several consumer OEMs are expected at CES to announce partnerships with proponents of digital radio and to show early implementations in private demonstrations. The National Radio Systems Committee (NRSC), which sets standards for the radio broadcast industry, is also scheduled to meet here in conjunction with CES to determine where development and testing efforts stand.
Three companies have been touting their respective versions of in-band, on-channel (IBOC) digital audio broadcast (DAB) technology for selection as the U.S. terrestrial DAB standard: USA Digital Radio Inc. (Columbia, Md.), Lucent Digital Radio (Warren, N.J.) and Digital Radio Express (DRE; Milpitas, Calif.). Spectrum efficiency is IBOC's biggest attribute: Federal regulators are pressing for IBOC since it would use existing AM/FM frequencies to broadcast digital audio simultaneously, conserving other portions of the spectrum for new wireless applications. If deployed, an IBOC-based digital radio standard promises CD-quality sound and such new services as data broadcasting.
The FCC, which issued a notice of proposed rulemaking in November to consider terrestrial DAB, will play a key role in standards development along with the National Radio Systems Committee. The FCC's schedule for reviewing IBOC proposals called for proponents to submit laboratory and field test results to the NRSC by Dec. 15. Lucent Digital Radio informed the committee it wouldn't make what president and chief executive officer Suren Pai called an "arbitrary" deadline, and the group now expects to hear from Lucent at its next meeting on Jan. 8 in Las Vegas.
But USA Digital Radio, founded in 1991 by Westinghouse Electric Co., CBS and Gannett Co., and Digital Radio Express announced in mid-December that they would join forces to speed development of USA Digital Radio's IBOC system. DRE will focus on datacasting applications. The partners are targeting an IBOC system based on the MPEG-2 Advanced Audio Coding (AAC) algorithm. AAC is also being considered as a codec for Internet music and digital TV applications.
USA Digital Radio and DRE delivered laboratory and field test data to the NRSC and FCC by the December deadline in a move that “confirms that we are ahead,” said Bob Struble, USA Digital Radio's president and chief executive officer.
Lucent's Pai took issue with the testing procedures. For one, the current tests compare IBOC systems with FM radio, but if the proposed requirement for IBOC is CD-quality audio, "the IBOC system should be compared to CD," Pai said. "To demonstrate how much better one's IBOC system is than FM is meaningless." Further, the NRSC thus far has not required testing under suboptimal conditions, so the tests do not reflect "the real world," Pai said.
Lucent Digital Radio is in discussions with the NRSC to amend the testing procedures. The company wants to see the creation of a common test platform, under which the results of tests done by the same labs and field tests operated by the same radio stations would be compared. "Unless you are comparing apples to apples, we believe presenting data to the NRSC is not meaningful. We need to do the right thing here," said Pai.
NRSC chairman Charlie Morgan said the group's key criteria for judging IBOC systems will be a significant improvement in audio quality over existing AM/FM systems. "If it's not significantly better, then no standard," he said.
But the group has yet to define what "significantly better" means. Signal quality and immunity to multipath interference will be considered, but "no one [has] put their finger on it," Morgan said.
Efforts to pin down the definition and review the proposed IBOC systems are expected to shift into high gear this year. "We will know sometime during 2000 about whether an IBOC system will work," Morgan predicted.
Proponents said IBOC stations could be operating as early as the second quarter of 2001 if a transmission standard is approved. Struble said radio stations could be up and running by the end of 2000, with the first broadcasts following six months later. He pegged transition costs, including the addition of digital exciters, at between $30,000 and $200,000 per station. No new towers or antennas would be needed to begin digital radio broadcasts based on IBOC technology.
But, Struble added, "We still have another 12 to 18 months of hard work ahead of us."
Looking skyward
Lucent Digital Radio, spun out of Lucent Technologies as an independent company to develop IBOC technology for AM and FM broadcasting, has established a corporate mission that seeks to go beyond winning the terrestrial digital radio race. "Our goal is to optimize our core Perceptual Audio Coder [PAC] technology for satellite, AM IBOC, FM IBOC and the Internet," Pai said.
Further ambitions include entry into the semiconductor business to offer a PAC-based digital radio receiver platform on a chip.
PAC is an audio compression algorithm originally developed by Lucent Technologies Bell Labs to compress audio for transmission or storage. Lucent Digital Radio is tailoring PAC for a variety of applications. Pai envisions next-generation PAC-based radio receivers that would receive satellite broadcasts and Internet music as well as terrestrial digital and analog radio broadcasts.
Though no IBOC technology has yet emerged as the U.S. terrestrial digital radio standard, Pai said his company is making steady progress on its PAC implementation. Early last month, Lucent Digital Radio announced that it had licensed PAC to XM Satellite Radio for use in the latter company's satellite radio service.
XM Satellite Radio received an FCC license for satellite radio service two years ago. Getting the company to sign on to Lucent Digital Radio's PAC was "really a big step" for the technology, said Pai.
Lucent believes that the business models of satellite and terrestrial radio broadcasters will overlap rather than directly compete, allowing the broadcasters to provide complementary services. From a system vendor's standpoint, what "ultimately" makes sense is to build "one receiver on a car to receive both [satellite and terrestrial radio] services," said Pai. "After all, it can only cost so much, and it has only so much space."
That's where Lucent Digital Radio hopes to come in. Pai envisions his company's role as a key supplier of a standardized component based on its audio codec.
But rival USA Digital Radio said it will focus for now on terrestrial broadcasting. Satellite broadcasting "will happen if the market wants it to," said Struble.
While extending PAC's applications beyond terrestrial radio, Lucent Digital Radio knows that it won't get a genuine break unless it successfully demonstrates the superiority of its PAC-based multistreaming IBOC technology over other IBOC variants. "Within the first three to four months of 2000, we will have conclusively demonstrated our IBOC system to nearly everyone's satisfaction," Pai promised.
Lucent claims that it has dramatically improved its originally proposed IBOC system by developing a multistreaming PAC. Multistreaming breaks audio information into multiple packets (streams), each of which can stand alone and provide quality audio. Audio coding operating at 128 kbits/s, for example, can be broken into four 32-kbit/s streams. The streams can be reassembled at the decoder in any combination to provide sequentially higher-quality audio. When all four streams are combined, the original audio is recovered.
With the multistreaming technique, the system continues to operate by constantly switching to the highest-quality combination of streams available, according to Lucent. In essence, it improves signal robustness to first- and second-adjacent-channel interference, significantly extends the range of digital signals and emulates the graceful degradation characteristics of analog signals at the edge of coverage.
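The split-and-reassemble idea can be sketched as follows. The round-robin packetization here is purely an assumption for illustration; Lucent has not published how PAC actually partitions its bit stream.

```python
# Illustrative sketch of the multistreaming idea described above. The
# round-robin packetization is an assumption for illustration only;
# Lucent has not published how PAC actually partitions its bit stream.

def split_streams(payload, n_streams=4):
    """Distribute encoded bytes round-robin across n independent streams,
    e.g. one 128-kbit/s coder output into four 32-kbit/s streams."""
    return [payload[i::n_streams] for i in range(n_streams)]

def merge(streams, received):
    """Reassemble whichever streams arrived, in round-robin order.
    With all streams present, the original payload is recovered exactly;
    with fewer, quality degrades instead of the audio cutting out."""
    out = bytearray()
    longest = max(len(streams[i]) for i in received)
    for pos in range(longest):
        for i in sorted(received):
            if pos < len(streams[i]):
                out.append(streams[i][pos])
    return bytes(out)

payload = bytes(range(16))      # stand-in for one frame of encoded audio
streams = split_streams(payload)

print(merge(streams, {0, 1, 2, 3}) == payload)  # True: full recovery
print(len(merge(streams, {0, 2})))              # 8: half the data survives
```

The key property the sketch demonstrates is graceful degradation: losing a stream shrinks the reconstruction rather than breaking it, mirroring how analog reception fades at the edge of coverage.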
"The biggest advantage with the multistreaming technique is that radio stations don't have to fall back on analog services," Pai said.
Lucent has the results of field tests conducted earlier this year using the older, single-streaming IBOC system. While the multistreaming signals are broadcast today from WPST's Trenton, N.J., transmission tower, Lucent Digital Radio is still "refining" its receiver design, according to Pai. Still, the company holds high hopes for multistreaming because "we've already done testing based on the single-streaming IBOC, and we are aware of its problems," said Pai.
Lucent's competitors, however, remain skeptical of multistreaming. USA Digital Radio's Struble said the approach remains risky for several reasons, including interference problems and the inability to fall back on analog services if the digital signal is lost.
"We just don't think [multistreaming] is going to work," he said.
AAC Applications
There are three distinct markets with significant focus on AAC technology: digital broadcasting in Japan, digital radio in the US, and electronic music distribution in a variety of worldwide markets. Background material on each is presented below.
The Japanese Association of Radio Industries and Businesses (ARIB) has selected AAC as the audio coding scheme for all of Japan's digital broadcast systems, including standard-definition television (SDTV), high-definition television (HDTV), digital radio, and new multimedia services.
Digital radio via terrestrial, satellite, or cable transmission is emerging around the world. While numerous international markets have adopted a spread-spectrum terrestrial transmission technology that codifies the use of MPEG-1 Layer 2 audio, regulators and broadcasters in the United States are contemplating the use of AAC for In-Band On-Channel (IBOC) transmission and other wireless applications.
Internet or Electronic Music Distribution (EMD) applications represent strong near-term markets for AAC. With its demonstrated advantages of higher audio quality at lower bit rates, and richer multichannel capabilities, AAC provides numerous advantages over MP3 and other competing audio coders. AAC will be implemented both in PC-based players and portable storage/playback devices, as well as in networked entertainment system components.
The infrastructure for each of these applications is being created today: not just the physical networks and the receivers, but also the billing, provisioning, management, and even the business models.
First-generation residential broadband networks are being fashioned from existing telephone and cable TV lines. Over time, as phone companies upgrade fiber-fed multiplexers, cable companies upgrade head-ends, and TCP protocols evolve to accommodate increased traffic, there will be enough bandwidth for everyone to stream any entertainment content that can be distributed digitally today. However, that day is far in the future. In networked applications that value quality audio reproduction, especially where bandwidth and/or storage requirements are a primary cost factor, there is a strong case to be made for AAC.
1998 - The Advanced Audio Coding (AAC) algorithm was used in demonstrations at the National Association of Broadcasters (NAB) convention of the terrestrial In-Band, On-Channel (IBOC) system that will use existing AM and FM bands for digital radio broadcasts in the U.S.