System.Speech -- What is it?

I've been lurking on the fringes of speech recognition for quite some time. My first exposure required installation of a dedicated ISA sound card. Yes, it was that long ago.

As you can imagine, that experience was somewhat less than exhilarating.

My next go-round was with Dragon in 1999 (I think they were owned by Corel at that point). The engine took a while to train, and didn't do very well on some of the words I tried to feed it ("Kosovo" comes to mind). However, once you corrected what it thought you had said, the engine worked splendidly. At the time, I was still in the Canadian Forces (Army), so much of what I was dictating involved acronyms and jargon.

Fast-forward to the summer of 2005. The company I work for (which shall remain nameless for the time being) proposed a project to speech-enable an application in a very noisy environment. I spent a few days researching the various technologies available. Most were either intended for embedded systems, required a lot of specialized knowledge, were expensive, had a C API with no .NET wrapper... or some combination of the above.

Then I found SAPI 5.1 -- the speech recognition API for the engine that ships as part of XP. Talk about an awesome tool!

No specialized knowledge -- just give it a list of English phrases in an XML file and it would recognize them all. I even tried it with such esoterica as "Boba Fett" and "Obi-Wan Kenobi". Someone on the SAPI team must be a serious Star Wars geek, because those worked the first time.

There are lots of other great features in SAPI 5.1 -- but the one my boss liked best was the price... free. I won't dwell on the other advantages, since this is deprecated technology. And who'd want to use it, now that System.Speech is released?

Of course, nothing is perfect. The documentation was pretty much useless to me. And, not even Google pulled very much from the web.

So let's talk about System.Speech...

For those of you who haven't found out about this namespace, it shipped with .NET Framework 3.0, tucked in alongside WPF. The two namespaces you'll use most are System.Speech.Recognition for recognition and System.Speech.Synthesis for text-to-speech. System.Speech sits on top of the SAPI shipped with your OS -- either 5.1 for XP or 5.3 for Vista.

The team at Microsoft has done a superb job of making speech recognition easy. There's a SpeechRecognizer class that exposes the functionality you need to do basic speech recognition. If you're feeling really masochistic, there's also the SpeechRecognitionEngine class. I like to describe this class as SpeechRecognizer's ugly bigger brother -- who will gladly give you more than enough rope to hang yourself.

As a developer, the part I appreciate most is the inclusion of methods that make my life easy. One of the most graphic examples is in the creation of a custom grammar at run-time. With SAPI 5.1, this took about 50 lines of code -- generating an XML file on the fly, saving it to disk, and loading it into the engine. (Okay, maybe there are better ways to do this -- this was the one I could make work.)

In System.Speech, this takes four lines of code. With no disk IO operations. This is the sort of improvement that makes my life as a developer sooo much easier.
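To give a feel for what that looks like, here is a minimal C# sketch of building a grammar in memory and loading it. I'm assuming the "four lines" are roughly the Choices, GrammarBuilder, Grammar and LoadGrammar calls marked below; the phrase list and class name are made up for illustration, and the code needs a reference to System.Speech.dll plus a working microphone on Windows.

```csharp
using System;
using System.Speech.Recognition;

class RuntimeGrammarSketch   // hypothetical class name, for illustration only
{
    static void Main()
    {
        // Hypothetical phrase list; a real application would build this at run-time.
        string[] phrases = { "Boba Fett", "Obi-Wan Kenobi" };

        // SpeechRecognizer uses the shared desktop recognizer provided by the OS.
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            // The (assumed) four lines: the grammar is built entirely in memory,
            // with no XML file written to or read from disk.
            Choices choices = new Choices(phrases);
            GrammarBuilder builder = new GrammarBuilder(choices);
            Grammar grammar = new Grammar(builder);
            recognizer.LoadGrammar(grammar);

            // Print whatever phrase the engine recognizes.
            recognizer.SpeechRecognized += delegate(object sender, SpeechRecognizedEventArgs e)
            {
                Console.WriteLine("Heard: " + e.Result.Text);
            };

            Console.WriteLine("Listening. Press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

If you'd rather have an in-process engine than the shared one, SpeechRecognitionEngine takes the same Grammar object, but you then have to set the audio input and start recognition yourself.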

If you've ever been interested in speech-enabling your applications, you've really got to look into System.Speech. If you want a hand getting started, you'll just have to wait until the June issue of Visual Studio Magazine (hint, hint).

Posted on Sunday, May 13, 2007 12:37 PM

Copyright © Jeff Certain
