The On-going Voice Recognition Saga

Part One

Under the spotlight this week is voice recognition software which I’ve been wanting to play with for some time.

The idea of being able to lie back in my armchair and dictate my weekly column sounded seductive because I’m a very slow typist and because I seem to be developing a repetitive stress injury in my carpal tunnels. At the moment, if you’re talking voice recognition software, you’re talking either Dragon Dictate or one of several products from IBM.

The parcel from IBM contained a copy of ViaVoice and a copy of its predecessor, Simply Speaking Gold, just to demonstrate what the previous generation of voice recognition software was like. I installed Simply Speaking first and I was most impressed with the installation routine, which even guides you through plugging in the microphone that comes with it and strapping it to your head.

From that point on I don’t have much good to say about the program because it has the major snag that you have to pause between each word you say to it. That might not sound very hard to do but I found it very difficult to master and thus didn’t get very good results at all.

The program knows quite a lot of words when it wakes up and can take dictation without any preparation. The first time I tried it, it recognised “aid of the party” as “boot of the bottom” and “She sells sea shells on the sea shore” as “User casual surreal sight on the sea shore”.

You can improve the accuracy of the program’s recognition by reading out a whole lot of sentences to it but I had to repeat sentences so often before getting them right that I gave it up as a bad joke and installed ViaVoice. I was very thankful to find that it accepts continuous speech and I must say that it seemed to know what I was talking about because it zipped through the first training session with the greatest of ease.

I’ll take a closer look at it next week once I’ve given it a bit more training and I’ll also be looking at its competitor, Dragon Dictate.

Part Two

Last week I wrote about voice recognition software and described how I had been playing with an IBM package called Simply Speaking Gold which lets you dictate to your computer. The trouble with it, however, is that it requires you to pause between each word, which I found very difficult to master.

Fortunately IBM also sent along its newest voice recognition product, which is called ViaVoice. It promised to be a considerable improvement over Simply Speaking because it allows you to dictate in your normal voice without pauses.

Like other voice recognition packages it is not very accurate when it comes out of the box and you have to train it to get used to your voice. You initially read just over 400 paragraphs of text aloud to it until it more or less understands you.

The next step is to use its vocabulary builder to scan a selection of your documents to spot words you use regularly that are not in its dictionary. The package then records you pronouncing each of them so that, when it hears you say them again, it knows what to type out for you.
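
A minimal sketch of what such a scan might look like, in Python. This is not how ViaVoice is implemented; the function name, the repeat threshold and the sample sentences are all assumptions made up for illustration. It simply counts the words in a set of documents and reports those that recur but are missing from a known word list.

```python
import re
from collections import Counter


def find_unknown_words(documents, known_words, min_count=2):
    """Return, in alphabetical order, the words that appear at least
    min_count times across the documents and are not in known_words.

    Hypothetical helper: the name and the threshold are illustrative only.
    """
    counts = Counter()
    for text in documents:
        # Lower-case the text and pull out runs of letters,
        # allowing one internal apostrophe (e.g. "don't").
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    known = {word.lower() for word in known_words}
    return sorted(
        word for word, n in counts.items() if n >= min_count and word not in known
    )


# Made-up sample documents and a tiny stand-in dictionary.
docs = [
    "The modem dropped the connexion again.",
    "A faster modem fixed the connexion.",
]
dictionary = ["the", "a", "dropped", "again", "faster", "fixed"]

print(find_unknown_words(docs, dictionary))  # ['connexion', 'modem']
```

The words this turns up are the ones you would then be asked to pronounce, so that the package has a recording to match against.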

The last way of improving the package’s accuracy is by correcting it where it has recognised words incorrectly. All you have to do is double-click the word you want to correct and the package will play it back as you pronounced it and give you a chance to type in the correct spelling.

The first question I would ask if I were reading an article like this would be whether it was good enough to use to write its own review. As it turned out I did produce the review using ViaVoice and, if parts of it were gibberish, there was nothing wrong that a good edit couldn’t fix.

The most striking thing about using the package, I found, was how quickly you can get your stuff down on paper because, in 45 minutes of chat, I easily managed to input something like a thousand words. Editing it and trimming it down to my usual 600 words or so may, I suppose, have taken a bit longer.

That’s a saving of over an hour on the time I normally take to complete this column giving me valuable extra sleeping time. I used to take two-and-a-half to three hours to write the column because my typing fingers were lagging so far behind my brain that they forgot what it was they were supposed to type, but no longer.

After training the software for a while I reckon I could pretty easily get my column down and edited in 45 minutes. Voice recognition is just too cool for words!!

I learnt some very important lessons while preparing for this column that I should perhaps share. The first is that you need a really potent computer for voice recognition; a Pentium 150 with 32MB of memory must be considered the absolute minimum but the faster the machine and the more memory it has, the better.

The second factor is that getting voice recognition to work well is very hard work and you have to be prepared to spend a lot of time at it. The rewards coming from speeding up your work so dramatically are there but you’ll only get to enjoy them after a considerable effort.

The other package that I intended to write about in this column was Dragon’s NaturallySpeaking which is IBM’s main competitor. It turned out that the new version is due out in only two weeks and I thought I’d wait for it to arrive before reviewing it and comparing it to ViaVoice.

I saw two demonstrations of it last week and I was most impressed with it. I expect that the new version is going to be dynamite.

Part Three

One of the most profound experiences of my computer life was when I first caught sight of Windows 3.1.

I was using DOS and not liking it, when I saw Windows being demonstrated in a software distributor’s office. I was instantly convinced that it was going to be the next big thing in computing and so, in fact, it turned out.

There have been many wonders between then and now but none have managed to cause quite the same level of excitement in me. Not, that is, until the last few weeks during which I have been playing with speech recognition programs.

The two packages I have been experimenting with are IBM’s ViaVoice and Dragon’s NaturallySpeaking. In a previous column I wrote of my experiences with ViaVoice which I found to be a pretty good value-for-money product although I found the interface a little difficult to work with.

I initially had a little difficulty in installing NaturallySpeaking but after a visit from Craig Dahl, of Voice Communications Systems, I got it up and running. It took me quite some time to plough through the training which you are required to complete before you can use NaturallySpeaking.

As part of the training you read out a series of texts to the program so that it can get used to your voice and then you train it as you go along. All you have to do is tell it to select the word or phrase it got wrong, spell it out as it should be and then record yourself saying it; it should then know the difference.

From the start it managed very well in recognising my normal speech and a little less well in recognising such doggerel as “round the Reagan rocks the Reagan rascal rat”. It didn’t take very much effort, however, to teach NaturallySpeaking that “around the ragged rocks the ragged rascal ran”.

The only place where NaturallySpeaking might be said to fall down is in the fact that you must dictate into its own little word processor and then transfer your words to the word-processing package you usually use. ViaVoice has the advantage that it allows you to dictate straight into Microsoft Word, thus cutting out the labour of transferring your text.

At the moment I have to say that I would choose NaturallySpeaking for my own use because it has a much easier interface and, for my voice at least, it seems a little more accurate. It scored points with me because you can format text, and train it when it gets a word wrong, with your voice alone, whereas ViaVoice makes you use the keyboard, missing the point of voice recognition as far as I’m concerned.

NaturallySpeaking, at around R1800 and some change, is about four times more expensive than ViaVoice which doesn’t make it the value-for-money choice. Unfortunately for IBM, however, ViaVoice is about to lose most of its price advantage because the current version of NaturallySpeaking is to be repackaged as NaturallySpeaking Solo and sold at a good bit less than half of its current price.

If you’re not tempted you can always get one of the high-end versions of NaturallySpeaking which are also due out soon. The first of these will allow you to dictate into any Windows program and the second will allow you to control your PC completely including the making of fine mouse movements.

I think I can quite confidently predict that voice recognition is going to be taking off like a rocket given the fact that it can save its users a great deal of time and strain. My weekly column, for example, used to take me about three hours to type but this one took me a little less than an hour to dictate and about 20 minutes to edit.

I’ll certainly be dictating as much of my other work as possible and even if I only end up saving four hours a week, it’ll be worth the time and effort. Voice recognition is certainly great but here I must repeat the cautionary note I struck in my last column on the subject.

You shouldn’t even try voice recognition unless you have time to put into it and you have, at the very least, a PC with a Pentium 133 chip and 32MB of RAM.

My columns in the Sunday Tribune on 16 & 23 November and 14 December, 1997, formed the basis for this article.

