RD3: Audio Recording the Easy Way

May 14, 2021

One of the most tedious tasks in developing online training is the audio recording. This involves converting the written narration into audio so that a user can read and listen while they study. Some people learn better by listening, while others prefer reading.

When we started building CBTs a long time ago, we would hire a voice over artist to record the audio. We would send that person the narration in text format. Then, the artist would record the audio, name the audio files in the manner we asked them to, and send back the audio via CD-ROM or download.

Once the files were received, we started the tedious (and error prone) task of linking each screen with the appropriate audio file. In the process, we would proof the audio and make notes on any discrepancies. When we were done, we’d send our notes to the voice over artist for re-recording. Rinse, wash, repeat.

The Downside of Using a Voice Over Artist to Record Audio

This process was extremely time-consuming and error-prone. Sometimes the audio files weren’t named properly, so we had to listen to each one and then find the narration it belonged to. But there were many other problems as well:

Voice over artists are very expensive – They want up to $2,000 per recording hour
It took a long time to get the complete audio for the course – Sometimes up to 3-4 months
Availability problems – Voice over artists are often very busy, so if you had a correction to make, you either inserted a different voice in the middle of the module, or you had to re-record the whole thing with a different voice over artist
Aviation vocabulary/pronunciation issues – You’d be surprised at the number of people who don’t know how to pronounce pitot, empennage, radome, or EICAS
Linking the audio with the proper screen – This was a manual process and was error prone (a typical aircraft course can easily have more than 3000 screens)

CBT Module Creation the Avsoft Way

We originally used a product called Toolbook to create our CBT modules. This was a program that created files similar to PowerPoint, and it included a programming language that allowed you to associate code with a particular screen. It was a pretty complex software package, so in order to learn as much as I could about it, I attended a conference hosted by Platte Canyon Multi-Media Software Company (owned by a fellow Air Force Academy alumnus). In one of the seminars, I learned how to hook up Toolbook files to a Microsoft Access database. I also learned about the Microsoft .NET platform and Visual Studio.

After the conference, I used Visual Studio to write a small program that connected to an Access database. At this point, the only purpose of that program was to write the storyboard for a module. Then, I created a Toolbook template that would read that database, dynamically add a slide for each narration, and pull in the proper graphic and audio file. This template would then create the CBT module.
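
As a rough illustration of this database-driven approach, here is a minimal Python sketch. It uses SQLite in place of Access and plain dictionaries in place of Toolbook slides. The table layout, column names, and file names are assumptions made for the example, not the actual storyboard schema.

```python
import sqlite3

# Hypothetical storyboard table: one row per screen, in display order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE screens ("
    " screen_no INTEGER PRIMARY KEY,"
    " narration TEXT NOT NULL,"
    " graphic TEXT,"
    " audio TEXT)"
)
conn.executemany(
    "INSERT INTO screens VALUES (?, ?, ?, ?)",
    [
        (1, "The pitot tube senses total pressure.", "pitot.png", "m01_s001.wav"),
        (2, "The radome covers the weather radar antenna.", "radome.png", "m01_s002.wav"),
    ],
)


def build_module(connection):
    """Return one slide per storyboard row, each with its graphic and audio."""
    rows = connection.execute(
        "SELECT screen_no, narration, graphic, audio "
        "FROM screens ORDER BY screen_no"
    )
    return [
        {"screen": no, "narration": text, "graphic": graphic, "audio": audio}
        for no, text, graphic, audio in rows
    ]


slides = build_module(conn)
for slide in slides:
    print(slide["screen"], slide["graphic"], slide["audio"])
```

The point of the design is that the storyboard database is the single source of truth: the module is regenerated from it rather than edited by hand.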

Troubleshooting Common Problems Associated with Audio Recording

This process greatly simplified the management of the storyboard and assembly of a module, but it did not solve the problems associated with the audio recording. In order to solve those problems, we started to explore the use of Text to Speech (TTS) for the audio recording. This was around 2007, and TTS was marginal at that time, but we did run across AT&T’s Natural Voice technology. The audio produced by that platform was way above average, but still not good enough for me.

Instead, I decided to approach the problem from a different angle. We modified the storyboard program to add a recording function. This allowed the voice over artist to record the audio while the program took care of naming, linking, and storing the audio file. We also built a soundproofed recording studio and used high-end equipment for the recording.

We still had to solve the problems associated with the cost and availability of voice over artists, so we turned to a local radio announcer school. Since we knew that those people wouldn’t be available forever, we decided to split the audio recording task for a single course between multiple people, typically three to four per course.

This worked very well, but we still had problems with the quality of the recordings, mispronunciation of terms, and delivery time. In 2014, we had to downsize our offices as a result of the financial crisis, so we sold our building and moved into a flex space near Buckley Air Force Base. The space was smaller and due to the location near I-225, we had higher levels of ambient noise between the car traffic and the airplanes flying overhead.

As a result, we looked at TTS again. This time around, though, we came across a platform that produced even better quality than the AT&T Natural Voice platform. It still wasn’t perfect, but it was good enough in my opinion. This led us to make a very controversial decision: from then on, we only used TTS for the recording. We modified our program to automatically submit the narration text to the TTS engine via a web service, download the recording, rename it, link it, and store it on our network.

TTS Makes Audio Recording Easy with RD3

Nowadays, we use TTS platforms that are driven by neural networks (i.e., artificial intelligence), and the quality is so good that it’s not unusual for people to ask me who recorded the audio, as opposed to what recorded it.

We added the TTS capability to RD3, and now we’re able to record a complete module one screen at a time, or in batch mode. In batch mode, the software cycles through the narration, submits the text to the TTS API, downloads the file, names it, and stores it in the master manifest. In this manner, we can record 200 screens in under 15 minutes. Then, we proof the audio, and anytime the quality is not what we expected, we massage the text and re-record that single screen.

In the screenshot above, the red button indicates the scenes that need to be recorded or re-recorded. The text box below contains the narration, and the bottom text box starts out with the same text. We use that bottom box to provide instructions to the TTS engine to improve the audio recording.
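
The batch mode and the "needs recording" flag described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Screen` class, the `audio_name` naming scheme, and the `fake_tts` stand-in are all hypothetical, and a real setup would call the TTS web service where `fake_tts` is used.

```python
from dataclasses import dataclass
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Callable, Dict, List


@dataclass
class Screen:
    screen_no: int
    narration: str
    needs_recording: bool = True  # plays the role of the red button


def audio_name(module_id: str, screen_no: int) -> str:
    """Hypothetical naming scheme: module ID plus zero-padded screen number."""
    return f"{module_id}_s{screen_no:04d}.mp3"


def record_batch(
    module_id: str,
    screens: List[Screen],
    synthesize: Callable[[str], bytes],
    out_dir: Path,
) -> Dict[int, str]:
    """Record every flagged screen and return a screen-to-file manifest."""
    manifest: Dict[int, str] = {}
    for screen in screens:
        if not screen.needs_recording:
            continue  # already recorded and proofed
        audio = synthesize(screen.narration)  # submit text, get audio bytes
        path = out_dir / audio_name(module_id, screen.screen_no)
        path.write_bytes(audio)  # store the downloaded audio
        manifest[screen.screen_no] = path.name  # link screen to its file
        screen.needs_recording = False
    return manifest


def fake_tts(text: str) -> bytes:
    """Stand-in for a real TTS call; returns the text as bytes."""
    return text.encode("utf-8")


screens = [
    Screen(1, "Welcome to the hydraulic system module."),
    Screen(2, "This screen was already recorded.", needs_recording=False),
    Screen(3, "The system has three independent circuits."),
]
with TemporaryDirectory() as tmp:
    manifest = record_batch("hyd01", screens, fake_tts, Path(tmp))
print(manifest)
```

To re-record a single screen after proofing, you would edit its narration, set `needs_recording` back to `True`, and run the same function again.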

Advanced TTS Functions Available with RD3

One noteworthy function is the ability to use IPA (IPA button). IPA stands for International Phonetic Alphabet. It’s a phonetic notation system to convey the way a word is pronounced, and the TTS system understands this notation.
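
Many TTS engines accept IPA through the `phoneme` element of SSML, the W3C speech markup standard. The sketch below assumes that mechanism; whether RD3 uses SSML internally is not stated here, and the helper name `ipa_phoneme` and the transcription shown for "pitot" are illustrative.

```python
from xml.sax.saxutils import escape, quoteattr


def ipa_phoneme(word: str, ipa: str) -> str:
    """Wrap a word in an SSML phoneme tag carrying its IPA pronunciation."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    return f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>{escape(word)}</phoneme>"


# Illustrative transcription: "pitot" pronounced PEE-toh.
tagged = ipa_phoneme("pitot", "ˈpiːtoʊ")
ssml = f"<speak>The {tagged} tube senses total pressure.</speak>"
print(ssml)
```

The visible text stays "pitot", so the on-screen narration is unchanged; only the spoken form is overridden.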

We also have a TTS dictionary that translates terms used in the narration into something more understandable by the TTS servers.
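
A dictionary like this can be approximated as a whole-word substitution pass that runs before the text is sent to the TTS engine. The following Python sketch shows the idea; the entries and respellings are made up for illustration and are not the actual RD3 dictionary.

```python
import re

# Hypothetical entries: narration term -> spelling the TTS engine reads well.
TTS_DICTIONARY = {
    "EICAS": "eye-cass",
    "APU": "A P U",
    "kts": "knots",
}


def apply_dictionary(text: str, dictionary: dict) -> str:
    """Replace whole-word dictionary terms with their spoken form."""
    # Longest terms first, so a longer term wins over a shorter one it contains.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b"
    )
    return pattern.sub(lambda match: dictionary[match.group(1)], text)


spoken = apply_dictionary("Check EICAS and hold 250 kts.", TTS_DICTIONARY)
print(spoken)
```

Because the substitution only affects the text submitted for recording, the narration shown to the student keeps the correct written term.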

I don’t have enough words to describe the usefulness of this particular RD3 function. It has enabled us to record an entire course in less than a week, and at much lower cost – a few hundred dollars for a course as opposed to tens of thousands of dollars. We’ve passed these savings along to our customers, and all of the problems of the past have been solved in the process!
