
Tutorial: The Java Speech API: A Primer on Speech Applications:

Synthesizer Properties

Synthesizers use five properties to control and fine-tune output speech. They are voice, volume, speaking rate, pitch, and pitch range.
  • Voice: This decides the type of voice the synthesizer uses to render the speech. Synthesizers provide a variety of voice options by simulating the age of the speaker (infant, teenager, adult, etc.), the type (male, female, neutral, etc.), and the style (casual, business, etc.). Combinations of these provide different kinds of voices.
  • Volume: This value ranges from 0.0 (silent) to 1.0 (loudest).
  • Speaking Rate: This decides the speed of speech output in words per minute.
  • Pitch: This decides the baseline (minimum) pitch of the voice.
  • Pitch range: This decides the range of pitches that can be used starting from the baseline pitch.
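
To make the five properties concrete, here is a minimal sketch of a plain value holder that validates them. The class SpeechSettings and the voice name are hypothetical and are not part of JSAPI; JSAPI itself exposes these settings through the javax.speech.synthesis.SynthesizerProperties interface. Treating pitch and pitch range as hertz values is an assumption of this sketch.

```java
// Hypothetical value holder for the five synthesizer properties.
// This is an illustration only; it is not a JSAPI or FreeTTS class.
final class SpeechSettings {
    final String voiceName;   // which voice to use (placeholder name)
    final float volume;       // 0.0 (silent) to 1.0 (loudest)
    final float speakingRate; // words per minute
    final float pitch;        // baseline (minimum) pitch, assumed to be in hertz
    final float pitchRange;   // span of pitches above the baseline, assumed to be in hertz

    SpeechSettings(String voiceName, float volume, float speakingRate,
                   float pitch, float pitchRange) {
        if (volume < 0.0f || volume > 1.0f) {
            throw new IllegalArgumentException("volume must be between 0.0 and 1.0");
        }
        if (speakingRate <= 0.0f) {
            throw new IllegalArgumentException("speaking rate must be positive");
        }
        if (pitch <= 0.0f) {
            throw new IllegalArgumentException("pitch must be positive");
        }
        if (pitchRange < 0.0f) {
            throw new IllegalArgumentException("pitch range must not be negative");
        }
        this.voiceName = voiceName;
        this.volume = volume;
        this.speakingRate = speakingRate;
        this.pitch = pitch;
        this.pitchRange = pitchRange;
    }

    // Highest pitch the voice may reach: the baseline plus the pitch range.
    float highestPitch() {
        return pitch + pitchRange;
    }
}
```

For example, new SpeechSettings("example-voice", 0.8f, 150f, 100f, 20f) describes a voice that speaks at 150 words per minute with pitches between 100 and 120.
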
Synthetic speech is usually generated using either Concatenative Synthesis or Formant Synthesis.
  • Concatenative Synthesis: Libraries of phonemes (unique theoretical units of sound in a language that enable the differentiation of words) are arranged together to form words and sentences. The generated sentence is then rendered as a waveform (or sound signal). The intelligibility of speech generated using this method is usually high. However, differences in speech patterns between the joined units make the generated speech sound less natural.
  • Formant Synthesis: This method generates speech artificially. Phonemes are associated with certain frequency ranges called Formants. Each formant is defined by its pitch, frequency range, and noise level. Varying the frequency range and pitch of each formant generates waveforms. The speech thus generated sounds machine-like and does not have human speech quality.
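
The concatenative approach described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not how FreeTTS or any real synthesizer works internally: the unit library is a hypothetical map from phoneme labels to short pre-recorded sample arrays, and the units are joined end to end with no smoothing at the boundaries.

```java
import java.util.Map;

// Illustrative sketch of concatenative synthesis: look up a stored waveform
// for each phoneme and join the waveforms into one output signal.
final class ConcatenativeSketch {

    // units: hypothetical library mapping a phoneme label to its recorded samples.
    // phonemes: the phoneme sequence of the word or sentence to render.
    static float[] concatenate(Map<String, float[]> units, String[] phonemes) {
        // First pass: check that every phoneme has a unit and size the output.
        int total = 0;
        for (String phoneme : phonemes) {
            float[] unit = units.get(phoneme);
            if (unit == null) {
                throw new IllegalArgumentException("No unit for phoneme: " + phoneme);
            }
            total += unit.length;
        }

        // Second pass: copy each unit into place, one after another.
        float[] waveform = new float[total];
        int offset = 0;
        for (String phoneme : phonemes) {
            float[] unit = units.get(phoneme);
            System.arraycopy(unit, 0, waveform, offset, unit.length);
            offset += unit.length;
        }
        return waveform;
    }
}
```

Because the units are simply placed side by side, the joins between them are abrupt, which is one reason speech produced this way can sound unnatural even when it is easy to understand.
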

Speech Recognition

Speech recognition is the process of converting speech to text. This is more difficult than synthesis, because the recognizer must interpret what the user has spoken and convert that speech into meaningful sentences.

The process of speech recognition can be divided into four steps:

  1. Speech is converted to digital signals. Noise, microphone position, the quality of the audio hardware, etc. have a major impact on the generated digital signals.
  2. Actual speech sounds are extracted from the digitized signal, based on the energy of the sounds.
  3. The extracted sounds are put together into 'speech frames.'
  4. The speech frames are compared with words from the grammar file to determine the word that was spoken.
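
Steps 2 and 3 above can be illustrated with a small sketch. It assumes the signal is already digitized as an array of samples, cuts it into fixed-size frames, and keeps only the frames whose average energy reaches a threshold. The class name, frame size, and threshold are all hypothetical choices for illustration; real recognizers such as Sphinx 4 use considerably more elaborate front ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative energy-based speech detection: split the signal into frames
// and keep the frames that are loud enough to be treated as speech.
final class EnergyFramer {

    // Mean squared amplitude of samples[start, end).
    static double energy(double[] samples, int start, int end) {
        double sum = 0.0;
        for (int i = start; i < end; i++) {
            sum += samples[i] * samples[i];
        }
        return sum / (end - start);
    }

    // Returns the fixed-size frames whose energy is at least the threshold.
    // Trailing samples that do not fill a whole frame are ignored.
    static List<double[]> speechFrames(double[] samples, int frameSize, double threshold) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        List<double[]> frames = new ArrayList<>();
        for (int start = 0; start + frameSize <= samples.length; start += frameSize) {
            int end = start + frameSize;
            if (energy(samples, start, end) >= threshold) {
                frames.add(Arrays.copyOfRange(samples, start, end));
            }
        }
        return frames;
    }
}
```

The frames that survive this filter are what a recognizer would go on to compare against the words in its grammar.
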
These are the types of speech recognizers:
  • Speaker-independent: Speaker-independent systems are recognizers that can be used by anybody without the need for training. These systems are usually deployed in environments where the system cannot be trained, for instance, telephony applications.
  • Speaker-dependent: Speaker-dependent systems are those systems that need to be first trained for use by specific speakers. Voice samples of each speaker are taken, analyzed and stored. Speech is then matched with these samples to accurately determine the words spoken.
  • Continuous Speech Recognition: Continuous speech recognizers allow users to speak naturally and continuously.
  • Isolated or Discrete Speech Recognition: Isolated speech recognizers require the speaker to pause (about a fifth of a second) between each word that is uttered, so that the recognizer can buffer the word before processing the next one.
  • Vocabulary Constrained Systems: These systems have a limited vocabulary they can understand. A small vocabulary means that users must restrict their speech to the words the recognizer understands.
Speech recognition is usually done using Grammar Constrained Recognition or Natural Language Recognition:
  • Grammar Constrained Recognition: This method is used by applications that need short responses to definite questions. Probable responses are stored as a grammar. The recognizer then uses the grammar to 'recognize' user answers. Software is programmed so that appropriate action is taken when answers are not found in the grammar. For instance, the question can be repeated so that the user can provide an answer as specified in the grammar.
  • Natural Language Recognition: This method allows users to speak in a 'natural way'. Statistical models are developed to map normal responses to their inferred meanings (what the response meant). These models are then used to match user answers to a 'what the user meant' concept and thereby provide suitable responses.
Natural speech has a lot of possibilities. It has alternative pronunciations, context-based pronunciations, and varied meanings of phrases. For instance, 'turn on' could mean either to arouse or to operate (as in to turn on the TV or turn on the charm). Building applications that understand such diversity is therefore a very complex process.
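
By contrast, the grammar constrained approach described earlier is simple enough to sketch. The example below assumes the speech has already been turned into text and only shows the matching step: the grammar is a hypothetical set of allowed answers, and an answer outside the grammar causes the question to be repeated. GrammarMatcher is an illustrative class, not part of JSAPI or Sphinx 4.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Illustrative grammar constrained matching: only answers listed in the
// grammar are accepted; anything else triggers a repeat of the question.
final class GrammarMatcher {
    private final Set<String> grammar = new HashSet<>();

    GrammarMatcher(String... allowedAnswers) {
        for (String answer : allowedAnswers) {
            grammar.add(normalize(answer));
        }
    }

    // Ignore case and surrounding whitespace when comparing answers.
    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the matched answer, or empty if it is not in the grammar.
    Optional<String> match(String utterance) {
        String normalized = normalize(utterance);
        return grammar.contains(normalized) ? Optional.of(normalized) : Optional.empty();
    }

    // Accepts a known answer, otherwise asks the question again.
    String respond(String question, String utterance) {
        return match(utterance)
                .map(answer -> "Accepted: " + answer)
                .orElse("Sorry, I did not understand. " + question);
    }
}
```

With a grammar of "yes" and "no", the answer " Yes " is accepted, while "maybe" leads to the question being asked again.
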

Two Java Speech API Implementations

A Synthesizer Implementation: FreeTTS

FreeTTS is an open source implementation of JSAPI written entirely in Java. The implementation is based on Flite, a speech synthesizer built at Carnegie Mellon University. FreeTTS is not a full implementation of JSAPI, since it does not implement javax.speech.recognition.

Features:

  • Standard download comes with three different voices: two male and one female
  • More voices (for US English) based on FestVox project can be imported
  • Supports MBROLA voices (MBROLA is a speech synthesizer from the MBROLA project)
  • Support for JSAPI (subset of javax.speech.synthesis only)
  • Performs better than Flite on a couple of platforms
The limitation of FreeTTS is that it does not render JSML speech markup. It processes JSML data, but discards the markup and generates speech as if the input were plain text. You can download FreeTTS from sourceforge.net.

A Recognizer Implementation: Sphinx 4

Sphinx 4 is a sophisticated speech recognition system written in Java. It was also developed at Carnegie Mellon University. Previous versions were written in C.

Features:

  • Supports a wide range of grammar formats including: Java Speech Grammar Format, SimpleWordListGrammar, LMGrammar, FSTGrammar
  • Continuous speech and grammar constrained recognition
  • Large vocabulary
  • Partial support for JSAPI
  • Complete support for Java Speech Grammar Format (JSGF)
  • Built using JDK 1.4
  • Allows the training of new acoustic models
  • Provides for the use of custom language models
  • Provides for the addition of custom dictionaries
Sphinx 4 can be downloaded from sourceforge.net.
