OpenEars includes offline speech processing and more.
http://www.politepix.com/openears/
OpenEars is a shared-source iOS framework for iPhone voice recognition and speech synthesis (TTS). It lets you easily implement round-trip English language speech recognition and text-to-speech on the iPhone and iPad, uses the open source CMU Pocketsphinx, CMU Flite, and CMUCLMTK libraries, and is free to use in an iPhone or iPad app. It is the most popular offline framework for speech recognition and speech synthesis on iOS and has been featured in development books such as O'Reilly's Basic Sensors in iOS by Alasdair Allan and Cocos2d for iPhone 1 Game Development Cookbook by Nathan Burba.
Highly-accurate large-vocabulary recognition (that is, trying to recognize any word the user speaks out of many thousands of known words) is not yet a reality for local in-app processing on the iPhone given the hardware limitations of the platform; even Siri does its large-vocabulary recognition on the server side. However, Pocketsphinx (the open source voice recognition engine that OpenEars uses) is capable of local recognition on the iPhone of vocabularies with hundreds of words depending on the environment and other factors, and performs very well with command-and-control language models. The best part is that it uses no network connectivity because all processing occurs locally on the device.
OpenEars can:
To use OpenEars:
OK, now that you've finished laying the groundwork, you have to...wait, that's everything. You're ready to start using OpenEars. Give the sample app a spin to try out the features (the sample app uses ARC so you'll need a recent Xcode version) and then visit the Politepix interactive tutorial generator for a customized tutorial showing you exactly what code to add to your app for all of the different functionality of OpenEars.
If the steps on this page didn't work for you, you can get free support at the forums, read the FAQ, brush up on the documentation, or open a private email support incident at the Politepix shop. If you'd like to read the documentation, simply read onward.
There are a few basic concepts to understand about voice recognition and OpenEars that will make it easiest to create an app.
FliteController is the class that controls speech synthesis (TTS) in OpenEars.
Preparing to use the class:
To use FliteController, you need to have at least one Flite voice added to your project. When you added the "framework" folder of OpenEars to your app, you already imported a voice called Slt, so these instructions will use the Slt voice. You can get eight more free voices in OpenEarsExtras, available at https://bitbucket.org/Politepix/openearsextras
What to add to your header:
Add the following lines to your header (the .h file).

Under the imports at the very top:

#import <Slt/Slt.h>
#import <OpenEars/FliteController.h>

In the middle part where instance variables go:

FliteController *fliteController;
Slt *slt;

In the bottom part where class properties go:

@property (strong, nonatomic) FliteController *fliteController;
@property (strong, nonatomic) Slt *slt;
What to add to your implementation:
Add the following to your implementation (the .m file).

Under the @implementation keyword at the top:

@synthesize fliteController;
@synthesize slt;

Among the other methods of the class, add these lazy accessor methods for confident memory management of the objects:

- (FliteController *)fliteController {
	if (fliteController == nil) {
		fliteController = [[FliteController alloc] init];
	}
	return fliteController;
}

- (Slt *)slt {
	if (slt == nil) {
		slt = [[Slt alloc] init];
	}
	return slt;
}
How to use the class methods:
In the method where you want to call speech (to test this out, add it to your viewDidLoad method), add the following method call:

[self.fliteController say:@"A short statement" withVoice:self.slt];
- (void)say:(NSString *)statement withVoice:(FliteVoice *)voiceToUse;
This takes an NSString which is the word or phrase you want to say, and the FliteVoice to use to say the phrase. Usage Example:
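For instance, reusing the lazily instantiated fliteController and slt properties set up above:

[self.fliteController say:@"A short statement" withVoice:self.slt];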
There are a total of nine FliteVoices available for use with OpenEars. The Slt voice is the most popular one and it ships with OpenEars. The other eight voices can be downloaded as part of the OpenEarsExtras package available at the URL http://bitbucket.org/Politepix/openearsextras. To use them, just drag the desired downloaded voice's framework into your app, import its header at the top of your calling class (e.g. #import <Slt/Slt.h> or #import <Rms/Rms.h>), instantiate it as you would any other object, and pass the instantiated voice to this method.
- (Float32)fliteOutputLevel;
A read-only attribute that tells you the volume level of synthesized speech in progress. This is a UI hook. You can't read it on the main thread.
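A minimal sketch of reading it off the main thread (the levelMeter UIProgressView outlet is hypothetical and used only for illustration):

// Read the synthesis output level on a background queue, then hand the value
// to the main thread for the UI update (levelMeter is a hypothetical UIProgressView).
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
	Float32 outputLevel = [self.fliteController fliteOutputLevel];
	dispatch_async(dispatch_get_main_queue(), ^{
		self.levelMeter.progress = outputLevel;
	});
});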
duration_stretch changes the speed of the voice. It is on a scale of 0.0-2.0 where 1.0 is the default.
target_mean changes the pitch of the voice. It is on a scale of 0.0-2.0 where 1.0 is the default.
target_stddev changes the convolution of the voice. It is on a scale of 0.0-2.0 where 1.0 is the default.
Set userCanInterruptSpeech to TRUE in order to let new incoming human speech cut off synthesized speech in progress.
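For instance, a sketch that tweaks the voice before speaking, reusing the fliteController and slt properties from above (the specific values are just examples within the documented 0.0-2.0 range):

self.fliteController.duration_stretch = 1.2; // slightly slower than the 1.0 default
self.fliteController.target_mean = 1.2; // slightly higher-pitched than the 1.0 default
self.fliteController.userCanInterruptSpeech = TRUE; // let incoming human speech cut off TTS
[self.fliteController say:@"A short statement" withVoice:self.slt];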
LanguageModelGenerator is the class that generates the vocabulary that PocketsphinxController is able to understand.
What to add to your implementation:
Add the following to your implementation (the .m file).

Under the imports at the very top:

#import <OpenEars/LanguageModelGenerator.h>

Wherever you need to instantiate the language model generator, do it as follows:

LanguageModelGenerator *lmGenerator = [[LanguageModelGenerator alloc] init];
How to use the class methods:
In the method where you want to create your language model (for instance your viewDidLoad method), add the following method call (replacing the placeholders like "WORD" and "A PHRASE" with actual words and phrases you want to be able to recognize):

NSArray *words = [NSArray arrayWithObjects:@"WORD", @"STATEMENT", @"OTHER WORD", @"A PHRASE", nil];
NSString *name = @"NameIWantForMyLanguageModelFiles";
NSError *err = [lmGenerator generateLanguageModelFromArray:words withFilesNamed:name];

NSDictionary *languageGeneratorResults = nil;
NSString *lmPath = nil;
NSString *dicPath = nil;

if([err code] == noErr) {
	languageGeneratorResults = [err userInfo];
	lmPath = [languageGeneratorResults objectForKey:@"LMPath"];
	dicPath = [languageGeneratorResults objectForKey:@"DictionaryPath"];
} else {
	NSLog(@"Error: %@",[err localizedDescription]);
}

If you are using the default English-language model generation, it is a requirement to enter your words and phrases in all capital letters, since the model is generated against a dictionary in which the entries are capitalized (meaning that if the words in the array aren't capitalized, they will not match the dictionary and you will not have the widest variety of pronunciations understood for the word you are using).

If you need to create a fixed language model ahead of time instead of creating it dynamically in your app, just use this method (or generateLanguageModelFromTextFile:withFilesNamed:) to submit your full language model using the Simulator, and then use the Simulator documents folder script to get the language model and dictionary file out of the documents folder and add it to your app bundle, referencing it from there.
- (NSError *)generateLanguageModelFromArray:(NSArray *)languageModelArray withFilesNamed:(NSString *)fileName;
Generate a language model from an array of NSStrings which are the words and phrases you want PocketsphinxController or PocketsphinxController+RapidEars to understand. Putting a phrase in as a string makes it somewhat more probable that the phrase will be recognized as a phrase when spoken. fileName is the way you want the output files to be named, for instance if you enter "MyDynamicLanguageModel" you will receive files output to your Documents directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP. The error that this method returns contains the paths to the files that were created in a successful generation effort in its userInfo when NSError == noErr. The words and phrases in languageModelArray must be written with capital letters exclusively, for instance "word" must appear in the array as "WORD".
- (NSError *)generateLanguageModelFromTextFile:(NSString *)pathToTextFile withFilesNamed:(NSString *)fileName;
Generate a language model from a text file containing words and phrases you want PocketsphinxController to understand. The file should be formatted with every word or contiguous phrase on its own line with a line break afterwards. Putting a phrase in on its own line makes it somewhat more probable that the phrase will be recognized as a phrase when spoken. Give the correct full path to the text file as a string. fileName is the way you want the output files to be named, for instance if you enter "MyDynamicLanguageModel" you will receive files output to your Documents directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP. The error that this method returns contains the paths to the files that were created in a successful generation effort in its userInfo when NSError == noErr. The words and phrases in the text file must be written with capital letters exclusively, for instance "word" must appear in the file as "WORD".
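A sketch of the text-file variant, reusing the lmGenerator instance from above and assuming a hypothetical Vocabulary.txt with one capitalized word or phrase per line has been added to the app bundle:

NSString *pathToVocabulary = [[NSBundle mainBundle] pathForResource:@"Vocabulary" ofType:@"txt"]; // hypothetical bundled file
NSError *err = [lmGenerator generateLanguageModelFromTextFile:pathToVocabulary withFilesNamed:@"MyTextFileLanguageModel"];
if([err code] == noErr) {
	NSString *lmPath = [[err userInfo] objectForKey:@"LMPath"];
	NSString *dicPath = [[err userInfo] objectForKey:@"DictionaryPath"];
	NSLog(@"Generated language model at %@ and dictionary at %@", lmPath, dicPath);
} else {
	NSLog(@"Error: %@", [err localizedDescription]);
}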
Set this to TRUE to get verbose output.
Advanced: turn this off if the words in your input array or text file aren't in English and you are using a custom dictionary file.
Advanced: if you have your own pronunciation dictionary you want to use instead of CMU07a.dic, you can assign its full path to this property before running the language model generation.
OpenEarsEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars from anywhere in your app. You can create as many OpenEarsEventsObservers as you need and receive information using them simultaneously. All of the documentation for the use of OpenEarsEventsObserver is found in the section OpenEarsEventsObserverDelegate.
To use the OpenEarsEventsObserverDelegate methods, assign this delegate to the class hosting OpenEarsEventsObserver and then use the delegate methods documented under OpenEarsEventsObserverDelegate. There is a complete example of how to do this explained under the OpenEarsEventsObserverDelegate documentation.
OpenEarsLogging is a singleton which turns logging on or off for the entire framework. The type of logging relates to overall framework functionality such as the audio session and timing operations. Please turn OpenEarsLogging on for any issue you encounter. It will probably show the problem, but if not, you can share the log on the forum and get help.
+ (id)startOpenEarsLogging;
This just turns on logging. If you don't want logging in your session, don't send the startOpenEarsLogging message.
Example Usage:
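A minimal sketch (the header name <OpenEars/OpenEarsLogging.h> is an assumption, following the naming pattern of the other OpenEars headers):

// Before implementation, with your other imports (assumed header name):
#import <OpenEars/OpenEarsLogging.h>

// In implementation, before starting any other OpenEars functionality:
[OpenEarsLogging startOpenEarsLogging];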
PocketsphinxController is the class that controls local speech recognition in OpenEars.
Preparing to use the class:
To use PocketsphinxController, you need a language model and a phonetic dictionary for it. These files define which words PocketsphinxController is capable of recognizing. They are created above by using LanguageModelGenerator.
What to add to your header:
Add the following lines to your header (the .h file).

Under the imports at the very top:

#import <OpenEars/PocketsphinxController.h>

In the middle part where instance variables go:

PocketsphinxController *pocketsphinxController;

In the bottom part where class properties go:

@property (strong, nonatomic) PocketsphinxController *pocketsphinxController;
What to add to your implementation:
Add the following to your implementation (the .m file).

Under the @implementation keyword at the top:

@synthesize pocketsphinxController;

Among the other methods of the class, add this lazy accessor method for confident memory management of the object:

- (PocketsphinxController *)pocketsphinxController {
	if (pocketsphinxController == nil) {
		pocketsphinxController = [[PocketsphinxController alloc] init];
	}
	return pocketsphinxController;
}
How to use the class methods:
In the method where you want to recognize speech (to test this out, add it to your viewDidLoad method), add the following method call:

[self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO];
- (void)startListeningWithLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath languageModelIsJSGF:(BOOL)languageModelIsJSGF;
Start the speech recognition engine up. You provide the full paths to a language model and a dictionary file, which are created using LanguageModelGenerator.
- (void)stopListening;
Shut down the engine. You must do this before releasing a parent view controller that contains PocketsphinxController.
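For example (a sketch; where you shut the engine down depends on your app's structure), a view controller that owns the PocketsphinxController could stop listening before it goes away:

- (void)viewWillDisappear:(BOOL)animated {
	[super viewWillDisappear:animated];
	[self.pocketsphinxController stopListening]; // shut the engine down before this controller is released
}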
- (void)suspendRecognition;
Keep the engine going but stop listening to speech until resumeRecognition is called. Takes effect instantly.
- (void)resumeRecognition;
Resume listening for speech after suspendRecognition has been called.
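A sketch of toggling recognition from a hypothetical pair of button actions:

- (IBAction)pauseListening:(id)sender { // hypothetical action wired to a "pause" button
	[self.pocketsphinxController suspendRecognition];
}

- (IBAction)continueListening:(id)sender { // hypothetical action wired to a "resume" button
	[self.pocketsphinxController resumeRecognition];
}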
- (void)changeLanguageModelToFile:(NSString *)languageModelPathAsString withDictionary:(NSString *)dictionaryPathAsString;
Change from one language model to another. This lets you change which words you are listening for depending on the context in your app.
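A sketch of switching vocabularies at runtime; secondLmPath and secondDicPath are placeholders for the paths of a second model and dictionary generated earlier with LanguageModelGenerator:

[self.pocketsphinxController changeLanguageModelToFile:secondLmPath withDictionary:secondDicPath];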
- (Float32)pocketsphinxInputLevel;
Gives the volume of the incoming speech. This is a UI hook. You can't read it on the main thread or it will block.
- (void)runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath languageModelIsJSGF:(BOOL)languageModelIsJSGF;
You can use this to run recognition on an already-recorded WAV file for testing. The WAV file has to be 16-bit and 16000 samples per second.
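A sketch of testing recognition against a bundled recording, assuming a hypothetical test_recording.wav (16-bit, 16000 samples per second) has been added to the app bundle and reusing the lmPath and dicPath values from LanguageModelGenerator:

NSString *wavPath = [[NSBundle mainBundle] pathForResource:@"test_recording" ofType:@"wav"]; // hypothetical bundled file
[self.pocketsphinxController runRecognitionOnWavFileAtPath:wavPath usingLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO];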
This is how long PocketsphinxController should wait after speech ends to attempt to recognize speech. This defaults to .7 seconds.
Advanced: set this to TRUE to receive n-best results.
Advanced: the number of n-best results to return. This is a maximum number to return; if there are null hypotheses, fewer than this number will be returned.
How long to calibrate for. This can only be one of the values '1', '2', or '3'. Defaults to 1.
Turn on verbose output. Do this any time you encounter an issue and any time you need to report an issue on the forums.
By default, PocketsphinxController won't return a hypothesis if for some reason the hypothesis is null (this can happen if the perceived sound was just noise). If you need even empty hypotheses to be returned, you can set this to TRUE before starting PocketsphinxController.
OpenEarsEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars from anywhere in your app. You can create as many OpenEarsEventsObservers as you need and receive information using them simultaneously.
What to add to your header:
Add the following lines to your header (the .h file).

Under the imports at the very top:

#import <OpenEars/OpenEarsEventsObserver.h>

At the @interface declaration, add the OpenEarsEventsObserverDelegate protocol. An example of this for a view controller called ViewController would look like this:

@interface ViewController : UIViewController <OpenEarsEventsObserverDelegate> {

In the middle part where instance variables go:

OpenEarsEventsObserver *openEarsEventsObserver;

In the bottom part where class properties go:

@property (strong, nonatomic) OpenEarsEventsObserver *openEarsEventsObserver;
What to add to your implementation:
Add the following to your implementation (the .m file).

Under the @implementation keyword at the top:

@synthesize openEarsEventsObserver;

Among the other methods of the class, add this lazy accessor method for confident memory management of the object:

- (OpenEarsEventsObserver *)openEarsEventsObserver {
	if (openEarsEventsObserver == nil) {
		openEarsEventsObserver = [[OpenEarsEventsObserver alloc] init];
	}
	return openEarsEventsObserver;
}

Then, right before you start your first OpenEars functionality (for instance, right before your first self.fliteController say:withVoice: message or right before your first self.pocketsphinxController startListeningWithLanguageModelAtPath:dictionaryAtPath:languageModelIsJSGF: message), send this message:

[self.openEarsEventsObserver setDelegate:self];
How to use the class methods:
Add these delegate methods of OpenEarsEventsObserver to your class:

- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {
	NSLog(@"The received hypothesis is %@ with a score of %@ and an ID of %@", hypothesis, recognitionScore, utteranceID);
}

- (void) pocketsphinxDidStartCalibration {
	NSLog(@"Pocketsphinx calibration has started.");
}

- (void) pocketsphinxDidCompleteCalibration {
	NSLog(@"Pocketsphinx calibration is complete.");
}

- (void) pocketsphinxDidStartListening {
	NSLog(@"Pocketsphinx is now listening.");
}

- (void) pocketsphinxDidDetectSpeech {
	NSLog(@"Pocketsphinx has detected speech.");
}

- (void) pocketsphinxDidDetectFinishedSpeech {
	NSLog(@"Pocketsphinx has detected a period of silence, concluding an utterance.");
}

- (void) pocketsphinxDidStopListening {
	NSLog(@"Pocketsphinx has stopped listening.");
}

- (void) pocketsphinxDidSuspendRecognition {
	NSLog(@"Pocketsphinx has suspended recognition.");
}

- (void) pocketsphinxDidResumeRecognition {
	NSLog(@"Pocketsphinx has resumed recognition.");
}

- (void) pocketsphinxDidChangeLanguageModelToFile:(NSString *)newLanguageModelPathAsString andDictionary:(NSString *)newDictionaryPathAsString {
	NSLog(@"Pocketsphinx is now using the following language model: \n%@ and the following dictionary: %@",newLanguageModelPathAsString,newDictionaryPathAsString);
}

- (void) pocketSphinxContinuousSetupDidFail { // This can let you know that something went wrong with the recognition loop startup. Turn on OPENEARSLOGGING to learn why.
	NSLog(@"Setting up the continuous recognition loop has failed for some reason, please turn on OpenEarsLogging to learn more.");
}
The OpenEarsEventsObserverDelegate callbacks report the following events:

There was an interruption.
The interruption ended.
The input became unavailable.
The input became available again.
The audio route changed.
Pocketsphinx isn't listening yet but it started calibration.
Pocketsphinx isn't listening yet but calibration completed.
Pocketsphinx isn't listening yet but it has entered the main recognition loop.
Pocketsphinx is now listening.
Pocketsphinx heard speech and is about to process it.
Pocketsphinx detected a second of silence indicating the end of an utterance.
Pocketsphinx has a hypothesis.
Pocketsphinx has an n-best hypothesis dictionary.
Pocketsphinx has exited the continuous listening loop.
Pocketsphinx has not exited the continuous listening loop but it will not attempt recognition.
Pocketsphinx has not exited the continuous listening loop and it will now start attempting recognition again.
Pocketsphinx switched language models inline.
Some aspect of setting up the continuous loop failed; turn on OpenEarsLogging for more info.
Flite started speaking. You probably don't have to do anything about this.
Flite finished speaking. You probably don't have to do anything about this.