Reposted from: http://www.subfurther.com/blog/2010/12/13/from-ipod-library-to-pcm-samples-in-far-fewer-steps-than-were-previously-necessary/
In a July blog entry, I showed a gruesome technique for getting raw PCM samples of audio from your iPod library, by means of an easily-overlooked metadata attribute in the Media Player framework, along with the export functionality of AV Foundation. The AV Foundation stuff was the gruesome part: with no direct means for sample-level access to the song “asset”, it required an intermediate export to .m4a, which was a lossy re-encode if the source was of a different format (like MP3), and then a subsequent conversion to PCM with Core Audio.
Please feel free to forget all about that approach… except for the Core Media timescale stuff, which you’ll surely see again before too long.
iOS 4.1 added a number of new classes to AV Foundation (indeed, these were among the most significant 4.1 API diffs) to provide an API for sample-level access to media. The essential classes are AVAssetReader and AVAssetWriter. Using these, we can dramatically simplify and improve the iPod converter.
I have an example project, VTM_AViPodReader.zip (70 KB), that was originally meant to be part of my session at the Voices That Matter iPhone conference in Philadelphia, but didn’t come together in time. I’m going to skip the UI stuff in this blog, and leave you to a screenshot and a simple description: tap “choose song”, pick something from your iPod library, tap “done”, and tap “Convert”.
To do the conversion, we’ll use an AVAssetReader to read from the original song file, and an AVAssetWriter to perform the conversion and write to a new file in our application’s Documents directory.
Start, as in the previous example, by using the valueForProperty:MPMediaItemPropertyAssetURL attribute to get an NSURL representing the song in a format compatible with AV Foundation.
-(IBAction) convertTapped: (id) sender {
    // set up an AVAssetReader to read from the iPod Library
    NSURL *assetURL = [song valueForProperty:MPMediaItemPropertyAssetURL];
    AVURLAsset *songAsset =
        [AVURLAsset URLAssetWithURL:assetURL options:nil];
    NSError *assetError = nil;
    AVAssetReader *assetReader =
        [[AVAssetReader assetReaderWithAsset:songAsset
                                       error:&assetError]
         retain];
    if (assetError) {
        NSLog (@"error: %@", assetError);
        return;
    }
Sorry about the dangling retains. I’ll explain those in a little bit (and yes, you could use the alloc/init equivalents… I’m making a point here…). Anyways, it’s simple enough to take an AVAsset and make an AVAssetReader from it.
But what do you do with that? Contrary to what you might think, you don’t just read from it directly. Instead, you create another object, an AVAssetReaderOutput, which is able to produce samples from an AVAssetReader.
AVAssetReaderOutput *assetReaderOutput =
    [[AVAssetReaderAudioMixOutput
      assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
                                 audioSettings:nil]
     retain];
if (! [assetReader canAddOutput: assetReaderOutput]) {
    NSLog (@"can't add reader output... die!");
    return;
}
[assetReader addOutput: assetReaderOutput];
AVAssetReaderOutput is abstract. Since we’re only interested in the audio from this asset, an AVAssetReaderAudioMixOutput will suit us fine. For reading samples from an audio/video file, like a QuickTime movie, we’d want AVAssetReaderVideoCompositionOutput instead. An important point here is that we set audioSettings to nil to get a generic PCM output. The alternative is to provide an NSDictionary specifying the format you want to receive; I ended up doing that later in the output step, so the default PCM here will be fine.
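For the record, if you did want the reader output to hand you a specific flavor of PCM instead of the default, the alternative would be a settings dictionary along these lines. This is only a sketch of what such a dictionary might look like, using keys declared in AVAudioSettings.h; the particular values are assumptions for illustration, not something the sample project actually does:
// Hypothetical audioSettings dictionary for the reader output (this project
// passes nil instead). Keys are declared in AVAudioSettings.h.
NSDictionary *readerOutputSettings =
    [NSDictionary dictionaryWithObjectsAndKeys:
     [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
     [NSNumber numberWithFloat:44100.0], AVSampleRateKey,
     [NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
     [NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
     [NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
     [NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
     nil];
// ...which you would then pass as the audioSettings: parameter instead of nil.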
That’s all we need to worry about for now for reading from the song file. Now let’s start dealing with writing the converted file. We start by setting up an output file… the only important thing to know here is that AV Foundation won’t overwrite a file for you, so you should delete the exported .caf if it already exists.
NSArray *dirs = NSSearchPathForDirectoriesInDomains
    (NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectoryPath = [dirs objectAtIndex:0];
NSString *exportPath = [[documentsDirectoryPath
                         stringByAppendingPathComponent:EXPORT_NAME]
                        retain];
if ([[NSFileManager defaultManager] fileExistsAtPath:exportPath]) {
    [[NSFileManager defaultManager] removeItemAtPath:exportPath
                                               error:nil];
}
NSURL *exportURL = [NSURL fileURLWithPath:exportPath];
Yeah, there’s another spurious retain here. I’ll explain later. For now, let’s take exportURL and create the AVAssetWriter:
AVAssetWriter *assetWriter =
    [[AVAssetWriter assetWriterWithURL:exportURL
                              fileType:AVFileTypeCoreAudioFormat
                                 error:&assetError]
     retain];
if (assetError) {
    NSLog (@"error: %@", assetError);
    return;
}
OK, no sweat there, but the AVAssetWriter isn’t really the important part. Just as the reader is paired with “reader output” objects, so too is the writer connected to “writer input” objects, which is what we’ll be providing samples to, in order to write them to the filesystem.
To create the AVAssetWriterInput, we provide an NSDictionary describing the format and contents we want to create… this is analogous to a step we skipped earlier to specify the format we receive from the AVAssetReaderOutput. The dictionary keys are defined in AVAudioSettings.h and AVVideoSettings.h. You may find you need to look in these header files to look for the value types to provide for these keys, and in some cases, they’ll point you to the Core Audio header files. Trial and error led me to ultimately specify all of the fields that would be encountered in an AudioStreamBasicDescription, along with an AudioChannelLayout structure, which needs to be wrapped in an NSData in order to be added to an NSDictionary:
AudioChannelLayout channelLayout;
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *outputSettings =
    [NSDictionary dictionaryWithObjectsAndKeys:
     [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
     [NSNumber numberWithFloat:44100.0], AVSampleRateKey,
     [NSNumber numberWithInt:2], AVNumberOfChannelsKey,
     [NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)],
         AVChannelLayoutKey,
     [NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
     [NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
     [NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
     [NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
     nil];
With this dictionary describing 44.1 kHz, stereo, 16-bit, interleaved, little-endian integer PCM, we can create an AVAssetWriterInput to encode and write samples in this format.
AVAssetWriterInput *assetWriterInput =
    [[AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
                                        outputSettings:outputSettings]
     retain];
if ([assetWriter canAddInput:assetWriterInput]) {
    [assetWriter addInput:assetWriterInput];
} else {
    NSLog (@"can't add asset writer input... die!");
    return;
}
assetWriterInput.expectsMediaDataInRealTime = NO;
Notice that we’ve set the propertyassetWriterInput.expectsMediaDataInRealTime
to NO
. This will allow our transcode to run as fast as possible; of course, you’d set this to YES
if you were capturing or generating samples in real-time.
Now that our reader and writer are ready, we signal that we’re ready to start moving samples around:
[assetWriter startWriting];
[assetReader startReading];
AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
[assetWriter startSessionAtSourceTime: startTime];
These calls will allow us to start reading from the reader and writing to the writer… but just how do we do that? The key is the AVAssetReaderOutput method copyNextSampleBuffer. This call produces a Core Media CMSampleBufferRef, which is what we need to provide to the AVAssetWriterInput’s appendSampleBuffer: method.
But this is where it starts getting tricky. We can’t just drop into a while loop and start copying buffers over. We have to be explicitly signaled that the writer is able to accept input. We do this by providing a block to the asset writer input’s requestMediaDataWhenReadyOnQueue:usingBlock: method. Once we do this, our code will continue on, while the block will be called asynchronously by Grand Central Dispatch periodically. This explains the earlier retains… autoreleased variables created here in convertTapped: will soon be released, while we need them to still be around when the block is executed. So we need to take care that stuff we need is available inside the block: objects need to not be released, and local primitives need the __block modifier to get into the block.
__block UInt64 convertedByteCount = 0;
dispatch_queue_t mediaInputQueue =
    dispatch_queue_create("mediaInputQueue", NULL);
[assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue
                                         usingBlock: ^
{
The block will be called repeatedly by GCD, but we still need to make sure that the writer input is able to accept new samples.
while (assetWriterInput.readyForMoreMediaData) {
    CMSampleBufferRef nextBuffer =
        [assetReaderOutput copyNextSampleBuffer];
    if (nextBuffer) {
        // append buffer
        [assetWriterInput appendSampleBuffer: nextBuffer];
        // update ui
        convertedByteCount +=
            CMSampleBufferGetTotalSampleSize (nextBuffer);
        // release our reference; copyNextSampleBuffer follows the Copy rule,
        // and the writer input retains what it needs
        CFRelease (nextBuffer);
        NSNumber *convertedByteCountNumber =
            [NSNumber numberWithLong:convertedByteCount];
        [self performSelectorOnMainThread:@selector(updateSizeLabel:)
                               withObject:convertedByteCountNumber
                            waitUntilDone:NO];
What’s happening here is that while the writer input can accept more samples, we try to get a sample from the reader output. If we get one, appending it to the writer input is a one-line call. Updating the UI is another matter: since GCD has us running on an arbitrary thread, we have to use performSelectorOnMainThread: for any updates to the UI, such as updating a label with the current total byte count. We would also have to call out to the main thread to update the progress bar, currently unimplemented because I don’t have a good way to do it yet.
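Incidentally, updateSizeLabel: itself isn’t shown in this post. A minimal sketch might look like the following, assuming the view controller has a UILabel for the byte count; the sizeLabel name and the label text are my assumptions, not necessarily what the actual project uses:
// Hypothetical main-thread callback. Assumes a UILabel ivar named sizeLabel;
// the name and formatting are assumptions for illustration.
-(void) updateSizeLabel: (NSNumber*) convertedByteCountNumber {
    // safe to touch UIKit here; performSelectorOnMainThread: guarantees
    // this runs on the main thread
    sizeLabel.text = [NSString stringWithFormat: @"%llu bytes converted",
                      [convertedByteCountNumber unsignedLongLongValue]];
}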
If the writer is ever unable to accept new samples, we fall out of the while and the block, though GCD will continue to re-run the block until we explicitly stop the writer.
How do we know when to do that? When we don’t get a sample from copyNextSampleBuffer, which means we’ve read all the data from the reader.
    } else {
        // done!
        [assetWriterInput markAsFinished];
        [assetWriter finishWriting];
        [assetReader cancelReading];
        NSDictionary *outputFileAttributes =
            [[NSFileManager defaultManager]
             attributesOfItemAtPath:exportPath
             error:nil];
        NSLog (@"done. file size is %llu",
               [outputFileAttributes fileSize]);
        NSNumber *doneFileSize = [NSNumber numberWithLong:
                                  [outputFileAttributes fileSize]];
        [self performSelectorOnMainThread:@selector(updateCompletedSizeLabel:)
                               withObject:doneFileSize
                            waitUntilDone:NO];
        // release a lot of stuff
        [assetReader release];
        [assetReaderOutput release];
        [assetWriter release];
        [assetWriterInput release];
        [exportPath release];
        break;
    }
Reaching the finish state requires us to tell the writer to finish up the file by sending finish messages to both the writer input and the writer itself. After we update the UI (again, with the song-and-dance required to do so on the main thread), we release all the objects we had to retain in order that they would be available to the block.
Finally, for those of you copy-and-pasting at home, I think I owe you some close braces:
}
}];
NSLog (@"bottom of convertTapped:");
}
Once you’ve run this code on the device (it won’t work in the Simulator, which doesn’t have an iPod Library) and performed a conversion, you’ll have converted PCM in an exported.caf file in your app’s Documents directory. In theory, your app could do something interesting with this file, like representing it as a waveform, or running it through a Core Audio AUGraph to apply some interesting effects. Just to prove that we actually have performed the desired conversion, use the Xcode Organizer to open up the “iPod Reader” application and drag its “Application Data” to your Mac:
The exported folder will have a Documents folder, in which you should find exported.caf. Drag it over to QuickTime Player or any other application that can show you the format of the file you’ve produced:
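If you’d rather verify the result in code than by eyeballing it in QuickTime Player, a quick sanity check is to open the exported file with Core Audio’s ExtAudioFile API and log its data format. This isn’t part of the sample project; it’s just a sketch, and it assumes exportURL still points at the file we wrote and that AudioToolbox is linked in:
// Sanity-check sketch (not in the sample project): read back the data format
// of the exported file with Core Audio.
ExtAudioFileRef exportedFile = NULL;
OSStatus openErr = ExtAudioFileOpenURL ((CFURLRef) exportURL, &exportedFile);
if (openErr == noErr) {
    AudioStreamBasicDescription fileFormat;
    UInt32 propSize = sizeof (fileFormat);
    ExtAudioFileGetProperty (exportedFile,
                             kExtAudioFileProperty_FileDataFormat,
                             &propSize,
                             &fileFormat);
    NSLog (@"exported file: %.0f Hz, %u channels, %u bits per channel",
           fileFormat.mSampleRate,
           (unsigned int) fileFormat.mChannelsPerFrame,
           (unsigned int) fileFormat.mBitsPerChannel);
    ExtAudioFileDispose (exportedFile);
}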
Hopefully this is going to work for you. It worked for most Amazon and iTunes albums I threw at it, but I found I had an iTunes Plus album, Ashtray Rock by the Joel Plaskett Emergency, whose songs throw an inexplicable error when opened, so I can’t presume to fully understand this API just yet:
2010-12-12 15:28:18.939 VTM_AViPodReader[7666:307] *** Terminating app
due to uncaught exception 'NSInvalidArgumentException', reason:
'*** -[AVAssetReader initWithAsset:error:] invalid parameter not
satisfying: asset != ((void *)0)'
Still, the arrival of AVAssetReader and AVAssetWriter opens up a lot of new possibilities for audio and video apps on iOS. With the reader, you can inspect media samples, either in their original format or with a conversion to a form that suits your code. With the writer, you can supply samples that you receive by transcoding (as I’ve done here), by capture, or even samples you generate programmatically (such as a screen recorder class that just grabs the screen as often as possible and writes it to a movie file).
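As a small illustration of that first point, here’s a sketch (again, not part of the example project) of how you might peek at the raw PCM inside one of the CMSampleBufferRefs that copyNextSampleBuffer hands you, assuming the 16-bit interleaved stereo format requested earlier:
// Sketch only: read the first stereo frame out of a sample buffer. Assumes
// nextBuffer holds 16-bit interleaved stereo PCM, as requested above.
CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer (nextBuffer);
size_t lengthAtOffset = 0;
size_t totalLength = 0;
char *rawSamples = NULL;
OSStatus status = CMBlockBufferGetDataPointer (blockBuffer,
                                               0,
                                               &lengthAtOffset,
                                               &totalLength,
                                               &rawSamples);
if (status == kCMBlockBufferNoErr) {
    // interleaved 16-bit stereo: samples alternate left, right, left, right...
    SInt16 *samples = (SInt16 *) rawSamples;
    size_t sampleCount = lengthAtOffset / sizeof (SInt16);
    if (sampleCount >= 2) {
        NSLog (@"first frame: left %d, right %d", samples[0], samples[1]);
    }
}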