In a July blog entry, I showed a gruesome technique for getting raw PCM samples of audio from your iPod library, by means of an easily-overlooked metadata attribute in the Media Player framework, along with the export functionality of AV Foundation. The AV Foundation stuff was the gruesome part: with no direct means for sample-level access to the song “asset”, it required an intermediate export to .m4a, which was a lossy re-encode if the source was of a different format (like MP3), and then a subsequent conversion to PCM with Core Audio.
Please feel free to forget all about that approach… except for the Core Media timescale stuff, which you’ll surely see again before too long.
iOS 4.1 added a number of new classes to AV Foundation (indeed, these were among the most significant 4.1 API diffs) to provide an API for sample-level access to media. The essential classes are AVAssetReader and AVAssetWriter. Using these, we can dramatically simplify and improve the iPod converter.
I have an example project, VTM_AViPodReader.zip (70 KB), that was originally meant to be part of my session at the Voices That Matter iPhone conference in Philadelphia, but didn’t come together in time. I’m going to skip the UI stuff in this blog, and leave you to a screenshot and a simple description: tap “choose song”, pick something from your iPod library, tap “done”, and tap “Convert”.
To do the conversion, we’ll use an AVAssetReader to read from the original song file, and an AVAssetWriter to perform the conversion and write to a new file in our application’s Documents directory.
Start, as in the previous example, by calling valueForProperty: with MPMediaItemPropertyAssetURL to get an NSURL representing the song in a format compatible with AV Foundation.
-(IBAction) convertTapped: (id) sender {
// set up an AVAssetReader to read from the iPod Library
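NSURL *assetURL = [song valueForProperty:MPMediaItemPropertyAssetURL];
AVURLAsset *songAsset =
[AVURLAsset URLAssetWithURL:assetURL options:nil];
// AVAssetReader throws if it is handed a nil asset, so bail out early
if (! songAsset) {
NSLog (@"no readable asset for this song");
return;
}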
NSError *assetError = nil;
AVAssetReader *assetReader =
[[AVAssetReader assetReaderWithAsset:songAsset
error:&assetError]
retain];
if (! assetReader) {
NSLog (@"error: %@", assetError);
return;
}
Sorry about the dangling retains. I’ll explain those in a little bit (and yes, you could use the alloc/init equivalents… I’m making a point here…). Anyways, it’s simple enough to take an AVAsset and make an AVAssetReader from it.
But what do you do with that? Contrary to what you might think, you don’t just read from it directly. Instead, you create another object, an AVAssetReaderOutput, which is able to produce samples from an AVAssetReader.
AVAssetReaderOutput *assetReaderOutput =
[[AVAssetReaderAudioMixOutput
assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
audioSettings: nil]
retain];
if (! [assetReader canAddOutput: assetReaderOutput]) {
NSLog (@"can't add reader output... die!");
return;
}
[assetReader addOutput: assetReaderOutput];
AVAssetReaderOutput is abstract. Since we’re only interested in the audio from this asset, an AVAssetReaderAudioMixOutput will suit us fine. For reading the video frames of an audio/video file, like a QuickTime movie, we’d want AVAssetReaderVideoCompositionOutput instead. An important point here is that we set audioSettings to nil to get a generic PCM output. The alternative is to provide an NSDictionary specifying the format you want to receive; I ended up doing that later in the output step, so the default PCM here will be fine.
That’s all we need to worry about for now for reading from the song file. Now let’s start dealing with writing the converted file. We start by setting up an output file… the only important thing to know here is that AV Foundation won’t overwrite a file for you, so you should delete the exported.caf if it already exists.
NSArray *dirs = NSSearchPathForDirectoriesInDomains
(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectoryPath = [dirs objectAtIndex:0];
NSString *exportPath = [[documentsDirectoryPath
stringByAppendingPathComponent:EXPORT_NAME]
retain];
if ([[NSFileManager defaultManager] fileExistsAtPath:exportPath]) {
[[NSFileManager defaultManager] removeItemAtPath:exportPath
error:nil];
}
NSURL *exportURL = [NSURL fileURLWithPath:exportPath];
Yeah, there’s another spurious retain here. I’ll explain later. For now, let’s take exportURL and create the AVAssetWriter:
AVAssetWriter *assetWriter =
[[AVAssetWriter assetWriterWithURL:exportURL
fileType:AVFileTypeCoreAudioFormat
error:&assetError]
retain];
if (! assetWriter) {
NSLog (@"error: %@", assetError);
return;
}
OK, no sweat there, but the AVAssetWriter isn’t really the important part. Just as the reader is paired with “reader output” objects, so too is the writer connected to “writer input” objects, which is what we’ll be providing samples to, in order to write them to the filesystem.
To create the AVAssetWriterInput, we provide an NSDictionary describing the format and contents we want to create… this is analogous to a step we skipped earlier to specify the format we receive from the AVAssetReaderOutput. The dictionary keys are defined in AVAudioSettings.h and AVVideoSettings.h. You may find you need to look in these header files to find the value types to provide for these keys, and in some cases, they’ll point you to the Core Audio header files. Trial and error led me to ultimately specify all of the fields that would be encountered in an AudioStreamBasicDescription, along with an AudioChannelLayout structure, which needs to be wrapped in an NSData in order to be added to an NSDictionary:
AudioChannelLayout channelLayout;
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *outputSettings =
[NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
[NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)],
AVChannelLayoutKey,
[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
nil];
With this dictionary describing 44.1 kHz, stereo, 16-bit, interleaved, little-endian integer PCM, we can create an AVAssetWriterInput to encode and write samples in this format.
AVAssetWriterInput *assetWriterInput =
[[AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
outputSettings:outputSettings]
retain];
if ([assetWriter canAddInput:assetWriterInput]) {
[assetWriter addInput:assetWriterInput];
} else {
NSLog (@"can't add asset writer input... die!");
return;
}
assetWriterInput.expectsMediaDataInRealTime = NO;
Notice that we’ve set the property assetWriterInput.expectsMediaDataInRealTime to NO. This will allow our transcode to run as fast as possible; of course, you’d set this to YES if you were capturing or generating samples in real-time.
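As a sanity check on those output settings, here is a small plain-C sketch (it compiles as-is inside a .m file) of the data rate they imply. The PCMFormatSketch struct and both function names are made up for illustration and are not part of any Apple API; the fields simply mirror the values in the dictionary. An estimate like this is one possible basis for a progress indicator, assuming the song's duration is known up front.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: mirrors the values placed in outputSettings. */
typedef struct {
    double   sampleRate;      /* AVSampleRateKey, e.g. 44100.0 */
    uint32_t channels;        /* AVNumberOfChannelsKey, e.g. 2 */
    uint32_t bitsPerChannel;  /* AVLinearPCMBitDepthKey, e.g. 16 */
} PCMFormatSketch;

/* One frame holds one sample for every channel. */
uint32_t bytes_per_frame(PCMFormatSketch f) {
    assert(f.bitsPerChannel % 8 == 0);
    return f.channels * (f.bitsPerChannel / 8);
}

/* Approximate size of the sample data, in bytes, for a given duration. */
uint64_t estimated_pcm_bytes(PCMFormatSketch f, double seconds) {
    assert(seconds >= 0.0);
    uint64_t frames = (uint64_t)(seconds * f.sampleRate);
    return frames * bytes_per_frame(f);
}
```

For the settings used here, a frame is 4 bytes, so a three-minute song works out to 31,752,000 bytes of sample data; the .caf file adds a small amount of header on top of that.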
Now that our reader and writer are ready, we signal that we’re ready to start moving samples around:
[assetWriter startWriting];
[assetReader startReading];
AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
[assetWriter startSessionAtSourceTime: startTime];
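A quick aside on that CMTimeMake call: Core Media expresses a time as an integer value counted in units of 1/timescale seconds, so CMTimeMake (0, soundTrack.naturalTimeScale) is simply zero ticks at the track's own rate. The plain-C sketch below models just that arithmetic. RationalTimeSketch and its functions are hypothetical stand-ins written for illustration; the real CMTime struct carries additional fields and comes with its own Core Media functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the value/timescale pair that CMTime carries.
   This is NOT the real CMTime struct. */
typedef struct {
    int64_t value;      /* count of ticks */
    int32_t timescale;  /* ticks per second */
} RationalTimeSketch;

RationalTimeSketch rational_time_make(int64_t value, int32_t timescale) {
    assert(timescale > 0);
    RationalTimeSketch t = { value, timescale };
    return t;
}

/* Convert to floating-point seconds. */
double rational_time_seconds(RationalTimeSketch t) {
    return (double)t.value / (double)t.timescale;
}

/* Re-express a time in another timescale, truncating toward zero. */
RationalTimeSketch rational_time_convert(RationalTimeSketch t, int32_t newScale) {
    assert(newScale > 0);
    return rational_time_make(t.value * newScale / t.timescale, newScale);
}
```

With a timescale of 44100, a value of 22050 is half a second; re-expressed at a timescale of 600 it becomes 300 ticks.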
The startWriting, startReading, and startSessionAtSourceTime: calls will allow us to start reading from the reader and writing to the writer… but just how do we do that? The key is the AVAssetReaderOutput method copyNextSampleBuffer. This call produces a Core Media CMSampleBufferRef, which is what we need to provide to the AVAssetWriterInput’s appendSampleBuffer: method.
But this is where it starts getting tricky. We can’t just drop into a while loop and start copying buffers over. We have to be explicitly signaled that the writer is able to accept input. We do this by providing a block to the asset writer input’s requestMediaDataWhenReadyOnQueue:usingBlock: method. Once we do this, our code will continue on, while the block will be called asynchronously by Grand Central Dispatch periodically. This explains the earlier retains… autoreleased variables created here in convertTapped: will soon be released, while we need them to still be around when the block is executed. So we need to take care that stuff we need is available inside the block: objects need to not be released, and local primitives that the block modifies (like the byte counter below) need the __block modifier, since a block otherwise captures them as read-only copies.
__block UInt64 convertedByteCount = 0;
dispatch_queue_t mediaInputQueue =
dispatch_queue_create("mediaInputQueue", NULL);
[assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue
usingBlock: ^
{
The block will be called repeatedly by GCD, but we still need to make sure that the writer input is able to accept new samples.
while (assetWriterInput.readyForMoreMediaData) {
CMSampleBufferRef nextBuffer =
[assetReaderOutput copyNextSampleBuffer];
if (nextBuffer) {
// append buffer
[assetWriterInput appendSampleBuffer: nextBuffer];
// update ui
convertedByteCount +=
CMSampleBufferGetTotalSampleSize (nextBuffer);
// copyNextSampleBuffer follows the Create rule, so we own the buffer
CFRelease (nextBuffer);
NSNumber *convertedByteCountNumber =
[NSNumber numberWithUnsignedLongLong:convertedByteCount];
[self performSelectorOnMainThread:@selector(updateSizeLabel:)
withObject:convertedByteCountNumber
waitUntilDone:NO];
What’s happening here is that while the writer input can accept more samples, we try to get a sample buffer from the reader output. If we get one, appending it to the writer input is a one-line call. Because copyNextSampleBuffer follows the Create rule, the buffer is ours to release, so we CFRelease it once we’ve read its size; skip that and every buffer leaks. Updating the UI is another matter: since GCD has us running on an arbitrary thread, we have to use performSelectorOnMainThread:withObject:waitUntilDone: for any updates to the UI, such as updating a label with the current total byte-count. We would also have to call out to the main thread to update the progress bar, currently unimplemented because I don’t have a good way to do it yet.
If the writer is ever unable to accept new samples, we fall out of the while and the block, though GCD will continue to re-run the block until we explicitly stop the writer.
How do we know when to do that? When we don’t get a sample from copyNextSampleBuffer, which means we’ve read all the data from the reader.
} else {
// done!
[assetWriterInput markAsFinished];
[assetWriter finishWriting];
[assetReader cancelReading];
NSDictionary *outputFileAttributes =
[[NSFileManager defaultManager]
attributesOfItemAtPath:exportPath
error:nil];
NSLog (@"done. file size is %llu",
[outputFileAttributes fileSize]);
NSNumber *doneFileSize = [NSNumber numberWithUnsignedLongLong:
[outputFileAttributes fileSize]];
[self performSelectorOnMainThread:@selector(updateCompletedSizeLabel:)
withObject:doneFileSize
waitUntilDone:NO];
// release a lot of stuff
[assetReader release];
[assetReaderOutput release];
[assetWriter release];
[assetWriterInput release];
[exportPath release];
break;
}
Reaching the finish state requires us to tell the writer to finish up the file by sending finish messages to both the writer input and the writer itself. After we update the UI (again, with the song-and-dance required to do so on the main thread), we release all the objects we had to retain in order that they would be available to the block.
Finally, for those of you copy-and-pasting at home, I think I owe you some close braces:
}
}];
NSLog (@"bottom of convertTapped:");
}
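If the control flow of that block is hard to see through the AV Foundation types, here is the same shape in plain C, as a sketch only. Every type and function in it (ReaderSketch, WriterSketch, pump_once and friends) is a made-up stand-in, and the mock writer's capacity field is just an assumed way of simulating readyForMoreMediaData going to NO. It shows the three rules the real code follows: loop only while the writer is ready, release every buffer you copy, and finish exactly once when the source returns NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* All types here are hypothetical stand-ins, not AV Foundation API. */
typedef struct { size_t byteCount; } BufferSketch;

typedef struct {
    int buffersLeft;   /* buffers the mock source still holds */
    int liveBuffers;   /* copied but not yet released */
} ReaderSketch;

typedef struct {
    int    capacity;   /* appends accepted per callback before "not ready" */
    int    accepted;   /* appends made in the current callback */
    size_t totalBytes;
    bool   finished;
} WriterSketch;

/* Like copyNextSampleBuffer: the caller owns the result; NULL means no more data. */
BufferSketch *reader_copy_next(ReaderSketch *r) {
    if (r->buffersLeft == 0) return NULL;
    BufferSketch *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->byteCount = 4096;
    r->buffersLeft--;
    r->liveBuffers++;
    return b;
}

/* Plays the role of CFRelease for a copied buffer. */
void reader_release(ReaderSketch *r, BufferSketch *b) {
    assert(b != NULL);
    free(b);
    r->liveBuffers--;
}

bool writer_ready(const WriterSketch *w) {
    return !w->finished && w->accepted < w->capacity;
}

/* One invocation of the "media data requested" callback. */
void pump_once(ReaderSketch *r, WriterSketch *w) {
    w->accepted = 0;  /* the mock writer has drained since the last call */
    while (writer_ready(w)) {
        BufferSketch *b = reader_copy_next(r);
        if (b == NULL) {                /* source exhausted: finish once */
            w->finished = true;
            break;
        }
        w->totalBytes += b->byteCount;  /* the "append" */
        w->accepted++;
        reader_release(r, b);           /* balance the copy */
    }
}

/* Re-run the callback until the writer is finished; returns the call count. */
int pump_until_finished(ReaderSketch *r, WriterSketch *w) {
    assert(w->capacity > 0);
    int calls = 0;
    while (!w->finished) {
        pump_once(r, w);
        calls++;
    }
    return calls;
}
```

With 10 buffers in the mock source and a writer that takes 4 per callback, the callback runs three times, all 40960 bytes arrive, and no buffers are left unreleased.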
Once you’ve run this code on the device (it won’t work in the Simulator, which doesn’t have an iPod Library) and performed a conversion, you’ll have converted PCM in an exported.caf file in your app’s Documents directory. In theory, your app could do something interesting with this file, like representing it as a waveform, or running it through a Core Audio AUGraph to apply some interesting effects. Just to prove that we actually have performed the desired conversion, use the Xcode Organizer to open up the “iPod Reader” application and drag its “Application Data” to your Mac:
The exported folder will have a Documents folder, in which you should find exported.caf. Drag it over to QuickTime Player or any other application that can show you the format of the file you’ve produced:
Hopefully this is going to work for you. It worked for most Amazon and iTunes albums I threw at it, but I found I had an iTunes Plus album, Ashtray Rock by the Joel Plaskett Emergency, whose songs throw an inexplicable error when opened, so I can’t presume to fully understand this API just yet:
2010-12-12 15:28:18.939 VTM_AViPodReader[7666:307] *** Terminating app
due to uncaught exception 'NSInvalidArgumentException', reason:
'*** -[AVAssetReader initWithAsset:error:] invalid parameter not
satisfying: asset != ((void *)0)'
Still, the arrival of AVAssetReader and AVAssetWriter opens up a lot of new possibilities for audio and video apps on iOS. With the reader, you can inspect media samples, either in their original format or with a conversion to a form that suits your code. With the writer, you can supply samples that you receive by transcoding (as I’ve done here), by capture, or even samples you generate programmatically (such as a screen recorder class that just grabs the screen as often as possible and writes it to a movie file).
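As a taste of the waveform idea mentioned earlier, here is a hedged plain-C sketch that reduces 16-bit, little-endian, interleaved stereo samples (the format requested in the output settings) to one peak value per bucket, which is about what you'd need to draw a simple waveform overview. The function names are hypothetical, and it assumes you hand it raw sample bytes, for example the contents of a sample buffer, rather than a whole .caf file with its headers. Each 16-bit value is the signal's amplitude at one instant for one channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian signed 16-bit sample from two bytes. */
int16_t sample_from_le_bytes(const uint8_t *p) {
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

/* Write into peaks[] the largest absolute sample value (either channel)
   found in each equal-sized run of frames. The last bucket absorbs any
   leftover frames. Returns the number of buckets actually written. */
size_t waveform_peaks(const uint8_t *pcm, size_t frameCount,
                      uint16_t *peaks, size_t bucketCount) {
    assert(pcm != NULL && peaks != NULL);
    if (frameCount == 0 || bucketCount == 0) return 0;
    if (bucketCount > frameCount) bucketCount = frameCount;
    size_t framesPerBucket = frameCount / bucketCount;
    for (size_t b = 0; b < bucketCount; b++) {
        size_t first = b * framesPerBucket;
        size_t last = (b == bucketCount - 1) ? frameCount
                                             : first + framesPerBucket;
        uint16_t peak = 0;
        for (size_t f = first; f < last; f++) {
            for (size_t ch = 0; ch < 2; ch++) {
                /* interleaved stereo: L R L R ..., 2 bytes per sample */
                int32_t s = sample_from_le_bytes(pcm + (f * 2 + ch) * 2);
                uint16_t mag = (uint16_t)(s < 0 ? -s : s);
                if (mag > peak) peak = mag;
            }
        }
        peaks[b] = peak;
    }
    return bucketCount;
}
```

A drawing routine could then divide each peak by 32768 to get a bar height between 0 and 1.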
9 Comments
1. nick_king_equate replies at 25th January 2011 um 6:26 am :
Hi,
first off, thanks for this useful bit of code. Im still a bit fuzzy about blocks, but understand the rest surprisingly enough.
I am working on an app at present that uses the ipod media library. I have lifted a bit of your code to do this. Unfortunately when I run instruments over it it starts generating leaks despite what seems to be good releasing of the memory involved. This problem compounds for every successive access to the library until it gets an out of memory message.
I thought this was bound to be something stupid that I did, but after much banging of head against the screen I ran your original code through instruments and saw that it was having the same issues.
Is this something you are aware of and possibly a pitfall of using the library in this manner. Any direction you could give at this point would make my life a little better
Cheers
N.
2. cocell replies at 4th February 2011 um 2:19 pm :
// release a lot of stuff
[assetReader release];
[assetReaderOutput release];
[assetWriter release];
[assetWriterInput release];
[exportPath release];
Yeah, I notice that these aren’t being released at all and is causing major leaks and then crashes. Anyone know a fix to this? I’m trying right now, I thought it was my AU.
3. [Time code];… replies at 4th March 2011 um 11:11 pm :
[...] This got easier in iOS 4.1. Please forget everything you’ve read here and go read From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary [...]
4. cocell replies at 6th March 2011 um 12:49 pm :
I believe this is the right blog. I used the code from VTM_AViPodReader.zip, and I get Leaks for some reason I don’t know, And crash after converting new song from iPod library.
5. Hardik Nimavat replies at 25th April 2011 um 10:07 am :
Hello, After converting 4 to 5 songs app crashes due to memory issue. It gets memory level warning – 2. and crashes. Can you help me to solve this problem?
Thanks,
Hardik Nimavat
6. Ahsan replies at 2nd May 2011 um 5:52 am :
Hi there,
Can you help me out a bit?
I have read mp3 and I am able to get CMSampleBufferRef when I check AudioBuffer’s mData (the void pointer) by casting it in SInt16 I get some values. Does these values indicate amplitudes of the audio?
What does mData indicates w.r.t audio?
Thanks!
7. adilsherwani replies at 27th May 2011 um 5:45 pm :
Awesome article, thanks! A quick tip on the leaks: the CMSampleBufferRef is leaking as [assetReaderOutput copyNextSampleBuffer] follows the Create rule for memory management. A call to CFRelease after you’re done with it gets rid of the memory warnings!
8. [Time code];… replies at 28th May 2011 um 1:47 pm :
[...] added in 4.1 to do sample level access, AVAssetWriter and AVAssetReader. An earlier blog entry, From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary, exercises both of these, reading from an iPod Library song with an AVAssetReader and writing to a [...]
9. profiles.google.com/milburn.and… replies at 12th June 2011 um 8:44 am :
Greetings! Love your work. I’m getting errors with this approach (and related approaches) in iOS5 beta. I know we aren’t free to discuss these here, so I’ve started a thread on the dev forums here:
https://devforums.apple.com/thread/104433