Contents[hide]
|
Many tasks in music classification can be characterized into a two-stage process: training classification models using labeled data and testing the models using new/unseen data. Therefore, we propose this "meta" task which includes various audio classification tasks that follow this Train/Test process. For MIREX 2011, five classification sub-tasks are included:
All five classification tasks were conducted in previous MIREX runs (please see ). This page presents the evaluation of these tasks, including the datasets as well as the submission rules and formats.
In the past we have use a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX "EvalFest" list. If you have an question or comment, simply include the task name in the subject heading.
This dataset requires algorithms to classify music audio according to the composer of the track (drawn from a collection of performances of a variety of classical music genres). The collection used at MIREX 2009 will be re-used.
Collection statistics:
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of US Pop music tracks). The MIREX 2007 Genre dataset will be re-used, which was drawn from the USPOP 2002 and USCRAP collections.
Collection statistics:
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of Latin popular and dance music, sourced from Brazil and hand labeled by music experts). Carlos Silla's (cns2 (at) kent (dot) ac (dot) uk) Latin popular and dance music dataset [1] will be re-used. This collection is likely to contain a greater number of styles of music that will be differentiated by rhythmic characteristics than the MIREX 2007 dataset.
Collection statistics:
This dataset requires algorithms to classify music audio according to the mood of the track (drawn from a collection of production msuic sourced from the APM collection [2]). The MIREX 2007 Mood Classification dataset [3] will be re-used.
Collection statistics:
For all datasets, participating algorithms will have to read audio in the following format:
This section first describes evaluation methods common to all the datasets, then specifies settings unique to each of the tasks.
Participating algorithms will be evaluated with 3-fold cross validation. For Artist Identification and Classical Composer Classification, album filtering (保证每张专辑的在训练和测试数据中都有)will be used the test and training splits, i.e. training and test sets will contain tracks from different albums; for US Pop Genre Classification(应该是对应Mixed genre classification) and Latin Genre Classification, artist filtering will be used the test and training splits, i.e. training and test sets will contain different artists.
The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
Classification accuracies will be tested for statistically significant differences using Friedman's Anova with Tukey-Kramer honestly significant difference (HSD) tests for multiple comparisons. This test will be used to rank the algorithms and to group them into sets of equivalent performance.
In addition computation times for feature extraction and training/classification will be measured.
The audio files to be used in these tasks will be specified in a simple ASCII list file. The formats for the list files are specified below:
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.I.e.
<example path and filename>
E.g.
/path/to/track1.wav/path/to/track2.wav...
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.
I.e.
<example path and filename>\t<class label>
E.g.
/path/to/track1.wav rock/path/to/track2.wav blues...
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.
I.e.
<example path and filename>
E.g.
/path/to/track1.wav/path/to/track2.wav...
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the artist label, again with no header line.
I.e.
<example path and filename>\t<class label>
E.g.
/path/to/track1.wav classical/path/to/track2.wav blues...
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.
Hence, participants should provide two executables or command line parameters for a single executable to run the two separate processes.
Executables will have to accept the paths to the aforementioned list files as command line parameters.
Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.
extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt Train.sh /path/to/scratch/folder /path/to/trainListFile.txt Classify.sh /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt
myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt myAlgo.sh -classify /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt
Multi-processor compute nodes will be used to run this task, however, we ask that submissions use no more than 4 cores (as we will be running a lot of submissions and will need to run some in parallel). Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 1, 2 or 4 thread/core configurations.
extractFeatures.sh -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt TrainAndClassify.sh -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
myAlgo.sh -extract -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt myAlgo.sh -TrainAndClassify -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
All submissions should include a README file including the following the information:
Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.
A hard limit of 24 hours will be imposed on feature extraction times.
A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.
Friday August 5th 2011
Friday August 26th 2011
name / email