Automatic Audio Segmentation: Segment Boundary and Structure Detection in Popular Music

by Ewald Peiszer ([firstname].peiszer@gmx.at)

Automatic audio segmentation aims at extracting information on a song's structure, i.e., segment boundaries, musical form and semantic labels such as verse, chorus, bridge, etc. This information can be used to create representative song excerpts or summaries, to facilitate browsing in large music collections, or to improve the results of subsequent music processing applications such as query by humming.

This thesis features algorithms that extract both segment boundaries and recurrent structures of everyday pop songs. Numerous experiments are carried out to improve performance. For evaluation a large corpus is used that comprises various musical genres. The evaluation process itself is discussed in detail and a reasonable and versatile evaluation system is presented and documented at length to promote a common basis that makes future results more comparable.

 

Algorithm

Phase 1: Boundary detection

This phase tries to detect the segment boundaries of a song, i.e., the time points where segments begin and end. The output of this phase is used as the input for the next phase.

The classic similarity matrix / novelty score approach has been used. In addition, various attempts to further improve the result have been carried out.
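
The similarity matrix / novelty score idea can be sketched roughly as follows. This is a minimal Python illustration, not the author's Matlab implementation: it builds a pairwise distance matrix from per-frame feature vectors, slides a checkerboard kernel along the main diagonal and picks local maxima of the resulting novelty score. The function names, the kernel half-width `half`, the threshold and the toy features are all assumptions made for this example.

```python
import math


def distance_matrix(features):
    """Pairwise Euclidean distances between frame feature vectors."""
    return [[math.dist(a, b) for b in features] for a in features]


def checkerboard_kernel(half):
    """(2*half x 2*half) kernel: +1 on the cross quadrants, -1 on the
    within quadrants. The signs suit a *distance* matrix, so a change
    between two homogeneous regions yields a high score."""
    size = 2 * half
    return [[1.0 if (i < half) != (j < half) else -1.0 for j in range(size)]
            for i in range(size)]


def novelty_score(dist, half):
    """Correlate the kernel with the matrix along its main diagonal.
    Frames too close to the start or end keep a score of 0."""
    n = len(dist)
    kernel = checkerboard_kernel(half)
    scores = [0.0] * n
    for t in range(half, n - half + 1):
        total = 0.0
        for i in range(2 * half):
            for j in range(2 * half):
                total += kernel[i][j] * dist[t - half + i][t - half + j]
        scores[t] = total
    return scores


def pick_peaks(scores, threshold):
    """Indices of local maxima above the threshold (boundary candidates)."""
    return [t for t in range(1, len(scores) - 1)
            if scores[t] > threshold
            and scores[t] > scores[t - 1]
            and scores[t] >= scores[t + 1]]


# Toy input: 8 frames of one "segment" followed by 8 frames of another.
features = [[0.0, 0.0]] * 8 + [[1.0, 1.0]] * 8
scores = novelty_score(distance_matrix(features), half=4)
boundaries = pick_peaks(scores, threshold=0.0)
print(boundaries)
```

On this toy input the only candidate is the frame where the second block starts. In practice the frame index would still have to be converted to a time point, and the kernel size and peak-picking rule strongly influence how many boundaries are reported.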

The figure below shows the novelty score plot of KC and the Sunshine Band: That’s the Way I Like It. Vertical dotted lines indicate groundtruth boundaries.

[Figure: novelty score plot]

Note that automatic boundary extraction worked very well for this song: all major segment boundaries have been found (red asterisks).


Phase 2: Structure detection

This phase tries to detect the form of the song: a label is assigned to each segment, and segments of the same type (verse, chorus, intro, etc.) get the same label. The labels themselves are single characters such as A, B, C, and thus carry no semantic meaning.

The songs have been fully annotated. Both sequence-unaware approaches and an approach that takes temporal information into account have been used. In addition, cluster validity indices have been employed to estimate the correct number of segment types for each song.
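
As an illustration of a sequence-unaware approach, the sketch below clusters one feature vector per segment with a plain k-means and maps the resulting clusters to the letters A, B, C in order of first appearance. It is a hedged example, not necessarily the clustering method used in the thesis: the farthest-point initialisation, the function names and the toy segment features are assumptions, and the number of clusters `k` is simply given here instead of being chosen by a cluster validity index.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster id per point."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Move every centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def form_labels(assignment):
    """Map cluster ids to letters A, B, C, ... in order of first appearance."""
    letters = {}
    out = []
    for cluster in assignment:
        if cluster not in letters:
            letters[cluster] = chr(ord("A") + len(letters))
        out.append(letters[cluster])
    return "".join(out)


# Toy input: one (hypothetical) 2-D feature vector per segment, in song order.
segments = [[0.0, 0.1], [1.0, 1.0], [0.1, 0.0], [1.1, 0.9], [5.0, 5.0]]
form = form_labels(kmeans(segments, k=3))
print(form)
```

Because the clustering ignores the order of the segments, two segments receive the same letter purely because their features are close, which is exactly what distinguishes this family of approaches from one that also uses temporal information.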

The figure shows the clustering result for the song segments of KC and the Sunshine Band: That’s the Way I Like It. Numbered circles indicate segments; crosses mark cluster centroids.

      

The source code of the algorithm, implemented in Matlab, can be obtained from the download section. For information on how to use it, please refer to the included README file (or ask the author if there are still problems).

 

Evaluation setup

A significant amount of time has been invested in working out a sound evaluation procedure. An easy-to-use evaluation program that produces both appealing and informative HTML reports has been designed and implemented.

You can download the source code from the download section at the bottom of this page.

 

A novel file format for audio segmentations (SegmXML) has been introduced. This format can contain information about hierarchical segments and alternative labels. See the example groundtruth file for Alanis Morissette: Thank You. A corresponding XML schema definition file for validating SegmXML files is available, too.

 

 

Selected evaluation reports

The evaluation reports of the following algorithm runs are available. Note that this table corresponds to Table 3.1 of the thesis. For an explanation of symbols and abbreviations used please refer to the thesis.

Parameter changed             Boundary extraction results
                              P               R               F
dS: Euclidean                 0.55 ± 0.038    0.78 ± 0.035    0.65
dS: cosine                    0.55 ± 0.039    0.76 ± 0.038    0.64
nH=8                          0.45 ± 0.04     0.77 ± 0.037    0.56
nH=12                         0.46 ± 0.043    0.7 ± 0.04      0.56
nH=16                         0.52 ± 0.044    0.64 ± 0.042    0.58
nH=18                         0.52 ± 0.043    0.62 ± 0.041    0.57
kC=48, nH=4                   0.49 ± 0.035    0.77 ± 0.031    0.6
kC=96, nH=8                   0.55 ± 0.038    0.78 ± 0.035    0.65
kC=128, nH=8                  0.59 ± 0.039    0.72 ± 0.039    0.65
kC=128, nH=14                 0.62 ± 0.038    0.67 ± 0.041    0.65
boundary removing heuristic   0.57 ± 0.038    0.75 ± 0.038    0.65
post processing               0.54 ± 0.038    0.78 ± 0.037    0.64
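
For readers unfamiliar with this kind of measure, the sketch below shows one common way to compute precision (P), recall (R) and F-measure (F) for a single song by matching detected boundaries to groundtruth boundaries within a tolerance window. It assumes that P, R and F in the table denote these standard measures; the 3-second tolerance, the greedy one-to-one matching and the function name are further assumptions made for illustration, and the exact definitions used in the thesis may differ.

```python
def boundary_scores(detected, groundtruth, tolerance=3.0):
    """Precision, recall and F-measure for boundary times given in seconds.

    A detected boundary counts as a hit if an unmatched groundtruth
    boundary lies within `tolerance` seconds; each groundtruth boundary
    can be matched at most once.
    """
    unmatched = sorted(groundtruth)
    hits = 0
    for t in sorted(detected):
        candidates = [g for g in unmatched if abs(g - t) <= tolerance]
        if candidates:
            # Consume the closest groundtruth boundary.
            unmatched.remove(min(candidates, key=lambda g: abs(g - t)))
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(groundtruth) if groundtruth else 0.0
    f_measure = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f_measure


# Hypothetical boundary times for one song (seconds).
detected = [10.0, 31.0, 50.0, 70.5]
groundtruth = [9.0, 30.0, 72.0, 90.0, 110.0]
p, r, f = boundary_scores(detected, groundtruth)
print(p, r, f)
```

Per-song values like these would then be averaged over the corpus, so the F column of a table of means need not equal the harmonic mean of the P and R columns.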

MFCC40 and CQT1 are the names of two parameter value sets that are explained in Table 3.2 of the thesis. MFCC40 uses Mel Frequency Cepstral Coefficients as features, whereas CQT1 employs the Constant Q Transform, with the fundamental frequency, maximal frequency and number of bins chosen such that the feature vectors model the semitones of seven octaves, each octave containing twelve notes.
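
The semitone layout described for CQT1 can be made concrete with a short calculation: twelve bins per octave over seven octaves give 84 bins, and neighbouring bin frequencies differ by a factor of 2^(1/12). The sketch below is only an illustration of that spacing; the lowest frequency of 32.70 Hz (the note C1) is an assumed value, not the one used in the thesis, and the function name is hypothetical.

```python
def semitone_frequencies(f_min, bins_per_octave=12, octaves=7):
    """Center frequencies (Hz) of semitone-spaced bins starting at f_min."""
    n_bins = bins_per_octave * octaves
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]


# Assumed lowest bin: C1 at roughly 32.70 Hz.
freqs = semitone_frequencies(32.70)
print(len(freqs))             # number of bins
print(freqs[12] / freqs[0])   # ratio across one octave
```

Each group of twelve consecutive bins spans exactly one octave, so a feature vector built on these bins has one component per semitone.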

 

Corpus

The corpus on which this work is based contains 94 songs of various genres (Rock, Pop, Hip-hop, R&B, etc.). Final algorithm runs are conducted on a 109-song corpus (these 94 songs plus the additional 15-song test set), which is the largest corpus used so far in this research field. The following table contains all songs of the corpus.

Unfortunately, the demonstration songs cannot be published due to copyright issues.

                                                                                                             
Title
Take on me 
SOS 
Waterloo 
Head Over Feet 
Thank You 
Rewind 
Intergalactic 
All I’ve Got To Do
All My Loving 
Devil In Her Heart 
Don’t Bother Me
Hold Me Tight 
I saw her standing there 
I Wanna Be Your Man 
It Won’t Be Long
Little Child 
Misery 
Money 
Not A Second Time 
Please Mister Postman 
Roll Over Beethoven 
Till There Was You 
You Really Got A Hold On Me 
Anna go to 
Please please me 
It’s Oh So Quiet
Cali To New York 
Hit Me Baby One More Time 
Oops I Did It Again 
Old Days 
Tubthumping
The Devil Is Dope 
Zombie 
Have You Ever Seen the Rain 
It’s no good
You Can Get It If You Really Want 
Suds & Soda 
Money For Nothing 
Stan 
Epic 
I Will Survive 
That’s the Way I Like It
Got The Life 
Don’t Mess With My Man
Like a virgin 
Into the Groove 
Sweet Dreams 
Bad 
Black Or White 
Northern Sky 
Smells like teen spirit 
Lonestar 
Wonderwall 
Always On My Mind 
Wandering star 
Kiss 
Ain’t It Time
Drive 
I Believe I Can Fly 
Creep 
Parallel Universe 
Whatta Man 
The Great White Buffalo 
How Much Is The Fish 
Crazy 
You’re Still The One
Stars 
Nothing compares to you 
Wannabe 
Trash 
A Day In The Life 
A Hard Days Night 
Being For The Benefit Of Mr. Kite 
Fixing A Hole 
Getting Better 
Good Morning Good Morning 
Help 
I Should Have Known Better 
If I Fell 
I’m Happy Just To Dance With You
Lovely Rita 
Lucy In The Sky With Diamonds 
Sgt. Peppers Lonely Hearts Club Band 
Sgt. Peppers Lonely Hearts reprise 
She’s Leaving Home
When I’m Sixty-Four
With A Little Help From My Friends 
Within You Without You 
Combat Rock 
Can You Feel It 
Words 
Message In A Bottle 
The Next Movement 
You Got Me 
Additional 15 songs ("test set")            
Stop The Rock 
Wo Ist Der Kaiser 
Eien no replica 
Magic in your eyes 
Jinsei konnamono 
Doukoku 
Kage-rou 
Cool Motion 
Feeling In My Heart 
Syounen no omoi 
Dream Magic 
Midarana kami no moushigo 
Born Too Slow 
Kinder 
Powerfrau 
 

Conclusions

The results of both boundary detection and structure detection are acceptable, but leave room for improvement.

The algorithm, however, proved to be robust in both a negative and a positive sense. On the negative side, many experiments conducted with various parameter settings and heuristics did not lead to a statistically significant improvement of the mean performance.

On the other hand, cross validation and the performance on an independent test set did not show any decline in performance either. Thus, the algorithm presented seems suitable to be applied to a wide range of songs and genres.

 

Downloads

  • Master’s thesis: Ewald Peiszer: Automatic Audio Segmentation: Segment Boundary and Structure Detection in Popular Music (pdf)
  • Poster (pdf)
  • Segmentation algorithm (Matlab) and evaluation system (Perl) are available on request from the author
  • Beats files (beat onsets of all songs extracted by Simon Dixon’s BeatRoot; plain text format)
  • Ground truth files (SegmXML file format). Please note that the groundtruth for the 36 files which originated from Jouni Paulus is not included. Please contact Jouni Paulus to obtain the groundtruth for these files.
 
last edited 02.08.2007 by Ewald Peiszer, 20.08.2007 by Thomas Lidy
