[Repost] VOICEBOX: Speech Processing Toolbox for MATLAB

Introduction

VOICEBOX is a speech processing toolbox consisting of MATLAB routines that are maintained by, and mostly written by, Mike Brookes, Department of Electrical & Electronic Engineering, Imperial College, Exhibition Road, London SW7 2BT, UK. Several of the routines require MATLAB V6.5 or above and need (normally slight) modification to work with earlier versions.

The routines are distributed as a zip archive under the terms of the GNU General Public License.

The routine voicebox.m contains various installation-dependent parameters which may need to be altered before using the toolbox. In particular, it contains a number of default directory paths indicating where temporary files should be created, where speech data normally resides, etc. You can override these defaults by editing voicebox.m directly or, more conveniently, by setting an environment variable VOICEBOX to the path of an initializing m-file. See the comments in voicebox.m for a fuller description.

For reading compressed SPHERE format files, you will need the SHORTEN program written by Tony Robinson and SoftSound Limited (www.softsound.com). The path to the shorten executable must be set in voicebox.m. Unfortunately, the current version does not work on 64-bit systems.

MATLAB doesn't really like Unicode fonts; some non-Unicode fonts containing IPA phonetic symbols, developed by SIL, are available here.

Please send any comments, suggestions, bug reports etc to mike.brookes@ic.ac.uk.


Contents


Audio File Input/Output: Read and write WAV and other speech file formats
Frequency Scales: Convert between Hz, Mel, Erb and MIDI frequency scales
Fourier/DCT/Hartley Transforms: Various related transforms
Random Number and Probability Distributions: Generate random vectors and noise signals
Vector Distances: Calculate distances between vector lists
Speech Analysis: Active level estimation, Spectrograms
LPC Analysis of Speech: Linear Predictive Coding routines
Speech Synthesis: Text-to-speech synthesis and glottal waveform models
Speech Enhancement: Spectral noise subtraction
Speech Coding: PCM coding, Vector quantisation
Speech Recognition: Front-end processing for recognition
Signal Processing: Miscellaneous signal processing functions
Information Theory: Routines for entropy calculation and symbol codes
Computer Vision: Routines for 3D rotation
Printing and Display Functions: Utilities for printing and graphics
Voicebox Parameters and System Interface: Get or set VOICEBOX and WINDOWS system parameters
Utility Functions: Miscellaneous utility functions


Audio File Input/Output

Routines are available to read and, in some cases, write a variety of file formats:

Read, Write (Suffix): Description
readwav, writewav (.wav): These routines allow an arbitrary number of channels and can deal with linear PCM (any precision up to 32 bits), A-law PCM, Mu-law PCM and Floating point formats. Large files can be read and written in small chunks.
readhtk, writehtk (.htk): Read and write waveform and parameter files used by Microsoft's Hidden Markov Toolkit.
readsfs (.sfs): Speech Filing System files from Mark Huckvale at UCL.
readsph (.sph): NIST Sphere format files (including TIMIT). Needs SHORTEN for compressed files.
readaif (.aif): AIFF format (Audio Interchange File Format) used by Mac users.
readcnx (.cnx): Read Connex database files (from BT).
readau (.au): Read AU audio files (from Sun).

Frequency Scale Conversion

From f, To f (Scale): Description
frq2bark, bark2frq (bark): The bark scale is based on critical bands and masking in the human ear.
frq2cent, cent2frq (cent): The cent scale is in increments of 0.01 semitones.
frq2erb, erb2frq (erb): The erb scale is based on the equivalent rectangular bandwidths of the human ear.
frq2mel, mel2frq (mel): The mel scale is based on the human perception of sinewave pitch.
frq2midi, midi2frq (midi): The MIDI standard specifies a numbering of semitones with middle C being 60. The routines can use the normal equal tempered scale or else the Pythagorean scale of just intonation. They can in addition output note names in a character format.
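
As a rough illustration of what conversions of this kind compute, the Python sketch below uses widely quoted textbook formulas: mel = 2595 log10(1 + f/700), and an equal tempered MIDI scale with note 69 at 440 Hz (which places middle C at 60). The function names mirror the table for readability only; they are hypothetical Python helpers, not the toolbox's MATLAB code, and the toolbox's own constants and options may differ.

```python
import math

def frq2mel(f_hz):
    """Hz to mel, assuming the common formula 2595*log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel2frq(mel):
    """Inverse of frq2mel under the same assumed formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def frq2midi(f_hz):
    """Hz to (fractional) MIDI note number, equal tempered, A4 = 440 Hz = note 69."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)

def midi2frq(note):
    """MIDI note number to Hz under the same assumption."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)

def cents_between(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents (1 cent = 0.01 semitone, 1200 per octave)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)
```

With these assumptions, midi2frq(60) gives middle C at roughly 261.6 Hz, and cents_between(440, 880) is one octave, i.e. 1200 cents.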

Fourier, DCT and Hartley Transforms

Forward, Inverse: Description
rfft, irfft: Forward and inverse discrete Fourier transforms on real data. Only the first half of the conjugate symmetric transform is generated. For even length data, the inverse routine is asymptotically twice as fast as the built-in MATLAB routine.
rsfft: Forward transform of real, symmetric data to give the first half only of the real, symmetric transform.
zoomfft: Calculate the discrete Fourier transform at an arbitrary set of linearly spaced frequencies. Can be used to zoom into a subset of the full frequency range.
rdct, irdct: Forward and inverse discrete cosine transform on real data.
rhartley, rhartley: Hartley transform on real data (forward and inverse transforms are the same).
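
The remark that the Hartley transform is its own inverse can be checked with a small Python sketch. It uses the unnormalised textbook definition H[k] = sum over n of x[n]·(cos(2πnk/N) + sin(2πnk/N)), for which applying the transform twice returns N times the input. This is a naive O(N²) illustration with hypothetical function names; the toolbox's scaling convention and algorithm may differ.

```python
import math

def hartley(x):
    """Naive discrete Hartley transform of a real sequence (unnormalised)."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for i, xi in enumerate(x):
            angle = 2.0 * math.pi * i * k / n
            # cas(angle) = cos(angle) + sin(angle) is the Hartley kernel
            acc += xi * (math.cos(angle) + math.sin(angle))
        out.append(acc)
    return out

def inverse_hartley(h):
    """Same transform again, divided by N to undo the scaling."""
    n = len(h)
    return [v / n for v in hartley(h)]
```

Because the kernel is real, a real input always gives a real output, unlike the Fourier transform.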

Random Numbers and Probability Distributions

  • Random Number Generation

    randvec: generates random vectors from gaussian or lognormal mixture distributions.
    randiscr: generates discrete random values with a specified probability vector
    stdspectrum: generates noise samples or filter coefficients for a variety of standard spectra including: A, B, C or BS468 weighting, USASI noise, POTS spectrum, LTASS, Internal masking noise (from SII spec)
    randfilt: generates filtered gaussian noise without any startup transients.
    rnsubset: selects a random subset of k elements from the numbers 1:n
  • Probability Density Functions

    lognmpdf: calculates the pdf of a lognormal distribution
    gaussmix: generates a multivariate Gaussian mixture model (GMM) from training data
    gaussmixd: determines marginal and conditional distributions from a GMM and can be used to perform inference on unobserved variables.
    gaussmixg: calculates the global mean, covariance matrix and mode of a GMM
    gaussmixm: estimates the mean and variance of the magnitude of a GMM vector variate
    gaussmixm_cart: calculates the CART regression tree used by gaussmixm
    gaussmixk: calculates the Kullback-Leibler Divergence, D(f||g), between two GMMs
    gaussmixp: calculates and plots full and marginal log probability and relative mixture probabilities from a GMM
    gaussmixt: multiplies two GMMs together
    v_chimv: approximates the mean and variance of a non-central chi distribution
    vonmisespdf: calculates the pdf of the Von Mises (circular normal) distribution
  • Miscellaneous

    berk2prob: convert Berkson matrix to probability
    gausprod: calculates the product of two gaussian distributions
    histndim: calculates an n-dimensional histogram (and plots a 2-D one)
    maxgauss: calculates the mean and variance of the maximum element of a gaussian vector
    prob2berk: convert probability matrix to Berksons

Vector Distance

disteusq: calculates the squared euclidean distance between all pairs of rows of two matrices.
distitar: calculates the Itakura spectral distances between sets of AR coefficients.
distitpf: calculates the Itakura spectral distances between power spectra.
distisar: calculates the Itakura-Saito spectral distances between sets of AR coefficients.
distispf: calculates the Itakura-Saito spectral distances between power spectra.
distchar: calculates the COSH spectral distances between sets of AR coefficients.
distchpf: calculates the COSH spectral distances between power spectra.
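
To make "all pairs of rows" concrete, here is a minimal Python sketch of a squared Euclidean distance matrix: entry [i][j] is the squared distance between row i of the first matrix and row j of the second. The function name is hypothetical and the code only illustrates the quantity being computed; it says nothing about the toolbox routine's options or calling convention.

```python
def pairwise_sq_euclidean(x_rows, y_rows):
    """Return d where d[i][j] = sum_k (x_rows[i][k] - y_rows[j][k])**2."""
    return [
        [sum((a - b) ** 2 for a, b in zip(x, y)) for y in y_rows]
        for x in x_rows
    ]

# Two 2-D points against two 2-D points gives a 2x2 distance matrix.
d = pairwise_sq_euclidean([[0, 0], [3, 4]], [[0, 0], [1, 1]])
```

Here d is [[0, 2], [25, 13]]: for instance, (3, 4) lies at squared distance 25 from the origin.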

Speech Analysis

activlev: calculates the active level of a speech segment according to ITU-T recommendation P.56.
activlevg: calculates the active level of a speech segment robustly to added noise
dypsa: estimates the glottal closure instants from the speech waveform.
enframe: can be used to split a signal up into frames. It can optionally apply a window to each frame.
correlogram: Calculates a 3D correlogram [slowly]
ewgrpdel: calculates the energy-weighted group delay waveform.
fram2wav: interpolates a sequence of frame-based values into a waveform
filtbankm: Transformation matrix for a linear/mel/erb/bark-spaced filterbank from dft output
fxpefac: PEFAC pitch tracker
fxrapt: is an implementation of the RAPT pitch tracker by David Talkin.
gammabank: Determine a bank of IIR gammatone filters
importsii: calculate the SII importance function
mos2pesq: Convert MOS values to PESQ speech quality scores
overlapadd: Join frames up using overlap-add processing. Commonly used with enframe.
pesq2mos: Convert PESQ speech quality scores to MOS values
phon2sone: Convert signal levels from phons to sones
psycdigit: experimental estimation of monotonic/unimodal psychometric function using TIDIGITS
psycest: experimental estimation of monotonic psychometric function
psycestu: experimental estimation of unimodal psychometric function
psychofunc: calculate psychometric function
v_sigma: estimate glottal opening and closure instants from the laryngograph/EGG waveform
snrseg: calculate segmental SNR and global SNR relative to a reference signal
sone2phon: Convert signal levels from sones to phons
soundspeed: gives the speed of sound as a function of temperature
spgrambw: draws a spectrogram with many options. See tutorial.
txalign: finds the best alignment (in a least squares sense) between two sets of time markers (e.g. glottal closure instants).
vadsohn: voice activity detector
v_ppmvu: Calculate the PPM, VU or EBU levels of a signal
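
Splitting a signal into overlapping frames, with an optional window applied to each frame, can be sketched in Python as follows. The function name, argument names and the choice to drop any incomplete final frame are assumptions made for this illustration; they are not taken from the toolbox, whose framing routine may handle the signal's tail differently.

```python
def split_into_frames(signal, frame_len, hop, window=None):
    """Cut `signal` into frames of `frame_len` samples, advancing `hop` samples each time.

    If `window` (a sequence of frame_len weights) is given, each frame is
    multiplied by it sample by sample. A trailing partial frame is discarded.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = list(signal[start:start + frame_len])
        if window is not None:
            frame = [s * w for s, w in zip(frame, window)]
        frames.append(frame)
    return frames

# Ten samples, frames of four samples with a hop of two (50% overlap).
frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
```

This yields four frames starting at samples 0, 2, 4 and 6. Overlap-add processing reverses the step by summing the frames back at the same offsets.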

LPC Analysis of Speech

lpcauto & lpccovar: perform linear predictive coding (LPC) analysis. The routines relating to LPC are described in more detail on another page. A large number of conversion routines are included for changing the form of the LPC coefficients (e.g. AR coefficients, reflection coefficients etc.): these are of the form lpcxx2yy where xx and yy denote the coefficient sets.
lpcrr2am: calculates LPC filters for all orders up to a given maximum.
lpcbwexp: performs bandwidth expansion on an LPC filter.
ccwarpf: performs frequency warping in the complex cepstrum domain.
lpcifilt: performs inverse filtering to estimate the glottal waveform from the speech signal and the lpc coefficients.
lpcrand: can be used to generate random, stable filters for testing purposes.

Speech Synthesis

sapisynth: Text-to-speech synthesis (TTS) of a string or matrix entries
glotros: Calculates the Rosenberg model of the glottal flow waveform
glotlf: Calculates the Liljencrants-Fant model of the glottal flow waveform

Speech Enhancement

estnoiseg: uses an MMSE algorithm to estimate the noise spectrum from a noisy speech signal that has been divided into frames.
estnoisem: uses a minimum-statistics algorithm to estimate the noise spectrum from a noisy speech signal that has been divided into frames.
specsub: performs speech enhancement using spectral subtraction
ssubmmse: performs speech enhancement using the MMSE or log MMSE criteria
ssubmmsev: performs speech enhancement using the MMSE or log MMSE criteria with VAD-based noise estimate

Speech Coding

lin2pcma: converts an audio waveform to 8-bit A-law PCM format
lin2pcmu: converts an audio waveform to 8-bit mu-law PCM format
pcma2lin: converts 8-bit A-law PCM to a waveform
pcmu2lin: converts 8-bit mu-law PCM to a waveform
kmeanlbg: vector quantisation using the LBG algorithm
kmeanhar: vector quantisation using the K-harmonic means algorithm
potsband: calculates a bandpass filter corresponding to the standard telephone passband.
v_kmeans: vector quantisation using the K-means algorithm
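
For readers unfamiliar with mu-law coding, the idea behind it is a logarithmic companding curve that gives small amplitudes finer resolution than large ones. The Python sketch below implements the continuous mu-law curve with μ = 255 and its inverse for samples in the range -1 to 1. It is not a bit-exact 8-bit encoder (practical telephony codecs use a segmented approximation of this curve), and the function names are hypothetical rather than taken from the toolbox.

```python
import math

MU = 255.0  # companding constant commonly used for 8-bit mu-law

def mulaw_compress(x, mu=MU):
    """Map a sample in [-1, 1] through sign(x) * ln(1 + mu*|x|) / ln(1 + mu)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress: sign(y) * ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

A quiet sample such as 0.01 is mapped to a value above 0.2, so uniform quantisation applied after compression spends more of its levels on low-amplitude signals.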

Speech Recognition

melcepst: implements a mel-cepstrum front end for a recogniser
melbankm: constructs a bandpass filterbank with mel-spaced centre frequencies
cep2pow: converts multivariate Gaussian means and covariances from the log power or cepstral domain to the power domain
pow2cep: converts multivariate Gaussian means and covariances from the power domain to the log power or cepstral domain
ldatrace: performs Linear Discriminant Analysis with optional constraints on the transform matrix

Signal Processing

ditherq: adds dither and quantizes a signal
dlyapsq: solves the discrete lyapunov equation using an efficient square root algorithm
filterbank: Apply a bank of IIR filters to a signal
maxfilt: performs running maximum filter
meansqtf: calculates the output power of a rational filter with a white noise input
momfilt: generate running moments from a signal
sigalign: align a clean reference with a noisy signal and find optimum gain
schmitt: passes a signal through a schmitt trigger having hysteresis
teager: calculate the Teager energy waveform
v_addnoise: add noise to a signal at a chosen SNR
v_findpeaks: finds the peaks in a signal
v_windows: generates window functions
v_windinfo: calculate window properties and figures of merit
zerocros: finds the zero crossings of a signal with interpolation

Information Theory

huffman: calculates optimum D-ary symbol code from a probability mass vector
entropy: calculates entropy and conditional entropy for discrete and continuous distributions

Computer Vision

imagehomog: Apply a homography transformation to an image with bilinear interpolation
polygonarea: Calculates the area of a polygon
polygonwind: Determines whether points are inside or outside a polygon
polygonxline: Determines where a line crosses a polygon
qrabs: Absolute value of a real quaternion
qrdivide: divide two real quaternions (or invert one)
qrdotdiv: elementwise division of two real quaternion arrays
qrdotmult: elementwise multiplication of two real quaternion arrays
qrmult: multiply two real quaternion arrays
qrpermute: permute the indices of a quaternion array
rectifyhomog: Apply rectifying homographies to a set of cameras to make their optical axes parallel
rot--2--: converts between the following representations of rotations: rotation matrix (ro), euler angles (eu), axis of rotation (ax), plane of rotation (pl), real quaternion vector (qr), real quaternion matrix (mr), complex quaternion vector (qc), complex quaternion matrix (mc). A detailed description is given on a separate page.
rotqrmean: Find the average of several rotation quaternions
rotqrvec: Apply a quaternion rotation to an array of 3D vectors
skew3d: Convert between vectors and skew symmetric matrices: 3x3 matrix <-> 3x1 vector and 4x4 Plucker matrix <-> 6x1 vector.
sphrharm: forward and inverse spherical harmonic transform using uniform, Gaussian or arbitrary inclination (elevation) grids and a uniform azimuth grid.
upolyhedron: Calculate the vertex coordinates and other characteristics of a uniform polyhedron

Printing and Display Functions

axisenlarge: enlarge the axes of a figure slightly
bitsprec: rounds values to a precision of n bits
cblabel: add a label to the colourbar
figbolden: makes the lines on a figure bold, enlarges font sizes and adjusts colours for printing clearly
fig2emf: optionally makes the lines on a figure bold and then saves in windows metafile format
frac2bin: converts numbers to fixed-point binary strings
lambda2rgb: convert wavelength to an RGB or XYZ triplet
sprintsi: prints a value with the correct standard SI multiplier (e.g. 2100 prints as 2.1 k)
texthvc: add text to plots with specified alignment and colour
tilefigs: arrange all figures on the screen
v_colormap: set and display colormap information including colormaps that print well in monochrome
xticksi: Label the x-axis tick marks using SI multipliers for large and small values. Particularly useful for logarithmic plots.
yticksi: Label the y-axis tick marks using SI multipliers for large and small values. Particularly useful for logarithmic plots.
  

Voicebox Parameters and System Interface

voicebox: contains a number of installation-dependent global parameters and is likely to need editing for each particular setup.
unixwhich: searches the WINDOWS system path for an executable (like the UNIX which command)
winenvar: Obtains WINDOWS environment variables

Utility Functions

atan2sc: arctangent function that returns the sin and cos of the angle
bitsprec: Rounds values to a precision of n bits
choosenk: all possible ways of choosing k elements out of the numbers 1:n without duplications
choosrnk: all possible ways of choosing k elements out of the numbers 1:n with duplications allowed
dlyapsq: Solve the discrete lyapunov equation
dualdiag: simultaneously diagonalises two matrices: this is useful in computing LDA or IMELDA transforms.
finishat: Estimate the finishing time of a long loop
fopenmkd: Equivalent to FOPEN() but creates any missing directories/folders
hostipinfo: Gives information about computer name and internet connections
logsum: calculates log(sum(exp(x))) without overflow problems.
minspane: Calculates the minimum spanning tree (a.k.a. shortest spanning tree) of a set of n-dimensional points
mintrace: Find a row permutation to minimize the trace of a matrix
m2htmlpwd: Create HTML documentation of matlab routines in the current directory
nearnonz: Replace zero elements by the nearest non-zero elements
permutes: all possible permutations of the numbers 1:n
quadpeak: find a quadratically-interpolated peak in a N-dimensional array by fitting a quadratic function to the array values
rotation: generates rotation matrices
zerotrim: removes from a matrix any trailing rows and columns that are all zero.
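
Computing log(sum(exp(x))) directly overflows as soon as any element of x is large, because exp(x) exceeds the floating point range. The usual remedy, sketched below in Python, is to subtract the maximum before exponentiating and add it back afterwards. This illustrates the standard technique under a hypothetical function name; it is not a transcription of the toolbox routine.

```python
import math

def logsumexp(values):
    """Return log(sum(exp(v) for v in values)) without overflow.

    Uses the identity log(sum(exp(v))) = m + log(sum(exp(v - m))) with
    m = max(values), so every exponent is <= 0 and exp() cannot overflow.
    """
    peak = max(values)
    if math.isinf(peak):
        # All values are -inf, or at least one is +inf: the result is the peak itself.
        return peak
    return peak + math.log(sum(math.exp(v - peak) for v in values))
```

For example, logsumexp([1000.0, 1000.0]) evaluates to 1000 + log(2), whereas a direct call to math.exp(1000.0) raises an OverflowError.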