Zafar's Audio Functions in Matlab for audio signal analysis.
Files:
zaf.m
: Matlab class with the audio functions.examples.ipynb
: Jupyter notebook with some examples.audio_file.wav
: audio file used for the examples.
See also:
- Zaf-Julia: Zafar's Audio Functions in Julia for audio signal analysis.
- Zaf-Python: Zafar's Audio Functions in Python for audio signal analysis.
This Matlab class implements a number of functions for audio signal analysis.
Simply copy the file zaf.m
in your working directory and you are good to go.
Functions:
stft
- Compute the short-time Fourier transform (STFT).istft
- Compute the inverse STFT.melfilterbank
- Compute the mel filterbank.melspectrogram
- Compute the mel spectrogram using a mel filterbank.mfcc
- Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank.cqtkernel
- Compute the constant-Q transform (CQT) kernel.cqtspectrogram
- Compute the CQT spectrogram using a CQT kernel.cqtchromagram
- Compute the CQT chromagram using a CQT kernel.dct
- Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).dst
- Compute the discrete sine transform (DST) using the FFT.mdct
- Compute the modified discrete cosine transform (MDCT) using the FFT.imdct
- Compute the inverse MDCT using the FFT.
Other:
sigplot
- Plot a signal in seconds.specshow
- Display a spectrogram in dB, seconds, and Hz.melspecshow
- Display a mel spectrogram in dB, seconds, and Hz.mfccshow
- Display MFCCs in seconds.cqtspecshow
- Display a CQT spectrogram in dB, seconds, and Hz.cqtchromshow
- Display a CQT chromagram in seconds.
Compute the short-time Fourier transform (STFT).
audio_stft = zaf.stft(audio_signal, window_function, step_length)
Inputs:
audio_signal: audio signal (number_samples,)
window_function: window function (window_length,)
step_length: step length in samples
Output:
audio_stft: audio STFT (window_length, number_frames)
% Read the audio signal with its sampling frequency in Hz, and average it over its channels
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
audio_signal = mean(audio_signal,2);
% Set the window duration in seconds (audio is stationary around 40 milliseconds)
window_duration = 0.04;
% Derive the window length in samples (use powers of 2 for faster FFT and constant overlap-add (COLA))
window_length = 2^nextpow2(window_duration*sampling_frequency);
% Compute the window function (periodic Hamming window for COLA)
window_function = hamming(window_length,'periodic');
% Set the step length in samples (half of the window length for COLA)
step_length = window_length/2;
% Compute the STFT
audio_stft = zaf.stft(audio_signal,window_function,step_length);
% Derive the magnitude spectrogram (without the DC component and the mirrored frequencies)
audio_spectrogram = abs(audio_stft(2:window_length/2+1,:));
% Display the spectrogram in dB, seconds, and Hz
number_samples = length(audio_signal);
xtick_step = 1;
ytick_step = 1000;
figure
zaf.specshow(audio_spectrogram, length(audio_signal), sampling_frequency, xtick_step, ytick_step);
title('Spectrogram (dB)')
Compute the inverse short-time Fourier transform (STFT).
audio_signal = zaf.istft(audio_stft, window_function, step_length)
Inputs:
audio_stft: audio STFT (window_length, number_frames)
window_function: window function (window_length,)
step_length: step length in samples
Output:
audio_signal: audio signal (number_samples,)
% Read the (stereo) audio signal with its sampling frequency in Hz
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
% Set the parameters for the STFT
window_length = 2^nextpow2(0.04*sampling_frequency);
window_function = hamming(window_length,'periodic');
step_length = window_length/2;
% Compute the STFTs for the left and right channels
audio_stft1 = zaf.stft(audio_signal(:,1),window_function,step_length);
audio_stft2 = zaf.stft(audio_signal(:,2),window_function,step_length);
% Derive the magnitude spectrograms (with DC component) for the left and right channels
audio_spectrogram1 = abs(audio_stft1(1:window_length/2+1,:));
audio_spectrogram2 = abs(audio_stft2(1:window_length/2+1,:));
% Estimate the time-frequency masks for the left and right channels for the center
center_mask1 = min(audio_spectrogram1,audio_spectrogram2)./audio_spectrogram1;
center_mask2 = min(audio_spectrogram1,audio_spectrogram2)./audio_spectrogram2;
% Derive the STFTs for the left and right channels for the center (with mirrored frequencies)
center_stft1 = [center_mask1;center_mask1(window_length/2:-1:2,:)].*audio_stft1;
center_stft2 = [center_mask2;center_mask2(window_length/2:-1:2,:)].*audio_stft2;
% Synthesize the signals for the left and right channels for the center
center_signal1 = zaf.istft(center_stft1,window_function,step_length);
center_signal2 = zaf.istft(center_stft2,window_function,step_length);
% Derive the final stereo center and sides signals
center_signal = [center_signal1,center_signal2];
center_signal = center_signal(1:length(audio_signal),:);
sides_signal = audio_signal-center_signal;
% Write the center and sides signals
audiowrite('center_signal.wav',center_signal,sampling_frequency);
audiowrite('sides_signal.wav',sides_signal,sampling_frequency);
% Display the original, center, and sides signals in seconds
xtick_step = 1;
figure
subplot(3,1,1), zaf.sigplot(audio_signal, sampling_frequency, xtick_step), ylim([-1,1]), title("Original signal")
subplot(3,1,2), zaf.sigplot(center_signal, sampling_frequency, xtick_step), ylim([-1,1]), title("Center signal")
subplot(3,1,3), zaf.sigplot(sides_signal, sampling_frequency, xtick_step), ylim([-1,1]), title("Sides signal")
Compute the mel filterbank.
mel_filterbank = zaf.melfilterbank(sampling_frequency, window_length, number_mels)
Inputs:
sampling_frequency: sampling frequency in Hz
window_length: window length for the Fourier analysis in samples
number_mels: number of mel filters
Output:
mel_filterbank: mel filterbank (sparse) (number_mels, number_frequencies)
% Compute the mel filterbank using some parameters
sampling_frequency = 44100;
window_length = 2^nextpow2(0.04*sampling_frequency);
number_mels = 128;
mel_filterbank = zaf.melfilterbank(sampling_frequency,window_length,number_mels);
% Display the mel filterbank
figure
imagesc(mel_filterbank)
axis xy
colormap(jet)
title('Mel filterbank')
xlabel('Frequency index')
ylabel('Mel index')
Compute the mel spectrogram using a mel filterbank.
mel_spectrogram = zaf.melspectrogram(audio_signal, window_function, step_length, mel_filterbank)
Inputs:
audio_signal: audio signal (number_samples,)
window_function: window function (window_length,)
step_length: step length in samples
mel_filterbank: mel filterbank (number_mels, number_frequencies)
Output:
mel_spectrogram: mel spectrogram (number_mels, number_times)
% Read the audio signal with its sampling frequency in Hz, and average it over its channels
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
audio_signal = mean(audio_signal,2);
% Set the parameters for the Fourier analysis
window_length = 2^nextpow2(0.04*sampling_frequency);
window_function = hamming(window_length,'periodic');
step_length = window_length/2;
% Compute the mel filterbank
number_mels = 128;
mel_filterbank = zaf.melfilterbank(sampling_frequency,window_length,number_mels);
% Compute the mel spectrogram using the filterbank
mel_spectrogram = zaf.melspectrogram(audio_signal,window_function,step_length,mel_filterbank);
% Display the mel spectrogram in in dB, seconds, and Hz
number_samples = length(audio_signal);
xtick_step = 1;
figure
zaf.melspecshow(mel_spectrogram, number_samples, sampling_frequency, window_length, xtick_step)
title('Mel spectrogram (dB)')
Compute the mel-frequency cepstral coefficients (MFCCs) using a mel filterbank.
audio_mfcc = zaf.mfcc(audio_signal, window_function, step_length, mel_filterbank, number_coefficients)
Inputs:
audio_signal: audio signal (number_samples,)
window_function: window function (window_length,)
step_length: step length in samples
mel_filterbank: mel filterbank (number_mels, number_frequencies)
number_coefficients: number of coefficients (without the 0th coefficient)
Output:
audio_mfcc: audio MFCCs (number_times, number_coefficients)
% Read the audio signal with its sampling frequency in Hz, and average it over its channels
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
audio_signal = mean(audio_signal,2);
% Set the parameters for the Fourier analysis
window_length = 2^nextpow2(0.04*sampling_frequency);
window_function = hamming(window_length,'periodic');
step_length = window_length/2;
% Compute the mel filterbank
number_mels = 40;
mel_filterbank = zaf.melfilterbank(sampling_frequency,window_length,number_mels);
% Compute the MFCCs using the filterbank
number_coefficients = 20;
audio_mfcc = zaf.mfcc(audio_signal,window_function,step_length,mel_filterbank,number_coefficients);
% Compute the delta and delta-delta MFCCs
audio_dmfcc = diff(audio_mfcc,1,2);
audio_ddmfcc = diff(audio_dmfcc,1,2);
% Compute the time resolution for the MFCCs in number of time frames per second (~ sampling frequency for the MFCCs)
time_resolution = sampling_frequency*size(audio_mfcc,2)/length(audio_signal);
% Display the MFCCs, delta MFCCs, and delta-delta MFCCs in seconds
xtick_step = 1;
number_samples = length(audio_signal);
figure
subplot(3,1,1), zaf.mfccshow(audio_mfcc,number_samples,sampling_frequency,xtick_step), title('MFCCs')
subplot(3,1,2), zaf.mfccshow(audio_dmfcc,number_samples,sampling_frequency,xtick_step), title('Delta MFCCs')
subplot(3,1,3), zaf.mfccshow(audio_ddmfcc,number_samples,sampling_frequency,xtick_step), title('Delta-delta MFCCs')
Compute the constant-Q transform (CQT) kernel.
cqt_kernel = zaf.cqtkernel(sampling_frequency, octave_resolution, minimum_frequency, maximum_frequency)
Inputs:
sampling_frequency: sampling frequency in Hz
octave_resolution: number of frequency channels per octave
minimum_frequency: minimum frequency in Hz
maximum_frequency: maximum frequency in Hz
Output:
cqt_kernel: CQT kernel (sparse) (number_frequencies, fft_length)
% Set the parameters for the CQT kernel
sampling_frequency = 44100;
octave_resolution = 24;
minimum_frequency = 55;
maximum_frequency = sampling_frequency/2;
% Compute the CQT kernel
cqt_kernel = zaf.cqtkernel(sampling_frequency,octave_resolution,minimum_frequency,maximum_frequency);
% Display the magnitude CQT kernel
figure
imagesc(abs(cqt_kernel))
axis xy
colormap(jet)
title('Magnitude CQT kernel')
xlabel('FFT index')
ylabel('CQT index')
Compute the constant-Q transform (CQT) spectrogram using a CQT kernel.
cqt_spectrogram = zaf.cqtspectrogram(audio_signal, sampling_frequency, time_resolution, cqt_kernel)
Inputs:
audio_signal: audio signal (number_samples,)
sampling_frequency: sampling frequency in Hz
time_resolution: number of time frames per second
cqt_kernel: CQT kernel (number_frequencies, fft_length)
Output:
cqt_spectrogram: CQT spectrogram in magnitude (number_frequencies, number_times)
% Read the audio signal with its sampling frequency in Hz, and average it over its channels
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
audio_signal = mean(audio_signal,2);
% Compute the CQT kernel
octave_resolution = 24;
minimum_frequency = 55;
maximum_frequency = 3520;
cqt_kernel = zaf.cqtkernel(sampling_frequency,octave_resolution,minimum_frequency,maximum_frequency);
% Compute the CQT spectrogram using the kernel
time_resolution = 25;
cqt_spectrogram = zaf.cqtspectrogram(audio_signal,sampling_frequency,time_resolution,cqt_kernel);
% Display the CQT spectrogram in dB, seconds, and Hz
xtick_step = 1;
figure
zaf.cqtspecshow(cqt_spectrogram,time_resolution,octave_resolution,minimum_frequency,xtick_step);
title('CQT spectrogram (dB)')
Compute the constant-Q transform (CQT) chromagram using a CQT kernel.
cqt_chromagram = zaf.cqtchromagram(audio_signal,sampling_frequency,time_resolution,octave_resolution,cqt_kernel)
Inputs:
audio_signal: audio signal (number_samples,)
sampling_frequency: sampling frequency in Hz
time_resolution: number of time frames per second
octave_resolution: frequency channels per octave
cqt_kernel: CQT kernel (number_frequencies, fft_length)
Output:
cqt_chromagram: CQT chromagram (number_chromas, number_times)
% Read the audio signal with its sampling frequency in Hz, and average it over its channels
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
audio_signal = mean(audio_signal,2);
% Compute the CQT kernel
octave_resolution = 24;
minimum_frequency = 55;
maximum_frequency = 3520;
cqt_kernel = zaf.cqtkernel(sampling_frequency,octave_resolution,minimum_frequency,maximum_frequency);
% Compute the CQT chromagram using the kernel
time_resolution = 25;
cqt_chromagram = zaf.cqtchromagram(audio_signal,sampling_frequency,time_resolution,octave_resolution,cqt_kernel);
% Display the CQT chromagram in seconds
xtick_step = 1;
figure
zaf.cqtchromshow(cqt_chromagram,time_resolution,xtick_step)
title('CQT chromagram')
Compute the discrete cosine transform (DCT) using the fast Fourier transform (FFT).
audio_dct = zaf.dct(audio_signal, dct_type)
Inputs:
audio_signal: audio signal (window_length,)
dct_type: dct type (1, 2, 3, or 4)
Output:
audio_dct: audio DCT (number_frequencies,)
% Read the audio signal with its sampling frequency in Hz, and average it over its channels
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
audio_signal = mean(audio_signal,2);
% Get an audio segment for a given window length
window_length = 1024;
audio_segment = audio_signal(1:window_length);
% Compute the DCT-I, II, III, and IV
audio_dct1 = zaf.dct(audio_segment,1);
audio_dct2 = zaf.dct(audio_segment,2);
audio_dct3 = zaf.dct(audio_segment,3);
audio_dct4 = zaf.dct(audio_segment,4);
% Compute MATLAB's DCT-I, II, III, and IV
matlab_dct1 = dct(audio_segment,'Type',1);
matlab_dct2 = dct(audio_segment,'Type',2);
matlab_dct3 = dct(audio_segment,'Type',3);
matlab_dct4 = dct(audio_segment,'Type',4);
% Plot the DCT-I, II, III, and IV, MATLAB's versions, and their differences
figure
subplot(3,4,1), plot(audio_dct1), xlim([0,window_length]), title('DCT-I')
subplot(3,4,2), plot(audio_dct2), xlim([0,window_length]), title('DCT-II')
subplot(3,4,3), plot(audio_dct3), xlim([0,window_length]), title('DCT-III')
subplot(3,4,4), plot(audio_dct4), xlim([0,window_length]), title('DCT-IV')
subplot(3,4,5), plot(matlab_dct1), xlim([0,window_length]), title('MATLAB''s DCT-I')
subplot(3,4,6), plot(matlab_dct2), xlim([0,window_length]), title('MATLAB''s DCT-II')
subplot(3,4,7), plot(matlab_dct3), xlim([0,window_length]), title('MATLAB''s DCT-III')
subplot(3,4,8), plot(matlab_dct4), xlim([0,window_length]), title('MATLAB''s DCT-IV')
subplot(3,4,9), plot(audio_dct1-matlab_dct1), xlim([0,window_length]), title('DCT-I - MATLAB''s DCT-I')
subplot(3,4,10), plot(audio_dct2-matlab_dct2), xlim([0,window_length]), title('DCT-II - MATLAB''s DCT-II')
subplot(3,4,11), plot(audio_dct3-matlab_dct3), xlim([0,window_length]), title('DCT-III - MATLAB''s DCT-III')
subplot(3,4,12), plot(audio_dct4-matlab_dct4), xlim([0,window_length]), title('DCT-IV - MATLAB''s DCT-IV')
Compute the discrete sine transform (DST) using the fast Fourier transform (FFT).
audio_dst = zaf.dst(audio_signal, dst_type)
Inputs:
audio_signal: audio signal (window_length,)
dst_type: DST type (1, 2, 3, or 4)
Output:
audio_dst: audio DST (number_frequencies,)
Example: Compute the 4 different DSTs and compare their respective inverses with the original audio.
% Read the audio signal with its sampling frequency in Hz, and average it over its channels
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
audio_signal = mean(audio_signal,2);
% Get an audio segment for a given window length
window_length = 1024;
audio_segment = audio_signal(1:window_length);
% Compute the DST-I, II, III, and IV
audio_dst1 = zaf.dst(audio_segment,1);
audio_dst2 = zaf.dst(audio_segment,2);
audio_dst3 = zaf.dst(audio_segment,3);
audio_dst4 = zaf.dst(audio_segment,4);
% Compute their respective inverses, i.e., DST-I, II, III, and IV
audio_idst1 = zaf.dst(audio_dst1,1);
audio_idst2 = zaf.dst(audio_dst2,3);
audio_idst3 = zaf.dst(audio_dst3,2);
audio_idst4 = zaf.dst(audio_dst4,4);
% Plot the DST-I, II, III, and IV, their respective inverses, and their differences with the original audio segment
figure
subplot(3,4,1), plot(audio_dst1), xlim([0,window_length]), title('DST-I')
subplot(3,4,2), plot(audio_dst2), xlim([0,window_length]), title('DST-II')
subplot(3,4,3), plot(audio_dst3), xlim([0,window_length]), title('DST-III')
subplot(3,4,4), plot(audio_dst4), xlim([0,window_length]), title('DST-IV')
subplot(3,4,5), plot(audio_idst1), xlim([0,window_length]), title('Inverse DST-I (DST-I)')
subplot(3,4,6), plot(audio_idst2), xlim([0,window_length]), title('Inverse DST-II (DST-III)')
subplot(3,4,7), plot(audio_idst3), xlim([0,window_length]), title('Inverse DST-III (DST-II)')
subplot(3,4,8), plot(audio_idst4), xlim([0,window_length]), title('Inverse DST-IV (DST-IV)')
subplot(3,4,9), plot(audio_idst1-audio_segment), xlim([0,window_length]), title('Inverse DST-I - audio segment')
subplot(3,4,10), plot(audio_idst2-audio_segment), xlim([0,window_length]), title('Inverse DST-II - audio segment')
subplot(3,4,11), plot(audio_idst3-audio_segment), xlim([0,window_length]), title('Inverse DST-III - audio segment')
subplot(3,4,12), plot(audio_idst4-audio_segment), xlim([0,window_length]), title('Inverse DST-IV - audio segment')
Compute the modified discrete cosine transform (MDCT) using the fast Fourier transform (FFT).
audio_mdct = zaf.mdct(audio_signal, window_function)
Inputs:
audio_signal: audio signal (number_samples,)
window_function: window function (window_length,)
Output:
audio_mdct: audio MDCT (number_frequencies, number_times)
% Read the audio signal with its sampling frequency in Hz, and average it over its channels
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
audio_signal = mean(audio_signal,2);
% Compute the Kaiser-Bessel-derived (KBD) window as used in the AC-3 audio coding format
window_length = 512;
alpha_value = 5;
window_function = kaiser(window_length/2+1,alpha_value*pi);
window_function2 = cumsum(window_function(1:window_length/2));
window_function = sqrt([window_function2; window_function2(window_length/2:-1:1)]/sum(window_function));
% Compute the MDCT
audio_mdct = zaf.mdct(audio_signal,window_function);
% Display the MDCT in dB, seconds, and Hz
number_samples = length(audio_signal);
xtick_step = 1;
ytick_step = 1000;
figure
zaf.specshow(abs(audio_mdct),number_samples,sampling_frequency,xtick_step,ytick_step)
title('MDCT (dB)')
Compute the inverse modified discrete cosine transform (MDCT) using the fast Fourier transform (FFT).
audio_signal = zaf.imdct(audio_mdct, window_function)
Inputs:
audio_mdct: audio MDCT (number_frequencies, number_times)
window_function: window function (window_length,)
Output:
audio_signal: audio signal (number_samples,)
% Read the audio signal with its sampling frequency in Hz, and average it over its channels
[audio_signal,sampling_frequency] = audioread('audio_file.wav');
audio_signal = mean(audio_signal,2);
% Compute the MDCT with a slope function as used in the Vorbis audio coding format
window_length = 2048;
window_function = sin(pi/2*(sin(pi/window_length*(0.5:window_length-0.5)').^2));
audio_mdct = zaf.mdct(audio_signal,window_function);
% Compute the inverse MDCT
audio_signal2 = zaf.imdct(audio_mdct,window_function);
audio_signal2 = audio_signal2(1:length(audio_signal));
% Compute the differences between the original signal and the resynthesized one
audio_differences = audio_signal-audio_signal2;
y_max = max(abs(audio_differences));
% Display the original and resynthesized signals, and their differences in seconds
xtick_step = 1;
figure
subplot(3,1,1), zaf.sigplot(audio_signal,sampling_frequency,xtick_step), ylim([-1,1]), title('Original signal')
subplot(3,1,2), zaf.sigplot(audio_signal2,sampling_frequency,xtick_step), ylim([-1,1]), title('Resyntesized signal')
subplot(3,1,3), zaf.sigplot(audio_differences,sampling_frequency,xtick_step), ylim([-y_max,y_max]), title('Original - resyntesized signal')
This Jupyter notebook shows some examples for the different functions of the Matlab class zaf
.
23 second audio excerpt from the song Que Pena Tanto Faz performed by Tamy.
- Zafar Rafii
- http://zafarrafii.com/
- CV
- GitHub
- Google Scholar