speechpy-fast-2.4/0000775000175000017500000000000013266015456015310 5ustar matthewmatthew00000000000000speechpy-fast-2.4/MANIFEST.in0000664000175000017500000000002313264176157017045 0ustar matthewmatthew00000000000000include README.rst speechpy-fast-2.4/setup.cfg0000664000175000017500000000012013266015456017122 0ustar matthewmatthew00000000000000[metadata] description-file = README.rst [egg_info] tag_build = tag_date = 0 speechpy-fast-2.4/setup.py0000775000175000017500000000123413266015406017020 0ustar matthewmatthew00000000000000from setuptools import setup, find_packages setup(name='speechpy-fast', version='2.4', description='A fork of the python package for extracting speech features.', author='Amirsina Torfi, Matthew Scholefield', author_email='matthew331199@gmail.com', url='https://github.com/matthewscholefield/speechpy', download_url = 'https://github.com/matthewscholefield/speechpy/archive/2.4.zip', packages=find_packages(exclude=('tests', 'docs')), include_package_data=True, install_requires=[ 'scipy', 'numpy', 'backports.functools_lru_cache;python_version<"3.2"' ], zip_safe=False) speechpy-fast-2.4/README.rst0000664000175000017500000003166713264176157017020 0ustar matthewmatthew00000000000000.. image:: _images/speechpy_logo.gif :target: https://github.com/astorfi/speech_feature_extraction/blob/master/images/speechpy_logo.gif ============================================ `SpeechPy Official Project Documentation`_ ============================================ .. image:: https://travis-ci.org/astorfi/speechpy.svg?branch=master :target: https://travis-ci.org/astorfi/speechpy .. image:: https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat :target: https://github.com/astorfi/speechpy/pulls .. image:: https://coveralls.io/repos/github/astorfi/speechpy/badge.svg?branch=master :target: https://coveralls.io/github/astorfi/speechpy?branch=master .. image:: https://codecov.io/gh/astorfi/speechpy/branch/master/graph/badge.svg :target: https://codecov.io/gh/astorfi/speechpy .. image:: https://badge.fury.io/py/speechpy.svg :target: https://badge.fury.io/py/speechpy .. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1066373.svg :target: https://doi.org/10.5281/zenodo.1066373 .. _SpeechPy Official Project Documentation: http://speechpy.readthedocs.io This library provides most frequent used speech features including MFCCs and filterbank energies alongside with the log-energy of filterbanks. If you are interested to see what are MFCCs and how they are generated please refer to this `wiki `_ page. .. image:: _images/speech.gif ==================================== Documentation ==================================== Please refer to the following links for further informations: `SpeechPy Official Project Documentation`_ `Technical Report`_ .. _SpeechPy Official Project Documentation: http://speechpy.readthedocs.io .. _Technical Report: https://arxiv.org/abs/1803.01094 ==================================== Which Python versions are supported ==================================== Currently, the package has been tested and verified using Python ``2.7``, ``3.4`` and ``3.5``. ======== Citation ======== If you used this package, please kindly cite it as follows: .. code:: bash @article{torfi2018speechpy, title={SpeechPy-A Library for Speech Processing and Recognition}, author={Torfi, Amirsina}, journal={arXiv preprint arXiv:1803.01094}, year={2018} } =============== How to Install? 
=============== There are two possible ways for installation of this package: local installation and PyPi. ~~~~~~~~~~~~~~~~~~~ Local Installation ~~~~~~~~~~~~~~~~~~~ For local installation at first the repository must be cloned:: git clone https://github.com/astorfi/speech_feature_extraction.git After cloning the reposity, root to the repository directory then execute:: python setup.py develop ~~~~~ Pypi ~~~~~ The package is available on PyPi. For direct installation simply execute the following: .. code-block:: shell pip install speechpy ============================= What Features are supported? ============================= - Mel Frequency Cepstral Coefficients(MFCCs) - Filterbank Energies - Log Filterbank Energies Please refer to `SpeechPy Official Project Documentation`_ for details about the supported features. ~~~~~~~~~~~~~~ MFCC Features ~~~~~~~~~~~~~~ |pic1| |pic2| .. |pic1| image:: _images/Speech_GIF.gif :width: 45% .. |pic2| image:: _images/pipeline.jpg :width: 45% The supported attributes for generating MFCC features can be seen by investigating the related function: .. code-block:: python def mfcc(signal, sampling_frequency, frame_length=0.020, frame_stride=0.01,num_cepstral =13, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None, dc_elimination=True): """Compute MFCC features from an audio signal. :param signal: the audio signal from which to compute features. Should be an N x 1 array :param sampling_frequency: the sampling frequency of the signal we are working with. :param frame_length: the length of each frame in seconds. Default is 0.020s :param frame_stride: the step between successive frames in seconds. Default is 0.02s (means no overlap) :param num_filters: the number of filters in the filterbank, default 40. :param fft_length: number of FFT points. Default is 512. :param low_frequency: lowest band edge of mel filters. In Hz, default is 0. :param high_frequency: highest band edge of mel filters. In Hz, default is samplerate/2 :param num_cepstral: Number of cepstral coefficients. :param dc_elimination: hIf the first dc component should be eliminated or not. :returns: A numpy array of size (num_frames x num_cepstral) containing mfcc features. """ ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Filterbank Energy Features ~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python def mfe(signal, sampling_frequency, frame_length=0.020, frame_stride=0.01, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None): """Compute Mel-filterbank energy features from an audio signal. :param signal: the audio signal from which to compute features. Should be an N x 1 array :param sampling_frequency: the sampling frequency of the signal we are working with. :param frame_length: the length of each frame in seconds. Default is 0.020s :param frame_stride: the step between successive frames in seconds. Default is 0.02s (means no overlap) :param num_filters: the number of filters in the filterbank, default 40. :param fft_length: number of FFT points. Default is 512. :param low_frequency: lowest band edge of mel filters. In Hz, default is 0. :param high_frequency: highest band edge of mel filters. In Hz, default is samplerate/2 :returns: features: the energy of fiterbank: num_frames x num_filters frame_energies: the energy of each frame: num_frames x 1 """ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ log - Filterbank Energy Features ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The attributes for ``log_filterbank energies`` are the same for ``filterbank energies`` too. .. 
code-block:: python def lmfe(signal, sampling_frequency, frame_length=0.020, frame_stride=0.01, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None): """Compute log Mel-filterbank energy features from an audio signal. :param signal: the audio signal from which to compute features. Should be an N x 1 array :param sampling_frequency: the sampling frequency of the signal we are working with. :param frame_length: the length of each frame in seconds. Default is 0.020s :param frame_stride: the step between successive frames in seconds. Default is 0.02s (means no overlap) :param num_filters: the number of filters in the filterbank, default 40. :param fft_length: number of FFT points. Default is 512. :param low_frequency: lowest band edge of mel filters. In Hz, default is 0. :param high_frequency: highest band edge of mel filters. In Hz, default is samplerate/2 :returns: features: the energy of fiterbank: num_frames x num_filters frame_log_energies: the log energy of each frame: num_frames x 1 """ ~~~~~~~~~~~~ Stack Frames ~~~~~~~~~~~~ In ``Stack_Frames`` function, the stack of frames will be generated from the signal. .. code-block:: python def stack_frames(sig, sampling_frequency, frame_length=0.020, frame_stride=0.020, Filter=lambda x: numpy.ones((x,)), zero_padding=True): """Frame a signal into overlapping frames. :param sig: The audio signal to frame of size (N,). :param sampling_frequency: The sampling frequency of the signal. :param frame_length: The length of the frame in second. :param frame_stride: The stride between frames. :param Filter: The time-domain filter for applying to each frame. By default it is one so nothing will be changed. :param zero_padding: If the samples is not a multiple of frame_length(number of frames sample), zero padding will be done for generating last frame. :returns: Array of frames. size: number_of_frames x frame_len. """ ================= Post Processing ================= There are some post-processing operation that are supported in ``speechpy``. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Global cepstral mean and variance normalization (CMVN) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This function performs global cepstral mean and variance normalization (CMVN) to remove the channel effects. The code assumes that there is one observation per row. .. code-block:: python def cmvn(vec, variance_normalization=False): """ This function is aimed to perform global ``cepstral mean and variance normalization`` (CMVN) on input feature vector "vec". The code assumes that there is one observation per row. :param: vec: input feature matrix (size:(num_observation,num_features)) variance_normalization: If the variance normilization should be performed or not. :return: The mean(or mean+variance) normalized feature vector. """ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Local cepstral mean and variance normalization (CMVN) over a sliding window ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This function performs local cepstral mean and variance normalization (CMVN) over sliding windows. The code assumes that there is one observation per row. .. code-block:: python def cmvnw(vec, win_size=301, variance_normalization=False): """ This function is aimed to perform local cepstral mean and variance normalization on a sliding window. (CMVN) on input feature vector "vec". The code assumes that there is one observation per row. 
    :param vec: input feature matrix (size:(num_observation,num_features))
         win_size: The size of the sliding window for local normalization; it should be odd.
         Default=301, which is about 3s at a 100 Hz frame rate (i.e. a 10 ms frame stride).
         variance_normalization: Whether variance normalization should be performed or not.
    :return: The mean (or mean+variance) normalized feature vector.
    """

~~~~~~~~~~~~
Test Example
~~~~~~~~~~~~

The test example in ``test/test.py`` is shown below:

.. code-block:: python

    import scipy.io.wavfile as wav
    import numpy as np
    import speechpy
    import os

    file_name = os.path.join(os.path.dirname(os.path.abspath(__file__)),'Alesis-Sanctuary-QCard-AcoustcBas-C2.wav')
    fs, signal = wav.read(file_name)
    signal = signal[:,0]

    # Example of pre-emphasizing.
    signal_preemphasized = speechpy.processing.preemphasis(signal, cof=0.98)

    # Example of stacking frames
    frames = speechpy.processing.stack_frames(signal, sampling_frequency=fs, frame_length=0.020, frame_stride=0.01,
             filter=lambda x: np.ones((x,)), zero_padding=True)

    # Example of extracting the power spectrum
    power_spectrum = speechpy.processing.power_spectrum(frames, fft_points=512)
    print('power spectrum shape=', power_spectrum.shape)

    ############# Extract MFCC features #############
    mfcc = speechpy.feature.mfcc(signal, sampling_frequency=fs, frame_length=0.020, frame_stride=0.01,
             num_filters=40, fft_length=512, low_frequency=0, high_frequency=None)

    mfcc_cmvn = speechpy.processing.cmvnw(mfcc, win_size=301, variance_normalization=True)
    print('mfcc(mean + variance normalized) feature shape=', mfcc_cmvn.shape)

    mfcc_feature_cube = speechpy.feature.extract_derivative_feature(mfcc)
    print('mfcc feature cube shape=', mfcc_feature_cube.shape)

    ############# Extract logenergy features #############
    logenergy = speechpy.feature.lmfe(signal, sampling_frequency=fs, frame_length=0.020, frame_stride=0.01,
             num_filters=40, fft_length=512, low_frequency=0, high_frequency=None)
    logenergy_feature_cube = speechpy.feature.extract_derivative_feature(logenergy)
    print('logenergy features=', logenergy.shape)

To extract features, the signal samples are first stacked into frames. The features are then computed for each frame in the stacked frames collection.

=============
Dependencies
=============

``SciPy`` and ``NumPy`` are the only required dependencies; they are installed automatically when running the ``setup.py`` file.

===========
Disclaimer
===========

Although it has since changed dramatically, part of this library was inspired by the `python speech features`_ library.

.. _python speech features: https://github.com/jameslyons/python_speech_features

We claim the following advantages for our library:

1. More accurate operations have been performed for the mel-frequency calculations.
2. The package supports different ``Python`` versions.
3. The features are generated in a more organized way, as cubic feature arrays.
4. The package is well-tested and integrated.
5. The package is up-to-date and actively developed.
6. The package has been used for research purposes.
7. Exceptions and extreme cases are handled in this library.
speechpy-fast-2.4/speechpy/0000775000175000017500000000000013266015456017130 5ustar matthewmatthew00000000000000speechpy-fast-2.4/speechpy/functions.py0000775000175000017500000000240013264176157021515 0ustar matthewmatthew00000000000000from __future__ import division
import numpy as np
from . import processing
from scipy.fftpack import dct
import math


def frequency_to_mel(f):
    """Convert from frequency (in Hz) to the Mel scale.
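
    The conversion implemented below is mel = 1127 * ln(1 + f / 700)
    (natural logarithm); mel_to_frequency() applies the exact inverse.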
:param f: The frequency values(or a single frequency) in Hz. :returns: The mel scale values(or a single mel). """ return 1127 * np.log(1 + f / 700.) def mel_to_frequency(mel): """converting from Mel scale to frequency. :param mel: The mel scale values(or a single mel). :returns: The frequency values(or a single frequency) in Hz. """ return 700 * (np.exp(mel / 1127.0) - 1) def triangle(x, left, middle, right): out = np.zeros(x.shape) out[x <= left] = 0 out[x >= right] = 0 first_half = np.logical_and(left < x, x <= middle) out[first_half] = (x[first_half] - left) / (middle - left) second_half = np.logical_and(middle <= x, x < right) out[second_half] = (right - x[second_half]) / (right - middle) return out def zero_handling(x): """ This function handle the issue with zero values if the are exposed to become an argument for any lof function. :param x: The vector. :return: The vector with zeros substituted with epsilon values. """ return np.where(x == 0, np.finfo(float).eps, x) speechpy-fast-2.4/speechpy/feature.py0000775000175000017500000002147213264176401021142 0ustar matthewmatthew00000000000000from __future__ import division import numpy as np from . import processing from scipy.fftpack import dct from . import functions try: from functools import lru_cache except ImportError: from backports.functools_lru_cache import lru_cache @lru_cache() def filterbanks(num_filter, fftpoints, sampling_freq, low_freq=None, high_freq=None): """Compute the Mel-filterbanks. Each filter will be stored in one rows. The columns correspond to fft bins. Args: num_filter (int): the number of filters in the filterbank, default 20. fftpoints (int): the FFT size. Default is 512. sampling_freq (float): the samplerate of the signal we are working with. Affects mel spacing. low_freq (float): lowest band edge of mel filters, default 0 Hz high_freq (float): highest band edge of mel filters, default samplerate/2 Returns: array: A numpy array of size num_filter x (fftpoints//2 + 1) which are filterbank """ high_freq = high_freq or sampling_freq / 2 low_freq = low_freq or 300 assert high_freq <= sampling_freq / 2, "High frequency cannot be greater than half of the sampling frequency!" assert low_freq >= 0, "low frequency cannot be less than zero!" ###################################################### ########### Computing the Mel filterbank ############# ###################################################### # converting the upper and lower frequencies to Mels. # num_filter + 2 is because for num_filter filterbanks we need num_filter+2 point. mels = np.linspace(functions.frequency_to_mel(low_freq), functions.frequency_to_mel(high_freq), num_filter + 2) # we should convert Mels back to Hertz because the start and end-points should be at the desired frequencies. hertz = functions.mel_to_frequency(mels) # The frequency resolution required to put filters at the # exact points calculated above should be extracted. # So we should round those frequencies to the closest FFT bin. 
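    # Worked example (illustrative numbers, not from the original source): as
    # called from mfe(), `fftpoints` is the number of rfft coefficients
    # (fft_length // 2 + 1, e.g. 257 for a 512-point FFT). With
    # sampling_freq=16000 Hz, a mel point located at 1000 Hz maps to bin
    # floor((257 + 1) * 1000 / 16000) = floor(16.125) = 16.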
freq_index = (np.floor((fftpoints + 1) * hertz / sampling_freq)).astype(int) # Initial definition filterbank = np.zeros([num_filter, fftpoints]) # The triangular function for each filter for i in range(0, num_filter): left = int(freq_index[i]) middle = int(freq_index[i + 1]) right = int(freq_index[i + 2]) z = np.linspace(left, right, num=right - left + 1) filterbank[i, left:right + 1] = functions.triangle(z, left=left, middle=middle, right=right) return filterbank def mfcc(signal, sampling_frequency, frame_length=0.020, frame_stride=0.01,num_cepstral =13, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None, dc_elimination=True): """Compute MFCC features from an audio signal. Args: signal (array): the audio signal from which to compute features. Should be an N x 1 array sampling_frequency (int): the sampling frequency of the signal we are working with. frame_length (float): the length of each frame in seconds. Default is 0.020s frame_stride (float): the step between successive frames in seconds. Default is 0.02s (means no overlap) num_filters (int): the number of filters in the filterbank, default 40. fft_length (int): number of FFT points. Default is 512. low_frequency (float): lowest band edge of mel filters. In Hz, default is 0. high_frequency (float): highest band edge of mel filters. In Hz, default is samplerate/2 num_cepstral (int): Number of cepstral coefficients. dc_elimination (bool): hIf the first dc component should be eliminated or not. Returns: array: A numpy array of size (num_frames x num_cepstral) containing mfcc features. """ feature, energy = mfe(signal, sampling_frequency=sampling_frequency, frame_length=frame_length, frame_stride=frame_stride, num_filters=num_filters, fft_length=fft_length, low_frequency=low_frequency, high_frequency=high_frequency) if len(feature) == 0: return np.empty((0, num_cepstral)) feature = np.log(feature) feature = dct(feature, type=2, axis=-1, norm='ortho')[:, :num_cepstral] # replace first cepstral coefficient with log of frame energy for DC elimination. if dc_elimination: feature[:, 0] = np.log(energy) return feature def mfe(signal, sampling_frequency, frame_length=0.020, frame_stride=0.01, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None): """Compute Mel-filterbank energy features from an audio signal. signal (array): the audio signal from which to compute features. Should be an N x 1 array sampling_frequency (int): the sampling frequency of the signal we are working with. frame_length (float): the length of each frame in seconds. Default is 0.020s frame_stride (float): the step between successive frames in seconds. Default is 0.02s (means no overlap) num_filters (int): the number of filters in the filterbank, default 40. fft_length (int): number of FFT points. Default is 512. low_frequency (float): lowest band edge of mel filters. In Hz, default is 0. high_frequency (float): highest band edge of mel filters. In Hz, default is samplerate/2 Returns: array: features - the energy of fiterbank: num_frames x num_filters frame_energies. 
The energy of each frame: num_frames x 1 """ # Convert to float signal = signal.astype(float) # Stack frames frames = processing.stack_frames(signal, sampling_frequency=sampling_frequency, frame_length=frame_length, frame_stride=frame_stride, filter=lambda x: np.ones((x,)), zero_padding=False) # getting the high frequency high_frequency = high_frequency or sampling_frequency / 2 # calculation of the power sprectum power_spectrum = processing.power_spectrum(frames, fft_length) number_fft_coefficients = power_spectrum.shape[1] frame_energies = np.sum(power_spectrum, 1) # this stores the total energy in each frame # Handling zero enegies. frame_energies = functions.zero_handling(frame_energies) # Extracting the filterbank filter_banks = filterbanks(num_filters, number_fft_coefficients, sampling_frequency, low_frequency, high_frequency) # Filterbank energies features = np.dot(power_spectrum, filter_banks.T) features = functions.zero_handling(features) return features, frame_energies def lmfe(signal, sampling_frequency, frame_length=0.020, frame_stride=0.01, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None): """Compute log Mel-filterbank energy features from an audio signal. Args: signal (array): the audio signal from which to compute features. Should be an N x 1 array sampling_frequency (int): the sampling frequency of the signal we are working with. frame_length (float): the length of each frame in seconds. Default is 0.020s frame_stride (float): the step between successive frames in seconds. Default is 0.02s (means no overlap) num_filters (int): the number of filters in the filterbank, default 40. fft_length (int): number of FFT points. Default is 512. low_frequency (float): lowest band edge of mel filters. In Hz, default is 0. high_frequency (float): highest band edge of mel filters. In Hz, default is samplerate/2 Returns: array: Features - The energy of fiterbank: num_frames x num_filters frame_log_energies. The log energy of each frame: num_frames x 1 """ feature, frame_energies = mfe(signal, sampling_frequency=sampling_frequency, frame_length=frame_length, frame_stride=frame_stride, num_filters=num_filters, fft_length=fft_length, low_frequency=low_frequency, high_frequency=high_frequency) feature = np.log(feature) return feature def extract_derivative_feature(feature): """ This function extracts temporal derivative features which are first and second derivatives. Args: feature (array): The feature vector which its size is: N x M Return: array: The feature cube vector which contains the static, first and second derivative features of size: N x M x 3 """ first_derivative_feature = processing.derivative_extraction(feature, DeltaWindows=2) second_derivative_feature = processing.derivative_extraction(first_derivative_feature, DeltaWindows=2) # Creating the future cube for each file feature_cube = np.concatenate( (feature[:, :, None], first_derivative_feature[:, :, None], second_derivative_feature[:, :, None]), axis=2) return feature_cube speechpy-fast-2.4/speechpy/__init__.py0000775000175000017500000000005713264176157021252 0ustar matthewmatthew00000000000000from . import feature from . import processing speechpy-fast-2.4/speechpy/processing.py0000775000175000017500000002455713264176401021672 0ustar matthewmatthew00000000000000import decimal import numpy as np import math try: from functools import lru_cache except ImportError: from backports.functools_lru_cache import lru_cache # 1.4 becomes 1 and 1.6 becomes 2. special case: 1.5 becomes 2. 
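# Illustrative examples (not from the original source):
#   round_half_up(1.4) == 1, round_half_up(1.6) == 2,
#   round_half_up(1.5) == 2, round_half_up(2.5) == 3.
# This differs from Python 3's built-in round(), which uses banker's rounding
# (round(2.5) == 2).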
def round_half_up(number): return int(decimal.Decimal(number).quantize(decimal.Decimal('1'), rounding=decimal.ROUND_HALF_UP)) def preemphasis(signal, shift=1, cof=0.98): """preemphasising on the signal. Args: signal (array): The input signal. shift (int): The shift step. cof (float): The preemphasising coefficient. 0 equals to no filtering. Returns: the pre-emphasized signal. """ rolled_signal = np.roll(signal, shift) return signal - cof * rolled_signal @lru_cache() def _create_frame_indices(numframes, frame_stride, frame_sample_length): indices = np.tile(np.arange(0, frame_sample_length), (numframes, 1)) + np.tile( np.arange(0, numframes * frame_stride, frame_stride), (frame_sample_length, 1)).T return np.array(indices, dtype=np.int32) def stack_frames(sig, sampling_frequency, frame_length=0.020, frame_stride=0.020, filter=lambda x: np.ones((x,)), zero_padding=True): """Frame a signal into overlapping frames. Args: sig (array): The audio signal to frame of size (N,). sampling_frequency (int): The sampling frequency of the signal. frame_length (float): The length of the frame in second. frame_stride (float): The stride between frames. filter (array): The time-domain filter for applying to each frame. By default it is one so nothing will be changed. zero_padding (bool): If the samples is not a multiple of frame_length(number of frames sample), zero padding will be done for generating last frame. Returns: array: stacked_frames-Array of frames of size (number_of_frames x frame_len). """ ## Check dimension assert sig.ndim == 1, "Signal dimention should be of the format of (N,) but it is %s instead" % str(sig.shape) # Initial necessary values length_signal = sig.shape[0] frame_sample_length = int(sampling_frequency * frame_length + 0.5) # Defined by the number of samples frame_stride = float(int(sampling_frequency * frame_stride + 0.5)) # Zero padding is done for allocating space for the last frame. if zero_padding: # Calculation of number of frames numframes = 1 + int(math.ceil((length_signal - frame_sample_length) / frame_stride)) # Zero padding len_sig = int((numframes - 1) * frame_stride + frame_sample_length) additive_zeros = np.zeros((len_sig - length_signal,)) signal = np.concatenate((sig, additive_zeros)) else: # No zero padding! The last frame which does not have enough # samples(remaining samples <= frame_sample_length), will be dropped! numframes = 1 + int(math.floor((length_signal - frame_sample_length) / frame_stride)) # new length len_sig = int((numframes - 1) * frame_stride + frame_sample_length) signal = sig[0:len_sig] # Getting the indices of all frames. indices = _create_frame_indices(numframes, frame_stride, frame_sample_length) # Extracting the frames based on the allocated indices. frames = signal[indices] # Apply the windows function window = np.tile(filter(frame_sample_length), (numframes, 1)) Extracted_Frames = frames * window return Extracted_Frames def fft_spectrum(frames, fft_points=512): """This function computes the one-dimensional n-point discrete Fourier Transform (DFT) of a real-valued array by means of an efficient algorithm called the Fast Fourier Transform (FFT). Please refer to https://docs.scipy.org/doc/numpy/reference/generated/numpy.fft.rfft.html for further details. Args: frames (array): The frame array in which each row is a frame. fft_points (int): The length of FFT. If fft_length is greater than frame_len, the frames will be zero-padded. 
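
    Note: because np.fft.rfft is used, each output row contains only the
        non-negative frequency terms, i.e. fft_points // 2 + 1 coefficients
        (257 for the default 512-point FFT).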
Returns: array: The fft spectrum - If frames is an num_frames x sample_per_frame matrix, output will be num_frames x FFT_LENGTH. """ SPECTRUM_VECTOR = np.fft.rfft(frames, n=fft_points, axis=-1, norm=None) return np.absolute(SPECTRUM_VECTOR) def power_spectrum(frames, fft_points=512): """Power spectrum of each frame. Args: frames (array): The frame array in which each row is a frame. fft_points (int): The length of FFT. If fft_length is greater than frame_len, the frames will be zero-padded. Returns: array: The power spectrum - If frames is an num_frames x sample_per_frame matrix, output will be num_frames x fft_length. """ return 1.0 / fft_points * np.square(fft_spectrum(frames, fft_points)) def log_power_spectrum(frames, fft_points=512, normalize=True): """Log power spectrum of each frame in frames. Args: frames (array): The frame array in which each row is a frame. fft_points (int): The length of FFT. If fft_length is greater than frame_len, the frames will be zero-padded. normalize (bool): If normalize=True, the log power spectrum will be normalized. Returns: array: The power spectrum - If frames is an num_frames x sample_per_frame matrix, output will be num_frames x fft_length. """ power_spec = power_spectrum(frames, fft_points) power_spec[power_spec <= 1e-20] = 1e-20 log_power_spec = 10 * np.log10(power_spec) if normalize: return log_power_spec - np.max(log_power_spec) else: return log_power_spec def derivative_extraction(feat, DeltaWindows): """This function the derivative features. Args: feat (array): The main feature vector(For returning the second order derivative it can be first-order derivative). DeltaWindows (int): The value of DeltaWindows is set using the configuration parameter DELTAWINDOW. Returns: array: Derivative feature vector - A NUMFRAMESxNUMFEATURES numpy array which is the derivative features along the features. """ # Getting the shape of the vector. rows, cols = feat.shape # Difining the vector of differences. DIF = np.zeros(feat.shape, dtype=float) Scale = 0 # Pad only along features in the vector. FEAT = np.lib.pad(feat, ((0, 0), (DeltaWindows, DeltaWindows)), 'edge') for i in range(DeltaWindows): # Start index offset = DeltaWindows # The dynamic range Range = i + 1 dif = Range * FEAT[:, offset + Range:offset + Range + cols] - FEAT[:, offset - Range:offset - Range + cols] Scale += 2 * np.power(Range, 2) DIF += dif return DIF / Scale def cmvn(vec, variance_normalization=False): """ This function is aimed to perform global cepstral mean and variance normalization (CMVN) on input feature vector "vec". The code assumes that there is one observation per row. Args: vec (array): input feature matrix (size:(num_observation,num_features)) variance_normalization (bool): If the variance normilization should be performed or not. Return: array: The mean(or mean+variance) normalized feature vector. """ eps = 2**-30 rows, cols = vec.shape # Mean calculation norm = np.mean(vec, axis=0) norm_vec = np.tile(norm, (rows, 1)) # Mean subtraction mean_subtracted = vec - norm_vec # Variance normalization if variance_normalization: stdev = np.std(mean_subtracted, axis=0) stdev_vec = np.tile(stdev, (rows, 1)) output = mean_subtracted / (stdev_vec + eps) else: output = mean_subtracted return output def cmvnw(vec, win_size=301, variance_normalization=False): """ This function is aimed to perform local cepstral mean and variance normalization on a sliding window. (CMVN) on input feature vector "vec". The code assumes that there is one observation per row. 
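
    Each row is normalized using statistics computed over a window of win_size
    rows centered on it; the input is padded symmetrically at the edges so
    that the first and last rows also receive full windows.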
Args: vec (array): input feature matrix (size:(num_observation,num_features)) win_size (int): The size of sliding window for local normalization. Default=301 which is around 3s if 100 Hz rate is considered(== 10ms frame stide) variance_normalization (bool): If the variance normilization should be performed or not. Return: array: The mean(or mean+variance) normalized feature vector. """ # Get the shapes eps = 2**-30 rows, cols = vec.shape # Windows size must be odd. assert type(win_size) == int, "Size must be of type 'int'!" assert win_size % 2 == 1, "Windows size must be odd!" # Padding and initial definitions pad_size = int((win_size - 1) / 2) vec_pad = np.lib.pad(vec, ((pad_size, pad_size), (0, 0)), 'symmetric') mean_subtracted = np.zeros(np.shape(vec), dtype=np.float32) for i in range(rows): window = vec_pad[i:i + win_size, :] window_mean = np.mean(window, axis=0) mean_subtracted[i, :] = vec[i, :] - window_mean # Variance normalization if variance_normalization: # Initial definitions. variance_normalized = np.zeros(np.shape(vec), dtype=np.float32) vec_pad_variance = np.lib.pad(mean_subtracted, ((pad_size, pad_size), (0, 0)), 'symmetric') # Looping over all observations. for i in range(rows): window = vec_pad_variance[i:i + win_size, :] window_variance = np.std(window, axis=0) variance_normalized[i, :] = mean_subtracted[i, :] / (window_variance + eps) output = variance_normalized else: output = mean_subtracted return output # def resample_Fn(wave, fs, f_new=16000): # """This function resample the data to arbitrary frequency # :param fs: Frequency of the sound file. # :param wave: The sound file itself. # :returns: # f_new: The new frequency. # signal_new: The new signal samples at new frequency. # # dependency: from scikits.samplerate import resample # """ # # # Resampling using interpolation(There are other methods than 'sinc_best') # signal_new = resample(wave, float(f_new) / fs, 'sinc_best') # # # Necessary data converting for saving .wav file using scipy. 
# signal_new = np.asarray(signal_new, dtype=np.int16) # # # # Uncomment if you want to save the audio file # # # Save using new format # # wav.write(filename='resample_rainbow_16k.wav',rate=fr,data=signal_new) # return signal_new, f_new speechpy-fast-2.4/speechpy_fast.egg-info/0000775000175000017500000000000013266015456021637 5ustar matthewmatthew00000000000000speechpy-fast-2.4/speechpy_fast.egg-info/top_level.txt0000664000175000017500000000001113266015456024361 0ustar matthewmatthew00000000000000speechpy speechpy-fast-2.4/speechpy_fast.egg-info/requires.txt0000664000175000017500000000010513266015456024233 0ustar matthewmatthew00000000000000scipy numpy [:python_version < "3.2"] backports.functools_lru_cache speechpy-fast-2.4/speechpy_fast.egg-info/dependency_links.txt0000664000175000017500000000000113266015456025705 0ustar matthewmatthew00000000000000 speechpy-fast-2.4/speechpy_fast.egg-info/SOURCES.txt0000664000175000017500000000053313266015456023524 0ustar matthewmatthew00000000000000MANIFEST.in README.rst setup.cfg setup.py speechpy/__init__.py speechpy/feature.py speechpy/functions.py speechpy/processing.py speechpy_fast.egg-info/PKG-INFO speechpy_fast.egg-info/SOURCES.txt speechpy_fast.egg-info/dependency_links.txt speechpy_fast.egg-info/not-zip-safe speechpy_fast.egg-info/requires.txt speechpy_fast.egg-info/top_level.txtspeechpy-fast-2.4/speechpy_fast.egg-info/not-zip-safe0000664000175000017500000000000113265732607024070 0ustar matthewmatthew00000000000000 speechpy-fast-2.4/speechpy_fast.egg-info/PKG-INFO0000664000175000017500000000061613266015456022737 0ustar matthewmatthew00000000000000Metadata-Version: 1.1 Name: speechpy-fast Version: 2.4 Summary: A fork of the python package for extracting speech features. Home-page: https://github.com/matthewscholefield/speechpy Author: Amirsina Torfi, Matthew Scholefield Author-email: matthew331199@gmail.com License: UNKNOWN Download-URL: https://github.com/matthewscholefield/speechpy/archive/2.4.zip Description: UNKNOWN Platform: UNKNOWN speechpy-fast-2.4/PKG-INFO0000664000175000017500000000061613266015456016410 0ustar matthewmatthew00000000000000Metadata-Version: 1.1 Name: speechpy-fast Version: 2.4 Summary: A fork of the python package for extracting speech features. Home-page: https://github.com/matthewscholefield/speechpy Author: Amirsina Torfi, Matthew Scholefield Author-email: matthew331199@gmail.com License: UNKNOWN Download-URL: https://github.com/matthewscholefield/speechpy/archive/2.4.zip Description: UNKNOWN Platform: UNKNOWN