Speech Recognition: Zero-Crossing Rate and Short-Time Energy for Endpoint Detection

Keywords: MATLAB, programming

Endpoint detection

Steps of the endpoint detection algorithm based on energy and zero-crossing count

  1. Split the speech signal x(n) into frames.
  2. Calculate the short-time energy of each frame.
  3. Calculate the zero-crossing count of each frame.
  4. First-level decision: set a higher threshold T1 according to the speech energy to determine where the speech begins, then set a lower threshold T2 according to the average energy of the background noise to determine where the speech ends. Second-level decision: set a threshold T3 according to the average zero-crossing count of the background noise, and use it to pick up the unvoiced sound at the front of the speech and the trailing sound at its end.
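
Written out, steps 2 and 3 come down to two sums per frame. Here y_i(m) is sample m of windowed frame i and L is the frame length (wlen in the code); this notation is introduced only for illustration:

```latex
E_i = \sum_{m=1}^{L} y_i(m)^2,
\qquad
Z_i = \sum_{m=1}^{L-1} \mathbf{1}\bigl[\, y_i(m)\, y_i(m+1) < 0 \,\bigr]
```

E_i is the short-time energy of frame i, and Z_i counts how many times neighbouring samples in the frame have opposite signs.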

a. Speech signal framing

filedir=[];%path prefix (empty here)
filename='D:\matlab\music\zj3.wav';
file=[filedir filename];
[x,Fs]=audioread(file);%read the speech samples and the sampling rate

wlen=200;%frame length
inc=100;%frame shift
win=hamming(wlen);%Hamming window
N=length(x);%signal length
time=(0:N-1)/Fs;%time axis of the signal

%enframe and frame2time are external toolbox functions, not built into MATLAB
X=enframe(x,win,inc)'; %framing; after the transpose each column is one frame
fn=size(X,2);%number of frames
frameTime=frame2time(fn,wlen,inc,Fs);  %time corresponding to each frame
%I still need to look at this formula again.
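
Since enframe and frame2time come from an external toolbox, here is a rough sketch of what the framing step does, using only core MATLAB on a synthetic tone. It assumes that only complete frames are kept and that the time of a frame is the time of its centre; the toolbox functions may differ in detail. It needs R2016b or later for implicit expansion.

```matlab
% Minimal framing sketch on a synthetic signal (no toolbox functions needed).
Fs   = 8000;                       % assumed sampling rate for this example
x    = sin(2*pi*200*(0:999)'/Fs);  % 1000-sample test tone, column vector
wlen = 200;                        % frame length, as in the notes
inc  = 100;                        % frame shift, as in the notes

fn  = floor((length(x)-wlen)/inc) + 1;             % number of complete frames
idx = (1:wlen)' + (0:fn-1)*inc;                    % wlen-by-fn matrix of sample indices
win = 0.54 - 0.46*cos(2*pi*(0:wlen-1)'/(wlen-1));  % Hamming window written out explicitly
X   = x(idx) .* win;                               % one windowed frame per column

% Assumed definition of the frame time: the centre of each frame, in seconds.
frameTime = ((0:fn-1)*inc + wlen/2) / Fs;
```

With a frame shift of half the frame length, neighbouring frames overlap by 50%, so every sample except those at the very edges is covered by two frames.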

b. Short-time energy

 %short-time energy
  E=zeros(1,fn);%preallocate
  for i=1:fn
          y=X(:,i);%samples of frame i
          b=0;
          for m=1:wlen %loop over the frame length
          b=b+y(m)^2;
          end
          E(i)=b;
  end
  
  %%Reference version: simpler (I have not fully understood it yet)
  
fn=size(X,2);              % number of frames
time=(0:N-1)/Fs;           % time axis of the signal
En=zeros(1,fn);            % preallocate
for i=1 : fn
    u=X(:,i);              % take out one frame
    u2=u.*u;               % square each sample
    En(i)=sum(u2);         % sum over the frame
end
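
Both loops compute the same thing: the sum of squared samples in each column of X. A small sketch on a made-up 4-by-3 frame matrix shows that the loop can be replaced by a single column-wise sum:

```matlab
% Tiny stand-in for the frame matrix: 4 samples per frame, 3 frames (columns).
X = [ 1  0  2;
     -1  0  2;
      1  1 -2;
     -1  1 -2];

% Loop version, same structure as the reference code.
fn = size(X,2);
En_loop = zeros(1,fn);
for i = 1:fn
    u = X(:,i);
    En_loop(i) = sum(u.*u);
end

% Vectorized version: square every sample, then sum down each column.
En = sum(X.^2, 1);

disp(isequal(En, En_loop))
```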

c. Short-time zero-crossing count

 %short-time zero-crossing count
   Z=zeros(1,fn);                 % initialization
  for i=1:fn
          y=X(:,i);%samples of frame i
          b=0;
          for m=1:wlen-1  %compare each sample with the next one
          if y(m)*y(m+1)<0
              b=b+1;
          end
          end
          Z(i)=b;
  end
  
  %%Reference version: simpler (I have not fully understood it yet)
fn=size(X,2);                     % number of frames
zcr1=zeros(1,fn);                 % initialization
for i=1:fn
    z=X(:,i);                     % take out one frame
    for j=1:(wlen-1)              % look for zero crossings within the frame
         if z(j)*z(j+1)<0         % neighbouring samples have opposite signs
             zcr1(i)=zcr1(i)+1;   % it is a zero crossing, count it
         end
    end
end
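
The zero-crossing test is just "the product of two neighbouring samples is negative", so it can also be applied to the whole frame matrix at once. A small sketch on made-up data, compared against the loop:

```matlab
% Tiny stand-in for the frame matrix: 4 samples per frame, 3 frames (columns).
X = [ 1  0.5  2;
     -1  0.2 -2;
      1  0.1 -1;
     -1  0.3  3];

% Loop version, same structure as the reference code.
[wlen, fn] = size(X);
zcr_loop = zeros(1,fn);
for i = 1:fn
    z = X(:,i);
    for j = 1:(wlen-1)
        if z(j)*z(j+1) < 0
            zcr_loop(i) = zcr_loop(i) + 1;
        end
    end
end

% Vectorized version: multiply each row by the next one, count negative products per column.
zcr = sum(X(1:end-1,:) .* X(2:end,:) < 0, 1);

disp(isequal(zcr, zcr_loop))
```

Note that a sample that is exactly zero gives a product of zero and is therefore never counted, in either version.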

d. Setting thresholds based on average energy

Average energy results:

[Figure: short-time energy]

E contains 818 values, one per frame.

T1 is set to 0.01 (the complete program below uses 0.1): the frame where the energy rises above the threshold is marked as the beginning of speech.

T2 is set to 0.001: the frame where the energy then falls below 0.001 is marked as the end of speech.
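
Step 4 ties the thresholds to the background noise rather than to fixed numbers. A minimal sketch of that idea, assuming the first few frames of the recording contain only noise; NIS and both scale factors are made-up illustration values, not taken from these notes:

```matlab
% Made-up short-time energies: 4 quiet frames, a burst of speech, then quiet again.
E   = [0.002 0.001 0.003 0.002 0.4 0.9 0.7 0.05 0.002 0.001];

NIS    = 4;               % assumed number of leading noise-only frames
noiseE = mean(E(1:NIS));  % average background-noise energy

T2 = 4  * noiseE;         % lower threshold; the factor 4 is an arbitrary illustration
T1 = 50 * noiseE;         % higher threshold; the factor 50 is equally arbitrary
```

The factors would have to be tuned by looking at the energy plot of the actual recording.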

The points are found, but the program has a problem: it should mark only the endpoints, not every point.
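
One possible way to keep only the endpoints is to threshold the energy into a 0/1 mask and look for the places where the mask changes. This is a sketch with a single made-up threshold and made-up energies, not the method used in the program further down:

```matlab
E = [0 0 0.3 0.5 0.4 0 0 0.2 0.6 0 0];  % made-up short-time energies
T = 0.1;                                 % single threshold, for this sketch only

above    = E > T;                % 1 where the frame energy is above the threshold
d        = diff([0 above 0]);    % +1 where a run of 1s starts, -1 just after it ends
startIdx = find(d == 1);         % first frame of each run
endIdx   = find(d == -1) - 1;    % last frame of each run

disp([startIdx; endIdx])         % one column per detected segment
```

Padding the mask with a zero on each side makes sure a segment that touches the first or last frame still gets both a start and an end.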

[Figure: a point was found, but not the endpoint]

After the fix:

[Figure: points found, but something is still wrong]

The endpoints from the zero-crossing rate are still somewhat problematic.

%Endpoint detection based on energy and zero-crossing count, version 1
clear all;
clc;
filedir=[];%set up path
filename='D:\matlab\music\zs.wav';
file=[filedir filename];
[x,Fs]=audioread(file);
x=x/max(abs(x));%normalize to a peak amplitude of 1 (assumes a mono signal)

x=filter([1 -0.98],1,x);%pre-emphasis

wlen=200;%Frame length
inc=100;%Frame shift
win=hamming(wlen);%hamming window
N=length(x);%Signal Length
time=(0:N-1)/Fs;%Calculate the time scale of the signal

X=enframe(x,win,inc)'; %framing; after the transpose each column is one frame
% Xmax=max(abs(X));%normalizing the frame matrix instead did not work well
% X=X/Xmax;
fn=size(X,2);%number of frames
frameTime=frame2time(fn,wlen,inc,Fs);  %time corresponding to each frame
%I still need to look at this formula again.

%short-time energy
E=zeros(1,fn);%preallocate
for i=1:fn
    y=X(:,i);%samples of frame i
    b=0;
    for m=1:wlen%loop over the samples of the frame
        b=b+y(m)^2;
    end
    E(i)=b;
end


%short-time zero-crossing count
Z=zeros(1,fn);                 % initialization, fn was computed above
for i=1:fn
    y=X(:,i);%samples of frame i
    b=0;
    for m=1:wlen-1%compare each sample with the next one
        if y(m)*y(m+1)<0
            b=b+1;
        end
    end
    Z(i)=b;
end

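%Use the short-time energy thresholds to find the beginning and end of speech
T1=0.1;%energy threshold for the beginning of speech
T2=0.1;%energy threshold for the end of speech
q=[];%frame indices of the detected start and end boundaries
i1=1;
while i1<=length(E)
    if E(i1)>T1
        q=[q max(i1-1,1)];%start: the frame just before the energy rises above T1
        i2=i1+1;
        while i2<=length(E) && E(i2)>=T2
            i2=i2+1;%skip frames that still belong to the speech segment
        end
        if i2<=length(E)
            q=[q min(i2+1,length(E))];%end: the frame just after the energy falls below T2
        end
        i1=i2+1;%continue searching after this segment
    else
        i1=i1+1;
    end
end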

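%Boundaries from the zero-crossing count
Zhigh=120;%a segment starts when the count rises above this value
Zlow=50;%and ends when the count falls below this value
w=[];%frame indices of the detected boundaries
i1=1;
while i1<=length(Z)
    if Z(i1)>Zhigh
        w=[w i1];%start of a segment with a high zero-crossing count
        i2=i1+1;
        while i2<=length(Z) && Z(i2)>=Zlow
            i2=i2+1;%skip frames that still belong to the segment
        end
        if i2<=length(Z)
            w=[w min(i2+1,length(Z))];%the frame just after the count falls below Zlow
        end
        i1=i2+1;%continue searching after this segment
    else
        i1=i1+1;
    end
end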

%Plotting
subplot(311)
plot(time,x);
title('Original signal')
xlabel('Time (s)');ylabel('Amplitude');
subplot(312)
plot(frameTime(q),E(q),'or');
hold on
plot(frameTime,E);
title('Short-time energy')
xlabel('Time (s)');ylabel('Energy');
subplot(313)
plot(frameTime(w),Z(w),'or');
hold on
plot(frameTime,Z);
title('Zero-crossing rate')
xlabel('Time (s)');ylabel('Count');
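
The program marks the energy boundaries and the zero-crossing boundaries independently of each other. One common reading of the second-level decision in step 4 is to start from the energy endpoints and widen the segment while the neighbouring frames still have a high zero-crossing count. A sketch of that reading, on made-up data with made-up thresholds and for a single speech segment only:

```matlab
E  = [0 0 0 0.3 0.5 0.4 0 0 0];   % made-up short-time energies
Z  = [5 6 40 30 10 12 45 50 4];   % made-up zero-crossing counts
T1 = 0.1;                          % illustrative energy threshold
T3 = 25;                           % illustrative zero-crossing threshold

s = find(E > T1, 1, 'first');      % first-level start frame (energy only)
e = find(E > T1, 1, 'last');       % first-level end frame (energy only)

% Second level: move the start backwards over frames with a high zero-crossing count.
while s > 1 && Z(s-1) > T3
    s = s - 1;
end
% Second level: move the end forwards in the same way.
while e < numel(Z) && Z(e+1) > T3
    e = e + 1;
end

fprintf('segment: frames %d to %d\n', s, e);
```

The idea is that unvoiced sounds have low energy but many zero crossings, so the energy thresholds alone tend to cut them off.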

Posted by sectachrome on Wed, 24 Jul 2019 04:00:38 -0700