Controlling echo

Echo cancellers can be used to implement echo control for the following applications:

Application: PSTN terminal
Implementation: Improves DTMF detection (DTMF cut-through) or automatic speech recognition performance by eliminating leakage of playback audio into the receive signal path. This behavior typically applies to IVR or voice mail applications of AG Series or CG Series boards.

Application: Network echo control
Implementation: Eliminates talker echo so that peer-to-peer human communications do not suffer the annoying effects. This behavior typically applies to IP telephony gateway applications on AG Series or CG Series boards.

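This topic presents:

  • Echo cancellation examples

  • Echo canceller components

  • Specifying echo canceller parameters

  • Configuring boards for echo cancellation

  • Default filter length and adaptation time values

  • Features

  • Performance parameters

  • Recommendations for controlling echo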

Echo cancellation examples

DTMF cut-through example

In an IVR application, the user typically uses DTMF keys to make option selections. Since the user calling into the IVR system does not always wait for the whole message to be played, an echo canceller is needed to cancel the local and near-end echo of the prompt played by the application. Canceling the echo enables the local DTMF detector to recognize a received tone during the time the message is played.

The echo canceller improves the signal-to-noise ratio as seen by the DTMF detector on AG or CG Series boards. The useful signal is the received DTMF signal and the noise is the echo of the message prompt played by the board.

As with DTMF cut-through, the echo canceller also improves cleardown tone detection. The following illustration shows echo cancellation for DTMF cut-through:

Host-based ASR example

This application is similar to the DTMF cut-through example, with an automatic speech recognition (ASR) system replacing or augmenting the DTMF detector for control of the IVR session. ASR algorithms require a high-performance echo canceller. A prompt is played out on the board, and the user commands the application by saying a keyword (for example, a number or a name) to make a selection. The response is processed by software that runs on the host or on the board. Correct recognition requires an echo-free received signal, so the echo canceller on the local board must respond quickly to any changes and cancel the echo completely without distorting the incoming signal.

The echo canceller provides settings to optimize performance for ASR applications. The ASR application may need to disable its endpointing until the echo canceller has fully converged. Empirical tests have shown that the echo suppressor part of the echo canceller (sometimes called the non-linear processor, or NLP) must be disabled. These controls are available through ADI functions.

IP telephony gateway (network echo canceller) example

Another important application of echo cancellation can be found in IP telephony gateways. The following illustration shows the two gateways:

For a two-wire connection, the gateway echo canceller cancels the local echo generated by the on-board hybrid and the near-end echo generated by the near-end hybrid. The returned echo level for the echo canceller must be as low as possible because the one-way delay for this type of connection can be 100 ms or more.

For a four-wire connection to the PSTN, the echo canceller cancels the near-end echo, or in some cases, no echo at all. For a near-end echo, the requirements are the same as in the previous case.

Without proper echo control in an IP telephony application, annoying echo can be heard by both speakers in a duplex voice conversation. The longer the delay through the IP network, the more unpleasant the effects of any residual or uncancelled echo.

Echo canceller components

The following illustration shows the structure of the echo canceller:

[Illustration: new_echo_can.gif]

Echo cancellers are four-port devices: two ports face the near end and two ports face the far end. The four ports are Rin, Rout, Sin, and Sout.

R stands for receive if the port is situated in the receive path. S stands for send if the port is situated in the send path. The subscripts in and out define the input and output ports of the echo canceller on the corresponding path.

The main components of the echo canceller are:

Component

Description

Predelay buffer

Signals sent into the echo path enter a predelay buffer prior to being operated on by the FIR filter. Depending on the board, the predelay can be set from 0 to a maximum of 20 ms delay. This predelay is introduced to compensate for the pure delay in the echo path.

Finite impulse response (FIR) filter

The echo canceller FIR filter tries to mimic the echo path. The coefficients (taps) of the FIR filter, which are set by the adaptation logic, determine the FIR filter response. The FIR filter has converged when its coefficients equal the impulse response of the echo path. The length of the FIR filter determines how much of the echo is covered by the echo canceller.

The FIR filter and the adaptation logic can be referred to as an adaptive filter.

Subtractor

The subtractor subtracts the output of the FIR filter from the signal in the send path. If the echo path were identified exactly, Sin would equal the adaptive filter output (the echo estimate) and the difference would be zero. In practice, the adaptive filter never matches the echo path exactly, so the difference between Sin and the echo estimate is never zero. This difference is called the error signal and is used by the adaptive filter to improve its performance. The better the estimation of the echo path, the smaller the energy of the error signal. The attenuation of the signal at the output of the subtractor relative to the Sin signal is called the echo return loss enhancement (ERLE).

Adaptation logic

The adaptation logic updates the FIR filter coefficients using the error signal. A modified least mean square (LMS) algorithm is used to modify the coefficients in an iterative fashion. The application can freeze or stop this adaptation, or reset the value of the coefficients to restart convergence.

Double-talk detector

The double-talk detector detects when both callers speak at the same time (IP telephony application) or when DTMF is input to the system at the same time as audio playback (IVR). In the presence of double-talk, this detector sends a command to the adaptation logic to stop or slow the adaptation of the coefficients. Detecting double-talk is critical for correct operation of the echo canceller. If adaptation continues during double-talk, the adaptive filter modifies its coefficients based on the Sin signal, which in this case is the sum of the echo of the Rout signal and the signal produced by the near-end talker. The adaptation would therefore be erroneous.

Non-linear processor (echo suppressor)

The non-linear processor is a device with a defined suppression threshold level in which signals having a level detected:

  • Below the threshold are suppressed.

  • Above the threshold are passed (although the signal can be distorted).

The non-linear processor functions only during single-talk. It attenuates the residual echo that the adaptive filter could not cancel.

Input gain

The application can provide input signal gain or loss.

Bypass

The application can bypass the echo canceller and restart it at any time. Use Bypass when voiceband modems or fax machines terminate both ends of the connection in an IP telephony application.
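The predelay buffer, FIR filter, subtractor, and adaptation logic described in the table above can be sketched together in a few lines. The following is an illustrative model only, not the code that runs on the board DSPs: it uses the textbook normalized LMS (NLMS) update rather than the board's modified LMS algorithm, and the function names, the step size `mu`, and the synthetic echo path are all assumptions made for the example. The optional `adapt` flags stand in for the freeze command that a double-talk detector or the application would issue.

```python
import math
import random


def nlms_echo_canceller(rin, sin, taps, predelay=0, mu=0.5, adapt=None, eps=1e-6):
    """Return (sout, coefficients), where sout is Sin minus the echo estimate.

    rin      -- far-end samples sent into the echo path
    sin      -- send-path samples (echo plus any near-end signal)
    taps     -- FIR filter length in samples
    predelay -- pure delay, in samples, applied to rin before the FIR filter
    adapt    -- optional per-sample flags; False freezes the coefficients
    """
    w = [0.0] * taps                      # FIR coefficients, initially unconverged
    x = [0.0] * taps                      # delay line of predelayed Rin samples
    sout = []
    for n, s in enumerate(sin):
        # Predelay buffer: feed the filter with Rin delayed by `predelay`.
        delayed = rin[n - predelay] if n >= predelay else 0.0
        x = [delayed] + x[:-1]
        # FIR filter: the echo estimate is the dot product of taps and delay line.
        estimate = sum(wi * xi for wi, xi in zip(w, x))
        # Subtractor: the error signal is Sin minus the echo estimate.
        err = s - estimate
        sout.append(err)
        # Adaptation logic: normalized LMS update, skipped while frozen.
        if adapt is None or adapt[n]:
            norm = eps + sum(xi * xi for xi in x)
            w = [wi + mu * err * xi / norm for wi, xi in zip(w, x)]
    return sout, w


def erle_db(sin, sout):
    """ERLE in dB: power of Sin relative to power at the subtractor output."""
    p_in = sum(v * v for v in sin)
    p_out = max(sum(v * v for v in sout), 1e-30)   # floor avoids log of zero
    return 10.0 * math.log10(p_in / p_out)


# Synthetic echo path (an assumption): 3 samples of pure delay, then 4 taps.
random.seed(0)
echo_path = [0.0, 0.0, 0.0, 0.5, -0.3, 0.2, 0.1]
rin = [random.uniform(-1.0, 1.0) for _ in range(4000)]
sin = [sum(h * rin[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(rin))]

# A 4-tap filter covers the echo only because the predelay skips the pure delay.
sout, w = nlms_echo_canceller(rin, sin, taps=4, predelay=3)
tail_erle = erle_db(sin[-1000:], sout[-1000:])
```

Because this synthetic Sin contains no near-end signal or noise, the coefficients settle on the four non-zero taps of the echo path and the ERLE grows far beyond what a real line allows. Adding near-end speech to `sin` while leaving adaptation enabled shows why double-talk detection matters: the coefficients drift away from the echo path.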

Specifying echo canceller parameters

If you use echo cancellation in your application, you may need to modify the callctl.mediamask in ADI_START_PARMS (or the mediamask in NCC_ADI_START) before you start a telephony protocol. The mediamask controls which functions are running or reserved when the call enters the connected state. Reserved indicates that the DSP MIPS have been committed to the operation before the operation starts. The application must reserve DSP resources in advance by using mediamask for DTMF detection, silence detection, cleardown detection, and echo cancellation.

The ADI service initiates echo cancellation when a telephony protocol is started. The appropriate parameters must be set before calling adiStartProtocol or nccStartProtocol. For information on the echo cancellation parameters, refer to ADI_START_PARMS.

The echo canceller parameters can be modified after the echo canceller is started by calling adiModifyEchoCanceller.

For all board types, the predelay parameter time shifts the correlation buffer. This enables shorter filter lengths to be shifted in time, allowing more echo energy to be captured, as shown in the following illustration:

The default mode (mode = 1) chooses the best possible echo cancellation for the available DSP power on the board. Choosing echo cancellation parameters that consume more DSP power than is available can result in errors when all ports are active. To determine whether your boards support echo cancellation, refer to Default filter length and adaptation time values.

Configuring boards for echo cancellation

Echo cancellation requires board-specific settings.

Configuring AG boards for echo cancellation

For AG boards, configure the system for echo cancellation by editing the board keyword file. Add echo.m54, echo_v3.m54, or echo_v4.m54, depending on the features you require, to the list of files in DSP.C5x[x].Files[y].

For information on DSP file features, see DSP file summary.

To enable echo cancellation with the board's default settings, set the parameter ADI.START.echocancel.mode or NCC.X.ADI_START.echocancel.mode to 1. See Default filter length and adaptation time values.

For AG boards, as the predelay value is incremented, the correlated data buffer is shifted later in time. The predelay can be adjusted to center the correlated data on most of the echo energy. The valid range is from 0 to 20 milliseconds.
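One way to reason about a predelay value is to slide a window of the chosen filter length across a measured or estimated echo-path impulse response and see where it captures the most energy. The sketch below does that offline. It is a hypothetical helper, not an ADI function; the 8 kHz sampling rate and the example impulse response are assumptions.

```python
def best_predelay(impulse_response, filter_len, max_predelay):
    """Return (predelay, captured_energy) for the window holding the most energy.

    All lengths are in samples. When several predelays capture the same
    energy, the smallest one is returned.
    """
    best = (0, -1.0)
    for d in range(max_predelay + 1):
        window = impulse_response[d:d + filter_len]
        energy = sum(h * h for h in window)
        if energy > best[1]:
            best = (d, energy)
    return best


SAMPLES_PER_MS = 8          # assumes 8 kHz telephony sampling
# Hypothetical echo path: 10 ms of pure delay followed by a 3 ms burst of echo.
h = [0.0] * (10 * SAMPLES_PER_MS) + [0.4, -0.2, 0.1] * SAMPLES_PER_MS

# A 4 ms filter with no predelay sees none of this echo.
no_shift = best_predelay(h, filter_len=4 * SAMPLES_PER_MS, max_predelay=0)

# Allowing up to 20 ms of predelay lets the same 4 ms filter cover the burst.
delay, energy = best_predelay(h, filter_len=4 * SAMPLES_PER_MS,
                              max_predelay=20 * SAMPLES_PER_MS)
predelay_ms = delay / SAMPLES_PER_MS
```

The helper returns the smallest predelay that captures the maximum energy. To center the window on the echo instead, as the text suggests, choose a value in the middle of the range of predelays that tie for that maximum.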

Refer to the board's installation and developer's manual for more information.

Configuring CG boards for echo cancellation

The resource definition string and the list of data processing modules (DPM) loaded on the DSPs on the CG boards have a default setup that includes echo.

To configure a CG board for echo cancellation, edit the board keyword file. Add echo.f54, echo_v3.f54, or echo_v4.f54, depending on the features you require, to the list of files in DSP.C5x[x].Files. For information on DSP file features, see DSP file summary.

CG 6565/C boards and CG 6060/C boards use C5441 DSPs and not C5420 DSPs for applications. The DSP files have .f41 extensions instead of .f54 extensions. For information about configuring hardware echo cancellation on CG 6565/C boards and CG 6060/C boards, refer to the board installation and developer's manual.

The default echo, Echo.In20_apt25, specified in the resource definition string, has a 20 ms filter length and an adapt rate of 25 percent of the maximum adaptation rate. If you need an echo other than Echo.In20_apt25, replace the current echo in the resource definition string with the new echo.

Note: Changing a function in the resource definition string can decrease the number of ports that run on the board. Each DSP function has its own resource requirement. If the new function has higher resource requirements than the function it replaces, the board may be able to run fewer ports.

Refer to the board installation and developer's manual for more information.

Default filter length and adaptation time values

To enable echo cancellation with default settings, set ADI.START.echocancel.mode or NCC.X.ADI_START.echocancel.mode to 1.

The following table shows the default filter length and adaptation time values for each board type:

Board type    Filter length    Adaptation time
CG            20 ms            200 (25 percent of maximum adaptation rate)
AG            4 ms             100 ms

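To enable echo cancellation with specific parameters, set the echo cancellation parameters in ADI_START_PARMS before calling adiStartProtocol or nccStartProtocol, or call adiModifyEchoCanceller after the echo canceller is started.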

Features

The following table provides general information about the echo canceller features:

Feature                           AG                                               CG
Filter length                     2, 4, 6, 8, 10, 16, 20, 24, 32, 40, 48, 64 ms    2, 4, 6, 8, 10, 16, 20, 24, 32, 40, 48, 64 ms
Echo pre-delay                    0, 1, 2, ... 20 ms                               0, 1, 2, ... 20 ms
Double talk detector              Yes                                              Yes
Input gain                        Yes                                              Yes
Echo suppressor enable/disable    Yes                                              Yes
Adaptation enable/disable         Yes                                              Yes
Windowing enable                  No                                               No
Bypass                            Yes                                              Yes
Comfort noise generation          Yes                                              Yes
Tone disabling                    Yes                                              Yes
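An application can check requested values against the ranges in this table before passing them to the ADI service. The sketch below is a hypothetical helper, not part of the ADI API. It encodes only the filter lengths and predelay range listed above, and it does not check whether the board has enough DSP power for the chosen combination.

```python
# Values copied from the features table; the same set applies to AG and CG.
VALID_FILTER_LENGTHS_MS = (2, 4, 6, 8, 10, 16, 20, 24, 32, 40, 48, 64)
MAX_PREDELAY_MS = 20


def check_echo_settings(filter_length_ms, predelay_ms):
    """Return a list of problems; an empty list means the values are in range."""
    problems = []
    if filter_length_ms not in VALID_FILTER_LENGTHS_MS:
        problems.append(
            f"filter length {filter_length_ms} ms is not a supported value")
    if not (isinstance(predelay_ms, int) and 0 <= predelay_ms <= MAX_PREDELAY_MS):
        problems.append(
            f"predelay {predelay_ms} ms is outside 0..{MAX_PREDELAY_MS} ms")
    return problems


ok = check_echo_settings(20, 4)         # both values appear in the table
bad = check_echo_settings(12, 25)       # neither value is supported
```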

Performance parameters

The following table provides general information about the echo canceller performance parameters:

Performance: Minimum echo return loss (ERLmin). For all values of ERL greater than ERLmin, the echo canceller delivers the expected performance. If the real ERL is less than the ERLmin, the echo canceller does not function correctly.
AG: 6 dB
CG: 6 dB

Performance: Maximum echo return loss enhancement (ERLE).
AG: 33 dB
CG: 33 dB

Performance: Non-linear processor loss. An additional loss introduced in the reception path, only when pure echo is received (no near-end speech).
AG: 36 dB
CG: 36 dB

Performance: Typical convergence time on speech. Convergence time depends on the transmitted signal, double talk events, adaptation time parameter, and on echo return loss of the network. The convergence time can be greater than the values presented in this table.
AG: Less than 1 second. Obtained using echo_v3.x54. For echo.x54, the typical convergence time on speech is < 4 seconds.
CG: Less than 1 second. Obtained using echo_v3.x54.
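The dB figures in this table can be combined to estimate a best-case residual echo level. The sketch below assumes that the losses simply add in dB: the echo path attenuates the Rout signal by the ERL, the adaptive filter adds up to the ERLE, and the non-linear processor adds its loss during single-talk. The function name and the -10 dBm0 playback level are assumptions for illustration, and real results depend on convergence and line conditions.

```python
def residual_echo_dbm0(rout_level_dbm0, erl_db, erle_db, nlp_loss_db=0.0):
    """Best-case echo level at Sout, assuming the losses add in dB."""
    return rout_level_dbm0 - erl_db - erle_db - nlp_loss_db


# Hypothetical prompt level at Rout, with the worst ERL the canceller accepts.
prompt_level = -10.0

# Double-talk, or non-linear processor disabled: only ERL and ERLE apply.
linear_only = residual_echo_dbm0(prompt_level, erl_db=6.0, erle_db=33.0)

# Single-talk with the non-linear processor active.
with_nlp = residual_echo_dbm0(prompt_level, erl_db=6.0, erle_db=33.0,
                              nlp_loss_db=36.0)
```

Under these assumptions, the echo of the prompt reaches a DTMF detector or ASR engine up to 39 dB below the playback level before any non-linear processing is applied.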

Recommendations for controlling echo

Transmission level planning and echo

For IP telephony applications, proper audio levels and echo are tightly coupled. It is desirable to provide adequate listening levels, but increasing system gains anywhere in the four-wire trunk portion of a connection can make proper echo control difficult to attain across a wide range of telephony equipment and connection scenarios. In general, have no more than 0 dB of gain in each direction of the complete four-wire part of a connection. If it is necessary to increase the gain, for example ahead of a low-bit-rate codec, there should be a commensurate loss at the output of the decoder.
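The 0 dB guideline can be checked by summing the gain and loss of every stage in one direction of the four-wire path. The sketch below is a minimal illustration with hypothetical stage values and hypothetical function names.

```python
def net_gain_db(stage_gains_db):
    """Sum the gain (positive) and loss (negative) of each stage, in dB."""
    return sum(stage_gains_db)


def within_plan(stage_gains_db, limit_db=0.0):
    """True when the net gain of one direction does not exceed the limit."""
    return net_gain_db(stage_gains_db) <= limit_db


# Hypothetical send direction: +6 dB ahead of a low-bit-rate codec,
# then a matching 6 dB loss at the decoder output.
compensated = [6.0, -6.0]

# The same boost without the matching loss exceeds the guideline.
boosted_only = [6.0]

compensated_ok = within_plan(compensated)
boosted_ok = within_plan(boosted_only)
```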

Using Microsoft NetMeeting or other IP telephony clients

In IP telephony applications, the connection can be asymmetric. For example, you can talk on a telephone through an IP telephony gateway connected through an IP link to someone using a Microsoft NetMeeting client on the remote end. At this NetMeeting client, a separate microphone and loudspeakers should not be used; a microphone headset is preferred. With a microphone and speaker combination at the NetMeeting client, the person on the telephone end of the connection hears considerable echo due to loudspeaker-to-microphone acoustic coupling.

Delay and echo

In IP telephony applications, a user's tolerance of echo in a telephone conversation decreases as the end-to-end delay in the connection increases. IP packet delay is caused by routers and WAN facilities. High packet inter-arrival jitter usually must be absorbed by jitter buffers in the media gateway. The more jitter there is in the IP network, the longer the jitter buffer must be so that the user does not experience poor audio quality due to packet loss.

Designers of IP telephony applications must reduce the number of routers and the amount of packet jitter so that any residual, uncanceled echo does not unnecessarily degrade the quality of the telephone connection.

Non-voice terminals (FAX and modem pass-through)

For IP telephony applications, it is desirable to handle non-voice communication devices such as modems. Modem transport can be handled by setting up a full duplex G.711 MSPP channel. Disable the echo canceller since it impairs both FDM (frequency division multiplexing) and EC (echo cancelling) modem transmission.

For T.30 FAX, T.38 can be used as a packet transport, or the MSPP channel can be set to G.711. In either case, disable the echo canceller.

Automatic speech recognition

Speech control of an IVR application can present special echo control challenges. Consider the following recommendations to improve the ability of the speaker to cut through a voice prompt to control the application:

Depending on which board you use, you may be able to select an echo canceller that has faster convergence (reduces echoes more quickly). For example, the CG 6000C echo canceller can be configured for a 100 percent adaptation rate (fast). With a 20 ms echo coverage, this canceller requires 5.40 MIPS.

Minimization of two-wire switching

Hybrids in telephony circuits convert two-wire transmission to and from four-wire transmission. Most modern circuit-switched telephony switching is done at four-wire points in a connection, but older two-wire switching still exists. Each interface between a two-wire and a four-wire connection can be a source of echoes. Therefore, wherever possible, minimize the use of two-wire switching.