Media resource control protocol (MRCP)

Media resource control protocol (MRCP) is an application protocol for accessing automatic speech recognition (ASR) and text-to-speech (TTS) engines through IP networks. MRCP minimizes the resources needed to integrate various speech-based technologies for ASR and TTS platforms.

MRCP is designed for network-based solutions where ASR or TTS servers are configured to work together with VoiceXML interpreters, media gateways, and application servers. MRCP uses RTP (real-time transport protocol) to transport audio received from callers, as well as audio sent to callers from recordings or text-to-speech processes. The following illustration shows MRCP components:

MRCP does not specify how the control session is established with the server. It relies on real time streaming protocol (RTSP) to establish and maintain sessions. The session description protocol (SDP) is used to describe and negotiate the media connections between clients and the network servers.
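As a rough illustration of the session setup described above, the following Python sketch builds an RTSP SETUP request that carries an SDP body describing one RTP audio stream. It only formats text and does not open a network connection. The function names, the server URL, the client address, and the port numbers are hypothetical values chosen for the example, and the exact headers that a given MRCP server expects may differ.

```python
# Minimal sketch: an RTSP SETUP request with an SDP body.
# All names, addresses, and ports below are illustrative assumptions.

def build_sdp(client_ip: str, rtp_port: int) -> str:
    """Return an SDP description offering one PCMU (G.711 mu-law) audio stream."""
    lines = [
        "v=0",                              # SDP protocol version
        f"o=- 0 0 IN IP4 {client_ip}",      # origin: user, session id, version, address
        "s=MRCP client media",              # session name (free text)
        f"c=IN IP4 {client_ip}",            # connection address for the media
        "t=0 0",                            # unbounded session time
        f"m=audio {rtp_port} RTP/AVP 0",    # audio over RTP, payload type 0
        "a=rtpmap:0 PCMU/8000",             # payload type 0 is PCMU at 8 kHz
    ]
    return "\r\n".join(lines) + "\r\n"


def build_rtsp_setup(url: str, cseq: int, client_ip: str, rtp_port: int) -> str:
    """Return the text of an RTSP SETUP request whose body is the SDP above."""
    sdp = build_sdp(client_ip, rtp_port)
    headers = [
        f"SETUP {url} RTSP/1.0",
        f"CSeq: {cseq}",
        # RTP uses the even port; the next port is conventionally RTCP.
        f"Transport: RTP/AVP;unicast;client_port={rtp_port}-{rtp_port + 1}",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    # A blank line separates the RTSP headers from the SDP body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


if __name__ == "__main__":
    print(build_rtsp_setup(
        "rtsp://media.example.com/media/synthesizer", 1, "192.0.2.10", 46456))
```

The sketch separates the two roles named above: the RTSP request line and headers manage the session, while the SDP body describes where and in what format the audio flows.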

MRCP components

The MRCP architecture consists of the following components:

Component    Description
Clients      Provide media streams that can be generated or processed by ASR or TTS engines.
Servers      Provide resources or devices for processing or generating the streams. Examples include speech recognizers, speech synthesizers, speaker verification and speaker identification servers, and signal generators and detectors.

MRCP defines requests, responses, and events that control media processing resources, as well as the state machine for each resource. The MRCP control architecture has the following characteristics:

The following illustration shows MRCP media streaming and control:
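To make the three message types mentioned earlier (requests, responses, and events) more concrete, the following Python sketch classifies the first line of an MRCP message. It assumes the MRCP version 1 start-line layout, which is an assumption because this page does not state a protocol version. The `StartLine` class and the `parse_start_line` function are hypothetical names used only for this example, and message headers and bodies are ignored.

```python
# Minimal sketch: classify an MRCP start line as request, response, or event.
# Assumes MRCPv1 framing; class and function names are illustrative only.
from dataclasses import dataclass
from typing import Optional

MRCP_VERSION = "MRCP/1.0"  # assumption: version 1 of the protocol


@dataclass
class StartLine:
    kind: str                            # "request", "response", or "event"
    name: str                            # method or event name; empty for responses
    request_id: int                      # ties responses and events to a request
    status_code: Optional[int] = None    # set for responses only
    request_state: Optional[str] = None  # for example IN-PROGRESS or COMPLETE


def parse_start_line(line: str) -> StartLine:
    parts = line.strip().split()
    # Response: version request-id status-code request-state
    if len(parts) == 4 and parts[0] == MRCP_VERSION:
        return StartLine("response", "", int(parts[1]), int(parts[2]), parts[3])
    # Request: method-name request-id version
    if len(parts) == 3 and parts[2] == MRCP_VERSION:
        return StartLine("request", parts[0], int(parts[1]))
    # Event: event-name request-id request-state version
    if len(parts) == 4 and parts[3] == MRCP_VERSION:
        return StartLine("event", parts[0], int(parts[1]), None, parts[2])
    raise ValueError(f"not a recognized MRCP start line: {line!r}")


if __name__ == "__main__":
    for text in (
        "SPEAK 543257 MRCP/1.0",
        "MRCP/1.0 543257 200 IN-PROGRESS",
        "SPEAK-COMPLETE 543257 COMPLETE MRCP/1.0",
    ):
        print(parse_start_line(text))
```

The shared request ID is what lets a client match a server response, and any later event, to the request that triggered it. The request state field reflects the per-resource state machine that the protocol defines.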