Real Time Text

Introduction

This document proposes how to integrate the T.140 Real-Time Text protocol over RTP, as described in RFC 4103, into the SIP SIMPLE client and consequently into PJSIP. This is a joint effort between R3TF, the NLnet Foundation and AG Projects, with the goal of enabling Real-Time Text as a general-purpose communication tool.

Use cases

  • One can call a device directly using real-time text. The called device can be any ToIP client, and the call can be made with or without voice.
  • One can switch from IM mode to a more conversational mode by pressing an "activate real-time text" button. This is similar to switching to voice.
  • There is also a hybrid form, real-time text preview (see http://tools.ietf.org/html/draft-hellstrom-textpreview). The recipient immediately sees what the other side is typing, and when he or she presses Enter the text is sent en bloc, IM style, thus combining the best of both worlds.
  • Remote captioning: one big screen on which the text size can be adjusted and which displays the text interpreter's output. The send window is almost absent in this mode (or only pulled up if one needs to talk with the interpreter in one-on-one interpreter sessions).

This document does not cover how to integrate RTT into the GUI, as this is not relevant at the PyPJUA level; the GUI is a separate project.

Existing implementations

There are two known open source implementations of this protocol:

  • The RTP text/t140 Library.
    This library is used in the SIPCon1 client and is written in Java.
    Because it is written in Java and PJSIP is written in C, this library cannot be reused.
  • Asterisk supports relaying T.140 over RTP, but is not an endpoint.
    Perhaps some code can be recycled from it, in particular the redundancy support (or RED).

Relation to PyPJUA and PJSIP

Implementing T.140 over RTP can be subdivided into several tasks:

  • RTT codec development as described in RFC 4103 to transmit and receive text over RTP (written in C)
  • SDP negotiation for RTT session development (written in Python)
  • Handling events generated by the RTT codec in the Graphical User Interface (not addressed by this document)

Sending and receiving of RTP and RTCP packets is already implemented in PJSIP and will be reused. The actual RTT codec must be written in C, similar to the audio stream implementation, and pushed to the PJSIP project repository. Encoding and decoding of SDP in the SIP SIMPLE client is done in the PyPJUA layer, so a dedicated Python object must be developed to handle it.

Audio stream implementation in PJSIP

To determine how T.140 over RTP should be implemented in PJSIP, we first need to examine how the API to create and manage audio streams works in PJSIP, as RTT should be implemented in a similar fashion.

We consider the API from the point of view of PyPJUA, which in the diagram above is in the same position as PJSUA. Currently audio is the only type of media stream supported by PJSIP. First, the local SDP is generated, either when initiating a new INVITE or when an incoming INVITE that includes SDP is received.
This can be done by PJSIP using the pjmedia_endpt_create_sdp() function, which creates SDP with exactly one audio stream.
Once SDP negotiation has been completed, which is done in the pjsip_inv module using the SDP negotiation framework, PyPJUA is handed two pjmedia_sdp_session structs, which represent the local and remote SDP.
Using these, PyPJUA should perform the following steps:

RTT codec development

An object would need to be implemented that is similar to the pjmedia_stream object, but for an RTT stream, for example named pjmedia_rtt_stream.
Since RTT only has one codec, the encoding and decoding can be integrated into this object.
First, the SDP needs to be generated in some way to include RTT, if the user requests this.
This will be done by PyPJUA.
The sequence of events to create and use a pjmedia_rtt_stream after SDP negotiation is complete would look like this:

  • As for the audio stream, the pjmedia_stream_info structure will be initialized by the pjmedia_stream_info_from_sdp() function.
    This function may need some adjusting, as it currently checks only for audio codecs.
  • A pjmedia_rtt_stream object is created using a function like pjmedia_rtt_stream_create, passing as arguments the pjmedia_stream_info structure and a previously created pjmedia_transport object.
    This object will manage the following:
    • An instance of the T.140 encoder and decoder, which probably consists of some text buffers for RED.
    • Two instances of pjmedia_rtp_session, representing the RTP streams in both directions.
    • One instance of pjmedia_rtcp_session.
    • A reference to the media transport that was created earlier to carry the RTP and RTCP streams.
  • The stream should be started through some function, e.g. pjmedia_rtt_stream_start.
    This will set a timer that performs transmission every interval.
  • The application can transmit text by appending characters to the internal transmission buffer by calling some function, like pjmedia_rtt_stream_send.
  • The application could receive text either by reading from the internal reception buffer by calling a function, like pjmedia_rtt_stream_receive, or have a previously set callback called whenever there is new text.
    The former has the disadvantage that the application must actively poll the object and may cause buffer overruns if it does not poll often enough. The latter has the disadvantage that, because of the short intervals, the callback may be called quite frequently, which may prove inefficient, particularly in a Python environment.
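
The transmission side of the sequence above can be sketched as follows. Python is used for readability only, since the codec itself would be written in C. This is a minimal sketch under stated assumptions, not the proposed implementation: the names RttSendBuffer, tick() and pack_red(), the payload type numbers 98 and 100, and the choice of two redundant generations are all hypothetical. The packet layout follows the RFC 2198 redundancy format that RFC 4103 builds on: a 4-byte header per redundant block, a 1-byte header for the primary block, then the block data, oldest first.

```python
import struct

T140_PT = 98   # assumed dynamic payload type for t140
RED_PT = 100   # assumed dynamic payload type for red (would go in the RTP header)


def pack_red(timestamp, primary, history):
    """Pack blocks in the RFC 2198 layout: redundant headers, primary header, data."""
    out = bytearray()
    for ts, data in history:
        offset = timestamp - ts
        # Header fields are 14 and 10 bits wide, so larger values cannot be encoded.
        assert 0 <= offset < (1 << 14) and len(data) < (1 << 10)
        # F=1 | 7-bit block PT | 14-bit timestamp offset | 10-bit block length
        word = (1 << 31) | (T140_PT << 24) | (offset << 10) | len(data)
        out += struct.pack("!I", word)
    out.append(T140_PT)  # final header: F=0 | 7-bit PT of the primary block
    for _, data in history:
        out += data
    out += primary
    return bytes(out)


class RttSendBuffer:
    """Hypothetical transmit side of an RTT stream (illustration only)."""

    def __init__(self, generations=2):
        self.pending = []    # text appended since the last timer tick
        self.history = []    # (timestamp, payload) of earlier ticks, oldest first
        self.generations = generations

    def send(self, text):
        """Append text to the transmission buffer (cf. pjmedia_rtt_stream_send)."""
        self.pending.append(text)

    def tick(self, timestamp):
        """Called by the interval timer; returns a RED payload, or None if idle."""
        primary = "".join(self.pending).encode("utf-8")
        self.pending = []
        if not primary and not any(data for _, data in self.history):
            return None  # nothing new and nothing left to repeat
        payload = pack_red(timestamp, primary, self.history)
        # Keep only the most recent blocks for use as redundancy in later packets.
        self.history = (self.history + [(timestamp, primary)])[-self.generations:]
        return payload
```

With the 1000 Hz RTP clock used for t140, timestamps are in milliseconds. Calling send("Hi") and tick(0), then send("!") and tick(300), yields a second packet that carries "Hi" as a redundant block and "!" as the primary block. The 300 ms spacing here is only an assumed timer interval.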

SDP negotiation for RTT session

In PyPJUA an RTTStream class will be created, which represents a T.140 stream. It should have a method to send text, and an associated event that is delivered to the event handler whenever there is incoming text. Internally, it should also govern the SDP generation.
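
As an illustration of the shape such a class could take, the following is a minimal sketch and not the actual PyPJUA API. The method names sdp_media_lines(), send_text() and _handle_incoming(), the on_text callback, and the default payload type numbers are all assumptions made for this example. The generated SDP follows the form of the example in RFC 4103 for a text stream with redundancy: a "text" media line, rtpmap entries for t140 and red at a 1000 Hz clock, and an fmtp line listing the primary and redundant payload types.

```python
class RTTStream:
    """Hypothetical PyPJUA-level wrapper around a T.140 stream (illustration only)."""

    def __init__(self, on_text=None, t140_pt=98, red_pt=100, generations=2):
        self.on_text = on_text      # callback invoked for each chunk of incoming text
        self.t140_pt = t140_pt      # assumed dynamic payload type for t140
        self.red_pt = red_pt        # assumed dynamic payload type for red
        self.generations = generations
        self.outgoing = []          # stands in for the C-level transmission buffer

    def sdp_media_lines(self, port):
        """Return the SDP lines describing a text stream with redundancy."""
        # One primary entry plus one entry per redundant generation, e.g. 98/98/98.
        fmtp = "/".join([str(self.t140_pt)] * (self.generations + 1))
        return [
            "m=text %d RTP/AVP %d %d" % (port, self.red_pt, self.t140_pt),
            "a=rtpmap:%d t140/1000" % self.t140_pt,
            "a=rtpmap:%d red/1000" % self.red_pt,
            "a=fmtp:%d %s" % (self.red_pt, fmtp),
        ]

    def send_text(self, text):
        """Queue text for transmission by the underlying stream."""
        self.outgoing.append(text)

    def _handle_incoming(self, text):
        """Would be called by the lower layer whenever text arrives."""
        if self.on_text is not None:
            self.on_text(self, text)


if __name__ == "__main__":
    stream = RTTStream(on_text=lambda s, text: print("received:", text))
    print("\r\n".join(stream.sdp_media_lines(11000)))
    stream._handle_incoming("hello")
```

Keeping SDP generation inside the stream object means the caller only has to supply the port, while the payload type numbers stay consistent between the SDP and the stream that later uses them.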

Handling events generated by the RTT codec

The GUI must be notified every time a new character is received from the RTT codec or every time the user types a character in the GUI for an established session.
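
One possible way to deliver incoming text to a consumer such as the GUI is sketched below. It is a sketch under stated assumptions rather than a design decision of this proposal: the class RttReceiveBuffer, its feed() and receive() methods, and the overrun counter are hypothetical names. It supports both delivery options discussed for the RTT codec: a callback that is invoked once per received packet rather than once per character, and a bounded buffer that the application polls and that drops the oldest characters if it is not polled in time.

```python
from collections import deque


class RttReceiveBuffer:
    """Hypothetical reception buffer supporting both polling and callbacks."""

    def __init__(self, max_chars=1024, on_text=None):
        self.chars = deque(maxlen=max_chars)  # oldest characters are dropped when full
        self.on_text = on_text
        self.overruns = 0                     # number of characters lost to overruns

    def feed(self, text):
        """Called by the codec with the decoded text of one RTP packet."""
        if self.on_text is not None:
            self.on_text(text)  # one call per packet, not one per character
            return
        for ch in text:
            if len(self.chars) == self.chars.maxlen:
                self.overruns += 1
            self.chars.append(ch)

    def receive(self):
        """Polling interface: return and clear everything buffered so far."""
        text = "".join(self.chars)
        self.chars.clear()
        return text


if __name__ == "__main__":
    polled = RttReceiveBuffer(max_chars=3)
    polled.feed("abcd")                 # one character too many for the buffer
    print(polled.receive(), polled.overruns)

    pushed = RttReceiveBuffer(on_text=lambda text: print("new text:", text))
    pushed.feed("hi")
```

Delivering text per packet keeps the number of calls into Python bounded by the transmission interval instead of by typing speed, at the cost of the consumer having to handle several characters at once.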