Faxing over IP networks


Please Note: This is not my own work, but a republication of the article published at http://osmon.net/~josmon/foip/ (originally published at http://www.soft-switch.org/foip.html).


Executive summary

FAXing over VoIP networks doesn't work. You can sometimes arrange things so a fairly high percentage of FAXes get through OK. You can occasionally create setups that work 100% of the time. These are rare and unrepeatable setups. You need to use a proper FAX over IP protocol, such as T.38, to achieve consistent reliable FAXing across IP networks.

For the skeptical

Don't believe FAXing over VoIP protocols won't work? Heard that things will work OK if you use the G.711 u-law or A-law codec? Read on if you want to see why things are much more complex, and troublesome, than that.

Trying to live with FAXing over VoIP networks

Sending FAXes over VoIP networks usually fails. It is human nature to look for simple reasons for that, and simple cures. In reality, there are a number of reasons, and no certain universal cures. VoIP networks are designed to do a good job with speech. Carrying any sound other than a single voice speaking is not generally a system requirement. It shouldn't be too surprising if it works rather poorly.

During development work I have found some nasty behaviour in real world products, which I have started to document here.

You cannot get a quart into a pint pot

The commonest problem with sending a FAX over VoIP networks is the easiest to deal with. A low bit rate voice codec is unable to carry a fast modem signal without severe distortion. Would you really expect an 8kbps G.729 codec to convey a 9.6kbps FAX modem signal correctly? If you would, you probably also believe in perpetual motion. I might even be able to offer you a good deal if you would like to buy London Bridge :-) The only common codecs capable of adequately preserving FAX modem signals up to 14,400bps (V.17) are u-law and A-law. Up to 9600bps (V.29) a fully implemented G.726 codec will also work. However not all codecs claiming to be G.726 fully implement the spec. A few shortcuts can save considerable compute power, and only a few people need the spec. to be fully implemented. Your mileage may vary. The G.726 codec was, however, specifically designed to be able to carry medium speed modems, such as the V.29 modem used for FAX.
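The limits described above can be written down as a small lookup. This is only a sketch of the claims in this section, not a measurement: the table and the function name `modems_carried` are made up for illustration, the G.726 entry assumes a full implementation of the spec., and the 4800bps figure for V.27ter is that modem's top rate, which the text does not state.

```python
# Highest FAX image modem rate each codec can carry, as claimed in the text.
# Illustrative only; the G.726 figure assumes a complete implementation.
CODEC_MAX_FAX_BPS = {
    "G.711 u-law": 14400,  # carries everything up to V.17
    "G.711 A-law": 14400,  # carries everything up to V.17
    "G.726": 9600,         # carries up to V.29
    "G.729": 0,            # low bit rate codec: no image modem survives
}

# Top rate of each of the slower FAX image modems.
FAX_MODEM_BPS = {"V.27ter": 4800, "V.29": 9600, "V.17": 14400}


def modems_carried(codec: str) -> list[str]:
    """Return the FAX image modems the named codec can be expected to carry."""
    limit = CODEC_MAX_FAX_BPS[codec]
    return [name for name, bps in FAX_MODEM_BPS.items() if bps <= limit]


if __name__ == "__main__":
    for codec in CODEC_MAX_FAX_BPS:
        print(codec, "->", modems_carried(codec) or "control messages only")
```

Even where the lookup says a modem can be carried, the later sections explain why the call may still fail for reasons that have nothing to do with the codec.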

Recently, FAX machines supporting 33,600bps (V.34bis) have become popular. This rate is unlikely to work with any reliability across any VoIP connection, even when an A-law or u-law codec is used. The codecs will maintain the required signal quality, but the delay across the VoIP channel, even if it is a stable delay, will prevent the echo cancelers in most modems from training well enough. The slower FAX modems - V.27ter, V.29 and V.17 - do not use echo cancellation, so the problem does not exist there.

Lower bit rate codecs have zero chance of working for any standard FAX image modem. Many will convey the 300bps (V.21) FAX control messages OK. They will not convey the fast modem signals, used for the actual image data.

Modems don't like relativity

In the PSTN world, the network provides a constant delay for any particular call. The speed at which data enters the network is always the same as the speed at which it leaves. The end to end delay does not jitter, or make step changes in anything but exceptional circumstances (e.g. on automatic fail-over, if a fibre link fails). Modems require this. In an IP network jitter is a fact of life. It can be kept to a modest level, through the use of the QoS (quality of service) features available in a lot of IP equipment, but only if you control the network end-to-end. If the call passes across the open Internet there is no QoS control. It is hard to see a business model that would ever encourage QoS to be introduced across the open Internet. So, in the long term the timing of a voice signal entering a VoIP network is the same as its timing when it leaves, but in the short term they can be very different.

If a VoIP network works only across a LAN or a QoS managed WAN link, there might be a near guarantee of zero packet loss, and fairly low jitter. Many people then assume the jitter buffer at the receiver will smooth out the modest jitter, and the received signal will perfectly match the transmitted one. They are often right, but there is no guarantee. There are many designs of jitter buffer. Most modern ones dynamically adapt the length of the buffer in some way, although many different algorithms are used. If the jitter is low, and dynamic jitter buffering is switched off, things may work well. If it cannot be switched off, the behaviour of dynamic buffering will generally upset a modem signal. Various algorithms will:

  • Guarantee some packet loss, by tuning the buffering until a small percentage of packets are declared late, and dropped. Dropping packets is actually built into these algorithms, and the results can be pretty good for voice. Trading a small number of dropped packets for somewhat less latency is a reasonable trade-off.
  • Adjust periods of silence in whole packet steps (typically 20ms). Certain silence periods in a FAX signal are specified as 75±20ms. 20ms jumps can push them out of spec.
  • Continuously adjust the timing of non-silent periods, using overlap and add techniques. This is the state of the art in jitter buffering for voice, but a complete disaster for a modem.

Perversely, the more basic equipment is likely to work well, and the newer more sophisticated designs are likely to be troublesome.
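The arithmetic behind the whole packet adjustment of silence periods is easy to check. The sketch below assumes 20ms packets and the 75±20ms window quoted in the list above; the function names are invented for the example.

```python
NOMINAL_MS = 75    # nominal length of the silence period
TOLERANCE_MS = 20  # allowed deviation either side of nominal
PACKET_MS = 20     # assumed packet duration, the size of each adjustment step


def gap_in_spec(gap_ms: float) -> bool:
    """True if a silence period falls inside the 75±20ms window."""
    return abs(gap_ms - NOMINAL_MS) <= TOLERANCE_MS


def after_adjustment(sent_gap_ms: float, packets_added: int) -> float:
    """Silence length after the jitter buffer inserts (or, if negative,
    removes) whole packets of silence."""
    return sent_gap_ms + packets_added * PACKET_MS


if __name__ == "__main__":
    for sent in (60, 75, 85):
        for step in (-1, 0, 1):
            received = after_adjustment(sent, step)
            print(f"sent {sent}ms, step {step:+d}: received {received}ms,"
                  f" in spec: {gap_in_spec(received)}")
```

A sender sitting exactly on the nominal 75ms only just survives a single step in either direction. A sender that is legitimately a little off nominal, say at 85ms, is pushed out of spec. by one added packet.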


Modems don't like silence suppression

Depending on its implementation in particular equipment, silence suppression can destroy a FAX call. If silence suppression is enabled, a voice detector continuously monitors the call, looking for the presence of a real voice. Some of these are designed to focus purely on voice, and tend to reject other kinds of sound - e.g. modem tones. They may, therefore, not switch the audio path on and off cleanly when the modem signal starts and stops. Even if they do switch cleanly, the suppression algorithms usually modify the audio around the switching points.

During silent periods, comfort noise is usually introduced, to simulate the background noise you normally hear in a conversation. This might mean a period which should be silent, is actually significantly noisy. The receiving modem might not see a good enough "silence" for its signal detector to correctly declare the boundaries of the modem signal.

Modems like a complete conversation

Modems need a continuous audio path. If there is packet loss the consequences are severe, but the actual effect depends a lot on the equipment in use. Let's say a 20ms packet of audio is lost in the middle of a page of FAX. Obviously this is going to lose a bit of the image, but will it affect just a small stripe, or the rest of the page? If the receiving end emits 20ms of silence, the receiving modem will probably declare the end of the page. If the receiving end emits 20ms of some fill in sound, the receiving modem might be able to ride over the gap, depending on its design. If more or less than 20ms of some fill in sound is emitted, the remainder of the page will definitely not be received correctly. The receiving modem will not tolerate a jump in timing of that sort.
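To get a feel for the size of such a timing jump, the sketch below converts a mismatch between the lost audio and the fill in audio into samples and modem symbols. The 8000 samples/second rate (the usual telephony rate) and the 2400 baud symbol rate (used by V.29 and V.17) are figures brought in for the example, not taken from the text, and the function name is invented.

```python
SAMPLE_RATE_HZ = 8000    # assumed: standard narrowband telephony sampling rate
SYMBOL_RATE_BAUD = 2400  # assumed: symbol rate of the V.29 and V.17 modems


def timing_slip(lost_ms: float, filled_ms: float) -> tuple[int, float]:
    """Return the timing jump seen by the receiving modem, as
    (audio samples, modem symbols). Zero means the fill matched the loss."""
    slip_ms = filled_ms - lost_ms
    samples = round(slip_ms * SAMPLE_RATE_HZ / 1000)
    symbols = slip_ms * SYMBOL_RATE_BAUD / 1000
    return samples, symbols


if __name__ == "__main__":
    print(timing_slip(20, 20))  # fill exactly covers the gap
    print(timing_slip(20, 30))  # 10ms too much fill
    print(timing_slip(20, 10))  # 10ms too little fill
```

Under these assumptions a fill that is only 10ms out moves the signal by 24 symbols, far more than a receiver tracking the symbol clock can absorb.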

The bottom line

FAX, and other modem applications, operating over VoIP channels are quirky, and unreliable. This will not get better over time. It will get worse. In general, the more sophisticated the equipment gets in trying to make speech work smoothly, the worse it behaves for modems. In the near term (i.e. until all data applications are native IP applications) store and forward protocols, and protocols tailored to reasonably conveying modem data across an IP channel are the only way to achieve consistent results.

FAX over IP (FoIP) specifications

Most current FAX machines lack an RJ-45 connector and a TCP/IP protocol stack. Few of even the latest models will connect directly to the Internet, even though the protocols for doing so have been standardised for several years. Only some quite high end machines seem to offer this. This means in a world increasingly moving to VoIP for telephony, support is needed for using conventional FAX machines over IP channels until most of those machines are consigned to the scrap-heap.

Store and forward FoIP (T.37)

The T.37 specification defines a standard method for store and forward delivery of FAXes through an IP network. It is simple, elegant, and reliable. Its only real drawback is that it is not real time. In reality this is not much of a drawback, but psychologically it is. Real time FAX gives the illusion that if your FAX machine says the FAX got through, it is now in the recipient's hands. Of course, this is completely bogus. It may be sitting in some store and forward unit (maybe the target FAX machine's buffer memory, or a unified message service) which you are unaware of. It may be sitting in a waste bin, mistaken for junk FAX. Of course, any sensible user knows this. It doesn't stop many taking an irrational attitude to the subject, though. Some people think a FAX machine report might have some legal standing as an indication a document was sent. It might even be true. Telex logs used to be accepted by courts, and they were trivial to forge.

The only real problem store and forward brings is the two endpoints are not able to negotiate their capabilities. FAXing is limited by the capabilities of the store and forward system. If you want to send a colour FAX between known colour FAX machines, and the store and forward system doesn't understand colour FAXes, there will be some surprise and disappointment when a monochrome FAX appears at the far end. Of course, improving the capabilities of the gateways, to more completely implement the current version of the FAX spec. can solve this.

T.37 defines a procedure for receiving a FAX at a gateway; making an e-mail message, containing the FAX as an attachment; sending it to a remote gateway; and dialing out and delivering it from the remote gateway to the destination machine. It might optionally be delivered as an e-mail with a FAX attachment directly to the recipient's e-mail box. Also, FAXes might be sent directly to the store and forward system from a sender's e-mail box, for delivery to a dialed FAX number.
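The "e-mail message, containing the FAX as an attachment" step needs nothing beyond ordinary mail tooling. The following is a rough sketch using Python's standard email package, not a conforming T.37 implementation: it assumes the FAX image is already available as TIFF data, and the function name, addresses, subject and file name are all placeholders.

```python
from email.message import EmailMessage


def build_fax_email(sender: str, recipient: str, tiff_bytes: bytes) -> EmailMessage:
    """Wrap an already encoded FAX image in a MIME e-mail message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Fax message"  # placeholder subject
    msg.set_content("A fax is attached as a TIFF image.")
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(tiff_bytes, maintype="image", subtype="tiff",
                       filename="fax.tif")
    return msg


if __name__ == "__main__":
    # Placeholder addresses and a dummy TIFF header, for illustration only.
    message = build_fax_email("gateway@example.com", "user@example.org",
                              b"II*\x00")
    print(message.get_content_type())
    # Handing the message to a mail server would use smtplib in the usual way.
```

Everything T.37 specific, such as the exact TIFF profile and how a dialed FAX number is expressed in an address, is left out of the sketch.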

T.37 is a very simple specification. It doesn't need to say very much. It builds upon well proven, widely deployed protocols - SMTP, MIME, etc. - and just defines a few details needed to bring them all together for the FAX application. This means it can be very simple to implement in any system which already contains most of the building blocks. T.37 is good. T.37 is wholesome. T.37 is the sane way to handle FAX at this time.

Real time FoIP (T.38)

Only one standard has been developed for real time FAX over IP - T.38. Before discussing what T.38 is like, it is important to note a few things about its current status in the real world. A lot of ATA boxes, and other gateway equipment, still do not support T.38. A lot which say they support it actually just have it in the pipeline (e.g. Sipura 2100). Very few T.38 implementations currently support 33,600bps (V.34bis) FAX, although recently low cost all in one printer/scanner/FAX machines supporting V.34 FAX have become fairly common. A lot have very buggy implementations - I have used a T.38 ATA box recently that simply locks up the moment it hears a FAX tone. If you think you know how T.38 behaves, you might just be familiar with how buggy versions behave.

So, what is T.38?

T.38 is the real-time FAX over IP protocol. This means it is designed to work like traditional FAXing. You call another FAX machine, and send the FAX as you wait. Either FAX machine could be a traditional FAX machine connected to the PSTN, an ATA box, or similar; it could be a FAX machine with an RJ-45 connector plugged straight into an IP network; it could be a computer pretending to be a FAX machine.

There are some issues in trying to do FoIP well with traditional FAX machines. Recent versions of the core FAX protocol - T.30 - have introduced flags and features to allow newer FAX machines to be Internet aware FAX devices. These tie in to the T.38 spec. A few makers now say their FAX machines are "Internet Aware" or "Internet Capable". This might mean the machines can connect directly to an IP network. It usually just means the machines are aware of the existence and qualities of T.38.

What does T.38 look like?

The original version of the T.38 spec. defined two methods for transmission across an IP network - one based on UDP and one based on TCP. At that time RTP was the emerging protocol for streaming media across IP networks. Instead of using that, T.38 defined its own method of packaging data within UDP packets, called UDPTL. This has now been accepted as a mistake, and an RTP based form of the protocol has been defined. Currently, this just makes more work for implementors. The only method in widespread use is the non-RTP method, so that has to be implemented. There is no choice. For the future, the RTP form has to be implemented too. AHHHH!

The T.38 spec. says some odd things about when the UDP form is more suitable and when the TCP form is more suitable. I would say the TCP form should be used between two IP devices. When one of the machines is connected to an analogue phone line, the UDP form probably has to be used for its nearer real-time streaming qualities. UDP is, however, an unreliable protocol, and that compromises the benefits of T.38 over trying to use FAX over VoIP.

T.38 is a very loose specification. Most good modern specifications try to really tie down what should happen. T.38 allows a huge spread of implementation decisions.

In what ways does T.38 outperform FAX over VoIP?

If the TCP form of T.38 is used, it is very robust. Used between Internet aware FAX machines, it basically solves all the problems of using VoIP for FAX.

If one of the UDP forms of T.38 is used, it is common for each packet to contain a copy of the main data in the previous packet. This is an option, but most implementations seem to support it. This forward error correction scheme makes T.38 far more tolerant of dropped packets than using VoIP. It requires two successive lost packets to actually lose any data. The overheads in T.38 are so big, the extra data sent in each packet is hardly noticeable. If two successive packets are lost, T.38 will still have trouble. However, if that is a common occurrence, the network is probably quite bad, and VoIP performance will be poor.
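The effect of carrying a copy of the previous packet's data can be shown with a toy model. This is not the real UDPTL packet format: the dictionary layout and the function names are invented for the sketch, and it models only a single level of redundancy.

```python
def build_packets(chunks):
    """Each packet carries its own chunk plus a copy of the previous one."""
    packets = []
    for seq, chunk in enumerate(chunks):
        previous = chunks[seq - 1] if seq > 0 else None
        packets.append({"seq": seq, "primary": chunk, "redundant": previous})
    return packets


def receive(packets, lost):
    """Rebuild the chunk stream, ignoring packets whose seq is in `lost`.
    A chunk that cannot be recovered is reported as None."""
    recovered = {}
    for pkt in packets:
        if pkt["seq"] in lost:
            continue
        recovered[pkt["seq"]] = pkt["primary"]
        if pkt["redundant"] is not None:
            # Fill in the previous chunk if its own packet never arrived.
            recovered.setdefault(pkt["seq"] - 1, pkt["redundant"])
    return [recovered.get(i) for i in range(len(packets))]


if __name__ == "__main__":
    data = [b"a", b"b", b"c", b"d", b"e"]
    print(receive(build_packets(data), lost={2}))     # single loss
    print(receive(build_packets(data), lost={1, 2}))  # two successive losses
```

In this model a single lost packet costs nothing, while two successive losses cost exactly one chunk: the one whose only copies were in the two missing packets.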

Losing a packet in a T.38 stream does not cause the modems to lose sync. This means two successive lost packets should only corrupt a section of an image. If the optional FAX error correction (ECM) mode is used, there is a good chance that with a retry or two, a perfect image will be transferred. Not ideal, but functional.

Much of the robustness of T.38 comes not from what the spec. says, but from the potential it offers for smart implementation. The trick is to work out the smartest implementation, which will not cause trouble with the many buggy implementations of T.30 which exist in commercial FAX products.

The T.30 spec. allows transmission of a page to be paused just before the end of any row of pixels. This is used as a method of flow control, by FAX machines with slow paper handling. It can also be used by a T.38 implementation, to wait for more data when a packet is delayed or lost. This means a T.38 gateway can start sending a page as soon as it gets some data, without performing any jitter buffering. When there is little jitter, transmission delay is minimised. When jitter is bad, things will be delayed only as much as necessary. If packets are lost, and FEC is in use, the outgoing gateway can simply wait a while, to try to reconstruct the stream from the redundant information available when further packets arrive. If the required data is irretrievably lost, due to a burst of lost packets, transmission can continue with only the minimum possible page corruption.
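The "delayed only as much as necessary" behaviour can be modelled in a few lines. The sketch below is a simplification with invented names: it treats each row of pixels as arriving from the IP side at a known time and taking a fixed time to send on the analogue side, and it ignores any limit the FAX specs. place on how long a pause may last.

```python
def emission_schedule(arrival_ms, row_duration_ms):
    """Return (start_ms, pause_ms) for each row sent to the FAX machine.

    Sending starts as soon as the first row arrives, with no jitter buffer.
    If the next row has not arrived when the current one finishes, the end
    of the current row is stretched (pause_ms) until it does.
    """
    schedule = []
    line_free_at = arrival_ms[0]
    for arrived in arrival_ms:
        pause = max(0, arrived - line_free_at)  # wait only if data is late
        start = line_free_at + pause
        schedule.append((start, pause))
        line_free_at = start + row_duration_ms
    return schedule


if __name__ == "__main__":
    # Rows arrive smoothly, then one is held up in the network for 50ms.
    arrivals = [0, 10, 20, 80, 90]
    for start, pause in emission_schedule(arrivals, row_duration_ms=10):
        print(f"row starts at {start}ms after pausing {pause}ms")
```

With smooth arrivals no delay is added at all. The one late row costs a single 50ms pause, rather than a fixed buffering delay applied to the whole page.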

HDLC transmission, used for the FAX control messages, offers no similar way to precisely control flow. However, it is possible to achieve pretty good results. The HDLC protocol only supports flow control between HDLC frames. The full HDLC protocol allows frames to be aborted midway, and restarted. However, the protocol as defined in the T.30 spec. doesn't include the abort feature. If we wait until we have received the whole of a long frame, before starting to pass it on, we could introduce substantial delay. However, this is not a big problem for T.30 FAX transmission. Most of the HDLC frames used in the T.30 spec. are quite short, especially the ones which occur between pages. Delaying until we receive all the data for one of these messages will not significantly extend the call. To avoid long delays for very long frames we can apply rules like: if a frame is no more than 30 bytes (1 second) long we wait for the whole frame to be received before passing it on; if the frame is longer we start passing it on with a 1 second delay.
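The rule above is simple enough to state as code. The sketch assumes the 300bps rate of the V.21 control channel mentioned earlier, and the constant and function names are invented. Note that 30 bytes of raw data take 0.8 seconds at 300bps; flags and bit stuffing on the line add to that, which is presumably where the rounder figure of 1 second comes from.

```python
HDLC_BIT_RATE = 300       # V.21 control channel, bits per second
SHORT_FRAME_BYTES = 30    # threshold from the rule in the text
LONG_FRAME_DELAY_S = 1.0  # fixed head start given to longer frames


def frame_duration_s(length_bytes: int) -> float:
    """Time the frame's data occupies the line, ignoring flags and stuffing."""
    return length_bytes * 8 / HDLC_BIT_RATE


def forwarding_delay_s(length_bytes: int) -> float:
    """Delay before the gateway starts passing a received frame on."""
    if length_bytes <= SHORT_FRAME_BYTES:
        # Short frame: wait until all of it has been received.
        return frame_duration_s(length_bytes)
    # Long frame: start passing it on after a fixed delay.
    return LONG_FRAME_DELAY_S


if __name__ == "__main__":
    for size in (5, 30, 100):
        print(f"{size} byte frame: forward after {forwarding_delay_s(size):.2f}s")
```

Either way the added delay never exceeds about a second per frame, however long the frame is.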

What are the weaknesses of T.38?

T.38 cannot avoid the basic problem that it needs to deal with old FAX machines made before the idea of FoIP was ever considered. These machines expect certain timing constraints to be met. For these machines T.38 eliminates some problems, and reduces the scale of others. However, it is nothing like an FTP or HTTP transfer of an image in its ability to deal with poor network performance.