When UART Talks to Itself: The Optical Probe Loopback Bug Nobody Expects

During the development of optical communication firmware for smart meters, a strange field issue was reported: “Communication randomly drops. Works perfectly on the bench. Fails in the field.”

To investigate the issue, I swapped cables, verified baud rates, and analyzed logic analyzer captures frame by frame. Everything appeared correct until I noticed that the optical port was missing its protective cover.

That small missing piece of plastic turned out to be the root cause of one of the most deceptive serial communication bugs I had ever encountered: a full-duplex UART silently receiving its own transmitted data as valid incoming bytes.

Byte by byte, the transmitter effectively fed the receiver. The RX buffer filled with echoed data faster than the firmware could drain it. Eventually, the UART overran, the protocol state machine lost synchronization, retries began flooding the line, and communication collapsed in ways that appeared completely random.

At first, the issue looked like a firmware bug. Then it looked like a timing problem. But the real culprit was optical feedback.

Before diving deeper into the bug, let’s first understand what an optical probe is and how it works.

What is an optical probe?

Optical probes provide isolated serial communication without exposed electrical contacts, making them the standard interface for smart meters and industrial field devices. They are most commonly associated with IEC 62056-21 / DLMS energy meters, but similar infrared interfaces also appear in PLCs, process controllers, and maintenance terminals.

The probe attaches to a small optical window on the device enclosure and acts as a bridge between infrared light signals and a standard UART interface.

At the hardware level, the interface relies on two elements:

TX (IR LED transmitter) => Sends serial data as infrared light pulses toward the remote device.
RX (Photodiode or phototransistor receiver) => Detects incoming infrared signals from the remote device.

In a properly coupled system, these two paths remain optically isolated from each other. TX sends. RX listens. Under normal conditions, they never interfere.

Protocol Context:

Most metering optical interfaces follow IEC 62056-21, communicating over an infrared UART link at baud rates ranging from 300 bps to 19200 bps.

The communication model is logically half-duplex: one side transmits while the other listens. But the UART peripheral inside the microcontroller is still electrically full-duplex. Both TX and RX paths remain active simultaneously unless firmware explicitly disables one direction.

This gap between protocol behavior and hardware behavior is exactly where the problem begins. If transmitted IR light reflects back into the receiver, the UART has no way to distinguish its own transmission from valid incoming data.

Now let’s examine what actually happens when TX loops back into RX.

TX Loopback (When the UART Hears Itself – BUG):

The real problem behind this bug is physical. Without a cover on the optical port, the transmitted IR signal does not reach the intended remote device. Instead, some of the light reflects from nearby surfaces such as the probe housing, device cover, surrounding objects, or even a technician’s hand. A portion of this reflected light can return directly to the RX photodiode.

MCU
 │
 ├─ TX ──── IR LED ──────────────────► uncovered optical port
 │               │                            │
 │               └──── reflected IR ──────────┘
 │                                            │
 │                                       photodiode
 │                                            │
 └─ RX ◄────────────────────────────────────┘
       ↑ transmitted bytes return here

The UART treats reflected data as valid and places it directly into the RX buffer. There is no error flag. No warning. Your firmware is now silently receiving everything it transmits, at the same rate it transmits it.

Why this breaks communication:

UART RX buffers are small, typically 16, 64, or 256 bytes depending on the microcontroller.

The moment firmware sends a burst of data (a meter read request, a handshake sequence, or anything else), the reflected bytes arrive immediately and begin filling that buffer.

But the real response from the remote device has not even arrived yet. At 19200 baud, each 10-bit frame takes about 0.52 ms, so a 16-byte UART FIFO fills in roughly 8 ms if nothing drains it. There is no recovery from that point without intervention.
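To make the fill-time arithmetic concrete, here is a minimal sketch in C. The function names are hypothetical, and it assumes a fixed number of bits per frame (10 for one start bit, eight data bits, and one stop bit) with nothing draining the buffer while the echo arrives.

```c
#include <stdint.h>

/* Time to move one UART frame across the link, in microseconds.
 * bits_per_frame is an assumption supplied by the caller
 * (10 for a start bit, 8 data bits, and a stop bit). */
static uint32_t frame_time_us(uint32_t baud, uint32_t bits_per_frame)
{
    if (baud == 0u)
    {
        return 0u; /* avoid division by zero on a bad configuration */
    }
    return (bits_per_frame * 1000000u) / baud;
}

/* Time for back-to-back echoed bytes to fill an RX buffer of
 * fifo_bytes entries, in microseconds, assuming nothing reads
 * the buffer in the meantime. */
static uint32_t fifo_fill_time_us(uint32_t baud,
                                  uint32_t bits_per_frame,
                                  uint32_t fifo_bytes)
{
    if (baud == 0u)
    {
        return 0u;
    }
    /* Multiply in 64 bits before dividing to limit rounding error. */
    return (uint32_t)(((uint64_t)fifo_bytes * bits_per_frame * 1000000u) / baud);
}
```

With these assumptions, fifo_fill_time_us(19200, 10, 16) evaluates to 8333, or about 8.3 ms, and a 256-byte buffer lasts about 133 ms. Even the largest of the typical buffer sizes fills within a handful of retries if nothing drains it.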

Then the cascade begins:

Reflected bytes enter RX
          ↓
RX buffer fills
          ↓
UART overrun error occurs
          ↓
ISR or DMA loses sync
          ↓
Protocol parser receives corrupted data
          ↓
Firmware retries communication
          ↓
Retries generate more reflected data
          ↓
Communication collapses

The device may appear to randomly stop communicating, drop packets, throw UART overrun errors, timeout unexpectedly, or lock up after repeated retries. Retries make it worse. Every retry generates more reflected data, which fills the buffer faster, which triggers more retries. The system accelerates toward failure.

Why This Bug Is So Easy to Miss:

Several factors combine to make this bug surprisingly dangerous. Individually, none of them seem serious. Together, they create the perfect conditions for UART self-feedback and RX buffer overflow.

Factor	Why It Matters
Full-duplex UART	TX and RX operate at the same time. The UART treats reflected bytes exactly like real incoming data.
No echo filtering	Most UART drivers do not automatically discard self-generated echoes.
Fast data transmission	At 19200 baud, reflected bytes can fill a 16-byte RX FIFO in roughly 8 ms.
No flow control	Most optical interfaces do not use RTS/CTS, so nothing prevents the RX buffer from filling.
Bench testing hides it	A connected or covered probe blocks reflected IR, so the failure often appears only in the field.

Why the Cover Fixes It:

A strip of tape, a heat-shrink sleeve, or a moulded plastic cap over the optical port forms a simple optical barrier. The transmitted IR energy is absorbed or scattered before it can couple back into the RX photodiode. When no remote device is present, the UART correctly receives silence instead of its own transmitted bytes.

This is not a workaround or a field hack. It is part of the intended design behaviour of optical UART systems. The communication model assumes half-duplex physical behaviour, while the UART hardware remains electrically full-duplex. The cover enforces that half-duplex behaviour by preventing optical self-coupling between TX and RX.

Why It Is Frequently Misdiagnosed:

Because the failure only appears when the cover is missing, it is routinely mistaken for an intermittent hardware defect, EMI, a baud-rate mismatch, or a firmware timing problem. Boards are reworked. Cables are swapped. UART drivers are patched, all while the actual root cause sits unnoticed in a technician’s toolbox.

Why It Appears Random:

In some environments, the problem becomes even harder to reproduce consistently. Nearby surfaces, oscilloscope probes, test fixtures, or even a person standing close to the device can partially absorb, scatter, or redirect the reflected IR energy, making the feedback path appear intermittent and environment-dependent.

This is why the system can appear perfectly stable during bench testing while failing unpredictably in real deployments. The cover is not an optional accessory. It is part of the system.

Eliminating UART Optical Loopback:

Shipping the optical cover is the first line of defense, not the last. In practice, covers may be removed during servicing, damaged in use, or missing due to human error. If the system cannot operate reliably without it, the design has a single point of failure. A robust system prevents optical self-feedback in hardware and handles it reliably throughout the firmware stack.

Mitigation Strategies:

The following firmware, hardware, and process-level techniques can eliminate or significantly reduce this failure mode.

1. Enforced Half-Duplex (Firmware):

One of the simplest and most reliable fixes is to stop reception before transmitting and resume it only after the TX-complete event fires.

Gating the receiver itself keeps echoed bytes out of the FIFO, DMA engine, and protocol layer entirely. Many STM32, NXP, PIC, and Renesas UART peripherals provide separate transmitter and receiver enable bits, so this costs very little. The STM32 HAL example below takes a slightly softer approach: it masks the RX interrupt during transmission, then flushes the data register and clears the error flags before re-arming reception.

Example:

static void pauseReceiving(UART_HandleTypeDef *uartHandle)
{
    uint32_t intState = __get_interrupt_state();

    __disable_interrupt();

    /*
     * Disable RX interrupt so transmitted bytes reflected
     * through the optical path do not enter the RX buffer.
     */
    uartHandle->Instance->CR1 &= ~USART_CR1_RXNEIE;

    __set_interrupt_state(intState);
}



static void resumeReceiving(UART_HandleTypeDef *uartHandle)
{
    uint32_t intState = __get_interrupt_state();
    __disable_interrupt();

    /*
     * Flush any reflected or stale bytes captured during TX.
     */
    __HAL_UART_FLUSH_DRREGISTER(uartHandle);

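    /*
     * Clear UART error flags that may be triggered by
     * reflected frames or RX overrun conditions. The per-flag
     * macros are used because on STM32 families with SR/DR
     * registers these flags clear only via a read of SR
     * followed by a read of DR, which the macros perform.
     */
    __HAL_UART_CLEAR_OREFLAG(uartHandle);
    __HAL_UART_CLEAR_FEFLAG(uartHandle);
    __HAL_UART_CLEAR_PEFLAG(uartHandle);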

    /*
     * Re-enable RX interrupt so normal reception resumes.
     */
    uartHandle->Instance->CR1 |= USART_CR1_RXNEIE;

    __set_interrupt_state(intState);
}


/* ── Transmit data while suppressing local optical echo ── */

void transmitData(UART_HandleTypeDef *uartHandle,
                  const uint8_t *pWriteBuf,
                  uint16_t len)
{
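    /* Reject a missing handle, a missing buffer, or an empty frame. */
    if ((uartHandle == NULL) || (pWriteBuf == NULL) || (len == 0U))
    {
        return;
    }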

    /*
     * Temporarily disable reception during transmission
     * to prevent TX-to-RX optical loopback.
     */
    pauseReceiving(uartHandle);

    HAL_UART_Transmit(uartHandle,
                      pWriteBuf,
                      len,
                      HAL_MAX_DELAY);

    /*
     * Remove any reflected bytes and restore reception.
     */
    resumeReceiving(uartHandle);
}

2. RX Flush on TX Complete:

On TX completion, flush the RX FIFO and clear any overrun and framing error flags before re-arming the receiver. The reflected bytes arrive while the transmission is still in progress, so flushing at this exact point removes them before any legitimate response can appear.

On DMA-backed drivers, this step is especially important because residual echo data can silently corrupt circular-buffer state and surface as intermittent failures long after the originating transmission has completed.
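The software side of this flush can be sketched as follows. This is a hedged illustration rather than a driver: rx_ring_t and its helper functions are hypothetical names, rx_ring_push stands in for whatever the ISR or DMA engine does on real hardware, and the device-specific steps (flushing the hardware FIFO and clearing the error flags) are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_RING_SIZE 64u /* hypothetical buffer size */

typedef struct
{
    uint8_t data[RX_RING_SIZE];
    volatile size_t head; /* next write slot (advanced by ISR or DMA) */
    size_t tail;          /* next read slot (advanced by the parser)  */
} rx_ring_t;

/* Number of unread bytes currently held in the ring. */
static size_t rx_ring_available(const rx_ring_t *ring)
{
    return (ring->head + RX_RING_SIZE - ring->tail) % RX_RING_SIZE;
}

/* Stand-in for the receive path: on real hardware the ISR or the
 * DMA controller would advance head instead of this function. */
static void rx_ring_push(rx_ring_t *ring, uint8_t byte)
{
    ring->data[ring->head] = byte;
    ring->head = (ring->head + 1u) % RX_RING_SIZE;
}

/* Call from the TX-complete handler, before the receiver is re-armed.
 * Everything buffered so far arrived while we were transmitting, so
 * it is treated as echo and dropped by moving tail up to head.
 * Returns the number of bytes discarded, which is useful for logging. */
static size_t rx_ring_flush(rx_ring_t *ring)
{
    size_t dropped = rx_ring_available(ring);
    ring->tail = ring->head;
    return dropped;
}
```

Returning the dropped count is a deliberate choice. A non-zero value after every transmission is a cheap field diagnostic that the optical path is reflecting, even while communication still appears to work.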

3. Protocol Echo Cancellation:

Keep a copy of the transmitted data in a shadow buffer. In the RX handler, compare incoming bytes with the recently transmitted bytes and discard any match before passing the data to the protocol layer.

The UART hardware still operates in full-duplex mode, but the firmware behaves like a half-duplex system by filtering out its own echoes. This is the right approach when hardware cannot gate the receiver during transmission.
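A minimal sketch of such a shadow-buffer filter is shown below. The type and function names are hypothetical, and the sketch assumes the echo comes back complete and in order. On the first mismatch it stops filtering, so genuine data is never swallowed. A production version would likely also need a timeout and a policy for partial or corrupted reflections.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECHO_SHADOW_SIZE 64u /* hypothetical maximum frame length */

typedef struct
{
    uint8_t shadow[ECHO_SHADOW_SIZE]; /* copy of the last transmitted frame */
    size_t len;                       /* number of bytes in shadow          */
    size_t pos;                       /* index of the next expected echo    */
} echo_filter_t;

/* Record a frame just before it is handed to the UART.
 * Returns false if the arguments are invalid or the frame
 * does not fit in the shadow buffer. */
static bool echo_filter_arm(echo_filter_t *filter, const uint8_t *tx, size_t len)
{
    if ((filter == NULL) || (tx == NULL) || (len > ECHO_SHADOW_SIZE))
    {
        return false;
    }
    memcpy(filter->shadow, tx, len);
    filter->len = len;
    filter->pos = 0u;
    return true;
}

/* Call for every received byte. Returns true when the byte matches
 * the next expected echo and should be discarded, or false when it
 * should be passed on to the protocol layer. */
static bool echo_filter_is_echo(echo_filter_t *filter, uint8_t rx)
{
    if ((filter->pos < filter->len) && (rx == filter->shadow[filter->pos]))
    {
        filter->pos++;
        return true;
    }
    /* Mismatch or echo exhausted: stop filtering until re-armed. */
    filter->pos = filter->len;
    return false;
}
```

In use, the transmit path would call echo_filter_arm with the outgoing frame, and the RX handler would forward a byte to the parser only when echo_filter_is_echo returns false.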

4. RX Guard Window (Firmware):

Mask RX interrupts for the full duration of transmission and hold the mask for a calibrated guard interval afterward. The window must cover the entire TX time, at least one additional character time, and margin for ISR latency, line settling, and optical reflections.

For a UART operating at 19200 baud with a standard 10-bit frame:

T_frame = 10 / 19200 ≈ 520 μs

For a packet of N bytes, total transmission time is:

T_TX = N × T_frame

A 20-byte packet, for example, requires approximately 10.4 ms of transmit time.

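The full guard window should therefore be:

T_guard ≈ T_TX + T_margin

where T_margin covers at least one additional character time (T_frame) plus an allowance for ISR latency, line settling, and optical reflections. At 19200 baud this typically comes to somewhere between several hundred microseconds and a few milliseconds, depending on MCU speed and optical probe characteristics.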

This is the lowest-overhead option on the list because it requires minimal hardware and firmware changes. However, the timing must be recalibrated for each baud rate, packet length, MCU variant, and optical assembly configuration.
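Because the window depends on baud rate and packet length, it can be computed at run time rather than hard-coded. The helper below is a sketch under stated assumptions: guard_window_us is a hypothetical name, the frame length is passed in by the caller, and margin_us is a value that still has to be calibrated on the actual hardware.

```c
#include <stdint.h>

/* Guard window in microseconds: T_guard = T_TX + T_margin.
 * T_TX is rounded up so the window is never shorter than the
 * real transmission. margin_us is a calibrated, board-specific
 * value supplied by the caller. Returns 0 for an invalid baud rate. */
static uint32_t guard_window_us(uint32_t baud,
                                uint32_t bits_per_frame,
                                uint32_t n_bytes,
                                uint32_t margin_us)
{
    if (baud == 0u)
    {
        return 0u;
    }

    /* Total bit time in microseconds, computed in 64 bits so long
     * packets at low baud rates do not overflow. */
    uint64_t bits_us = (uint64_t)n_bytes * bits_per_frame * 1000000u;

    /* Ceiling division: round the transmit time up, not down. */
    uint64_t tx_us = (bits_us + baud - 1u) / baud;

    return (uint32_t)(tx_us + margin_us);
}
```

For the 20-byte example at 19200 baud with 10-bit frames and an assumed 1 ms margin, guard_window_us(19200, 10, 20, 1000) evaluates to 11417 μs. The same packet at 300 baud needs well over half a second, which is why a fixed guard time does not carry over between baud rates.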

5. Optical Isolation (Hardware):

A directional coupler or internal optical baffle inside the probe head physically blocks the TX-to-RX feedback path at the source. As a result, the loopback condition cannot occur, eliminating the need for firmware-based compensation.

Industrial-grade optical probes commonly incorporate this form of optical isolation, allowing uncovered operation to degrade gracefully rather than causing complete communication failure.

6. Enforce Cover Policy (Process):

A missing optical cover should be treated as a critical defect during manufacturing, validation, and field servicing.

The cover is not just a protective cap; it is part of the communication system’s optical design. Its role is as important as an EMC shield, termination resistor, or controlled-impedance trace.

No device should leave the factory or return from service without the cover properly installed and verified.