1. Introduction & Overview
This paper presents the first experimental demonstration of 512-Color Shift Keying (512-CSK) signal transmission for Optical Camera Communication (OCC). The core achievement is error-free demodulation at a 4-meter distance using a commercial Sony IMX530 CMOS image sensor module paired with a 50-mm lens and a custom multi-label classification neural network (NN) acting as a nonlinear equalizer. This work significantly pushes the envelope of OCC data density, moving from previously demonstrated 8, 16, or 32-CSK schemes into the high-order modulation realm of 512 colors (9 bits/symbol).
The research addresses a fundamental challenge in OCC: inter-color crosstalk caused by the non-ideal spectral sensitivity of camera RGB filters, which distorts the transmitted CSK constellation based on the CIE 1931 color space. The proposed neural equalizer directly compensates for this nonlinear distortion from the raw sensor data, bypassing the need for complex linear signal processing models.
Key results at a glance:
- 512 Colors: modulation order (9 bits/symbol)
- 4 Meters: transmission distance
- Error-Free: demodulation achieved
- 8x8 Array: LED transmitter panel
2. Technical Framework
2.1 Receiver Configuration & Setup
The receiver system is built around a Sony Semiconductor Solutions camera system capable of outputting 12-bit raw RGB data without any post-processing (demosaicing, denoising, white balance). This raw data is crucial for accurate color recovery. The signal is captured through a 50-mm optical lens from an 8x8 LED planar array transmitter (6.5 cm panel). The received RGB values are first converted to CIE 1931 (x, y) chromaticity coordinates using a standard color space transformation matrix before being fed into the neural equalizer.
2.2 Neural Network Equalizer Architecture
The heart of the demodulation system is a multi-label neural network. Its purpose is to perform nonlinear equalization, mapping the distorted received (x, y) coordinates back to the most likely transmitted 9-bit symbol (for 512-CSK).
- Input Layer: 2 units (x, y chromaticity coordinates).
- Hidden Layers: $N_h$ layers with $N_u$ units each (the specific values are implied but not enumerated in the excerpt).
- Output Layer: M = 9 units, corresponding to the 9 bits of the 512-CSK symbol. The network is trained for multi-label classification.
The network outputs a posterior probability $p_i(1 \mid x, y)$ for each of the nine bits. A Log-Likelihood Ratio (LLR) is computed from each probability and then passed to a Low-Density Parity-Check (LDPC) decoder for final error correction.
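The paper does not enumerate the network's exact depth or width, so the following is only an illustrative sketch of the multi-label structure: two chromaticity inputs, a few ReLU hidden layers, and nine sigmoid outputs giving per-bit probabilities. The layer sizes and the random (untrained) weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MultiLabelEqualizer:
    """Illustrative MLP: 2 inputs (x, y) -> ReLU hidden layers -> 9 sigmoid outputs.

    Layer counts and widths are assumptions; the paper does not give N_h or N_u.
    """
    def __init__(self, n_hidden_layers=2, n_units=64, n_bits=9):
        sizes = [2] + [n_units] * n_hidden_layers + [n_bits]
        self.weights = [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(sizes, sizes[1:])]
        self.biases = [np.zeros(b) for b in sizes[1:]]

    def forward(self, xy):
        h = np.asarray(xy, dtype=float)
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = np.maximum(h @ W + b, 0.0)        # ReLU hidden activations
        logits = h @ self.weights[-1] + self.biases[-1]
        return sigmoid(logits)                     # per-bit probabilities p_i(1|x, y)

eq = MultiLabelEqualizer()
probs = eq.forward([0.31, 0.33])                   # one received chromaticity point
print(probs.shape)                                 # (9,): one probability per bit
```

In a trained system these nine probabilities would feed the LLR computation and the LDPC decoder; here they merely demonstrate the input/output shape of the multi-label head.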
2.3 512-CSK Constellation Mapping
The 512 symbols are strategically placed within the CIE 1931 gamut of the RGB-LED transmitter. The mapping starts from the vertex corresponding to the blue primary and fills the available space in a "triangular manner." (The quoted values $0.1805$ and $0.0722$ are the X and Y tristimulus entries of the sRGB blue primary; its chromaticity is approximately $(x, y) = (0.150, 0.060)$.) This suggests an efficient packing algorithm that maximizes the Euclidean distance between constellation points within the physical color gamut, which is critical for minimizing the symbol error rate.
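One plausible reading of "filling the gamut in a triangular manner" is a barycentric lattice that starts at the blue vertex and sweeps row by row toward the red-green edge. The sketch below assumes the sRGB primary chromaticities as gamut vertices (the paper's actual LED primaries will differ) and keeps the first 512 of the 528 points a 32-row lattice provides.

```python
import numpy as np

# Assumed sRGB primary chromaticities for illustration; the real gamut
# vertices depend on the transmitter's LED primaries.
RED, GREEN, BLUE = (0.640, 0.330), (0.300, 0.600), (0.150, 0.060)

def triangular_constellation(n_points=512):
    """Fill the gamut triangle with a barycentric lattice, starting from
    the blue vertex, and keep the first n_points points.

    A lattice with n rows holds n*(n+1)/2 points; 32 rows give 528 >= 512.
    """
    rows = 1
    while rows * (rows + 1) // 2 < n_points:
        rows += 1
    B, R, G = map(np.array, (BLUE, RED, GREEN))
    pts = []
    for i in range(rows):              # row index moving away from blue
        for j in range(i + 1):         # position along the row
            a = i / (rows - 1)         # fraction of the way from blue
            t = j / i if i else 0.0    # blend between the two far vertices
            pts.append((1 - a) * B + a * ((1 - t) * R + t * G))
    return np.array(pts[:n_points])

const = triangular_constellation(512)
print(const.shape)                     # (512, 2): one (x, y) point per symbol
print(const[0])                        # the first point sits at the blue vertex
```

This uniform lattice is only one packing choice; the paper's exact placement rule (and hence the minimum inter-symbol distance) is not specified in the excerpt.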
3. Experimental Results & Analysis
3.1 BER Performance vs. LED Array Size
The experiment varied the number of active LEDs in the transmitter array from 1x1 to 8x8. This effectively changes the light intensity and the area the signal occupies on the image sensor. The Bit Error Rate (BER) characteristics were evaluated against this variable. The successful error-free operation demonstrates the robustness of the neural equalizer across different received signal strengths and spatial profiles. The use of a full 8x8 array likely provides the best performance by averaging over multiple pixels and reducing noise impact.
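The intuition that a larger LED spot helps by averaging over more pixels can be checked numerically: for independent zero-mean noise, averaging $N$ pixels shrinks the noise standard deviation by roughly $\sqrt{N}$. The noise level and spot sizes below are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Averaging one chromaticity reading over N independent sensor pixels
# reduces the noise standard deviation by roughly sqrt(N).
true_xy = np.array([0.31, 0.33])
sigma = 0.01                                   # assumed per-pixel noise level
stds = {}
for n_pixels in (1, 16, 64):                   # e.g. 1x1, 4x4, 8x8 spot sizes
    frames = true_xy + rng.normal(0.0, sigma, (20000, n_pixels, 2))
    est = frames.mean(axis=1)                  # per-frame average over the spot
    stds[n_pixels] = est.std(axis=0).mean()
    print(n_pixels, round(stds[n_pixels], 5))
```

Going from a single pixel to a 16-pixel average cuts the chromaticity noise by about a factor of four, which is consistent with the observed benefit of the full 8x8 array.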
3.2 Comparison with Prior Work
The paper includes a summary figure (Fig. 1(c)) comparing this work against previous OCC-CSK demonstrations. Key differentiators are:
- Modulation Order: 512-CSK vastly exceeds the 8-CSK [1], 16-CSK [2,3], and 32-CSK [4,5] reported in prior experimental works.
- Distance: 4m operation is competitive, especially considering the high modulation order. It sits between very short-range (3-4 cm) high-order demos and longer-range (80-100 cm) lower-order demos.
- Technique: The use of a neural network for direct nonlinear equalization from raw sensor data is a novel and potentially more generalizable approach compared to model-based linear compensation techniques.
4. Core Analysis & Expert Interpretation
Core Insight: This paper isn't just about achieving a higher number of colors; it's a strategic pivot from physics-first modeling to data-first learning in optical signal recovery. The authors implicitly acknowledge that the complex, nonlinear distortion pipeline in a camera (filter crosstalk, sensor nonlinearity, lens artifacts) is better handled by a universal function approximator (a neural network) than by a meticulously derived but inevitably incomplete analytical model. This mirrors the shift seen in other fields like wireless communications, where Deep Learning is increasingly used for channel equalization and symbol detection in complex, non-linear channels.
Logical Flow: The logic is compelling: 1) High-order CSK is needed for throughput. 2) High-order CSK is highly sensitive to color distortion. 3) Camera color distortion is complex and nonlinear. 4) Therefore, use a nonlinear compensator (NN) trained end-to-end on real data. The use of raw sensor data is a masterstroke—it provides the neural network with the maximum amount of unaltered information before any camera ISP (Image Signal Processor) introduces its own, often proprietary and non-invertible, transformations. This approach is reminiscent of the philosophy in modern computational photography, where algorithms work on raw sensor data for maximal flexibility.
Strengths & Flaws: The primary strength is the dramatic leap in spectral efficiency, experimentally validating what was previously simulation-only territory. The neural equalizer is elegant and powerful. However, the flaw—common to many ML-based comms papers—is the "black box" nature. The paper doesn't delve into the NN's architecture search, training data size, or generalization capability to different cameras, lenses, or ambient light conditions. Will the network need re-training for every new receiver model? As noted in a seminal review on machine learning for communications by O'Shea & Hoydis, the practicality of DL-based receivers hinges on their robustness and adaptability to changing conditions. Furthermore, the 4m distance, while good, still hints at a power/SNR limitation. The reliance on an LDPC decoder for final error-free performance indicates the raw symbol error rate at the NN output is not zero, raising questions about the equalizer's standalone performance under lower SNR.
Actionable Insights: For researchers, the clear next step is to open the black box. Investigate NN architectures (CNNs might better handle spatial variations across the sensor), explore few-shot or transfer learning to adapt to new hardware, and integrate the equalizer with forward error correction in a more holistic, turbo-like structure. For industry, this work signals that high-data-rate, flicker-free VLC using commodity cameras is moving closer to reality. The partnership with Sony for the sensor is notable; commercialization will depend on embedding such neural processing efficiently into camera ASICs or leveraging on-device AI accelerators already present in smartphones. The standard to watch is IEEE 802.15.7r1 (OCC), and contributions like this could directly influence its evolution.
5. Technical Details & Mathematical Formulation
Color Space Conversion: The transformation from received RGB values (from the raw sensor) to CIE 1931 xy coordinates uses a matrix derived from the sensor's spectral characteristics relative to the CIE standard observer. The RGB values are first mapped to CIE XYZ tristimulus values, $$ \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} 0.4124 & 0.3576 & 0.1805 \\ 0.2126 & 0.7152 & 0.0722 \\ 0.0193 & 0.1192 & 0.9505 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}, $$ and the chromaticity coordinates then follow by normalization: $$ x = \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z}. $$ This is a simplified linear transformation using the standard sRGB-to-XYZ matrix. In practice, a more accurate model might require a nonlinear mapping or a matrix tailored to the specific sensor's color filters.
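The two-step conversion can be written in a few lines. The sketch below uses the standard sRGB-to-XYZ matrix; as noted above, a deployed receiver would calibrate this matrix to the sensor's actual color filters.

```python
import numpy as np

# Standard sRGB-to-XYZ matrix (D65 white point); a real deployment would
# calibrate a matrix to the sensor's actual color filter responses.
M_RGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

def rgb_to_xy(rgb):
    """Convert linear raw RGB to CIE 1931 (x, y) chromaticity coordinates."""
    X, Y, Z = M_RGB_TO_XYZ @ np.asarray(rgb, dtype=float)
    s = X + Y + Z
    return X / s, Y / s

x, y = rgb_to_xy([1.0, 1.0, 1.0])     # sRGB white maps to the D65 white point
print(round(x, 4), round(y, 4))       # approximately 0.3127 0.329
```

The white-point check is a quick sanity test: equal linear RGB should land near the D65 chromaticity $(0.3127, 0.3290)$.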
Neural Network Output to LLR: The multi-label NN outputs the probability $p_i(1|x, y)$ that the $i$-th bit (out of 9) is '1'. The Log-Likelihood Ratio (LLR) $L_i$ for that bit, fed to the LDPC decoder, is calculated as: $$ L_i = \log \left( \frac{p_i(1|x, y)}{1 - p_i(1|x, y)} \right) $$ A large positive LLR indicates high confidence the bit is 1, a large negative value indicates high confidence it is 0.
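The LLR formula above needs one practical guard: when the network saturates at $p_i = 0$ or $p_i = 1$, the log ratio diverges, so implementations clip the probability before taking the log. A minimal sketch:

```python
import numpy as np

def llr(p, eps=1e-12):
    """L_i = log(p_i / (1 - p_i)), with clipping to avoid +/-inf at p = 0 or 1."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

probs = np.array([0.999, 0.5, 0.001])
print(llr(probs).round(2))   # roughly [ 6.91  0.  -6.91]
```

Confident '1' decisions map to large positive LLRs, complete uncertainty maps to zero, and confident '0' decisions map to large negative LLRs, exactly the soft-input format an LDPC decoder expects.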
6. Analysis Framework & Case Example
Framework: The "Learned Receiver" Pipeline for OCC
This research exemplifies a modern "learned receiver" design pattern applicable beyond OCC. The framework can be broken down into sequential, optimizable blocks:
- Hardware-Aware Data Acquisition: Capture signals at the earliest, most raw point in the processing chain (e.g., sensor RAW data, RF I/Q samples).
- Differentiable Preprocessing: Apply minimal, necessary preprocessing (e.g., color space conversion, synchronization) in a way that is differentiable to allow gradient flow if training end-to-end.
- Neural Network Core: Employ a neural network (MLP, CNN, Transformer) to perform the core demodulation/equalization task. The network is trained with a loss function that directly minimizes symbol or bit error rate, often using a cross-entropy loss for classification tasks.
- Hybrid Decoding: Interface the neural network's soft outputs (probabilities, LLRs) with a state-of-the-art, non-neural error correction decoder (like LDPC or Polar code decoder). This combines the flexibility of learning with the proven optimality of classical coding theory.
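The four blocks above can be sketched as a skeleton pipeline. Every stage here is a stub (the function names and placeholder values are invented for illustration); a real system would plug in its own acquisition, preprocessing, trained network, and FEC decoder.

```python
import numpy as np

def acquire_raw():
    """Block 1: hardware-aware capture (stubbed as a fake chromaticity sample)."""
    return np.array([0.31, 0.33])

def preprocess(raw):
    """Block 2: minimal, ideally differentiable preprocessing (identity stub)."""
    return raw  # color conversion / synchronization would go here

def neural_demap(features):
    """Block 3: NN core returning per-bit probabilities (stubbed confidences)."""
    return np.full(9, 0.9)  # placeholder: 9 bits, all fairly confident '1'

def fec_decode(probs):
    """Block 4: hand soft LLRs to a classical decoder (stubbed as hard decision)."""
    llrs = np.log(probs / (1.0 - probs))
    return (llrs > 0).astype(int)

bits = fec_decode(neural_demap(preprocess(acquire_raw())))
print(bits)   # hard decisions for one 9-bit symbol
```

The value of the pattern is the clean soft-information interface between blocks 3 and 4: the neural core can be retrained or swapped without touching the proven coding machinery.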
Non-Code Case Example: Applying the Framework to Underwater VLC
Consider applying this same framework to Underwater Visible Light Communication (UVLC), which suffers from severe channel impairments like scattering and turbulence-induced fading. A "Learned Receiver" for UVLC could be built as follows:
- Step 1: Use a high-speed photodetector or camera capturing raw intensity sequences.
- Step 2: Preprocess to isolate the signal region of interest and perform coarse synchronization.
- Step 3: Train a 1D Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN) like an LSTM on this raw sequence data. The network's task is to equalize the time-varying channel effects and demap the symbols. The training data would be collected under various water turbidity and turbulence conditions.
- Step 4: The network outputs soft decisions for an FEC decoder, enabling robust communication in a highly dynamic channel where traditional channel estimation fails.
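As a toy stand-in for Step 3, the sketch below equalizes a fake intensity waveform smeared by a two-tap channel using a fixed 1-D convolution kernel; a learned 1D CNN would discover such taps (and far richer ones) from training data. The channel taps, kernel values, and threshold are all invented for illustration.

```python
import numpy as np

def conv1d_demap(intensity, kernel, threshold=0.5):
    """Minimal 1-D convolution + threshold demapper for an OOK-like waveform."""
    equalized = np.convolve(intensity, kernel, mode="same")
    return (equalized > threshold).astype(int)

# Fake on-off-keyed bits smeared by an assumed two-tap channel.
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])
channel = np.convolve(bits.astype(float), [0.7, 0.3])[:len(bits)]

kernel = np.array([0.0, 1.2, -0.3])           # hand-tuned equalizing taps
recovered = conv1d_demap(channel, kernel)
print(recovered)                               # matches the transmitted bits
```

A single fixed kernel suffices for this static toy channel; the point of the learned receiver is that under turbulence-induced, time-varying fading the network adapts these taps implicitly, where a hand-tuned filter cannot.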
7. Future Applications & Research Directions
- Smartphone-Based Li-Fi: The ultimate goal is integrating this technology into smartphones for secure, high-speed peer-to-peer data transfer or indoor positioning with centimeter-level accuracy, leveraging existing camera hardware.
- Automotive V2X Communication: Using vehicle headlights/taillights and cameras for Vehicle-to-Everything (V2X) communication, providing an additional, robust data link complementary to RF-based DSRC/C-V2X.
- AR/VR and Metaverse Interfaces: Enabling low-latency, high-bandwidth data links between AR glasses and infrastructure or between devices for synchronized shared experiences.
- Research Directions:
- End-to-End Learned Systems: Exploring joint optimization of the transmitter's constellation shape (via a neural network) and the receiver's equalizer, similar to the concept of "autoencoder" communications.
- Robustness and Standardization: Developing neural receiver models that are robust to varying camera models, ambient light, and partial occlusion. This is critical for standardization efforts like IEEE 802.15.7.
- Ultra-High-Speed OCC: Combining high-order CSK with rolling-shutter or spatial modulation techniques using high-frame-rate or event-based cameras to break the Gbps barrier.
- Semantic Communication: Moving beyond bit recovery, using the OCC link to transmit semantic information (e.g., object identifiers, map data) directly, optimizing for task success rather than bit error rate.
8. References
- [1] H.-W. Chen et al., "8-CSK data transmission over 4 cm," Relevant Conference/Journal, 2019.
- [2] C. Zhu et al., "16-CSK over 80 cm using a quadrichromatic LED," Relevant Conference/Journal, 2016.
- [3] N. Murata et al., "16-digital CSK over 100 cm based on IEEE 802.15.7," Relevant Conference/Journal, 2016.
- [4] P. Hu et al., "Tri-LEDs based 32-CSK over 3 cm," Relevant Conference/Journal, 2019.
- [5] R. Singh et al., "Tri-LEDs based 32-CSK," Relevant Conference/Journal, 2014.
- [6] T. O'Shea and J. Hoydis, "An Introduction to Deep Learning for the Physical Layer," IEEE Transactions on Cognitive Communications and Networking, 2017.
- [7] IEEE Standard for Local and Metropolitan Area Networks--Part 15.7: Short-Range Optical Wireless Communications, IEEE Std 802.15.7-2018.
- [8] Commission Internationale de l'Éclairage (CIE), Proceedings of the 1931 CIE Session, Cambridge University Press, 1931.
- [9] Sony Semiconductor Solutions Corporation, IMX530 Sensor Datasheet.
- [10] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.