TRIDENT: Tabular Representation Inference with Dedicated Embeddings for Null Tokens

Pedro B. Rigueira; Victoria F. Mello; Guilherme H. G. Evangelista; Caio S. Grossi; Giovana A. M. Machado; Pedro Dutenhefner; Wagner Meira Jr.; Gisele L. Pappa

Pedro B. Rigueira UFMG
Victoria F. Mello UFMG
Guilherme H. G. Evangelista UFMG
Caio S. Grossi UFMG
Giovana A. M. Machado UFMG
Pedro Dutenhefner UFMG
Wagner Meira Jr. UFMG
Gisele L. Pappa UFMG

Abstract

Handling missing data remains a fundamental challenge in learning from tabular datasets. In this work, we introduce TRIDENT (Tabular Representation Inference with Dedicated Embeddings for Null Tokens), a Transformer-based architecture designed for robust classification on tabular data with high missingness. TRIDENT employs tailored embedding strategies for numerical and categorical features, including learnable, column-specific embeddings for missing and masked values, which allow the model to interpret data absence natively, without prior imputation. Training follows a two-stage procedure: (i) a self-supervised pre-training phase, in which the model learns to reconstruct the original feature embeddings at artificially masked positions from the context of the partially corrupted input sequence, and (ii) supervised fine-tuning on classification tasks using the [CLS] token representation. To evaluate the model’s effectiveness under varying levels of missingness, we constructed a benchmark of diverse real-world tabular datasets. Each dataset was artificially degraded by introducing missing values at rates of 20%, 40%, 60%, and 80%, yielding five versions per dataset (the original plus four degraded variants). We compared TRIDENT against six baselines: three tree-based models and three Transformer-based models. Results indicate that TRIDENT outperforms all baselines at most missingness levels, especially in high-missingness regimes (60% and 80%), where it achieves up to a 3.4% absolute improvement in average F1-score over the best-performing baseline. Notably, TRIDENT is resilient to missingness patterns without relying on explicit data imputation and maintains stable performance across heterogeneous datasets. These findings suggest that learnable [NULL] embeddings combined with self-supervised contextual reconstruction provide a robust inductive bias for tabular learning with incomplete data.
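The benchmark construction described above can be illustrated with a minimal sketch. The abstract does not state the missingness mechanism, so this example assumes cells are removed uniformly at random (MCAR) and that only feature cells, not labels, are affected. The function names `degrade` and `build_versions` and the toy table are hypothetical and are not taken from the TRIDENT code.

```python
import random


def degrade(rows, rate, seed=0):
    """Return a copy of `rows` with a fraction `rate` of cells set to None.

    Assumption: cells are chosen uniformly at random (MCAR); the abstract
    does not specify which mechanism the authors actually used.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    rng = random.Random(seed)
    n_missing = round(rate * len(cells))
    masked = set(rng.sample(cells, n_missing))
    return [
        [None if (i, j) in masked else value for j, value in enumerate(row)]
        for i, row in enumerate(rows)
    ]


def build_versions(rows, rates=(0.2, 0.4, 0.6, 0.8), seed=0):
    """Build the original table plus one degraded copy per missingness rate."""
    versions = {0.0: [list(row) for row in rows]}  # untouched original
    for rate in rates:
        versions[rate] = degrade(rows, rate, seed)
    return versions


# Toy feature table: 10 rows x 5 columns = 50 cells (labels kept separately).
table = [[float(i * 5 + j) for j in range(5)] for i in range(10)]
versions = build_versions(table)

# Count the missing cells in each version.
missing_counts = {
    rate: sum(value is None for row in data for value in row)
    for rate, data in versions.items()
}
print(missing_counts)
```

Each degraded copy here is drawn independently from the original table. Whether the actual benchmark nests the missing cells across levels (so that the 40% version contains every gap of the 20% version) is not stated in the abstract.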