ITFormer: Bridging Time Series Signal and Natural Language for Multi-Modal QA

The First Large-Scale Multi-Task Time-Series Question Answering Dataset and Framework

ICML 2025 · Time-Series QA · Multi-Modal AI · 110k+ QA Pairs
Shanghai Jiao Tong University
Shanghai Innovation Institute
Fudan University

Abstract

Time-series data are critical in diverse applications, such as industrial monitoring, medical diagnostics, and climate research. However, effectively integrating these high-dimensional temporal signals with natural language for dynamic, interactive tasks remains a significant challenge.

To address this, we introduce the Time-Series Question Answering (Time-Series QA) task and release EngineMT-QA, the first large-scale, multi-task, temporal-textual QA dataset designed to capture complex interactions between time-series signals and natural language.

Building on this resource, we propose the Instruct Time Transformer (ITFormer), a novel framework that bridges time-series encoders with frozen large language models (LLMs). ITFormer effectively extracts, aligns, and fuses temporal and textual features, achieving strong improvements in QA accuracy over baselines while adding fewer than 1% additional trainable parameters.

EngineMT-QA Dataset

A comprehensive multi-task QA dataset based on real-world aero-engine sensor data. EngineMT-QA contains 110k+ QA pairs across four task types, constructed from 32-channel flight data drawn from the N-CMAPSS dataset.

110k+ QA Pairs
32 Channels
4 Task Types

Task Categories:

  • Understanding: Interpret sensor relationships and semantic implications
  • Perception: Identify health-state semantics and perform fault diagnosis
  • Reasoning: Infer degradation trends and predict failure probability
  • Decision-Making: Generate maintenance recommendations and operational decisions
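
As a concrete illustration (the field names and values below are our own placeholders, not the released schema), a single EngineMT-QA record pairs one 32-channel signal segment with a task-specific question and a free-text answer:

```python
# Hypothetical sketch of one EngineMT-QA record; field names and values are
# illustrative assumptions, not the released schema.
example_record = {
    "signal": "unit_042_window_0017.npy",  # placeholder path to a 32-channel N-CMAPSS segment
    "num_channels": 32,
    "task": "reasoning",                   # understanding | perception | reasoning | decision-making
    "question": "Given the recent trend across the temperature and pressure channels, "
                "how likely is this engine to fail within the next few cycles?",
    "answer": "Sensor drift indicates accelerating degradation, so near-term failure is likely.",
}

print(example_record["task"], "->", example_record["question"])
```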

ITFormer Architecture

Key components enabling effective temporal-textual modeling:

  • TPE (Time Token Position Encoding): Temporal + channel + segment positional encoding
  • LIT (Learnable Instruct Tokens): Instructional tokens guiding semantic alignment
  • ITA (Instruct Time Attention): Temporal-textual cross-modal attention mechanism
  • TAL (Time Token as Language): Projects time tokens as natural language inputs for LLMs

Key Innovation: ITFormer acts as an intermediary connector, enabling seamless integration between temporal encoders and frozen LLMs with minimal computational overhead.
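
The paper describes the exact implementation; the sketch below is only a rough PyTorch illustration of how the four components could compose, with all dimensions, module names, and the single attention layer being our own assumptions:

```python
import torch
import torch.nn as nn


class ITFormerSketch(nn.Module):
    """Minimal single-layer sketch of an ITFormer-style connector (not the official code).

    Time tokens from a time-series encoder receive temporal + channel + segment
    positions (TPE), are queried by learnable instruct tokens (LIT) through
    cross-modal attention (ITA), and are projected into the frozen LLM's
    embedding space so they can be read as ordinary language tokens (TAL).
    """

    def __init__(self, d_time=256, d_llm=4096, n_instruct=32,
                 max_steps=512, n_channels=32, max_segments=8, n_heads=8):
        super().__init__()
        # TPE: temporal + channel + segment positional encodings
        self.pos_time = nn.Embedding(max_steps, d_time)
        self.pos_channel = nn.Embedding(n_channels, d_time)
        self.pos_segment = nn.Embedding(max_segments, d_time)
        # LIT: learnable instruct tokens that act as queries
        self.instruct_tokens = nn.Parameter(torch.randn(n_instruct, d_time) * 0.02)
        # ITA: temporal-textual cross-attention
        self.cross_attn = nn.MultiheadAttention(d_time, n_heads, batch_first=True)
        # TAL: map fused tokens into the LLM embedding space
        self.to_llm = nn.Linear(d_time, d_llm)

    def forward(self, time_tokens, t_idx, c_idx, s_idx):
        # time_tokens: (B, N, d_time); t_idx / c_idx / s_idx: (B, N) position indices
        x = time_tokens + self.pos_time(t_idx) + self.pos_channel(c_idx) + self.pos_segment(s_idx)
        queries = self.instruct_tokens.unsqueeze(0).expand(x.size(0), -1, -1)
        fused, _ = self.cross_attn(queries, x, x)  # (B, n_instruct, d_time)
        return self.to_llm(fused)                  # (B, n_instruct, d_llm)


# Toy usage: 2 samples with 128 time tokens each
B, N = 2, 128
tokens = torch.randn(B, N, 256)
t_idx = torch.arange(N).expand(B, N)
c_idx = torch.randint(0, 32, (B, N))
s_idx = torch.randint(0, 8, (B, N))
print(ITFormerSketch()(tokens, t_idx, c_idx, s_idx).shape)  # torch.Size([2, 32, 4096])
```

In this sketch, the fused tokens would be prepended to the embedded question text and fed to the frozen LLM, which is what lets the connector stay small while the LLM handles language generation.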

Results

ITFormer achieves state-of-the-art performance on the EngineMT-QA benchmark. With fewer than 1% additional trainable parameters, its accuracy and robustness scale well with model size, and it outperforms both vision-text and time-series baselines.

  • Understanding (ROUGE-L): 58.04
  • Perception (Accuracy): 65.07%
  • Reasoning (F1): 88.69
  • Decision-Making (BLEU): 38.68

Performance scales consistently across model sizes (0.5B, 3B, 7B parameters), demonstrating the effectiveness of our approach in integrating time-series signals with natural language understanding.
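
To make the "fewer than 1% additional trainable parameters" figure concrete, the usual recipe is to freeze the LLM and train only the connector; below is a minimal sketch of counting that fraction, where the placeholder modules stand in for the real LLM and connector:

```python
import torch.nn as nn


def trainable_fraction(model: nn.Module) -> float:
    """Share of parameters that will actually receive gradient updates."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable / total


# Placeholder stand-ins so the snippet runs on its own; in practice the frozen
# part is a 0.5B/3B/7B LLM and the trainable part is the ITFormer connector.
frozen_llm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=8, batch_first=True),
    num_layers=24,
)
for p in frozen_llm.parameters():
    p.requires_grad_(False)  # the LLM stays frozen

connector = nn.Sequential(   # small trainable connector, for illustration only
    nn.Linear(256, 1024),
    nn.GELU(),
    nn.Linear(1024, 1024),
)

model = nn.ModuleDict({"llm": frozen_llm, "connector": connector})
print(f"trainable: {trainable_fraction(model):.2%} of all parameters")
```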

Figures & Visualizations

Key visualizations from our research showcasing the ITFormer framework and experimental results.

Authors & Affiliations

Shanghai Jiao Tong University
Yilin Wang, Peixuan Lei, Jie Song, Tao Chen, Haoyu Zhe, Yuxuan Zhang, Lei Jia, Yuanxiang Li
Shanghai Innovation Institute
Yilin Wang, Zhongyu Wei
Fudan University
Zhongyu Wei

Corresponding Author: Yuanxiang Li (yuanxli@sjtu.edu.cn)

Code & Citation

Stay tuned for the release of our complete codebase, including:

  • EngineMT-QA dataset with 110k+ QA pairs
  • ITFormer implementation and training scripts
  • Evaluation benchmarks and baseline comparisons
  • Pre-trained models and checkpoints

📥 Download Dataset (Coming Soon)
💻 View Code (Coming Soon)