Found-RL: Foundation Model-Enhanced Reinforcement Learning for Autonomous Driving

Purdue University
University of Wisconsin-Madison
University of Texas at Austin

arXiv · Code · 🤗 Dataset · 🤗 VLMs

[Video gallery: Found-RL (Demos 1–6); VLM-driven Agent (RGB Input); VLM-driven Agent (BEV Visualization)]

Qualitative results demonstrating Found-RL navigating complex scenarios (top two rows).
The bottom row showcases our pure VLM experts, which provide precise action guidance in complex environments.
(These VLM experts are open-sourced to facilitate future research in the autonomous driving community.)

Abstract

Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD) with real-time inference. However, RL typically suffers from sample inefficiency and a lack of semantic interpretability in complex scenarios. Foundation models, particularly Vision-Language Models (VLMs), can mitigate these limitations by offering rich, context-aware knowledge. Yet deploying such computationally intensive models within high-frequency, multi-environment RL training loops is severely hindered by prohibitive inference latency and the absence of unified integration platforms. To bridge this gap, we present Found-RL, a specialized platform tailored to leverage foundation models to efficiently enhance RL for AD. A core innovation of the proposed platform is its asynchronous batch inference framework, which decouples heavy VLM reasoning from the simulation loop. This design resolves latency bottlenecks and supports real-time or near-real-time RL learning from VLM feedback. Using the proposed platform, we introduce diverse supervision mechanisms to address domain-specific challenges: we first implement Value-Margin Regularization (VMR) and Advantage-Weighted Action Guidance (AWAG) to distill expert-like VLM action suggestions into the RL policy. Furthermore, for dense supervision, we adopt high-throughput CLIP for reward shaping. We mitigate CLIP’s dynamic blindness and probability dilution via Conditional Contrastive Action Alignment, which conditions prompts on discretized speed/command and yields a normalized, margin-based bonus from context-specific action-anchor scoring. Found-RL delivers an end-to-end pipeline for fine-tuned VLM integration with modular support, and shows that a lightweight RL model with only millions of parameters can approach the performance of billion-parameter VLMs while sustaining real-time inference (~500 FPS). Code, data, and models will be publicly available at https://github.com/ys-qu/found-rl.
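To make the Conditional Contrastive Action Alignment bonus concrete, the sketch below illustrates one way a CLIP-scored, margin-based reward term could be computed. The prompt templates, speed bins, action anchors, CLIP backbone, and normalization are illustrative assumptions rather than the released implementation; only the high-level recipe (condition action-anchor prompts on discretized speed/command, score them with CLIP, convert the margin into a bounded bonus) follows the description above.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed backbone for illustration; not necessarily the checkpoint used in the paper.
MODEL_NAME = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

# Illustrative discrete action anchors.
ACTIONS = ["accelerate", "keep speed", "brake", "turn left", "turn right"]

def speed_bin(speed_mps: float) -> str:
    # Discretized ego speed; bin edges are illustrative.
    if speed_mps < 1.0:
        return "stopped"
    if speed_mps < 8.0:
        return "driving slowly"
    return "driving fast"

def alignment_bonus(image: Image.Image, agent_action: str,
                    speed_mps: float, command: str) -> float:
    # Normalized margin between the agent's action anchor and the best competing
    # anchor, with prompts conditioned on the speed bin and navigation command.
    context = f"the ego vehicle is {speed_bin(speed_mps)} and the navigation command is {command}"
    prompts = [f"a driving scene where the correct action is to {a}, while {context}"
               for a in ACTIONS]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)
    idx = ACTIONS.index(agent_action)
    competitors = torch.cat([probs[:idx], probs[idx + 1:]])
    margin = probs[idx] - competitors.max()   # in [-1, 1]
    return float(0.5 * (margin + 1.0))        # bounded bonus in [0, 1]

In an RL loop, such a bonus would be added to the environment reward of the corresponding transition; because the scoring is contrastive over action anchors within a fixed context, it avoids rewarding generically plausible captions and reflects the margin by which the agent's action is preferred.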

Foundation Model-enhanced RL concept

Comparison of supervision paradigms for autonomous driving. Imitation learning relies on costly, fixed human demonstrations, while human-in-the-loop RL is informative but hard to scale. Foundation model-enhanced RL provides an always-ready “tireless mentor” via VLM semantic feedback, combining exploration with richer guidance—yet practical deployment requires overcoming inference-latency bottlenecks in high-frequency, multi-environment training.

Overall framework of Found-RL

Overall framework of Found-RL, a unified platform with three coupled parts: simulation, algorithms, and applications. CARLA generates multimodal observations and structured context; rollout workers build prompts and send requests to a shared queue; an asynchronous micro-batched VLM/CLIP server returns feedback without blocking simulation. The returned signals (e.g., action guidance and semantic scores) are stored with transitions in replay buffers for modular learning and consistent evaluation on route completion, safety, and efficiency.
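The sketch below illustrates the queue-based decoupling described in this figure: rollout workers enqueue prompt requests and keep stepping the simulator, while a server thread drains the queue in micro-batches and fulfills futures. The batch size, timeout, and the stubbed batched_vlm_call are illustrative assumptions, not the platform's actual API.

import queue
import threading
from concurrent.futures import Future

REQUESTS = queue.Queue()   # shared queue of (prompt, Future) pairs
MAX_BATCH = 8              # illustrative micro-batch size
MAX_WAIT_S = 0.02          # illustrative wait for late-arriving requests

def batched_vlm_call(prompts):
    # Placeholder for a real batched VLM/CLIP forward pass.
    return [f"action_suggestion_for({p})" for p in prompts]

def inference_server():
    while True:
        prompt, fut = REQUESTS.get()       # block until the first request arrives
        batch = [(prompt, fut)]
        while len(batch) < MAX_BATCH:      # micro-batch whatever arrives shortly after
            try:
                batch.append(REQUESTS.get(timeout=MAX_WAIT_S))
            except queue.Empty:
                break
        results = batched_vlm_call([p for p, _ in batch])
        for (_, f), r in zip(batch, results):
            f.set_result(r)                # fulfill each worker's future

def request_feedback(prompt):
    # Called by a rollout worker; returns immediately so simulation keeps stepping.
    fut = Future()
    REQUESTS.put((prompt, fut))
    return fut

threading.Thread(target=inference_server, daemon=True).start()

# Rollout-worker side (illustrative): step the environment now and attach the
# VLM feedback to the stored transition later, once fut.done() is True.
fut = request_feedback("front-camera description + route command")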

Video Presentation

Poster

BibTeX

@misc{qu2026foundrlfoundationmodelenhancedreinforcement,
      title={Found-RL: foundation model-enhanced reinforcement learning for autonomous driving}, 
      author={Yansong Qu and Zihao Sheng and Zilin Huang and Jiancong Chen and Yuhao Luo and Tianyi Wang and Yiheng Feng and Samuel Labi and Sikai Chen},
      year={2026},
      eprint={2602.10458},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.10458}, 
}