MedReasoner

Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision

1Beijing University of Posts and Telecommunications, 2Zhongguancun Academy, 3Beijing Information Science and Technology University
*Equal contribution
†Corresponding author

🌟 Introduction

In real clinical workflows, doctors rarely provide explicit prompts like “segment the left kidney.” Instead, they raise implicit queries such as “What can be inferred from this shadow?” Existing MLLMs, though capable of vision-language interaction, still produce image-level outputs and rely heavily on handcrafted spatial prompts for grounding—inputs that are rarely available in practice.

Current datasets reflect this disconnect: VQA datasets lack spatial supervision, while segmentation datasets lack language. No existing dataset aligns implicit clinical queries with chain-of-thought reasoning and pixel-level localization, making it impossible to evaluate whether a model can truly reason and ground under realistic conditions.

To address the limitations of existing medical grounding systems, we define the Unified Medical Reasoning Grounding (UMRG) task, which challenges models to interpret implicit clinical queries, reason over visual and anatomical cues, and produce accurate pixel-level grounding—mirroring how clinicians observe, reflect, and pinpoint regions of interest in medical images. We tackle this task with a two-fold approach: (1) we construct U-MRG-14K, a dataset that pairs implicit queries with interpretable reasoning traces and pixel-level masks; and (2) we introduce MedReasoner, a reinforcement-learning framework that decouples reasoning from segmentation and grounds vague clinical language without relying on handcrafted spatial prompts.

We envision MedReasoner as a step toward trustworthy and generalizable medical grounding systems, enabling future clinical applications that demand both interpretability and spatial precision.

Comparison of an annotated question and an implicit clinical question. The ground-truth bounding box is shown in green and each model's predicted box in red. MedReasoner precisely identifies the target through its reasoning trace and achieves accurate grounding.

U-MRG-14K Dataset

Three-Stage Construction Pipeline

To support reasoning-based grounding under implicit clinical queries, we construct U-MRG-14K through a structured three-stage pipeline. This pipeline combines standardized medical data with GPT-4o–generated question–answer pairs and chain-of-thought reasoning traces, ensuring both semantic richness and spatial accuracy.

Our construction process emphasizes realism and interpretability: we simulate implicit clinical queries using GPT-4o, align them with precise pixel-level masks, and enrich each sample with structured reasoning traces. This design enables both language understanding and spatial evaluation in a unified setting. To our knowledge, U-MRG-14K is the first dataset to bridge implicit medical questions, chain-of-thought reasoning, and pixel-level grounding at scale.
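
For illustration, a single U-MRG-14K-style sample could be represented as the record below. This is a hypothetical sketch: the field names and values are placeholders rather than the released schema, and it only shows how an implicit query, a reasoning trace, and a pixel-level mask are tied together in one sample.

```python
# Hypothetical U-MRG-14K-style record (field names and values are illustrative,
# not the released schema).
sample = {
    "image": "ct_abdomen_0001.png",            # hypothetical image file
    "super_category": "Abdomen",               # coarse super-category
    "category": "left kidney",                 # fine-grained target category
    "question": "What structure could explain the low-density region "
                "lateral to the spine on the left?",   # implicit clinical query
    "reasoning": "The region lies lateral to the vertebral body and shows a "
                 "bean-shaped outline, consistent with the left kidney.",  # CoT trace
    "answer": "left kidney",
    "mask": "ct_abdomen_0001_mask.png",        # pixel-level ground-truth mask
}
```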

Overview of the U-MRG-14K construction pipeline: (1) manual data cleaning and metadata organization, (2) description and QA-format generation via GPT-4o, and (3) QA-pair generation with GPT-4o followed by human verification.

Comparison with Existing Datasets

While existing medical datasets either offer pixel-level masks or clinical question–answer pairs, none integrate implicit queries with chain-of-thought (CoT) reasoning and fine-grained spatial grounding. U-MRG-14K uniquely combines all three: it supports reasoning-aware evaluation with high-quality QA pairs grounded in pixel-level masks across diverse anatomical regions. It is the first dataset to bridge segmentation and medical VQA under realistic, implicit clinical language.

| Dataset | # Prompts | QAs | Sup. | Cat. | CoT |
|---|---|---|---|---|---|
| SA-Med2D | 20M | ✗ | – | 219 | ✗ |
| BioMedParse | 1.1M | ✗ | 3 | 82 | ✗ |
| IMED | 361M | ✗ | 6 | 204 | ✗ |
| MoCoVQA | 100K | ✓ | – | – | ✗ |
| U-MRG-14K | 14K | ✓ | 15 | 108 | ✓ |

Sup. = Super-categories    Cat. = Fine-grained Categories    CoT = Chain-of-Thought reasoning

MedReasoner Framework

Our MedReasoner framework decouples language reasoning from visual segmentation, consisting of two modular components: a trainable Clinical Reasoning Module (CRM) that interprets implicit queries and predicts spatial prompts (a bounding box and two key points), and a frozen Anatomical Segmentation Module (ASM) that converts these prompts into high-resolution masks using MedSAM2. This design enables authentic reasoning without handcrafted spatial cues, avoids phrase overfitting, and supports plug-and-play compatibility with strong segmentation backbones.
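
To make the decoupling concrete, the sketch below walks through one inference pass under stated assumptions: the `mllm.generate` call stands in for an arbitrary MLLM interface, `parse_prediction` assumes the CRM emits its answer as a small JSON object, and the `set_image`/`predict` calls assume a SAM2-style promptable-segmenter API. None of this is the released MedReasoner or MedSAM2 code.

```python
import json
import re

import numpy as np


def parse_prediction(text: str):
    """Extract spatial prompts from the CRM's text output. Assumes (for
    illustration) a JSON answer such as
    {"bbox": [x1, y1, x2, y2], "points": [[x, y], [x, y]]}."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    pred = json.loads(match.group(0))
    return pred["bbox"], pred["points"]


def ground_query(mllm, predictor, image, question):
    """One pass of the decoupled pipeline: the trainable CRM reasons over an
    implicit query and emits a box plus two key points; a frozen promptable
    segmenter (the ASM) converts them into a mask."""
    response = mllm.generate(image=image, prompt=question)   # CRM: reasoning + spatial prompts
    box, points = parse_prediction(response)

    predictor.set_image(image)                                # ASM: frozen segmenter
    masks, scores, _ = predictor.predict(
        box=np.asarray(box, dtype=np.float32),
        point_coords=np.asarray(points, dtype=np.float32),
        point_labels=np.ones(len(points), dtype=np.int32),    # key points are foreground
    )
    return masks[np.argmax(scores)]                           # highest-scoring mask
```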

To optimize the CRM, we design three categories of reward functions tailored to the UMRG task: (1) format rewards to enforce structured output, (2) box and point rewards to evaluate grounding accuracy, and (3) smoothing and penalization terms to ensure training stability and output plausibility. Together, these components guide the model toward reasoning-aligned spatial grounding. Extensive experiments confirm that MedReasoner achieves state-of-the-art performance on U-MRG-14K and generalizes well to unseen clinical queries.
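
The snippet below sketches one way such rewards could be computed. It is illustrative only: the `<think>`/`<answer>` output format, the mask-membership point reward, and the equal weights are assumptions, and the paper's smoothing and penalization terms are only indicated by a comment.

```python
import re

import numpy as np


def format_reward(output: str) -> float:
    """Reward structured output: a reasoning block followed by an answer block
    (the <think>/<answer> tag convention is assumed for illustration)."""
    pattern = r"(?s)\s*<think>.+</think>\s*<answer>.+</answer>\s*"
    return 1.0 if re.fullmatch(pattern, output) else 0.0


def box_reward(pred, gt) -> float:
    """IoU between predicted and ground-truth boxes in [x1, y1, x2, y2] format."""
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(pred) + area(gt) - inter
    return inter / union if union > 0 else 0.0


def point_reward(points, gt_mask: np.ndarray) -> float:
    """Fraction of predicted key points that fall inside the ground-truth mask
    (one plausible point reward; the paper's exact formulation may differ)."""
    hits = [gt_mask[int(y), int(x)] > 0 for x, y in points]
    return sum(hits) / len(points)


def total_reward(output, pred_box, gt_box, points, gt_mask, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the reward families; smoothing and penalty terms
    (e.g., for degenerate boxes or points far outside the box) would be added here."""
    return (weights[0] * format_reward(output)
            + weights[1] * box_reward(pred_box, gt_box)
            + weights[2] * point_reward(points, gt_mask))
```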

Overview of the MedReasoner framework. MedReasoner transforms implicit clinical prompts into pixel-level grounding via a two-stage process. The CRM first generates intermediate reasoning and grounding outputs (CoT, bounding box, and key points). Then, the ASM converts the grounded outputs into final segmentation masks.

📊 Experiment Results on U-MRG-14K

🏆 U-MRG-14K Test Set Performance

📈 Results on the U-MRG-14K test set under the MedReasoner paradigm. Each candidate uses one MLLM as the CRM to output a bounding box and two key points; the ASM is fixed to MedSAM2. The last six columns report per-super-category IoU↑. Bold numbers denote the best score in each column, underlined numbers denote the second best, and “–” marks values that are unavailable.

| # | Model | Size | Type | IoU↑ | pDice↑ | Dice↑ | Abdomen | Brain | Heart | Lung | Neoplasm | Non-Neoplasm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MedReasoner 🏆 | 7B | Grounding | **32.42** | **26.55** | **37.78** | **30.27** | **32.81** | <u>34.72</u> | **50.75** | **33.58** | **37.19** |
| 2 | Qwen2.5-VL | 72B | General | <u>18.32</u> | <u>12.39</u> | <u>29.71</u> | <u>13.60</u> | 20.06 | 15.51 | <u>35.25</u> | <u>20.69</u> | <u>30.19</u> |
| 3 | SegZero | 7B | Grounding | 16.14 | 5.23 | 26.05 | 11.66 | 23.37 | **40.23** | 22.18 | 12.58 | 21.93 |
| 4 | VLMR1-REC | 3B | Grounding | 13.96 | – | 22.19 | 8.64 | 21.81 | 8.19 | 29.77 | 8.76 | 26.59 |
| 5 | Qwen2.5-VL | 7B | General | 12.61 | 7.14 | 22.73 | 6.84 | <u>23.97</u> | 8.37 | 20.79 | 8.00 | 24.97 |
| 6 | HuatuoGPT | 7B | Medical | 10.13 | 5.23 | 19.76 | 5.88 | 18.16 | 6.63 | 22.94 | 8.25 | 16.12 |
| 7 | Lingshu | 7B | Medical | 8.19 | 3.73 | 16.48 | 4.03 | 15.72 | 6.27 | 19.77 | 6.34 | 13.31 |
| 8 | MedR1 | 2B | Medical | 8.18 | 3.60 | 14.73 | 3.53 | 12.55 | 3.53 | 25.58 | 4.39 | 13.57 |
| 9 | SAM4MLLM | 8B | Grounding | 7.94 | – | 16.49 | 6.30 | 14.69 | 5.81 | 12.61 | 6.24 | 11.96 |
| 10 | Gemini-2.5-flash | – | General | 7.86 | 3.24 | 14.29 | 3.99 | 5.69 | 7.77 | 16.37 | 7.15 | 13.91 |
| 11 | Chiron-o1 | 8B | Medical | 6.40 | 2.46 | 10.05 | 3.82 | 6.90 | 4.20 | 12.86 | 5.53 | 11.31 |
| 12 | InternVL3 | 8B | General | 5.70 | 2.46 | 9.23 | 3.72 | 6.54 | 3.67 | 14.44 | 3.78 | 8.71 |
| 13 | MedGemma | 4B | Medical | 5.39 | 1.90 | 8.90 | 4.23 | 6.92 | 3.41 | 4.78 | 3.17 | 3.90 |
| 14 | InternVL3 | 78B | General | 4.02 | 1.55 | 7.23 | 2.04 | 2.95 | 2.12 | 12.21 | 1.33 | 8.19 |
| 15 | MiniInternVL | 4B | Medical | 2.88 | 0.85 | 4.76 | 1.88 | 2.67 | 1.60 | 7.99 | 1.56 | 3.76 |
| 16 | GPT-4o | – | General | 2.65 | 1.12 | 4.72 | 0.92 | 0.91 | 0.36 | 11.70 | 1.01 | 4.16 |
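
For reference, mask-level overlap metrics like those in the table can be computed as in the minimal NumPy sketch below, assuming binary masks of equal shape. The exact evaluation protocol (including how pDice is defined) follows the paper, not this sketch.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 0.0


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient between two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return float(2.0 * inter / total) if total else 0.0
```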

🔡 Case Studies

🧩 Meta Information Examples

🧩 QA Pairs Examples

📚 BibTeX Citation


    @article{yan2025medreasoner,
      title={MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision},
      author={Yan, Zhonghao and Diao, Muxi and Yang, Yuxuan and Xu, Jiayuan and Zhang, Kaizhou and Jing, Ruoyan and Yang, Lele and Liu, Yanxi and Liang, Kongming and Ma, Zhanyu},
      journal={arXiv preprint arXiv:2508.08177},
      year={2025}
    }