📈Results on the U-MRG-14K test set under the MedReasoner paradigm. Each candidate uses one medical MLLM as the CRM to output a bounding box and two key points; the ASM is fixed to MedSAM2. Bold numbers denote the best score in each column, and underlined numbers denote the second best.
# | 🏆 Model | Size | Type | IoU↑ | pDice↑ | Dice↑ | Super-Categories (IoU↑) | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Abdomen | Brain | Heart | Lung | Neoplasm | Non-Neoplasm | |||||||
1 | MedReasoner 🏆 | 7B | Grounding | 32.42 | 26.55 | 37.78 | 30.27 | 32.81 | 34.72 | 50.75 | 33.58 | 37.19 |
2 | Qwen2.5-VL | 72B | General | 18.32 | 12.39 | 29.71 | 13.60 | 20.06 | 15.51 | 35.25 | 20.69 | 30.19 |
3 | SegZero | 7B | Grounding | 16.14 | 5.23 | 26.05 | 11.66 | 23.37 | 40.23 | 22.18 | 12.58 | 21.93 |
4 | VLMR1-REC | 3B | Grounding | 13.96 | — | 22.19 | 8.64 | 21.81 | 8.19 | 29.77 | 8.76 | 26.59 |
5 | Qwen2.5VL | 7B | General | 12.61 | 7.14 | 22.73 | 6.84 | 23.97 | 8.37 | 20.79 | 8.00 | 24.97 |
6 | HuatuoGPT | 7B | Medical | 10.13 | 5.23 | 19.76 | 5.88 | 18.16 | 6.63 | 22.94 | 8.25 | 16.12 |
7 | Lingshu | 7B | Medical | 8.19 | 3.73 | 16.48 | 4.03 | 15.72 | 6.27 | 19.77 | 6.34 | 13.31 |
8 | MedR1 | 2B | Medical | 8.18 | 3.60 | 14.73 | 3.53 | 12.55 | 3.53 | 25.58 | 4.39 | 13.57 |
9 | SAM4MLLM | 8B | Grounding | 7.94 | — | 16.49 | 6.30 | 14.69 | 5.81 | 12.61 | 6.24 | 11.96 |
10 | Gemini-2.5-flash | — | General | 7.86 | 3.24 | 14.29 | 3.99 | 5.69 | 7.77 | 16.37 | 7.15 | 13.91 |
11 | Chiron-o1 | 8B | Medical | 6.40 | 2.46 | 10.05 | 3.82 | 6.90 | 4.20 | 12.86 | 5.53 | 11.31 |
12 | InternVL3 | 8B | General | 5.70 | 2.46 | 9.23 | 3.72 | 6.54 | 3.67 | 14.44 | 3.78 | 8.71 |
13 | MedGamma | 4B | Medical | 5.39 | 1.90 | 8.90 | 4.23 | 6.92 | 3.41 | 4.78 | 3.17 | 3.90 |
14 | InternVL3 | 78B | General | 4.02 | 1.55 | 7.23 | 2.04 | 2.95 | 2.12 | 12.21 | 1.33 | 8.19 |
15 | MiniInternVL | 4B | Medical | 2.88 | 0.85 | 4.76 | 1.88 | 2.67 | 1.60 | 7.99 | 1.56 | 3.76 |
16 | GPT-4o | — | General | 2.65 | 1.12 | 4.72 | 0.92 | 0.91 | 0.36 | 11.70 | 1.01 | 4.16 |