📈 Evaluation on HepatoPathoBench against general and pathology-specific MLLMs. Single/Multi: single- and multiple-choice accuracy. WSI-P: patch-level BLEU on WSI captioning. WSI, ROI, Patch: accuracy at each scale. Bold: best; underlined: second best.
| Model | Input | Morph. Open WSI-P↑ | Morph. Open METEOR↑ | Morph. Close Single↑ | Morph. Close Multi↑ | Diag. Open WSI-P↑ | Diag. Open METEOR↑ | Diag. Close Single↑ | Diag. Close Multi↑ | Multi-scale WSI↑ | Multi-scale ROI↑ | Multi-scale Patch↑ | Avg↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lingshu | Thumbnail | 0.53 | 0.17 | 0.38 | 0.44 | <u>0.73</u> | 0.18 | 0.39 | 0.38 | 0.52 | 0.52 | 0.49 | 0.50 |
| Huatuo-GPT | Thumbnail | <u>0.74</u> | <u>0.24</u> | 0.81 | 0.45 | 0.70 | <u>0.23</u> | 0.59 | 0.32 | 0.60 | 0.65 | 0.65 | 0.65 |
| Quilt-LLaVA | Thumbnail | 0.64 | 0.22 | 0.47 | 0.32 | 0.56 | 0.15 | 0.57 | 0.37 | 0.57 | 0.60 | 0.55 | 0.57 |
| Patho-R1 | Thumbnail | 0.66 | 0.19 | <u>0.87</u> | <u>0.50</u> | 0.20 | 0.05 | 0.59 | <u>0.45</u> | 0.55 | 0.55 | 0.54 | 0.55 |
| SlideChat | WSI | 0.70 | 0.17 | <u>0.87</u> | 0.47 | 0.72 | 0.14 | 0.63 | 0.39 | <u>0.66</u> | <u>0.68</u> | <u>0.66</u> | <u>0.66</u> |
| WSI-LLaVA | WSI | 0.69 | 0.20 | 0.84 | 0.46 | 0.67 | 0.16 | <u>0.65</u> | 0.36 | 0.65 | 0.67 | 0.64 | 0.65 |
| Hepato-LLaVA 🏆 | WSI | **0.79** | **0.33** | **0.97** | **0.88** | **0.75** | **0.33** | **0.87** | **0.68** | **0.82** | **0.83** | **0.83** | **0.83** |
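The caption's "bold: best; underlined: second best" marks can be recovered mechanically from the scores. A minimal sketch, using the multi-scale and Avg columns transcribed from the table above (the dictionary keys are ad hoc names, not part of the benchmark):

```python
# Scores transcribed from the HepatoPathoBench table above
# (multi-scale WSI/ROI/Patch accuracy and the Avg column only).
scores = {
    "Lingshu":      {"ms_wsi": 0.52, "ms_roi": 0.52, "ms_patch": 0.49, "avg": 0.50},
    "Huatuo-GPT":   {"ms_wsi": 0.60, "ms_roi": 0.65, "ms_patch": 0.65, "avg": 0.65},
    "Quilt-LLaVA":  {"ms_wsi": 0.57, "ms_roi": 0.60, "ms_patch": 0.55, "avg": 0.57},
    "Patho-R1":     {"ms_wsi": 0.55, "ms_roi": 0.55, "ms_patch": 0.54, "avg": 0.55},
    "SlideChat":    {"ms_wsi": 0.66, "ms_roi": 0.68, "ms_patch": 0.66, "avg": 0.66},
    "WSI-LLaVA":    {"ms_wsi": 0.65, "ms_roi": 0.67, "ms_patch": 0.64, "avg": 0.65},
    "Hepato-LLaVA": {"ms_wsi": 0.82, "ms_roi": 0.83, "ms_patch": 0.83, "avg": 0.83},
}

def best_and_runner_up(column):
    """Return (best_model, second_best_model) for one metric column."""
    ranked = sorted(scores, key=lambda m: scores[m][column], reverse=True)
    return ranked[0], ranked[1]

for col in ("ms_wsi", "ms_roi", "ms_patch", "avg"):
    best, second = best_and_runner_up(col)
    print(f"{col}: best={best}, second={second}")
```

On these columns the ranking reproduces the marks in the table: Hepato-LLaVA is bolded everywhere and SlideChat takes the underlined second-best spot for all three scales and the average.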