When Attention Collapses: Stage-Aware Visual Token Pruning from Structure to Semantics

Jiahui Wang, Kai Zhang, Mai Han, Huanghe Zhang

Jun 3, 2026 at 04:00

arXiv:2606.03569v1 (cross-listed)

Abstract: Vision-Language Models (VLMs) have demonstrated remarkable capabilities but suffer from significant computational overhead during inference. While visual token pruning offers a promising solution, existing methods predominantly rely on initial attention scores. This single-metric paradigm presents...
