arXiv:2602.07768v3 Announce Type: replace-cross Abstract: Distilling knowledge from large Vision-Language Models (VLMs) into lightweight networks is crucial yet challenging in Fine-Grained Visual Classification (FGVC), due to the reliance on fixed prompts and global alignment. To address this, we propose PAND (Prompt-Aware Neighborhood...
Läs hela artikeln hos källan.
Kommentarer (0)
Inga kommentarer ännu. Bli först med att kommentera!