arXiv:2602.07768v3 Announce Type: replace-cross Abstract: Distilling knowledge from large Vision-Language Models (VLMs) into lightweight networks is crucial yet challenging in Fine-Grained Visual Classification (FGVC), due to the reliance on fixed prompts and global alignment. To address this, we propose PAND (Prompt-Aware Neighborhood...
Læs hele artiklen hos kilden.
Kommentarer (0)
Ingen kommentarer ennå. Bli den første til å kommentere!