arXiv:2606.12138v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) are widely used to interpret neural network representations, but their utility depends on whether the learned features are reproducible across training runs. We study this question through \emph{feature stability}: for each SAE feature, we estimate the probability that a...
Les hele artikkelen hos kilden.
Kommentarer (0)
Ingen kommentarer ennå. Bli den første til å kommentere!