arXiv:2606.03483v1
Announce Type: cross
Abstract: Hyper-Connections (HC) replace the single Transformer residual stream with multiple streams, introducing a permutation symmetry over stream indices. We study how this symmetry is resolved in practice: whether streams specialize in a balanced way or exhibit dominant-stream usage. Using fine-grained...
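The permutation symmetry mentioned in the abstract can be illustrated with a toy multi-stream residual layer. This is a minimal sketch under stated assumptions, not the paper's formulation: the names `hc_layer`, `readout`, and `permute`, the read weights `alpha`, write weights `beta`, mixing matrix `mix`, the `tanh` sublayer, and all numeric values are hypothetical choices made for illustration. The sketch checks that relabeling the streams, together with the matching relabeling of the per-stream parameters, leaves the summed readout unchanged.

```python
import math

def hc_layer(streams, alpha, beta, mix):
    """One toy multi-stream residual update (illustrative assumption only)."""
    n, d = len(streams), len(streams[0])
    # Read: the sublayer input is a weighted sum of all streams.
    layer_in = [sum(alpha[j] * streams[j][k] for j in range(n)) for k in range(d)]
    # Stand-in for an attention/MLP block.
    layer_out = [math.tanh(v) for v in layer_in]
    # Write: each stream mixes the old streams and adds a scaled sublayer output.
    return [
        [sum(mix[i][j] * streams[j][k] for j in range(n)) + beta[i] * layer_out[k]
         for k in range(d)]
        for i in range(n)
    ]

def readout(streams):
    """Collapse the streams into one vector by summing over the stream index."""
    return [sum(s[k] for s in streams) for k in range(len(streams[0]))]

def permute(perm, streams, alpha, beta, mix):
    """Relabel streams: new index i holds what old index perm[i] held."""
    return (
        [streams[p] for p in perm],
        [alpha[p] for p in perm],
        [beta[p] for p in perm],
        [[mix[p][q] for q in perm] for p in perm],
    )

# Hypothetical values: 3 streams, hidden size 2.
streams = [[1.0, -2.0], [0.5, 0.25], [-1.5, 3.0]]
alpha = [0.2, 0.5, 0.3]
beta = [1.0, 0.0, -0.5]
mix = [[1.0, 0.1, 0.0], [0.0, 0.9, 0.2], [0.3, 0.0, 1.0]]
perm = [2, 0, 1]

out = readout(hc_layer(streams, alpha, beta, mix))
out_perm = readout(hc_layer(*permute(perm, streams, alpha, beta, mix)))

# The two readouts agree up to floating-point rounding.
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
           for a, b in zip(out, out_perm))
print(same)
```

Because every relabeling of the streams gives an equivalent network in this toy setting, nothing in the architecture itself singles out a particular stream index; any imbalance between streams would have to arise from initialization or training.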