arXiv:2606.11440v1 Announce Type: new Abstract: Existing multi-agent LLM orchestration methods, ranging from brute-force ensembles to learned routers, select models and topologies based on task and model features. However, these methods do not consider the runtime state of the serving infrastructure. On shared GPU clusters under concurrent load,...
Läs hela artikeln hos källan.
Kommentarer (0)
Inga kommentarer ännu. Bli först med att kommentera!