When Surface Form Changes Moderation Decisions: A Paired Study of Code-Mixed Workflow Instability

Suraj Babu Thimma Krishnaram

Jun 5, 2026 at 04:00

arXiv:2606.05654v1 (announce type: cross)

Abstract: Hate moderation is often evaluated as classification on clean English inputs, but deployed systems must route content to actions such as ALLOW, FLAG, or REVIEW. We study how this workflow changes under code-mixed inputs using a paired evaluation setting where the same underlying content is...
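The excerpt is cut off before it names the paper's metrics. As a hypothetical illustration only, assume each item is routed once in clean English and once in a code-mixed rendering, with each run ending in ALLOW, FLAG, or REVIEW. One simple way to quantify workflow instability is then the fraction of pairs whose action changes, plus a table of which transitions occur. The names `PairedDecision`, `flip_rate`, and `transition_counts` are illustrative and are not the authors' code.

```python
from collections import Counter
from dataclasses import dataclass

# Workflow actions named in the abstract.
ACTIONS = ("ALLOW", "FLAG", "REVIEW")


@dataclass(frozen=True)
class PairedDecision:
    """Hypothetical record: one item's action under its English and code-mixed forms."""
    item_id: str
    english_action: str
    code_mixed_action: str

    def __post_init__(self):
        # Reject anything outside the three-action workflow.
        for action in (self.english_action, self.code_mixed_action):
            if action not in ACTIONS:
                raise ValueError(f"unknown action: {action!r}")


def flip_rate(pairs):
    """Fraction of pairs whose routed action differs between the two surface forms."""
    if not pairs:
        raise ValueError("need at least one pair")
    flips = sum(p.english_action != p.code_mixed_action for p in pairs)
    return flips / len(pairs)


def transition_counts(pairs):
    """Count (english_action, code_mixed_action) transitions, e.g. FLAG -> ALLOW."""
    return Counter((p.english_action, p.code_mixed_action) for p in pairs)


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    pairs = [
        PairedDecision("a", "FLAG", "FLAG"),
        PairedDecision("b", "FLAG", "ALLOW"),
        PairedDecision("c", "ALLOW", "ALLOW"),
        PairedDecision("d", "REVIEW", "FLAG"),
    ]
    print(flip_rate(pairs))  # 0.5
    print(transition_counts(pairs)[("FLAG", "ALLOW")])  # 1
```

The transition table is kept alongside the single rate because not all flips cost the same: a FLAG that becomes ALLOW lets content through unchecked, while an ALLOW that becomes REVIEW only adds reviewer load.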
