NOISER: Bounded input perturbations for attributing large language models

By Miniml Research, April 3, 2025

In the Conference on Language Modeling (COLM)

Attribution methods for LLMs can be fragile or inconsistent across prompts. NOISER tackles this by adding bounded noise to input embeddings and measuring how output probabilities shift under those controlled perturbations.

Because the perturbations are bounded, the method preserves the input's meaning while still exposing which tokens drive the model's behavior. The paper reports stronger faithfulness than gradient-based, attention-based, and other perturbation-based baselines across multiple models and tasks.
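To make the idea concrete, here is a minimal sketch of a bounded-perturbation attribution loop using a Hugging Face causal LM. The function name `noiser_attribution`, the norm-bounded Gaussian noise, the sample count, and the probability-shift score are illustrative assumptions for this post, not the paper's exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def noiser_attribution(model, tokenizer, prompt, epsilon=0.1, n_samples=8):
    """Score each prompt token by how much bounded embedding noise shifts
    the probability of the model's predicted next token. Illustrative only."""
    device = next(model.parameters()).device
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    embeds = model.get_input_embeddings()(ids)          # (1, seq_len, hidden)

    base_probs = torch.softmax(model(inputs_embeds=embeds).logits[0, -1], dim=-1)
    target = base_probs.argmax()                         # predicted next token

    scores = torch.zeros(ids.shape[1])
    for t in range(ids.shape[1]):
        shift = 0.0
        for _ in range(n_samples):
            noise = torch.randn_like(embeds[0, t])
            # Bound the perturbation relative to the token embedding's norm,
            # so the perturbed input stays close to the original.
            noise = epsilon * embeds[0, t].norm() * noise / noise.norm()
            perturbed = embeds.clone()
            perturbed[0, t] += noise
            probs = torch.softmax(model(inputs_embeds=perturbed).logits[0, -1], dim=-1)
            shift += (base_probs[target] - probs[target]).abs().item()
        scores[t] = shift / n_samples                    # larger shift = more influential token
    return ids[0].tolist(), scores

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokens, scores = noiser_attribution(model, tokenizer, "The capital of France is")
for tok, score in zip(tokenizer.convert_ids_to_tokens(tokens), scores):
    print(f"{tok:>12s}  {score:.4f}")
```

In this sketch the noise is rescaled to a fixed fraction of each embedding's norm; that bound is what keeps the perturbed prompt close to the original while the resulting probability shift serves as the attribution score.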

NOISER offers a practical route to more reliable attribution in settings where interpretability and auditing are important.

Paper: https://arxiv.org/abs/2504.02911
