Researchers have introduced SalesSim, a framework and testbed for evaluating how multimodal large language models simulate realistic, persona-driven customer behaviour in multi-turn, multimodal, tool-augmented online retail conversations. The paper, by Yada Pruksachatkun, Elaine Wan, Lyanna Chen, Kai-Wei Chang, and Chien-Sheng Wu, was submitted to arXiv on 8 May 2026.
SalesSim models retail interaction and decision-making as a grounded, agentic process: shoppers with diverse backgrounds, preferences, and dealbreakers interact with a sales agent, seek clarifications, and make informed purchasing decisions. The framework centres evaluation on decision alignment, the consistency between the simulator's actions and its persona specification, alongside conversational quality.
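The paper does not spell out how decision alignment is computed, but the idea of checking a simulator's choices against a persona specification can be sketched as follows. The `Persona` fields and the constraint-satisfaction scoring here are illustrative assumptions, not the paper's actual metric.

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    preferences: set = field(default_factory=set)   # attributes the shopper wants present
    dealbreakers: set = field(default_factory=set)  # attributes that must be absent

def decision_alignment(persona: Persona, product_attrs: set) -> float:
    """Fraction of the persona's constraints that a chosen product satisfies.

    Hypothetical scoring: each preference should appear in the product's
    attributes, each dealbreaker should not.
    """
    checks = []
    for pref in persona.preferences:
        checks.append(pref in product_attrs)
    for db in persona.dealbreakers:
        checks.append(db not in product_attrs)
    return sum(checks) / len(checks) if checks else 1.0

persona = Persona(preferences={"waterproof", "under_100"},
                  dealbreakers={"leather"})
# A leather, non-budget product satisfies only 1 of 3 constraints:
score = decision_alignment(persona, {"waterproof", "leather"})
```

Averaging such per-decision scores over many conversations would yield the kind of "average alignment" figure the paper reports.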
Benchmarking six state-of-the-art open- and closed-source models revealed significant behavioural gaps. While the models produce fluent conversations, they display substantially lower lexical diversity than human interactions and overdisclose their criteria across personas. They also tend to be swayed by the sales agent's suggestions and to drift from their persona specifications.
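Lexical diversity is commonly measured with a distinct-n ratio (unique n-grams over total n-grams); the paper does not say which metric it uses, so this distinct-1 sketch is an assumption for illustration only.

```python
def distinct_1(utterances: list[str]) -> float:
    """Ratio of unique tokens to total tokens across a set of utterances.

    A low value indicates repetitive, templated phrasing; human dialogue
    typically scores higher than simulator output on measures like this.
    """
    tokens = [tok for utt in utterances for tok in utt.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

templated = ["i want a blue shirt", "i want a blue jacket"]   # repetitive
varied = ["do you have this in navy", "what fabric is the jacket"]
print(distinct_1(templated))  # 0.6
print(distinct_1(varied))     # 1.0
```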
Even the strongest model achieves less than 79% average alignment with its underlying persona specification. This highlights a fundamental challenge for current multimodal language models: maintaining consistency with an assigned customer profile over extended retail interactions.
To address these limitations, the researchers propose UserGRPO, a multi-turn, multi-objective reinforcement learning recipe that optimises both conversational fluency and decision alignment under persona specifications. In their experiments, UserGRPO boosts the baseline model's decision alignment by 13.8% while also improving conversational quality.
The paper is categorised under Computation and Language and carries the identifier arXiv:2605.08334. By introducing SalesSim, the researchers provide the community with a new testbed for investigating and improving user-simulator adherence in goal-oriented settings.