MMToM-QA is a benchmark, with an accompanying method, for evaluating and advancing multimodal Theory of Mind (ToM) reasoning in AI systems. It tests whether machines can infer human beliefs, goals, and intentions from multimodal inputs (such as behavior, motion, and speech). It also proposes an approach that combines the flexibility of large language models (LLMs) with Bayesian inverse planning to make that inference more robust. It is intended for researchers in artificial intelligence and cognitive science who study social intelligence and Theory of Mind in AI.
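To make the idea of Bayesian inverse planning concrete, here is a minimal Python sketch. It is not the MMToM-QA implementation. It only illustrates the general pattern: score each hypothesis about a person's goal by how well it explains the observed actions, then normalize. The function name `posterior_over_hypotheses`, the goal and action labels, and the likelihood table `TOY_LIKELIHOODS` are all hypothetical. In an LLM-based variant, the table lookup would presumably be replaced by a language model's estimate of how likely each action is under each hypothesis.

```python
from typing import Callable, Dict, Sequence


def posterior_over_hypotheses(
    hypotheses: Sequence[str],
    prior: Dict[str, float],
    actions: Sequence[str],
    action_likelihood: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return P(hypothesis | actions) via Bayes' rule.

    Assumes actions are conditionally independent given the hypothesis,
    which is a simplification made for this sketch.
    """
    unnormalized = {}
    for h in hypotheses:
        weight = prior[h]
        for a in actions:
            # Multiply in P(action | hypothesis) for each observed action.
            weight *= action_likelihood(a, h)
        unnormalized[h] = weight

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No hypothesis explains the observed actions.")
    return {h: w / total for h, w in unnormalized.items()}


# Hypothetical likelihood table standing in for a learned or LLM-based
# estimate of P(action | goal). The numbers are illustrative only.
TOY_LIKELIHOODS = {
    ("walk_to_fridge", "get_apple"): 0.8,
    ("walk_to_fridge", "get_book"): 0.2,
    ("open_fridge", "get_apple"): 0.9,
    ("open_fridge", "get_book"): 0.1,
}


def toy_action_likelihood(action: str, goal: str) -> float:
    """Look up P(action | goal) in the toy table."""
    return TOY_LIKELIHOODS[(action, goal)]


goals = ["get_apple", "get_book"]
uniform_prior = {"get_apple": 0.5, "get_book": 0.5}
observed = ["walk_to_fridge", "open_fridge"]

posterior = posterior_over_hypotheses(
    goals, uniform_prior, observed, toy_action_likelihood
)
print(posterior)
```

With these toy numbers, the goal that better explains walking to and opening the fridge receives most of the posterior mass. The same loop extends to joint hypotheses over goals and beliefs by enumerating pairs instead of single labels.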
Visit the official MMToM-QA website for details and getting-started instructions.