Conkurrence

MCP server for multi-model, statistical LLM evaluation and consensus.

About Conkurrence

Conkurrence is an MCP server and AI evaluation toolkit that measures agreement and reliability across multiple large language model (LLM) providers, such as OpenAI, Amazon Bedrock, and Gemini. It quantifies inter-rater reliability with metrics such as Fleiss' kappa, reported with bootstrap confidence intervals, and also offers trend tracking, self-consistency checks, schema suggestion and validation, and cost estimation. Aimed at AI engineers, researchers, and evaluators, it connects to AI assistants and clients that support the Model Context Protocol (MCP), such as Claude Desktop, for rigorous, repeatable, and transparent LLM evaluation in production or research workflows.
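
To make the statistics mentioned above concrete, here is a minimal Python sketch of Fleiss' kappa with a percentile bootstrap confidence interval. It is an illustration only and not Conkurrence's actual implementation or API: the function names `fleiss_kappa` and `bootstrap_ci`, the sample labels, and the choice to resample items (rather than raters) are all assumptions made for this example. Each inner list holds the labels that three hypothetical models assigned to one item.

```python
import random
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total = n_items * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        return float("nan")  # only one category in use: kappa is undefined
    return (observed - expected) / (1.0 - expected)


def bootstrap_ci(ratings, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for kappa, resampling items with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        sample = [rng.choice(ratings) for _ in ratings]
        kappa = fleiss_kappa(sample)
        if kappa == kappa:  # drop NaN from degenerate resamples
            stats.append(kappa)
    if not stats:
        raise ValueError("all bootstrap resamples were degenerate")
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Hypothetical labels from three models on four items.
example = [
    ["yes", "yes", "yes"],
    ["no", "no", "no"],
    ["yes", "yes", "no"],
    ["no", "no", "yes"],
]
kappa = fleiss_kappa(example)
interval = bootstrap_ci(example)
```

With only four items the interval is very wide; in practice many more items would be needed for a useful estimate. The percentile method is used here for simplicity, and other bootstrap variants are equally plausible.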

Resources

Product Website

Visit Conkurrence's official website for product details and to get started.