Token Compressor is an MCP server that provides semantic prompt compression as a callable tool for LLM workflows. It enables AI assistants and MCP-compatible clients (such as Claude, Cursor, Windsurf) to compress input prompts by 40–60% using a two-stage process: local LLM-based rewriting (via Ollama) followed by embedding-based semantic validation. This preserves all conditionals and ensures that compressed prompts always meet a set semantic similarity threshold, saving on token usage and reducing operational costs without sacrificing meaning. Its primary users include AI tool developers, workflow integrators, and prompt engineers looking to optimize LLM context lengths and reduce costs across local or cloud AI pipelines.
Visit Token Compressor's official website for product details and getting started.