Langfuse is an open source platform for LLM engineering and observability. It helps teams understand what their AI applications are actually doing in production: tracing every LLM call, tool use, and agent step with full input/output capture, latency, cost, and custom metadata. Traces are visualized in a timeline view that makes debugging multi-step agent workflows practical.
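To make the tracing model concrete, here is a minimal self-contained sketch of the kind of observation record Langfuse captures per step, with a stubbed function standing in for a real LLM call. The class and field names, and the characters-per-token cost estimate, are illustrative assumptions, not the Langfuse SDK's actual API.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Observation:
    """One traced step: an LLM call, tool use, or agent action (illustrative)."""
    name: str
    input: str
    output: str = ""
    latency_ms: float = 0.0
    cost_usd: float = 0.0
    metadata: dict = field(default_factory=dict)

def traced_call(name, model_fn, prompt, cost_per_1k_tokens=0.002):
    """Wrap a model call, recording input/output, latency, and an estimated cost."""
    start = time.perf_counter()
    output = model_fn(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    # Crude token estimate (~4 characters per token) purely for illustration.
    tokens = (len(prompt) + len(output)) / 4
    return Observation(name=name, input=prompt, output=output,
                       latency_ms=latency_ms,
                       cost_usd=tokens / 1000 * cost_per_1k_tokens)

# Stub standing in for a real model call.
obs = traced_call("summarize", lambda p: p.upper(), "hello world")
```

In Langfuse itself, instrumented SDK calls produce these observations automatically and nest them under a trace, which is what the timeline view renders.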
Beyond tracing, Langfuse includes a prompt management system (version prompts, A/B test variants, track performance by prompt version), dataset management for running evals, and a scoring system where you can log human feedback, LLM-as-judge scores, or custom metrics. All evaluation results are linked to traces so you can see which inputs caused quality regressions.
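The value of linking scores to traces is that a bad score points back to the exact input that caused it. A minimal in-memory sketch of that idea (the trace IDs, score shape, and helper below are hypothetical, not Langfuse's actual data model):

```python
from collections import defaultdict

# Hypothetical traces keyed by ID, as a tracing backend might store them.
traces = {
    "t1": {"input": "What is 2+2?", "output": "4"},
    "t2": {"input": "Capital of France?", "output": "Lyon"},
}
scores = defaultdict(list)

def log_score(trace_id, name, value):
    """Attach a score (human feedback, LLM-as-judge, or custom metric) to a trace."""
    scores[trace_id].append({"name": name, "value": value})

log_score("t1", "correctness", 1.0)
log_score("t2", "correctness", 0.0)

# Because scores are keyed by trace, a regression resolves to the inputs behind it.
failing = [traces[tid]["input"] for tid, ss in scores.items()
           if any(s["name"] == "correctness" and s["value"] < 0.5 for s in ss)]
```

Here `failing` contains only the question the model got wrong, which is the debugging loop the linked evaluation results enable.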
Langfuse integrates with LangChain, LlamaIndex, OpenAI, Anthropic, and most other frameworks via auto-instrumentation SDKs for Python and JavaScript. The self-hosted version runs on Docker Compose and is free with no restrictions. Langfuse Cloud offers managed hosting with a generous free tier.
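Per the Langfuse repository's documented quickstart, self-hosting is a clone-and-compose setup fragment along these lines (ports and service details depend on the shipped compose file):

```shell
# Fetch the Langfuse repository and start the stack with Docker Compose.
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d
```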
What the community says
Langfuse is among the most-recommended LLM observability tools in developer communities building production AI systems, often cited alongside Helicone in 'what monitoring do you use' threads. AI engineers praise the trace visualization for debugging complex agent workflows and the prompt management system for keeping prompt iterations organized, though some teams prefer Helicone's hosted offering for simpler use cases where self-hosting overhead isn't worth it. The active development cadence and responsive maintainers on GitHub contribute to its strong community reputation.
Langfuse Pricing Plans

Self-hosted: Free
- Unlimited observations
- All features
- Docker Compose setup
- MIT licensed

Cloud Free: Free
- 50,000 observations/month
- Managed hosting
- Basic features

Pro: $59/mo
- 1M observations/month
- Advanced evals
- Team collaboration