Arize Phoenix is an open-source LLM tracing and evaluation tool that helps you evaluate, experiment with, and optimize AI products in real time.

Arize Phoenix is an open-source observability tool designed for tracing and evaluating Large Language Models (LLMs). It provides real-time insights into the performance of AI applications, enabling users to experiment with, debug, and optimize their AI products efficiently. Because it is built on OpenTelemetry, Phoenix is straightforward to set up and avoids vendor lock-in. Key capabilities include application tracing for visibility into each step of an LLM workflow, an interactive prompt playground for iteration, streamlined evaluations and annotations, and dataset clustering and visualization to uncover performance issues. Phoenix is both vendor-agnostic and framework-agnostic, so it works with a wide range of LLM tools and projects. It is particularly valuable for AI engineers and developers who need to monitor and improve the behavior of their LLM applications in production.
Typical use cases include tracing LLM requests, evaluating LLM performance, experimenting with different prompts and models, debugging LLM failures, visualizing LLM decision-making, and identifying problematic LLM responses.
Application tracing: automatically instruments LLM applications using OpenTelemetry to capture request-response traces, providing visibility into the entire LLM workflow.
Prompt playground: a sandbox for experimenting with different prompts and models, letting users compare outputs and debug failures in real time.
Evaluations and annotations: a library of evaluation templates that can be customized for each task, with support for incorporating human feedback into the evaluation process.
Clustering and visualization: uses embeddings to cluster semantically similar questions, document chunks, and responses, helping users isolate performance issues.
Open source: Phoenix is fully open-source and self-hostable, so users keep full control over their data and can modify the software to fit their needs.
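Clustering by embeddings works because semantically similar texts map to vectors that point in similar directions. As a rough, stdlib-only illustration of the idea (not Phoenix's actual clustering algorithm), the sketch below swaps in greedy "leader" clustering on cosine similarity: each vector joins the first cluster whose leader is at least `threshold` similar, otherwise it starts a new cluster. The toy 3-dimensional vectors and the 0.9 threshold are made up for the example; real embeddings come from an embedding model and have far more dimensions.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def leader_cluster(vectors: list[Sequence[float]], threshold: float = 0.9) -> list[int]:
    """Greedy leader clustering: assign each vector to the first cluster whose leader
    (first member) has cosine similarity >= threshold, else open a new cluster.
    Returns one cluster label per input vector."""
    leaders: list[Sequence[float]] = []
    labels: list[int] = []
    for v in vectors:
        for idx, leader in enumerate(leaders):
            if cosine(v, leader) >= threshold:
                labels.append(idx)
                break
        else:  # no existing cluster was similar enough
            leaders.append(v)
            labels.append(len(leaders) - 1)
    return labels


if __name__ == "__main__":
    # Hypothetical toy embeddings for four user questions.
    questions = ["reset password", "forgot my password", "refund policy", "how do refunds work"]
    embeddings = [
        [0.9, 0.1, 0.0],
        [0.85, 0.15, 0.05],
        [0.1, 0.9, 0.2],
        [0.05, 0.95, 0.1],
    ]
    for question, label in zip(questions, leader_cluster(embeddings)):
        print(label, question)
```

The two password questions land in one cluster and the two refund questions in another; if one cluster's responses were consistently rated poorly, that cluster would point to a specific weakness.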
1. Install Phoenix using pip: `pip install arize-phoenix`.
2. Import Phoenix in your Python code: `import phoenix as px`.
3. Start the Phoenix server: `px.launch_app()`.
4. Instrument your LLM application with OpenTelemetry.
5. Log LLM requests and responses as traces.
6. Use the Phoenix UI to visualize traces and evaluate performance.
7. Customize evaluation templates for specific tasks.
8. Integrate human feedback into the evaluation process.
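The last two steps, customizing an evaluation template and folding in human feedback, can be sketched without any library. Everything here is hypothetical and is not Phoenix's evals API: the template text, the `relevant`/`unrelated` labels, the `stub_judge` function (which stands in for a call to an LLM), and the agreement metric. The idea worth noticing is that the judge's free-text reply is constrained to a fixed set of allowed labels, and ambiguous replies are rejected instead of guessed.

```python
import re
from typing import Callable, Optional

# Hypothetical evaluation template; placeholders are filled per example.
RELEVANCE_TEMPLATE = (
    "You are judging retrieval quality.\n"
    "Question: {question}\n"
    "Reference text: {reference}\n"
    "Answer with exactly one word: relevant or unrelated."
)
ALLOWED_LABELS = ("relevant", "unrelated")


def parse_label(raw: str, allowed: tuple[str, ...] = ALLOWED_LABELS) -> Optional[str]:
    """Map a free-text reply onto exactly one allowed label, or None if ambiguous.
    Whole-word matching keeps "irrelevant" from counting as "relevant"."""
    words = set(re.findall(r"[a-z]+", raw.lower()))
    matches = [label for label in allowed if label in words]
    return matches[0] if len(matches) == 1 else None


def evaluate(rows: list[dict[str, str]], judge: Callable[[str], str]) -> list[Optional[str]]:
    """Fill the template for each row, send it to the judge, and parse the reply."""
    return [parse_label(judge(RELEVANCE_TEMPLATE.format(**row))) for row in rows]


def agreement(model_labels: list[Optional[str]], human_labels: list[Optional[str]]) -> float:
    """Fraction of human-annotated rows where the model label matches the human label."""
    pairs = [(m, h) for m, h in zip(model_labels, human_labels) if h is not None]
    return sum(m == h for m, h in pairs) / len(pairs) if pairs else 0.0


def stub_judge(prompt: str) -> str:
    # Stand-in for an LLM call: says "relevant" if the reference mentions "Phoenix".
    reference = prompt.split("Reference text: ")[1].split("\n")[0]
    return "Relevant." if "Phoenix" in reference else "unrelated"


if __name__ == "__main__":
    rows = [
        {"question": "What is Phoenix?", "reference": "Phoenix is an LLM observability tool."},
        {"question": "What is Phoenix?", "reference": "Captum is a PyTorch interpretability library."},
        {"question": "How do I install Phoenix?", "reference": "Run pip install arize-phoenix."},
    ]
    model_labels = evaluate(rows, stub_judge)
    human_labels = ["relevant", "unrelated", "relevant"]  # hypothetical annotations
    print(model_labels, f"agreement={agreement(model_labels, human_labels):.2f}")
```

Here the stub misjudges the third row because its check is case-sensitive, and the human annotation exposes that disagreement; tracking agreement against human labels is one way to decide whether a template needs refining.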
Verified feedback from other users.
"Users praise Arize Phoenix for its visual clustering and model interpretability features, highlighting its usefulness in debugging and troubleshooting LLM applications. It is seen as a significant advancement in model observability and production."
Similar tools:
Captum is an open-source, extensible PyTorch library for model interpretability, supporting multi-modal models and facilitating research in interpretability algorithms.
LibreChat is an open-source AI platform that unifies all your AI conversations in a customizable interface.
Grepper is an AI search infrastructure delivering real-time, accurate results for RAG and agentic AI applications.
APEER is a low-code platform for computer vision, allowing users to build and deploy AI-powered applications without extensive coding.
OpenVoiceOS is a community-driven, open-source voice AI platform for creating custom voice-controlled interfaces across devices.
Neptune.ai is a comprehensive experiment tracker designed for foundation models, enabling users to monitor, debug, and visualize metrics at scale.