Easily deploy AI models to production on a fully managed platform.

Hugging Face Inference Endpoints is a fully managed platform that simplifies AI model deployment. It removes the burden of infrastructure configuration, letting developers focus on building AI applications. The platform supports one-click deployment of models from the Hugging Face Hub, offers a catalog of ready-to-deploy models, and downloads model weights from the Hub quickly and securely. It features autoscaling to handle varying traffic loads, comprehensive logging and metrics for observability, and integration with inference engines such as vLLM, TGI, SGLang, and TEI. Pricing is available as self-serve, pay-as-you-go plans or as enterprise contracts with uptime guarantees and dedicated support.
Automatically scales compute resources up or down based on real-time traffic demands, optimizing resource utilization.
Supports multiple inference engines including vLLM, TGI, SGLang, and TEI for optimized performance.
Seamless integration with the Hugging Face Hub allows for easy access to thousands of pre-trained models.
Comprehensive logs and metrics provide insights into model performance and help debug issues.
Keeps the AI stack current with the latest frameworks and optimizations without complex upgrades.
Offers instances with CPUs, TPUs, and various NVIDIA GPUs (T4, L4, L40S, A10G, A100, H100, H200, B200) to cater to diverse model requirements and budgets.
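To make the autoscaling and instance options above concrete, here is a minimal sketch of the settings involved in creating an endpoint. The helper function and example values (model ID, instance type) are illustrative; the key names mirror the parameters of `huggingface_hub.create_inference_endpoint`, but exact values depend on your account, region, and quota.

```python
def endpoint_config(model_id: str, accelerator: str = "gpu",
                    instance_type: str = "nvidia-a10g",
                    instance_size: str = "x1",
                    min_replicas: int = 0, max_replicas: int = 2) -> dict:
    """Assemble the settings you would pass when creating an endpoint.

    min_replicas=0 enables scale-to-zero: the endpoint releases its
    compute when idle and scales back up when traffic returns.
    """
    return {
        "repository": model_id,      # model repo on the Hugging Face Hub
        "accelerator": accelerator,  # "cpu" or "gpu"
        "instance_type": instance_type,
        "instance_size": instance_size,
        "min_replica": min_replicas,
        "max_replica": max_replicas,
    }

config = endpoint_config("meta-llama/Llama-3.1-8B-Instruct")
# With huggingface_hub installed, creating the endpoint would look roughly like:
# from huggingface_hub import create_inference_endpoint
# endpoint = create_inference_endpoint("my-endpoint", **config)
```

Setting a minimum replica count of zero is how idle cost is avoided; raising it keeps warm capacity ready for latency-sensitive traffic.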
Import a model from the Hugging Face Hub or browse the catalog.
Optionally define a custom handler that controls how requests are pre- and post-processed around inference.
Choose an inference engine such as vLLM, TGI, or TEI.
Configure autoscaling based on expected traffic.
Deploy the endpoint with one click.
Monitor logs and metrics to understand model performance.
Integrate the endpoint into your application via API.
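The final integration step above amounts to sending an authenticated POST request to the endpoint's URL. A minimal sketch follows; the URL and token are placeholders for your deployed endpoint and a Hugging Face access token, and the payload shape assumes a text-generation model.

```python
import json

# Placeholders: substitute your endpoint URL and access token.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "<your-hf-token>"

# Bearer-token auth plus a JSON body, as for any HTTPS API.
headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}
payload = {
    "inputs": "What is MLOps?",
    "parameters": {"max_new_tokens": 64},
}
body = json.dumps(payload)

# With the `requests` library installed, the call itself would be:
# import requests
# response = requests.post(ENDPOINT_URL, headers=headers, data=body)
# print(response.json())
```

Because the endpoint speaks plain HTTPS, the same request works from any language or tool that can send JSON, not only Python.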