By NVIDIA · Released October 2024 · Knowledge cutoff June 2024
Llama 3.1 Nemotron Nano 8B is a small, efficient language model optimized for low-latency inference on NVIDIA GPUs. It is part of NVIDIA's Nemotron family, designed for edge and real-time applications where speed and resource efficiency are critical.
Input cost
Free (open weights)
Output cost
Free (open weights)
Context window
128K tokens
Max output
4096 tokens
Modalities
Text input, text output
Parameters
8B
License
NVIDIA Open Model License
Best suited for real-time, low-latency applications on edge devices or in resource-constrained environments.
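The context window, output cap, and parameter count listed above are enough for a quick sizing check before deploying the model. The sketch below is illustrative only. It assumes "128K tokens" means 128 × 1024 = 131,072 tokens shared between prompt and output, and it treats "8B" as a nominal 8 × 10^9 parameters. The function names and the INT8/INT4 byte widths are hypothetical examples; this card does not say which quantized formats, if any, are published.

```python
# Back-of-envelope sizing from the spec values on this card (a sketch, not an official tool).
# Assumption: "128K tokens" = 128 * 1024 tokens, shared by prompt and generated output.
CONTEXT_WINDOW = 128 * 1024   # total tokens per request (prompt + output)
MAX_OUTPUT = 4096             # cap on generated tokens per request
PARAMS = 8e9                  # nominal parameter count ("8B"); the exact count may differ slightly


def max_new_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Hypothetical helper: longest generation that respects both the output cap and the context window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"a {prompt_tokens}-token prompt leaves no room in the {CONTEXT_WINDOW}-token context window"
        )
    return min(requested, MAX_OUTPUT, remaining)


def weight_memory_gb(bytes_per_param: float) -> float:
    """Hypothetical helper: approximate memory for the weights alone (excludes KV cache and activations)."""
    return PARAMS * bytes_per_param / 1e9


if __name__ == "__main__":
    print(max_new_tokens(1_000))    # 4096: the output cap is the binding limit
    print(max_new_tokens(130_000))  # 1072: the context window is the binding limit
    # Illustrative byte widths only; INT8/INT4 availability is an assumption.
    for name, width in [("BF16", 2), ("INT8", 1), ("INT4", 0.5)]:
        print(f"{name}: ~{weight_memory_gb(width):.0f} GB of weights")
```

The weight estimate is a lower bound. Long prompts near the 128K limit also need substantial KV-cache memory, which matters most on the edge devices this model targets.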