Overview
BiSeNet (Bilateral Segmentation Network) is a real-time semantic segmentation network that targets the trade-off between accuracy and inference speed in applications such as autonomous driving and video surveillance. The architecture has two branches: a Spatial Path that preserves spatial detail, and a Context Path that downsamples quickly to obtain a sufficiently large receptive field. The outputs of the two paths are fused to produce high-resolution segmentation maps.

This implementation supports both BiSeNetV1 and BiSeNetV2, with pretrained weights available for the Cityscapes, COCOStuff, and ADE20k datasets. It includes tools for training and evaluation, and for deployment with TensorRT, ncnn, OpenVINO, and Triton Inference Server. Performance is benchmarked on several datasets, with competitive mIoU (mean intersection over union) and FPS (frames per second).
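
To make the two-branch layout concrete, here is a minimal sketch of the feature-map resolutions involved. It only does shape arithmetic and is not the code from this repository. The strides are assumptions taken from the usual description of BiSeNetV1: the Spatial Path applies three stride-2 convolutions (1/8 resolution), and the Context Path produces features at 1/16 and 1/32 resolution that are upsampled to 1/8 before fusion. The function names `conv_out`, `downsample`, and `bisenet_shapes` are hypothetical.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample(hw, times):
    """Apply `times` stride-2 convolutions to a (height, width) pair."""
    h, w = hw
    for _ in range(times):
        h, w = conv_out(h), conv_out(w)
    return h, w


def bisenet_shapes(input_hw):
    """Return the assumed spatial size at each stage of a BiSeNetV1-style network."""
    spatial = downsample(input_hw, 3)      # Spatial Path: 1/8 resolution (assumed)
    context_16 = downsample(input_hw, 4)   # Context Path: 1/16 resolution (assumed)
    context_32 = downsample(input_hw, 5)   # Context Path: 1/32 resolution (assumed)
    # Context features are upsampled to the Spatial Path size before fusion,
    # so the fused map keeps the 1/8 resolution.
    fused = spatial
    # The fused map is upsampled back to the input size for the final prediction.
    return {
        "spatial_path": spatial,
        "context_path_16": context_16,
        "context_path_32": context_32,
        "fused": fused,
        "output": tuple(input_hw),
    }


if __name__ == "__main__":
    # 1024 x 2048 is the full Cityscapes image size.
    for stage, size in bisenet_shapes((1024, 2048)).items():
        print(f"{stage}: {size}")
```

The sketch shows why the design is cheap: the Context Path does most of its work on maps 4 to 16 times smaller in area than the Spatial Path output, while the Spatial Path keeps enough resolution for fine boundaries.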
