This thesis addresses several challenges in vision-based surveillance and monitoring: low-level preprocessing, multi-camera object detection, multi-object tracking, and gait-based person identification. More precisely, it studies methods for reliably detecting people from multiple heterogeneous video sources and tracking them over time in potentially crowded scenarios, and presents novel improvements to these methods. All presented methods are evaluated on publicly available benchmark databases, where they yield significant performance improvements. Furthermore, the biometric modality of gait is used to identify people at a distance. New methods for gait signature extraction and identification are presented and show substantial performance gains over the current state of the art.