In this paper, we present a comparative study of tracking-by-detection approaches applied to passenger counting in city buses. A detector locates passengers in each frame, and a tracker then links these detections over time to produce trajectories. We compare three deep learning detectors that remain under-explored in our context and couple them with a real-time tracker for a global evaluation on our large-scale in situ dataset. The results are encouraging in terms of detection performance, tracking rate, and the processing speed needed for our planned embedded implementation.