Computer Vision Engineer (Detection, Tracking & 2D Metric Calibration Specialist)

Remote, USA
Posted Jun 14, 2026
Full-time

Project Context

CrackCoach is an AI platform for automatic analysis of show-jumping videos.

This role builds the image-level perception and geometry stack that every downstream stage depends on: detection, tracking, obstacle understanding, jump segmentation, and metric calibration in real-world competition footage.

Without a rock-solid perception and geometric foundation, pose estimation, biomechanics, and AI coaching cannot be reliable.

Core Mission and Responsibilities

You will design, implement, and validate a production-grade computer vision pipeline capable of ingesting raw competition videos and producing robust, structured, and metric-aware outputs.

Your responsibilities include:

• Video ingestion and preprocessing: handle codecs, resolutions, FPS, orientation, stabilization, and cropping policies.

• Horse-and-rider detection using state-of-the-art detectors (YOLO / RT-DETR / Detectron2 or equivalent).

• Persistent tracking across frames (ByteTrack, BoT-SORT, DeepSORT, Kalman-based trackers).

• Obstacle detection and scene understanding for show-jumping arenas (rails, poles, standards).

• Obstacle-to-jump association logic: correctly identify which obstacle is being jumped and when.

• Automatic segmentation of a full round video into individual jump clips (per-obstacle segments).

• 2D trajectory reconstruction of the horse in image space with stable, low-jitter trajectories.
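
As an illustration of the obstacle association and jump segmentation items above, here is a minimal sketch. It assumes the tracker already yields one horse center per frame and that each obstacle is reduced to a single image-space anchor point. The names `segment_jumps` and `JumpSegment`, and the radius, padding, and gap defaults, are hypothetical choices for this example, not an existing CrackCoach interface. A production version would likely use obstacle extents and trajectory shape rather than a plain distance threshold.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class JumpSegment:
    obstacle_id: str
    start_frame: int
    end_frame: int


def segment_jumps(track, obstacles, radius_px=80.0, pad_frames=5, max_gap=3):
    """Split a round into per-obstacle frame ranges.

    track:     list of (frame, x, y) horse centers in image space, sorted by frame
    obstacles: dict mapping obstacle id -> (x, y) anchor point in image space
    """
    if not track or not obstacles:
        return []

    runs = []       # closed runs as [obstacle_id, first_frame, last_frame]
    current = None  # run currently being extended

    for frame, x, y in track:
        # Associate the horse with its nearest obstacle in this frame.
        oid, dist = min(
            ((k, hypot(x - ox, y - oy)) for k, (ox, oy) in obstacles.items()),
            key=lambda item: item[1],
        )
        if dist > radius_px:
            continue  # not close to any obstacle; tolerated as a gap for now
        same_run = (
            current is not None
            and current[0] == oid
            and frame - current[2] <= max_gap + 1  # bridge short detection misses
        )
        if same_run:
            current[2] = frame
        else:
            if current is not None:
                runs.append(current)
            current = [oid, frame, frame]

    if current is not None:
        runs.append(current)

    # Pad each run so the clip includes some approach and landing frames.
    first, last = track[0][0], track[-1][0]
    return [
        JumpSegment(oid, max(first, start - pad_frames), min(last, end + pad_frames))
        for oid, start, end in runs
    ]
```

Each returned segment can then be used to cut a per-obstacle clip from the full round video.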

2D Metric Calibration (Image → Ground Plane)

In addition to perception, this role includes implementing a robust 2D metric calibration module:

• Estimate a ground-plane homography (image → ground) using stable scene references such as obstacle bases or other ground contact points.

• Compute a pixel-to-meter scale, ideally leveraging known or user-declared obstacle heights (e.g., “course at 1.35 m”) when available.

• Project horse trajectories from image space to ground-plane coordinates in meters.

• Enable metric estimates such as:

    • approach speed (m/s)

    • distances between obstacles (m)

    • take-off and landing distances at ground level (m)

    • approximate stride length at ground level (when combined later with biomechanics)

• Provide a calibration confidence indicator and gracefully fall back to relative (pixel-based) measures when calibration is unreliable.

The calibration module must be robust, non-blocking, and designed for real-world competition footage (single camera, uncontrolled viewpoints).
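
The calibration steps described in this section can be sketched as follows. This is a minimal illustration, assuming image-to-ground point correspondences are already available (for example, obstacle base points with known ground positions in meters). It fits the homography with a plain Direct Linear Transform in NumPy, with no point normalization or outlier rejection. In practice a RANSAC-based estimator such as OpenCV's `cv2.findHomography` would be the more robust choice. The function names, the RMS-based reliability check, and the 0.25 m threshold are hypothetical.

```python
import numpy as np


def fit_homography(img_pts, ground_pts):
    """Fit H (image pixels -> ground-plane meters) with the Direct Linear Transform."""
    img_pts = np.asarray(img_pts, dtype=float)
    ground_pts = np.asarray(ground_pts, dtype=float)
    if len(img_pts) < 4 or len(img_pts) != len(ground_pts):
        raise ValueError("need at least 4 matching point pairs")
    rows = []
    for (x, y), (gx, gy) in zip(img_pts, ground_pts):
        rows.append([-x, -y, -1, 0, 0, 0, gx * x, gx * y, gx])
        rows.append([0, 0, 0, -x, -y, -1, gy * x, gy * y, gy])
    # The homography is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def project(H, pts):
    """Map image points of shape (N, 2) to ground-plane coordinates in meters."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


def reprojection_rms(H, img_pts, ground_pts):
    """RMS residual in meters on the reference points.

    Only informative with more than 4 points: exactly 4 always fit with zero error.
    """
    err = project(H, img_pts) - np.asarray(ground_pts, dtype=float)
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))


def approach_speed(H, track_px, fps, rms_m, max_rms_m=0.25):
    """Mean speed along a track: metric if calibration looks reliable, else pixel-based."""
    if rms_m <= max_rms_m:
        pts, unit = project(H, track_px), "m/s"
    else:
        pts, unit = np.asarray(track_px, dtype=float), "px/s"  # graceful fallback
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    # Path length divided by elapsed time, where elapsed = (N - 1) / fps.
    return float(steps.sum() * fps / (len(pts) - 1)), unit
```

Returning the unit alongside the value keeps the fallback explicit, so downstream stages can tell a metric estimate from a relative one.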

Required Technical Skills

• Strong background in computer vision applied to video (sports footage experience is a strong plus).

• Proven experience with object detection (YOLO family, Detectron2, RT-DETR, etc.).

• Multi-object tracking expertise (ByteTrack / BoT-SORT / DeepSORT; handling occlusions and ID switches).

• Experience with segmentation models (Mask R-CNN, YOLO-Seg, SAM-family) if needed for background removal.

• Solid understanding of image-space geometry and camera perspective limitations.

• Experience implementing 2D metric calibration using planar homography and RANSAC.

• Comfortable working with pixel-to-meter conversions and expressing metric uncertainty.

• Advanced Python and OpenCV skills, plus experience with a deep learning framework (PyTorch preferred).

• Experience building modular, maintainable pipelines with clear interfaces and exports.

Key Technical Challenges

• Highly variable camera angles, zoom levels, and lighting conditions.

• Dynamic occlusions from obstacles, rails, other horses, and spectators.

• Motion blur and compression artifacts in user-generated videos.

• Background clutter and false positives (banners, rails, similar shapes).

• Maintaining stable trajectories despite noisy detections and temporary misses.

• Correct obstacle differentiation and obstacle association in multi-obstacle scenes.

• Metric calibration with a single camera, limited scene control, and partial reference data.

• Performance constraints: processing HD videos in minutes, not hours.

Expected Deliverables

• A fully modular computer vision pipeline (source code) that ingests raw video and outputs:

    • detections

    • tracks

    • obstacle detections

    • jump segments

    • 2D trajectories

    • ground-plane metric projections (when calibration is reliable)

• A 2D calibration module producing pixel-to-meter scale, ground-plane mapping, and confidence scores.

• Trained detection/segmentation models (weights + training scripts) when custom training is required.

• Clean data exports (JSON / CSV) and stable ROI frame exports for pose estimation and biomechanics.

• Visual validation outputs (overlays showing boxes, tracks, obstacles, jump boundaries, and metric projections).

• Clear technical documentation defining interfaces and data formats for downstream pose estimation, biomechanics, and AI coaching stages.

Important Notes

• This role does NOT include pose estimation or biomechanics (handled by separate specialists).

• Metric calibration is 2D ground-plane based, not full 3D reconstruction.

• Robustness and graceful degradation are more important than theoretical precision.
