SysOps Engineer – Monitoring & Cloud Operations

Remote, USA
Posted Jun 13, 2026
Full-time

About Tap
Tap Payments is revolutionizing online payments across the MENA region by connecting businesses with simple, unified payment experiences. We need exceptional talent to help us on this journey.

The Technology Team
Our technology team builds the platforms, systems, and payment infrastructure our merchants use to process millions of transactions daily.
The team builds technology that simplifies payments for businesses of all sizes, across MENA and globally.

As a Tapster you will:
Monitor infrastructure using tools like New Relic, Prometheus, and Grafana

Configure and maintain alerts, dashboards, and service health checks

Perform incident management, troubleshooting, and root cause analysis (RCA)

Ensure uptime and SLA compliance for all systems

Monitor CPU, memory, disk, and system processes

Manage OS-level operations (Linux/Windows) including patching and tuning

Manage system backups and perform regular restoration validation

Execute and validate disaster recovery (DR) plans across environments

Perform failover and failback testing for critical services (on-prem / cloud / multi-region)

Coordinate DR drills and simulate outage scenarios

Ensure replication health and data consistency (in coordination with DataOps)

Maintain and update DR runbooks and incident playbooks

Perform capacity planning and performance optimization

Maintain logs, metrics, and operational documentation

What you will bring to the party:
Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience.

Proven experience in Systems Operations, Cloud Operations, Infrastructure Support, Site Reliability Engineering (SRE), or a related role.

Strong hands-on experience administering Linux and Windows operating systems.

Experience with enterprise monitoring and observability platforms such as New Relic, Prometheus, Grafana, Datadog, or similar tools.

Solid understanding of incident management, problem management, and root cause analysis methodologies.

Experience supporting cloud platforms such as AWS, Azure, or Google Cloud Platform.

Strong knowledge of backup, disaster recovery, business continuity, and failover processes.

Experience managing compute infrastructure, including virtual machines, cloud instances, and physical servers.

Familiarity with web servers and service managers such as Nginx, IIS, and systemd.

Understanding of capacity planning, performance tuning, and infrastructure optimization practices.

Strong troubleshooting and analytical skills with the ability to resolve complex operational issues.

Excellent communication, documentation, and cross-functional collaboration skills.

Experience working in high-availability, mission-critical production environments is highly preferred.

Are you ready to shape the future of payments in MENA?
