T3 Operations & Support Specialist — Compute & OS (PID9066)

Remote, USA

Posted Jun 14, 2026

Full-time

This is a remote position.

Contract / Freelance

Remote with travel readiness required (Germany)

Start: ASAP

About the role

We are working with a long-standing anchor client to source a T3 Operations & Support Specialist (Compute & OS) for a large-scale cloud-native platform programme supporting a major energy transmission operator in Germany. The platform is a service-oriented hybrid cloud environment providing application teams with self-service capabilities to develop, run and operate software products across private and public cloud infrastructure.

In this role you will provide Tier-3 operational ownership for Compute & Operating System services within Local Production (DE), handling complex incidents, deep troubleshooting and root cause analysis, and driving permanent fixes and preventive measures.

What you'll be doing

Providing T3 operational ownership for Compute & OS services: handling complex incidents, troubleshooting and RCA, and driving permanent fixes and preventive measures

Ensuring compute/OS readiness for releases and changes: monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks

Executing and improving standard operational procedures through automation to reduce toil and improve MTTR and stability

Coordinating with Kubernetes, Data, Network and Storage SMEs to resolve cross-domain production issues

Validating deployment artefacts from an operations perspective and enforcing quality assurance measures

Monitoring system health, performance metrics and service availability across multi-tenant environments

Identifying, analysing and resolving incidents to minimise service disruption, and triggering RCA and corrective actions

Implementing monitoring and logging strategies to support audit and compliance requirements

Performing routine security scans and remediating identified vulnerabilities

Requirements

What you'll need

5 to 10+ years of experience in IT operations, service delivery or platform operations, with demonstrated leadership in mission-critical environments

Proven experience implementing and leading Incident, Problem, Change and Release governance in production

Hands-on experience with VMware vSphere 8 virtualisation

Operating Systems: Red Hat Enterprise Linux and Ubuntu

OS tooling: Satellite, IPA, Certificate Server

ITSM/collaboration tooling: Jira Service Management, Jira, Confluence

Fundamental understanding of core operations processes (Incident, Change, Problem management, ITSM) and SRE concepts

Experience gathering operational insights from monitoring/observability including SLI/SLA/SLO management and tracking

Hands-on experience documenting procedures and enforcing clear runbooks and playbooks

Hands-on experience with monitoring and logging tools (e.g. Prometheus, Grafana, Datadog, Mimir, Loki)

Understanding of modern platform operations (Kubernetes/containers, automation, observability) sufficient to govern specialists

Fluent English and German (C1 minimum in both)

Desirable

Experience operating in regulated or high-availability industries (banking, telco, public sector, healthcare)

Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management

Familiarity with enterprise DevOps toolchains (GitLab, JFrog Artifactory, Backstage, Harness)

GitOps and IaC awareness (Terraform/OpenTofu, ArgoCD, Helm)

Benefits

As a freelancer / contractor with us, you will enjoy flexible working hours and the freedom to choose your own projects. Our platform gives you access to exciting projects in various industries and supports you in advancing your career. You'll benefit from competitive pay and a dedicated team to help you with any questions you may have. Work independently and utilise our strong network to achieve your professional goals.