Senior Data Engineer

Remote, USA
Posted Jun 14, 2026
Full-time

Position Overview

Every release we ship serves families living with dementia. As our Senior Data Engineer, you will be the person who turns the data flowing in from our partners into something a clinician, an analyst, and an AI model can each trust. You will design the ingestion, validation, and curation layers that sit between our partners and our products: a durable landing zone for raw data, pipelines that load data into our transactional Postgres database, and a clean analytics layer that lets data science, BI, and AI features move fast without ever putting our production system at risk.

We value simplicity over sprawl, so we take a measured approach to introducing complexity and new tooling. If you love Postgres, write SQL like prose, and have strong yet gently held opinions about when a data lake earns its complexity, this role is built for you.

About Us & What We Do

Ceresti is on a mission to reduce avoidable hospitalizations and improve care for people living with dementia while improving the lives of their families and caregivers. We envision a world in which family caregivers of people living with dementia are supported and have the knowledge, skills, and confidence to provide the best possible care for their loved ones. 

Ceresti is a tech-enabled dementia care provider with a differentiated model of care that improves outcomes and delivers guaranteed cost savings by including family caregivers in the care team. We offer health plans and accountable care organizations a turnkey solution for impacting a population that has limited engagement in traditional clinical programs. 

Our culture is rooted in agility, innovation, and collaboration. We believe that every idea, no matter how small, can spark a meaningful improvement. We work in cross-functional Agile teams that move fast, ship often, and learn together. Together, we create solutions that make a lasting impact on the healthcare ecosystem, enabling more compassionate and cost-effective care for those who need it most. 

Responsibilities

Design and own Ceresti’s end-to-end data architecture: a landing zone with secure cloud object storage for raw partner files and API payloads, validated ingestion pipelines into our transactional Postgres, and a curated analytics layer that decouples reporting and AI workloads from production

Build ingestion pipelines for the data we receive today, including partner data files (CSV/JSON/XML/HL7/X12 as applicable) and REST API and SFTP integrations, with schema validation, quarantine of bad records, and full lineage from raw bytes to curated row

Stand up and operate the curated layer (data warehouse / lakehouse-lite) so analytics and ML models can consume data without slowing down the transactional system

Choose, integrate, and operate the smallest set of tools needed, including object storage, an orchestrator (Dagster, Prefect, Airflow, etc.), dbt or similar for transformations, and a single validation library (Great Expectations / Pandera / Soda)

Design and enforce data governance for a HIPAA-regulated environment: PHI/PII classification, encryption in transit and at rest, role-based access, audit logging, retention and minimum-necessary policies, and de-identification where appropriate

Partner with backend, ML, product, and clinical stakeholders to define data contracts with our health plan and ACO partners and hold the line on data quality

Build and maintain reliable feature data for ML models, including embeddings (e.g., pgvector) and curated feature tables for risk stratification, engagement, and outcomes work

Instrument the data platform for observability (pipeline SLAs, data freshness, schema drift, and quality metrics), and act on what the data tells you

Participate fully in our Agile process: backlog grooming, sprint planning, demos, and retrospectives

Mentor engineers across the team on SQL, schema design, and the craft of building data systems that are boring in the best possible way

Required Qualifications

Education

BS/BA degree or higher in Computer Science, Engineering, or a related technical field

Experience

8+ years of professional data engineering experience, with a track record of shipping production data systems end-to-end

Mastery of PostgreSQL: schema design, indexing, query tuning, partitioning, logical replication, JSONB, extensions (pg_partman, pg_cron, pgvector, etc.), and operating Postgres at scale

Strong experience designing and operating data pipelines, including file-based ingestion (SFTP / object storage drops) and API-based ingestion (REST, webhooks)

Hands-on experience with one or more cloud platforms (AWS preferred) and their data primitives: object storage (S3), managed Postgres

Experience designing data warehouses and/or data lakes and the judgment to know which one a given problem actually needs

Strong experience with dbt (or an equivalent SQL-based transformation framework) and modern data modeling patterns (Kimball dimensional, Data Vault, One Big Table), plus an opinion about when each is right

Experience with at least one orchestration framework (Dagster, Prefect, or Airflow) and a clear point of view on which to use when

Strong Python skills for ingestion, validation, and tooling

Experience with data validation and data-quality frameworks (Great Expectations, Pandera, Soda, or equivalent)

Experience with change data capture from Postgres (logical replication or equivalent)

Data governance experience in a HIPAA-regulated environment or, at minimum, demonstrated instincts for protecting PHI and PII (encryption, least privilege, audit, de-identification, BAA-aware vendor selection); HITRUST or SOC 2 experience is a strong plus

Comfortable with infrastructure-as-code and CI/CD for data systems

Experience supporting ML workloads: building feature tables, managing training data, serving features at inference time; familiarity with embeddings, vector search (pgvector or equivalent), and LLM integration patterns (RAG, prompt-grounded analytics) is a plus

Experience using AI coding assistants (e.g., GitHub Copilot, Cursor, Claude) to accelerate development

Excellent written and verbal communication skills: you can explain a tricky schema decision to a business stakeholder and a data contract to a partner with equal clarity

Demonstrated experience working in Agile/Scrum teams

Skills

Reliable, persistent and results-oriented

Easy to get along with; able to work with a team

Must demonstrate a high level of integrity and ownership

Consistently transparent, courageous and enthusiastic

Bias toward simplicity: you can recite the trade-offs of the heavyweight modern data stack and still default to the smallest thing that works

Must be able to pass a background check

Job Type

Full-time

Location

This position is entirely remote. US-based candidates only.

What We Offer

Competitive salary and benefits package

Opportunities for professional growth and development

Collaborative and dynamic work environment

Flexible work arrangements and remote work options

Access to cutting-edge technologies and tools

The chance to do work that directly improves the lives of patients with dementia and the families who love them

Join us to build the data foundation that empowers family caregivers and improves patient care outcomes, all while advancing your career in a dynamic, growth-oriented environment.
