We're looking for a Senior Data Engineer (AWS-native | Spark | Tokenization & Claims Data) to join Source Meridian.
About Source Meridian
Source Meridian is a software development company focused on solving the healthcare industry's most challenging problems. We are laser-focused on specific technologies in the healthcare and life sciences industries: healthcare technology, artificial intelligence, and healthcare interoperability.
About the Role
We're looking for a Senior Data Engineer to help build and operate an AWS-native data platform processing healthcare claims data and tokenized identifiers. You'll design and implement Spark-based pipelines that transform, intersect, and enrich tokenized datasets stored primarily as Parquet on S3, queried via Athena and related AWS services. This environment intentionally avoids managed lakehouse platforms (e.g., no Databricks and no Snowflake)—you'll be doing "real" data engineering directly on AWS.
What You'll Do
Build and maintain Spark pipelines to process large-scale Parquet datasets on S3.
Implement tokenization workflows, including transit token → real token conversion and dataset intersection/join logic (see the illustrative sketch after this list).
Process and deliver healthcare claims datasets for matched individuals, ensuring accurate identity mapping and data integrity.
Orchestrate data pipelines using Airflow and/or AWS-native orchestration tools when appropriate.
Develop reliable, testable, and observable ETL/ELT processes (retries, idempotency, monitoring, reprocessing).
Optimize performance and cost across Spark jobs, S3 partitioning/layout, and Athena query patterns.
Contribute to dbt models when applicable (transformations, documentation, data quality checks).
Collaborate with cross-functional stakeholders in a healthcare environment, with a strong focus on privacy and secure data handling.
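For candidates curious about the day-to-day, here is a minimal PySpark sketch of the kind of token conversion and dataset intersection work described above. It is illustrative only: the S3 paths, column names (transit_token, real_token, service_year, service_month), and dataset layouts are hypothetical placeholders, not our production schema.

```python
# Illustrative sketch only. All paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("token-intersection-sketch").getOrCreate()

# Tokenized claims carrying transit tokens, plus a crosswalk that maps
# transit tokens to real tokens, both stored as Parquet on S3.
claims = spark.read.parquet("s3://example-bucket/claims/")              # hypothetical path
crosswalk = spark.read.parquet("s3://example-bucket/token-crosswalk/")  # hypothetical path
cohort = spark.read.parquet("s3://example-bucket/matched-cohort/")      # hypothetical path

# Convert transit tokens to real tokens via an inner join, then drop the
# transit token so only the resolved identifier travels downstream.
resolved = (
    claims.join(crosswalk, on="transit_token", how="inner")
          .drop("transit_token")
)

# Intersect with the cohort of matched individuals to keep only their claims.
matched_claims = resolved.join(cohort.select("real_token"), on="real_token", how="inner")

# Write back partitioned by service period so Athena can prune partitions.
(matched_claims
    .write.mode("overwrite")
    .partitionBy("service_year", "service_month")
    .parquet("s3://example-bucket/matched-claims/"))
```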
Required Qualifications
5+ years of professional experience in Data Engineering.
Strong experience with Apache Spark (PySpark or Scala), including joins, intersections, partitioning, and performance tuning.
Strong hands-on experience with the AWS data stack, including:
Amazon S3 (Parquet datasets, partition strategies, data layout best practices)
Amazon Athena (SQL, query optimization, managing large datasets)
Familiarity with AWS-native data lake patterns (Glue Data Catalog; Lake Formation concepts are a plus)
Experience building and operating pipelines using Airflow (DAGs, scheduling, dependencies, backfills).
Excellent SQL skills and solid data modeling fundamentals.
Advanced English level: able to lead technical discussions, write clear documentation, and work directly with US-based stakeholders.
Nice to Have
Experience with dbt (dbt Core, tests, documentation, exposures).
Familiarity with healthcare data (claims data, eligibility, member-level datasets).
Experience with tokenization, identity resolution, or privacy-preserving data workflows.
Knowledge of AWS security concepts such as IAM, KMS, encryption, and secure data handling.
Experience running Spark on AWS (e.g., EMR) or Spark-on-containers architectures.
Tech Stack
AWS-native architecture
Amazon S3 + Parquet (core storage layer)
Amazon Athena (query engine)
Apache Spark (no Databricks)
Airflow (orchestration)
dbt (optional, as applicable)
Soft Skills
Strong and empathetic leadership.
Proven client-facing experience.
Excellent communication skills.
Strong expectation management abilities.
Strategic mindset with a solution-oriented approach and strong decision-making skills.
What We Offer
Permanent contract
Learning and continuous growth environment
Benefits package focused on health and well-being
Competitive salary based on experience
Apply only if you reside in Colombia or Ecuador
At Source Meridian, you'll be part of a high-impact health-tech company, building products that truly make a difference.
If you meet the profile, or know someone who might be interested, apply now. We'd love to meet you!