Spring 2026 – Data Engineering Internship

November 4, 2025

Job Description

About Us

We’re continuing to build a healthcare accreditation platform that is revolutionizing how our clients and new hospitals manage compliance, quality improvement, and regulatory processes. Our platform combines cutting-edge technology with deep healthcare domain expertise to solve real problems for healthcare organizations nationwide.

The Opportunity

Our goal is for interns to become full-time employees; therefore, you will be given full-time responsibilities from day one. In addition, you will be working in a high-velocity growth startup and will be required to move fast. You’ll work directly with our engineering team on a production healthcare platform, gaining hands-on experience with enterprise-grade systems while making real contributions that impact our product and customers.

Compensation Structure: The base position is unpaid; however, qualified candidates may receive upfront equity compensation based on their experience level and demonstrated capabilities. We evaluate each applicant individually and offer equity packages commensurate with their potential contribution.

What You’ll Build

  • Entity Resolution Systems: Healthcare facility lookup and matching components (see the sketch after this list)
  • Data Processing Pipelines: ETL workflows for compliance tracking
  • Analytics: Data refresh triggers and dashboard feed systems
  • ML Pipeline Integration: Data infrastructure supporting our ML team’s models
  • Quality Monitoring: Data validation and monitoring systems for healthcare data
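
To give a flavor of the entity resolution work, here is a minimal, illustrative sketch of fuzzy facility-name matching using only Python’s standard library. The normalization rules and similarity threshold are assumptions for the example; a production matcher would use richer signals (identifiers, addresses, blocking) and a proper matching library.

```python
# Toy facility matcher (illustrative only). Uses difflib from the standard
# library; the suffix list and similarity threshold are assumptions.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase and strip common suffixes so near-duplicates compare cleanly."""
    name = name.lower().strip()
    for suffix in (" hospital", " medical center", " health system", ", inc."):
        name = name.removesuffix(suffix)  # Python 3.9+
    return name

def best_match(query: str, candidates: list[str], threshold: float = 0.85):
    """Return the most similar known facility name, or None if nothing is close."""
    scored = [
        (SequenceMatcher(None, normalize(query), normalize(c)).ratio(), c)
        for c in candidates
    ]
    score, match = max(scored)
    return (match, score) if score >= threshold else (None, score)

if __name__ == "__main__":
    known = ["St. Mary's Medical Center", "Riverside Community Hospital"]
    # -> ("St. Mary's Medical Center", ~0.95)
    print(best_match("St Mary's Medical Center", known))
```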

Key Responsibilities

Data Pipeline Development:

  • Build AWS Lambda functions that process data from APIs and other healthcare data sources (see the sketch after this list)
  • Develop ETL pipelines using Python and SQL for healthcare compliance data
  • Implement data validation and quality checks for sensitive healthcare information
  • Create automated data processing workflows for accreditation documents
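
As a rough sketch of the Lambda-based processing and validation described above; the bucket layout, field names, and validation rules are assumptions for the example, not our actual schema.

```python
# Hypothetical Lambda handler: validates compliance records landed in S3 and
# writes the clean subset to a "validated/" prefix. All names are illustrative.
import json
import boto3

s3 = boto3.client("s3")

REQUIRED_FIELDS = {"facility_id", "standard_id", "status", "reported_at"}

def validate(record: dict) -> list[str]:
    """Return a list of problems found in one compliance record."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    status = record.get("status")
    if status is not None and status not in {"compliant", "non_compliant", "pending"}:
        problems.append(f"unknown status: {status}")
    return problems

def handler(event, context):
    """Triggered by an S3 put event on the raw/ prefix."""
    for entry in event.get("Records", []):
        bucket = entry["s3"]["bucket"]["name"]
        key = entry["s3"]["object"]["key"]

        records = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
        clean = [r for r in records if not validate(r)]

        s3.put_object(
            Bucket=bucket,
            Key=key.replace("raw/", "validated/", 1),
            Body=json.dumps(clean).encode("utf-8"),
        )
    return {"statusCode": 200, "processed": len(event.get("Records", []))}
```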

Database & Storage:

  • Work with MongoDB Atlas and S3
  • Design and optimize SQL database schemas
  • Implement an S3-based data lake architecture with Bronze/Silver/Gold zones (see the sketch after this list)
  • Build caching systems using Redis for performance optimization
  • Configure and use Amazon Athena for interactive analytics and querying
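
The Bronze/Silver/Gold layering mentioned above is, roughly: raw landed data (Bronze), cleaned and conformed data (Silver), and aggregated, analytics-ready tables (Gold). Below is a minimal sketch of a Bronze-to-Silver promotion step; the bucket name, prefixes, and cleaning rule are made up for the example.

```python
# Minimal Bronze -> Silver promotion sketch. Bucket, prefixes, and the cleaning
# rule are assumptions; a real pipeline would add partitioning, schema checks,
# and idempotency.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "example-compliance-lake"  # hypothetical bucket name

def promote_to_silver(bronze_key: str) -> str:
    """Read a raw JSON object from the bronze/ prefix, clean it, write to silver/."""
    raw = json.loads(s3.get_object(Bucket=BUCKET, Key=bronze_key)["Body"].read())

    cleaned = [
        {**r, "facility_id": str(r["facility_id"]).strip().upper()}
        for r in raw
        if r.get("facility_id")          # drop records with no facility identifier
    ]

    silver_key = bronze_key.replace("bronze/", "silver/", 1)
    s3.put_object(Bucket=BUCKET, Key=silver_key,
                  Body=json.dumps(cleaned).encode("utf-8"))
    return silver_key
```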

Analytics Support:

  • Create data feeds for executive dashboards and KPI tracking
  • Build healthcare-specific analytics and benchmarking data pipelines, using Athena for queries (see the sketch after this list)
  • Support batch processing systems for hospital quality metrics
  • Collaborate with our ML team to integrate predictive models into data workflows
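
For instance, a dashboard feed might be a scheduled Athena query whose results land in S3 for the BI layer to pick up. The sketch below uses boto3’s Athena client; the database name, table, columns, and output location are assumptions for the example.

```python
# Illustrative sketch of kicking off a scheduled Athena query that feeds a KPI
# dashboard. Database, table, and S3 locations are assumed names.
import time
import boto3

athena = boto3.client("athena")

QUERY = """
SELECT facility_id,
       AVG(CASE WHEN status = 'compliant' THEN 1.0 ELSE 0.0 END) AS compliance_rate
FROM compliance_events
WHERE reported_at >= date_add('day', -30, current_date)
GROUP BY facility_id
"""

def run_kpi_query() -> str:
    """Start the query and block until it finishes; return the execution id."""
    execution = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "analytics"},          # assumed name
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    query_id = execution["QueryExecutionId"]

    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    return query_id
```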

Healthcare Compliance:

  • Implement HIPAA-compliant data processing and audit trail systems
  • Build data governance and documentation standards
  • Create automated monitoring and alerting for data quality issues (see the sketch after this list)
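
As one illustration of automated data quality monitoring, a batch job can emit simple quality metrics to CloudWatch, where alarms notify the team when thresholds are breached. The metric names and checks are assumptions for the example; only counts and rates are published, never patient-level data.

```python
# Sketch of data quality monitoring: compute simple checks for a processed
# batch and push them to CloudWatch as custom metrics (names are illustrative).
import boto3

cloudwatch = boto3.client("cloudwatch")

def report_quality(records: list[dict]) -> None:
    """Emit volume and null-rate metrics for a processed batch of records."""
    total = len(records)
    missing_ids = sum(1 for r in records if not r.get("facility_id"))
    null_rate = (missing_ids / total) if total else 1.0

    cloudwatch.put_metric_data(
        Namespace="DataQuality/Compliance",
        MetricData=[
            {"MetricName": "RecordCount", "Value": float(total), "Unit": "Count"},
            {"MetricName": "FacilityIdNullRate", "Value": null_rate, "Unit": "None"},
        ],
    )
```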

Required Qualifications

Technical Skills:

  • 1-3+ years with Python and SQL: Data processing, scripting, and database querying
  • 1-2 years of AWS experience: Knowledge of Lambda, S3, or other cloud data services
  • Database Knowledge: Experience with SQL databases and/or NoSQL (MongoDB preferred)
  • ETL/Data Processing: Understanding of data pipeline concepts and batch processing

Data Engineering Fundamentals:

  • Experience with data transformation and cleaning
  • Understanding of data warehousing and lake concepts
  • Basic knowledge of data quality and validation techniques
  • Familiarity with version control (Git) and collaborative development

Nice to Have:

  • Healthcare or regulated industry experience
  • Experience with Spark, Delta Lake, or distributed computing
  • Knowledge of data visualization and BI tools
  • Understanding of ML pipeline integration
  • HIPAA compliance knowledge

Technical Stack

Data Processing:

  • Languages: Python, SQL
  • Cloud: AWS (Lambda, S3, EMR, Athena)
  • Databases: MongoDB Atlas, PostgreSQL, Redis
  • Processing: Spark, Delta Lake, batch and stream processing

Integration & ML:

  • ML Integration: SageMaker, MLflow (working with our ML team)
  • Analytics: Athena for scheduled/triggered queries
  • Monitoring: CloudWatch, automated data quality checks

Our Hiring Process

We believe in a transparent and thorough selection process that respects your time while ensuring mutual fit:

1. Initial Screening Call: We’ll discuss your background, experience, and career goals, while providing an overview of the role and our team culture.

2. Technical Challenge: You’ll receive a real-world technical challenge to complete within a specified timeframe. We encourage you to leverage all available resources, including AI tools, documentation, and libraries, just as you would in a production environment. This reflects how we actually work and allows you to showcase your problem-solving approach.

3. Technical Interview: We’ll have an in-depth discussion about your solution and explore related technical concepts. You should be prepared to walk through every aspect of your submission, explaining architectural decisions, code logic, trade-offs, and potential improvements. Whether you wrote a specific code section manually or generated it with AI assistance, you must demonstrate complete ownership and understanding of the entire codebase. This is a production-level assessment: we expect you to discuss, debug, and defend your work as if it were going live tomorrow.

We’re looking for engineers who can think critically, adapt their approach, and truly understand the systems they build—not just those who can generate code.

Ready to apply? We look forward to hearing from you!

MedLaunch is an equal opportunity employer committed to diversity and inclusion.