Senior Data Engineer
Health Catalyst
Join one of the nation’s leading and most impactful healthcare performance improvement companies. Over the years, Health Catalyst has achieved and documented clinical, operational, and financial improvements for many of the nation’s leading healthcare organizations. We are also increasingly serving international markets. Our mission is to be the catalyst for massive, measurable, data-informed healthcare improvement through:
Data: integrate data in a flexible, open & scalable platform to power healthcare’s digital transformation
Analytics: deliver analytic applications & services that generate insight on how to measurably improve
Expertise: provide clinical, financial & operational experts who enable & accelerate improvement
Engagement: attract, develop and retain world-class team members by being a best place to work
Role: Senior Data Engineer
Location: Hyderabad, IN
The healthcare industry is the next great frontier of opportunity for software development, and Health Catalyst is one of the most dynamic and influential companies in this space. We are working on solving national-level healthcare problems, and this is your chance to improve the lives of millions of people, including your family and friends. Health Catalyst is a fast-growing company that values smart, hardworking, and humble team members. Each product team is a small, mission-critical team focused on developing innovative tools to support Catalyst’s mission to improve healthcare performance, cost, and quality.
Health Catalyst maintains and is expanding a large suite of Improvement Apps that contribute to healthcare analytics and process improvement solutions. This includes products that manage the care of health system populations, better serve patients at the point of care, reduce health system costs, and reduce clinician workload.
Job Summary:
As a Senior Data Engineer, you will work with our diverse Improvement Apps software engineering team, designing, developing, and maintaining various platforms that serve internal HCAT team members, clinicians, and patients. You will rely on Test-Driven Development to safely enhance and refactor our system, shipping production code multiple times per week. And you will go to bed each night with the comfort that your code is improving outcomes for patients.
If you love…
Help drive clarity and prototype individual features or problems
Knowledge of architecture patterns and the ability to design and complete features / tasks that are 50-60% well defined.
Can discern where gaps can be filled in without consulting a Product Manager or another programmer and can judge when a consultation is needed.
Work is reviewed with the occasional need for material direction or implementation changes
Seeks and provides guidance via PR reviews, pair-programming and other interactions with Engineers and Product Managers
It is second nature to develop high code quality standards balanced with the needs of real-world customer timelines.
Possesses a passion and drive to deliver exceptional products and follows established patterns and approaches within existing code bases with ease.
Takes ownership of learning and growth
Capitalizes on internal and external opportunities for learning.
Identifies gaps in knowledge/skills and seeks ways to close those gaps (self-guided learning, pairing, seeking guidance for yourself and developing guidance for less experienced members of the team)
Periodic On Call Rotation
Ability to communicate with Customer Success about customer issues that are escalated to Engineering and help quantify customer impact.
Can respond quickly to operational emergencies, find short-term resolutions, and plan long-term fixes to avoid similar issues in the future.
What you own in the role:
Design and implement scalable PySpark-based transformation pipelines on Databricks to process and analyze large volumes of patient engagement data — applying advanced statistical aggregations, window functions, and time-series analysis to deliver reliable, ongoing analytical reporting to clinical and product stakeholders.
Lead the architecture and implementation of enterprise data platforms on Databricks, leveraging Delta Lake, Unity Catalog, and Delta Live Tables to build robust data collection systems, analytical data models, and ML feature pipelines that underpin Health Catalyst's AI/ML services.
Build and maintain high-performance dbt projects for modular, version-controlled data transformation — including layered data modeling (staging, intermediate, marts), dbt tests, documentation, and orchestration integration with Databricks Workflows or Apache Airflow.
Develop and optimize complex PySpark and Databricks SQL pipelines to ingest data from primary and secondary sources — including relational databases, HL7/FHIR feeds, and flat files — transforming raw data into analytics-ready datasets integrated into Health Catalyst data products.
Implement comprehensive data quality frameworks using dbt tests, Great Expectations, and Databricks Delta Live Tables expectations — proactively identifying, flagging, and remediating data integrity issues, schema drift, and pipeline failures across the platform.
Collaborate with data science, ML engineering, and product leadership to translate business and analytical priorities into scalable data infrastructure — contributing to platform roadmap planning and driving alignment between data engineering investments and organizational goals.
Continuously identify and execute data pipeline optimization opportunities — including query performance tuning, Spark cluster right-sizing, partition optimization, Z-ordering on Delta tables, and refactoring legacy SQL workflows into maintainable, reusable dbt models and PySpark modules.
What you bring to this role:
Bachelor's degree or equivalent practical experience preferred.
Strong working knowledge of SQL
Technical expertise in data models, database design and development, data mining, and segmentation techniques
Strong knowledge of and experience with reporting software such as Power BI, BusinessObjects, Looker, Tableau, etc. (Looker experience preferred)
Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy, in a timely manner
Adept at constructing efficient queries, writing reports and presenting findings
Ability to manage multiple and simultaneous responsibilities and to prioritize scheduling of work
Strong verbal and written communication skills
Knowledge of statistics and experience using statistical packages for analyzing datasets (Python, R, SPSS, SAS, etc.)
An understanding of healthcare data is a plus, but not a requirement
You may also bring:
Experience with cloud infrastructure and architecture patterns, either Azure or AWS preferred.
Software development experience within healthcare IT and an understanding of key data models (clinical, claims, financial, etc.) and interoperability standards such as HL7v2, CDA, EMR, and FHIR
Knowledge of healthcare compliance and how it applies to Application Security
Agile/Scrum software development practices
Business Intelligence or Data warehousing experience
Equal Employment Opportunity has been, and will continue to be, a fundamental principle at Health Catalyst, where employment is based upon personal capabilities and qualifications without discrimination or harassment on the basis of race, color, national origin, religion, sex, sexual orientation, gender identity, age, disability, citizenship status, marital status, creed, genetic predisposition or carrier status, or any other characteristic protected by law. Health Catalyst is committed to a work environment where all team members are treated with respect and dignity.
The above statements describe the general nature and level of work being performed in this job function. They are not intended to be an exhaustive list of all duties, and indeed additional responsibilities may be assigned by Health Catalyst.
Studies show that candidates from underrepresented groups are less likely to apply for roles if they don’t have 100% of the qualifications shown in the job posting. While each of our roles has core requirements, please thoughtfully consider your skills and experience and decide if you are interested in the position. If you feel you may be a good fit for the role, even if you don’t meet all of the qualifications, we hope you will apply. If you feel you are lacking the core requirements for this position, we encourage you to continue exploring our careers page for other roles for which you may be a better fit.
At Health Catalyst, we appreciate the opportunity to benefit from the diverse backgrounds and experiences of others. Because of our deep commitment to respect every individual, Health Catalyst is an equal opportunity employer.