The interview will include a coding test. This is a contract position.
Job Description
We are seeking a skilled Data Engineer to design, develop, and maintain scalable data pipelines and workflows. The ideal candidate will have strong expertise in Python, SQL, Snowflake, and Airflow, with experience building ETL/ELT solutions and optimizing data infrastructure. This role involves collaborating with data analysts, data scientists, and business stakeholders to ensure data availability, reliability, and efficiency.
Roles & Responsibilities
* Design, build, and maintain scalable ETL/ELT pipelines to process large volumes of structured and unstructured data.
* Develop and optimize SQL queries within Snowflake for efficient data storage and retrieval.
* Implement workflow orchestration using Apache Airflow to automate data processing tasks (see the sketch after this list).
* Write efficient, reusable, and scalable Python scripts for data extraction, transformation, and loading (ETL).
* Monitor and troubleshoot data pipelines to ensure high availability and performance.
* Collaborate with data teams to define best practices for data modeling and maintain a structured data warehouse.
* Work with cloud platforms (AWS, GCP, or Azure) to integrate data sources and manage cloud-based data infrastructure.
* Ensure data security, governance, and compliance with industry best practices.
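
As an illustration of the orchestration responsibility above, here is a minimal sketch of an Airflow DAG wiring a daily extract-transform-load sequence. The DAG id, task names, and the three callables are hypothetical placeholders, not part of any existing codebase for this role.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders():
    # Hypothetical extract step: pull raw records from a source system.
    print("extracting raw orders")


def transform_orders():
    # Hypothetical transform step: clean and reshape the extracted data.
    print("transforming orders")


def load_orders():
    # Hypothetical load step: write the transformed data into the warehouse.
    print("loading orders into Snowflake")


# A daily DAG chaining the three steps into a linear pipeline.
with DAG(
    dag_id="orders_etl",              # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_orders)
    transform = PythonOperator(task_id="transform", python_callable=transform_orders)
    load = PythonOperator(task_id="load", python_callable=load_orders)

    extract >> transform >> load
```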
Required Skills & Qualifications
* Strong programming skills in Python.
* Expertise in SQL for querying, transformation, and performance tuning.
* Hands-on experience with Snowflake (schema design, performance optimization, Snowpipe, Streams, and Tasks); a brief sketch follows this list.
* Experience with Apache Airflow for scheduling and orchestrating data pipelines.
* Knowledge of ETL/ELT processes and best practices in data engineering.
* Experience with cloud platforms (AWS, GCP, or Azure) and their data services.
* Familiarity with data modeling (Star Schema, Snowflake Schema) and data warehouse concepts.
* Experience with Git and CI/CD pipelines.
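
To illustrate the Snowflake Streams and Tasks experience mentioned above, the sketch below uses the snowflake-connector-python package to create a stream on a raw table and a scheduled task that merges captured changes into a curated table. All object names, credentials, and the 15-minute schedule are hypothetical placeholders; a real pipeline would pull credentials from a secrets manager or environment variables.

```python
import snowflake.connector

# Placeholder connection parameters; never hard-code real credentials.
conn = snowflake.connector.connect(
    account="my_account",     # hypothetical account identifier
    user="etl_user",          # hypothetical service user
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# A stream captures row-level changes (change data capture) on the raw table.
cur.execute("CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders")

# A task periodically consumes the stream and loads new rows into a curated table.
cur.execute("""
    CREATE OR REPLACE TASK merge_orders_task
      WAREHOUSE = ETL_WH
      SCHEDULE = '15 MINUTE'
    AS
      INSERT INTO curated.orders (order_id, customer_id, amount)
      SELECT order_id, customer_id, amount
      FROM raw_orders_stream
      WHERE METADATA$ACTION = 'INSERT'
""")

# Tasks are created suspended and must be resumed before they run.
cur.execute("ALTER TASK merge_orders_task RESUME")

cur.close()
conn.close()
```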
Preferred Skills
* Experience with big data processing frameworks (Spark, Databricks).
* Knowledge of Kafka, Kinesis, or other real-time data streaming tools.
* Familiarity with containerization (Docker, Kubernetes) for deploying data pipelines.
* Understanding of Data Governance, Data Quality, and Data Security principles.