Job Title: Big Data Engineer
Location: Remote
Employment Type: [Full-Time/Contract]
Department: Data Engineering / Analytics
About the Role:
We are looking for an experienced Big Data Engineer to join our growing data team. In this role, you will design, develop, and optimize scalable data pipelines and architectures that enable data-driven decision-making across the organization. You'll work closely with data scientists, analysts, and software engineers to ensure our data infrastructure is reliable, efficient, and secure.
Key Responsibilities:
* Design, develop, and maintain robust and scalable data pipelines for batch and real-time processing.
* Build and optimize data architectures to support advanced analytics and machine learning workloads.
* Ingest data from a variety of structured and unstructured sources using tools such as Apache Kafka and Apache NiFi, or via custom connectors.
* Develop ETL/ELT processes using tools such as Spark, Hive, Flink, Airflow, or dbt.
* Work with big data technologies such as Hadoop, Spark, HDFS, Hive, Presto, etc.
* Implement data quality checks, validation processes, and monitoring systems.
* Collaborate with data scientists and analysts to ensure data is accessible, accurate, and clean.
* Manage and optimize data storage solutions, including cloud-based data lakes (Amazon S3, Azure Data Lake Storage, Google Cloud Storage).
* Implement and ensure compliance with data governance, privacy, and security best practices.
* Evaluate and integrate new data tools and technologies to enhance platform capabilities.
Required Skills and Qualifications:
* Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related field.
* 3+ years of experience in data engineering or software engineering roles with a focus on big data.
* Strong programming skills in Python, Scala, or Java.
* Proficiency with big data processing frameworks such as Apache Spark, Hadoop, or Flink.
* Experience with SQL and NoSQL databases (e.g., PostgreSQL, Cassandra, MongoDB, HBase).
* Hands-on experience with data pipeline orchestration tools such as Apache Airflow or Luigi.
* Familiarity with cloud data services on AWS, GCP, or Azure, such as EMR, Glue, BigQuery, or Databricks.
* Solid understanding of data modeling, data warehousing, and performance optimization.
Preferred Qualifications:
* Experience working in agile development environments.
* Familiarity with containerization tools like Docker and orchestration platforms like Kubernetes.
* Knowledge of data privacy and regulatory compliance standards (e.g., GDPR, HIPAA).
* Experience with real-time data processing and streaming technologies (e.g., Kafka Streams, Spark Streaming).
* Experience with CI/CD for data pipelines and infrastructure-as-code tools such as Terraform or CloudFormation.
Why Join Us:
* Work with a modern data stack and cutting-edge technologies.
* Be part of a data-driven culture in a fast-paced, innovative environment.
* Collaborate with talented professionals from diverse backgrounds.