Key Responsibilities:
* CI/CD Pipelines: Design, implement, and manage CI/CD pipelines for AI/ML products, facilitating seamless integration and delivery across development, testing, and production environments.
* Infrastructure as Code (IaC): Develop and maintain IaC using tools like Terraform, Ansible, or AWS CloudFormation to ensure scalable and consistent infrastructure management.
* Cloud Management: Manage cloud services (AWS, GCP, Azure) to deploy and maintain AI-based solutions, optimizing resource usage and cost.
* Model Deployment & Monitoring: Automate model deployment processes and set up monitoring for AI models in production to track performance, drift, and other key metrics.
* Containerization: Use Docker and orchestration tools like Kubernetes to create, deploy, and manage containers for various AI/ML workloads.
* Security & Compliance: Implement security best practices, including access control, data encryption, and vulnerability scanning.
* Collaboration: Work closely with data scientists, ML engineers, and other cross-functional teams to translate requirements into scalable and reliable AI solutions.
* Troubleshooting & Optimization: Monitor system performance, identify issues, and optimize AI application infrastructure for speed, efficiency, and reliability.
Qualifications:
* Education: Bachelor’s degree in Computer Science, Engineering, or a related field. Relevant certifications in DevOps, AI/ML, or Cloud Services are a plus.
* Experience: 3-5 years in DevOps or similar roles, including hands-on deployment of AI/ML products.
* Technical Skills:
  * Proficiency in CI/CD tools (Jenkins, GitLab CI, CircleCI)
  * Experience with cloud platforms (AWS, Azure, GCP)
  * Strong knowledge of containerization (Docker, Kubernetes)
  * Familiarity with IaC (Terraform, Ansible, CloudFormation)
  * Proficiency in scripting languages (Python, Bash)
* AI/ML Knowledge: Understanding of AI/ML model lifecycle management, including deployment, monitoring, and retraining workflows.
* Problem-Solving: Ability to identify and resolve issues related to scalability, latency, and reliability in AI systems.
* Soft Skills: Strong communication, collaboration, and documentation skills.
Nice-to-Have:
* Experience with MLOps frameworks (Kubeflow, MLflow)
* Familiarity with data processing tools (Apache Spark, Kafka)
* Exposure to serverless architecture and microservices
* Understanding of model governance, bias detection, and AI ethics