Ireland Job Openings

Qualcomm

ML Support Engineer, IT Engineer Staff

Cork

October 22, 2024

Company:
QT Technologies Ireland Limited
Job Area:
Information Technology Group, Information Technology Group > IT Engineering
General Summary:
We are seeking a highly skilled Technical Support Engineer specializing in Machine Learning (ML) operations, Kubernetes, container technologies, and Run:AI. In this role, you will provide technical and operational support for customers who use GPU computing platforms to optimize and manage AI/ML workloads, particularly in Kubernetes-based environments. The ideal candidate will have deep expertise in Kubernetes orchestration and GPU management, as well as a solid understanding of how these technologies support AI/ML operations at scale.

Key Responsibilities

  • Kubernetes Orchestration & Resource Management: Serve as the subject matter expert for Kubernetes and container orchestration. Guide customers through the design and deployment of Kubernetes clusters tailored for AI/ML use cases, helping them effectively manage workloads through Run:AI. Ensure optimal resource allocation, including GPU sharing, node management, and job scheduling across clusters.

  • Cluster Monitoring & Optimization: Monitor and tune Kubernetes clusters to ensure they are optimized for AI/ML workloads. Provide support on managing Kubernetes autoscaling, resource quotas, and performance monitoring of distributed ML models running on Kubernetes clusters via the Run:AI platform.

  • GPU Troubleshooting & Incident Response: Diagnose and resolve complex issues involving dependencies between GPU drivers and software, NVIDIA toolkit errors, or GPU component failures.

  • Run:AI Platform Support: Provide expert support for the Run:AI platform, assisting customers with the deployment, configuration, and management of Kubernetes clusters that handle AI/ML workloads. This includes setting up the platform, configuring resource pools (GPU, CPU), and optimizing Kubernetes namespaces to ensure proper orchestration of workloads.

  • Workload Optimization on Kubernetes: Assist customers in optimizing dynamic resource allocation for their AI/ML workloads by utilizing the Run:AI scheduler in conjunction with Kubernetes's native tools. Help manage job preemption, scheduling priorities, and horizontal scaling of workloads across clusters.

  • Kubernetes Troubleshooting & Incident Response: Diagnose and resolve complex issues related to Kubernetes cluster management, including pod failures, node connectivity issues, and namespace misconfigurations. Provide support in handling incidents such as job contention, GPU misallocation, and failed containerized workloads, ensuring smooth operation across the entire Kubernetes environment.

  • Integration Support: Help customers integrate Run:AI into their existing Kubernetes-based ML infrastructure. Ensure seamless operation of AI/ML pipelines, covering data flow, distributed training, and model deployment. Troubleshoot issues arising from the interaction between Run:AI, Kubernetes, and other ML tools (e.g., TensorFlow, PyTorch, Kubeflow).

  • Security and Best Practices in Kubernetes: Advise customers on security best practices for Kubernetes clusters handling sensitive ML workloads, such as secure pod communications, role-based access control (RBAC), and resource isolation for multi-tenant clusters. Ensure Kubernetes and containerized environments are secure and compliant with organizational policies.

  • Collaboration with HQ: Work closely with the engineering and product teams in HQ, providing feedback on Kubernetes-related issues, cluster optimization features, and improvements to the Run:AI platform. Escalate complex issues and contribute to ongoing platform development.

  • Training & Documentation: Develop training materials and deliver technical workshops on using Run:AI in Kubernetes environments. Maintain up-to-date documentation on best practices for configuring and managing Kubernetes clusters for AI/ML workloads, focusing on high availability, performance, and security.

Minimum Qualifications:
  • 4+ years of IT-related work experience with a Bachelor's degree.
OR
  • 7+ years of IT-related work experience without a Bachelor's degree.

Physical Requirements:
  • Frequently transports and installs equipment up to 20 lbs.

Requirements

  • 3+ years of experience in technical support roles with strong expertise in Kubernetes administration, container orchestration, and AI/ML workload management.

  • 1+ year of general GPU administration experience, addressing driver conflicts, hardware failures, and performance issues.

  • In-depth knowledge of Kubernetes (CKA or CKAD certification highly preferred), including core components such as kubelet, kube-apiserver, and kube-scheduler.

  • Proficiency in Kubernetes resource management (e.g., CPU/GPU allocation, pods, services, and namespaces) and troubleshooting common Kubernetes issues in production environments.

  • Experience with configuration management tools (Puppet, Chef, Ansible) and Kubernetes management platforms like Rancher is a plus.

  • Experience with the Run:AI platform or similar tools for ML workload optimization (e.g., Kubeflow, MLflow, Slurm) in Kubernetes environments.

  • Hands-on experience with Docker and containerized environments for AI/ML operations, including distributed training, scaling, and deployment.

  • Strong understanding of ML frameworks (e.g., TensorFlow, PyTorch) and how they interact with Kubernetes clusters for model training and deployment.

  • Excellent analytical, communication, and problem-solving skills.

  • Ability to manage priorities in a fast-paced environment and collaborate within a matrix organization.

  • References to a particular number of years of experience are for indicative purposes only. Applications from candidates with equivalent experience will be considered, provided that the candidate can demonstrate an ability to fulfill the principal duties of the role and possesses the required competencies.

Applicants: Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail disability-accomodations@qualcomm.com or call Qualcomm's toll-free number found here. Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able to participate in the hiring process. Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities. We will not respond here to requests for updates on applications or resume inquiries.)

Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.

To all Staffing and Recruiting Agencies: Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications.

If you would like more information about this role, please contact Qualcomm Careers.