South Africa Job Openings
Lesaka Technologies
Site Reliability Engineer
Cape Town
FULL TIME
November 19, 2024
Kazang – Micro Merchant Division
Senior Site Reliability Engineer
A vacancy exists for a Senior SRE within the Kazang - Micro Merchant Division, in Cape Town, South Africa (Hybrid).
We are seeking a Site Reliability Engineer (SRE) with expertise in Linux-based, open-source environments to ensure the reliability, scalability, and performance of our systems. In this role, you will design and implement automated solutions for monitoring and system optimisation while managing and maintaining critical infrastructure. You will work closely with the DevOps team to support deployments and CI/CD pipelines, leveraging open-source tools to address operational challenges and enhance system resilience.
Key Responsibilities include, but are not limited to:
- Design, implement, and maintain reliable systems in a Linux and open-source environment to meet uptime and performance objectives.
- Support the DevOps team with CI/CD pipelines, ensuring seamless and reliable deployments.
- Manage and optimize AWS-based infrastructure for scalability, cost efficiency, and performance.
- Develop and maintain monitoring and alerting systems to ensure observability and proactively address system issues.
- Build and maintain robust solutions for metric collection, dashboarding, and alerting to provide actionable insights and real-time system visibility.
- Conduct root cause analysis for incidents, implementing preventive measures to improve system resilience.
- Perform regular system maintenance, including updates, patches, and optimizations.
- Prepare and deliver comprehensive reporting on system performance, incidents, and reliability metrics.
- Identify and mitigate risks to system reliability, scalability, and security.
- Ensure compliance with organizational and regulatory standards in system design and operations.
- Participate in a rotational on-call schedule to ensure the reliability and availability of critical systems.
Years of Experience:
- A minimum of 5 years of professional experience in Site Reliability Engineering, DevOps, or a related field, with demonstrated expertise in Linux-based, open-source environments, and cloud infrastructure (AWS).
- A Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field is required.
- Equivalent practical experience in lieu of a formal degree will be considered for highly qualified candidates.
- Fault Finding and Debugging: Expertise in diagnosing and resolving complex system issues, including performance bottlenecks, service outages, and application errors, using debugging tools, logs, and monitoring data.
- Scripting and Programming: Proficiency in at least one programming or scripting language (e.g., Python, Bash, Go), with the ability to write automation scripts, develop tools, and optimize system performance.
- Cloud Infrastructure Management (AWS): Hands-on experience with AWS services (e.g., EC2, S3, RDS, VPC), with the ability to design, manage, and optimize cloud-based infrastructure for scalability, reliability, and cost-efficiency.
- Monitoring and Observability: Skilled in implementing monitoring solutions (e.g., Prometheus, Grafana, ELK stack) and designing systems for metrics collection, dashboarding, and alerting to ensure system health and performance.
- Automation and Infrastructure as Code (IaC): Proficiency with tools like Ansible, Terraform, or similar frameworks to automate system management, deployments, and configurations, reducing manual effort and ensuring consistency.

- Problem-Solving and Critical Thinking: Demonstrates a proactive and analytical approach to identifying issues, diagnosing root causes, and implementing effective solutions in complex technical environments.
- Collaboration and Teamwork: Works effectively with cross-functional teams, including DevOps, development, and operations, fostering a culture of shared ownership and open communication to achieve reliability goals.
- Adaptability and Continuous Learning: Embraces change, learns new technologies quickly, and adjusts strategies to meet evolving system and organizational needs, particularly in fast-paced, dynamic environments.