Job offer

Site Reliability Engineer

As a Site Reliability Engineer at Man Group, you will be responsible for the reliability, stability, and performance of the technology that supports the company's multi-asset platform. You will work on developing and implementing solutions for monitoring and optimizing systems to ensure high availability and performance.

Abroad

Man Investments AG

100%

Job description: Site Reliability Engineer

Tasks

Ensure the reliability and performance of critical systems across the global infrastructure through proactive monitoring and rapid incident response.
Design and implement observability solutions using tools such as Prometheus, Datadog, and ELK to provide insights and enable data-driven decisions.
Develop and maintain SLAs, SLOs, SLIs, and error budgets to guide reliability improvements and inform engineering priorities with data.
Automate operational tasks and build self-service capabilities to eliminate toil and improve efficiency.
Participate in incident response and blameless post-mortems, and implement preventive measures to reduce outages.
Collaborate with development teams to improve system design, deployment practices, and operational excellence.
Configure CI/CD tools and manage auto-scaling, large GPU/CPU deployments, and high-performance distributed systems.
Contribute to capacity planning and performance budgeting to ensure that systems meet business requirements.
Manage multiple ELK clusters hosting hundreds of terabytes of log, telemetry, and APM data.

Requirements

Strong understanding of SRE principles, including SLAs, SLOs, error budgets, and reliability testing practices.
Familiarity with automation tools (Ansible, Terraform) and scripting/programming languages (Python, Go, or similar).
Strong troubleshooting and debugging skills across distributed systems, with the ability to diagnose complex production issues under pressure.
Experience with incident management, e.g., on-call rotations and post-incident reviews.
Familiarity with Kubernetes and container orchestration.
A proactive mindset and the ability to take ownership of reliability initiatives.

Advantages

Experience with AIOps and CI/CD pipelines and tools such as Jenkins and TeamCity.
Administration of Linux and Windows systems and exposure to cloud technologies (AWS/Azure).
Understanding of network concepts, load balancing, and distributed architectures.
Knowledge of ALM (Application Lifecycle Management) and tooling for DevOps teams.
Familiarity with ITIL v4 principles; desire to understand the actual benefits of our decisions.
Supported in India, motivated to succeed in remote communication and collaboration roles.

Benefits

Modern office space located on the MOEIOff campus with easy access to transportation and amenities.
Hybrid working model.
Competitive compensation package.
2.5 days of vacation pay.
Premium health insurance.
Corporate augmented reality program.
Referral bonus.
Mobilization for long-term service and volunteer work.
Multifunction card.
Opportunities for professional development, including internal tech talks.
Confidential support and engagement with Man Group's Employee Resource Groups.

Job details

Found on: December 23, 2025

Employer: Man Investments AG

Job percentage: 100%

Place of work: Abroad

Source: https://job-boards.eu.greenhouse.io/mangroup/jobs/4714467101

About the employer:

Man Investments AG is a global active investment manager.

The company offers alternative and traditional investment solutions.

The Swiss headquarters are located in Pfäffikon (SZ).

Man Group is listed on the London Stock Exchange and is part of the FTSE 250 Index.
