Job offer

Site Reliability Engineer

As a Site Reliability Engineer, you are responsible for the reliability, stability, and performance of the infrastructure and play an important role in shaping the future of our platform. You will work on innovative projects and have the opportunity to learn from experienced leaders.

Abroad

Man Investments AG

100%

Job description: Site Reliability Engineer

Tasks

Ensuring the reliability and performance of critical systems on global infrastructure through proactive monitoring and rapid incident response.
Design and implementation of observability solutions using tools such as Prometheus, Datadog, ELK, and Loki to provide meaningful insights and enable rapid incident response.
Development and maintenance of SLIs/SLOs to drive reliability improvements and inform engineering priorities.
Automating operational tasks and building self-service capabilities to eliminate toil and improve efficiency.
Participation in blameless post-mortem analyses and implementation of preventive measures to avoid recurring problems.
Collaboration with development teams to improve system design, deployment practices, and operational excellence.
Configuration and rollout of large-scale infrastructures and high-performance distributed systems.
Contribution to capacity planning and performance budgeting to ensure that systems meet business requirements.
Management of multiple ELK clusters hosting hundreds of terabytes of log, telemetry, and APM data.

Requirements

Strong understanding of SRE principles, including SLIs, SLOs, error budgets, and reliability testing practices.
Strong background in software development and operations, with knowledge of Python, Java, or similar programming languages (e.g., Java/Scala, Terraform, and scripting languages such as Python, PHP, Perl, or Csh).
Strong troubleshooting and debugging skills in distributed systems, with the ability to diagnose complex production issues under pressure.
Expertise in incident management, on-call rotations, and post-incident reviews.
Familiarity with Kubernetes and container orchestration.
Proactive mindset and ability to take responsibility for reliability initiatives.
Experience with SRE/DevOps tools and practices (e.g., PagerDuty, OpsGenie, ELK, Loki, or similar).
Experience administering Linux and Windows systems and working with cloud technologies (AWS/Azure).
Understanding of network concepts, load balancing, and distributed architectures.
Knowledge of ALM/CMMI principles; desire to understand the actual costs of decisions.
Proven communication and collaboration skills.

We offer

Modern office space on the OFCOM campus with easy access to transportation and amenities.
Hybrid working model.
Competitive salary and benefits package.
25 days of paid vacation.
Premium health insurance.
Company-specific pension program.
Mental bonus.
Additional days off for long service and volunteer work.
Employee card.
Opportunities for professional development, including internal tech talks.
Trust, affinity, and engagement with the Man Group community.

Job details

Found on: December 30, 2025
Employer: Man Investments AG
Job percentage: 100%
Place of work: Abroad
Place found on: https://job-boards.eu.greenhouse.io/mangroup/jobs/4714467101

About the employer:

Man Investments AG is a global active investment manager.

The company offers alternative and traditional investment solutions.

The Swiss headquarters are located in Pfäffikon (SZ).

Man Group is listed on the London Stock Exchange and is part of the FTSE 250 Index.