Technical Skills

Must‑Have

Observability & Reliability Engineering

· Strong hands‑on experience across core observability pillars including metrics, traces, service health and distributed systems visibility

· Practical experience implementing OpenTelemetry across application, platform and infrastructure layers

· Ability to design, deploy and operate end‑to‑end observability pipelines (collector‑to‑backend, agent management, data flows, routing and filtering)

· Strong understanding of SLI/SLO frameworks, error budgets and reliability‑focused operating models

· Experience defining alerting strategy, tuning thresholds and reducing operational noise through effective signal engineering
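For illustration only, the error-budget arithmetic behind an availability SLO can be sketched as below. The 99.9% target, the 30-day window and both function names are assumed example values, not requirements stated for this role.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO permits over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    allowed_failures = total_requests * (1.0 - slo_target)
    if allowed_failures == 0:
        return 0.0  # a 100% target leaves no budget to spend
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Assumed example: 99.9% availability over 30 days.
    print(round(error_budget_minutes(0.999), 1))
    # Assumed example: 250 failures out of 1,000,000 requests.
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

A budget framed this way gives alerting a concrete reference point: paging can be tied to how fast the remaining fraction is shrinking rather than to a raw threshold.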

Observability Platforms & Tooling

· Hands‑on expertise in one or more enterprise‑grade observability platforms (Dynatrace, Splunk Observability, Datadog or equivalent)

· Proficiency with Prometheus ecosystem components including Alertmanager

· Experience designing clear, insightful dashboards and visualisations using Grafana

· Strong troubleshooting capability using metrics, traces and dependency insights to diagnose performance and availability issues

Cloud & Platform Monitoring

· Strong technical experience with at least one major public cloud (AWS, Azure or GCP)

· Monitoring fundamentals across cloud‑native services including compute, storage, networking, load balancers and managed services

· Solid understanding of cloud networking constructs (VPC/VNet, subnets, routing, NAT, firewalls and security groups)

Containers & Kubernetes

· Working knowledge of Kubernetes objects (pods, services, deployments) and operational lifecycle

· Experience monitoring containerised/app‑modernisation workloads

· Basic experience with Helm or Kustomize for packaging, configuration and deployment

· Ability to troubleshoot application behaviour and platform-level issues within container environments

Programming & Automation

· Proficiency in one or more languages (Python, Go, Java) to support automation and tooling

· Experience writing automation scripts and utilities supporting observability and SRE practices

· Awareness of integrating observability checks within CI/CD pipelines

· Comfort with shell scripting for diagnostics and operational tasks

Data & Analytics

· Strong understanding of time‑series data and telemetry characteristics

· Hands‑on experience with PromQL, SignalFlow, Metrics Explorer or equivalent query languages and tools

· Ability to analyse latency percentiles (p95/p99), error rates and throughput metrics

· Working knowledge of SQL for querying telemetry backends or data stores
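As a rough illustration of the latency, error-rate and throughput analysis listed above, the sketch below computes nearest-rank percentiles from raw samples. The function names and sample values are hypothetical; production backends typically estimate percentiles from histograms rather than from raw samples.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def error_rate(failed: int, total: int) -> float:
    """Share of requests that failed; 0.0 when there was no traffic."""
    return failed / total if total else 0.0


def throughput(total: int, window_seconds: float) -> float:
    """Requests per second over the observation window."""
    return total / window_seconds


if __name__ == "__main__":
    latencies_ms = list(range(1, 101))  # assumed sample data: 1..100 ms
    print(percentile(latencies_ms, 95), percentile(latencies_ms, 99))
    print(error_rate(5, 1000), throughput(1200, 60))
```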

Telemetry SRE Engineer Full Time

KBC Technologies Group
