Technical Skills: Must‑Have
Observability & Reliability Engineering
· Strong hands‑on experience across core observability pillars including metrics, traces, service health and distributed systems visibility
· Practical experience implementing OpenTelemetry across application, platform and infrastructure layers
· Ability to design, deploy and operate end‑to‑end observability pipelines (collector‑to‑backend, agent management, data flows, routing and filtering)
· Strong understanding of SLI/SLO frameworks, error budgets and reliability‑focused operating models
· Experience defining alerting strategy, tuning thresholds and reducing operational noise through effective signal engineering
Observability Platforms & Tooling
· Hands‑on expertise in one or more enterprise‑grade observability platforms (Dynatrace, Splunk Observability, Datadog or equivalent)
· Proficiency with Prometheus ecosystem components including Alertmanager
· Experience designing clear, insightful dashboards and visualisations using Grafana
· Strong troubleshooting capability using metrics, traces and dependency insights to diagnose performance and availability issues
Cloud & Platform Monitoring
· Strong technical experience with at least one major public cloud (AWS, Azure or GCP)
· Knowledge of monitoring fundamentals for cloud‑native services, including compute, storage, networking, load balancers and managed services
· Solid understanding of cloud networking constructs (VPC/VNet, subnets, routing, NAT, firewalls and security groups)
Containers & Kubernetes
· Working knowledge of Kubernetes objects (pods, services, deployments) and operational lifecycle
· Experience monitoring containerised and app‑modernisation workloads
· Basic experience with Helm or Kustomize for packaging, configuration and deployment
· Ability to troubleshoot application behaviour and platform-level issues within container environments
Programming & Automation
· Proficiency in one or more languages (Python, Go, Java) to support automation and tooling
· Experience writing automation scripts and utilities supporting observability and SRE practices
· Awareness of how to integrate observability checks into CI/CD pipelines
· Comfort with shell scripting for diagnostics and operational tasks
Data & Analytics
· Strong understanding of time‑series data and telemetry characteristics
· Hands‑on experience with PromQL, SignalFlow or equivalent query languages, and with query tools such as Metrics Explorer
· Ability to analyse latency percentiles (p95/p99), error rates and throughput metrics
· Working knowledge of SQL for querying telemetry backends or data stores