TQUKE0858_5346 - Lead for Enterprise Tooling & Observability for Migration

Job Type: Contract

Work Mode: Hybrid (3 days per week in the office)

We are looking for an experienced Enterprise Tooling & Observability Lead to drive the strategy, design, implementation, and modernization of enterprise monitoring, logging, APM, and operational tooling during and after large-scale on-prem to AWS cloud migrations. The ideal candidate brings deep expertise across observability platforms, infrastructure/application monitoring, cloud-native operations, and integration of enterprise tools into cloud architectures. This role ensures a seamless migration of tooling capabilities, enhanced visibility, and improved reliability in the AWS operating model.

---

Key Responsibilities

1. Tooling & Observability Strategy Across Migration Lifecycle

· Define the end-to-end tooling & observability architecture that supports pre-migration, migration, and post-migration operations.

· Assess current on-prem tooling (monitoring, logging, APM, ITSM, event management) and define cloud-aligned target tooling architecture for AWS.

· Build a unified observability roadmap covering metrics, logs, traces, dashboards, SLO/SLA monitoring, and event correlation.

2. Migration-Aware Observability Design

· Identify tooling gaps that may arise during the migration of applications, networks, storage, and infrastructure.

· Ensure instrumentation readiness for applications moving via lift & shift, replatforming, containerization, and modernization.

· Define observability patterns for hybrid connectivity, multi-account AWS environments, and multi-region workloads.

3. AWS Cloud-Native Observability Integration

· Design and implement observability using AWS-native capabilities such as:

o CloudWatch, CloudTrail, X-Ray, VPC Flow Logs, GuardDuty, Security Hub

o Integration with AWS Control Tower/Organizations for enterprise-wide visibility

· Ensure seamless integration with third-party enterprise tools such as:

o Datadog, Dynatrace, AppDynamics

o Splunk/ELK

o Prometheus/Grafana

o ServiceNow, Jira, PagerDuty

· Drive modernization of legacy monitoring solutions to cloud-native ecosystems.

4. Tooling Consolidation & Optimization

· Evaluate existing tooling footprint and identify opportunities for consolidation, cost reduction, and simplification.

· Standardize tooling patterns and create reusable templates/playbooks for AWS workloads.

· Drive automation for alerting, dashboards, health checks, and operational insights.

5. Reliability, Performance & SRE Alignment

· Collaborate with platform and SRE teams to enhance observability maturity (SLIs, SLOs, error budgets).

· Build proactive monitoring capabilities to reduce incidents, improve MTTR, and support predictive operations.

· Ensure the observability platform aligns with enterprise DR, HA, and performance engineering strategies.

6. Governance, Security & Compliance

· Ensure observability tooling adheres to enterprise security, compliance, data governance, and access control policies.

· Define audit-ready logging strategies and ensure end-to-end traceability across hybrid and cloud environments.

· Build governance models for event noise reduction, alert hygiene, and service mapping accuracy.

7. Leadership & Stakeholder Management

· Lead cross-functional teams across application, infrastructure, cloud, DevOps, and security functions.

· Serve as the primary SME for observability decisions, guiding teams through architectural design and implementation.

· Present observability strategy, migration readiness, platform health, and maturity improvements to senior leadership.

· Mentor engineers and drive capability uplift across the organization.

---

Required Skills & Experience

Technical Expertise

· 14+ years of experience in enterprise monitoring, logging, APM, and observability tooling.

· Strong understanding of AWS architecture, cloud-native monitoring tools, and hybrid observability.

· Experience with:

o APM platforms: Dynatrace, AppDynamics, Datadog

o Logging platforms: Splunk, ELK/OpenSearch, CloudWatch Logs

o Metrics & telemetry: Prometheus, Grafana, OpenTelemetry

o Event management: ServiceNow, PagerDuty, Moogsoft, BigPanda

· Strong knowledge of instrumentation for distributed systems, microservices, containers (EKS, ECS), serverless workloads, and legacy systems.



Migration & Architecture Skills

· Proven experience supporting large-scale on-prem to AWS migrations.

· Deep understanding of migration patterns and observability dependencies.

· Hands-on experience designing observability for multi-account AWS landing zones and multi-region architectures.

Soft Skills & Leadership

· Excellent communication, architectural documentation, and executive presentation skills.

· Ability to influence stakeholders across engineering, cloud, SRE, operations, and leadership.

· Experience leading cross-functional teams and managing vendor/tooling relationships.

---

Preferred Qualifications

· AWS Certified Solutions Architect / Cloud Practitioner / DevOps Engineer

· Certifications in observability platforms (Datadog, Dynatrace, Splunk, etc.)

· Knowledge of ITIL, SRE principles, and enterprise operational frameworks

· Experience with automation using Python, Terraform, or CloudFormation

---

Success Indicators

· Smooth transition of observability and tooling through all migration waves.

· Enhanced end-to-end visibility across applications, networks, and infrastructure post-migration.

· Reduction in incidents, MTTR, and monitoring gaps after migration to AWS.

· Standardized tooling practices aligned with enterprise governance and cloud architecture.

· Strong stakeholder confidence and measurable uplift in observability maturity.
