Job Type: Contract
Work Mode: Hybrid (3 days per week in office)
We are looking for an experienced Enterprise Tooling & Observability Lead to drive the strategy, design, implementation, and modernization of enterprise monitoring, logging, APM, and operational tooling during and after large-scale on-prem to AWS cloud migrations. The ideal candidate brings deep expertise across observability platforms, infrastructure/application monitoring, cloud-native operations, and integration of enterprise tools into cloud architectures. This role ensures a seamless migration of tooling capabilities, enhanced visibility, and improved reliability in the AWS operating model.
---
Key Responsibilities
1. Tooling & Observability Strategy Across Migration Lifecycle
· Define the end-to-end tooling & observability architecture that supports pre-migration, migration, and post-migration operations.
· Assess current on-prem tooling (monitoring, logging, APM, ITSM, event management) and define cloud-aligned target tooling architecture for AWS.
· Build a unified observability roadmap covering metrics, logs, traces, dashboards, SLO/SLA monitoring, and event correlation.
2. Migration-Aware Observability Design
· Identify tooling gaps that may arise during the migration of applications, networks, storage, and infrastructure.
· Ensure instrumentation readiness for applications moving via lift & shift, replatforming, containerization, and modernization.
· Define observability patterns for hybrid connectivity, multi-account AWS environments, and multi-region workloads.
3. AWS Cloud-Native Observability Integration
· Design and implement observability using AWS-native capabilities such as:
o CloudWatch, CloudTrail, X-Ray, VPC Flow Logs, GuardDuty, Security Hub
o AWS Control Tower and AWS Organizations, integrated for enterprise-wide visibility
· Ensure seamless integration with third-party enterprise tools such as:
o Datadog, Dynatrace, AppDynamics
o Splunk/ELK
o Prometheus/Grafana
o ServiceNow, Jira, PagerDuty
· Drive modernization of legacy monitoring solutions to cloud-native ecosystems.
4. Tooling Consolidation & Optimization
· Evaluate existing tooling footprint and identify opportunities for consolidation, cost reduction, and simplification.
· Standardize tooling patterns and create reusable templates/playbooks for AWS workloads.
· Drive automation for alerting, dashboards, health checks, and operational insights.
5. Reliability, Performance & SRE Alignment
· Collaborate with platform and SRE teams to enhance observability maturity (SLIs, SLOs, error budgets).
· Build proactive monitoring capabilities to reduce incidents, shorten MTTR, and support predictive operations.
· Ensure the observability platform aligns with enterprise DR, HA, and performance engineering strategies.
6. Governance, Security & Compliance
· Ensure observability tooling adheres to enterprise security, compliance, data governance, and access control policies.
· Define audit-ready logging strategies and ensure end-to-end traceability across hybrid and cloud environments.
· Build governance models for event noise reduction, alert hygiene, and service mapping accuracy.
7. Leadership & Stakeholder Management
· Lead cross-functional teams across application, infrastructure, cloud, DevOps, and security functions.
· Serve as the primary SME for observability decisions, guiding teams through architectural design and implementation.
· Present observability strategy, migration readiness, platform health, and maturity improvements to senior leadership.
· Mentor engineers and drive capability uplift across the organization.
---
Required Skills & Experience
Technical Expertise
· 14+ years of experience in enterprise monitoring, logging, APM, and observability tooling.
· Strong understanding of AWS architecture, cloud-native monitoring tools, and hybrid observability.
· Experience with:
o APM platforms: Dynatrace, AppDynamics, Datadog
o Logging platforms: Splunk, ELK/OpenSearch, CloudWatch Logs
o Metrics & telemetry: Prometheus, Grafana, OpenTelemetry
o Event management: ServiceNow, PagerDuty, Moogsoft, BigPanda
· Strong knowledge of instrumentation for distributed systems, microservices, containers (EKS, ECS), serverless workloads, and legacy systems.
Migration & Architecture Skills
· Proven experience supporting large-scale on-prem to AWS migrations.
· Deep understanding of migration patterns and observability dependencies.
· Hands-on experience designing observability for multi-account AWS landing zones and multi-region architectures.
Soft Skills & Leadership
· Excellent communication, architectural documentation, and executive presentation skills.
· Ability to influence stakeholders across engineering, cloud, SRE, operations, and leadership.
· Experience leading cross-functional teams and managing vendor/tooling relationships.
---
Preferred Qualifications
· AWS Certified Solutions Architect / Cloud Practitioner / DevOps Engineer
· Certifications in observability platforms (Datadog, Dynatrace, Splunk, etc.)
· Knowledge of ITIL, SRE principles, and enterprise operational frameworks
· Experience with automation using Python, Terraform, or CloudFormation
---
Success Indicators
· Smooth transition of observability and tooling through all migration waves.
· Enhanced end-to-end visibility across applications, networks, and infrastructure post-migration.
· Reduction in incidents, MTTR, and monitoring gaps after migration to AWS.
· Standardized tooling practices aligned with enterprise governance and cloud architecture.
· Strong stakeholder confidence and measurable uplift in observability maturity.