dineshc227/README.md

Site Reliability Engineer | Observability | Java App Support | DevOps Engineer

Email Mobile Portfolio

🎯 Professional Summary

Results-driven Site Reliability Engineer with 4+ years of experience ensuring high availability and performance for mission-critical payment platforms on AWS. At DXC Technology:

  • 🔻 Reduced MTTR by 30% through Python-based automation and structured incident response workflows
  • 🔕 Cut alert noise by 40% via systematic Datadog monitor optimization, directly improving on-call quality and MTTD
  • ✅ Sustained 99.9%+ uptime across 50+ microservices processing millions of financial transactions daily
  • 🔄 Reduced P1/P2 repeat incidents by 25% through RCA-driven root cause elimination and permanent fixes

Deep expertise in incident command, Kubernetes, CI/CD pipelines, Terraform IaC, and production Java/Spring Boot systems.


🔑 Key Skills

  • Cloud & Infrastructure: AWS (EC2, S3, VPC, IAM, Auto Scaling), Kubernetes, Docker, Terraform
  • Observability & Monitoring: Datadog (APM, Logs, SLOs, Monitors), Splunk, Grafana, New Relic, Dynatrace
  • SRE Practices: Incident Management, P1/P2 War Rooms, RCA, SLI/SLO, Error Budgets, Alerting, On-Call
  • Programming: Python (automation, log analysis, alerting scripts), Java
  • CI/CD & DevOps: Azure DevOps, Jenkins, GitHub Actions, Git, Maven
  • Frameworks & Databases: Spring Boot, Spring MVC, Spring Data JPA, Spring Cloud | MySQL, PostgreSQL
  • Ticketing Tools: Jira, ServiceNow
  • ITIL Practices: Incident, Change, Major Incident, and Problem Management

Monitoring & Observability

Proficient in the end-to-end administration of a comprehensive APM and monitoring stack, including:


Tools: Datadog | Grafana | Kibana | New Relic | Dynatrace | Splunk

  • Datadog Administration: Onboarding services, configuring agents, tuning metrics collection, and managing monitors end-to-end.
  • Visualization: Designing Datadog dashboards and SLO tracking for real-time visibility across logs, metrics, and APM traces.
  • Alerting: Optimizing monitor thresholds to reduce alert noise by 40%, improving MTTD and on-call quality.
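
As an illustration of the monitor tuning described above, here is a minimal sketch of how noisy monitors might be ranked before adjusting their thresholds. It assumes alert history has already been exported into simple records. The `AlertEvent` fields, the `noisy_monitors` helper, and both cut-off values are hypothetical; none of them are part of the Datadog API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertEvent:
    monitor: str      # monitor name (hypothetical export field)
    actionable: bool  # True if the on-call engineer had to act


def noisy_monitors(events, min_alerts=3, max_actionable_ratio=0.2):
    """Return (monitor, alert_count, actionable_ratio) for monitors that
    fire often but rarely need action, sorted by alert count descending."""
    total = Counter(e.monitor for e in events)
    actionable = Counter(e.monitor for e in events if e.actionable)
    noisy = []
    for monitor, count in total.items():
        ratio = actionable[monitor] / count
        if count >= min_alerts and ratio <= max_actionable_ratio:
            noisy.append((monitor, count, ratio))
    return sorted(noisy, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Illustrative data only, not real alert history.
    history = (
        [AlertEvent("cpu-high", False)] * 5
        + [AlertEvent("payment-5xx", True)] * 3
        + [AlertEvent("disk-usage", False)] * 2
    )
    for monitor, count, ratio in noisy_monitors(history):
        print(f"{monitor}: {count} alerts, {ratio:.0%} actionable")
```

Monitors that surface in this list are candidates for a higher threshold, a longer evaluation window, or removal. Monitors that are mostly actionable are left alone.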

Process & Framework

  • Service Management: Skilled in managing SLOs, SLIs, and SLAs to align IT services with business goals.
  • ITIL Practices: Well-versed in ITIL frameworks for Incident, Change, Major Incident, and Problem Management.
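
To make the SLO and error budget terms above concrete, here is a small sketch of the arithmetic behind an availability target. The function names and the 30-day window are assumptions chosen for illustration and do not describe any specific production setup.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After 10.8 minutes of downtime, about 75% of the budget is left.
    print(round(budget_remaining(0.999, 10.8), 2))
```

A negative result from `budget_remaining` means the SLO has been breached for the window. Teams commonly treat that as the signal to prioritize reliability work over new releases.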

🏆 Key Achievements

| Achievement | Impact |
| --- | --- |
| 🔻 Reduced MTTR by 30% | Python automation scripts for alert triage, log correlation, and incident response at Qatar Airways |
| 🔕 Cut alert noise by 40% | Systematic Datadog monitor tuning that improved on-call quality and MTTD |
| ✅ Sustained 99.9%+ uptime | Mission-critical payment infrastructure handling millions of daily international transactions |
| 🔄 Reduced repeat incidents by 25% | RCA-driven root cause elimination with permanent corrective fixes |

💼 Professional Experience

DXC Technology, Bangalore | Site Reliability Engineer (Dec 2022 – Present)

Client: Qatar Airways, Payments Platform | AWS · Datadog · Kubernetes · Python · Java/Spring Boot · Microservices

  • Managed fault-tolerant AWS infrastructure (EC2, VPC, IAM, S3, Auto Scaling) underpinning 50+ microservices processing high-volume international payment transactions.
  • Maintained 99.9%+ uptime for mission-critical financial services, consistently meeting all SLO targets across production environments.
  • Reduced MTTR by 30% by engineering Python automation scripts for alert triage, log correlation, and incident response workflows, eliminating repetitive manual investigation steps.
  • Optimized Datadog monitors and alerting thresholds, reducing alert noise and false positives by 40%, enabling faster and more accurate incident detection.
  • Designed and owned Datadog dashboards and SLO tracking for end-to-end system visibility spanning logs, metrics, and APM traces.
  • Led P1/P2 incident war rooms and post-incident root cause analysis (RCA); implemented permanent corrective actions that cut repeat incidents by 25%.
  • Deployed and managed containerized workloads on Kubernetes: resolved CrashLoopBackOff failures, tuned resource requests and limits, and implemented HPA for cost-effective auto-scaling.
  • Built and maintained CI/CD pipelines via Azure DevOps (Git, Maven), enabling reliable zero-downtime deployments with significantly reduced rollback rates.
  • Provisioned and managed AWS resources using Terraform (IaC), improving environment consistency, reducing provisioning errors, and accelerating deployment velocity.
  • Partnered with development teams to troubleshoot Java/Spring Boot applications by analyzing JVM metrics, heap dumps, GC logs, and API latency data to resolve production performance bottlenecks.
  • Created and escalated Jira & ServiceNow tickets to development teams for faster incident resolution and tracking.
  • Prepared structured incident runbooks and playbooks, shared with clients and business stakeholders for operational clarity.
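
For readers unfamiliar with the HPA work mentioned above, the sketch below shows the basic replica calculation that the Kubernetes documentation describes for the Horizontal Pod Autoscaler: scale the current replica count by the ratio of the observed metric to the target, round up, and skip scaling when the ratio is within a tolerance (0.1 by default). The function name, parameter names, and replica bounds are illustrative assumptions. The real controller also handles missing metrics, pod readiness, and stabilization windows, which are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current_replicas * current / target),
    clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target scale out to 6.
    print(desired_replicas(4, 90, 60))
```

Because the rule scales on the ratio to the target, the CPU and memory requests set on each pod directly affect how aggressively the autoscaler reacts. This is why request tuning and HPA configuration are usually done together.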

Wipro Ltd | Site Reliability Engineer (Apr 2022 – Nov 2022)

Domain: Enterprise Solutions | Critical Transaction Platforms | Datadog · Grafana · Python · AWS · Java/Spring Boot

  • Supported mission-critical AWS environments for international enterprise clients; drove SLO/SLI optimization using Datadog and Grafana.
  • Built Python automation scripts for alert validation and monitoring health checks, improving team efficiency and reducing noise-driven false escalations.
  • Analyzed system logs and cloud deployment patterns to identify recurring failure modes; implemented targeted fixes reducing incident recurrence.
  • Coordinated production readiness reviews for new payment services; improved cross-team onboarding documentation and operational runbooks.

🚀 Personal Projects

Real-Time Observability Platform (Datadog)

  • Built an end-to-end observability stack with custom Datadog dashboards, SLO tracking, log pipelines, and APM traces for a personal microservices environment.
  • Replicated production-grade alerting patterns to validate and refine monitor configurations, achieving a 40% reduction in alert noise.
  • Authored runbooks and incident playbooks as part of an open learning initiative, publicly available at iamdinesh.xyz.

🛠️ Technical Stack

📊 Monitoring & Observability

Datadog | Grafana | Kibana | New Relic | Dynatrace | Splunk

🎫 Ticketing Systems

Jira | ServiceNow

☁️ Cloud & Infrastructure

AWS | Kubernetes | Docker | Terraform

💻 Programming Languages

Python | Java

🗄️ Databases

MySQL | PostgreSQL

🔄 CI/CD

Jenkins | GitHub Actions | Azure DevOps

📋 Practices & Frameworks

SLOs | SLIs | SLAs | ITIL | Agile | SRE | Incident Management | Problem Management | Change Management | Error Budgets

🎯 Java Ecosystem

Spring Boot | Spring MVC | Spring Data JPA | Spring Cloud

🖥️ Operating Systems

Windows | Ubuntu


🎓 Education

Master's Degree, JNTU Anantapur (2017 – 2019)

Transitioned into Site Reliability Engineering through self-directed cloud study, hands-on Java/SQL lab work, and professional on-the-job experience.

📜 Certifications

  • 🔄 AWS Certified Solutions Architect – Associate (In Progress; Exam Scheduled 2026)

📬 Contact Me

If you'd like to collaborate, ask a question, or just say hello, feel free to drop a message!

Email Mobile Location

