Site Reliability Engineer (SRE) Job at Kanshe Infotech, Alpharetta, GA

  • Kanshe Infotech
  • Alpharetta, GA

Job Description

Job Title: Site Reliability Engineer (SRE)

Location: Alpharetta, GA (local candidates only)

We are looking for an experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in DevOps, cloud infrastructure, automation, monitoring, and system reliability. You will be responsible for ensuring high availability, scalability, and performance of production systems while driving operational excellence through automation.

Key Responsibilities:

  • Design, build, and maintain scalable and reliable infrastructure on AWS, Azure, or GCP.

  • Develop automation for deployment, monitoring, and incident response.

  • Implement CI/CD pipelines using tools like Jenkins, GitHub Actions, or GitLab CI.

  • Monitor system performance to maintain uptime and to optimize latency and capacity.

  • Build and maintain infrastructure as code using Terraform, Ansible, or CloudFormation.

  • Collaborate with development teams to improve system reliability and deployment processes.

  • Implement robust monitoring, alerting, and logging using Prometheus, Grafana, ELK, or Datadog.

  • Participate in on-call rotations, incident response, and root cause analysis.

Required Skills:

  • 10+ years of experience as an SRE, DevOps, or Cloud Engineer.

  • Hands-on experience with AWS, Azure, or GCP.

  • Strong scripting skills in Python, Bash, or Go.

  • Proficient with Docker, Kubernetes, and Helm.

  • Experience with Terraform, Ansible, or other IaC tools.

  • Expertise in monitoring & observability tools (Prometheus, Grafana, Splunk, ELK, Datadog).

  • Solid understanding of Linux system administration and networking concepts.

  • Strong troubleshooting and problem-solving skills.

Preferred Skills:

  • Experience with microservices and service mesh (Istio/Linkerd).

  • Familiarity with security best practices and incident management.

  • Experience in performance tuning and capacity planning.

  • Exposure to SLA/SLO/SLI management and reliability metrics.

Education:

  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.

Job Tags

Local area
