Senior Site Reliability Engineer

Posted 2024-10-21


💎 Seniority level: Senior, 5+ years

📍 Location: India

🔍 Industry: Experience Management

🏢 Company: Experience.com

🗣️ Languages: English

⏳ Experience: 5+ years

🪄 Skills: AWS, Docker, Python, Bash, Elasticsearch, GCP, Jenkins, Kubernetes, Jira, Grafana, Prometheus, Collaboration, Terraform

Requirements:

Strong experience with cloud platforms (GCP / AWS).
Proficiency in configuration management and infrastructure as code tools (e.g., Terraform, Ansible, Puppet, Chef).
Deep understanding of containerization and orchestration technologies (Docker, Kubernetes).
Solid knowledge of scripting languages (Shell, Python, Bash).
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
Strong background in Linux/Unix Administration.
Familiarity with delivery pipeline configuration tools like Jenkins.
Knowledge of application performance monitoring tools (e.g., New Relic, Sumo Logic).
Familiarity with networking concepts (VPC, subnets, security principles).
Experience with distributed data systems (Elasticsearch, Atlas Search).

Responsibilities:

Make infrastructure simpler and more reliable.
Support development teams managing various systems and applications.
Analyze client operations for improvement opportunities.
Evaluate existing platforms and provide performance improvement recommendations.
Build automated provisioning and deployments.
Automate repetitive tasks across applications.
Ensure security for infrastructure, applications, data, and networks.
Coordinate change requests for infrastructure and applications.
Provide operational support for maintenance activities.
Generate availability and ad-hoc reports.
Maintain operational documentation and technical specifications.
Define DevOps quality guidelines and ensure compliance.
Assist in load and performance testing and disaster recovery planning.
Strive for team success.


Related Jobs


🔥 Senior Site Reliability Engineer, Databases

Posted 2024-08-28

📍 Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Arab Emirates, United Kingdom, United States of America, Uruguay

🧭 Full-Time

💸 109000 - 169000 USD per year

🔍 Nonprofit, Technology

Requirements:

Proficient in automation, programming, and scripting.
Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.) as well as modern observability infrastructure (Prometheus, Grafana, Logstash/Kibana, Icinga/Nagios, etc.).
Advanced knowledge of Linux and IO/data storage concepts, internals and troubleshooting.
Experience remotely managing both bare-metal servers and virtualized environments.
5+ years of experience in an SRE/Operations/DevOps role as part of a team.
Experience with high traffic and highly available website architectures and operations.
Strong English language skills.
Ability to work independently in a fast-paced environment and as an effective part of a globally distributed team, using ticket tracking systems and asynchronous communication tools.
B.Sc. or M.Sc. in Computer Science or equivalent work experience.

Responsibilities:

Operation, maintenance, troubleshooting and automation of relational database systems in production and staging environments.
Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution.
Improving observability (alerting, metrics, monitoring) of database infrastructure.
Multi-datacenter systems design, capacity and infrastructure planning.
Taking part in incident response, diagnosis and follow-up on system outages or alerts across Wikimedia's production infrastructure and participating in an on call rotation.

🪄 Skills: SQL, Kibana, C, Cassandra, Grafana, Prometheus, Redis

🔥 Senior Site Reliability Engineer, Databases

Posted 2024-08-28

🧭 Full-Time

💸 109000 - 169000 USD per year

🔍 Nonprofit, knowledge sharing

Requirements:

Proficient in automation, programming, and scripting
Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.)
Advanced knowledge of Linux and IO/data storage concepts
Experience remotely managing both bare-metal servers and virtualized environments
5+ years of experience in an SRE/Operations/DevOps role
Experience with high traffic and highly available website architectures
Strong English language skills
Ability to work independently in a fast-paced environment

Responsibilities:

Operation, maintenance, troubleshooting and automation of relational database systems in production and staging environments
Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution
Improving observability of database infrastructure
Designing multi-datacenter systems, capacity planning, and infrastructure planning
Participating in incident response and on-call rotation for system outages or alerts

🪄 Skills: SQL, Kibana, C, Cassandra, Grafana, Prometheus, Redis, Linux

🔥 Senior Site Reliability Engineer, Data Engineering

Posted 2024-08-22

🧭 Full-Time

💸 109047 - 169455 USD per year

🔍 Nonprofit / Technology

Requirements:

At least two years of experience in an SRE/Operations/DevOps role as part of a team.
Experience supporting high availability distributed production systems.
Experience with database administration and support.
Comfortable with configuration management and orchestration tools (e.g., Puppet, Ansible, Chef, SaltStack).
Knowledge of modern observability infrastructure (monitoring, metrics, and logging).
Proficient in shell and scripting languages such as Python, Go, Bash, Ruby.
Good understanding of Linux/Unix fundamentals and debugging skills.
Excellent written and verbal communication skills.
BS or MS degree in Computer Science or equivalent work experience.

Responsibilities:

Deploy, configure, and maintain the distributed data systems that comprise our data and analytics platform.
Implement data quality monitoring that alerts the team to possible data issues.
Collaborate closely with the Fundraising team to integrate and use data from self-hosted and third-party sources.
Provide engineering support during high-traffic or critical campaigns.
Write and update internal documentation of systems and processes.
Ensure compliance with regulations like the Donor Privacy Policy, GDPR, and PCI DSS.
Create and manage users and permissions for data access control.
Advise on data input best practices and develop processes for data entry consistency.
Work closely with Fundraising Analytics to gather and prioritize data enhancement requests.

🪄 Skills: Python, Bash, Ruby, C, Data engineering, Go, Communication Skills, Collaboration

