Head of Cloud & Platform

Posted 2 months ago
Brazil, Argentina | Full-Time | Fintech, Digital Banking
Company: RecargaPay
Location: Brazil, Argentina
Languages: English, Spanish
Seniority level: Lead
Experience: Extensive hands-on experience in software engineering roles
Skills:
AWS, Leadership, Python, SQL, Agile, AWS EKS, Java, Kafka, Kubernetes, SCRUM, Software Architecture, Spring Boot, Cross-functional Team Leadership, Strategy, Data engineering, Grafana, CI/CD, Terraform, Microservices, Communication Skills, Analytical Skills, Collaboration, Problem Solving, Mentoring, Compliance, Software Engineering, Change Management, Networking, Critical thinking, Teamwork, Troubleshooting, Risk Management, Process improvement, Technical support, Data analytics, Data management
Requirements:
Academic background oriented toward Computer Science, Engineering, or Software Development disciplines
Deep expertise in AWS cloud architecture, including multi-account management, VPC design, EKS, ECS, Lambda, and networking topologies
Proven experience with Infrastructure as Code (Terraform, Pulumi) and GitOps automation at scale
Strong understanding of Kubernetes internals, workload orchestration, and cost/performance optimization
Experience implementing SRE and reliability frameworks: SLOs, error budgets, chaos testing, and automated incident remediation
Mastery of observability and monitoring (CloudWatch, Grafana, Datadog, NewRelic) with trace/metric/log correlation
Proficiency in security and compliance engineering: IAM, KMS, encryption, secrets lifecycle, policy enforcement (OPA/Rego), and regulatory controls (PCI, LGPD, GDPR)
Experience defining and governing API and event-driven architectures (OpenAPI/AsyncAPI, Kafka schema registries)
Deep knowledge of progressive delivery, service mesh (e.g., Istio), and DevSecOps pipelines
Strong FinOps acumen: right-sizing, egress optimization, reserved instance and savings plan strategy, and service-level cost attribution
Experience integrating AI-assisted workflows (GitHub Copilot Enterprise, LLM-based linters and others) into development and CI pipelines, with measurable productivity impact
Extensive hands-on experience in software engineering roles, with solid proficiency in Java (Spring Boot) and working knowledge of Python and asynchronous programming
Strong foundation in Object-Oriented Programming and relational database systems
Solid understanding of web and mobile application architectures, including security, session management, and development best practices
Expertise in Domain-Driven Design and microservices architecture, with proven ability to design high-performance, scalable, and reliable distributed systems
Demonstrated experience defining and executing architectural roadmaps aligned with business and developer-experience goals
Deep knowledge of networking in AWS
Advanced experience architecting VPC topologies, including Transit Gateway, private/public subnet design, NAT/GW cost optimization, and egress control for regulated environments
Hands-on experience implementing observability pipelines at scale, integrating NewRelic, CloudWatch, Prometheus, Grafana, Datadog
Familiarity with EKS internals: node group management, autoscaling, and Kubernetes cost/latency optimization
Proven experience managing multi-region and multi-environment deployments
Expertise in AWS security hardening and compliance controls, including IAM least-privilege modeling, KMS envelope encryption, CloudTrail auditing, GuardDuty detections, and automatic remediation with Lambda/Step Functions
Deep understanding of container security, image signing, ECR scanning, and OPA/Rego policy design for admission controllers
Advanced experience with Infrastructure as Code using Terraform (modules, workspaces, policy enforcement) and Pulumi (multi-language stacks, secrets providers, CI integration)
Proven ability to implement GitOps workflows, ensuring deterministic deployments and drift detection
Strong policy-as-code practice to codify security/SRE guardrails across CI/CD and Kubernetes admission controllers
Expertise automating application stack provisioning (app resources, service accounts, IAM bindings, egress controls) through reusable IaC modules and pipelines
Deep understanding of progressive delivery (canary, blue/green, shadow traffic, automated rollback) and service mesh (Istio/Linkerd/App Mesh) for safe deployment strategies
Mastery of resilience and reliability patterns: timeouts, bounded retries with jitter, circuit breakers, bulkheads, back-pressure, outbox/saga orchestration, and graceful degradation
Deep knowledge of event-driven and streaming architectures (Kafka and others), including partitioning strategies, compaction/retention policies, rebalancing, ordering guarantees, exactly-once semantics, and schema evolution via registries
Strong background in data performance engineering: caching (read-through/write-behind), connection pool tuning, pagination/cursoring, latency budgeting, and throughput modeling
Experience with SLO-driven reliability: defining SLIs, error budgets, and reducing alert fatigue via multi-signal correlation
Proficiency with production monitoring tools (NewRelic, Grafana, Datadog, CloudWatch) and advanced observability instrumentation
Proven experience building self-service developer platforms (Backstage, Internal Developer Portals) that expose golden paths for application scaffolding, environment provisioning, and secure deployments
Experience implementing event-driven DevEx tooling (e.g., ephemeral environments, automated CI insights, preview deployments)
Strong knowledge of API lifecycle management and governance (OpenAPI/AsyncAPI, contract testing, versioning, idempotency, error modeling)
Expertise in CI/CD automation and DevSecOps (GitHub Actions, CodeBuild/CodePipeline, artifact provenance, environment promotion, changelog automation)
Practical compliance-by-design experience translating PCI-DSS, KYC/AML, GDPR, and LGPD controls into technical patterns (tokenization, segmentation, audit trails, retention/erasure)
Experience leading AWS Well-Architected Framework reviews across all pillars (Security, Reliability, Performance, Cost, Operational Excellence, Sustainability)
Experience designing cost-aware architectures, balancing performance, resilience, and financial efficiency
Exposure to edge computing and CDN optimization (Lambda@Edge, CloudFront Functions, custom caching policies)
Fluent in English and Spanish (Portuguese a plus)
Exceptional leadership in cross-functional environments
Strong communicator able to influence C-level stakeholders
Proven ability to mentor senior technical leaders
High decision-making autonomy with strong prioritization and problem-solving under ambiguity
Strategic mindset with the ability to translate complex business goals into long-term technical direction
Strategic thinker with a bias for measurable outcomes
Strong decision-making and prioritization abilities under ambiguity
Active contributor to a constructive feedback culture
Comfort operating at both strategic and hands-on levels
Ability to learn rapidly and adapt to new technologies
Responsibilities:
Define and execute the Cloud and Platform strategy
Lead a multi-disciplinary organization covering Cloud Infrastructure, SRE, Platform Engineering, and DevSecOps
Drive modernization of infrastructure and delivery pipelines
Partner with executive leadership to define scalable operating models
Establish a long-term architectural vision for cloud services, platform frameworks, and developer enablement tools
Sponsor AI-assisted engineering adoption
Serve as the ultimate technical and strategic authority for AWS, Kubernetes, IaC, Observability, and Reliability practices
Oversee the design, scalability, and governance of the AWS multi-account organization
Lead the definition and implementation of multi-region, multi-environment architectures
Institutionalize well-architected principles
Evolve network and connectivity architectures
Own identity, access, and secrets management lifecycle
Oversee monitoring and observability frameworks
Ensure SLO-driven operations
Lead resilience and reliability engineering practices
Build and scale the company’s Internal Developer Platform (IDP)
Define golden paths, opinionated tooling, and reusable infrastructure modules
Ensure trunk-based development, progressive delivery, automated rollback, and health/SLO-gated deployments
Drive GitOps adoption
Expand event-driven and streaming platforms
Partner with Security and Compliance to embed DevSecOps and Policy-as-Code practices
Establish and lead a FinOps program
Define cost-to-serve models per service
Integrate cost and performance telemetry into platform dashboards
Partner with Finance to align cloud spend forecasts
Lead and mentor senior engineering managers and principal engineers
Promote a culture of reliability, automation, and continuous improvement
Establish governance rhythms
Collaborate closely with Risk, Compliance, and Security to uphold standards