Pluralis Research

Private Company
ShareTweet

Open Positions3

Competitive base salary for senior engineering roles in AustraliaFull-TimeMachine LearningPosted
  • Design and implement large-scale distributed training systems optimized for heterogeneous hardware operating under low-bandwidth, high-latency conditions.
  • Develop and optimize model-parallel training strategies (data, tensor, pipeline parallelism) with custom sharding techniques that minimize communication overhead.
  • Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.
  • Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.
  • Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.
  • Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.
  • Design and optimize peer-to-peer topologies for decentralized coordination across non-co-located nodes.
  • Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.
  • Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.
PythonMachine LearninggRPC+1 more
Showing 1 of 3 positions

Similar Companies