Data Scientist II - Big Data R&D, Identity Graph & KYC

Socure · Identity Trust Infrastructure
Remote (US) · Full-Time · Mid-level
Salary not disclosed

Job Details

Experience
Master’s degree with 2+ years of experience, or Ph.D. with 1+ years of experience
Required Skills
AWS, Python, SQL, Spark, Scala, scikit-learn, PySpark

Requirements

  • Master’s degree with 2+ years of experience, or Ph.D. with 1+ years of experience in data science or analytics
  • Proficiency in Python or Scala
  • Solid experience writing and optimizing SQL for large datasets
  • Comfort working in data lake and data warehouse environments
  • Hands‑on experience with Spark or PySpark
  • Experience with common ML libraries (e.g., scikit‑learn, XGBoost)
  • Familiarity with UNIX environments and the AWS ecosystem (e.g., EMR, S3)
  • Working knowledge of supervised/unsupervised ML and basic statistics (similarity measures, clustering, evaluation metrics)
  • Exposure to graph techniques or graph databases (Neo4j, AWS Neptune, GraphFrames) is a strong plus
  • Experience with Elasticsearch or DynamoDB is a bonus
  • Experience with workflow tools such as Airflow for automating data pipelines is a bonus
  • Ability to break down loosely defined problems and iterate quickly with feedback

Responsibilities

  • Contribute to the design and implementation of machine learning, data mining, statistical, and graph-based algorithms for identity verification and anomaly detection.
  • Analyze large datasets to develop and refine entity-resolution and identity-matching algorithms for KYC and compliance solutions.
  • Build and maintain data-processing pipelines (ETL, feature generation, normalization) using Spark/PySpark and AWS (e.g., EMR, S3).
  • Support senior data scientists with feature engineering, data exploration, error analysis, and A/B test setup.
  • Evaluate new third‑party and internal data sources, profile data quality, design offline experiments, and summarize impact.
  • Implement and maintain SQL and Python/R code for data extraction, transformation, and validation, including code reviews and testing.
  • Provide analytical support to compliance and regulatory product teams, including ad hoc investigations, dashboards, and data deep dives.
  • Communicate findings clearly to peers and cross‑functional partners (Product, Engineering, Client Analysis).
  • Work effectively in a fast‑paced, cross‑functional environment, demonstrating ownership of tasks.