Replica Analytics is recruiting data scientists to join our fast-growing startup. As part of the data science team, you will coordinate with external partners and clients on data synthesis projects, and research and implement improvements to existing data synthesis pipelines. Multiple roles are open for senior and junior data scientists.
The work involves the use and improvement of statistical machine learning methods and deep learning methods for synthetic data generation problems. This includes working with simple tabular datasets as well as complex longitudinal and high-dimensional data.
The role involves extensive experimentation and the development of novel utility and privacy metrics for evaluating synthetic data.
- Maintain and improve existing production and quality control pipelines for synthetic data deliverables
- Communicate and coordinate with clients on data synthesis deliveries
- Participate in client education on data synthesis technologies
Research & Development
- Contribute to the development of new technologies for data synthesis using a wide variety of machine learning methods; investigate various research topics in machine learning and statistics to determine the best method for data synthesis
- Contribute to the implementation and testing of production and research pipelines in Python and R as well as other languages
- Contribute to the dissemination of research results in the form of peer-reviewed papers, reports, and presentations
BSc/MSc/PhD degree (or equivalent) in mathematics, statistics, computer science, or electrical engineering
Work experience: 1 year for candidates with a PhD / 2 years for candidates with an MSc / 3 years for candidates with a BSc
Demonstrated ability to conduct statistical and machine learning research (in the form of a thesis, publications, or side projects) and to solve problems independently
Proficient in Python or R programming for data science (data cleaning/pre-processing, classification and regression, model evaluation, data visualization, writing and applying custom functions, parallelization)
Deep learning experience with PyTorch or TensorFlow would be a big plus
Excellent organizational and communication skills (written and oral)
Motivated to learn and apply new machine learning methods to solve real-life problems
- Experience working with health care data
- Knowledge of SAS and SAS programming would be a plus