Data Scientist

Responsibilities include:

  • Conduct in-depth empirical research on trading strategies and test the results of those strategies
  • Establish scalable, efficient, automated processes for large-scale time-series data analyses
  • Manipulate and analyze complex, high-volume time-series data sets
  • Develop distributed/parallel solutions for data analysis, mining and visualization of structured and unstructured time-series data sets
  • Develop, optimize, parallelize and implement novel algorithms for statistical modeling and data mining of complex time-series data on high-performance multicore and distributed architectures, including Intel Xeon Phi, distributed clusters and CUDA (desired but not essential)
  • Provide expertise in Machine Learning to cross-functional teams, working throughout product development life cycles and in support of production trading operations
  • Monitor and continuously evaluate new methodologies and third-party technologies addressing analysis of large data sets applicable to time-series data
  • Develop and execute analyses that support decision making in diverse areas such as technology assessment, analytical method development and validation, and process improvement and optimization
  • Select and apply appropriate machine learning methodologies to all project-related studies, providing actionable results under general strategic guidance
  • Deliver, in a timely manner, crucial components used to validate algorithmic trading strategies

Requirements:

  • Master’s degree in Computer Science, Statistics, or a related discipline, with 1+ years of work experience
  • Experience handling gigabyte- and terabyte-scale datasets and working with distributed systems
  • Experience applying descriptive and inferential statistics to big data problems
  • Fluent in theory and application of standard machine learning or data mining algorithms
  • Ability to utilize in-house file systems, databases, and data flow control systems built in C++, with new languages and technologies continuously being evaluated
  • Familiarity with big data technologies with ability to identify the best technology for a given problem
  • Comfortable with C/C++ and scripting (awk and Python preferred)
  • Able to integrate and apply feedback in a professional manner
  • Able to work independently or as part of a team