We are currently looking for a Data Scientist / Machine Learning Engineer to join our general-purpose core data science team and work on tasks covering a wide variety of business needs, with a soft focus on NLP or CV applications.
In this position, you will work with multiple data sources (usually textual, numerical and time-related data) and with both large and small datasets to develop, validate and deploy machine learning models, tune their performance and integrate them into data processing pipelines.
Responsibilities:
* Work with both structured and unstructured data; collaborate with data engineers on defining data storage formats and specifying data collection requirements;
* Not only solve technical tasks but also understand business needs, propose appropriate solutions and explain the chosen approach to non-technical stakeholders;
* Set up reproducible experiments: selection, training, validation and optimization of machine learning models, and evaluation of their quality in business terms;
* Integrate data preprocessing and model inference into general data processing pipelines;
* Research new tools, papers and other developments in machine learning.
Requirements:
* Strong knowledge and deep understanding of:
  * Classical machine learning (linear models, decision trees, ensembles for classification and regression tasks, clustering and dimensionality reduction)
  * Main concepts and stages of the modelling process (validation scheme, regularization, overfitting and generalization, data leaks, feature selection, etc.)
* Experience with Python scientific, visualization and ML-related libraries (numpy, scipy, scikit-learn, etc.)
* Experience with different clustering techniques
* Experience with classic NLP tools and techniques (NLTK, spaCy, n-grams, skip-grams, TF-IDF, tokenizers, lemmatization, dependency parsing, etc.)
* Experience with NN frameworks, NLP-related architectures and libraries (PyTorch / TensorFlow, Hugging Face, fastText, Flair, sentence-transformers, Word2Vec, ELMo, RNN, CNN, Transformer, BERT, etc.)
* Experience fine-tuning pre-trained models for various NLP tasks
* Good Python programming skills
* Good spoken and written English (at least B1)
* Ability and desire to convert raw business requests into strictly formulated machine learning tasks
* Ability to formulate data gathering (or data labelling) requirements
* At least 2 years of experience in machine learning