About Me


Striving to be a full-stack data scientist who can conduct rigorous research to uncover business insights and also deliver production-level code to build practical applications.

I'm Yingchi, born in China and currently working as a data scientist at Indeed, Singapore. I specialize in building production-level data science solutions in big-data environments, and I have hands-on experience with classical ML methods such as Logistic Regression, Random Forest and Boosting, as well as NLP techniques and neural networks.

Topics of interest: text mining, recommendation, neural networks and more...

I love taekwondo 🥋, piano 🎹 and ice cream 🍦. And I'm keen to learn, experience and share.

Download Resume

Skills


Experience


  • Apr 2019
    |
    Present

    Indeed

    Data Scientist

    Applied NLP techniques (entity embeddings) with tree-based ML models to estimate job salary using structured as well as unstructured text features.

    • Designed and developed the jobseeker salary inference pipeline, including model (re)training with AWS SageMaker, model deployment via REST and gRPC services in Python, and model monitoring with scheduled jobs.

    • Built Python modules for text summarization and ranking to generate representative content items, using NLP techniques such as TextRank and Word2Vec.

    • Prototyped an exploration and exploitation pipeline for dynamic ranking.

  • Jul 2018
    |
    Jan 2019

    Bitmain

    Data Scientist

    Part of the btc.com team.

    • Provided data insights for cryptocurrency mining platforms and blockchain explorers using Airflow-scheduled Spark jobs.

    • Developed a transaction fee prediction engine using Neural Networks and Generalized Linear Models, building the end-to-end process from acquiring real-time data (Python parser with Redis and MySQL) to training and evaluating models.

    • Generated internal data reports using Spark SQL, Hive and graph databases such as Neo4j.

  • Jul 2017
    |
    Jul 2018

    DataSpark

    Data Scientist

    Part of the application team.

    • Researched footfall analytics with telco data using machine learning algorithms such as Naive Bayes, Logistic Regression, and Random Forests. Implemented and productionized the models in our data analytics platform using Python. Submitted two research papers based on this work, one of which was published.

    • Designed and developed the network planning application for telco operators to reduce upgrade costs while improving customer experience. The application was built with Scala and deployed in a big-data environment with Hadoop and Spark.

  • Dec 2016
    |
    Jan 2017

    ViSenze

    Data Analytics Intern

    • Established the internal metrics reporting pipeline by studying the raw data, the current data management system and the requirements of various team leaders.

    • Produced dashboards on system and business performance using Chartio and SQL, enabling stakeholders to make effective decisions.

    • Assisted engineering teams with database design.

  • Jun 2016
    |
    Nov 2016

    DataSpark

    Data Science Intern

    • Conducted geolocation data analysis projects to uncover new features and improve model accuracy by running Hadoop and Spark jobs; implemented reproducible code for the projects using R Markdown and Python.

    • Built interactive data visualizations (web apps) using JavaScript, Node.js and React for internal and external clients.

  • May 2015
    |
    Jul 2015

    Millward Brown

    Market Research Analyst Intern

    • Prepared Budweiser's 2015 Q1 report, which was well received by the client; discovered unusual patterns in the data and initiated deep-dive research to find explanations.

    • Collected and compiled the consumer survey data weekly using SPSS Survey Reporter.

Education


  • 2018
    |
    Present

    National University of Singapore

    Master of Computer Science, 4.83/5.0

    Main courses taken:
    Neural Networks and Deep Learning (CS5242)
    Big-Data Analytics Technology (CS5344)
    Phenomena and Theories of Human-Computer Interaction (CS4249)
    Text Mining (CS5246)
    Knowledge Discovery and Data Mining (CS5228)
    Uncertainty Modelling in AI (CS5340)

  • 2013
    |
    2017

    National University of Singapore

    Bachelor of Business Analytics, 4.91/5.0

    Honours with Highest Distinction 🎓
    Winner of the Lee Kuan Yew Gold Medal 🥇
    Placed on the Dean's List for 5 semesters

    Main courses taken:
    Mining Web Data for Business Insights (BT4222) | Search Engine Optimization & Analytics (BT4212)
    Data Mining (ST4240) | Business Intelligence Systems (IS4240)
    Stochastic Models in Management (DSC3215) | Computational Methods for Business Analytics (BT3102)
    Statistical Methods for Finance (ST4245)
    Social Media Network Analysis (IS4241) | Simulation (ST3247)
    Stochastic Process (ST3236) | Regression Analysis (ST3131)

  • 2016
    |
    2016

    CFA Institute

    Passed Level I of the CFA Program

Publications


Footfall Count Estimation Techniques Using Mobile Data

2017 IEEE 18th International Conference on Mobile Data Management (MDM)

Playground


RNN Chinese Novel Generator

A Chinese text generator built with RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory) layers. The training text is Modu 《默读》, a popular Chinese web novel.

Flask Calendar Integrated with Plotly Charts

A concise calendar (FullCalendar) built with the Flask framework and integrated with plotly.js to showcase interactive charts of the data.

Contact


Somewhere in Singapore

yingchi.pei@gmail.com

Leave me a message :D