About Me

I strive to be a full-stack data scientist who can conduct rigorous research to uncover business insights and deliver production-level code to build practical applications.

I'm Yingchi, born in China and currently working as a data scientist at Indeed, Singapore. I specialize in building production-level data science solutions in big data environments, with hands-on experience in classical ML methods such as Logistic Regression, Random Forest, and Boosting, as well as NLP techniques and neural networks.

Topics of interest: text mining, recommendation, neural networks, and more.

I love taekwondo 🥋, piano 🎹 and ice cream 🍦. And I'm keen to learn, experience and share.




  • Apr 2019


    Data Scientist

    Applied NLP techniques (entity embeddings) with tree-based ML models to estimate job salary using structured as well as unstructured text features.

    • Designed and developed the jobseeker salary inference pipeline including model (re)training with AWS SageMaker, model deployment by setting up REST and gRPC service from Python, and model monitoring with scheduled jobs.

    • Built Python modules for text summarization and ranking to generate representative content items, using NLP techniques such as TextRank and Word2Vec.

    • Prototyped an exploration-and-exploitation pipeline for dynamic ranking.

  • Jul 2018
    Jan 2019


    Data Scientist

    Part of the btc.com team.

    • Provided data insights for cryptocurrency mining platforms and blockchain explorers using Airflow-scheduled Spark jobs.

    • Developed a transaction fee prediction engine using Neural Networks and Generalized Linear Models, building the end-to-end process from acquiring real-time data (Python parser with Redis and MySQL) to training and evaluating models.

    • Generated internal data reports using Spark SQL, Hive, and graph databases such as Neo4j.

  • Jul 2017
    Jul 2018


    Data Scientist

    Worked in the application team.

    • Researched footfall analytics with telco data using machine learning algorithms such as Naive Bayes, Logistic Regression, and Random Forests. Implemented and productionized the models in our data analytics platform using Python. Submitted two research papers based on this work, one of which was published.

    • Designed and developed a network planning application for telco operators to reduce upgrade costs while improving customer experience. The application was built with Scala and deployed in a big-data environment with Hadoop and Spark.

  • Dec 2016
    Jan 2017


    Data Analytics Intern

    • Established the internal metrics reporting pipeline by studying the raw data, the existing data management system, and the requirements of various team leaders.

    • Produced dashboards on system and business performance using Chartio and SQL, enabling stakeholders to make effective decisions.

    • Assisted engineering teams with database design.

  • Jun 2016
    Nov 2016


    Data Science Intern

    • Conducted geolocation data analysis projects to uncover new features and improve model accuracy by running Hadoop and Spark jobs; implemented reproducible code for the projects using R Markdown and Python.

    • Built interactive data visualizations (web apps) using JavaScript, Node.js, and React for internal and external clients.

  • May 2015
    Jul 2015

    Millward Brown

    Market Research Analyst Intern

    • Prepared Budweiser's 2015 Q1 report, which was well received by the client; discovered unusual patterns in the data and initiated deep-dive research to explain them.

    • Collected and compiled the consumer survey data weekly using SPSS Survey Reporter.


  • 2018

    National University of Singapore

    Master of Computer Science, 4.83/5.0

    Main courses taken:
    Neural Networks and Deep Learning (CS5242)
    Big-Data Analytics Technology (CS5344)
    Phenomena and Theories of Human-Computer Interaction (CS4249)
    Text Mining (CS5246)
    Knowledge Discovery and Data Mining (CS5228)
    Uncertainty Modelling in AI (CS5340)

  • 2013

    National University of Singapore

    Bachelor of Business Analytics, 4.91/5.0

    Honours with Highest Distinction 🎓
    Winner of the Lee Kuan Yew Gold Medal 🥇
    Placed on the Dean's List for 5 semesters

    Main courses taken:
    Mining Web Data for Business Insights (BT4222) | Search Engine Optimization & Analytics (BT4212)
    Data Mining (ST4240) | Business Intelligence Systems (IS4240)
    Stochastic Models in Management (DSC3215) | Computational Methods for Business Analytics (BT3102)
    Statistical Methods for Finance (ST4245)
    Social Media Network Analysis (IS4241) | Simulation (ST3247)
    Stochastic Process (ST3236) | Regression Analysis (ST3131)

  • 2016

    CFA Institute

    Passed Level I of the CFA Program


Footfall Count Estimation Techniques Using Mobile Data

2017 IEEE 18th International Conference on Mobile Data Management (MDM)


RNN Chinese Novel Generator

A Chinese text generator built with RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory) layers. The training text is Modu 《默读》, a popular Chinese web novel.

Flask Calendar Integrated with Plotly Charts

A concise calendar (FullCalendar) built with the Flask framework and integrated with plotly.js to showcase interactive charts for the data.


Somewhere in Singapore


Leave me a message :D