CLICK HERE: Code repo
CLICK HERE: Full Medium Blog
Overview
Feature engineering can take up to 80% of a data scientist's time and requires a lot of domain knowledge. The process is tedious, and similar tasks are repeated across projects. The web app I created frees data scientists from this repetitive coding: once a few parameters are defined in the user interface, the features are generated automatically, which could save a large part of that estimated 80%.
The app takes user inputs, such as the tables (entities) to be used. It then proposes a set of features, which the user can adjust.
Challenges
- How to modularize feature engineering components and reuse them effectively and efficiently
- Dynamic selection and interactive tables
- Make the app portable and easy to install
Solutions
- Took advantage of a third-party package, “featuretools”, which takes over the repeated tasks.
- Used MongoDB to store the user input, and jQuery, Ajax, and JavaScript to make the tables editable and interactive.
- Used two Docker containers, one for the app and one for MongoDB, and Docker Compose to organize them.
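To make the first solution more concrete, here is a minimal sketch of the idea behind automated feature generation: the user picks a parent table, a child table, a linking key, and a few aggregation primitives, and the features are derived from those parameters. This is a plain-Python stand-in written for illustration only. It is not the featuretools API and not the app's actual code, and every name in it (`generate_features`, `PRIMITIVES`, the `customers` and `orders` tables) is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aggregation primitives a user could select in the UI.
PRIMITIVES = {
    "sum": sum,
    "mean": mean,
    "max": max,
    "count": len,
}


def generate_features(parents, children, key, value_columns, primitives):
    """Aggregate child rows onto each parent row, producing one feature
    per (primitive, column) pair, e.g. MEAN(orders.amount)."""
    # Group the child rows by the key that links them to a parent row.
    grouped = defaultdict(list)
    for row in children["rows"]:
        grouped[row[key]].append(row)

    features = []
    for parent in parents["rows"]:
        out = dict(parent)
        related = grouped.get(parent[key], [])
        for column in value_columns:
            values = [r[column] for r in related]
            for name in primitives:
                label = f"{name.upper()}({children['name']}.{column})"
                # Parents without any child rows get None for every feature.
                out[label] = PRIMITIVES[name](values) if values else None
        features.append(out)
    return features


# Toy tables (entities) standing in for the user's input.
customers = {"name": "customers", "rows": [{"customer_id": 1}, {"customer_id": 2}]}
orders = {
    "name": "orders",
    "rows": [
        {"customer_id": 1, "amount": 10.0},
        {"customer_id": 1, "amount": 30.0},
        {"customer_id": 2, "amount": 5.0},
    ],
}

result = generate_features(
    customers, orders, "customer_id", ["amount"], ["sum", "mean", "count"]
)
```

Because the features are driven entirely by the parameters passed in, the same function can be reused across projects by changing only the tables, the key, and the selected primitives.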
Details
I’ve already written an article about this project on Medium, so please check it out through the link above for the details.