My Scicrop job application!
Check out the GitHub repository:
This project is a solution to a multi-class supervised learning task. The repository is organized into five notebooks in which we explore the data and try different approaches:
- FirstAnalysis
- Missing Values
- Machine Learning
- Optimize
- Validate
Once we have the best solution, we implement it in Python scripts. There are three main scripts:
- train_test_split.py
- create_models.py
- predict.py
This is a complete data pipeline: preparing the data, training, optimizing, saving the models, and predicting the target data. The pipeline can be run using the make command:
    make train_predict
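The stages that the pipeline chains together (split, train, save, predict) can be illustrated with a minimal sketch. It does not reproduce the repository's scripts: it uses only the Python standard library and a toy nearest-centroid classifier as a stand-in for whatever models the project actually trains. Every function name, the synthetic data, and the `model.pkl` file name are assumptions made for illustration.

```python
import pickle
import random
import tempfile
from pathlib import Path


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle (features, label) rows and split them into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_centroids(rows):
    """Fit a toy nearest-centroid model: one mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))


def run_pipeline(rows, model_path):
    """Split, train, save the model, reload it, and predict on the held-out rows."""
    train, test = split_rows(rows)
    model = train_centroids(train)
    model_path.write_bytes(pickle.dumps(model))     # save the trained model
    loaded = pickle.loads(model_path.read_bytes())  # reload, as a separate predict step would
    predictions = [predict(loaded, features) for features, _ in test]
    accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
    return predictions, accuracy


def make_toy_rows(seed=0):
    """Generate three well-separated synthetic classes (hypothetical data)."""
    rng = random.Random(seed)
    centers = {"a": (0.0, 0.0), "b": (5.0, 5.0), "c": (0.0, 5.0)}
    rows = []
    for label, (cx, cy) in centers.items():
        for _ in range(20):
            rows.append(([cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)], label))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        predictions, accuracy = run_pipeline(make_toy_rows(), Path(tmp) / "model.pkl")
        print(f"{len(predictions)} predictions, accuracy {accuracy:.2f}")
```

Saving and reloading the model between training and prediction mirrors the idea of keeping model creation and prediction in separate scripts, so prediction can run later without retraining.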
The data pipeline can also be run in a Docker container:
    make docker
Here we build the Docker image, mount the host results/ directory to the container's
directory, and run the pipeline. The prediction output is saved on the host.
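To make the bind mount concrete, the following sketch builds a `docker run` argument list that maps a host directory into a container. It is not the repository's Makefile recipe: the image name `scicrop-pipeline`, the container path `/app/results`, and the command run inside the container are all placeholders chosen for illustration. Only the `--rm` and `-v host:container` options are standard Docker CLI flags.

```python
from pathlib import Path


def docker_run_command(image, host_dir, container_dir, command):
    """Build the argument list for `docker run` with a single bind mount."""
    # Docker expects an absolute host path in the -v host:container mapping.
    mount = f"{Path(host_dir).resolve()}:{container_dir}"
    return ["docker", "run", "--rm", "-v", mount, image, *command]


if __name__ == "__main__":
    # All values below are placeholders, not taken from the repository.
    args = docker_run_command(
        image="scicrop-pipeline",
        host_dir="results",
        container_dir="/app/results",
        command=["make", "train_predict"],
    )
    print(" ".join(args))
    # To actually launch the container: subprocess.run(args, check=True)
```

Because the host directory is mounted rather than copied, anything the pipeline writes to the mounted path inside the container remains on the host after the container exits.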