https://agevolution.canalrural.com.br/wp-content/uploads/2019/06/Scicrop.jpg

My Scicrop job application!

Check out the GitHub repository:

_images/hiclipart.com.png


This project is a solution to a multi-class supervised learning task. The repository is organized into five notebooks, in which we explore the data and try different approaches:

  1. FirstAnalysis
  2. Missing Values
  3. Machine Learning
  4. Optimize
  5. Validate

Once we have the best solution, we implement it in Python scripts. There are three main scripts:

  1. train_test_split.py
  2. create_models.py
  3. predict.py

This is a complete data pipeline: preparing the data, training, optimizing, saving the models, and predicting the target data. The pipeline can be run with the make command:

make train_predict
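
For readers without make, the same sequence can be approximated in Python. This is only a minimal sketch: the script names come from the list above, but the run order, the absence of command-line arguments, and the helper names `build_commands` and `run_pipeline` are assumptions, not the repository's actual Makefile recipe.

```python
import subprocess
import sys

# Script names are taken from the list above. Running them in this order,
# with no arguments, is an assumption about how the pipeline is wired.
PIPELINE_SCRIPTS = ["train_test_split.py", "create_models.py", "predict.py"]


def build_commands(scripts=PIPELINE_SCRIPTS, python=sys.executable):
    """Return one command (as an argument list) per pipeline stage."""
    return [[python, script] for script in scripts]


def run_pipeline(commands, cwd="."):
    """Run each stage in order and stop at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed stage prevents the later stages from running.
        subprocess.run(command, cwd=cwd, check=True)


# Hypothetical usage, from the repository root:
#     run_pipeline(build_commands())
```

Stopping at the first failed stage mirrors how make aborts a target when one of its commands fails.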

The data pipeline can also be run in a Docker container!

make docker

Here we build the Docker image, mount the host results/ directory into the container, and run the pipeline. The prediction output is saved on the host.
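
The build-and-mount step can be pictured as two Docker invocations. The sketch below only assembles the command lines; the image name `scicrop-pipeline`, the container path `/app/results`, and the helper `docker_commands` are hypothetical placeholders, and the real Makefile target may differ.

```python
from pathlib import Path

IMAGE_NAME = "scicrop-pipeline"      # hypothetical image tag
CONTAINER_RESULTS = "/app/results"   # hypothetical path inside the container


def docker_commands(host_results="results", image=IMAGE_NAME,
                    container_results=CONTAINER_RESULTS):
    """Return the (build, run) argument lists for a bind-mounted pipeline run."""
    # Bind mounts given with -v need an absolute host path.
    host_path = Path(host_results).resolve()
    build = ["docker", "build", "-t", image, "."]
    run = [
        "docker", "run", "--rm",
        # host_dir:container_dir makes files written in the container
        # appear in the host results/ directory.
        "-v", f"{host_path}:{container_results}",
        image,
    ]
    return build, run
```

Because the results directory is bind-mounted, anything the pipeline writes there inside the container remains on the host after the container exits.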