Here’s the scenario: a new Kaggle competition, a new dataset. Gigabytes? Ouch! Cold shivers as you anticipate hours of waiting to extract features and train models, then middle-of-the-night cold feet as you’re ‘just checking’ that your Python script is still running.

Not familiar with H2O, Spark’s MLlib or GraphLab? Fear not!

Stupendous Scikit-learn will come to your rescue with its line-up of out-of-core classifiers.
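
To give a taste of what out-of-core learning looks like, here is a minimal sketch rather than the code from the full article. It assumes the data arrives in mini-batches and uses SGDClassifier, one of the scikit-learn estimators that supports partial_fit. The iter_minibatches helper and the synthetic data are made up for illustration; in real life each batch would be read from disk, for example one chunk of a large CSV at a time.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_minibatches(n_batches, batch_size, n_features=20, seed=0):
    """Hypothetical stand-in for reading chunks of a large file from disk."""
    rng = np.random.default_rng(seed)
    true_weights = rng.normal(size=n_features)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X @ true_weights > 0).astype(int)  # linearly separable labels
        yield X, y


batches = iter_minibatches(n_batches=51, batch_size=1000)

# Hold out the first batch for evaluation; the model never trains on it.
X_test, y_test = next(batches)

clf = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])  # must be known up front for partial_fit

# Only one mini-batch lives in memory at a time.
for X_batch, y_batch in batches:
    clf.partial_fit(X_batch, y_batch, classes=all_classes)

accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

The point to notice is that the model is updated batch by batch, so memory use depends on the batch size and not on the size of the full dataset.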

The rest of the story is on the Open Data Science Conference Blog: Riding on Large Data with Scikit-learn by yours truly.


If you liked this post, please share it on Twitter and leave me your feedback, questions, comments, and suggestions below. Much appreciated :)