Writing Effective Amazon Machine Learning

My article on the Amazon Machine Learning service, first published on the ODSC blog and then republished on KDnuggets, triggered a book project. Shortly after writing that article, I was contacted by Packt Publishing to write an entire book on the Amazon Machine Learning service. Packt Publishing is well known for its many excellent technical books.

The book, titled Effective Amazon Machine Learning, is to be released around April 2017. It is included in Packt Publishing's early access program, and the chapters are made available online as soon as they are written and reviewed. The book has also already been announced on Amazon.

The challenge in writing a book on a rather well-documented AWS service is to avoid rewriting the AWS documentation while still bringing real, valuable knowledge to the reader. My goal with this project is to write a book that will satisfy the beginner data scientist as well as the more seasoned one.

The book in 3 parts

An overview of Machine Learning and Predictive Analytics
The nuts and bolts of Amazon Machine Learning
A more involved part on real-world applications

The Amazon Machine Learning service is built around linear regression and classification and uses the classic stochastic gradient descent (SGD) algorithm. Amazon's approach with this service is to simplify predictive analytics projects. Democratization is the word, which makes it a perfect platform for new data scientists.
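To make the stochastic gradient idea concrete, here is a minimal sketch in plain Python. It is not Amazon's implementation, only an illustration of the general technique on a one-feature linear regression. The function name, the learning rate, the number of epochs and the toy data are all my own assumptions.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=1000, seed=0):
    """Fit y ~ w * x + b with plain stochastic gradient descent.

    Illustrative only: the hyperparameters are arbitrary choices for the
    toy data below, not values used by Amazon Machine Learning.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order
        for i in indices:
            # Error of the current model on ONE sample
            error = (w * xs[i] + b) - ys[i]
            # Gradient step for the squared loss 0.5 * error ** 2
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Toy, noise-free data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd_linear_regression(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

The key point is that the weights are updated after every single sample rather than after a full pass over the data, which is what makes the algorithm cheap enough for large datasets.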

However, data science is only simple when the underlying concepts are fully mastered. There is no shortcut. In the first chapters of the book, I explain these important concepts: overfitting and regularization, metrics and model evaluation, data cleaning and feature engineering.

Although Amazon Machine Learning is simple by design, it resides within the extremely rich AWS ecosystem. Services such as AWS Lambda, Athena or Redshift, although not directly intended for data science, can be used in the preparation phase of a predictive analytics project to transform the raw data. The book explores different integrations around Amazon Machine Learning that leverage these other data-focused services. I also show how to extend the functionality of Amazon Machine Learning by using the AWS Command Line Interface (CLI) and the Python SDK (Boto3) to do Monte Carlo cross-validation and recursive feature selection (RFS).
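For readers unfamiliar with Monte Carlo cross-validation, the idea is to repeat a random train/validation split several times and average the resulting scores. The sketch below shows that loop in plain Python. In the book the training and evaluation steps go through Amazon Machine Learning via Boto3; here they are replaced by a stand-in `train_and_score` callback, and every name, the 70/30 split and the toy data are assumptions made for illustration.

```python
import random
from statistics import mean


def monte_carlo_cv(records, train_and_score, n_trials=10,
                   train_fraction=0.7, seed=42):
    """Average a validation score over repeated random splits."""
    rng = random.Random(seed)
    n_train = int(len(records) * train_fraction)
    scores = []
    for _ in range(n_trials):
        shuffled = records[:]      # copy so the caller's list is untouched
        rng.shuffle(shuffled)      # a fresh random split for each trial
        train, valid = shuffled[:n_train], shuffled[n_train:]
        scores.append(train_and_score(train, valid))
    return mean(scores), scores


def mean_baseline_rmse(train, valid):
    """Stand-in model: predict the training mean, return validation RMSE."""
    prediction = mean(y for _, y in train)
    squared_errors = [(y - prediction) ** 2 for _, y in valid]
    return (sum(squared_errors) / len(valid)) ** 0.5


# Toy (x, y) records; a real project would load its own dataset
data = [(x, 2.0 * x + 1.0) for x in range(20)]

avg_rmse, all_rmse = monte_carlo_cv(data, mean_baseline_rmse, n_trials=5)
print(f"average RMSE over {len(all_rmse)} trials: {avg_rmse:.3f}")
```

Unlike k-fold cross-validation, the validation sets of different trials may overlap, but the number of trials can be chosen independently of the split ratio.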

Writing a book requires dedication, discipline, and time. A lot of time. But most of all, it’s a very exciting project. I’m thrilled, excited and tickled by the size of the challenge.

Wish me luck!

The book is available chapter by chapter at: Effective Amazon Machine Learning