Alexis Perrier


Hi, I'm Alexis Perrier, a Data Science consultant.
I help companies, large and small, profit from machine learning.
I also teach and write about all things Data Science, from linear regression to deep learning, with a penchant for NLP.
I work between Washington DC, where I live,
and Paris, where I'm from.

Follow me on Twitter @alexip.
Let's connect on LinkedIn or AngelList.


Recent posts

  • Reduce GPU costs with startup scripts on the Google Cloud Engine

    Reduce GPU costs with on-demand instances and startup scripts. This post is about leveraging the on-demand capabilities of costly virtual instances on the Google Cloud Engine using startup scripts. Deep Learning is expensive. Here’s the situation: You’re working on some large dataset, and you feel the irresistible urge to...

  • iPhone addiction? Get a grip!

    Is it a DNA thing? My wife has a super power! She is totally immune to the constant nagging of her iPhone. She has an amazing ability to resist checking her emails every 5 minutes, texting back on the spot and playing the whack-a-notification game all day long. Maybe it’s...

  • Top gsutil command lines to get started on Google Cloud Storage

    Google Storage is a file storage service available from Google Cloud. Quite similar to Amazon S3, it offers interesting functionalities such as signed URLs, bucket synchronization, collaboration bucket settings, and parallel uploads, and it is S3 compatible. Gsutil, the associated command line tool, is part of the gcloud command line interface. After a...

  • AutoML on AWS

    Build a predictive analytics pipeline in a flash. When Bayesian optimization meets the Stochastic Gradient Descent algorithm on the AWS marketplace, rich features bloom, models are trained, Time-To-Market shrinks and stakeholders are satisfied. In this article, we present an AWS-based framework which allows non-technical people to build predictive...

  • Gsutil cheatsheet

    gsutil is the Google Storage CLI tool. Equivalent to aws s3 but for the Google Cloud Platform, it allows you to access Google Cloud Storage from the command line. Beyond moving files and managing buckets, gsutil is a powerful file management (rsync) and file publication (signed URLs) tool. Please find below...

  • AWS Machine Learning Big Data NYC

    My slides on the AWS Machine Learning platform at the Global Big Data conference - NYC 2017 Oct 24. tl;dr: The AWS Machine Learning service is a simple but very efficient predictive analytics service for supervised classification and regression. The AWS ML service greatly simplifies the model selection and model optimization...

  • Workshop on Topic Modeling

    I recently had the pleasure of leading a workshop on topic modeling as part of the Master's program Méthode computationnelle et analyse de contenu at Université Paris Est Marne la vallée. There are relatively few resources in French on topic modeling. The only result I was able to...

  • Writing Effective Amazon Machine Learning

    My article on the Amazon Machine Learning service, first published on the ODSC blog and then republished on KDnuggets, triggered a book project. Shortly after writing that article, I was contacted by Packt Publishing to write an entire book on the AWS Machine Learning service. Packt Publishing is well known for...

  • Large Data with Scikit-learn - Boston Meetup

    ### Large Data with Scikit-learn * Alexis Perrier - [@alexip](https://twitter.com/alexip) * Data & Software - [@BerkleeOnline](https://twitter.com/berkleeonline) - Day * Data Science contributor - [@ODSC](https://twitter.com/odsc) - Night ### Plan 1) What is large data? out-of-core, streaming, online, batch? 2) Algorithms 3) Implementation 4) Examples ### Many great alternatives * Dato: [GraphLab...

  • Paris Meetup slides Topic Modeling of Twitter Followers

    ### Topic Modeling ##### applied to Twitter feeds. * Alexis Perrier [@alexip](https://twitter.com/alexip) * Data & Software, Berklee College of Music, Boston [@BerkleeOnline](https://twitter.com/berkleeonline) * Data Science contributor [@ODSC](https://twitter.com/odsc) **Part I: Topic Modeling** * Nature and application * Algorithms and libraries **Part II: Project: Twitter followers** * Methods * Problems * Viz...

  • Hands-on analysis of the Amazon Machine Learning service

    Is the new Amazon Machine Learning too simple to reap the benefits of predictive analytics? Machine Learning as a Service (MLaaS) promises to put data science within the reach of companies. In that context, Amazon Machine Learning is a predictive analytics service with binary/multiclass classification and linear regression features. The...

  • Jupyter, Zeppelin, Beaker: The Rise of the Notebooks

    One of the particularities of scientific computing is the need for experiments, explorations, and collaborations. This need is addressed by notebooks. Notebooks are collaborative web-based environments for data exploration and visualization — the perfect toolbox for data science. They help create reproducible, shareable, collaborative computational narratives. There are alternatives to...

  • Dynamics of Debates with Time Maps

    2015 presidential debates. The race for the presidential nomination for both parties is going full speed with a plethora of debates. At the time of writing there have been 4 Republican debates and 2 Democratic ones. These debates have a high impact on the presidential nomination race with candidates dropping out and...

  • NLP Analysis of the 2015 presidential candidate debates

    I’ve been fascinated by the recent presidential nomination debates. Their format, the number of participants, and the post-debate media frenzy all make for a good show. In the following 2 articles I’ve applied several powerful Text Mining and Natural Language Processing techniques to the transcripts. In this first article: Dissecting...

  • Scikit-learn's Out-of-Core Classifiers for Large Data

    Here’s the scenario: A new Kaggle competition, a new dataset. Gigabytes? Ouch! Cold shivers as you anticipate hours waiting to extract features and train models, and middle-of-the-night cold feet as you’re just checking that your script is still running. A data set is said to be large when...

  • Segmentation of Twitter Timelines via Topic Modeling

    Following up on our first post on the subject, Topic Modeling of Twitter Followers, we compare different unsupervised methods to further analyze the timelines of the followers of the @alexip account. We compare the results obtained through Latent Semantic Analysis and Latent Dirichlet Allocation and we segment Twitter timelines based...

  • Topic Modeling of Twitter Followers

    In this post, we explore LDA, an unsupervised topic modeling method, in the context of Twitter timelines. Given a Twitter account, is it possible to find out what subjects its followers are tweeting about? Knowing the evolution or the segmentation of an account’s followers can give actionable insights to a...

  • Feature Importance in Random Forests

    Comparing Gini and Accuracy metrics. We’re following up on Part I where we explored the DrivenData blood donation data set. The objective of the present article is to explore feature engineering and assess the impact of newly created features on the predictive power of the model in the context...

  • Blood Donation on DrivenData: Exploration

    Blood Donation on DrivenData - Part I: Exploration. DrivenData.org is a machine learning competition web site similar to the better-known Kaggle.com site, but with a different angle. It focuses on leveraging Data Science for social issues. And it’s based in Boston! For the learning Data Scientist, DrivenData offers a good...
