In early 2023, we revamped the course and updated it with new content. It is an introductory NLP course with a focus on embeddings. We cover what is now older technology (Word2Vec) that remains very relevant in the age of attention and transformers. If you’re new to NLP, this is a good way to start and to work with spaCy, NLTK and other standard NLP libraries.
My new course on statistical learning is now available on OpenClassrooms.
Explore linear, logistic and polynomial regression with hands-on exercises, real-world use cases and non-trivial datasets.
Bridging the gap between statistical modeling and machine learning.
In this Manning liveProject, you will assume the role of a data scientist specializing in natural language processing (NLP) at Stack Exchange, the platform behind Stack Overflow. Your objective is to develop language models tailored to the unique technical content found on various Stack Exchange sites.
Your task is to construct a language model capable of completing queries and generating longer texts specifically for Stack Exchange sites. By the end of this project, you will have laid the groundwork for building domain-specific NLP systems, employing both statistical and deep learning techniques to create a robust and efficient language model.
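To give a feel for the statistical side of such a project, here is a minimal sketch of a bigram language model that completes a query by repeatedly picking the most frequent next word. It is not the liveProject's actual solution: the toy corpus and the function names `train_bigram_model` and `complete` are made up for illustration, and a real model would be trained on Stack Exchange data with smoothing and proper tokenization.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for Stack Exchange post titles.
corpus = [
    "how to merge two dictionaries in python",
    "how to sort a list in python",
    "how to merge two lists in python",
]

def train_bigram_model(sentences):
    """Count how often each word follows another across the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of each sentence.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(counts, query, max_words=5):
    """Greedily extend the query with the most frequent next word."""
    tokens = query.lower().split()
    if not tokens:
        return query
    current = tokens[-1]
    for _ in range(max_words):
        followers = counts.get(current)
        if not followers:
            break  # unseen word: nothing to predict
        next_word = followers.most_common(1)[0][0]
        if next_word == "</s>":
            break  # the model predicts the end of the sentence
        tokens.append(next_word)
        current = next_word
    return " ".join(tokens)

model = train_bigram_model(corpus)
print(complete(model, "how to", max_words=2))
```

Greedy decoding keeps the sketch short, but it always produces the same completion; sampling from the follower counts is a common alternative when generating longer texts.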
In 2018, the renowned École Polytechnique, Mohammed VI Polytechnic University and the Foundation of École Polytechnique launched a Chair in “Data Science and Industrial Processes” in Morocco. I had the chance to inaugurate it on the amazing UM6P campus of the Emines school of industrial management, located in Benguerir, Morocco.
Read more: Teaching Data Science at UM6P
I recently conducted two workshops on topic modeling with the R STM package for two very different student populations:
The Berklee College of Music Digital Studies MBA in Valencia
In the summer of 2016, I had the pleasure of teaching a full Data Science curriculum at General Assembly. Over 20 sessions and 60 hours, we covered a lot of ground and many topics, and some of the students' final projects were amazing. The slides, code and datasets are all available on GitHub at https://github.com/alexisperrier/gads.
The course covers:
- Statistical inference
- Bias and variance, learning curves and overfitting
- Visualization with matplotlib and plotly
- Supervised: Regression and classification
- Unsupervised: Clustering
- Time series: ARMA models
- Tree-based models: Random forests and Boosted trees
- Support Vector Machines
- NLP: sentiment analysis, topic modeling, POS tagging, WordNet, …
- Logistic regression
It was an intense curriculum!