A revamped Intro to NLP course on OpenClassrooms
In early 2023, we revamped the course and updated it with new content. It’s an introductory NLP course with a focus on embeddings. We cover what is now older tech (Word2Vec) that remains very relevant in the age of attention and transformers. If you’re new to NLP, this is a good way to start and get to work with spaCy, NLTK and other standard NLP libraries.
New statistical learning course on OpenClassrooms
My new course on statistical learning is now available on OpenClassrooms.
Design Effective Statistical Models to Understand Your Data
Explore linear, logistic and polynomial regression with hands-on exercises, real-world use cases and non-trivial datasets.
Bridging the gap between statistical modeling and machine learning.
Building Domain-Specific Language Models
In this Manning liveProject, you will assume the role of a data scientist specializing in natural language processing (NLP) at Stack Exchange, the platform behind Stack Overflow. Your objective is to develop language models tailored to the unique technical content found on various Stack Exchange sites.
Your task is to construct a language model capable of completing queries and generating longer texts specifically for Stack Exchange sites. By the end of this project, you will have laid the groundwork for building domain-specific NLP systems, employing both statistical and deep learning techniques to create a robust and efficient language model.
Teaching data science at UM6P
In 2018, the renowned École Polytechnique, Mohammed VI Polytechnic University (UM6P) and the Foundation of École Polytechnique launched a Chair in “Data Science and Industrial Processes” in Morocco. I had the chance to inaugurate it on the amazing UM6P campus, at the EMINES School of Industrial Management in Benguerir, Morocco.
Read more: Teaching Data Science at UM6P
Workshops: Topic modeling in R
I recently conducted two workshops on topic modeling with the R STM package for two very different student populations:
- In French, for the Master in Études Numériques et Innovation at the Université Paris Marne La Vallée. Slides, GitHub
- For the Berklee College of Music Digital Studies MBA in Valencia
Data Science at General Assembly
In the summer of 2016, I had the pleasure of teaching a full Data Science curriculum at General Assembly. Over 20 sessions and 60 hours, we covered a lot of ground and many topics, and some of the students' final projects were amazing. The slides, code and datasets are all available on GitHub at https://github.com/alexisperrier/gads.
The course covers:
- Statistical inference
- Bias and variance, learning curves and overfitting
- Visualization with matplotlib and plotly
- Supervised: Regression and classification
- Unsupervised: Clustering
- Time series: ARMA models
- Tree-based models: Random forests and Boosted trees
- Support Vector Machines
- NLP: sentiment analysis, topic modeling, POS tagging, WordNet, …
- Logistic regression
It was an intense curriculum!