A revamped Intro to NLP course on OpenClassrooms

In early 2023, we revamped the course and updated it with new content. It’s an introductory NLP course with a focus on embeddings. We cover what is now older tech (Word2Vec) that remains very relevant in the age of attention and transformers. If you’re new to NLP, this is a good way to start and get to work with spaCy, NLTK and other standard NLP libraries.

New statistical learning course on OpenClassrooms

My new course on statistical learning is now available on OpenClassrooms.

Design Effective Statistical Models to Understand Your Data

Explore linear, logistic and polynomial regression with hands-on exercises, real-world use cases and non-trivial datasets.

Bridging the gap between statistical modeling and machine learning.

Building Domain-Specific Language Models

In this Manning liveProject, you will assume the role of a data scientist specializing in natural language processing (NLP) at Stack Exchange, the platform behind Stack Overflow. Your objective is to develop language models tailored to the unique technical content found on various Stack Exchange sites.

Your task is to construct a language model capable of completing queries and generating longer texts specifically for Stack Exchange sites. By the end of this project, you will have laid the groundwork for building domain-specific NLP systems, employing both statistical and deep learning techniques to create a robust and efficient language model.
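To give a flavor of the statistical side of query completion, here is a minimal sketch of a bigram model that extends a prompt with the most frequent next word. It is an illustration only, not the liveProject's actual code: the function names `train_bigram_model` and `complete` are hypothetical, and the three-sentence corpus is made up.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for each word, which words follow it in the training sentences."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            followers[current][nxt] += 1
    return followers


def complete(model, prompt, max_new_words=3):
    """Greedily extend the prompt with the most frequent follower of its last word."""
    tokens = prompt.lower().split()
    if not tokens:
        return ""
    for _ in range(max_new_words):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # the last word was never seen with a follower
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


# Toy corpus, invented for illustration (not real Stack Exchange data)
corpus = [
    "how to merge two dictionaries in python",
    "how to merge two lists in python",
    "how to sort a list in python",
]
model = train_bigram_model(corpus)
completion = complete(model, "how to sort", max_new_words=3)
print(completion)  # how to sort a list in
```

A real domain-specific model would need proper tokenization, smoothing for unseen word pairs, and far more data, but the counting idea is the same.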


Teaching data science at UM6P

In 2018, the renowned École Polytechnique, Mohammed VI Polytechnic University (UM6P) and the Foundation of École Polytechnique launched a Chair in “Data Science and Industrial Processes” in Morocco. I had the chance to inaugurate it on the amazing UM6P campus of the EMINES School of Industrial Management, located in Benguerir, Morocco.

Read more: Teaching Data Science at UM6P

Workshops: Topic modeling in R

I recently conducted two workshops on topic modeling with the R STM package for two very different student populations:

Université Paris-Est Marne-la-Vallée: in French, for the Master in Études Numériques et Innovation. Slides, GitHub

Berklee Valencia: for the Berklee College of Music Digital Studies MBA in Valencia

Data Science at General Assembly

In the summer of 2016, I had the pleasure of teaching a full Data Science curriculum at General Assembly (GA). Over 20 sessions and 60 hours, we covered a lot of ground and many topics, and some of the students’ final projects were amazing. The slides, code and datasets are all available on GitHub at https://github.com/alexisperrier/gads.

The course covers:

  • Statistical inference
  • Bias and variance, learning curves and overfitting
  • Visualization with matplotlib and plotly
  • Supervised: Regression and classification
  • Unsupervised: Clustering
  • Time series: ARMA models
  • Tree based models: Random forests and Boosted trees
  • Support Vector Machines
  • NLP: sentiment analysis, topic modeling, POS tagging, WordNet, …
  • Logistic regression

It was an intense curriculum!