ELMo: Why it’s one of the biggest advancements in NLP


This post was originally published by Jerry Wei at Towards Data Science

#1: ELMo accounts for a word’s context. Earlier word representations such as GloVe, Bag of Words, and Word2Vec assign each word a single fixed representation, no matter how the word is being used in a sentence. For example, these methods would return the same embedding for “trust” in each of the following examples:

I can’t trust you.

They have no trust left for their friend.

He has a trust fund.

ELMo, however, returns different embeddings for the same word depending on the words around it; its embeddings are context-sensitive. It would return a different vector for “trust” in each of these examples because the word is being used in a different context each time. This means that ELMo’s embeddings carry more information about how a word is used, which tends to improve the performance of models built on top of them. BERT is another language model whose representations account for context.
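To make the difference concrete, here is a small toy sketch. It is not ELMo: the real model runs a bidirectional language model over the whole sentence, while this sketch simply averages a word's vector with those of its neighbors. The function names, the hash-based vectors, the window size, and the dimensionality are all assumptions made up for illustration. The only point is that a static lookup gives “trust” the same vector in all three sentences, while a context-dependent function does not.

```python
import hashlib

DIM = 8  # toy dimensionality, chosen arbitrarily for this sketch


def static_embedding(word):
    """Toy stand-in for a static lookup table: the vector depends only on the word."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:DIM]]


def contextual_embedding(tokens, index, window=2):
    """Toy context-sensitive vector: average the word's vector with its neighbors'.

    This is an illustrative assumption, not how ELMo computes its embeddings.
    """
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    vectors = [static_embedding(token) for token in tokens[lo:hi]]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


sentences = [
    "I can't trust you".split(),
    "They have no trust left for their friend".split(),
    "He has a trust fund".split(),
]

# Static lookup: one vector per word, regardless of the sentence it appears in.
static_vectors = [static_embedding("trust") for _ in sentences]

# Context-dependent: the vector for "trust" changes with the surrounding words.
context_vectors = [contextual_embedding(s, s.index("trust")) for s in sentences]

same_static = all(v == static_vectors[0] for v in static_vectors)
same_context = all(v == context_vectors[0] for v in context_vectors)
print("static vectors identical:", same_static)
print("contextual vectors identical:", same_context)
```

In a real setup, the toy `contextual_embedding` function would be replaced by a pretrained model such as ELMo, which learns how context should change a word's vector instead of averaging neighbors.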

#2: ELMo was trained on a lot of data. Whether you’re a veteran machine learning researcher or just a casual observer, you’re probably familiar with the power of big data. The “Original (5.5B)” ELMo model was trained on a corpus of 5.5 billion tokens, and even the “small” version was trained on roughly 1 billion words. That’s a lot of data! Being trained on that much text means that ELMo has picked up a lot of linguistic knowledge, so it tends to transfer well to a wide range of datasets.

#3: ELMo can be used by anyone! One of the most important factors that have driven the growth of machine learning as a field is the culture of making research open-source. By making code and datasets open-source, researchers allow others in the field to easily apply and build on existing ideas. In keeping with this culture, ELMo is extensively open-source. Its website includes not only basic information about the model, but also download links for the small, medium, and original versions. People looking to use ELMo should definitely check out this website to get a quick copy of the model. Moreover, the code is published on GitHub and includes a fairly extensive README that explains how to use ELMo. I’d be surprised if it took anyone more than a few hours to get a working ELMo model going.
