StarSpace: mining and embedding user interests

This post was originally published by LP Cheung at Towards Data Science

Background

StarSpace is an algorithm proposed by Facebook [1].

"StarSpace is a general-purpose neural model for efficient learning of entity embeddings for solving a wide variety of problems."

So, the keywords are “general-purpose” and “entity embedding”. In other words, you can embed whatever you want with StarSpace, including users. In fact, the HK01 data team also uses StarSpace for article embeddings, alongside other embedding algorithms.

How can it be so generic? The secret is that StarSpace learns to represent entities and their components in the same vector space. The only requirement is that you provide the linkages between entities and their components. Taking these linkages as training data, StarSpace learns the similarity between components and, from that, the similarity between entities.
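As a rough illustration of what "providing the linkages" can look like, the sketch below writes each entity and its components as one line of text, with the entity appended as a label. It assumes the fastText-style layout that StarSpace reads by default, where labels carry the `__label__` prefix. The article IDs, tokens, and the helper `to_training_lines` are hypothetical, and this is not necessarily how HK01 prepares its data.

```python
# Hypothetical linkages: each entity (an article ID) maps to its components (tokens).
article_tokens = {
    "article_1": ["football", "league", "goal"],
    "article_2": ["stocks", "market", "rally"],
}


def to_training_lines(linkages, label_prefix="__label__"):
    """Turn entity -> components linkages into one text line per entity.

    Assumption: components come first, followed by the entity as a
    prefixed label (fastText-style layout).
    """
    lines = []
    for entity_id, components in linkages.items():
        lines.append(" ".join(components) + " " + label_prefix + entity_id)
    return lines


training_lines = to_training_lines(article_tokens)
for line in training_lines:
    print(line)
```

Each printed line pairs a bag of components with the entity it belongs to, which is the kind of linkage the model can train on.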

Intuition

The next question is which components can represent users. At HK01, users are represented by the articles they have read.

Figure: Three levels of entities, all mapped into the same space.

The diagram above illustrates the linkages between different levels of entities. Users are represented by the articles they have read, while articles are represented by their tokens. In particular, a user is represented by the bag of tokens in their reading history.
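The two levels of linkage can be flattened into a bag of tokens per user. The following is a minimal sketch with made-up data: the user IDs, article IDs, tokens, and the helper `user_bag_of_tokens` are all hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical toy data: articles -> tokens, and users -> articles read.
article_tokens = {
    "a1": ["football", "goal"],
    "a2": ["football", "transfer"],
    "a3": ["stocks", "market"],
}
reading_history = {
    "user_a": ["a1", "a2"],
    "user_b": ["a3"],
}


def user_bag_of_tokens(history, tokens_by_article):
    """Collect the tokens of every article a user has read, with counts."""
    bag = Counter()
    for article_id in history:
        # Articles without known tokens contribute nothing.
        bag.update(tokens_by_article.get(article_id, []))
    return bag


user_bags = {
    user: user_bag_of_tokens(history, article_tokens)
    for user, history in reading_history.items()
}
print(user_bags["user_a"])
```

Here `user_a` ends up with "football" counted twice, because it appears in both articles they read.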

The intuition is that the information in each layer propagates to the layers above and below it. Through the linkages, StarSpace can find the similarity between articles because they share tokens. Likewise, the similarity between users can be learned from the articles, and hence the tokens, that they share.
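To make the propagation concrete, here is a toy sketch. It assumes a user vector is simply the sum of the vectors of the tokens in their bag, and compares users by cosine similarity. The 2-dimensional token vectors are hand-picked for illustration rather than learned, and all names are hypothetical.

```python
import math

# Hand-picked toy token vectors (not learned): sports tokens point one way,
# finance tokens point another.
token_vectors = {
    "football": [1.0, 0.0],
    "goal": [0.9, 0.1],
    "stocks": [0.0, 1.0],
    "market": [0.1, 0.9],
}


def embed(bag):
    """Embed a bag of tokens as the sum of its token vectors."""
    dims = len(next(iter(token_vectors.values())))
    total = [0.0] * dims
    for token in bag:
        for i, value in enumerate(token_vectors[token]):
            total[i] += value
    return total


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


user_a = embed(["football", "goal"])
user_b = embed(["goal", "football", "football"])
user_c = embed(["stocks", "market"])

sim_ab = cosine(user_a, user_b)  # users sharing tokens
sim_ac = cosine(user_a, user_c)  # users with no tokens in common
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

The two users who share tokens come out far more similar than the pair with nothing in common, even though no user-to-user linkage was ever given.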

Result

The user embeddings learned by StarSpace, which have only 50 dimensions, perform comparably in downstream tasks to the original user embeddings, which have more than 20,000 dimensions.
