Zillow: Utilizing both explicit & implicit signals to power home recommendations

This post was originally published by Mengnan Zhao at Zillow AI/ML Research

Understanding user preferences is an important task for personalizing the home shopping experience for our customers.

At Zillow, we create a user profile for each consumer to capture their personal interests in homes. Within the user profile, preferences are mostly inferred from latent information that users express implicitly through their activity, such as clicks and saves. In addition to these implicit signals, some explicit user actions can help us understand preferences better. For example, consumers can set filters on Zillow during search sessions based on their needs or interests, as shown in Figure 1. After setting search filters, consumers can save them as a “Saved Search”. The Saved Search then lets them check updates more conveniently on the site, or receive instant or daily updates by email, as shown in Figure 2. In this post we examine strategies for incorporating this rich explicit feedback (e.g. search filters) into our recommendation models to improve the accuracy and relevance of the recommendations.

Figure 1 Filter setting on Zillow

Figure 2 Saved-Search on Zillow

Search Filter Data in Saved Search

A majority of Zillow’s signed-in users have at least one saved search. For users who have multiple saved searches, we use the latest one to get the search filter data. The most frequently used features in the search filter are price (min and max) and number of bedrooms, followed by listing type and home type. Figure 3 displays the distribution of the number of bedrooms in the search filter. Figure 4 shows the distribution of minimum and maximum prices in the filters, and Figure 5 shows the density of price ranges. Figure 5 shows that the majority of search filters have a narrow price range that peaks around $200K, indicating that most customers have a target price when searching for homes. This raises a question for the recommendation model: do we recommend only homes within the target price range, or look for a trade-off between exploration and exploitation?

Figure 3 Histogram of number of bedrooms in search filter

Figure 4 Histogram of price min and max in search filter

Figure 5 Density of price range in search filter

Implicit User Preference Representation

To learn more about the needs and interests of Zillow users and to serve them better, we create user profiles that capture their preferences and power recommendations. Each user has their own profile, which is continually updated based on their activity. The profile is represented by a set of histograms over important home features; for each feature, the histogram holds the user’s weighted preferences, and the weights are updated based on their activity. For example, suppose a customer is looking for a house and has viewed about 200 homes on Zillow. Based on her browsing and clicking activity, we can create her user profile, as shown in Figure 6. The histograms show her preferences for home price, number of bedrooms, and living square footage.

Figure 6 An example of implicit user profile
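A histogram-based profile of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: the activity weights, the $50K bucket width, and the function names are hypothetical and are not taken from Zillow's system.

```python
from collections import defaultdict

# Hypothetical activity weights; the post does not state the real values.
ACTIVITY_WEIGHTS = {"view": 1.0, "save": 3.0}


def bucket_price(price, width=50_000):
    """Map a price to the lower edge of its bucket (bucket width is an assumption)."""
    return (price // width) * width


def update_histogram(histogram, bucket, activity):
    """Add the weight of one user activity to the bucket it touched."""
    histogram[bucket] += ACTIVITY_WEIGHTS[activity]


def normalize(histogram):
    """Turn accumulated weights into preference shares that sum to 1."""
    total = sum(histogram.values())
    return {b: w / total for b, w in histogram.items()} if total else {}


# Three illustrative events: (price of the home, type of activity).
price_hist = defaultdict(float)
events = [(180_000, "view"), (210_000, "view"), (195_000, "save")]
for price, activity in events:
    update_histogram(price_hist, bucket_price(price), activity)

profile = normalize(price_hist)
```

With these made-up events, the $150K bucket receives one view and one save and ends up with most of the preference mass, which is the kind of weighted preference the histograms are meant to capture.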

Incorporating Explicit Search Filters into User Preferences

The user profile helps us learn about user preferences, and we are looking at ways to improve it so that we understand users better. The current user profile is built mostly from implicit signals: users’ interests are inferred from their past activity. However, representing user preference purely through implicit feedback has drawbacks. For example, some users like to window-shop and explore expensive homes, even though those homes fall outside their explicit preferences (e.g. search filter). Because users express their preferences more directly and unambiguously on Zillow by setting search filters and saving the search, we present a method to incorporate these explicit signals into the implicit user profile.

There are several possible ways to integrate the search filter information into the current user representation. One way is to use it to adjust or trim the feature histograms in the user profile. For example, if the price range in the implicit user profile is [$25K, $300K] and the range in the search filter is [$100K, $175K], we trim the price preference to [$100K, $175K], as shown in Figure 7. Two concerns arise with this method: (1) a majority of users set a narrow range in search filters (as shown in Figure 5), so trimming histograms to the filter preference could make the recommendations very similar, less diverse, and less engaging; (2) consumers might still be interested in exploring homes outside their filter range. For example, customers may lower or raise their home purchase budget as they explore the options on the market.

Figure 7 Trimming price preference in implicit user profile with search filter
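To make the trimming option concrete, here is a minimal sketch. It assumes the price preference is stored as a mapping from a representative bucket price to a weight, and that the trimmed histogram is renormalized; both choices, and the function name, are illustrative assumptions rather than details from the post.

```python
def trim_histogram(histogram, lo, hi):
    """Keep only buckets whose price lies in [lo, hi], then renormalize.

    The renormalization step is an assumption made for this sketch.
    """
    kept = {b: w for b, w in histogram.items() if lo <= b <= hi}
    total = sum(kept.values())
    return {b: w / total for b, w in kept.items()} if total else {}


# Implicit profile spanning [$25K, $300K]; the weights are made up.
implicit_price = {25_000: 0.1, 100_000: 0.3, 150_000: 0.4, 300_000: 0.2}

# Search filter range [$100K, $175K], as in the example above.
trimmed = trim_histogram(implicit_price, 100_000, 175_000)
```

The sketch also shows the first concern directly: everything outside the filter range is discarded, so any later ranking can only draw on the narrow band that survives.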

We decided instead to augment the user profile with extra features derived from the search filter. This lets the recommendation model take advantage of the explicit signals automatically while still satisfying users’ need to explore.

Representing explicit signals as extra features

Based on the data exploration results, we focused on four features in the search filter: price, number of bedrooms, zip code, and latitude-longitude. The basic logic is to match the values in the search filter against the corresponding home attributes. From the matching results we derived both binary and distance/similarity features. For binary features, the value is 1 if the home attribute falls within the filter range and 0 otherwise. For distance features (price, number of bedrooms, lat-lon), the farther the attribute is from the filter range, the larger the value. For similarity features (zip code), the more similar the attribute, the larger the value, as illustrated in Figure 8.

Figure 8 Binary and distance/similarity search-filter features

For price, the binary feature represents whether the home price is within the range in the search filter (yes=1, no=0). The distance feature is the home price minus the filter’s minimum price when the home is below the range, the home price minus the filter’s maximum price when the home is above the range, and 0 when the home is inside the range. For example, if the price range in the search filter is [$100K, $200K], a home priced at $50K has distance -$50K, a home priced at $150K has distance 0, and a home priced at $300K has distance $100K.
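The price features can be sketched as follows. The function name is hypothetical, and the sketch assumes the filter has both a minimum and a maximum set and that the range boundaries count as inside the range.

```python
def price_features(home_price, price_min, price_max):
    """Return (binary, distance) for one home against one price filter.

    Assumes both bounds are set and that the bounds are inclusive.
    """
    in_range = 1 if price_min <= home_price <= price_max else 0
    if home_price < price_min:
        distance = home_price - price_min   # negative: below the range
    elif home_price > price_max:
        distance = home_price - price_max   # positive: above the range
    else:
        distance = 0                        # inside the range
    return in_range, distance


# The three homes from the example, against a [$100K, $200K] filter.
for price in (50_000, 150_000, 300_000):
    print(price, price_features(price, 100_000, 200_000))
```

Keeping the sign follows the example in the text and lets a model tell "cheaper than the filter" apart from "more expensive than the filter". Taking the absolute value would give an unsigned distance instead.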

For zip code, the binary feature equals 1 if all five digits match between the search filter and the home, and 0 otherwise. For the similarity feature, we compare the digits starting from the first position and count the number of matching digits, as shown in Figure 9.

Figure 9 An example of zip code similarity feature
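One way to read the zip code similarity is as the length of the shared leading prefix. The sketch below makes that assumption explicit: it stops counting at the first mismatch, which the text does not confirm. Zip codes are kept as strings so that leading zeros survive, and the function name is hypothetical.

```python
def zip_features(filter_zip, home_zip):
    """Return (binary, similarity) for two five-digit zip code strings.

    Similarity counts matching digits from the first position and stops
    at the first mismatch (a prefix-match assumption).
    """
    matched = 0
    for a, b in zip(filter_zip, home_zip):
        if a != b:
            break
        matched += 1
    exact = 1 if filter_zip == home_zip else 0
    return exact, matched


print(zip_features("98101", "98109"))  # first four digits agree
print(zip_features("98101", "98101"))  # identical zip codes
print(zip_features("98101", "10001"))  # first digit already differs
```

Prefix matching is a natural fit because the leading digits of a US zip code identify progressively smaller regions, so a longer shared prefix usually means the two areas are closer.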

For latitude-longitude (lat-lon), the binary feature indicates whether the home is located within the viewport of the search filter (yes=1, no=0), and the distance feature is the actual distance between the home and the viewport midpoint, as shown in Figure 10.

Figure 10 An example of lat-lon distance feature
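A minimal sketch of the lat-lon features follows. The post does not say how the distance is computed. This example assumes a great-circle (haversine) distance in kilometers, a rectangular viewport given as (south, west, north, east) that does not cross the antimeridian, and a midpoint taken as the average of the bounds. All names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def latlon_features(home, viewport):
    """Return (binary, distance_km) for a home against a search viewport.

    home is (lat, lon); viewport is (south, west, north, east).
    """
    lat, lon = home
    south, west, north, east = viewport
    inside = 1 if south <= lat <= north and west <= lon <= east else 0
    mid_lat = (south + north) / 2
    mid_lon = (west + east) / 2
    return inside, haversine_km(lat, lon, mid_lat, mid_lon)


viewport = (47.5, -122.5, 47.7, -122.2)             # illustrative bounds
print(latlon_features((47.6, -122.35), viewport))   # home at the midpoint
print(latlon_features((47.0, -122.35), viewport))   # home south of the viewport
```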

We eventually created four binary and four distance/similarity features to represent the search filter preferences, focusing on four aspects of homes: price, number of bedrooms, zip code, and location (lat-lon).

Experiment & Evaluation

To evaluate whether incorporating the explicit signals improves recommendation performance, we train our recommendation model with and without the extra features and compare the results. Specifically, we ran the experiments on our content-based home recommendation model, which predicts a user’s click probability for each home from various features (user profile, home profile, and user-home matching features). To use the explicit signals, we represent them as extra input features and then train the model.

We trained and tested models on different feature sets: without the extra features, with the binary features only, with the distance/similarity features only, and with both. We then evaluated the models with Normalized Discounted Cumulative Gain (NDCG), a metric for measuring ranking quality. Specifically, we rank the recommendations by predicted click probability and use NDCG to evaluate the resulting ranking.
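For readers unfamiliar with the metric, here is a small sketch of NDCG at a cutoff k, together with a helper for the relative lift of one model over a baseline. It assumes binary click labels used directly as gains and a log2 position discount, which is one common formulation. The post does not specify the exact variant, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores, labels, k):
    """NDCG@k after ranking items by predicted score, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal = sorted(labels, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0


def relative_lift(ndcg_new, ndcg_base):
    """Relative improvement of a model over a baseline."""
    return (ndcg_new - ndcg_base) / ndcg_base


# Predicted click probabilities and observed clicks (1 = clicked), made up.
scores = [0.9, 0.2, 0.5]
labels = [0, 1, 1]
print(ndcg_at_k(scores, labels, k=3))
```

In this toy case the model ranks an unclicked home first, so NDCG falls below 1. A ranking that places both clicked homes ahead of the unclicked one would score exactly 1.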

Figure 11 Relative NDCG Lifts

Figure 11 depicts the relative lift in NDCG when using extra features compared to not using them. The three curves represent three different extra feature sets:

  • “Binary” – adding four binary features besides the original implicit features
  • “Distance/Similarity” – adding four distance/similarity based features
  • “Binary & Distance/Similarity” – adding all eight extra features

Overall, the experiment results show that integrating explicit signals into the recommendation model improves accuracy compared to the model that uses only implicit signals. The lifts at top position 1 are the highest, indicating that the model is better at recommending the single most relevant home once it absorbs explicit signals. When binary and distance/similarity features are used separately, the model with distance/similarity features outperforms the one with binary features at most positions, except top positions 3 and 5, probably because distance/similarity features carry more information than binary ones. At top position 3, the binary features outperform the distance/similarity features; one possible explanation is that the binary value is sufficient to express users’ preferences in some scenarios, and the additional information (distance, similarity) might distract the model. Because distance/similarity features are not always better, we ran an experiment using all eight features, letting the model coordinate them, and it achieved the best result. In terms of feature importance, the lat-lon distance and price binary features seem to have more impact than the others.

Conclusion

In this blog post we explored using both explicit and implicit signals in the home recommendation model. By analyzing the explicit signals, especially the search filter, and incorporating them into the user profile, we saw significant improvements in the accuracy of our content-based home recommendation model, showing the value of understanding and representing user preferences in recommendation. The search filter is an important expression of users’ interests and preferences. We are constantly working to obtain more user feedback in order to present better recommendations to Zillow customers, and the method proposed here can be extended to those cases.
