Understanding Airbnb prices and how they make you happy

Jan-Philipp Thewes
3 min read · Oct 7, 2020
https://medium.com/@ashvinsrinivasan/ and Airbnb’s Design Department

Even though traveling is currently limited, it is still a vital part of many of our lives. And preparing the next trip with the help of data science seems like a good use of time until we can travel the world again.

Many of us use Airbnb as an easy way to find a place to stay while traveling. But do we really understand why we pay what we pay for an apartment or room? Questions like this motivated me to take a look at some Airbnb data and apply machine learning to it. The city of Seattle serves as the example here, because the dataset was at hand and large enough to give representative results.

My key questions while analyzing are:

1. What are the key factors determining the price of the apartments?

2. Will I be happier if I spend those extra dollars for the upgrade?

The Analysis Approach

Before taking a look at the data analysis results I want to outline the dataset that was used and how it was processed.
The data was provided on Kaggle by Airbnb itself. From that huge dataset some columns were dropped because they would not help answer the given questions (e.g. listing_url). Afterwards the “cleaned” data was used to train a linear regression model. Finally the model was evaluated on a test dataset. The test score was not perfect (r²-score of 0.65), but good enough to have confidence in the trends shown by the model’s coefficients.
More details on the analysis approach can be found in the GitHub repository.
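
The steps described above can be sketched roughly as follows. This is only a minimal illustration, not the code from the repository: it assumes pandas and scikit-learn (the article does not name the libraries used), and the data, the column names other than listing_url and price, and the helper train_price_model are all made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the listings table (all values are made up).
rng = np.random.default_rng(42)
n = 500
listings = pd.DataFrame({
    "listing_url": [f"https://example.com/rooms/{i}" for i in range(n)],
    "room_type": rng.choice(["Entire home/apt", "Private room", "Shared room"], size=n),
    "bedrooms": rng.integers(1, 5, size=n),
    "bathrooms": rng.integers(1, 4, size=n),
})
room_discount = listings["room_type"].map(
    {"Entire home/apt": 0, "Private room": -40, "Shared room": -60}
)
listings["price"] = (
    60 + 30 * listings["bedrooms"] + 20 * listings["bathrooms"]
    + room_discount + rng.normal(0, 15, size=n)
)


def train_price_model(df, target="price", drop_cols=("listing_url",)):
    """Drop unhelpful columns, one-hot encode categoricals, then fit and score a linear model."""
    cleaned = df.drop(columns=list(drop_cols)).dropna()
    X = pd.get_dummies(cleaned.drop(columns=[target]), dtype=float)
    y = cleaned[target]
    # Hold out 30% of the rows so the score reflects unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    model = LinearRegression().fit(X_train, y_train)
    test_r2 = r2_score(y_test, model.predict(X_test))
    return model, X.columns, test_r2


model, feature_names, test_r2 = train_price_model(listings)
print(f"r² on the held-out test set: {test_r2:.2f}")
```

The synthetic data is much cleaner than real listings, so the score printed here says nothing about the 0.65 reported above; the sketch only shows the order of the steps.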

Results

The analysis results are mostly interpreted from the coefficients of the trained machine learning model. The size of each coefficient indicates how strongly the corresponding factor influences the prediction, and its sign shows the direction.

1. What are the key factors determining the price of the apartments?

Looking at the coefficients of the model after training it on the response value “price”, one can see that shared and private rooms lower the price the most. Moreover, the neighborhood (i.e. the location) seems to have the strongest positive influence.
After filtering out the neighborhoods, one can also see that the numbers of bathrooms and bedrooms are more influential than the number of guests a listing accommodates or the review score.
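
A ranking like this can be produced by sorting the coefficients by absolute value and, optionally, leaving out the neighborhood columns. The sketch below assumes a fitted scikit-learn LinearRegression and one-hot encoded neighborhood columns sharing a common prefix; the helper top_coefficients, the prefix "neighbourhood_" and the data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def top_coefficients(model, feature_names, n=20, exclude_prefix=None):
    """Return the n coefficients with the largest absolute value, keeping their sign."""
    coefs = pd.Series(model.coef_, index=feature_names)
    if exclude_prefix is not None:
        # Leave out e.g. all one-hot encoded neighborhood columns.
        coefs = coefs[~coefs.index.str.startswith(exclude_prefix)]
    order = coefs.abs().sort_values(ascending=False).index
    return coefs.reindex(order).head(n)


# Tiny synthetic example (all values are made up).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "bedrooms": rng.integers(1, 5, size=200).astype(float),
    "bathrooms": rng.integers(1, 4, size=200).astype(float),
    "neighbourhood_Downtown": rng.integers(0, 2, size=200).astype(float),
})
y = (
    25 * X["bedrooms"] + 15 * X["bathrooms"]
    + 50 * X["neighbourhood_Downtown"] + rng.normal(0, 5, size=200)
)
model = LinearRegression().fit(X, y)

print(top_coefficients(model, X.columns))
print(top_coefficients(model, X.columns, exclude_prefix="neighbourhood_"))
```

Keep in mind that raw coefficients are only directly comparable when the features are on similar scales.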

I could elaborate much more on those results but taking a look at the coefficients image yourself is worth more!

top 20 model coefficients for Airbnb price with and without consideration of neighborhoods

2. Will I be happier if I spend those extra dollars for the upgrade?

For this question another model was trained on the response value “review_score_rating”. The results allow some surprising conclusions. It has to be noted, though, that the r²-score was not as promising. Therefore both a correlation matrix of the data (only considering numerical values) and the coefficients are taken into account.
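
A correlation matrix restricted to the numerical columns could be computed along these lines. Again this is only a sketch assuming pandas; the helper numeric_correlations, the columns host_name and host_score, and the data are invented for illustration.

```python
import numpy as np
import pandas as pd


def numeric_correlations(df):
    """Pearson correlation matrix over the numerical columns only."""
    return df.select_dtypes(include="number").corr()


# Synthetic data (all values are made up).
rng = np.random.default_rng(1)
n = 300
host_score = rng.normal(9.5, 0.4, size=n)
reviews = pd.DataFrame({
    "host_name": ["host"] * n,  # non-numerical, ignored by the helper
    "price": rng.normal(120, 40, size=n),
    "host_score": host_score,
    "review_score_rating": 10 * host_score + rng.normal(0, 2, size=n),
})

corr = numeric_correlations(reviews)
# Which numerical columns move together with the overall rating?
print(corr["review_score_rating"].sort_values(ascending=False))
```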

correlation matrix of the numerical data

Both the matrix and the coefficients suggest that the host plays a big role in the satisfaction of the travelers. As one would expect, the review scores for location and cleanliness do rise with increasing prices. The host's influence on the reviews, however, does not depend on the price.
As with the price, the neighborhood is a significant factor for the happiness of the visitors.

In conclusion, I recommend, and will choose for myself, a fair balance between a great location/neighborhood and an inviting host for the next Airbnb booking.

top 20 model coefficients for Airbnb review scores

Try it out yourself!

Feel free to clone or fork my repository and start analyzing the data yourself! Draw your own conclusions and share your results and thoughts!
