EdinBuzzed-A Bar Guide
Objective
Visiting new cities is fun and exciting but often we are perplexed by paradox of choice when choosing a local establishment to have a drink or a meal. With so many reviews and lists for “best” of everything, sometimes we need a simple user interface to get us the recommendations we want.
In order to create a solution, I chose to make a recommendation system that would output top 5 bars in Edinburgh, UK, given a user input of what they would like to eat, drink, and/or vibes of the bar (e.g. “fish and chips and live music” or “whisky and burgers”).
Methodology
Yelp provides a large dataset of over 5 million reviews over many different types of businesses that can be analyzed for new insights. This dataset can be found here
With the goal of creating a bar recommender, I chose to focus on one city, Edinburgh, UK and only keep reviews from bars that received ratings of 3 and above. This narrowed down the dataset to just a little over 11,000 reviews.
Analyses
I used natural language processing (NLP) to analyze my data, which was a corpus of bar reviews from which I could recommend to my user. The initial phase involved text-preprocessing steps to clean the corpus of review data, which had a mix of English and foreign words, as well as a number of food-related stop words that were removed. I used topic modeling in order to extract the areas that were most commonly mentioned in the text, representing general topics people care most about when choosing a bar.
From the multiple vectorizer and topic modelling techniques, I chose term frequency-inverse document frequency (TF-IDF) and non-negative matrix factorization (NMF) as my final methods.
UMAP Projection of Topics Modeled:
Extracted topics:
Results
Using the document vectors created by my topic model, I created a simple recommendation system that would calculate the cosine similarity between a vector created from any user input and the documents in my corpus and output the 5 highest values and the associated bars.
Code for the chosen model and analysis can be found here
Review presentation slides here