Putting It Together: Analyse Real Estate Reviews
Duration: 5 min
Let's combine everything: load a real dataset, clean it, and run a Hugging Face model on it to extract insights, with no training required.
The task
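We'll load the Yelp review dataset, take a sample of reviews, and use a sentiment pipeline to analyse customer sentiment, then compare the model's labels with the star ratings. Note that yelp_review_full provides only the review text and a star label, with no business category or location fields, so filtering to real estate businesses or ranking neighbourhoods would need a dataset that also carries business metadata.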
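Because the dataset carries no business category, one rough way to narrow reviews to real estate topics is a keyword match on the review text. The sketch below is only an illustration: the term list, the function name is_real_estate_review, and the sample reviews are all hypothetical, and a keyword filter will miss some relevant reviews and admit some irrelevant ones.

```python
import re

# Hypothetical keyword list (an assumption for illustration); tune it for your data.
REAL_ESTATE_TERMS = (
    "apartment", "landlord", "leasing", "realtor",
    "real estate", "property management", "tenant",
)

# Whole-word, case-insensitive match, allowing a simple plural "s".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in REAL_ESTATE_TERMS) + r")s?\b",
    re.IGNORECASE,
)

def is_real_estate_review(text: str) -> bool:
    """Return True if the text mentions any of the real estate terms."""
    return _PATTERN.search(text) is not None

# Made-up reviews, used only to show the filter in action.
reviews = [
    "The leasing office never returned my calls.",
    "Best tacos in town, friendly staff.",
    "Our realtor found us a great apartment in two weeks.",
]
real_estate_reviews = [r for r in reviews if is_real_estate_review(r)]
print(real_estate_reviews)
```

In the sampling loop, the same function could be used to skip any review for which it returns False before appending to samples.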
from datasets import load_dataset
from transformers import pipeline
import pandas as pd
# 1. Load dataset (streaming — it's large)
print('Loading dataset...')
ds = load_dataset('yelp_review_full', streaming=True)
# 2. Take a sample of 500 reviews
samples = []
for i, ex in enumerate(ds['train']):
    samples.append({'text': ex['text'][:512], 'stars': ex['label'] + 1})
    if i >= 499:
        break
df = pd.DataFrame(samples)
print(f'Loaded {len(df)} reviews')
print(df['stars'].value_counts().sort_index())
# 3. Run sentiment analysis
print('Running sentiment analysis...')
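# No model is named, so the pipeline loads its default English sentiment model (binary: POSITIVE / NEGATIVE)
sentiment = pipeline('sentiment-analysis', truncation=True)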
results = sentiment(df['text'].tolist(), batch_size=32)
df['sentiment'] = [r['label'] for r in results]
df['confidence'] = [r['score'] for r in results]
# 4. Compare model sentiment vs star rating
print('\nSentiment vs Stars:')
print(df.groupby('stars')['sentiment'].value_counts(normalize=True).round(2))

Loaded 500 reviews
stars
1    112
2     87
3     94
4    103
5    104

Sentiment vs Stars:
stars  sentiment
1      NEGATIVE     0.89
       POSITIVE     0.11
2      NEGATIVE     0.71
       POSITIVE     0.29
3      NEGATIVE     0.48
       POSITIVE     0.52
4      POSITIVE     0.81
       NEGATIVE     0.19
5      POSITIVE     0.94
       NEGATIVE     0.06

The model's label agrees with the star rating at the extremes: 89% of 1-star reviews come back NEGATIVE and 94% of 5-star reviews come back POSITIVE. 3-star reviews are genuinely ambiguous, and the model splits them almost 50/50 (48% NEGATIVE, 52% POSITIVE), which makes sense for a binary classifier that has no neutral label. This is a real insight obtained with zero training.
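To turn the table into a single number, one option is an agreement rate: treat 1 and 2 stars as expected NEGATIVE, 4 and 5 stars as expected POSITIVE, and leave 3-star reviews out because they have no clear expected direction. The sketch below is written in plain Python so it runs without the model; the star-to-label mapping, the function names, and the toy rows are assumptions made for illustration, not part of the lesson's code.

```python
def expected_label(stars):
    """Map a star rating to the sentiment we would expect, or None if ambiguous."""
    if stars <= 2:
        return "NEGATIVE"
    if stars >= 4:
        return "POSITIVE"
    return None  # 3 stars: no clear expected direction

def agreement_rate(rows):
    """Share of non-3-star rows where the model label matches the expected label."""
    scored = [
        (expected_label(stars), label)
        for stars, label in rows
        if expected_label(stars) is not None
    ]
    if not scored:
        return 0.0
    return sum(expected == got for expected, got in scored) / len(scored)

# Toy (stars, model_label) pairs, made up for illustration.
rows = [
    (1, "NEGATIVE"), (2, "POSITIVE"), (3, "POSITIVE"),
    (4, "POSITIVE"), (5, "POSITIVE"), (1, "NEGATIVE"),
]
print(f"Agreement: {agreement_rate(rows):.2f}")
```

With the DataFrame from the walkthrough, the same function could be called as agreement_rate(zip(df['stars'], df['sentiment'])).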
💡 Tip: This pattern (load a public dataset, apply a pre-trained model, extract insights) is the starting point for many real-world NLP projects. You rarely need to train from scratch.
❓ Why do 3-star reviews confuse sentiment models?
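One way to explore this question is to look at the confidence score alongside the label. A binary model has to answer POSITIVE or NEGATIVE even for a mixed review, so a simple experiment is to relabel low-confidence predictions as NEUTRAL. The sketch below assumes a threshold of 0.75, which is an arbitrary illustrative value, and the function name and sample predictions are hypothetical; whether low confidence actually lines up with 3-star reviews is something to check on your own data.

```python
def with_neutral_band(label, score, threshold=0.75):
    """Return the model label, or 'NEUTRAL' when its confidence is below threshold."""
    return label if score >= threshold else "NEUTRAL"

# Made-up (label, confidence) pairs standing in for pipeline output.
preds = [("POSITIVE", 0.99), ("NEGATIVE", 0.58), ("POSITIVE", 0.74)]
relabelled = [with_neutral_band(label, score) for label, score in preds]
print(relabelled)  # ['POSITIVE', 'NEUTRAL', 'NEUTRAL']
```

Applied to the walkthrough's DataFrame, this could be run over zip(df['sentiment'], df['confidence']) and the result grouped by stars to see where the NEUTRAL labels concentrate.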