The Language of SF Real Estate
December 18, 2016
This weekend I embarked on a small data visualization project. I chose to look at San Francisco real estate listings to find interesting trends and relationships. Much has been written about affordability, rising prices, and housing regulations, but I wanted to look through a different lens. I’ve always been curious about the use of language in real estate listings: it varies widely in type, style, grammatical correctness, and overall effect. My hypothesis was that there would be a detectable relationship between the language of a real estate listing and its asking price and geographical location. More specifically, I hypothesized an inverse relationship: listings with lower asking prices would contain more verbose and emotional language. I believed there would be a sort of “compensatory” effect at play, where less desirable listings are bulked up with more forceful language because their location and property alone wouldn’t speak (as loudly) for themselves.
Here are my process and results:
Start with a Question
What is the relationship between the use of language in real estate listings and the property's price and location?
Form a Hypothesis
Choose a Data Source
I started with a physical copy of the San Francisco Chronicle, Sunday edition, which had approximately 25 listings in San Francisco County. I wanted a larger dataset, so I turned to the Chronicle’s online real estate app to gather more listings.
Gather Data
Choose Analysis and Viz Tools
- Alchemy API
- CartoDB
- Raw
- Plot.ly
Analyze Data
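As a rough illustration of how the inverse-relationship hypothesis could be checked, here is a minimal Python sketch. It is not necessarily the method used in this project. It assumes verbosity can be approximated by a simple word count and measures its Pearson correlation with asking price. The three listings are hypothetical, invented purely for illustration, and the names `listings`, `word_count`, and `pearson` are my own.

```python
import math
import re

# Hypothetical listings, invented for illustration only (not real Chronicle data).
listings = [
    {"price": 650_000,
     "text": "Charming, sun-drenched gem! Must see! Amazing opportunity in a vibrant neighborhood!"},
    {"price": 1_200_000,
     "text": "Updated two-bedroom condo with parking and a shared garden."},
    {"price": 3_500_000,
     "text": "Four-bedroom Victorian. Bay views."},
]


def word_count(text):
    """Count word-like tokens as a crude proxy for verbosity (an assumption)."""
    return len(re.findall(r"[A-Za-z0-9'-]+", text))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


prices = [item["price"] for item in listings]
counts = [word_count(item["text"]) for item in listings]
r = pearson(prices, counts)

# A negative r would be consistent with "cheaper listings use more words".
print(f"word counts: {counts}, correlation with price: {r:.2f}")
```

With only a handful of listings, a correlation like this says very little. On a real dataset it would make sense to also look at a scatter plot and to replace the word count with a richer measure of emotional language.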
Visualize Data
Conclusion and Beyond
Some visualizations of the data: