Steph Brooks

The Language of SF Real Estate

Tags: san francisco data real estate language data viz draft

This weekend I embarked on a small data visualization project. I chose to look at San Francisco real estate listings to try to find interesting trends and relationships. Much has been written about affordability, rising prices, and housing regulations, but I wanted to look through a different lens. I’ve always been curious about the use of language in real estate listings: it varies widely in type, style, grammatical correctness, and overall effect. My hypothesis was that there would be a detectable relationship between the language of a real estate listing and its asking price and geographical location. More specifically, I hypothesized an inverse relationship: listings with lower asking prices would contain more verbose and emotional language. I believed there would be a sort of “compensatory” effect at play, where less desirable listings are bulked up with more forceful language because their location and property alone wouldn’t speak (as loudly) for themselves.
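One rough way to check the "inverse relationship" part of that hypothesis is a rank correlation between asking price and description length. The sketch below is only an illustration and not the analysis used in this project: the three listings are invented, word count is an assumed stand-in for verbosity (it says nothing about emotional tone), and the helper names `word_count`, `ranks`, `pearson`, and `spearman` are hypothetical. A value near -1 would be consistent with cheaper listings using more words.

```python
from statistics import mean

# Hypothetical listings, invented purely for illustration (not from the dataset).
listings = [
    {"price": 650_000, "text": "Charming, sun-drenched gem! Must see! Truly a rare and magical find."},
    {"price": 1_200_000, "text": "Updated two-bedroom condo with parking and a shared garden."},
    {"price": 2_500_000, "text": "Four-bedroom Victorian. Views."},
]

def word_count(text):
    """Crude verbosity proxy: number of whitespace-separated tokens."""
    return len(text.split())

def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

prices = [listing["price"] for listing in listings]
lengths = [word_count(listing["text"]) for listing in listings]
rho = spearman(prices, lengths)
print(f"Spearman correlation between price and word count: {rho:.2f}")
```

A rank correlation is used here because asking prices tend to be skewed, and ranks are less sensitive to a few very expensive listings than raw values are. With only a handful of listings the number means very little; it becomes more informative as the dataset grows.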

Here are my process and results:

Start with a Question

What is the relationship between the use of language in real estate listings and the property's price and location?

Form a Hypothesis

Choose a Data Source

I started with a physical copy of the San Francisco Chronicle, Sunday edition, which had approximately 25 listings in San Francisco County. I wanted a larger dataset, so I turned to the Chronicle’s online real estate app to gather more listings.

Gather Data

Choose Analysis and Viz Tools

  • Alchemy API
  • CartoDB
  • Raw
  • Plot.ly

Analyze Data

Visualize Data

Conclusion and Beyond

Some visualizations of the data:

[Chart: numeric axis with ticks 1 to 11, by neighborhood: Bernal Heights, Pacific Heights, Civic Center, Downtown, Financial District, Hayes Valley, Inner Mission, Marina, Nob Hill, Noe Valley, Bayview, Pacific Heights (lower), Potrero Hill, Presidio Heights, Richmond, Richmond (Outer), Soma, Stonestown, Sunnyside, Twin Peaks]