Analyze a custom geography

John Johnson

2018/05/02

You may want to use this data to examine different geographies than the neighborhoods I present on this website. For example, you may want to study housing and property trends by census tracts, police districts, proximity to schools, different neighborhood boundaries, or something else.

To make this easier, I’ve built a crosswalk which contains 1 row for every TAXKEY (unique property identifier) which exists in the 2000-2018 MPROP file. Each TAXKEY is paired with the centroid coordinates of that parcel.1 2 This spatial crosswalk is available to download as a zipped shapefile directory here.

R (and specifically RStudio) is my preferred data analysis tool. Here’s an example of how to divide MPROP data into a custom geography using R.

# If you haven't installed these before, do so now
library(tidyverse) #  install.packages("tidyverse")
library(sf) #         install.packages("sf)

# Get the shapefile containing parcels and coordinates
# Download the zipped directory from here "https://drive.google.com/open?id=18nbv5wqICovjbYqnynXpS-mKOmuQYwYS"
parcels.coordinates <- st_read("YOUR/PATH/TO/SpatialCrosswalk.shp")
head(parcels.coordinates) # take a look
# parcels.coordinates has two columns:
#       - TAXKEY contains the unique identifier for every property
#       - geometry contains the xy coordinates for each property

# The coordinate for each property is almost always the centroid of the parcel
# In about 300 cases it is the geocoded address using the Google Maps API

# Get the shapefile containing your new geography
# In this case, I am using voting wards
wards <- st_read("/Users/johnsonjoh/Dropbox/Projects/SHPfiles/ward2012/ward.shp")

# Make sure the new shapefile uses the same projection as parcels.coordinates
# a projection is also called a "coordinate reference system"
old.projection <- st_crs(parcels.coordinates)
wards <- st_transform(x = wards, crs = old.projection)


# Now intersect the points with the ward polygons
parcels.to.wards <- st_intersection(parcels.coordinates, wards)
# looking at the number of observations, it looks like I lost a few parcels in this merge
# perform an anti_join to get the missing TAXKEYs.
missing.taxkeys <- anti_join((parcels.coordinates %>% st_set_geometry(NULL)),
                             (parcels.to.wards %>% st_set_geometry(NULL)))


# Now get the full mprop file
# Download the file from this link: https://drive.google.com/open?id=1kwFTSKjsHPKCsDT17-wAMvyeNmHJp6iR
# It's big, so these steps will be slow
mprop <- read_csv("/YOUR/PATH/TO/MPROP_2000_to_2018_May.csv",
                  col_types = "ccncccnnccccccnnncnnnccnnncnnnccccncnnccccccccnnncnccnncccccnnnncnccnncccccccccccccccccccccccnc")

# Now join the MPROP file to the parcel/ward crosswalk
?inner_join # read about different kinds of joins
mprop.with.wards <- inner_join(mprop, parcels.to.wards, by=c("TAXKEY"="TAXKEY"))

# Success!

# Now I can look at statistics by ward
# for example
total.units.by.ward <- mprop.with.wards %>%
  group_by(WARD, year) %>%
  summarise(TotalUnits = sum(NR_UNITS, na.rm = T))


# What about those missing parcels?
# This step is only if I care about the handfull of parcels I'm missing
# Probably the point falls in the Lake, or just outside the city boundaries or something
# Lets get their addresses
missing.taxkeys.mprop <- left_join(missing.taxkeys, mprop)
# Right now I have 1 line for every year they appear in the data. I'll summarize
missing.taxkeys.mprop <- missing.taxkeys.mprop %>%
  group_by(TAXKEY) %>%
  filter(row_number()==1)
# Now get the address
missing.taxkeys.mprop$address <- paste(missing.taxkeys.mprop$HOUSE_NR_LO, # House number
                                       missing.taxkeys.mprop$SDIR, #        Street Direction
                                       missing.taxkeys.mprop$STREET, #      Street name
                                       missing.taxkeys.mprop$STTYPE) #      Street type
# Just keep taxkey and address
missing.taxkeys.mprop <- missing.taxkeys.mprop[,c("TAXKEY","address")]
# Now I could check those addresses on google maps and manually assign them wards

  1. In about 300 cases (out of around 170,000), the shapefile parcel was not available, so I was unable to calculate a centroid. In these instances I used the Google Maps API to retrieve coordinates.

  2. In a handful of cases the centroid of a parcel will commonly fall outside of other shapefile polygons–for instance, in irregularly shaped parcels following the shoreline. I don’t have an automated way to deal with these issues, but I suggest a procedure for manually coding them as they occur in the R script above.