Image Source: WMATA

Image Source: WMATA

0. Introduction

This analysis aims to understand the benefits and impacts of Transit-Oriented Development (TOD) in the city of Washington D.C., with consideration of spatial biases. Using open data on Washington Metropolitan Area Transit Authority’s (WMATA) transit stops and lines, the analysis centers around benefits and potentials of TOD in the city. Some key things to keep in mind about this analysis:

  1. DC’s metro lines and stops have changed between the dates of the analysis. However, due to the lack of data on the 2009 stops, this study only uses the most recently published publicly available data (May 2023) for these stops.
  2. The study is an exploratory analysis and is not exhaustive.

1. Data Wrangling

ACS Census Tract Data

For this analysis, I have pulled 2019 and 2009 ACS data for Washington D.C. census tracts using the tidycensus package to conduct a 10-year analysis of social and demographic factors. The following selected variables were pulled from the Census Bureau:

  1. Total Population
  2. Median Household Income
  3. Median Rent
  4. Percentage White
  5. Percentage with Bachelors
  6. Percentage in Poverty
#Pull 2019 Data
tracts19 <- 
  get_acs(geography = "tract", 
          variables = c("B25026_001E","B02001_002E",
                        "B15001_050E","B15001_009E",
                        "B19013_001E","B25058_001E",
                        "B06012_002E"), 
          year=2019, state=11, county=001, 
          geometry=TRUE, output="wide") %>%
  st_transform('ESRI:102728') %>%
  rename(TotalPop = B25026_001E, 
         Whites = B02001_002E,
         FemaleBachelors = B15001_050E, 
         MaleBachelors = B15001_009E,
         MedHHInc = B19013_001E, 
         MedRent = B25058_001E,
         TotalPoverty = B06012_002E) %>%
  dplyr::select(-NAME, -starts_with("B")) %>%
  mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop,0),
         pctBachelors = ifelse(TotalPop > 0, ((FemaleBachelors + MaleBachelors) / TotalPop),0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         year = "2019")%>%
  dplyr::select(-Whites, -FemaleBachelors, -MaleBachelors, -TotalPoverty) 
 
tracts19$pctWhite <- ifelse(tracts19$pctWhite> 1, 1, tracts19$pctWhite)
tracts19$pctPoverty <- ifelse(tracts19$pctPoverty > 1, 1, tracts19$pctPoverty)

#Pull 2009 Data
tracts09 <- 
  get_acs(geography = "tract", 
          variables = c("B25026_001E","B02001_002E",
                        "B15001_050E","B15001_009E",
                        "B19013_001E","B25058_001E",
                        "B06012_002E"), 
          year=2009, state=11, county=001, 
          geometry=TRUE, output="wide") %>%
  st_transform('ESRI:102728') %>%
  rename(TotalPop = B25026_001E, 
         Whites = B02001_002E,
         FemaleBachelors = B15001_050E, 
         MaleBachelors = B15001_009E,
         MedHHInc = B19013_001E, 
         MedRent = B25058_001E,
         TotalPoverty = B06012_002E) %>%
  dplyr::select(-NAME, -starts_with("B")) %>%
  mutate(pctWhite = ifelse(TotalPop > 0, Whites / TotalPop,0),
         pctBachelors = ifelse(TotalPop > 0, ((FemaleBachelors + MaleBachelors) / TotalPop),0),
         pctPoverty = ifelse(TotalPop > 0, TotalPoverty / TotalPop, 0),
         year = "2009") %>%
  dplyr::select(-Whites, -FemaleBachelors, -MaleBachelors, -TotalPoverty) 

tracts09$pctWhite <- ifelse(tracts09$pctWhite> 1, 1, tracts09$pctWhite)
tracts09$pctPoverty <- ifelse(tracts09$pctPoverty > 1, 1, tracts09$pctPoverty)

### Combine 2009 and 2019 data
allTracts <- rbind(tracts09,tracts19)

Wrangling Transit Open Data

Additionally, to evaluate transit information, I pulled open data of WMATA’s transit stops and transit lines, filtering them for the Washington D.C. boundaries.

Figure 1 visualizes the most updated version of the WMATA stops and lines in DC.

ggplot() + 
  geom_sf(data=st_union(tracts09), color= "#5D4954", fill= "#6699CA") +
  geom_sf(data = dc_transitlines, aes(color = LINE)) +
  scale_color_manual(values = dc_transitlines$LINE,
                     labels = dc_transitlines$NAME) +
  geom_sf(data=dc_transitstops, 
          show.legend = "point", size= 2) +
  labs(title="WMATA Lines and Stops", 
       subtitle="Washington D.C.", 
       caption="Figure 1") +
  guides(color = guide_legend(title = "WMATA Lines",
                              keywidth = 1,  # Adjust key width as needed
                              keyheight = 1,  ))+# Adjust key height as needed
              
  mapTheme()

Identifying TOD areas for Analysis

To classify TOD areas, I created half-mile buffers from each WMATA subway stop and related them to census tracts.

WMATA Stop Half-Mile Buffers

stopBuffer <- st_buffer(dc_transitstops, 2640)

stopUnion <- st_union(st_buffer(dc_transitstops, 2640))

dc_transitBuffers <- 
  rbind(
     stopBuffer %>%
      mutate(Legend = "Buffer") %>%
      dplyr::select(Legend),
     stopUnion %>%
      st_sf() %>%
      mutate(Legend = "Unioned Buffer"))

ggplot() +
  geom_sf(data=st_union(tracts09)) +
  geom_sf(data=dc_transitBuffers, aes(fill = "#6699CA")) +
  geom_sf(data=dc_transitstops, show.legend = "point") +
  facet_wrap(~Legend) + 
  labs(title = "Half-Mile Buffers around WMATA Transit Stops", 
       caption = "Figure 2") +
  mapTheme() +
  theme(legend.position = "none")

buffer <- filter(dc_transitBuffers, Legend=="Unioned Buffer")

Comparative Spatial Selection

The following figure visualizes three potential spatial selections for TOD areas. The first simply clips DC to the half-mile buffer of each WMATA stop, meaning the classified TOD area can be understood as anywhere within a half-mile from a transit stop. The second takes a broader view of TOD stops and selects all census tracts whose centroid falls within the half-mile buffer. The third selects all census tracts which fall within the half-mile buffer.

clip <- 
  st_intersection(buffer, tracts09) %>%
  dplyr::select(TotalPop) %>%
  mutate(Selection_Type = "Clip")

selection1 <-  #spatial intersection with polygons
  tracts19[buffer,] %>%
  dplyr::select(TotalPop) %>%
  mutate(Selection_Type = "Spatial Selection")

selectCentroids <-
  st_centroid(tracts19)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts19, GEOID), by = "GEOID") %>%
  st_sf() %>%
  dplyr::select(TotalPop) %>%
  mutate(Selection_Type = "Select by Centroids")

intersections <- rbind(clip, selection1, selectCentroids)

ggplot() +
  geom_sf(data=intersections, aes(fill = TotalPop)) +
  geom_sf(data=dc_transitstops, show.legend = "point", pch= 18) +
  scale_fill_viridis_c() +
  facet_wrap(~Selection_Type) + 
    labs(title = "Comparing Spatial Selections of TOD Areas", 
       caption = "Figure 3") +
  mapTheme()

Out of these three options, this analysis will move forward with the “select by centroids” method as a means of identifying TOD areas as it is mostly effective in identifying tracts which each stop is likely to reach. While this method is based on the assumption that residents of each census tract are spatially distributed in a way that allows for at least half of them to be in the half-mile reach of the WMATA stops, this method still has less spatial assumptions than the two other methods, making it optimal for further analysis.

2. Visualizing TOD Impacts on ACS Variables

This following section will look at four different social/demographic variables over space and time. Based on the previous section, that are considered TOD are outlines in white.

Median Rent

Based on the following figure, median rent has increased over the decade across the city, after inflation adjustment. However, TOD tracts here have generally shown a much greater increase in rent, almost doubling in amount. This may suggest that these are popular choices of neighborhoods for households and that they might have moved in closer to transit stops thus pushing the median rent up. For non-TOD tracts, rents seem to have gone up at a smaller scale.

allTracts_rent <- allTracts %>%
  mutate(MedRent.inf = ifelse(year == "2009", MedRent * 1.19,MedRent)) # inflation
    
ggplot() +
  geom_sf(data=allTracts_rent, aes(fill = MedRent
          )) + geom_sf(data= selectCentroids, color= "white", alpha= 0.2,
        ) +
  scale_fill_viridis_c(option = "magma", labels = scales::dollar) +
  facet_wrap(~year) + 
  mapTheme()+
   labs(fill = "Median Rent", title = "2009 vs 2019 Median Rent in TOD and non-TOD Areas", caption = "Figure 4")

Percent Poverty

The percentage of population in poverty seems to be going down but a small rate. Especially for the tracts near transit stops (in white), poverty percentages appear to be decreasing, marked by a shift of colors in the graph from greens/ yellow to blues.

ggplot() +
  geom_sf(data = allTracts, aes(fill = pctPoverty)) +
  geom_sf(data = selectCentroids, color = "white", alpha = 0.2) +
  scale_fill_viridis_c(option = "turbo", labels = scales::percent) +  
  facet_wrap(~year) + 
  mapTheme() +
  labs(fill = "Percent Poverty", title = "2009 vs 2019 Percent Poverty in TOD and non-TOD Areas", caption = "Figure 5")

Median Household Income

The median household income, similar to rent seems to be increasing for the city as a whole. TOD tracts have a significantly increasing median household income while tracts away from TOD seem to be consistent: the tracts at the top of the map continue to remain affluent ($175,000 to $250,000 on an average) while those at the bottom continue to be below $50,000 on an average. This might point to a benefit of being closer to transit.

allTracts_income <- allTracts %>%
  mutate(MedHHInc.inf = ifelse(year == "2009", MedHHInc * 1.19,MedHHInc)) # inflation

ggplot() +
  geom_sf(data=allTracts_income , aes(fill = MedHHInc
          )) + geom_sf(data= selectCentroids, color= "white", alpha= 0.1,
        ) +
  scale_fill_viridis_c(option = "mako", labels = scales::dollar) +
  facet_wrap(~year) + 
  mapTheme()+
   labs(fill = "Median Household income", title = "2009 vs 2019 Household Income in TOD and non-TOD Areas", caption = "Figure 6")

Percent White

The percent white population appears to be expanding across the city from 2009 to 2019. Some TOD tracts also have a higher percentage of white population moving in between 2009-2019. However, this may be a spatial bias: the changes can potentially be attributed to white populations moving towards central areas rather than solely TOD areas, as white populations generally increased in tracts that were bordering already high white percentage census tracts and other central areas in DC.

ggplot() +
  geom_sf(data = allTracts, aes(fill = pctWhite)) +
  geom_sf(data = selectCentroids, color = "white", alpha = 0.2)+
  scale_fill_viridis_c(option = "cividis", labels = scales::percent) +  
  facet_wrap(~year) + 
  mapTheme() +
  labs(fill = "Percent White", title = "2009 vs 2019 Percent in TOD and non-TOD Areas", caption = "Figure 7")

3. TOD Indicator Bar Plots

The changes in these four variables over time and between TOD and non-TOD areas can also be examined through bar plots:

allTracts.group <- 
  rbind(
    st_centroid(allTracts)[buffer,] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "TOD"),
    st_centroid(allTracts)[buffer, op = st_disjoint] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "Non-TOD")) %>%
  mutate(MedRent.inf = ifelse(year == "2009", MedRent * 1.19, MedRent)) %>%
 mutate(MedHHInc.inf = ifelse(year == "2009", MedHHInc * 1.19, MedHHInc) )  # inflation rate

allTracts.Summary <-    #wide
  st_drop_geometry(allTracts.group) %>%
  group_by(year, TOD) %>%
  summarize(Rent = mean(MedRent, na.rm = T),
            Population = mean(TotalPop, na.rm = T),
            Percent_White = mean(pctWhite, na.rm = T),
            Percent_Bach = mean(pctBachelors, na.rm = T),
            Percent_Poverty = mean(pctPoverty, na.rm = T), 
            Median_HH_income= 
              mean(MedHHInc, na.rm = T))

allTracts.Summary %>%
  gather(Variable, Value, -year, -TOD) %>%
  ggplot(aes(year, Value, fill = TOD)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~Variable, scales = "free", ncol = 3, nrow = 2) +
  scale_fill_manual(values = c("#2B4570", "#A8D0DB")) +
  labs(title = "2009 vs 2019 Indicators in DC TOD and Non-TOD Areas",
       caption = "Figure 8") +
  guides(color = guide_legend(title = "")) +
  plotTheme() + theme(legend.position = "bottom")

Median Household Income

Median household income was similar for TOD and non-TOD areas in 2009 at about $60k. While both increased in 2019, TOD areas experienced a higher increase, 20% more than non-TOD areas at around $100k, suggesting either that TOD areas attracted households with higher paying jobs or that they came with other amenities that higher income households valued.

Percent White

Washington D.C. grew its white population percentage across the city, with a slightly greater growth for TOD areas. On average, the percent of white population in TOD neighborhoods grew to surpass 50%, while within non-TOD tracts it is about 30% of the total population. The percent white population in TOD areas has increased by more than 25%.

Percent Bachelors

For TOD areas, the percent with bachelors decreased slightly while increasing slightly in non-TOD areas. However, both percentage values decreased at a very nominal absolute value.

Population

In 2009, TOD areas previously had a smaller population compared to non-TOD areas, while in 2019 TOD areas higher population than non-TOD areas by about 200 people on average. While this could signal the migration of residents to areas with better transit it can also potentially be attributed to the completion of residential construction near TOD, requiring more data for confirmation.

Percent Poverty

For non-TOD areas, percent of population in poverty remained similar, while TOD areas experienced a slight decrease.

Rent

Rent increased for both TOD and non-TOD areas, however, TOD areas experienced a much higher growth at around $300 more than non-TOD areas in 2019 compared to just about $60 more than non-TOD areas in 2009, suggesting transit-rich residential neighbourhoods are valued higher now than before due to TOD.

4. TOD Indicator Table

These differences can also be visualized through the table below:

allTracts.Summary %>%
  unite(year.TOD, year, TOD, sep = ": ", remove = T) %>%
  gather(Variable, Value, -year.TOD) %>%
  mutate(Value = round(Value, 2)) %>%
  spread(year.TOD, Value) %>%
  kable() %>%
  kable_styling() %>%
  footnote(general_title = "2009 vs 2019 Indicators in DC TOD and Non-TOD Areas",
           general = "Figure 9")
Variable 2009: Non-TOD 2009: TOD 2019: Non-TOD 2019: TOD
Median_HH_income 61998.86 63546.57 86196.55 101401.36
Percent_Bach 0.02 0.04 0.02 0.03
Percent_Poverty 0.18 0.20 0.18 0.17
Percent_White 0.27 0.43 0.34 0.54
Population 2996.05 2869.95 3572.93 3767.50
Rent 910.39 968.29 1358.34 1642.61
2009 vs 2019 Indicators in DC TOD and Non-TOD Areas
Figure 9

5. Population and rent within 0.5 mile of each transit station

Population

The following graduated symbol map visualizes the changes in total population for TOD areas, or the selected census tracts whose centroids fell within a half-mile buffer from a dc transit stop. As shown in the figure, total population generally remained consistent across TOD areas, with the exception of a large growth of population in two areas (purple in the 2009 map and bright yellow in the 2019 map). These two points had a population of zero in 2009 to about 8000 in 2019. Overall, these maps can be interpreted as visualizing a very slow and subtle increase in population across TOD areas, with a few specific areas areas filling up and increasing in population. Given that these could have been due to new housing finishing in construction sometime between 2009 to 2019, more information would be required to give policy recommendations.

selectCentroids19POP <-
  st_centroid(tracts19)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts19, GEOID), by = "GEOID") %>%
  st_sf() %>%
  dplyr::select(TotalPop) %>%
  mutate(Year = 2019)

selectCentroids09POP <-
  st_centroid(tracts09)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts09, GEOID), by = "GEOID") %>%
  st_sf() %>%
  dplyr::select(TotalPop) %>%
  mutate(Year = 2009)

intersectPOP <- rbind(selectCentroids09POP, selectCentroids19POP)

TOD_tracts_pop <- intersectPOP %>% st_intersection(dc_transitstops, intersectPOP)
total_pop <- TOD_tracts_pop %>%
  st_drop_geometry() %>%
  group_by(NAME, Year) %>%
  summarise(n = sum(TotalPop))

labels_n <- TOD_tracts_pop %>%
  left_join(total_pop,
            by = c("NAME" = "NAME")) %>%
  st_as_sf() %>%
  arrange(desc(n)) 

ggplot() +
  geom_sf(data = allTracts, fill = "grey95") +
  geom_sf(
    data = labels_n,
    pch = 21,
    aes(size = n, fill = n, title = ""),
    col = "grey20"
  ) +
  scale_size_continuous(
    range = c(1, 9),
    guide = guide_legend(
      direction = "horizontal",
      nrow = 1,
      label.position = "bottom",
      title = "Population"
    )
  ) +
  scale_fill_viridis_c( alpha = 0.6) +
  facet_wrap(~Year.y) +
  labs(title = "2009 vs 2019 Total Population", caption = "Figure 10") +
  mapTheme() +
  theme(legend.position = "bottom")

Rent

The following graduated symbol map visualizes the changes in median rent for TOD areas, or the selected census tracts whose centroids fell within a half-mile buffer from a dc transit stop. As shown in the figure, median rent generally increased for all areas around these transit stops and new stops in 2019 experience similar median rents compared to their neighbouring stops.

However, to unpack the impact of these TOD stops spatially, further analysis below through means of a function of rent to distance chart will explore the influence of proximity to median rent.

selectCentroids19MED <-
  st_centroid(tracts19)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts19, GEOID), by = "GEOID") %>%
  st_sf() %>%
  dplyr::select(MedRent) %>%
  mutate(Year = 2019)

selectCentroids09MED <-
  st_centroid(tracts09)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts09, GEOID), by = "GEOID") %>%
  st_sf() %>%
  dplyr::select(MedRent) %>%
  mutate(Year = 2009)

intersectMED <- rbind(selectCentroids09MED, selectCentroids19MED)

TOD_tracts_rent <- intersectMED %>% st_intersection(dc_transitstops, intersectMED)
med_rent <- TOD_tracts_rent %>%
  st_drop_geometry() %>%
  group_by(NAME, Year) %>%
  summarise(m = median(MedRent))

labels_m <- TOD_tracts_rent %>%
  left_join(med_rent,
            by = c("NAME" = "NAME")) %>%
  st_as_sf() %>%
  arrange(desc(m)) 

ggplot() +
  geom_sf(data = allTracts, fill = "grey95") +
  geom_sf(
    data = labels_m,
    pch = 21,
    aes(size = m, fill = m),
    col = "grey20"
  ) +
  scale_size_continuous(
    range = c(1, 9),
    guide = guide_legend(
      direction = "horizontal",
      nrow = 1,
      label.position = "bottom",
      title = "Median Rent"
    )
  ) +
    scale_fill_viridis(alpha = 0.8) +
  facet_wrap(~Year.y) +
  labs(title = "2009 vs 2019 Median Rent", caption = "Figure 11") +
  mapTheme() +
  theme(legend.position = "bottom")

6. MRB Analysis of Rent and Distance to Subway Stations

Lastly, median rent will be analyzed as a function of distance to subway stations. In other words, the relationship between rent and distance to a subway station will be visualized and compared between years to show the effect of TOD on rent over time.

dc_transit_MRB <- multipleRingBuffer(st_union(dc_transitstops), 25080, 1320)

allTracts.rings <-
  st_join(st_centroid(dplyr::select(allTracts.group, GEOID, year)),
          dc_transit_MRB) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(allTracts, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() %>%
  mutate(distance = distance / 5280) #convert to miles

ggplot() +
  geom_sf(data=dc_transit_MRB) +
  geom_sf(data=dc_transitstops, size=1) +
  geom_sf(data=st_union(tracts19), fill=NA, size=1.2) +
  labs(title="Quarter-mile Buffers from WMATA Stops",
       caption = "Figure 12") +
  mapTheme()

The following figure visualizes the rent function of distance. Unfortunately, the data pulled has some missing values and provides a less clear visualization of rent over distance for further areas, specifically beyond 2.25 miles.

Based on this figure, while median rent (inflation adjusted) has generally increased rather significantly across all areas in DC, values have particularly increased in TOD areas with comfortable walking distances to a subway stop. For median rents of homes within a quarter-mile of subway stops have almost doubled in ten years–rising to about $1875 in 2019, up from about $1000 in 2009. Other than that, the difference between 2019 and 2009 values seem stable, meaning there are similar, near-parallel increases between the two lines. There appears to be a spike at the 2-mile mark, which should be further explored.

These findings suggest that TOD results in renters being more willing to pay rent premiums for walkability to a subway station. They could also be willing to pay more for the amenities or safety in neighbourhoods with transit. However, based on the limited data of both this figure and of the indicators of the areas near transit stops, it would be recommended to further analyze this issue, as well as take a closer look at the different strengths of transit-rich areas, and see how these amenities could be applied to lower median rent census tracts as well as neighbourhoods with more affordable income, to improve to liveability of areas across DC.

allTracts.rings <- allTracts.rings %>% 
  select(GEOID, year) %>% 
  st_centroid() %>% 
  st_join(dc_transit_MRB, join = st_intersects) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(allTracts.group, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() %>%
  mutate(distance = distance / 5280) #convert to miles

allTracts.rings.summary <- st_drop_geometry(allTracts.rings) %>%
    group_by(distance, year) %>%
    summarize(Median_Rent = median(MedRent, na.rm=T))

ggplot(allTracts.rings.summary,
       aes(distance, Median_Rent, color = year)) +
  geom_point(size = 3) + 
  geom_line(size = 2) +
  labs(
    title = "Median Rent (2019 Inflation Adjusted $) as a Function of Distance (Miles)",
    caption = "Figure 13"
  )