Introduction

Boston is one of the largest cities in the United States and is an integral part of the Northeast Economic Corridor. Over time, it has become a center for culture and innovation. The high density of higher education institutes like Harvard and MIT have brought Boston to the forefront of scientific research. Boston is also major hub for various industries, including finance, technology, healthcare, and biotechnology.

Boston’s rapid transit system, known as the “T,” is a vital component of the city’s public transportation network. Operated by the Massachusetts Bay Transportation Authority (MBTA), it consists of four subway lines (Red, Orange, Blue, and Green), serving both the city and surrounding suburbs. Transit-Oriented Development (TOD) in Boston, like many other cities, is aimed at creating vibrant, mixed-use communities around the city’s extensive public transit network, particularly the rapid transit subway lines.

This policy brief attempts to address the value and citzen settlement patterns of transit-rich neighborhoods, comparing them to the city at large. It makes extensive use of the American Community Survey data, along with publically available information on the MBTA’s rapid transit routes and stops.

This code is built upon the classwork discussed here.

R Setup and Installing Packages

This code chunk handles the essential tasks of loading necessary packages, configuring the Census API key, defining class functions, specifying a color palette, and managing global environment settings.

library(tidyverse)
library(tidycensus)
library(sf)
library(kableExtra)
library(tidyr)
library(ggplot2)
library(viridis)
library(prettydoc)
library(stringr)

options(scipen=999)
options(tigris_class = "sf")

#these are functions developed for this class, so it takes you especially to the book to get those

source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

palette5 <- c("#fde725","#5ec962","#21918c","#3b528b","#440154")

census_api_key('bf2d507651b5a621dbadd44533fb4f3deaab26bf', overwrite = TRUE)

Loading Census Data

The years chosen for analysis are 2016 and 2020. Over this period, Boston’s Rapid Transit went through significant changes, the most notable one being the planning and construction of the extended Green Line. By extending the Green Line into historically under-served areas like Somerville and Medford, it has not only eased commuting for residents but also bolstered ridership across the entire line, reducing congestion and offering more accessible transit options to thousands of daily commuters.

acs_variable_list.2020 <- load_variables(2020, #year
                                         "acs5", #five year ACS estimates
                                         cache = TRUE)

acs_variable_list.2016 <- load_variables(2016, #year
                                         "acs5", #five year ACS estimates
                                         cache = TRUE)

The variables chosen for this analysis include:

  1. Total Population in Occupied Housing Units: This variable is related to TOD because the success of transit-oriented development often depends on population density. Higher population density typically leads to more demand for public transportation services. TOD areas are designed to accommodate a larger population within a compact, walkable neighborhood, which can reduce reliance on personal vehicles and promote the use of public transit.

  2. Median household Income in the past 12 months: Income levels are relevant to TOD because they influence both the demand for public transportation and the types of amenities and services that can be supported in a TOD area. Low- to moderate-income households may rely more on public transit, making it crucial to have affordable transportation options in TOD developments. Additionally, higher-income residents can support a range of businesses and services within the TOD area.

  3. Median Rent over the past 12 months: Rent levels are important in TOD areas because they can affect housing affordability, which is a critical factor for attracting residents who want to live near transit hubs. Affordable housing options in TODs can encourage people to choose public transportation over private cars, as they can live closer to work and amenities without a long commute.

  4. Number of People employed in the Business, Arts or Sciences sector: Boston is known as a hub for these industries, with numerous prestigious universities and cultural institutions located in the area. If residents engaged in this sector are choosing to located within TOD districts, it’s indicative of what future routes and access should be prioritized.

  5. Median gross rent as a percentage of household income: This variable relates to housing affordability, which is a key aspect of TOD. The percentage of household income spent on rent can indicate whether housing is affordable for the local population. Lower rent as a percentage of income suggests greater affordability, which can attract a diverse mix of residents, including those who rely on public transit. It also relates to the previous indicator, as the Business, Arts and Sciences sector typically involves high paying jobs for residents.

tracts16 <- 
  get_acs(geography = "tract", 
          variables = c("B25026_001E",#pop
                        "B19013_001E", #med inc
                        "B25058_001E", #rent
                        "B24011_002E", #ppl in business science arts 
                        "B25071_001E"), #inc as % of rent
          year=2016, 
          state = 25,
          county=025, #using the boston/suffolk county FIPS code
          geometry=TRUE, output="wide")%>%
  st_transform('ESRI:102728') %>%
  rename(TotalPop = B25026_001E, 
         MedInc = B19013_001E,
         MedRent = B25058_001E, 
         BusArtsScience = B24011_002E,
         IncSpentOnRentPCt = B25071_001E)%>%
  dplyr::select(-NAME, -ends_with("M")) %>%
  mutate(year = "2016") 

# 2020

tracts20 <- 
   get_acs(geography = "tract", 
          variables = c("B25026_001E",#pop
                        "B19013_001E", #med inc
                        "B25058_001E", #rent
                        "B24011_002E", #ppl in business science arts 
                        "B25071_001E"), #inc as % of rent
          year=2020, 
          state = 25,
          county=025, #using the boston/suffolk county FIPS code
          geometry=TRUE, output="wide")%>%
  st_transform('ESRI:102728') %>%
  rename(TotalPop = B25026_001E, 
         MedInc = B19013_001E,
         MedRent = B25058_001E, 
         BusArtsScience = B24011_002E,
         IncSpentOnRentPCt = B25071_001E)%>%
  dplyr::select(-NAME, -ends_with("M")) %>%
  mutate(year = "2020") 


#binding the data
allTracts <- rbind(tracts16,tracts20)

Loading MBTA Stops

MBTA Rapid Transit data represents the station stops on the five subway, streetcar/trolley and Silver Line bus “T” lines (Blue, Green, Orange, Red and Silver) in the Massachusetts Bay Transportation Authority’s rapid transit rail network. The layers were developed by the Central Transportation Planning Staff (CTPS), with additional editing by MassGIS based on current aerial imagery and information from this source.

url <-"https://gis.massdot.state.ma.us/arcgis/rest/services/Multimodal/GTFS_Systemwide/MapServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json"

MBTAStops <- st_read(url) %>% 
  st_transform(st_crs(tracts16))  
#clipping to tracts boundaries 

clippedMBTA <- 
  st_intersection(MBTAStops, tracts16)

#plotting

ggplot() + 
  geom_sf(data=st_union(tracts16)) +
  geom_sf(data=clippedMBTA) +
  labs(title="MBTA Stops", 
       subtitle="Boston, MA", 
       caption="Figure 1.1") +
  mapTheme()

Data Analysis

This section is focused on examining various socio-economic indicators in relation to transit-oriented development (TOD) and non-TOD areas within the context of Boston’s public transit system. It makes use of the ggplot and kable packages for effective visualizations.

Creating Buffers and Selecting Census Tracts

Half mile buffers were created as half a mile translates to a roughly 10 minute walkshed, making it reasonable to assume that the area within those buffers are the most significantly affected by the presence of transit.

#creating a buffer

MBTABuffers <- 
  rbind(
    st_buffer(clippedMBTA, 2640) %>% #the number at the end is in feet, because our CRS is in feet
      mutate(Legend = "Buffer") %>%
      dplyr::select(Legend),
    st_union(st_buffer(clippedMBTA, 2640)) %>%
      st_sf() %>%
      mutate(Legend = "Unioned Buffer")) #all st functions will be spatial related stuff

ggplot() +
  geom_sf(data=MBTABuffers) +
  geom_sf(data=clippedMBTA, show.legend = "point") +
  facet_wrap(~Legend) + 
  labs(caption = "Figure 2.1") +
  mapTheme()

#just the buffers

buffer <- filter(MBTABuffers, Legend=="Unioned Buffer")

The study was done by selecting census tracts based on the “Select by Centroids” method as the dataset for the MBTA stops was point based, and studying the range of geographies serviced by transit was best represented by the central location of the census tract features.

#selecting the TOD tracts by centroid

selectCentroids <-
  rbind(
    st_centroid(tracts16)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts16, GEOID), by = "GEOID") %>%
  st_sf() %>%
  mutate(Selection_Type = "Select by Centroids"),
  st_centroid(tracts20)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts16, GEOID), by = "GEOID") %>%
  st_sf() %>%
  mutate(Selection_Type = "Select by Centroids"))

ggplot() +
  geom_sf(data=selectCentroids, aes()) +
  geom_sf(data=clippedMBTA, show.legend = "point") +
  labs(title = "Selected TOD Tracts", caption = "Figure 2.2") +
  mapTheme()

Comparing TOD vs non-TOD Indicators

This section compares various socio-economic indicators between TOD and non-TOD areas. This involves calculating indicators related to population, median income, median rent, percentage of income spent on rent, and the number of residents employed in the business, arts, and science sector for both types of areas. The code segment generates visualizations using ggplot for each indicator, providing a comparative analysis between TOD and non-TOD areas over time. A majority of the following visualization have been defined using 5 classes of variables, defined as a “quintile break”.

#comparing TOD vs non TOD indicators 

allTracts.group <- 
  rbind(
    st_centroid(allTracts)[buffer,] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "TOD"),
    st_centroid(allTracts)[buffer, op = st_disjoint] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "Non-TOD")) %>%
  mutate(MedRent.inf = ifelse(year == "2016", MedRent * 1.10, MedRent)) 

Total Population

The entire city of Boston has witnessed a slow but steady decline in population between the years of 2016 and 2020. The same change is also witnessed within the TOD districts. However, the change within the TOD districts seems to be more pronounced, with numerous census tracts moving to lower quintiles within these years. To understand the cause and effect of this decline, it is important to understand other indicators such as income and rent.

#plotting total pop

ggplot()+
  geom_sf(data = allTracts.group, aes(fill = q5(TotalPop)), color="transparent")+
   scale_fill_manual(values = palette5,
                    labels = c("0 - 2040", "2040 - 2780", "2780 - 3655", "3655 - 4810", "4810 - 9620"),
                    name = "Popluation\n(Quintile Breaks)") + mapTheme() +
  facet_wrap(~year)+
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent",
          linewidth = 0.8)+
  labs(
    title = "Total Population, 2016-2020",
    subtitle = "Boston MBTA, TOD vs non TOD tracts",
    caption = "Figure 2.3, Data: US Census Bureau, ACS 5-year estimates",
    fill = "Total Population")

Median Income

The median income within the TOD districts appears to have increased between 2016 - 2020. a large number of census tracts across the city, and specifically within the TOD districts appear to have jumped to a higher quintile break.

Combined with the decrease in population density, we can infer that while certain medium to high income residents may have been able to stay in the city, lower income residents could have been forced to move due to unaffordability. But to draw this conclusion, we must look at the rent and affordability statistics.

#plotting med inc
ggplot()+
  geom_sf(data = allTracts.group, aes(fill = q5(MedInc)), color="transparent")+
  scale_fill_manual(values = palette5,
                    labels = qBr(allTracts.group, "MedInc"), 
                    name = "Median Income\n(Quintile Breaks)") + mapTheme() +
  facet_wrap(~year)+
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent", linewidth = 0.8)+
  labs(
    title = "Median Income in 2020 adjusted Dollars, 2016-2020",
    subtitle = "Boston MBTA, TOD vs non TOD tracts",
    caption = "Figure 2.4, Data: US Census Bureau, ACS 5-year estimates",
    fill = "Median Income")

Median Rent

Median rent has witnessed a significant increase within the TOD districts. This reflects the growing demand for housing in proximity to transit hubs. This upward trend in rental prices underscores the changing dynamics of urban development in these areas. However, the increase in rent seems to be consistent with changes across the city. Most, if not all census tracts seems to have jumped up a quintile.

However, it can also be a reason for the declining populations in the transit rich areas, with low income residents moving out due to affordability issues. It is difficult to make these conclusions without taking a more comprehensive look at the different factors affecting these changes. A deeper spatial analysis, looking into the factors and their correlations would make a more compelling case.

#plotting med rent
ggplot()+
  geom_sf(data = allTracts.group, aes(fill = q5(MedRent)), color="transparent")+
  scale_fill_manual(values = palette5,
                    labels = qBr(allTracts.group, "MedRent"),
                    name = "Median Rent\n(Quintile Breaks)") + mapTheme() +
  facet_wrap(~year)+
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent",linewidth = 0.8)+
    
  labs(
    title = "Median Rent in 2020 adjusted Dollars, 2016-2020",
    subtitle = "Boston MBTA, TOD vs non TOD tracts",
    caption = "Figure 2.5, Data: US Census Bureau, ACS 5-year estimates",
    fill = "Median Rent")

Percentage of Income Spent on Rent

The percentage of income spent on rent is the gross rent of each household, expressed as a fraction of the household income. This is a good indicator of affordability, with rent burdened and severely rent burdened populations being defined as households that spend more than 30% and 50% of their incomes on rent, respectively.

It is hard to make a meaningful conclusion looking at this map as the quintiles make it hard to distinguish changes in each individual census tract.

#plotting pct income spent on rent
ggplot()+
  geom_sf(data = allTracts.group, aes(fill = q5(IncSpentOnRentPCt)), color="transparent")+
  scale_fill_manual(values = palette5,
labels = ((qBr(allTracts.group, "IncSpentOnRentPCt", rnd=FALSE))),
                    name = "Percentage\n(Quintile Breaks)") + 
  mapTheme() +
  facet_wrap(~year)+
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent", linewidth = 0.8)+
    
  labs(title = "Percent of Income Spent on Rent: Quintiles", subtitle = "Boston MBTA, TOD vs non TOD tracts", caption = "Figure 2.6, ACS 5-Year") 

To fix this problem, we can adjust the breaks in the map so that we can see a more defined picture of the rent burden in the city of Boston. By adjusting these breaks to color code all the rent burdened tracts, we can see how this indicator is different within and outside of the census tract boundaries. Most tracts in Boston are experiencing some level or rent burden, with the small exception of tracts in the downtown area, within the TOD tracts. This can be explained due to the higher income levels of residents living in the area. However, this is not the norm. As we move further away from the city center we can see how the rent burden drastically increases, in both 2016 and 2020. The lack of affordable housing in Boston, particularly in proximity to transit is an issue.

breaks <- c(0, 15, 20, 25, 30, 100)  
labels <- c("0-15 %", "16-20 %", "21-25 %", "26-30 %", "31+ %") 

ggplot(allTracts.group) +
  geom_sf(data = allTracts) +
  geom_sf(aes(fill = cut(IncSpentOnRentPCt, breaks = breaks, labels = labels))) +
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent", linewidth = 0.8) + 
  scale_fill_manual(values = palette5,  
    name = "Rent Burdened",
    breaks = labels) +
  labs(title = "Percent of Income Spent on Rent", subtitle = "Boston MBTA, TOD vs non TOD tracts", caption = "Figure 2.7, ACS 5-Year") +
  facet_wrap(~year) +
  mapTheme() 

Residents employed in Business, Arts or Sciences

In these maps we can clearly see a concentration of individuals working in business, arts, and sciences living in the Central Business District (CBD) or Transit-Oriented Development (TOD).

Proximity to the workplace for creative/tech-based professions within the CBD or accessible TOD areas significantly reduces commute times for professionals, explaining this concentration.

#plotting BusArtSciences

breaks <- c(0, 32500, 65000, 97500, 130000, 188250)  
labels <- c("0 - 32,500", "32,500 - 65,000", "65,000 - 97,500", "97,500 - 13,000", "13,000+") 

ggplot(allTracts.group) +
  geom_sf(data = allTracts) +
  geom_sf(aes(fill = cut(BusArtsScience, breaks = breaks, labels = labels))) +
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent", linewidth = 0.8) + 
  scale_fill_manual(values = palette5,  
    name = "Number of Residents",
    breaks = labels) +
  labs(title = "Residents Employed in the Business, Arts and Science Sector", subtitle = "Boston MBTA, TOD vs non TOD tracts", caption = "Figure 2.8, ACS 5-Year") +
  facet_wrap(~year) +
  mapTheme() 

Tabulating the Data

This figure tabulates the summarized data for various indicators, broken down by year and TOD status. It computes summary statistics (e.g., mean values) for population, rent, income, business, arts, sciences employment, and the percentage of income spent on rent. Based on the data, we can see that:

  1. The population has witnessed very little change (decline) over the 2016 - 2020. for all variables.

  2. The gap in Median Income has widened over the years. Average income in both TOD and non-TOD areas increased substantially from 2016 to 2020, with TOD areas having higher income levels in both years.

  3. Average rent prices increased in both TOD and non-TOD areas from 2016 to 2020, with TOD areas consistently having higher rent values, indicating a willingness from residents to live in transit rich neighborhoods.

  4. The percentage of income spent on rent decreased in TOD areas from 2016 to 2020, indicating improved affordability, while it increased slightly in non-TOD areas.

  5. The number of Business, Arts and Science employees increased by nearly 20% within the TOD districts. The downtown clearly has an effect on this trend.

# tables 

allTracts.Summary <- 
  st_drop_geometry(allTracts.group) %>% #best to go non spatial here because we're just making a table
  group_by(year, TOD) %>%
  summarize(Population = mean(TotalPop, na.rm = T),
            Rent = mean(MedRent, na.rm = T),
            Income = mean(MedInc, na.rm = T),
            Bus_Art_Sciences = mean(BusArtsScience, na.rm = T),
            Percent_Inc_on_Rent = mean(IncSpentOnRentPCt, na.rm = T))

allTracts.Summary %>%
  unite(year.TOD, year, TOD, sep = ": ", remove = T) %>%
  gather(Variable, Value, -year.TOD) %>%
  mutate(Value = round(Value, 2)) %>%
  spread(year.TOD, Value) %>%
  kable(., col.names = c("Mean Variables","non-TOD","TOD","non-TOD","TOD"), align="lrrrr") %>%
  kable_styling() %>%
  footnote(general_title = "\n",
           general = "Table 2.1")%>% 
      add_header_above(., header = c(" "=1, "2016" = 2, "2020" = 2))
2016
2020
Mean Variables non-TOD TOD non-TOD TOD
Bus_Art_Sciences 56847.32 56306.43 66755.11 72593.21
Income 60561.86 63027.75 78208.02 84099.83
Percent_Inc_on_Rent 32.16 31.50 32.65 30.16
Population 3661.81 3389.58 3588.64 2865.01
Rent 1134.73 1340.18 1434.84 1731.78

Table 2.1

Plotting Graphs

The final section of the code involves creating grouped bar graphs to visualize the differences in indicators across time and space (TOD vs. non-TOD). These charts further validate the inferences illustrated by the previous chart.

#grouped bar graphs

allTracts.Summary %>%
  gather(Variable, Value, -year, -TOD) %>%
  ggplot(aes(year, Value, fill = TOD)) +
    geom_bar(stat = "identity", position = "dodge") +
    facet_wrap(~Variable, scales = "free", ncol=3) +
    scale_fill_manual(values = c("#bae4bc", "#0868ac")) +
    labs(title = "Indicator differences across time and space") +
    theme(legend.position="top")+
  labs(x=NULL, y=NULL) +
    theme(panel.spacing = unit(1, "lines")) + 
    plotTheme()

Discussing Spatial Biases

Spatial biases wield a substantial influence on metrics like rent concerning their relationship with transit accessibility. These biases can originate from several sources, with human error constituting a notable concern. During the data collection and analysis phases, inaccuracies may emerge due to misclassifications of geographic locations and inconsistent or incomplete data recording. These errors, while unintended, can substantially distort the resulting metrics, making it essential to implement rigorous data quality control measures. Furthermore, the presence of spatial autocorrelation, where nearby areas exhibit similar characteristics, can also introduce biases. This spatial autocorrelation can be a consequence of unmodeled spatial processes, leading to erroneous assumptions of independence in statistical analyses. Additionally, data analysis must be conducted with an acute awareness of the presence of uncontrolled variables and unaccounted data issues, such as the margin of error inherent in census data or addressing missing values (NAs). Failing to address these facets can introduce unintended biases and uncertainties into the analysis, potentially undermining the reliability of conclusions drawn from spatial assessments. Thus, meticulous data validation, rigorous cleaning procedures, and the comprehensive consideration of various influential factors are vital steps to mitigate spatial biases in analyses of this nature.

The Downtown Effect

For many of the trends observed, they are not solely contingent on transit accessibility. They are influenced by a multifaceted web of factors, including neighborhood amenities, housing policies, and socioeconomic disparities. Overlooking the significance of Boston’s Central Business District (CBD), a thriving hub of employment and economic activity, can lead to an inherent upward skew in rent rates near this area.

While transit accessibility is pivotal to TOD, it should be viewed within the broader context of Boston’s urban dynamics. The CBD’s significance as an employment and economic center underscores the need for strategic planning that promotes balanced growth, affordability, and equitable access to transportation options.

More Data Analysis

This section extends the preceding data analysis on Transit-Oriented Development. It incorporates graduated symbol maps and graphs depicting the relationship between rent and distance from transit.

Graduated Symbol Maps

The following visualizations are a proportional display of rent and populations of census tracts intersecting the half mile buffers of transit stops.

buffers_stops <- filter(MBTABuffers, Legend=="Buffer")

buffers_stops$ID <- seq_along(buffers_stops$geometry) 

intersected_buffer <- selectCentroids %>% 
  st_intersection(buffers_stops, selectCentroids)

total_stops <- intersected_buffer %>%
  st_drop_geometry() %>%
  group_by(ID, GEOID, year) %>%
  summarise(pop = sum(TotalPop), rent = mean(MedRent, na.rm = TRUE))

labels_n <- clippedMBTA %>%
  select(GEOID, geometry) %>%
  left_join(total_stops,
            by = "GEOID") 

Population within 1/2 mile

Here, in contrast to the “sum” of census tract populations surrounding the transit stations, the study makes use of the average populations of surrounding census tracts. This was done so that the analysis offers a more nuanced and representative perspective. Mean values take into account the distribution of data points, providing an average that reflects the central tendency. As each census tract intersects multiple buffers, the sum total of the populations was reaching 5-digit numbers for each transit stop, giving a misleading representation of the population within the city.

The first observable difference in the 2016 and 2020 is the loss of data - many of the transit stops are missing from the 2020 maps. This can be a result of an erraneous data join, considering the shape of census tracts changed between those years as a result of fluctuating populations. Part of the data loss is also a result of missing data from census.

Despite these errors, we can see a difference in 2020, with a decline in population, most distinctly observed in the CBD/downtown area. We can see a high concentration of residents both at the center and southern part of the city, likely due to a concentration of jobs and universities.

ggplot() +
  geom_sf(data = tracts20, fill = "transparent") +  
  geom_sf(data = tracts16, fill = "transparent") +  
  geom_sf(data = labels_n %>%
             filter(!is.na(year)),  
          aes(fill = pop, size= pop),
          shape = 21, 
          alpha = 0.2) + 
  scale_size_continuous(range = c(0.5, 4), name = "Average Population") +
  scale_fill_gradientn(colors = hcl.colors(5, "viridis", rev = TRUE, alpha = 0.9), name = "")  +
  labs(size = "Point Size") +
   labs(title = "Average Population within half-mile of MBTA Stops", subtitle = "Proportional Symbol Map", caption = "Figure 4.1, ACS 5-Year") +
  facet_wrap(~year) +
  mapTheme()

Rent within 1/2 mile

The rent maps show similar inconsistencies in data across the two years. The downtown and Back Bay areas (located at the most transit rich regions of the map) have consistently remained the most desirable and expensive areas throughout. The rent is lowest in areas in and around Roxbury, making it a rare affordable yet transit rich neighborhood. The presence of universities also lowers the cost of rent especially in South Boston.

ggplot() +
  geom_sf(data = tracts20, fill = "transparent") +   
  geom_sf(data = tracts16, fill = "transparent") +  
  geom_sf(data = labels_n%>%
             filter(!is.na(year)), 
          aes(fill = rent, size= rent),
          shape = 21, 
          alpha = 0.2) + 
  scale_size_continuous(range = c(0.5, 4), name = "Average Rent") +
  scale_fill_gradientn(colors = hcl.colors(5, "viridis", rev = TRUE, alpha = 0.9), name = "") +
  labs(size = "Point Size") +
  labs(title = "Average Rent within half-mile of MBTA Stops", subtitle = "Proportional Symbol Map", caption = "Figure 4.2, ACS 5-Year") +
  facet_wrap(~year) +
  mapTheme()

Rent as a function of distance from the transit stops

This is a valuable metric as it reveals how housing costs change with proximity to public transportation. It informs accessibility, transportation cost savings, environmental impact, and equity considerations, aiding urban planning and policy decisions. It empowers individuals to make informed housing choices and helps developers identify investment opportunities. Nevertheless, it should be considered alongside other factors like amenities and job accessibility, and its impact may vary by location and transit quality.

MBTA_MRB <- multipleRingBuffer(st_union(clippedMBTA),
                                maxDistance = 34320,
                                interval =  2640)

# Your ggplot code
ggplot() +
  geom_sf(data = MBTA_MRB, aes(color = distance), fill = "transparent", linewidth = 0.5, alpha = 0.1 ) +
  geom_sf(data = clippedMBTA, size = 1) +
  geom_sf(data = st_union(tracts16), fill = NA, linewidth = 0.7) +
  labs(title = "Half mile buffers") +
  scale_color_viridis(name = "Distance\n(feet)") +
  mapTheme()

tracts20.rings <- tracts20 %>% 
  select(GEOID, year) %>% 
  st_centroid() %>% 
  st_join(MBTA_MRB, join = st_intersects) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(tracts20, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() %>%
  mutate(distance = distance / 5280) 

tracts20.rings.summary <- st_drop_geometry(tracts20.rings) %>%
    group_by(distance, year) %>%
    summarize(Mean_Rent = mean(MedRent, na.rm=T))

tracts16.rings <- tracts16 %>% 
  select(GEOID, year) %>% 
  st_centroid() %>% 
  st_join(MBTA_MRB, join = st_intersects) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(tracts16, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() %>%
  mutate(distance = distance / 5280) 

tracts16.rings.summary <- st_drop_geometry(tracts16.rings) %>%
    group_by(distance, year) %>%
    summarize(Mean_Rent = mean(MedRent, na.rm=T))


summary_bind <- rbind((tracts16.rings.summary%>%
        filter(!is.na(Mean_Rent) & !is.nan(Mean_Rent))), 
       tracts20.rings.summary%>%
        filter(!is.na(Mean_Rent) & !is.nan(Mean_Rent))
)

The first obvious conclusion is that rent is significantly higher in the year 2020 than in 2016. Further, the decrease in rent as one moves away from transit stops can be attributed to several factors. Firstly, properties located closer to transit stops are more convenient for commuters, reducing the need for private transportation and saving on commuting costs. This increased accessibility and reduced transportation expenses make these areas more desirable, leading to higher demand and, consequently, higher rents.

ggplot(summary_bind,
       aes(distance, Mean_Rent, colour=year)) +
       scale_colour_manual(values = palette5, name = "Year") +
       geom_point(size=3) + 
       geom_line(size=2) + 
    labs(
    title = "Half mile buffers",
    x = "Distance in Miles",  # Change the X-axis label
    y = "Rent in Dollars"   # Change the Y-axis label
  ) +
       plotTheme()

In contrast, as one moves further away from transit stops, the convenience of access to public transportation diminishes. As a result, the demand for properties in these areas decreases, which puts downward pressure on rent prices. Additionally, properties farther from transit hubs may offer larger spaces or different amenities that can offset the inconvenience of a longer commute, further contributing to the decline in rent prices as we move away from transit stops. These combined factors create a rent gradient with transit proximity at its core.

Policy Recommendations

Based on this report, the following policy broad recommendations can be made:

  1. Affordable Housing: Given the rise in rent and increasing rent burden, policies should focus on the development of affordable housing options near transit hubs. Affordable housing initiatives can help retain a diverse population and promote transit use.

  2. Transit Accessibility: Continue investments in improving transit infrastructure and accessibility to ensure that residents can easily access public transportation. Enhancements can include expanding transit networks, improving station facilities, and increasing service frequency.

  3. Data Monitoring: Establish a comprehensive data monitoring system to regularly assess changes in population, income, and rent in TOD districts. This data-driven approach will enable policymakers to make timely adjustments to housing and transportation policies.

Boston’s Transit-Oriented Development presents both opportunities and challenges. While it offers the potential for sustainable urban growth and reduced car dependency, it also requires careful planning and policies to address issues of housing affordability and equitable development. By implementing the recommended policies and closely monitoring demographic and economic changes, Boston can create vibrant and inclusive neighborhoods around its transit network.

---
title: "Transit Oriented Development: Analyzing Boston's Rapid Transit"
subtitle: "MUSA 508, Assignment 2"
author: "Samriddhi Khare, University of Pennsylvania"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float: true
    code_folding: hide
    code_download: true
    theme: cerulean 
---

# Introduction

Boston is one of the largest cities in the United States and is an integral part of the Northeast Economic Corridor. Over time, it has become a center for culture and innovation. The high density of higher education institutes like Harvard and MIT have brought Boston to the forefront of scientific research. Boston is also major hub for various industries, including finance, technology, healthcare, and biotechnology.    

<img src="D:/Fall_2023/PPA/Assignment2/images/MBTA_RapidTransit_11_2022_2000px.jpg" width="300" align="right" style="display: inline; margin: 0 10px;"/>

Boston's rapid transit system, known as the "T," is a vital component of the city's public transportation network. Operated by the Massachusetts Bay Transportation Authority (MBTA), it consists of four subway lines (Red, Orange, Blue, and Green), serving both the city and surrounding suburbs. Transit-Oriented Development (TOD) in Boston, like many other cities, is aimed at creating vibrant, mixed-use communities around the city's extensive public transit network, particularly the rapid transit subway lines. 




This policy brief attempts to address the value and citzen settlement patterns of transit-rich neighborhoods, comparing them to the city at large. It makes extensive use of the American Community Survey data, along with publically available information on the MBTA's rapid transit routes and stops. 

This code is built upon the classwork discussed [here](https://github.com/mafichman/musa_5080_2023/tree/main).

```{r setup, include=FALSE}
  knitr::opts_chunk$set(
    echo = TRUE,
    warning = FALSE,
    message = FALSE,
    out.width = '100%',
    fig.retina =3
  )
```

## R Setup and Installing Packages

This code chunk handles the essential tasks of loading necessary packages, configuring the Census API key, defining class functions, specifying a color palette, and managing global environment settings.

```{r setup_packages, warning = FALSE, message = FALSE}

library(tidyverse)
library(tidycensus)
library(sf)
library(kableExtra)
library(tidyr)
library(ggplot2)
library(viridis)
library(prettydoc)
library(stringr)

options(scipen=999)
options(tigris_class = "sf")

#these are functions developed for this class, so it takes you especially to the book to get those

source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

palette5 <- c("#fde725","#5ec962","#21918c","#3b528b","#440154")

census_api_key('bf2d507651b5a621dbadd44533fb4f3deaab26bf', overwrite = TRUE)

```


## Loading Census Data

The years chosen for analysis are 2016 and 2020. Over this period, Boston's Rapid Transit went through significant changes, the most notable one being the planning and construction of the extended Green Line. By extending the Green Line into historically under-served areas like Somerville and Medford, it has not only eased commuting for residents but also bolstered ridership across the entire line, reducing congestion and offering more accessible transit options to thousands of daily commuters.


```{r load_variables, cache=TRUE}

acs_variable_list.2020 <- load_variables(2020, #year
                                         "acs5", #five year ACS estimates
                                         cache = TRUE)

acs_variable_list.2016 <- load_variables(2016, #year
                                         "acs5", #five year ACS estimates
                                         cache = TRUE)
```

The variables chosen for this analysis include: 

1. Total Population in Occupied Housing Units: This variable is related to TOD because the success of transit-oriented development often depends on population density. Higher population density typically leads to more demand for public transportation services. TOD areas are designed to accommodate a larger population within a compact, walkable neighborhood, which can reduce reliance on personal vehicles and promote the use of public transit.

2. Median household Income in the past 12 months: Income levels are relevant to TOD because they influence both the demand for public transportation and the types of amenities and services that can be supported in a TOD area. Low- to moderate-income households may rely more on public transit, making it crucial to have affordable transportation options in TOD developments. Additionally, higher-income residents can support a range of businesses and services within the TOD area.

3. Median Rent over the past 12 months: Rent levels are important in TOD areas because they can affect housing affordability, which is a critical factor for attracting residents who want to live near transit hubs. Affordable housing options in TODs can encourage people to choose public transportation over private cars, as they can live closer to work and amenities without a long commute.

4. Number of People employed in the Business, Arts or Sciences sector: Boston is known as a hub for these industries, with numerous prestigious universities and cultural institutions located in the area. If residents engaged in this sector are choosing to located within TOD districts, it's indicative of what future routes and access should be prioritized.  

5. Median gross rent as a percentage of household income: This variable relates to housing affordability, which is a key aspect of TOD. The percentage of household income spent on rent can indicate whether housing is affordable for the local population. Lower rent as a percentage of income suggests greater affordability, which can attract a diverse mix of residents, including those who rely on public transit. It also relates to the previous indicator, as the Business, Arts and Sciences sector typically involves high paying jobs for residents. 

```{r chunky, results='hide'}

tracts16 <- 
  get_acs(geography = "tract", 
          variables = c("B25026_001E",#pop
                        "B19013_001E", #med inc
                        "B25058_001E", #rent
                        "B24011_002E", #ppl in business science arts 
                        "B25071_001E"), #inc as % of rent
          year=2016, 
          state = 25,
          county=025, #using the boston/suffolk county FIPS code
          geometry=TRUE, output="wide")%>%
  st_transform('ESRI:102728') %>%
  rename(TotalPop = B25026_001E, 
         MedInc = B19013_001E,
         MedRent = B25058_001E, 
         BusArtsScience = B24011_002E,
         IncSpentOnRentPCt = B25071_001E)%>%
  dplyr::select(-NAME, -ends_with("M")) %>%
  mutate(year = "2016") 

# 2020

tracts20 <- 
   get_acs(geography = "tract", 
          variables = c("B25026_001E",#pop
                        "B19013_001E", #med inc
                        "B25058_001E", #rent
                        "B24011_002E", #ppl in business science arts 
                        "B25071_001E"), #inc as % of rent
          year=2020, 
          state = 25,
          county=025, #using the boston/suffolk county FIPS code
          geometry=TRUE, output="wide")%>%
  st_transform('ESRI:102728') %>%
  rename(TotalPop = B25026_001E, 
         MedInc = B19013_001E,
         MedRent = B25058_001E, 
         BusArtsScience = B24011_002E,
         IncSpentOnRentPCt = B25071_001E)%>%
  dplyr::select(-NAME, -ends_with("M")) %>%
  mutate(year = "2020") 


#binding the data
allTracts <- rbind(tracts16,tracts20)

```

## Loading MBTA Stops

MBTA Rapid Transit data represents the station stops on the five subway, streetcar/trolley and Silver Line bus "T" lines (Blue, Green, Orange, Red and Silver) in the Massachusetts Bay Transportation Authority's rapid transit rail network. The layers were developed by the Central Transportation Planning Staff (CTPS), with additional editing by MassGIS based on current aerial imagery and information from [this source.](https://hub.arcgis.com/maps/massgis::mbta-rapid-transit/about)

```{r name, results = 'hide'}

url <-"https://gis.massdot.state.ma.us/arcgis/rest/services/Multimodal/GTFS_Systemwide/MapServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json"

MBTAStops <- st_read(url) %>% 
  st_transform(st_crs(tracts16))  
```

```{r}

#clipping to tracts boundaries 

clippedMBTA <- 
  st_intersection(MBTAStops, tracts16)

#plotting

ggplot() + 
  geom_sf(data=st_union(tracts16)) +
  geom_sf(data=clippedMBTA) +
  labs(title="MBTA Stops", 
       subtitle="Boston, MA", 
       caption="Figure 1.1") +
  mapTheme()
```

# Data Analysis

This section is focused on examining various socio-economic indicators in relation to transit-oriented development (TOD) and non-TOD areas within the context of Boston's public transit system. It makes use of the `ggplot` and `kable` packages for effective visualizations. 

## Creating Buffers and Selecting Census Tracts

Half mile buffers were created as half a mile translates to a roughly 10 minute walkshed, making it reasonable to assume that the area within those buffers are the most significantly affected by the presence of transit.

```{r}
#creating a buffer

MBTABuffers <- 
  rbind(
    st_buffer(clippedMBTA, 2640) %>% #the number at the end is in feet, because our CRS is in feet
      mutate(Legend = "Buffer") %>%
      dplyr::select(Legend),
    st_union(st_buffer(clippedMBTA, 2640)) %>%
      st_sf() %>%
      mutate(Legend = "Unioned Buffer")) #all st functions will be spatial related stuff

ggplot() +
  geom_sf(data=MBTABuffers) +
  geom_sf(data=clippedMBTA, show.legend = "point") +
  facet_wrap(~Legend) + 
  labs(caption = "Figure 2.1") +
  mapTheme()

#just the buffers

buffer <- filter(MBTABuffers, Legend=="Unioned Buffer")


```

The study was done by selecting census tracts based on the "Select by Centroids" method as the dataset for the MBTA stops was point based, and studying the range of geographies serviced by transit was best represented by the central location of the census tract features. 

```{r}

#selecting the TOD tracts by centroid

selectCentroids <-
  rbind(
    st_centroid(tracts16)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts16, GEOID), by = "GEOID") %>%
  st_sf() %>%
  mutate(Selection_Type = "Select by Centroids"),
  st_centroid(tracts20)[buffer,] %>%
  st_drop_geometry() %>%
  left_join(., dplyr::select(tracts16, GEOID), by = "GEOID") %>%
  st_sf() %>%
  mutate(Selection_Type = "Select by Centroids"))

ggplot() +
  geom_sf(data=selectCentroids, aes()) +
  geom_sf(data=clippedMBTA, show.legend = "point") +
  labs(title = "Selected TOD Tracts", caption = "Figure 2.2") +
  mapTheme()

```

## Comparing TOD vs non-TOD Indicators 

This section compares various socio-economic indicators between TOD and non-TOD areas. This involves calculating indicators related to population, median income, median rent, percentage of income spent on rent, and the number of residents employed in the business, arts, and science sector for both types of areas. The code segment generates visualizations using ggplot for each indicator, providing a comparative analysis between TOD and non-TOD areas over time. A majority of the following visualization have been defined using 5 classes of variables, defined as a "quintile break".  


```{r}

#comparing TOD vs non TOD indicators 

allTracts.group <- 
  rbind(
    st_centroid(allTracts)[buffer,] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "TOD"),
    st_centroid(allTracts)[buffer, op = st_disjoint] %>%
      st_drop_geometry() %>%
      left_join(allTracts) %>%
      st_sf() %>%
      mutate(TOD = "Non-TOD")) %>%
  mutate(MedRent.inf = ifelse(year == "2016", MedRent * 1.10, MedRent)) 

```


### Total Population


The entire city of Boston has witnessed a slow but steady decline in population between the years of 2016 and 2020. The same change is also witnessed within the TOD districts. However, the change within the TOD districts seems to be more pronounced, with numerous census tracts moving to lower quintiles within these years. To understand the cause and effect of this decline, it is important to understand other indicators such as income and rent.  

```{r}
#plotting total pop

ggplot()+
  geom_sf(data = allTracts.group, aes(fill = q5(TotalPop)), color="transparent")+
   scale_fill_manual(values = palette5,
                    labels = c("0 - 2040", "2040 - 2780", "2780 - 3655", "3655 - 4810", "4810 - 9620"),
                    name = "Popluation\n(Quintile Breaks)") + mapTheme() +
  facet_wrap(~year)+
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent",
          linewidth = 0.8)+
  labs(
    title = "Total Population, 2016-2020",
    subtitle = "Boston MBTA, TOD vs non TOD tracts",
    caption = "Figure 2.3, Data: US Census Bureau, ACS 5-year estimates",
    fill = "Total Population")
  
```

### Median Income

The median income within the TOD districts appears to have increased between 2016 - 2020. a large number of census tracts across the city, and specifically within the TOD districts appear to have jumped to a higher quintile break. 

Combined with the decrease in population density, we can infer that while certain medium to high income residents may have been able to stay in the city, lower income residents could have been forced to move due to unaffordability. But to draw this conclusion, we must look at the rent and affordability statistics.   

```{r}
#plotting med inc
ggplot()+
  geom_sf(data = allTracts.group, aes(fill = q5(MedInc)), color="transparent")+
  scale_fill_manual(values = palette5,
                    labels = qBr(allTracts.group, "MedInc"), 
                    name = "Median Income\n(Quintile Breaks)") + mapTheme() +
  facet_wrap(~year)+
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent", linewidth = 0.8)+
  labs(
    title = "Median Income in 2020 adjusted Dollars, 2016-2020",
    subtitle = "Boston MBTA, TOD vs non TOD tracts",
    caption = "Figure 2.4, Data: US Census Bureau, ACS 5-year estimates",
    fill = "Median Income")
  
```

### Median Rent

Median rent has witnessed a significant increase within the TOD districts. This reflects the growing demand for housing in proximity to transit hubs. This upward trend in rental prices underscores the changing dynamics of urban development in these areas. However, the increase in rent seems to be consistent with changes across the city. Most, if not all census tracts seems to have jumped up a quintile.

However, it can also be a reason for the declining populations in the transit rich areas, with low income residents moving out due to affordability issues. It is difficult to make these conclusions without taking a more comprehensive look at the different factors affecting these changes. A deeper spatial analysis, looking into the factors and their correlations would make a more compelling case. 

```{r}
#plotting med rent
ggplot()+
  geom_sf(data = allTracts.group, aes(fill = q5(MedRent)), color="transparent")+
  scale_fill_manual(values = palette5,
                    labels = qBr(allTracts.group, "MedRent"),
                    name = "Median Rent\n(Quintile Breaks)") + mapTheme() +
  facet_wrap(~year)+
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent",linewidth = 0.8)+
    
  labs(
    title = "Median Rent in 2020 adjusted Dollars, 2016-2020",
    subtitle = "Boston MBTA, TOD vs non TOD tracts",
    caption = "Figure 2.5, Data: US Census Bureau, ACS 5-year estimates",
    fill = "Median Rent")
  
```

### Percentage of Income Spent on Rent

The percentage of income spent on rent is the gross rent of each household, expressed as a fraction of the household income. This is a good indicator of affordability, with rent burdened and severely rent burdened populations being defined as households that spend more than 30% and 50% of their incomes on rent, respectively. 

It is hard to make a meaningful conclusion looking at this map as the quintiles make it hard to distinguish changes in each individual census tract. 

```{r}
#plotting pct income spent on rent
ggplot()+
  geom_sf(data = allTracts.group, aes(fill = q5(IncSpentOnRentPCt)), color="transparent")+
  scale_fill_manual(values = palette5,
labels = ((qBr(allTracts.group, "IncSpentOnRentPCt", rnd=FALSE))),
                    name = "Percentage\n(Quintile Breaks)") + 
  mapTheme() +
  facet_wrap(~year)+
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent", linewidth = 0.8)+
    
  labs(title = "Percent of Income Spent on Rent: Quintiles", subtitle = "Boston MBTA, TOD vs non TOD tracts", caption = "Figure 2.6, ACS 5-Year") 
  
```

To fix this problem, we can adjust the breaks in the map so that we can see a more defined picture of the rent burden in the city of Boston. By adjusting these breaks to color code all the rent burdened tracts, we can see how this indicator is different within and outside of the census tract boundaries. Most tracts in Boston are experiencing some level or rent burden, with the small exception of tracts in the downtown area, within the TOD tracts. This can be explained due to the higher income levels of residents living in the area. However, this is not the norm. As we move further away from the city center we can see how the rent burden drastically increases, in both 2016 and 2020. The lack of affordable housing in Boston, particularly in proximity to transit is an issue.

```{r RENT:INCOME}

breaks <- c(0, 15, 20, 25, 30, 100)  
labels <- c("0-15 %", "16-20 %", "21-25 %", "26-30 %", "31+ %") 

ggplot(allTracts.group) +
  geom_sf(data = allTracts) +
  geom_sf(aes(fill = cut(IncSpentOnRentPCt, breaks = breaks, labels = labels))) +
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent", linewidth = 0.8) + 
  scale_fill_manual(values = palette5,  
    name = "Rent Burdened",
    breaks = labels) +
  labs(title = "Percent of Income Spent on Rent", subtitle = "Boston MBTA, TOD vs non TOD tracts", caption = "Figure 2.7, ACS 5-Year") +
  facet_wrap(~year) +
  mapTheme() 

```

### Residents employed in Business, Arts or Sciences 

In these maps we can clearly see a concentration of individuals working in business, arts, and sciences living in the Central Business District (CBD) or Transit-Oriented Development (TOD). 

Proximity to the workplace for creative/tech-based professions within the CBD or accessible TOD areas significantly reduces commute times for professionals, explaining this concentration. 

```{r}
#plotting BusArtSciences

breaks <- c(0, 32500, 65000, 97500, 130000, 188250)  
labels <- c("0 - 32,500", "32,500 - 65,000", "65,000 - 97,500", "97,500 - 13,000", "13,000+") 

ggplot(allTracts.group) +
  geom_sf(data = allTracts) +
  geom_sf(aes(fill = cut(BusArtsScience, breaks = breaks, labels = labels))) +
  geom_sf(data = allTracts.group %>%
            filter(TOD == "TOD") %>%
            st_union(),
          color = "red",
          fill = "transparent", linewidth = 0.8) + 
  scale_fill_manual(values = palette5,  
    name = "Number of Residents",
    breaks = labels) +
  labs(title = "Residents Employed in the Business, Arts and Science Sector", subtitle = "Boston MBTA, TOD vs non TOD tracts", caption = "Figure 2.8, ACS 5-Year") +
  facet_wrap(~year) +
  mapTheme() 

```

## Tabulating the Data

This figure tabulates the summarized data for various indicators, broken down by year and TOD status. It computes summary statistics (e.g., mean values) for population, rent, income, business, arts, sciences employment, and the percentage of income spent on rent. Based on the data, we can see that: 

1. The population has witnessed very little change (decline) over the 2016 - 2020. for all variables. 

2. The gap in Median Income has widened over the years. Average income in both TOD and non-TOD areas increased substantially from 2016 to 2020, with TOD areas having higher income levels in both years.

3. Average rent prices increased in both TOD and non-TOD areas from 2016 to 2020, with TOD areas consistently having higher rent values, indicating a willingness from residents to live in transit rich neighborhoods. 

4. The percentage of income spent on rent decreased in TOD areas from 2016 to 2020, indicating improved affordability, while it increased slightly in non-TOD areas.

5. The number of Business, Arts and Science employees increased by nearly 20% within the TOD districts. The downtown clearly has an effect on this trend. 

```{r}

# tables 

allTracts.Summary <- 
  st_drop_geometry(allTracts.group) %>% #best to go non spatial here because we're just making a table
  group_by(year, TOD) %>%
  summarize(Population = mean(TotalPop, na.rm = T),
            Rent = mean(MedRent, na.rm = T),
            Income = mean(MedInc, na.rm = T),
            Bus_Art_Sciences = mean(BusArtsScience, na.rm = T),
            Percent_Inc_on_Rent = mean(IncSpentOnRentPCt, na.rm = T))

allTracts.Summary %>%
  unite(year.TOD, year, TOD, sep = ": ", remove = T) %>%
  gather(Variable, Value, -year.TOD) %>%
  mutate(Value = round(Value, 2)) %>%
  spread(year.TOD, Value) %>%
  kable(., col.names = c("Mean Variables","non-TOD","TOD","non-TOD","TOD"), align="lrrrr") %>%
  kable_styling() %>%
  footnote(general_title = "\n",
           general = "Table 2.1")%>% 
      add_header_above(., header = c(" "=1, "2016" = 2, "2020" = 2))
```
## Plotting Graphs

The final section of the code involves creating grouped bar graphs to visualize the differences in indicators across time and space (TOD vs. non-TOD). These charts further validate the inferences illustrated by the previous chart.

```{r}
#grouped bar graphs

allTracts.Summary %>%
  gather(Variable, Value, -year, -TOD) %>%
  ggplot(aes(year, Value, fill = TOD)) +
    geom_bar(stat = "identity", position = "dodge") +
    facet_wrap(~Variable, scales = "free", ncol=3) +
    scale_fill_manual(values = c("#bae4bc", "#0868ac")) +
    labs(title = "Indicator differences across time and space") +
    theme(legend.position="top")+
  labs(x=NULL, y=NULL) +
    theme(panel.spacing = unit(1, "lines")) + 
    plotTheme()
```


# Discussing Spatial Biases

Spatial biases wield a substantial influence on metrics like rent concerning their relationship with transit accessibility. These biases can originate from several sources, with human error constituting a notable concern. During the data collection and analysis phases, inaccuracies may emerge due to misclassifications of geographic locations and inconsistent or incomplete data recording. These errors, while unintended, can substantially distort the resulting metrics, making it essential to implement rigorous data quality control measures. Furthermore, the presence of spatial autocorrelation, where nearby areas exhibit similar characteristics, can also introduce biases. This spatial autocorrelation can be a consequence of unmodeled spatial processes, leading to erroneous assumptions of independence in statistical analyses. Additionally, data analysis must be conducted with an acute awareness of the presence of uncontrolled variables and unaccounted data issues, such as the margin of error inherent in census data or addressing missing values (NAs). Failing to address these facets can introduce unintended biases and uncertainties into the analysis, potentially undermining the reliability of conclusions drawn from spatial assessments. Thus, meticulous data validation, rigorous cleaning procedures, and the comprehensive consideration of various influential factors are vital steps to mitigate spatial biases in analyses of this nature.

## The Downtown Effect

<img src="D:/Fall_2023/PPA/Assignment2/images/Screenshot 2023-09-21 210839.jpg" width="300" align="right" style="display:inline; margin: 0 10px;"/>

For many of the trends observed, they are not solely contingent on transit accessibility. They are influenced by a multifaceted web of factors, including neighborhood amenities, housing policies, and socioeconomic disparities. Overlooking the significance of Boston's Central Business District (CBD), a thriving hub of employment and economic activity, can lead to an inherent upward skew in rent rates near this area. 

While transit accessibility is pivotal to TOD, it should be viewed within the broader context of Boston's urban dynamics. The CBD's significance as an employment and economic center underscores the need for strategic planning that promotes balanced growth, affordability, and equitable access to transportation options.

# More Data Analysis

This section extends the preceding data analysis on Transit-Oriented Development. It incorporates graduated symbol maps and graphs depicting the relationship between rent and distance from transit.

## Graduated Symbol Maps

The following visualizations are a proportional display of rent and populations of census tracts intersecting the half mile buffers of transit stops.   

```{r setting up the grad symbols}

buffers_stops <- filter(MBTABuffers, Legend=="Buffer")

buffers_stops$ID <- seq_along(buffers_stops$geometry) 

intersected_buffer <- selectCentroids %>% 
  st_intersection(buffers_stops, selectCentroids)

total_stops <- intersected_buffer %>%
  st_drop_geometry() %>%
  group_by(ID, GEOID, year) %>%
  summarise(pop = sum(TotalPop), rent = mean(MedRent, na.rm = TRUE))

labels_n <- clippedMBTA %>%
  select(GEOID, geometry) %>%
  left_join(total_stops,
            by = "GEOID") 

```

### Population within 1/2 mile

Here, in contrast to the "sum" of census tract populations surrounding the transit stations, the study makes use of the average populations of surrounding census tracts. This was done so that the analysis offers a more nuanced and representative perspective. Mean values take into account the distribution of data points, providing an average that reflects the central tendency. As each census tract intersects multiple buffers, the sum total of the populations was reaching 5-digit numbers for each transit stop, giving a misleading representation of the population within the city.

The first observable difference in the 2016 and 2020 is the loss of data - many of the transit stops are missing from the 2020 maps. This can be a result of an erraneous data join, considering the shape of census tracts changed between those years as a result of fluctuating populations. Part of the data loss is also a result of missing data from census. 

Despite these errors, we can see a difference in 2020, with a decline in population, most distinctly observed in the CBD/downtown area. We can see a high concentration of residents both at the center and southern part of the city, likely due to a concentration of jobs and universities. 

```{r population grad symbols}
ggplot() +
  geom_sf(data = tracts20, fill = "transparent") +  
  geom_sf(data = tracts16, fill = "transparent") +  
  geom_sf(data = labels_n %>%
             filter(!is.na(year)),  
          aes(fill = pop, size= pop),
          shape = 21, 
          alpha = 0.2) + 
  scale_size_continuous(range = c(0.5, 4), name = "Average Population") +
  scale_fill_gradientn(colors = hcl.colors(5, "viridis", rev = TRUE, alpha = 0.9), name = "")  +
  labs(size = "Point Size") +
   labs(title = "Average Population within half-mile of MBTA Stops", subtitle = "Proportional Symbol Map", caption = "Figure 4.1, ACS 5-Year") +
  facet_wrap(~year) +
  mapTheme()

```


### Rent within 1/2 mile

The rent maps show similar inconsistencies in data across the two years. The downtown and Back Bay areas (located at the most transit rich regions of the map) have consistently remained the most desirable and expensive areas throughout. The rent is lowest in areas in and around Roxbury, making it a rare affordable yet transit rich neighborhood. The presence of universities also lowers the cost of rent especially in South Boston.


```{r rent grad symbol}
ggplot() +
  geom_sf(data = tracts20, fill = "transparent") +   
  geom_sf(data = tracts16, fill = "transparent") +  
  geom_sf(data = labels_n%>%
             filter(!is.na(year)), 
          aes(fill = rent, size= rent),
          shape = 21, 
          alpha = 0.2) + 
  scale_size_continuous(range = c(0.5, 4), name = "Average Rent") +
  scale_fill_gradientn(colors = hcl.colors(5, "viridis", rev = TRUE, alpha = 0.9), name = "") +
  labs(size = "Point Size") +
  labs(title = "Average Rent within half-mile of MBTA Stops", subtitle = "Proportional Symbol Map", caption = "Figure 4.2, ACS 5-Year") +
  facet_wrap(~year) +
  mapTheme()

```

## Rent as a function of distance from the transit stops

This is a valuable metric as it reveals how housing costs change with proximity to public transportation. It informs accessibility, transportation cost savings, environmental impact, and equity considerations, aiding urban planning and policy decisions. It empowers individuals to make informed housing choices and helps developers identify investment opportunities. Nevertheless, it should be considered alongside other factors like amenities and job accessibility, and its impact may vary by location and transit quality.

```{r MRB}

MBTA_MRB <- multipleRingBuffer(st_union(clippedMBTA),
                                maxDistance = 34320,
                                interval =  2640)

# Your ggplot code
ggplot() +
  geom_sf(data = MBTA_MRB, aes(color = distance), fill = "transparent", linewidth = 0.5, alpha = 0.1 ) +
  geom_sf(data = clippedMBTA, size = 1) +
  geom_sf(data = st_union(tracts16), fill = NA, linewidth = 0.7) +
  labs(title = "Half mile buffers") +
  scale_color_viridis(name = "Distance\n(feet)") +
  mapTheme()

```

```{r setting up the buffers}

tracts20.rings <- tracts20 %>% 
  select(GEOID, year) %>% 
  st_centroid() %>% 
  st_join(MBTA_MRB, join = st_intersects) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(tracts20, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() %>%
  mutate(distance = distance / 5280) 

tracts20.rings.summary <- st_drop_geometry(tracts20.rings) %>%
    group_by(distance, year) %>%
    summarize(Mean_Rent = mean(MedRent, na.rm=T))

tracts16.rings <- tracts16 %>% 
  select(GEOID, year) %>% 
  st_centroid() %>% 
  st_join(MBTA_MRB, join = st_intersects) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(tracts16, GEOID, MedRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() %>%
  mutate(distance = distance / 5280) 

tracts16.rings.summary <- st_drop_geometry(tracts16.rings) %>%
    group_by(distance, year) %>%
    summarize(Mean_Rent = mean(MedRent, na.rm=T))


summary_bind <- rbind((tracts16.rings.summary%>%
        filter(!is.na(Mean_Rent) & !is.nan(Mean_Rent))), 
       tracts20.rings.summary%>%
        filter(!is.na(Mean_Rent) & !is.nan(Mean_Rent))
)

```

The first obvious conclusion is that rent is significantly higher in the year 2020 than in 2016. Further, the decrease in rent as one moves away from transit stops can be attributed to several factors. Firstly, properties located closer to transit stops are more convenient for commuters, reducing the need for private transportation and saving on commuting costs. This increased accessibility and reduced transportation expenses make these areas more desirable, leading to higher demand and, consequently, higher rents.


```{r tabling the rent vs distance}



ggplot(summary_bind,
       aes(distance, Mean_Rent, colour=year)) +
       scale_colour_manual(values = palette5, name = "Year") +
       geom_point(size=3) + 
       geom_line(size=2) + 
    labs(
    title = "Half mile buffers",
    x = "Distance in Miles",  # Change the X-axis label
    y = "Rent in Dollars"   # Change the Y-axis label
  ) +
       plotTheme()

```

In contrast, as one moves further away from transit stops, the convenience of access to public transportation diminishes. As a result, the demand for properties in these areas decreases, which puts downward pressure on rent prices. Additionally, properties farther from transit hubs may offer larger spaces or different amenities that can offset the inconvenience of a longer commute, further contributing to the decline in rent prices as we move away from transit stops. These combined factors create a rent gradient with transit proximity at its core.

# Policy Recommendations 

Based on this report, the following policy broad recommendations can be made: 

1. Affordable Housing: Given the rise in rent and increasing rent burden, policies should focus on the development of affordable housing options near transit hubs. Affordable housing initiatives can help retain a diverse population and promote transit use.

2. Transit Accessibility: Continue investments in improving transit infrastructure and accessibility to ensure that residents can easily access public transportation. Enhancements can include expanding transit networks, improving station facilities, and increasing service frequency.

3. Data Monitoring: Establish a comprehensive data monitoring system to regularly assess changes in population, income, and rent in TOD districts. This data-driven approach will enable policymakers to make timely adjustments to housing and transportation policies.

Boston's Transit-Oriented Development presents both opportunities and challenges. While it offers the potential for sustainable urban growth and reduced car dependency, it also requires careful planning and policies to address issues of housing affordability and equitable development. By implementing the recommended policies and closely monitoring demographic and economic changes, Boston can create vibrant and inclusive neighborhoods around its transit network.