Analyzing Parking Trends in San Francisco, California

Samriddhi Khare, Roshini Ganesh
Final Project, MUSA 550

This project uses San Francisco’s open parking data to map and visualize parking availability, occupancy, and clustering trends within the city in recent years. Data from several sources are combined to draw inferences about parking trends in the city. These data repositories include:

  1. Parking meter data, to cross-validate areas of high parking activity by recorded transactions
  2. U.S. Census Bureau data for the selected geography, retrieved through the Census API
  3. OpenStreetMap (OSM) data for street network analysis
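The sketch below shows one way Census data might be requested for this geography. It builds a Census Data API URL for every census tract in San Francisco County (state FIPS 06, county FIPS 075) and parses the JSON reply, which arrives as an array of arrays whose first row is the header. The ACS 5-year endpoint, the 2021 vintage, and the total-population variable B01003_001E are assumptions chosen for illustration. The helper names build_acs_url and parse_acs_response are hypothetical. The notebook itself imports cenpy, which wraps the same API.

```python
from urllib.parse import quote, urlencode

import pandas as pd

# Assumed endpoint: ACS 5-year estimates (not specified in the original notebook)
ACS_URL = "https://api.census.gov/data/{year}/acs/acs5"
STATE_FIPS = "06"    # California
COUNTY_FIPS = "075"  # San Francisco County


def build_acs_url(year, variables):
    """Build a Census Data API URL requesting `variables` for all SF census tracts."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "tract:*",
        "in": f"state:{STATE_FIPS} county:{COUNTY_FIPS}",
    }
    # Keep ':', '*' and ',' readable; spaces are encoded as %20
    query = urlencode(params, safe=":*,", quote_via=quote)
    return ACS_URL.format(year=year) + "?" + query


def parse_acs_response(payload, variables):
    """Turn the API's JSON array-of-arrays (first row = header) into a DataFrame."""
    header, *rows = payload
    df = pd.DataFrame(rows, columns=header)
    # Estimates arrive as strings; convert the requested variables to numbers
    df[variables] = df[variables].apply(pd.to_numeric, errors="coerce")
    return df


url = build_acs_url(2021, ["B01003_001E"])
# In practice: payload = requests.get(url).json()
# Made-up payload shaped like an API reply, for illustration only
payload = [
    ["NAME", "B01003_001E", "state", "county", "tract"],
    ["Example tract A", "1200", "06", "075", "000100"],
    ["Example tract B", "950", "06", "075", "000200"],
]
tracts = parse_acs_response(payload, ["B01003_001E"])
```

The tract codes are kept as strings because their leading zeros matter when joining to tract boundary files.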

File setup and data collection

The first step of this analysis comprises the essential tasks of loading necessary packages, configuring different APIs for data collection, and managing global environment settings.

Code
# Import packages

import altair as alt
import geopandas as gpd
import pandas as pd
import numpy as np
import hvplot.pandas  # registers the .hvplot plotting accessor on pandas objects
#import seaborn as sns
from matplotlib import pyplot as plt
import holoviews as hv
from shapely.geometry import Point, Polygon, MultiPolygon
import requests
import geoviews as gv
import geoviews.tile_sources as gvts
import folium
from folium import plugins
import xyzservices
import osmnx as ox
import networkx as nx
import pygris
import cenpy



%matplotlib inline

# Show many rows and long cell contents
pd.options.display.max_rows = 9999
pd.options.display.max_colwidth = 200

# Hide warnings due to issue in shapely package 
# See: https://github.com/shapely/shapely/issues/1345
np.seterr(invalid="ignore");

Data Wrangling

This step gathers the 2023 parking data and performs preliminary cleaning of this large dataset. All geospatial datasets are set to a uniform coordinate reference system, and boundary shapefiles are prepared for use with the OSM street network API.
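Reprojecting the parking meters to EPSG:3857 (Web Mercator) gives them coordinates in meter-like units. That projection is defined by a spherical Mercator formula with radius 6,378,137 m, and the sketch below implements the formula directly to make the step concrete. It is illustrative only, because geopandas delegates the real work to pyproj. The sketch also computes the local scale factor 1/cos(latitude). At San Francisco's latitude, EPSG:3857 distances come out roughly 1.27 times larger than distances on the ground. That is harmless for finding the nearest street. It does suggest that segment lengths should come from the OSMnx length attribute, which OSMnx computes in meters, rather than from projected geometries.

```python
import math

# Sphere radius used by EPSG:3857 (equal to the WGS84 semi-major axis), in meters
R = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project WGS84 longitude/latitude in degrees to EPSG:3857 x/y."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_scale(lat):
    """How many EPSG:3857 units correspond to one meter on the ground at `lat`."""
    return 1 / math.cos(math.radians(lat))


# Map center used in this notebook
x, y = lonlat_to_web_mercator(-122.43, 37.77)
scale = web_mercator_scale(37.77)  # about 1.265 at San Francisco's latitude
```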

Code
# Parking meter data (forward slashes work on every operating system)
meters = pd.read_csv('./data/Parking_Meters_20231220.csv')

# Convert to a GeoDataFrame of points in WGS84 longitude/latitude
meters = gpd.GeoDataFrame(
    meters,
    geometry=gpd.points_from_xy(meters['LONGITUDE'], meters['LATITUDE']),
    crs='EPSG:4326',
)

# Reproject to Web Mercator so coordinates are projected x/y in meters
meters = meters.to_crs('EPSG:3857')

# neighborhoods in sf

#neighborhoods = gpd.read_file("./data/Analysis_Neighborhoods.geojson")

#neighborhoods = neighborhoods.to_crs('EPSG:3857')

#bay area counties and sf county geometries

#bay_area_counties = gpd.read_file("./data/bayarea_county.geojson")
#bay_area_counties = bay_area_counties.to_crs('EPSG:4326')


#sf_county =  bay_area_counties[bay_area_counties['COUNTY'] == 'San Francisco']

#sf_poly = sf_county.iloc[0]
#sf_poly = sf_poly.geometry

#from shapely.ops import unary_union  # cascaded_union was removed in Shapely 2.0

#sf_poly = unary_union(sf_poly)

#print(type(sf_poly))

#bay_area_counties.head()

Parking meters in San Francisco

The interactive map below shows the distribution of parking meters in San Francisco, with markers clustered at different levels of aggregation depending on the zoom level. Notably, a concentrated area of high meter density emerges in the northeast of the city, coinciding with the presence of major tech company headquarters. However, drawing conclusive insights from the map alone is difficult. Factors such as street network density also shape parking availability, so this data needs to be contextualized with street networks and population variables. It is also important to recognize that a high number of parking meters does not necessarily mean parking is not scarce; demand may still exceed supply.
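The marker clusters on the map regroup dynamically as the user zooms, which makes them hard to compare across areas. A fixed-grid count is one simple, reproducible way to quantify meter density at a chosen aggregation level. This is a swapped-in technique and is not part of the original notebook. The hypothetical helper grid_counts bins projected x/y coordinates into square cells. Because it uses floor division, it also handles the negative x values that San Francisco has in EPSG:3857. On the real data it could be called as grid_counts(meters.geometry.x, meters.geometry.y, 500). Note that the cell size is then in Web Mercator units, each worth roughly 0.79 ground meters at this latitude.

```python
from collections import Counter

import numpy as np


def grid_counts(x, y, cell_size):
    """Count points per square grid cell; cell_size is in the same units as x and y."""
    col = np.floor(np.asarray(x, dtype=float) / cell_size).astype(int)
    row = np.floor(np.asarray(y, dtype=float) / cell_size).astype(int)
    return Counter(zip(col.tolist(), row.tolist()))


# Hypothetical projected coordinates, not real meter locations
x = [10.0, 20.0, 30.0, 480.0, 520.0]
y = [10.0, 15.0, 40.0, 490.0, 510.0]

fine = grid_counts(x, y, 100)     # three cells: one with 3 points, two with 1
coarse = grid_counts(x, y, 1000)  # a single cell holding all 5 points
```

Comparing the two results shows the same trade-off the cluster map makes interactively. Small cells separate nearby concentrations, and large cells merge them into a single count.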

Code
# All coords as [lat, lon] pairs, the order folium expects; skip meters without coordinates
coords = meters[["LATITUDE", "LONGITUDE"]].dropna().values.tolist()

# Center the map on San Francisco
m = folium.Map(
    location=[37.77, -122.43], zoom_start=12, tiles=xyzservices.providers.CartoDB.DarkMatter
)
folium.plugins.FastMarkerCluster(data=coords).add_to(m)
m

OpenStreetMap Data

To streamline the workflow with this large dataset, the OSM data is refined by excluding motorways (freeways), where parking is not allowed. This keeps the analysis focused on streets where parking is possible. A new graph containing only the non-highway streets is then created; it can be plotted with ox.plot_graph.
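The highway filter in the code compares each edge's highway tag with the string 'motorway'. That works on this unsimplified graph, where tags are normally single strings. On a simplified OSMnx graph, however, merged edges can carry a list of tags, and a list never equals 'motorway'. A more defensive sketch might look like the one below. It uses a hypothetical helper, is_excluded, with a configurable set of excluded tags, and it runs on hand-made edge tuples rather than a real graph.

```python
# Tags to exclude. The notebook drops only "motorway"; also adding "motorway_link"
# (ramps) would be a possible extension, not part of the original analysis.
EXCLUDED = {"motorway"}


def is_excluded(highway, excluded=EXCLUDED):
    """True if an OSM 'highway' tag (string, list of strings, or None) is excluded."""
    if highway is None:
        return False
    tags = highway if isinstance(highway, list) else [highway]
    return any(tag in excluded for tag in tags)


# Hypothetical (u, v, key, data) tuples, the shape yielded by
# G.edges(keys=True, data=True) on an OSMnx MultiDiGraph
edges = [
    (1, 2, 0, {"highway": "residential"}),
    (2, 3, 0, {"highway": "motorway"}),
    (3, 4, 0, {"highway": ["motorway", "trunk"]}),  # merged tags after simplification
    (4, 5, 0, {}),                                   # edge without a highway tag
]

keep = [(u, v, k) for u, v, k, data in edges if not is_excluded(data.get("highway"))]
```

This sketch drops an edge if any of its merged tags is excluded. That is a design choice; replacing any with all would keep mixed edges instead.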

Code
city_name = 'San Francisco, California, USA'
G = ox.graph.graph_from_place(city_name, network_type='drive', simplify=False, retain_all=True)
#ox.plot_graph(G, bgcolor='k', node_color='w', node_size=5, edge_color='w', edge_linewidth=0.5)
Code
# Filter out motorways (freeways), where on-street parking is not allowed
non_highway_edges = [
    (u, v, key)
    for u, v, key, data in G.edges(keys=True, data=True)
    if data.get('highway') != 'motorway'
]

# Create a new graph with only the non-highway streets
# (.copy() turns the read-only subgraph view into an independent graph)
G = G.edge_subgraph(non_highway_edges).copy()

# Plot the non-highway street network
#ox.plot_graph(G, bgcolor='k', node_color='w', edge_color='w', node_size=5, edge_linewidth=0.5)

With the filtered street network as the base, the following transformations are performed:

  1. First, the street network is projected to the same coordinate reference system as the parking meters (EPSG:3857), and the meters' x and y coordinates are extracted so that location-based calculations can be made in meters.
  2. Next, OSMnx's nearest_edges function is used to determine the closest street edge to each parking meter.
  3. Third, the number of meters matched to each edge is counted and merged onto the OSM street network data, associating parking-related details with their corresponding street locations and the larger surrounding road infrastructure.
  4. The joined dataset is then cleaned to:
    • Drop columns that do not contribute to this study,
    • Eliminate streets that have zero parking meters, and
    • Remove outliers by retaining only street segments between 10 and 100 meters long. Each segment's meter count is also divided by its length, because a longer segment can inherently accommodate more meters than a shorter one (a 100-meter segment has room for more meters than a 20-meter one). Together, these constraints distill the original dataset into a comparable dataset of street and parking factors.
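Step 2 can be illustrated with a brute-force sketch. OSMnx's nearest_edges uses a spatial index and should be preferred for the real data. The hypothetical helper nearest_segment below only shows the geometry. It projects each point onto every segment and clamps the projection to the segment's endpoints, and the segment with the smallest distance wins. It assumes straight two-point edges, which is what an unsimplified graph mostly contains. Its memory use grows with the number of points times the number of segments, so it suits small inputs only.

```python
import numpy as np


def nearest_segment(points, segments):
    """Index of the closest straight segment for each point (brute force).

    points:   (n, 2) array-like of projected x/y coordinates
    segments: (m, 2, 2) array-like of endpoints [[x0, y0], [x1, y1]]
    """
    p = np.asarray(points, dtype=float)[:, None, :]    # (n, 1, 2)
    seg = np.asarray(segments, dtype=float)
    a = seg[None, :, 0, :]                              # (1, m, 2) start points
    ab = seg[None, :, 1, :] - a                         # (1, m, 2) direction vectors
    # Parameter t of the orthogonal projection of each point onto each segment's line
    denom = np.sum(ab * ab, axis=-1)                    # (1, m) squared lengths
    denom = np.where(denom == 0, 1.0, denom)            # guard zero-length segments
    t = np.sum((p - a) * ab, axis=-1) / denom           # (n, m)
    # Clamp to [0, 1] so the closest point stays on the segment, not its extension
    closest = a + np.clip(t, 0.0, 1.0)[..., None] * ab  # (n, m, 2)
    dist = np.linalg.norm(p - closest, axis=-1)         # (n, m)
    return dist.argmin(axis=1)


# Hypothetical street segments and meter locations (projected units)
segments = [
    [[0, 0], [10, 0]],    # segment 0: horizontal street
    [[0, 5], [10, 5]],    # segment 1: parallel street 5 units north
    [[20, 0], [20, 10]],  # segment 2: vertical cross street
]
points = [[5, 1], [5, 4], [19, 8], [-3, 0]]

print(nearest_segment(points, segments).tolist())  # [0, 1, 2, 0]
```

Clamping matters for the last point, (-3, 0). Without it, that point would be measured against the infinite extension of each street rather than the street itself.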
Code
# Street edges as a GeoDataFrame; reset_index() turns the (u, v, key) index into columns
sf_edges = ox.graph_to_gdfs(G, edges=True, nodes=False).reset_index()

# Project the street network to match the parking meters' CRS
G_projected = ox.project_graph(G, to_crs='EPSG:3857')

# Projected x/y coordinates of each parking meter (in meters)
x = meters.geometry.x
y = meters.geometry.y

# Find the nearest street edge (u, v, key) for each parking meter
nearest_edges = ox.distance.nearest_edges(G_projected, X=x, Y=y)

meters_nodes = pd.DataFrame(nearest_edges, columns=['u', 'v', 'key'])
meters_nodes['Count'] = 1

# Count meters per edge and attach the counts to the street edges
grouped = meters_nodes.groupby(['u', 'v'])['Count'].sum().reset_index()
merged_gdf = sf_edges.merge(grouped, on=['u', 'v'], how='left')

# Keep only streets with at least one parking meter
merged_gdf = merged_gdf.loc[merged_gdf['Count'] > 0]

# Drop columns that do not contribute to this study
# (errors='ignore' skips any tag column absent from this OSM extract)
columns_to_drop = ['u', 'v', 'osmid', 'oneway', 'lanes', 'ref', 'maxspeed', 'reversed', 'access', 'bridge', 'junction', 'width', 'tunnel']
merged_gdf = merged_gdf.drop(columns=columns_to_drop, errors='ignore')

# Parking meters per meter of street length
merged_gdf['truecount'] = merged_gdf['Count'] / merged_gdf['length']

# Remove outliers: keep street segments between 10 and 100 meters long
merged_gdf = merged_gdf[merged_gdf['length'].between(10, 100)]

#merged_gdf.head()
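The toy example below repeats the count, merge, normalize, and filter sequence of steps 3 and 4 on hypothetical numbers, so the logic can be checked without downloading OSM data. It uses value_counts and an inner merge, which should be equivalent to the notebook's groupby(...).sum() followed by a left merge and the Count > 0 filter.

```python
import pandas as pd

# Hypothetical toy inputs: one row per meter with its matched edge (u, v),
# and one row per street edge with its length in meters
meter_edges = pd.DataFrame({"u": [1, 1, 1, 2, 3], "v": [2, 2, 2, 3, 4]})
edges = pd.DataFrame({
    "u": [1, 2, 3, 4],
    "v": [2, 3, 4, 5],
    "length": [50.0, 5.0, 200.0, 80.0],
})

# Count meters per edge; the inner merge keeps only edges with at least one meter
counts = meter_edges.value_counts(["u", "v"]).rename("Count").reset_index()
per_edge = edges.merge(counts, on=["u", "v"], how="inner")

# Meters per meter of street, then keep segments 10 to 100 m long (inclusive)
per_edge["truecount"] = per_edge["Count"] / per_edge["length"]
per_edge = per_edge[per_edge["length"].between(10, 100)]
```

In this toy data, the 5 m segment with a single meter scores 0.2 meters per meter of street before filtering. That is more than three times the 0.06 of the 50 m segment with three meters, which shows how strongly very short segments can sway the ratio.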

Parking Meter Distribution Analysis: Number of Parking Meters per Unit Street Length

Analyzing the number of parking meters per street segment reveals disparities in their distribution. Parking meters tend to be concentrated in downtown areas such as Union Square and Fisherman’s Wharf. An interesting contrast, however, is Nob Hill in the downtown area, which exhibits a low density of parking meters. This raises the question of what drives variation in meter density even within prominent city centers. Subsequent sections of this study explore demographic factors that affect parking distribution.

Although outside the scope of this analysis, other factors may also influence where parking meters are sited, including meter make, meter activity, and meter revenue. These are to be included in future iterations of this study.

Code
merged_gdf.explore(tiles='cartodbdark_matter', column = 'truecount')