Analyzing Parking Trends in San Francisco, California
Samriddhi Khare, Roshini Ganesh
Final Project, MUSA 550
This project aims to utilize San Francisco’s open parking data to map and visualize parking availability, occupancy, and clustering trends within the city over the recent months/years. Data from numerous sources have been utilized to make inferences about parking trends in the city. These data repositories include:
Parking meter data to cross-validate areas of high parking activity by recorded transactions
The first step of this analysis comprises the essential tasks of loading necessary packages, configuring different APIs for data collection, and managing global environment settings.
Code
# Import packages
import altair as alt
import geopandas as gpd
import pandas as pd
import numpy as np
import hvplot.pandas
#import seaborn as sns
from matplotlib import pyplot as plt
import holoviews as hv
from shapely.geometry import Polygon
from shapely.geometry import MultiPolygon
import requests
import geoviews as gv
import geoviews.tile_sources as gvts
import folium
from folium import plugins
from shapely.geometry import Point
import xyzservices
import osmnx as ox
import networkx as nx
import pygris
import cenpy

%matplotlib inline

# Show more rows and wider columns when displaying DataFrames
pd.options.display.max_rows = 9999
pd.options.display.max_colwidth = 200

# Hide warnings due to issue in shapely package
# See: https://github.com/shapely/shapely/issues/1345
np.seterr(invalid="ignore");
Data Wrangling
This step involves gathering parking data for 2023 and performing preliminary cleaning of this large dataset. All geospatial datasets are set to a uniform coordinate reference system, and boundary shapefiles are prepared for use with the OSM street network API.
Code
#np.seterr(invalid="ignore");

# Parking meter data
meters = pd.read_csv('./data/Parking_Meters_20231220.csv')

# Convert to a GeoDataFrame (points are built in (lon, lat) order)
geometry = [Point(xy) for xy in zip(meters['LONGITUDE'], meters['LATITUDE'])]
meters = gpd.GeoDataFrame(meters, geometry=geometry)
meters.crs = 'EPSG:4326'
meters = meters.to_crs('EPSG:3857')

# neighborhoods in sf
#neighborhoods = gpd.read_file("./data/Analysis_Neighborhoods.geojson")
#neighborhoods = neighborhoods.to_crs('EPSG:3857')

#bay area counties and sf county geometries
#bay_area_counties = gpd.read_file("./data/bayarea_county.geojson")
#bay_area_counties = bay_area_counties.to_crs('EPSG:4326')
#sf_county = bay_area_counties[bay_area_counties['COUNTY'] == 'San Francisco']
#sf_poly = sf_county.iloc[0]
#sf_poly = sf_poly.geometry
#from shapely.ops import cascaded_union
#sf_poly = cascaded_union(sf_poly)
#print(type(sf_poly))
#bay_area_counties.head()
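To make the reprojection step concrete, the sketch below reproduces by hand the standard spherical Web Mercator formula that underlies the conversion from EPSG:4326 to EPSG:3857. It is an illustration only, not the code path GeoPandas uses internally, and the helper names `lonlat_to_web_mercator` and `mercator_scale_factor` are hypothetical. It also shows that Web Mercator stretches distances by roughly 1 / cos(latitude), which is worth keeping in mind whenever lengths are measured in projected coordinates.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project a WGS 84 (EPSG:4326) coordinate to Web Mercator (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_scale_factor(lat_deg):
    """Factor by which Web Mercator stretches distances at a given latitude."""
    return 1 / math.cos(math.radians(lat_deg))


# Approximate centre of San Francisco, the same point used to centre the map in this notebook
x, y = lonlat_to_web_mercator(-122.43, 37.77)
scale = mercator_scale_factor(37.77)
print(round(x), round(y), round(scale, 3))
```

At San Francisco's latitude the scale factor is about 1.27, so a distance computed directly from EPSG:3857 coordinates overstates the true ground distance by roughly a quarter.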
Parking meters in San Francisco
The interactive map below visually represents the distribution of parking meters in San Francisco, showcasing distinct levels of aggregation. Notably, a concentrated area with high meter density emerges in the northeast region, coinciding with the presence of major tech company headquarters. However, drawing conclusive insights from the map alone is challenging. Considerations such as street network density become key determinants of parking availability. Therefore, it is essential to contextualize this data with factors like street networks and population variables. It is crucial to recognize that a high availability of parking does not necessarily indicate an absence of scarcity; demand may still surpass supply.
Code
# All coords
coords = meters[["LATITUDE", "LONGITUDE"]]  # Remember, (lat, lon) order

# Center the map on San Francisco
m = folium.Map(
    location=[37.77, -122.43],
    zoom_start=12,
    tiles=xyzservices.providers.CartoDB.DarkMatter
)

folium.plugins.FastMarkerCluster(data=coords).add_to(m)

m
Open Street Map Data
To streamline the workflow with this large dataset, relevant OSM data is refined by excluding highways, where parking is not allowed. This ensures the dataset focuses solely on accessible areas with available parking spaces. A new graph is created and plotted to reflect only the non-highway streets.
# Filter out motorways; edges without a 'highway' tag are kept
non_highway_edges = [
    (u, v, key)
    for u, v, key, data in G.edges(keys=True, data=True)
    if 'highway' not in data or data['highway'] != 'motorway'
]

# Create a new graph with non-highway streets
G = G.edge_subgraph(non_highway_edges)

# Plot the non-highway street network
#ox.plot_graph(G, bgcolor='k', node_color='w', edge_color='w', node_size=5, edge_linewidth=0.5)
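One caveat with a plain string comparison on the `highway` attribute is that OSMnx can store this attribute as a list when several OSM ways are merged into one simplified edge, in which case a list never compares equal to `'motorway'`. The sketch below shows one possible way to handle both cases. The helper `is_motorway` and the sample edges are hypothetical and are not part of the original analysis.

```python
def is_motorway(highway_tag):
    """Return True if an OSM `highway` value marks a motorway.

    The value may be missing (None), a single string, or a list of strings.
    """
    if highway_tag is None:
        return False
    tags = highway_tag if isinstance(highway_tag, (list, tuple, set)) else [highway_tag]
    return any(tag == "motorway" for tag in tags)


# Hypothetical edge records in (u, v, key, data) form,
# mirroring what G.edges(keys=True, data=True) yields
sample_edges = [
    (1, 2, 0, {"highway": "residential"}),
    (2, 3, 0, {"highway": "motorway"}),
    (3, 4, 0, {"highway": ["motorway", "trunk"]}),
    (4, 5, 0, {}),
]

kept = [
    (u, v, key)
    for u, v, key, data in sample_edges
    if not is_motorway(data.get("highway"))
]
print(kept)  # [(1, 2, 0), (4, 5, 0)]
```

Whether related values such as `motorway_link` or `trunk` should also be excluded is a modelling choice; this sketch matches only the exact value `motorway`.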
With this joined dataset as the base, the following transformations are performed:
First, the latitude and longitude attributes for the cleaned OSM data are defined as x and y coordinates to allow location-based calculations.
Next, the nearest_edges function is used to determine the closest street edge to each parking meter.
Third, the city’s parking meter data is integrated with the OSM Street Network data using the merge function to associate parking-related details with their corresponding street locations and the larger surrounding road infrastructure.
The joined dataset is then cleaned to:
Drop columns that do not contribute to this study,
Eliminate streets that have zero parking meters, and
Remove outlier values so that only street segments between 10 and 100 meters long are retained. This keeps the count of parking meters per unit of street length comparable across the dataset, which is important because a 5-mile street segment could inherently accommodate more parking meters than a 1-mile street segment. These constraints distill the original dataset into a comparable dataset of street and parking factors.
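Conceptually, the nearest-edge step assigns each parking meter to the street segment with the smallest point-to-segment distance, and the meters are then counted per segment. The pure-Python sketch below illustrates that idea on toy data. It is not how OSMnx implements `nearest_edges`, and the function names, segments, and meter locations are all hypothetical.

```python
import math
from collections import Counter


def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point P to the line segment AB in planar coordinates."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, then clamp to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_edge(point, edges):
    """Return the (u, v) identifier of the edge closest to `point`."""
    px, py = point
    return min(edges, key=lambda uv: point_segment_distance(px, py, *edges[uv]))


# Hypothetical street segments keyed by (u, v): (ax, ay, bx, by) in projected units
edges = {
    (1, 2): (0.0, 0.0, 100.0, 0.0),     # east-west street
    (2, 3): (100.0, 0.0, 100.0, 80.0),  # north-south street
}

# Hypothetical parking meter locations
meter_points = [(10.0, 2.0), (50.0, -3.0), (98.0, 40.0)]

# Count meters per nearest street segment
meter_counts = Counter(nearest_edge(p, edges) for p in meter_points)
print(meter_counts)
```

Here the first two meters fall on the east-west street and the third on the north-south street, which is the same kind of per-segment count that the grouping step produces.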
Code
sf_edges = ox.graph_to_gdfs(G, edges=True, nodes=False)
G_projected = ox.project_graph(G, to_crs='EPSG:3857')

# Extract the projected x and y coordinates of each parking meter
x = meters['geometry'].x
y = meters['geometry'].y

# Use the nearest_edges function to find the nearest edge for each parking meter
nearest_edges = ox.distance.nearest_edges(G_projected, X=x, Y=y)

# Count the parking meters assigned to each (u, v) street segment
meters_nodes = pd.DataFrame(nearest_edges, columns=['u', 'v', 'key'])
meters_nodes['Count'] = 1
grouped = meters_nodes.groupby(['u', 'v'])['Count'].sum().reset_index()

# Attach the counts to the street edges and keep only streets with meters
merged_gdf = sf_edges.merge(grouped, on=['u', 'v'], how='left')
merged_gdf = merged_gdf.loc[merged_gdf['Count'] > 0]

# List of columns to drop
columns_to_drop = ['u', 'v', 'osmid', 'oneway', 'lanes', 'ref', 'maxspeed', 'reversed', 'access', 'bridge', 'junction', 'width', 'tunnel']

# Drop the specified columns
merged_gdf = merged_gdf.drop(columns=columns_to_drop)

# Parking meters per unit of street length
merged_gdf['truecount'] = merged_gdf['Count'] / merged_gdf['length']

# Remove outliers: keep only segments whose length is between 10 and 100
column_name = 'length'
mask = (merged_gdf[column_name] >= 10) & (merged_gdf[column_name] <= 100)
merged_gdf = merged_gdf[mask]

#merged_gdf.head()
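The effect of the normalization and length filter in the code above can be checked on a small toy example. The segments below are hypothetical. The bounds of 10 and 100 are taken from the mask in the code, and segments with no meters are dropped, as in the `Count > 0` filter.

```python
# Hypothetical street segments: (name, parking meter count, segment length in metres)
segments = [
    ("A", 4, 80.0),   # kept: 4 meters over 80 m
    ("B", 4, 160.0),  # dropped: longer than the upper bound
    ("C", 1, 5.0),    # dropped: shorter than the lower bound
    ("D", 0, 50.0),   # dropped: no parking meters
]

MIN_LENGTH_M, MAX_LENGTH_M = 10, 100  # bounds used in the mask above

# Meters per metre of street, for segments that pass both filters
densities = {
    name: count / length
    for name, count, length in segments
    if count > 0 and MIN_LENGTH_M <= length <= MAX_LENGTH_M
}
print(densities)  # {'A': 0.05}
```

Segment C shows why very short segments are excluded: a single meter on a 5 m segment would yield a density of 0.2, four times that of segment A, even though it reflects only one meter.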
Parking Meter Distribution Analysis: Number of Parking Meters by Length per Street
Analyzing the number of parking meters by street segment reveals disparities in their distribution. Parking meters tend to be concentrated in downtown areas such as Union Square and Fisherman’s Wharf. An interesting contrast, however, is observed in Nob Hill, also in the downtown area, which exhibits a low density of parking meters. This prompts an inquiry into the factors contributing to variations in meter density even within prominent city centers. In the subsequent sections of this study, demographic factors that affect parking distribution are explored.
While outside of the purview of this analysis, questions also arise about other factors that influence the siting of parking meters including meter make, meter activity, and meter revenue, which may be important contributors to parking siting decisions. These are to be included in future iterations of this study.