About This Project
In this assignment at the University of Pennsylvania Stuart Weitzman
School of Design, CPLN 675 Land Use and Environmental Modeling, our goal
was to develop a predictive flood inundation model using Calgary,
Alberta, Canada as the base geography on which to train a model that
could be applied to other cities, using the same factors.
For this model we selected the factors of: - Soil Drainage Efficacy
(i.e., soil type and its capacity to allow water absorption) - Water
Flow Accumulation (i.e., how much–and where–water collects
topographically?) - Surface Permeability (i.e., whether impervious
surfaces, such as buildings and concrete, sit on a given segment of land
area) - Distance to Water Bodies (i.e., how far is a given point in land
from where water collects?)
Setup & Visualizing Our Dependent Variable and Factors
Below, we walk through the steps of setting up and visualizing our
data, to help us to understand the apparent relationship between our
dependent variable, historic flooding (hist_flood), and our four factors
in the Calgay geography.
Importing Preliminary Spatial Data
First, we load our two cities’ boundaries and some other preliminary
datasets, and put them in web Mercator projections (CRS 3395).
calgary_boundary <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/calgary_geojsons/calgary_geojsons/CALGARY_BOUNDARY.geojson")
calgary_boundary <- calgary_boundary %>%
st_transform(crs = 3395)
boise_boundary <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_BOUNDARY.geojson")
boise_boundary <- boise_boundary %>%
st_transform(crs = 3395)
#HYDROLOGY
#calgary_hydrology <- st_read("C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Data_Calgary/CalgaryHydrology.shp")
#calgary_hydrology <- calgary_hydrology %>%
# st_transform(crs = 3395)
#boise_hydrology1 <- st_read("C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Data_Boise/BoiseHydrology1.shp")
#boise_hydrology2 <- st_read("C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Data_Boise/BoiseHydrology2.shp")
# boise_hydrology1 <- boise_hydrology1 %>%
# st_transform(crs = 3395)
#boise_hydrology2 <- boise_hydrology2 %>%
# st_transform(crs = 3395)
calgary_master_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/calgary_geojsons/calgary_geojsons/CALGARY_MASTERFISH.geojson")
boise_master_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_MASTERFISH.geojson")
Creating Fishnets
Now, we use R to create fishnets for the two cities with the same
cell size and projections, to ensure consistency among forthcoming
maps.
calgary_fishnet <- st_make_grid(calgary_boundary,
cellsize = 500,
square = FALSE) %>%
.[calgary_boundary] %>%
st_sf() %>%
mutate(uniqueID = rownames(.))
ggplot()+
geom_sf(data = calgary_fishnet,
fill = "pink")+
geom_sf(data = calgary_boundary,
color = "black", fill = "transparent") +
mapTheme

boise_fishnet <- st_make_grid(boise_boundary,
cellsize = 500,
square = FALSE) %>%
.[boise_boundary] %>%
st_sf() %>%
mutate(uniqueID = rownames(.))
ggplot()+
geom_sf(data = boise_fishnet,
fill = "lightblue")+
geom_sf(data = boise_boundary,
color = "black", fill = "transparent") +
mapTheme

#st_write(calgary_fishnet, "C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Derived Files/calgary_fishnet.shp", geometry?=?TRUE)
#st_write(boise_fishnet, "C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Derived Files/boise_fishnet.shp", geometry?=?TRUE)
Dependent Variable: Flooding Maps
Now, we visualize our two geographies with their outlays of
historical flooding. Considering historical flooding is the dependent
variable which all of our factors in our model are trying to predict, it
is good to first understand where flooding occurs as a baseline of our
model’s success.
grid.arrange(ncol = 2,
ggplot() +
labs(title="Calgary, Historical Flooding and Hydrology",
subtitle="Calgary; Yellow = Historical flooding observed")+
geom_sf(data = calgary_boundary,
color = "black")+
geom_sf(data=calgary_master_fish %>%
filter(hist_flood == 6),
fill="yellow",alpha = 0.6,color='transparent') +
# geom_sf(data=calgary_hydrology,
# fill = 'light blue')+
mapTheme,
ggplot() +
geom_sf(data = boise_boundary,
color = "black")+
geom_sf(data=boise_master_fish %>%
filter(hist_flood == 6),
fill="yellow",alpha = 0.6,color='transparent') +
#geom_sf(data=boise_hydrology1,
# fill = 'light blue')+
# geom_sf(data=boise_hydrology2,
# fill = 'light blue')+
mapTheme+
labs(title="Boise, Historical Flooding and Hydrology",
subtitle="Boise; Yellow = Historical flooding observed"))

Importing Other Files for Factors and Map Elements
Here, we import our spatial data to map our factors and train our
model. Most of this data was cleaned, derived, and assembled using
ArcGIS Pro.
Mapping
Calgary
Here, we create maps of our four factors for our base geography,
Calgary.

#IMPERVIOUS v. PERVIOUS SURFACES -- Good
ggplot() +
geom_sf(data = calgary_boundary,
color = "black")+
geom_sf(data = calgary_surf_fish,
aes(fill = Pervious),
color = "transparent")+
scale_fill_viridis('Surface Permeability (6 = least)',
direction = -1, # Direction of the color palette - either 1 or -1
option = 'F', # Viridis has multiple palettes to choose from
alpha = 0.6)+
labs(title="Surface Permeability") +
mapTheme

#DISTANCE TO WATER BODY -- Good
ggplot() +
geom_sf(data = calgary_boundary,
color = "black")+
geom_sf(data = calgary_disth2o_fish,
aes(fill = gridcode),
color = "transparent")+
scale_fill_viridis('Distance to Water (6 = closest)',
direction = -1, # Direction of the color palette - either 1 or -1
option = 'F',
alpha = 0.6)+
labs(title="Distance to Water Body") +
mapTheme

#FLOW ACCUMULATION -- good
ggplot() +
geom_sf(data = calgary_boundary,
color = "black")+
geom_sf(data = calgary_flow_fish,
aes(fill = FlowAcc),
color = "transparent")+
scale_fill_viridis('Flow Accumulation (6 = greatest)',
direction = -1,
option = 'F',
alpha = 0.6)+
labs(title="Water Flow Accumulation") +
mapTheme

Boise
Likewise, here we visualize the same four factors within Boise’s
boundaries.
###Boise
#SOIL DRAINAGE -- Good
ggplot() +
geom_sf(data = boise_boundary,
fill = "transparent", color = "black")+
geom_sf(data = boise_drain_fish,
aes(fill = DrainRank),
color = "transparent")+
scale_fill_viridis('Soil Drainage Efficacy (6 = best)',
direction = -1, # Direction of the color palette - either 1 or -1
option = 'F', # Viridis has multiple palettes to choose from
alpha = 0.6)+
labs(title="Soil Type & Drainage") +
mapTheme

#FLOW ACCUMULATION -- good
ggplot() +
geom_sf(data = boise_boundary,
fill = "transparent", color = "black")+
geom_sf(data = boise_flow_fish,
aes(fill = gridcode),
color = "transparent")+
scale_fill_viridis('Flow Accumulation (6 = greatest)',
direction = -1,
option = 'F',
alpha = 0.6)+
labs(title="Water Flow Accumulation") +
mapTheme

#IMPERVIOUS v. PERVIOUS SURFACES -- good
ggplot() +
geom_sf(data = boise_boundary,
color = "black")+
geom_sf(data = boise_surf_fish,
aes(fill = gridcode),
color = "transparent")+
scale_fill_viridis('Surface Permeability (6 = least)',
direction = -1,
option = 'F',
alpha = 0.6)+
labs(title="Surface Permeability") +
mapTheme

#DISTANCE TO WATER BODY -- good
ggplot() +
geom_sf(data = boise_boundary,
color = "black")+
geom_sf(data = boise_disth2o_fish,
aes(fill = gridcode),
color = "transparent")+
scale_fill_viridis('Distance to Water (6 = closest)',
direction = -1, # Direction of the color palette - either 1 or -1
option = 'F',
alpha = 0.6)+
labs(title="Distance to Water Body") +
mapTheme

Data Wrangling
Here, we clean the tabular data underlying the spatial data to
prepare it for running through the model.
Plotting
In these plots, we see how our factors in Calgary average out in
areas within the city that have flooding versus those that have not
flooded. We notice that areas that have soil with better drainage
efficacy (DrainRank) have been more likely to not flood. Likewise, areas
with lots of flow accumulation are more likely to flood than areas with
minimal accumulation.
Model Building
Here, we walk through the steps of building and training our model,
and then applying that model to our other geography, Boise.
Binomial Model
Here, we employ a glm linear model, which finds that soil drainage
efficacy (“DrainRank”) and Flow Accumulation (“FlowAcc”) are extremely
statistically significant in their relationship to Calgary’s historical
flooding. Distance to water body (“disth2o_gr”) was also found to be
statistically significant. Surface permeability (“Pervious”), however,
did not have a very strong relationship with observed flooding in
Calgary.
calgaryModel <- glm(hist_flood ~ DrainRank + FlowAcc + Pervious + disth2o_gr,
family="binomial"(link="logit"), data = calgaryTrain1 %>%
as.data.frame() %>% dplyr::select(-geometry,-OBJECTID,hist_flood,DrainRank,FlowAcc,Pervious,disth2o_gr))
summary(calgaryModel)
##
## Call:
## glm(formula = hist_flood ~ DrainRank + FlowAcc + Pervious + disth2o_gr,
## family = binomial(link = "logit"), data = calgaryTrain1 %>%
## as.data.frame() %>% dplyr::select(-geometry, -OBJECTID,
## hist_flood, DrainRank, FlowAcc, Pervious, disth2o_gr))
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.829715 0.604726 1.372 0.170047
## DrainRank -1.573263 0.171256 -9.187 < 2e-16 ***
## FlowAcc 0.495187 0.058706 8.435 < 2e-16 ***
## Pervious 0.007057 0.045789 0.154 0.877513
## disth2o_gr 0.121958 0.034912 3.493 0.000477 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2026.9 on 7153 degrees of freedom
## Residual deviance: 1718.4 on 7149 degrees of freedom
## AIC: 1728.4
##
## Number of Fisher Scoring iterations: 8
Model Validation
The items below interrogate how successful our model is at predicting
Calgary’s flooding, using several different visualizations and
statistical evaluations. In both the histogram and filled line graph,
our predictions are that flooded areas have about a .5 chance of
flooding, while there is more nuance for areas that do not have
historical flooding.
classProbs <- predict(calgaryModel, calgaryTest1, type="response")
hist(classProbs)

testProbs <- data.frame(obs = as.numeric(calgaryTest1$hist_flood),
pred = classProbs)
ggplot(testProbs, aes(x = pred, fill=as.factor(obs))) +
geom_density() +
facet_grid(obs ~ .) +
xlab("Probability") +
geom_vline(xintercept = .5) +
scale_fill_manual(values = c("#669999", "#00CCCC"),
labels = c("Not Flooded","Flooded"),
name = "") +
plotTheme

Confusion Metrics
The confusion metrics predict a high degree of accuracy with our
Calgary model, while still generating false positives and false
negatives.
testProbs$predClass = ifelse(testProbs$pred > .05 ,1,0)
#predicting that if its greater than 50% its a 1, less is 0
caret::confusionMatrix(reference = as.factor(testProbs$obs),
data = as.factor(testProbs$predClass),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 2454 43
## 1 537 32
##
## Accuracy : 0.8108
## 95% CI : (0.7965, 0.8246)
## No Information Rate : 0.9755
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.0587
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.42667
## Specificity : 0.82046
## Pos Pred Value : 0.05624
## Neg Pred Value : 0.98278
## Prevalence : 0.02446
## Detection Rate : 0.01044
## Detection Prevalence : 0.18558
## Balanced Accuracy : 0.62356
##
## 'Positive' Class : 1
##
## How it works:
# **Predicted = 0, Observed = 0 -> True Negative**
# **Predicted = 1, Observed = 1 -> True Positive**
# **Predicted = 1, Observed = 0 -> False Positive**
# **Predicted = 0, Observed = 1 -> False Negative**
# **1. Sensitivity - the proportion of actual positives (1's) that were predicted to be positive. Also known as "true positive rate".**
# **2. Specificity - The proportion of actual negatives (0's) that were predicted to be negatives. Also known as "true negative rate".**
ROC Curve
Given that the ROC Curve is not bowed out to the top and left edges,
the model is not tremendously over-fit.
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Area under the curve: 0.7751
Cross Validation
Cross validation tests the strength of our model on sample data. We
test our model on 100 test sets, and then we estimate the strength of
our model.
ctrl<- trainControl(method = "cv",
number = 100,
p = 0.7,
savePredictions = TRUE)
cvFit <- train(as.factor(hist_flood) ~ ., data =
calgary_vars %>%
as.data.frame() %>%
dplyr::select(hist_flood,DrainRank,FlowAcc,Pervious,disth2o_gr),
method="glm", family="binomial",
trControl = ctrl)
cvFit
## Generalized Linear Model
##
## 10220 samples
## 4 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (100 fold)
## Summary of sample sizes: 10118, 10117, 10118, 10118, 10118, 10117, ...
## Resampling results:
##
## Accuracy Kappa
## 0.9702579 0
The histogram of our cross validation shows that all tests of our
model are highly accurate.
ggplot(as.data.frame(cvFit$resample), aes(Accuracy)) +
geom_histogram() +
scale_x_continuous(limits = c(0, 1)) +
labs(x="Accuracy",
y="Count")+
plotTheme

Attaching Predictions to Calgary Model
In the final step of building our Calgary model, we attach our
model’s predictions to our original Calgary fishnet…
Visualizing Calgary Predictions
…and we visualize our model’s prediction of flooding against the
historical data of observed flooding.
ggplot() +
geom_sf(data=calgary1, aes(fill=factor(ntile(allPredictions,5))),
colour=NA) +
scale_fill_manual(values=c("#DEF5E5FF","#49C1ADFF","#357BA2FF","#3E356BFF","#0B0405FF"),
labels=as.character(quantile(calgary1$allPredictions,
c(0, .25, .5, .75, 1),
na.rm=T)),
name="Predicted Quintile\nProbabilities(%)\n(Darkest=\nHighest)") +
geom_sf(data=calgary1 %>%
filter(hist_flood == 1),
fill="yellow",alpha = .4, colour="NA") +
mapTheme +
labs(title="Observed and Predicted Flooding Areas",
subtitle="Calgary; Yellow = Historical flooding; Gradient = Predicted flooding (Darkest = Most)")

We see that our Calgary model has some unusual vertical striping. We
are not sure why this happened, but suspect it bay be the result of
incongruity units among our datasets visualized. We elaborate below as
to the potential implications of this, and how we would go about
adjusting our model to try to rectify this incorrect visualization.
Calgary Confusion Metrics:
By running confusion metrics, we see that our model predicts true
negatives and true positives well. However, it has a lot of false
positives because soil drainage efficacy appears to be over-weighted,
affecting the shape of the model and the confusion metrics
visualization.
calgary1 %>%
mutate(confResult=case_when(allPredictions < 4 & hist_flood==0 ~ "True_Negative",
allPredictions >= 1 & hist_flood==1 ~ "True_Positive",
allPredictions < 4 & hist_flood==1 ~ "False_Negative",
allPredictions >= 4 & hist_flood==0 ~ "False_Positive")) %>%
ggplot()+
geom_sf(aes(fill = confResult), color = "transparent")+
scale_fill_manual(values = c("Red","Light Blue","Orange","Pink"),
name="Outcomes")+
labs(title="Calgary Confusion Metrics") +
mapTheme

Data Wrangling - Boise
Here, we pull in our Boise data, which is our original fishnet, but
fully populated with Boise’s data on the four factors on which we
trained our Calgary model.
boise_master_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_MASTERFISH.geojson")
## Reading layer `BOISE_MASTERFISH' from data source
## `https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_MASTERFISH.geojson'
## using driver `GeoJSON'
## Simple feature collection with 2255 features and 15 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -12954270 ymin: 5360116 xmax: -12923270 ymax: 5388839
## Projected CRS: WGS 84 / World Mercator
boise_vars <-
boise_master_fish %>%
dplyr::select(hist_flood,DrainRank,FlowAcc,Pervious,disth2o_gr,OBJECTID)
boise_vars <- boise_vars %>% mutate(hist_flood=recode(hist_flood, '1'='0', '6'='1'))
boise_vars <- boise_vars %>% mutate(hist_flood = as.numeric(hist_flood))
boise_allPredictions <-
predict(cvFit, boise_vars, type="prob")[,2]
boise1 <-
cbind(boise_vars,boise_allPredictions) %>%
mutate(boise_allPredictions = round(boise_allPredictions * 100))
Testing the Calgary Flood Model on Boise
Here, we apply our Calgary flood model to our other geography, Boise.
We attach the prediction values, trained on the Calgary model, to Boise,
based on Boise’s spatial distribution of the four factors that we input
to our Calgary model.
ggplot() +
geom_sf(data=boise1, aes(fill=factor(ntile(boise_allPredictions,5))),
colour=NA) +
scale_fill_manual(values = c("#DEF5E5FF","#49C1ADFF","#357BA2FF","#3E356BFF","#0B0405FF"),
labels=as.character(quantile(boise1$boise_allPredictions,
c(0, .25, .5, .75, 1),
na.rm=T)),
name="Predicted Quintile\nProbabilities(%)\n(Darkest=\nHighest)") +
geom_sf(data=boise1 %>%
filter(hist_flood == 1),
fill="yellow",alpha = 0.5, colour="NA") +
coord_sf(xlim = c(-12955270, -12923270), ylim = c(5352033, 5394976), expand = FALSE) +
mapTheme +
labs(title="Observed and Predicted Flooding Areas",
subtitle="Boise; Yellow = Historical flooding; Gradient = Predicted flooding (Darkest = Most)")

---
title: "Predictive Flood Modelling for Boise, Idaho"
author: "Lindsey Hover, Samriddhi Khare & Anastasia Osorio"
date: "4/6/2023"
output: 
  html_document:
    theme: readable
    toc: true
    toc_float: true
    code_download: true
    code_folding: hide
editor_options: 
  markdown: 
    wrap: 72
---

```{r eval=FALSE, include=FALSE}
rm(list=ls())
#setwd("C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/RScripts")

install.packages("rmarkdown") 
install.packages("knitr")
install.packages("caret") 
install.packages("pscl")
install.packages("plotROC") install.packages("pROC")
install.packages("kableExtra") install.packages("gridExtra")
install.packages("raster") 
install.packages("kableExtra") install.packages("geojsonio)

```

```{r setup, include=FALSE,message = FALSE,cache=TRUE}
knitr::opts_chunk$set(echo = TRUE)
options(scipen=999)
library(knitr)
```

```{r libraries, message=FALSE, warning=FALSE, include=FALSE}
library(caret)
library(plotROC)
library(pROC)
library(sf)
library(tidyverse)
library(knitr)
library(kableExtra)
library(tigris)
library(viridis)
library(ggplot2)
library(gridExtra)
library(raster)
#library(rgdal)
library(kableExtra)
library(geojsonio)

```

```{r mapTheme, include=FALSE}
mapTheme <- theme(plot.title =element_text(size=12),
                  plot.subtitle = element_text(size=8),
                  plot.caption = element_text(size = 6),
                  axis.line=element_blank(),
                  axis.text.x=element_blank(),
                  axis.text.y=element_blank(),
                  axis.ticks=element_blank(),
                  axis.title.x=element_blank(),
                  axis.title.y=element_blank(),
                  panel.background=element_blank(),
                  panel.border=element_blank(),
                  panel.grid.major=element_line(colour = 'transparent'),
                  panel.grid.minor=element_blank(),
                  legend.direction = "vertical", 
                  legend.position = "right",
                  plot.margin = margin(1, 1, 1, 1, 'cm'),
                  legend.key.height = unit(1, "cm"), legend.key.width = unit(0.2, "cm"))

plotTheme <- theme(
  plot.title =element_text(size=12),
  plot.subtitle = element_text(size=8),
  plot.caption = element_text(size = 6),
  axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
  axis.text.y = element_text(size = 10),
  axis.title.y = element_text(size = 10),
  # Set the entire chart region to blank
  panel.background=element_blank(),
  plot.background=element_blank(),
  #panel.border=element_rect(colour="#F0F0F0"),
  # Format the grid
  panel.grid.major=element_line(colour="#D0D0D0",size=.75),
  axis.ticks=element_blank())
```

# About This Project

In this assignment at the University of Pennsylvania Stuart Weitzman
School of Design, CPLN 675 Land Use and Environmental Modeling, our goal
was to develop a predictive flood inundation model using Calgary,
Alberta, Canada as the base geography on which to train a model that
could be applied to other cities, using the same factors.

For this model we selected the factors of: - Soil Drainage Efficacy
(i.e., soil type and its capacity to allow water absorption) - Water
Flow Accumulation (i.e., how much--and where--water collects
topographically?) - Surface Permeability (i.e., whether impervious
surfaces, such as buildings and concrete, sit on a given segment of land
area) - Distance to Water Bodies (i.e., how far is a given point in land
from where water collects?)

# Calgary and Boise

We selected Boise, Idaho as our comparable city.

# Setup & Visualizing Our Dependent Variable and Factors

Below, we walk through the steps of setting up and visualizing our data,
to help us to understand the apparent relationship between our dependent
variable, historic flooding (hist_flood), and our four factors in the
Calgay geography.

### Data Sources

-   <https://opendata.cityofboise.org/>
-   <https://opengisdata-idahodeq.opendata.arcgis.com/>
-   <https://insideidaho.org/browse-org.html#>
-   <https://apps.nationalmap.gov/downloader/>
-   <https://data.calgary.ca/>
-   <https://open.alberta.ca/opendata>

### Importing Preliminary Spatial Data

First, we load our two cities' boundaries and some other preliminary
datasets, and put them in web Mercator projections (CRS 3395).

```{r load_boundary_data, echo=TRUE, message=FALSE, warning=FALSE, results='hide'}
calgary_boundary <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/calgary_geojsons/calgary_geojsons/CALGARY_BOUNDARY.geojson")


calgary_boundary <- calgary_boundary %>% 
  st_transform(crs = 3395)
  
boise_boundary <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_BOUNDARY.geojson")

boise_boundary <- boise_boundary %>% 
  st_transform(crs = 3395)

#HYDROLOGY
#calgary_hydrology <- st_read("C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Data_Calgary/CalgaryHydrology.shp")
  #calgary_hydrology <- calgary_hydrology %>% 
   # st_transform(crs = 3395)

#boise_hydrology1 <- st_read("C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Data_Boise/BoiseHydrology1.shp")
#boise_hydrology2 <- st_read("C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Data_Boise/BoiseHydrology2.shp")
 # boise_hydrology1 <- boise_hydrology1 %>% 
  #    st_transform(crs = 3395)
  #boise_hydrology2 <- boise_hydrology2 %>% 
   #   st_transform(crs = 3395)
  
calgary_master_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/calgary_geojsons/calgary_geojsons/CALGARY_MASTERFISH.geojson")

boise_master_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_MASTERFISH.geojson")


```

### Creating Fishnets

Now, we use R to create fishnets for the two cities with the same cell
size and projections, to ensure consistency among forthcoming maps.

```{r fishnets, echo=TRUE}
calgary_fishnet <- st_make_grid(calgary_boundary,
                                cellsize = 500,
                                square = FALSE) %>% 
  .[calgary_boundary] %>% 
  st_sf() %>% 
  mutate(uniqueID = rownames(.))

ggplot()+
  geom_sf(data = calgary_fishnet,
          fill = "pink")+
  geom_sf(data = calgary_boundary, 
          color = "black", fill = "transparent") +
  mapTheme


boise_fishnet <- st_make_grid(boise_boundary,
                                cellsize = 500,
                                square = FALSE) %>% 
  .[boise_boundary] %>% 
  st_sf() %>% 
  mutate(uniqueID = rownames(.))

ggplot()+
  geom_sf(data = boise_fishnet,
          fill = "lightblue")+
  geom_sf(data = boise_boundary, 
          color = "black", fill = "transparent") +
  mapTheme


#st_write(calgary_fishnet, "C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Derived Files/calgary_fishnet.shp", geometry?=?TRUE)
#st_write(boise_fishnet, "C:/Users/Lindsey/Desktop/CPLN675_Land_Use_Modeling_Desktop/Midterm/Derived Files/boise_fishnet.shp", geometry?=?TRUE)
```

### Dependent Variable: Flooding Maps

Now, we visualize our two geographies with their outlays of historical
flooding. Considering historical flooding is the dependent variable
which all of our factors in our model are trying to predict, it is good
to first understand where flooding occurs as a baseline of our model's
success.

```{r hist_flooding, echo=TRUE}
grid.arrange(ncol = 2,
  ggplot() +
    labs(title="Calgary, Historical Flooding and Hydrology",
       subtitle="Calgary; Yellow = Historical flooding observed")+
  geom_sf(data = calgary_boundary,
           color = "black")+
    geom_sf(data=calgary_master_fish  %>% 
               filter(hist_flood == 6),
           fill="yellow",alpha = 0.6,color='transparent') +
  # geom_sf(data=calgary_hydrology,
   #        fill = 'light blue')+
    mapTheme,
  
  ggplot() +
    geom_sf(data = boise_boundary,
           color = "black")+
    geom_sf(data=boise_master_fish  %>% 
               filter(hist_flood == 6),
           fill="yellow",alpha = 0.6,color='transparent') +
   #geom_sf(data=boise_hydrology1,
    #       fill = 'light blue')+
    # geom_sf(data=boise_hydrology2,
     #     fill = 'light blue')+
    mapTheme+
  labs(title="Boise, Historical Flooding and Hydrology",
       subtitle="Boise; Yellow = Historical flooding observed"))
```

### Importing Other Files for Factors and Map Elements

Here, we import our spatial data to map our factors and train our model.
Most of this data was cleaned, derived, and assembled using ArcGIS Pro.

```{r fish_shps, message=FALSE, warning=FALSE, include=FALSE, results='hide'}


#SOIL DRAINAGE
calgary_drain_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/calgary_geojsons/calgary_geojsons/CALGARY_DRAINAGE.geojson")
  calgary_drain_fish <- calgary_drain_fish %>% 
    st_transform(crs = 3395)
  
boise_drain_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_DRAINAGE.geojson")
  boise_drain_fish <- boise_drain_fish %>% 
    st_transform(crs = 3395)
    
#FLOW ACCUMULATION -- USES MASTER FILE!!
calgary_flow_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/calgary_geojsons/calgary_geojsons/CALGARY_MASTERFISH.geojson")
  calgary_flow_fish <- calgary_flow_fish %>% 
    st_transform(crs = 3395)

boise_flow_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_FLOW_ACCU.geojson")

#IMPERVIOUS v. PERVIOUS SURFACES
calgary_surf_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/calgary_geojsons/calgary_geojsons/CALGARY_SURFACE.geojson")
  calgary_surf_fish <- calgary_surf_fish %>% 
    st_transform(crs = 3395) 

boise_surf_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_SURFACE.geojson")

#DISTANCE TO WATER BODY
calgary_disth2o_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/calgary_geojsons/calgary_geojsons/CALGARY_DIST_TO_WATER.geojson")
  calgary_disth2o_fish <- calgary_disth2o_fish %>% 
    st_transform(crs = 3395)
  
boise_disth2o_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_DIST_TO_WATER.geojson")
```

# Mapping

#### Calgary

Here, we create maps of our four factors for our base geography,
Calgary.

```{r calgary_factor_maps, echo=FALSE, message=FALSE, warning=FALSE, img_width = 10 }
###Calgary
  #SOIL DRAINAGE -- Good
  ggplot() +
    geom_sf(data = calgary_boundary,
           color = "black")+
    geom_sf(data = calgary_drain_fish,
            aes(fill = DrainageRa),
            color = "transparent")+
    scale_fill_viridis('Soil Drainage Efficacy (6 = best)',
                       direction = -1, # Direction of the color palette - either 1 or -1
                       option = 'F',  # Viridis has multiple palettes to choose from
                       alpha = 0.6)+  
    labs(title="Soil Type & Drainage") +
    mapTheme
 

```

```{r}
#IMPERVIOUS v. PERVIOUS SURFACES -- Good
ggplot() +
  geom_sf(data = calgary_boundary,
           color = "black")+
  geom_sf(data = calgary_surf_fish,
          aes(fill = Pervious),
          color = "transparent")+
  scale_fill_viridis('Surface Permeability (6 = least)',
                     direction = -1, # Direction of the color palette - either 1 or -1
                     option = 'F',  # Viridis has multiple palettes to choose from
                     alpha = 0.6)+  
  labs(title="Surface Permeability") +
  mapTheme

```

```{r}

#DISTANCE TO WATER BODY -- Good
ggplot() +
  geom_sf(data = calgary_boundary,
          color = "black")+
  geom_sf(data = calgary_disth2o_fish,
          aes(fill = gridcode),
          color = "transparent")+
  scale_fill_viridis('Distance to Water (6 = closest)',
                     direction = -1, # Direction of the color palette - either 1 or -1
                     option = 'F',
                     alpha = 0.6)+  
  labs(title="Distance to Water Body") +
  mapTheme

```

```{r}

  #FLOW ACCUMULATION -- good
  ggplot() +
    geom_sf(data = calgary_boundary,
            color = "black")+
    geom_sf(data = calgary_flow_fish,
            aes(fill = FlowAcc),
            color = "transparent")+
    scale_fill_viridis('Flow Accumulation (6 = greatest)',
                       direction = -1, 
                       option = 'F', 
                       alpha = 0.6)+  
    labs(title="Water Flow Accumulation") +
    mapTheme
  
```




#### Boise

Likewise, here we visualize the same four factors within Boise's
boundaries.

```{r boise_factor_maps, echo=TRUE, message=FALSE, warning=FALSE}
###Boise
  #SOIL DRAINAGE -- Good
  ggplot() +
   geom_sf(data = boise_boundary,
                         fill = "transparent", color = "black")+
                 geom_sf(data = boise_drain_fish,
                         aes(fill = DrainRank),
                         color = "transparent")+
                 scale_fill_viridis('Soil Drainage Efficacy (6 = best)',
                                    direction = -1, # Direction of the color palette - either 1 or -1
                                    option = 'F',  # Viridis has multiple palettes to choose from
                                    alpha = 0.6)+  
                 labs(title="Soil Type & Drainage") +
                 mapTheme
```


```{r}
               
#FLOW ACCUMULATION -- good
  ggplot() +
   geom_sf(data = boise_boundary,
                         fill = "transparent", color = "black")+
                 geom_sf(data = boise_flow_fish,
                         aes(fill = gridcode),
                         color = "transparent")+
                 scale_fill_viridis('Flow Accumulation (6 = greatest)',
                                    direction = -1, 
                                    option = 'F',  
                                    alpha = 0.6)+  
                 labs(title="Water Flow Accumulation") +
                 mapTheme

```

```{r}
               
#IMPERVIOUS v. PERVIOUS SURFACES -- good
  ggplot() +
    geom_sf(data = boise_boundary,
                         color = "black")+
                 geom_sf(data = boise_surf_fish,
                         aes(fill = gridcode),
                         color = "transparent")+
                 scale_fill_viridis('Surface Permeability (6 = least)',
                                    direction = -1, 
                                    option = 'F',  
                                    alpha = 0.6)+  
                 labs(title="Surface Permeability") +
                 mapTheme

```

```{r}
               
#DISTANCE TO WATER BODY -- good
ggplot() +
  geom_sf(data = boise_boundary,
                         color = "black")+
                 geom_sf(data = boise_disth2o_fish,
                         aes(fill = gridcode),
                         color = "transparent")+
                 scale_fill_viridis('Distance to Water (6 = closest)',
                                    direction = -1, # Direction of the color palette - either 1 or -1
                                    option = 'F',
                                    alpha = 0.6)+  
                 labs(title="Distance to Water Body") +
                 mapTheme
```

### Data Wrangling

Here, we clean the tabular data underlying the spatial data to prepare
it for running through the model.

```{r calgary_data_manipulation, include=FALSE}
calgary_master_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/calgary_geojsons/calgary_geojsons/CALGARY_MASTERFISH.geojson")

  calgary_vars <- 
    calgary_master_fish %>% 
    dplyr::select(hist_flood,DrainRank,FlowAcc,Pervious,disth2o_gr,OBJECTID)
  
  calgary_vars <- calgary_vars %>% mutate(hist_flood=recode(hist_flood, '1'='0', '6'='1'))
  calgary_vars <- calgary_vars %>% mutate(hist_flood = as.numeric(hist_flood))
  
```

### Plotting

In these plots, we see how our factors in Calgary average out in areas
within the city that have flooding versus those that have not flooded.
We notice that areas that have soil with better drainage efficacy
(DrainRank) have been more likely to not flood. Likewise, areas with
lots of flow accumulation are more likely to flood than areas with
minimal accumulation.

```{r flood_or_not_plots, include=FALSE}
  calgary_plot <-
    calgary_vars %>% 
    as.data.frame() %>% 
    dplyr::select(hist_flood,DrainRank,FlowAcc,Pervious,disth2o_gr) %>% 
      gather(variable, value, -hist_flood)
 
  # 6 = previously flooded ; 1 = not flooded
       ggplot(calgary_plot %>%
                group_by(hist_flood, variable) %>%
                summarize(mean = mean(value))) + 
         geom_bar(aes(as.factor(hist_flood), 
                      mean, 
                      fill=as.factor(hist_flood)),
                  stat="identity") + 
         facet_wrap(~variable, ncol=5, scales = "free") +
         scale_fill_manual(values = c("#669999", "#00CCCC"),
                           labels = c("Not Flooded","Flooded"),
                           name = "") +
         labs(x="Flooded or Not", y="Value")
```

# Model Building

Here, we walk through the steps of building and training our model, and
then applying that model to our other geography, Boise.

```{r model_train, include=FALSE}

trainIndex1 <- createDataPartition(calgary_vars$hist_flood, 
                                  p = .70,
list = FALSE,
times = 1)
       
       calgaryTrain1 <- calgary_vars[ trainIndex1,]
       calgaryTest1  <- calgary_vars[-trainIndex1,]
```

### Binomial Model

Here, we employ a glm linear model, which finds that soil drainage
efficacy ("DrainRank") and Flow Accumulation ("FlowAcc") are extremely
statistically significant in their relationship to Calgary's historical
flooding. Distance to water body ("disth2o_gr") was also found to be
statistically significant. Surface permeability ("Pervious"), however,
did not have a very strong relationship with observed flooding in
Calgary.

```{r binomial, echo=TRUE}


calgaryModel <- glm(hist_flood ~ DrainRank + FlowAcc + Pervious + disth2o_gr,
                            family="binomial"(link="logit"), data = calgaryTrain1 %>%
                            as.data.frame() %>% dplyr::select(-geometry,-OBJECTID,hist_flood,DrainRank,FlowAcc,Pervious,disth2o_gr))                     

summary(calgaryModel)

```

### Model Validation

The items below interrogate how successful our model is at predicting
Calgary's flooding, using several different visualizations and
statistical evaluations. In both the histogram and filled line graph,
our predictions are that flooded areas have about a .5 chance of
flooding, while there is more nuance for areas that do not have
historical flooding.

```{r validation, echo=TRUE}

classProbs <- predict(calgaryModel, calgaryTest1, type="response")

hist(classProbs)

testProbs <- data.frame(obs = as.numeric(calgaryTest1$hist_flood),
                        pred = classProbs)

ggplot(testProbs, aes(x = pred, fill=as.factor(obs))) + 
  geom_density() +
  facet_grid(obs ~ .) + 
  xlab("Probability") + 
  geom_vline(xintercept = .5) +
     scale_fill_manual(values = c("#669999", "#00CCCC"),
                           labels = c("Not Flooded","Flooded"),
                           name = "") +
    plotTheme
  

```

### Confusion Metrics

The confusion metrics predict a high degree of accuracy with our Calgary
model, while still generating false positives and false negatives.

```{r confusion_matrix, echo=TRUE, message=FALSE, warning=FALSE}
testProbs$predClass  = ifelse(testProbs$pred > .05 ,1,0)
#predicting that if its greater than 50% its a 1, less is 0

caret::confusionMatrix(reference = as.factor(testProbs$obs), 
                       data = as.factor(testProbs$predClass), 
                       positive = "1")



## How it works:
  # **Predicted = 0, Observed = 0 -> True Negative**
  # **Predicted = 1, Observed = 1 -> True Positive**
  #  **Predicted = 1, Observed = 0 -> False Positive**
  #  **Predicted = 0, Observed = 1 -> False Negative**
    # **1. Sensitivity - the proportion of actual positives (1's) that were predicted to be positive. Also known as "true positive rate".**
    # **2. Specificity - The proportion of actual negatives (0's) that were predicted to be negatives. Also known as "true negative rate".**
```

### ROC Curve

Given that the ROC Curve is not bowed out to the top and left edges, the
model is not tremendously over-fit.

```{r roc_curve, echo=FALSE}

ggplot(testProbs, aes(d = obs, m = pred)) + 
  geom_roc(n.cuts = 50, labels = FALSE) + 
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey') 
```

```{r auc, echo=FALSE, warning=FALSE}
auc(testProbs$obs, testProbs$pred)
```

### Cross Validation

Cross validation tests the strength of our model on sample data. We test
our model on 100 test sets, and then we estimate the strength of our
model.

```{r k_fold, echo=TRUE, message=FALSE, warning=FALSE}
ctrl<- trainControl(method = "cv", 
                     number = 100, 
                     p = 0.7, 
                     savePredictions = TRUE)

cvFit <- train(as.factor(hist_flood) ~ .,  data =
                 calgary_vars %>% 
                 as.data.frame() %>% 
              dplyr::select(hist_flood,DrainRank,FlowAcc,Pervious,disth2o_gr), 
               method="glm", family="binomial",
               trControl = ctrl)

cvFit

```

The histogram of our cross validation shows that all tests of our model
are highly accurate.

```{r cv_hist, echo=TRUE, message=FALSE, warning=FALSE}
ggplot(as.data.frame(cvFit$resample), aes(Accuracy)) + 
  geom_histogram() +
  scale_x_continuous(limits = c(0, 1)) +
  labs(x="Accuracy",
       y="Count")+
  plotTheme
```

### Attaching Predictions to Calgary Model

In the final step of building our Calgary model, we attach our model's
predictions to our original Calgary fishnet...

```{r predict_whole, message=FALSE, warning=FALSE, include=FALSE}

allPredictions <- 
  predict(cvFit, calgary_vars, type="prob")[,2]

calgary_vars <- 
  cbind(calgary_vars,allPredictions) %>%
  mutate(allPredictions = round(allPredictions * 100)) 


calgary1 <- calgary_vars %>%
 mutate(PredClass = ifelse(allPredictions > 10, 1, 0))

calgary1 <- calgary1 %>%
 mutate(Correct = ifelse(PredClass == hist_flood, "1", "0"),
         Incorrect = ifelse(PredClass != hist_flood, "1", "0"))

```

### Visualizing Calgary Predictions

...and we visualize our model's prediction of flooding against the
historical data of observed flooding.

```{r predicted_map1, echo=TRUE, message=FALSE, warning=FALSE}
ggplot() + 
  geom_sf(data=calgary1, aes(fill=factor(ntile(allPredictions,5))), 
          colour=NA) +
   scale_fill_manual(values=c("#DEF5E5FF","#49C1ADFF","#357BA2FF","#3E356BFF","#0B0405FF"),
                    labels=as.character(quantile(calgary1$allPredictions,
                                                 c(0, .25, .5, .75, 1),
                                                 na.rm=T)),
                    name="Predicted Quintile\nProbabilities(%)\n(Darkest=\nHighest)") +
    geom_sf(data=calgary1  %>% 
               filter(hist_flood == 1), 
               fill="yellow",alpha = .4, colour="NA") +
  mapTheme +
  labs(title="Observed and Predicted Flooding Areas",
       subtitle="Calgary; Yellow = Historical flooding; Gradient = Predicted flooding (Darkest = Most)")
```

We see that our Calgary model has some unusual vertical striping. We are
not sure why this happened, but suspect it bay be the result of
incongruity units among our datasets visualized. We elaborate below as
to the potential implications of this, and how we would go about
adjusting our model to try to rectify this incorrect visualization.

### Calgary Confusion Metrics:

By running confusion metrics, we see that our model predicts true
negatives and true positives well. However, it has a lot of false
positives because soil drainage efficacy appears to be over-weighted,
affecting the shape of the model and the confusion metrics
visualization.

```{r error_map, echo=TRUE, message=FALSE, warning=FALSE}
calgary1 %>%
  mutate(confResult=case_when(allPredictions < 4 & hist_flood==0 ~ "True_Negative",
                              allPredictions >= 1 & hist_flood==1 ~ "True_Positive",
                              allPredictions < 4 & hist_flood==1 ~ "False_Negative",
                              allPredictions >= 4 & hist_flood==0 ~ "False_Positive")) %>%
  ggplot()+
  geom_sf(aes(fill = confResult), color = "transparent")+
  scale_fill_manual(values = c("Red","Light Blue","Orange","Pink"),
                    name="Outcomes")+
  labs(title="Calgary Confusion Metrics") +
  mapTheme

```

### Data Wrangling - Boise

Here, we pull in our Boise data, which is our original fishnet, but
fully populated with Boise's data on the four factors on which we
trained our Calgary model.

```{r echo=TRUE}

boise_master_fish <- st_read("https://raw.githubusercontent.com/ana-oso/CalgaryvBoise/0ab4aadf9e7d83409aefd5ac8dab3e6515240b1f/geojsons/boise_geojsons/boise_geojsons/BOISE_MASTERFISH.geojson")

boise_vars <- 
    boise_master_fish %>% 
    dplyr::select(hist_flood,DrainRank,FlowAcc,Pervious,disth2o_gr,OBJECTID)
  
boise_vars <- boise_vars %>% mutate(hist_flood=recode(hist_flood, '1'='0', '6'='1'))
boise_vars <- boise_vars %>% mutate(hist_flood = as.numeric(hist_flood))

boise_allPredictions <- 
  predict(cvFit, boise_vars, type="prob")[,2] 

boise1 <- 
  cbind(boise_vars,boise_allPredictions) %>% 
  mutate(boise_allPredictions = round(boise_allPredictions * 100))


```

# Testing the Calgary Flood Model on Boise

Here, we apply our Calgary flood model to our other geography, Boise. We
attach the prediction values, trained on the Calgary model, to Boise,
based on Boise's spatial distribution of the four factors that we input
to our Calgary model.

```{r echo=TRUE}
ggplot() + 
  geom_sf(data=boise1, aes(fill=factor(ntile(boise_allPredictions,5))), 
          colour=NA) +
  scale_fill_manual(values = c("#DEF5E5FF","#49C1ADFF","#357BA2FF","#3E356BFF","#0B0405FF"),
                    labels=as.character(quantile(boise1$boise_allPredictions,
                                                            c(0, .25, .5, .75, 1),
                                                 na.rm=T)),
                    name="Predicted Quintile\nProbabilities(%)\n(Darkest=\nHighest)") +
    geom_sf(data=boise1  %>% 
               filter(hist_flood == 1), 
               fill="yellow",alpha = 0.5, colour="NA") +
 coord_sf(xlim = c(-12955270, -12923270), ylim = c(5352033, 5394976), expand = FALSE) +
  mapTheme +
  labs(title="Observed and Predicted Flooding Areas",
       subtitle="Boise; Yellow = Historical flooding; Gradient = Predicted flooding (Darkest = Most)")

```

# Results

Unfortunately, it does not seem that our model works too well on Boise!
The areas where our model predicted that Boise would flood have
historically been free of flooding. While this may be a result of
inaccurate/uncorrelated factors, it could also be the result of data
incongruities that caused the visible, vertical striping on our Calgary
model.

### NEXT STEPS

If we were to revisit our model to tune it to be more accurate, we would
train it to be more sensitive to surface permeability, and to
de-prioritize soil drainage efficacy, which influenced the whole shape
of the prediction in ways that are not accurate to the historical flood
patterns. In order to do this, we might need to use a multivariate model
that allows us to manually weight one factor more highly against
another.

As a fist step in diagnosing why the model did not work very well on
Boise, would run the confusion metrics on Boise to see where there is
misalignment.
