Introduction

Divvy is a public bike sharing system owned by the Chicago Department of Transportation and operated by Motivate. Divvy bikes were first introduced to Chicago in 2013, and since then the network has expanded from 750 to 6,000 bikes. In 2016, Divvy receieved funding from the state to expand to Evanston, the suburb directly north of Chicago. Currently, there are 11 Evanston Divvy stations, which are distributed throughout the city.

Bike sharing systems have a range of uses for the public. Divvys are used for everything from scenic tourist rides along the Lakefront Trail to bike courier deliveries to daily commutes to the office (1). Additionally, bikes can be used as a Divvy member (‘Subscriber’ in the data below), or they can be rented for individual uses (‘Customer’ in the data).

Transportation planners have studied bike sharing and found that the benefits of bike sharing systems, while promising, are distributed unequally among the population, favoring younger male users (2). Additionally, Divvy bikes potentially offset emissions from other transportation sources if the user rides instead of drives.

This project is intended to lead to a better understanding of how the public uses Evanston Divvy infrastructure. I hope to show what gender and age demographics are using Divvy bikes, how Divvy use varies between seasons, and which trips within, to, and from Evanston are most frequently traveled on Divvys.

Based on background research, I expect that there will be a greater proportion of male riders on average, and that seasonal weather changes will significantly affect the number of Divvy rides taken. I also expect to find a higher number of rides by younger users closer to Northwestern University’s campus in Evanston.

Data & Methods

Data Prep

library(dplyr);library(data.table)

Divvy System Data for Plotting

https://www.divvybikes.com/system-data provides open access data for all Divvy trips (3). The following code downloads Divvy data for the entire Chicago network from Q3 2016 to Q4 2018 (the whole period where Evanston Divvy stations were operational). The downlaoded files are subsetted to remove the trip duration column, which is flawed in some of the files (we will recalculate later). Column names are standardized across data tables using the setNames function.

As a result, q1:q10 variables store a data table for each quarter of Evanston-relevant Divvy data.

q1<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2016_Q3Q4.zip | tar -xf- --to-stdout *Trips_2016_Q3.csv")[,-5] 
q2<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2016_Q3Q4.zip | tar -xf- --to-stdout *Trips_2016_Q4.csv")[,-5] %>% setNames(colnames(q1))
q3<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2017_Q1Q2.zip | tar -xf- --to-stdout *Trips_2017_Q1.csv")[,-5] %>% setNames(colnames(q1))  
q4<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2017_Q1Q2.zip | tar -xf- --to-stdout *Trips_2017_Q2.csv")[,-5] %>% setNames(colnames(q1))
q5<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2017_Q3Q4.zip | tar -xf- --to-stdout *Trips_2017_Q3.csv")[,-5] %>% setNames(colnames(q1))
q6<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2017_Q3Q4.zip | tar -xf- --to-stdout *Trips_2017_Q4.csv")[,-5] %>% setNames(colnames(q1))
q7<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2018_Q1.zip | funzip")[,-5] %>% setNames(colnames(q1))
q8<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2018_Q2.zip | funzip")[,-5] %>% setNames(colnames(q1))
q9<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2018_Q3.zip | funzip")[,-5] %>% setNames(colnames(q1))
q10<-fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2018_Q4.zip | funzip")[,-5] %>% setNames(colnames(q1))

The data for different quarters is not all standardized- q1:q5, q6,and q7:q10 all use different formats for the time columns. Next, we combine these three groups into three separate long-format tables to manipulate and later merge.

divvy_trips_oldtime <- bind_rows(q1,q2,q3,q4,q5)
divvy_trips_newtime <- bind_rows(q7,q8,q9,q10)
divvy_trips_othertime <- q6
rm(q1,q2,q3,q4,q5,q6,q7,q8,q9,q10)

To store information about the stations corresponding to the Divvy trips, download most recent station metadata from Divvy and store in data table. Note that 2018 station metadata is not yet available.

evanston_stations <- 
  fread("curl https://s3.amazonaws.com/divvy-data/tripdata/Divvy_Trips_2017_Q3Q4.zip | tar -xf- --to-stdout *Stations*.csv") %>% subset(city == 'Evanston')

Here, we subset each table to reduce to trips that start or end in Evanston based on the evanston_stations table.

evanston_trips_old <- subset(divvy_trips_oldtime, from_station_id %in% evanston_stations$id | to_station_id %in% evanston_stations$id)
evanston_trips_new <- subset(divvy_trips_newtime, from_station_id %in% evanston_stations$id | to_station_id %in% evanston_stations$id)
evanston_trips_other <- subset(divvy_trips_othertime, from_station_id %in% evanston_stations$id | to_station_id %in% evanston_stations$id)
rm(divvy_trips_oldtime,divvy_trips_newtime,divvy_trips_othertime)

Next, the time columns in each table are converted to the same POSIXct time format using the as.POSIXct command.

# convert time columns to date-time
evanston_trips_old$starttime <- as.POSIXct(evanston_trips_old$starttime,
                                       format='%m/%d/%Y %H:%M:%S')
evanston_trips_old$stoptime <- as.POSIXct(evanston_trips_old$stoptime,
                                            format='%m/%d/%Y %H:%M:%S')
evanston_trips_new$starttime <- as.POSIXct(evanston_trips_new$starttime,
                                            format='%Y-%m-%d %H:%M:%S')
evanston_trips_new$stoptime <- as.POSIXct(evanston_trips_new$stoptime,
                                           format='%Y-%m-%d %H:%M:%S')
evanston_trips_other$starttime <- as.POSIXct(evanston_trips_other$starttime,
                                           format='%m/%d/%Y %H:%M')
evanston_trips_other$stoptime <- as.POSIXct(evanston_trips_other$stoptime,
                                          format='%m/%d/%Y %H:%M')

Finally, the reconciled data formats are combined into one long table containing all Divvy trips in Evanston.

evanston_trips <- bind_rows(evanston_trips_old,evanston_trips_other,evanston_trips_new)
rm(evanston_trips_new,evanston_trips_other,evanston_trips_old)

To replace the damaged trip duration columns that we removed, a new duration column is calculated using the difftime function in units of seconds.

evanston_trips$duration <- difftime(evanston_trips$stoptime,evanston_trips$starttime,
                                    units="secs")

Divvy Trip Frequency Data for Mapping

The ‘bikedata’ R package contains origin-destination (O-D) trip frequency data for several of the largest bike sharing systems in the world, including Chicago’s Divvy bike system (https://github.com/ropensci/bikedata) (4).

library(bikedata)

In order to download the Divvy trip frequency data from the bikedata package, we create a temporary directory called ‘bikedb’ and query the database for Divvy trips during the same time period as the above Divvy system data used for plotting (July 2016-December 2018) using the dl_bikedata and store_bikedata functions.

bikedb <- file.path (tempdir (), "bike.sqlite")
dl_bikedata (city = 'divvy', dates = 201607:201813, quiet = TRUE)
store_bikedata (data_dir = tempdir (), bikedb = bikedb, quiet = TRUE)

The bikedb directory now contains all relevant divvy trips for our mapping analysis. Below, we extract data from this directory on divvy stations and trips into separate variables. We additionally subset trips from the database based on birth year and add the trip frequency data for these demographic categories. The resulting ntrips data frame contains the trip origin, destination, total trip frequency between stations, and then trip frequency for each specific demographic subset (born before 1990, born 1990 or later).

stns <- bike_stations (bikedb = bikedb, city = 'ch')
ntrips <- bike_tripmat (bikedb = bikedb, city = 'ch', long = TRUE)

#subset based on a ranges of birth year of riders
later1989 <- bike_tripmat (bikedb = bikedb, city = 'ch',long=TRUE,birth_year= 1990:2020)
before1990 <- bike_tripmat (bikedb = bikedb, city = 'ch',long=TRUE,birth_year= 1800:1989)

#add trip frequency for each subset as column in complete data frame
ntrips$later1989 <- later1989$numtrips
ntrips$before1990 <- before1990$numtrips
rm(later1989,before1990)

Finally, we subset the trips and stations files using the evanston_stations information. As with the data above, we only include trips that start, end, or both within Evanston.

#id 92 is an evanston station with slightly different coordinates in the table, 
#so we manually subset it
evanstonstations <- subset(stns,longitude %in% evanston_stations$longitude | id == 92) 
evanstontrips <- subset(ntrips,start_station_id %in% evanstonstations$stn_id |
                          end_station_id %in% evanstonstations$stn_id)

Now, we have completed our data download, cleaning, and preparation. In order to avoid rerunning the data downloads to do the analysis below, the following commands save each required data table to csv in your working directory:

#plotting data
write.csv(evanston_trips,"evanstontrips_plotting.csv",row.names = FALSE)
write.csv(evanston_stations,"evanstonstations_plotting.csv",row.names = FALSE)
#mapping data
write.csv(evanstontrips,"evanstontrips_mapping.csv",row.names = FALSE)
write.csv(evanstonstations,"evanstonstations_mapping.csv",row.names=FALSE)
write.csv(stns,"divvystations_mapping.csv",row.names = FALSE)
rm(evanston_trips,evanston_stations,evanstontrips,evanstonstations,stns)

Divvy Trip Mapping

library(sp);library(dplyr);library(ggplot2);library(osmplotr);library(readr);library(tidyr)

First, we load the data tables generated from the data manipulation above. We could use the same variables, but saving and loading the csv bypasses the download time if we want to re-use the data for different plots.

divvystations <- read_csv('divvystations_mapping.csv')
evanstontrips <- read_csv('evanstontrips_mapping.csv')
evanstonstations <- read_csv('evanstonstations_mapping.csv')

Next, we set up a data frame called ggdf (to be used with ggplot2) which contains start coordinates (x1,y1) and end coordinates (x2,y2) for Divvy trip segments. The match function identifies the lon/lat of each trip start and end station in evanstontrips by matching with the station id in the divvystations data table. The right columns of ggdf contain the frequency of trips along each path based on the birthyear subset criteria as well as the total number of trips.

To facet the demographic categories in ggplot2, we use the gather function to create a long format data frame with a ‘Demographic’ column.

x1 <- divvystations$longitude[match(evanstontrips$start_station_id, divvystations$stn_id)]
y1 <- divvystations$latitude[match(evanstontrips$start_station_id, divvystations$stn_id)]
x2 <- divvystations$longitude[match(evanstontrips$end_station_id, divvystations$stn_id)]
y2 <- divvystations$latitude[match(evanstontrips$end_station_id, divvystations$stn_id)]

ggdf <- data.frame(x1=x1,x2=x2,y1=y1,y2=y2,ntrips=evanstontrips$numtrips,
                   before1990=evanstontrips$before1990,later1989=evanstontrips$later1989)

ggdflong <- gather(ggdf, demographic, ntrips, before1990:later1989)

#rename records to proper facet titles
ggdflong$demographic[ggdflong$demographic=='before1990'] <- 'Riders Born Before 1990'
ggdflong$demographic[ggdflong$demographic=='later1989'] <- 'Riders Born 1990 or Later'

Mapping Divvy Trips Within Evanston

Use osmplotr package functions (osm_basemap,add_osm_objects) to create a background street map of evanston using defined bounding box coordinates.

#determine bounding boxes from coordinates
evanstonbbox <- get_bbox(c(-87.66, 42.02, -87.71, 42.07))

#create basemap
mapevanston <- osm_basemap (bbox = evanstonbbox, bg = 'gray40')

#extract OSM street data and overlay on map
evanstonhighway <- extract_osm_objects (key = 'highway', bbox = evanstonbbox)
mapevanston <- add_osm_objects (mapevanston, evanstonhighway, col = 'gray60')

Construct ggplot map starting with background map of evanston and adding lines with geom_segment() between the start and end station in the origin-destination data. For this map, we subset the ggdf to require the start and end station to be within evanton. Using aesthetics, number of trips are mapped to the color and size of the line. Points and labels are added at Divvy station locations. This visualization is inspired by an example vignette from the bikeshare r package, but is heavily customized (5).

Map 1: Within Evanston Total Trips

evanstontotalplot <- mapevanston + 
  geom_segment(data = ggdf %>% 
                 filter(x1 %in% evanstonstations$longitude & 
                          x2 %in% evanstonstations$longitude),
               aes(x = x1, y = y1, xend = x2, yend = y2,col=ntrips),
               size=1.8,alpha=0.7,show.legend=TRUE) +
  scale_color_gradient(low='#3399FF',high='#FF0099',trans='log10') +
  geom_point(data=evanstonstations,aes(x=longitude,y=latitude),
             shape = 19, colour = "blue", size = 3, stroke = 3,show.legend = TRUE)+ 
  theme(legend.position="bottom",plot.title = element_text(hjust=0.5),
        plot.margin=unit(c(5.5,5.5,5.5,5.5),"points")) + 
  ggtitle('Total Evanston Divvy Trip Frequency Heatmap')+
  labs(color='Number of Divvy Trips') + scale_size(guide = "none") +
  geom_text(data=evanstonstations,size=2,color='white',
            aes(x=longitude,y=latitude,label=name)) +
  guides(color = guide_colorbar(title.position="top"))

The next faceted map compares the trip frequency among the two different age groups we filtered from the database.

Map 2: Within Evanston Trips by Birth Year of Rider

evanstonageplot <- mapevanston + 
  geom_segment(data = ggdflong  %>% 
                 filter(ntrips>0) %>%
                 filter(x1 %in% evanstonstations$longitude & x2 %in% 
                          evanstonstations$longitude),
               aes(x = x1, y = y1, xend = x2, yend = y2,col=ntrips),
               size=1.4,alpha=0.7,show.legend=TRUE) +
  scale_color_gradient(low='#3399FF', high='#FF0099',trans='log10') +
  geom_point(data=evanstonstations,aes(x=longitude,y=latitude),
             shape = 21, colour = "blue", fill = "lightblue",
             size = 1, stroke = 3,show.legend = TRUE)+ 
  theme(legend.position="right",plot.title = 
          element_text(hjust=0.5),plot.margin=unit(c(5.5,5.5,5.5,5.5),"points")) + 
  ggtitle('Evanston Divvy Trips By Age')  +
  labs(color='Number of \nDivvy Trips') + scale_size(guide = "none") + 
  facet_wrap(~demographic) + 
  guides(color = guide_colorbar(title.position="top", title.hjust = 0.5))

Now get basemap for Chicago to prepare for Chicago map, using bounding box from coordinates. Like the evanston basemap, this uses osmplotr functions to extract OpenStreetMap street data and overlap on the map in grey. To give spatial location of evanston, evanston streets are overlaid on top of existing basemap.

chibbox <- get_bbox(c(-87.75,41.84,-87.6,42.07))
mapchi <- osm_basemap (bbox = chibbox, bg = 'gray10')
chihighway <- extract_osm_objects (key = 'highway', bbox = chibbox)
mapchi <- add_osm_objects (mapchi, chihighway, col = 'gray20')
smallevanstonbbox <- get_bbox(c(-87.66, 42.03, -87.7, 42.07))
dat_B <- extract_osm_objects (key = 'highway', bbox = smallevanstonbbox)
mapchi<- add_osm_objects(mapchi,dat_B,col='grey40')

Mapping Divvy Trips To & From Evanston

The next map uses similar methods as the previous ones, except it only includes trips leaving Evanston to Chicago. I add arrows using the arrow = arrow() command to indicate directionality.

Map 3: Leaving Evanston Total Trips

leavingevanstonplot <- mapchi + 
  geom_segment(data=ggdf %>% filter(ntrips>0) 
               %>% filter(x1 %in% evanstonstations$longitude) %>% 
                 filter(!(x1 %in% evanstonstations$longitude & x2 %in% 
                            evanstonstations$longitude)),
               aes(x = x1, y = y1, xend = x2, yend = y2,col=ntrips),
               size=0.08,alpha=1,show.legend=TRUE,
               arrow = arrow(length = unit(0.25,"cm"))) + 
  scale_color_gradient(trans= "log10", low='#3399FF', high='#FF0099') +
  geom_point(data=divvystations %>% 
               filter(longitude %in% (ggdf$x1) | longitude %in% (ggdf$x2)),
             aes(x=longitude,y=latitude),
             shape = 21, colour = "blue", fill = "lightblue",
             size = 0.1,show.legend = TRUE)+ 
  theme(legend.position="right",plot.margin=unit(c(5.5,5.5,5.5,5.5),"points")) + 
  ggtitle('Divvy Trips Leaving Evanston')  +
  labs(color='Number of \nDivvy Trips') + scale_size(guide = "none")

Map 4: Entering Evanston Total Trips

The same method is used to map trips entering Evanston from Chicago.

enteringevanstonplot <- mapchi + 
  geom_segment(data=ggdf %>% filter(ntrips>0) %>% 
                 filter(x2 %in% evanstonstations$longitude) %>% 
                 filter(!(x1 %in% evanstonstations$longitude & x2 %in% 
                            evanstonstations$longitude)), 
               aes(x = x1, y = y1, xend = x2, yend = y2,col=ntrips),
               size=.08,alpha=1,show.legend=TRUE,
               arrow = arrow(length = unit(0.25,"cm"))) + 
  scale_color_gradient(trans= "log10", low='#3399FF', high='#FF0099') +
  geom_point(data=divvystations %>% 
               filter(longitude %in% (ggdf$x1) | longitude %in% (ggdf$x2)),
             aes(x=longitude,y=latitude),
             shape = 21, colour = "blue", fill = "lightblue",
             size = 0.1,show.legend = TRUE)+ 
  theme(legend.position="right",plot.margin=unit(c(5.5,5.5,5.5,5.5),"points")) + 
  ggtitle('Divvy Trips Entering Evanston')  +
  labs(color='Number of \nDivvy Trips') + scale_size(guide = "none")

Code to save all plots:

ggsave('evanstontotalplot.pdf',evanstontotalplot,dpi=400)
ggsave('evanstonageplot.pdf',evanstonageplot,dpi=400)
ggsave('leavingevanstonplot.pdf',leavingevanstonplot,dpi=400)
ggsave('enteringevanstonplot.pdf',enteringevanstonplot,dpi=400)

Divvy Trip Plotting

After mapping the frequency of trips between, to, and from Evanston stations, we now want to visualize our data on Divvy users that we prepared above. First we load packages that will enable us to use date-time scales (scales package) and create waffle plots (waffle plots).

library(scales);library(lubridate);library(waffle)

Then use the read.csv() command to load the plotting data that we prepared above. We filter to exclude outlier trips that last over 12 hours to avoid disproportionate impacts on mean trip duration for our plots.

To give more options for time variables to plot, we generate columns of Year, Month, and date formatted columns by day and by month using the as.Date() command. The last factor command ensures that months are plotted chronologically instead of alphabetically.

Divvy_df <- read_csv('evanstontrips_plotting.csv') %>% filter(duration < 43200)

Divvy_df$Year <- format(Divvy_df$starttime,"%Y")
Divvy_df$Month <- format(Divvy_df$starttime,"%b")
Divvy_df$StartDate = as.Date(Divvy_df$starttime, "%m/%d/%Y")
Divvy_df$MonthYear =  format(as.Date(Divvy_df$StartDate, format="%d-%m-%Y"),"%m/%Y")

Divvy_df$Month = factor(Divvy_df$Month, levels = month.abb)

Group 1: Analysis of total trips, all customer types

Plot 1.1: bar plot of number of total rides colored by user type - note the contrast between winter and summer non-member users. Plot 1.2: bar plot of number of trips taken by month - shows recent growth in ntrips over the summer. Plot 1.3: waffle plot showing proportion of trips taken by customers vs. subscribers in the month of January 2018. Plot 1.4: waffle plot showing proportion of trips taken by customers vs. subscribers in the month of July 2018.

For the waffle plots, the ‘parts’ variable defines the groups that are used to create the proportions that the waffle chart visualizes.

Plot1.1 <- ggplot(Divvy_df,aes(x=Month,group=usertype,fill=usertype)) + 
  geom_bar(stat='count',position='stack') + facet_wrap(~Year) + 
  ggtitle('Monthly Divvy Trips by User Type') + ylab('Number of Trips') + 
  labs(fill='User Type') + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Plot1.2 <- ggplot(Divvy_df,aes(x=Year,group=1)) + geom_bar(stat='count',fill="blue4") + 
  facet_wrap(~Month) + ggtitle('Number of Divvy Trips by Month, 2016-2018') + 
  ylab('Number of Trips')

parts1 = c(`Subscriber` = nrow(Divvy_df %>% 
                                filter(usertype=='Subscriber') %>% 
                                filter(Year==2018) %>% filter(Month=='Jan')),
          `Customer` = nrow(Divvy_df %>%                                                                
                              filter(usertype=='Customer') %>% 
                              filter(Year==2018) %>% filter(Month=='Jan')))
Plot1.3 <- waffle(49.5*parts1/nrow(Divvy_df %>% filter(Year==2017) %>% 
                                    filter(Month=='Jan')),rows = 4,
                  colors = c("#8dd3c7", "#fb8072", "white"),
                  title='User Type for January 2018 Divvy Trips')

parts2 = c(`Subscriber` = nrow(Divvy_df %>% 
                                filter(usertype=='Subscriber') %>% 
                                filter(Year==2018) %>% filter(Month=='Jul')),
          `Customer` = nrow(Divvy_df %>% 
                              filter(usertype=='Customer')%>% 
                              filter(Year==2018) %>% filter(Month=='Jul')))
Plot1.4 <- waffle(51*parts2/nrow(Divvy_df %>% filter(Year==2017) %>% 
                                  filter(Month=='Jul')),rows = 4,
                  colors = c("#8dd3c7", "#fb8072", "white"),
                  title='User Type for July 2018 Divvy Trips')

Group 2: Analysis of trips by gender. This information is only available for Divvy subscribers, and not single use customers.

Plot 2.1: Bar Plot of Monthly Count of Female and Male Riders

Plot2.1 <- ggplot(Divvy_df %>% drop_na(),aes(x=Month,group=gender,fill=gender)) + 
  geom_bar(stat='count',position='stack')+ facet_wrap(~Year) + 
  ggtitle('Monthly Divvy Trips by Gender') + ylab('Number of Trips') + 
  labs(fill='Gender') + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Group 3: Analysis of trips by birth year. This information is also only available for Divvy subscribers.

Plot 3.1: Box and whisker plot showing the average ride duration per month of different Evanston age groups.

#age vs. trip length in each month
Plot3.1 <- Divvy_df %>% drop_na() %>% mutate(age = 2019 - birthyear) %>% 
  mutate(agegroup = findInterval(age, c(10,20,30,40,50,60))) %>% 
  mutate(agegroup = recode_factor(agegroup,`1` = '<20 Years Old',`2` = '20s',
                                  `3`='30s',`4`='40s',`5`='50s',`6`='60s')) %>%  
  ggplot(aes(x=Month,y=duration)) + stat_boxplot(fill="orange", alpha=0.4) + 
  facet_wrap(~agegroup) + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  scale_y_log10() + ylab('Average Ride Duration') + ggtitle('Average Ride Duration Per Month by Age Group')

Results

This map shows the frequency of Divvy trips between different Evanston stations. The brightest pink lines are the most frequently traveled routes.


This second figure compares trip frequency among two demographics: riders born before 1990, or riders born 1990 and later. This comparison reveals that there is a high frequency of riders from the younger demographic riding on Divvy stations in the East section of Evanston near Lake Michigan, whereas the frequency of riders for riders born before 1990 seems to be more evenly distributed across the stations.


The arrows in the two maps above indicate the direction of travel, and each map shows trips entering or leaving Evanston. Like the first maps, the brightest pink arrows represent the most frequently traveled trips. There is a greater density of arrows, and brighter pink lines, for stations closer to the lake. This shows that many longer distance Divvy trips in and out of Evanston are traveling close to the lake and reaching destinations near the lakefront.


The plot above shows the number of Divvy trips every month, between 2016-2018. The graphs show that much of the growth in the number of Divvy trips has occured in the peak summer months of July and August, where there has been a steady increase in trips between 2016 and 2018.


The two bar plots above show the mounthly count of rides from 2016-2018, and the colors of the bars in the first plot are grouped by gender, where the colors for the second bar are grouped by user type. The graphs show that there are consistently more male users, but that the proportion of customers and subscribers varies by season, which we explore more below.


To further explore the patterns in the previous User Type bar plot, these waffle plots show the proportion of rides by each User Type in a winter month (January) and a summer month (July). These plots illustrate the vast difference between the customer base who are using Divvy bikes in different seasons.


This last plot shows the relationship between rider age and trip duration, and how it varies during different months of the year. The center of each box shows the median trip duration, and the box gives the interquartile range (IQR) of the data. Individual outliers are shown as black dots. The graph shows that for the two youngest age groups, <20 and 20-29 years old, there is a noticeable increase in ride duration in the summer months. On the other hand, average ride length for older groups (40s-60s) becomes much more uniform over different times of the year.

Conclusions

Within Evanston, it is clear that different age demographics are using Divvy bikes differently. Younger riders are riding a greater frequency of trips in the stations in the East part of Evanston. This makes sense considering the large student population living near Northwestern University, where several of these Evanston Divvy stations are located.

The maps also point to the general pattern that longer Divvy trips entering and leaving Evanston generally travel to and from stations near the lake in Chicago. This might be explained by the popularity of the Lakefront Bike Trail in Chicago.

Much of the increase in the number of Divvy trips is occuring in peak summer months, while changes in the number of trips during other seasons from 2016-2018 is more variable. Also, the user type of Divvy bikes is heavily related to season. The greater proportion of ‘customer’ Divvy Rides who are paying for single uses in the summer suggests that summer weather increases this type of recreational Divvy use for tourists and residents alike. The much higher proportion of Divvy subscribers riding through the winter shows that many subscribers probably use Divvy’s for more consistent forms of commuting or travel.

The intriguing finding from the last graph suggests that in colder months, there is a trend of older riders riding similar lengths of time on average, while younger riders increase their ride duration during warmer months.

Further Directions

A future analysis should consider the question of why certain trip paths are traveled more frequently. This might be partially explained by ease of travel, such as bike lanes, and it would be valuable to compare the locations of bike lanes with popular Divvy routes.

References and Resources

  1. Pratt, Gregory. “Divvy bikes are more than a toy for tourists.” Chicago Tribune. February 4, 2018. Accessed March 20, 2018. https://www.chicagotribune.com/news/columnists/ct-met-pratt-divvy-bike-column-20180126-story.html.

  2. Ricci, Miriam. “Bike Sharing: A Review of Evidence on Impacts and Processes of Implementation and Operation.” Research in Transportation Business & Management, Managing the Business of Cycling, 15 (June 1, 2015): 28–38. https://doi.org/10.1016/j.rtbm.2015.03.003

  3. “Divvy Data.” Divvy. Accessed March 19, 2019. https://www.divvybikes.com/system-data.

  4. Mark Padgham, Richard Ellison (2017). bikedata. Journal of Open Source Software, 2(20). URL https://doi.org/10.21105/joss.00471.

  5. ‘bikedata.’ The Comprehensive R Archive Network. October 5, 2018. Accessed March 20, 2019. https://cran.r-project.org/web/packages/bikedata/vignettes/bikedata.html#7_visualisation_of_bicycle_trips.