Shared Bike Analysis

This analysis was done by Xinyao Zhang from the University of Maryland, College Park. It analyzes and visualizes bike share data from the city of Los Angeles.

The data can be found here.

Data Preparation

1. Data Preview

First, let's take a look at the data:

Trip ID Duration Start Time End Time Starting Station ID Starting Station Latitude Starting Station Longitude Ending Station ID Ending Station Latitude Ending Station Longitude Bike ID Plan Duration Trip Route Category Passholder Type Starting Lat-Long Ending Lat-Long
0 1912818 180 2016-07-07T04:17:00 2016-07-07T04:20:00 3014.0 34.056610 -118.23721 3014.0 34.056610 -118.23721 6281.0 30.0 Round Trip Monthly Pass {'longitude': '-118.23721', 'latitude': '34.05... {'longitude': '-118.23721', 'latitude': '34.05...
1 1919661 1980 2016-07-07T06:00:00 2016-07-07T06:33:00 3014.0 34.056610 -118.23721 3014.0 34.056610 -118.23721 6281.0 30.0 Round Trip Monthly Pass {'longitude': '-118.23721', 'latitude': '34.05... {'longitude': '-118.23721', 'latitude': '34.05...
2 1933383 300 2016-07-07T10:32:00 2016-07-07T10:37:00 3016.0 34.052898 -118.24156 3016.0 34.052898 -118.24156 5861.0 365.0 Round Trip Flex Pass {'longitude': '-118.24156', 'latitude': '34.05... {'longitude': '-118.24156', 'latitude': '34.05...
3 1944197 10860 2016-07-07T10:37:00 2016-07-07T13:38:00 3016.0 34.052898 -118.24156 3016.0 34.052898 -118.24156 5861.0 365.0 Round Trip Flex Pass {'longitude': '-118.24156', 'latitude': '34.05... {'longitude': '-118.24156', 'latitude': '34.05...
4 1940317 420 2016-07-07T12:51:00 2016-07-07T12:58:00 3032.0 34.049889 -118.25588 3032.0 34.049889 -118.25588 6674.0 0.0 Round Trip Walk-up {'longitude': '-118.25588', 'latitude': '34.04... {'longitude': '-118.25588', 'latitude': '34.04...

Four columns describe where a trip starts: Starting Station ID, Starting Station Latitude, Starting Station Longitude, and Starting Lat-Long. Because they carry overlapping geographic information, one can be used to fill in another when some values are missing. The same holds for the four ending-station columns.
Next, we check that each trip is unique and count the missing values in each column.

Trip ID                           0
Duration                          0
Start Time                        0
End Time                          0
Starting Station ID              19
Starting Station Latitude        48
Starting Station Longitude       48
Ending Station ID                96
Ending Station Latitude        1051
Ending Station Longitude       1051
Bike ID                          10
Plan Duration                   766
Trip Route Category               0
Passholder Type                   0
Starting Lat-Long             33805
Ending Lat-Long                1051
dtype: int64

The check shows that each trip is unique. However, several columns contain missing values.
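A check like the one above could be sketched as follows. This is a minimal example on a tiny stand-in frame, not the original notebook code; the variable names `trips`, `all_unique`, and `missing_per_column` are assumptions, and only the column names come from the dataset.

```python
import pandas as pd

# Tiny stand-in frame; the real data would be loaded from the trip file.
trips = pd.DataFrame({
    "Trip ID": [1912818, 1919661, 1933383],
    "Starting Station ID": [3014.0, None, 3016.0],
    "Bike ID": [6281.0, 6281.0, None],
})

# True when no Trip ID appears more than once.
all_unique = trips["Trip ID"].is_unique

# Number of missing values in each column.
missing_per_column = trips.isnull().sum()

print(all_unique)
print(missing_per_column)
```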

2. Filling the Missing Values

We look at each trip that is missing one or more attributes. Where the missing value can be inferred from other trips, we fill it in; where it cannot, we delete the trip.
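One way to infer a missing value from other trips is to look up a station's coordinates by its ID. The sketch below assumes this approach; the source does not show the actual filling rules, and the names `station_coords` and `cleaned` are hypothetical.

```python
import pandas as pd

# Stand-in rows: the second row lacks coordinates, the last lacks everything.
trips = pd.DataFrame({
    "Starting Station ID": [3014.0, 3014.0, 3016.0, None],
    "Starting Station Latitude": [34.056610, None, 34.052898, None],
    "Starting Station Longitude": [-118.23721, None, -118.24156, None],
})

coord_cols = ["Starting Station Latitude", "Starting Station Longitude"]

# Build a station -> coordinates lookup from rows where coordinates are known.
station_coords = (
    trips.dropna(subset=coord_cols)
         .groupby("Starting Station ID")[coord_cols]
         .first()
)

# Fill missing coordinates from the lookup, matching on station ID.
for col in coord_cols:
    inferred = trips["Starting Station ID"].map(station_coords[col])
    trips[col] = trips[col].fillna(inferred)

# Rows that still cannot be inferred are dropped.
cleaned = trips.dropna(subset=["Starting Station ID"] + coord_cols)
print(cleaned)
```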

Now the data is clean and can be used!

Which start/stop stations are most popular?

The most popular starting stations:

ID:     Number of bikes:
3069    5095
3030    5038
3005    4851
3064    4617
3031    4594
Name: Starting Station ID, dtype: int64

The most popular ending stations:

ID:     Number of bikes:
3005    6262
3031    5517
3014    5382
3042    5293
3069    5069
Name: Ending Station ID, dtype: int64
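Rankings like the two above can be produced by counting how often each station ID appears. A minimal sketch on stand-in data (the frame contents and the names `top_start` and `top_end` are illustrative assumptions):

```python
import pandas as pd

# Stand-in trips; only the two station columns matter here.
trips = pd.DataFrame({
    "Starting Station ID": [3069, 3030, 3069, 3005, 3069, 3030],
    "Ending Station ID":   [3005, 3005, 3031, 3005, 3014, 3031],
})

# value_counts sorts by frequency, so head(5) gives the five busiest stations.
top_start = trips["Starting Station ID"].value_counts().head(5)
top_end = trips["Ending Station ID"].value_counts().head(5)

print(top_start)
print(top_end)
```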

What is the average distance traveled?

Let's count the one-way and round trips.

One Way       118554
Round Trip     12782
Name: Trip Route Category, dtype: int64

We create a new column 'Distance' based on the distance between starting and ending stations and create a new column 'Speed' based on the distance and duration.

Trip ID Duration End Time Starting Station ID Starting Station Latitude Starting Station Longitude Ending Station ID Ending Station Latitude Ending Station Longitude Bike ID Plan Duration Trip Route Category Passholder Type distance Distance Speed
Start Time
2016-07-07 12:51:00 1944075 780 2016-07-07 13:04:00 3021 34.045609 -118.23703 3054 34.039219 -118.23649 6717 30 One Way Monthly Pass 0.712231 0.712231 0.000913
2016-07-07 12:54:00 1944073 600 2016-07-07 13:04:00 3022 34.046070 -118.23309 3014 34.056610 -118.23721 5721 30 One Way Monthly Pass 1.231928 1.231928 0.002053
2016-07-07 12:59:00 1944067 600 2016-07-07 13:09:00 3076 34.040600 -118.25384 3005 34.048550 -118.25905 5957 365 One Way Flex Pass 1.005915 1.005915 0.001677
2016-07-07 13:01:00 1944063 960 2016-07-07 13:17:00 3031 34.044701 -118.25244 3078 34.064281 -118.23894 6351 30 One Way Monthly Pass 2.507469 2.507469 0.002612
2016-07-07 13:02:00 1944061 960 2016-07-07 13:18:00 3031 34.044701 -118.25244 3047 34.039982 -118.26640 6200 365 One Way Flex Pass 1.389164 1.389164 0.001447
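The source does not say which distance formula was used. The sketch below assumes the haversine (great-circle) formula with a mean Earth radius of 6371 km; under that assumption the first sample row comes out close to the listed 0.712231, which suggests Distance is in kilometers and Speed in kilometers per second. The function name `haversine_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# First sample row: station 3021 to station 3054, Duration 780 seconds.
distance = haversine_km(34.045609, -118.23703, 34.039219, -118.23649)
speed = distance / 780  # distance per second of trip duration

print(distance, speed)
```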

However, this way of estimating speed is unreliable, because people can leave the bike parked for some time, even overnight, instead of riding the whole time. So we keep only the trips where duration<1200 (Duration is recorded in seconds, so under 20 minutes), which makes it likely that most of the time was actually spent riding.
We calculate the average speed of one-way bikers: 0.005618115731924246
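The filtering step can be sketched as below. The rows are stand-ins (the third Distance value is a placeholder, not real data), and `short_one_way` and `average_speed` are assumed names.

```python
import pandas as pd

trips = pd.DataFrame({
    "Duration": [780, 600, 10860, 420],
    "Trip Route Category": ["One Way", "One Way", "One Way", "Round Trip"],
    "Distance": [0.712231, 1.231928, 1.0, 0.0],  # third value is a placeholder
})
trips["Speed"] = trips["Distance"] / trips["Duration"]

# Keep one-way trips shorter than 1200 seconds, then average their speed.
is_one_way = trips["Trip Route Category"] == "One Way"
is_short = trips["Duration"] < 1200
short_one_way = trips[is_one_way & is_short]
average_speed = short_one_way["Speed"].mean()

print(average_speed)
```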

We also calculate the total travel distance of one-way bikers: 642956.2187142154

The distance of round trips cannot be estimated accurately, because riders may stop somewhere to shop or visit. The current dataset does not tell us how much time they actually spent riding.

Besides, the number of round trips is much smaller than the number of one-way trips. So we use the average distance of one-way trips to estimate the average distance traveled: the total above divided by the 118554 one-way trips.

So the average distance: 5.423319489129135

How many riders include bike sharing as a regular part of their commute?

Monthly Pass    80823
Walk-up         40743
Flex Pass        9457
Staff Annual      313
Name: Passholder Type, dtype: int64


Here we consider people who use a Monthly Pass or a Flex Pass as regular users. Note that the dataset counts trips, not distinct riders.
So, Monthly Pass: 80823 + Flex Pass: 9457 = 90280 trips made by regular users.
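The arithmetic can be reproduced from the counts listed above. In this sketch the choice of "regular" pass types follows the text, while the names `regular_types`, `regular_trips`, and `regular_share` are assumptions.

```python
import pandas as pd

# Trip counts per pass type, copied from the output above.
pass_counts = pd.Series({
    "Monthly Pass": 80823,
    "Walk-up": 40743,
    "Flex Pass": 9457,
    "Staff Annual": 313,
})

regular_types = ["Monthly Pass", "Flex Pass"]
regular_trips = int(pass_counts[regular_types].sum())

# Share of all trips that were made with a regular pass type.
regular_share = regular_trips / pass_counts.sum()

print(regular_trips, regular_share)
```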

Data Visuals: Display or graph 3 metrics or trends from the data set that are interesting to you.

Here we draw a pie chart that shows the proportion of each passholder type.
This chart can also be combined with time series or location data.

Monthly Pass    80823
Walk-up         40743
Flex Pass        9457
Staff Annual      313
Name: Passholder Type, dtype: int64



How does ridership change with seasons? Types of passes used, trip duration, etc

Trip duration daily pattern

The daily average duration shows no clear pattern over time. What stands out is that the duration is extremely high on three days. There might have been large events on those days that affected traffic and led people to choose bikes instead of cars.

On a monthly scale there is a clear trend: the average duration went up in August and peaked in September, then decreased from October through January, reaching its lowest point in January. After that it rose again and stayed at a moderate level in March. This may be because the weather in September and October is pleasant and conditions are good for cycling, while in December and January it is colder outside and people choose to spend less time on the road.
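A monthly average like the one described can be computed by grouping trips on the month of their start time. The sketch below uses a few stand-in rows; `monthly_mean` is an assumed name.

```python
import pandas as pd

trips = pd.DataFrame({
    "Start Time": ["2016-07-07T04:17:00", "2016-07-07T06:00:00",
                   "2016-08-01T10:32:00", "2016-08-15T12:51:00"],
    "Duration": [180, 1980, 300, 420],
})
trips["Start Time"] = pd.to_datetime(trips["Start Time"])

# Group by calendar month and average the duration (in seconds).
month = trips["Start Time"].dt.to_period("M")
monthly_mean = trips.groupby(month)["Duration"].mean()

print(monthly_mean)
```

Grouping on `dt.date` instead of the monthly period would give the daily averages discussed earlier.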

Types of passes used

Monthly Pass

Flex Pass

Walk-up

The number of Flex Pass trips stayed relatively flat the whole time.
Monthly Pass and Walk-up trips show a pattern: when one went down, the other went up.

Monthly Pass

Walk-up

There is a weekly pattern in the use of Monthly Pass and Walk-up!

From the plot above we can see that the number of trips drops and rises periodically. On weekdays, the number of Monthly Pass trips went up and stayed high. At the weekend, it decreased quickly and reached its lowest point on Sunday. The number of Walk-up trips behaved in the opposite way: it went up at weekends and dropped on weekdays.

A likely reason is that people who use bike sharing as a regular part of their commute, such as Monthly Pass users, ride to work or school more often on weekdays. On the other hand, people who don't regularly use shared bikes, such as Walk-up users, are more likely to ride during weekends and holidays, when they go out travelling or sightseeing.
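The weekday/weekend split can be checked numerically by tabulating pass type against day type. A minimal sketch on stand-in rows (the `Day Type` column and the name `by_day_type` are assumptions):

```python
import pandas as pd

trips = pd.DataFrame({
    # 2016-07-07 is a Thursday, so these rows run Thursday through Monday.
    "Start Time": pd.to_datetime([
        "2016-07-07 08:00", "2016-07-08 08:30", "2016-07-09 11:00",
        "2016-07-10 14:00", "2016-07-11 09:00",
    ]),
    "Passholder Type": ["Monthly Pass", "Monthly Pass", "Walk-up",
                        "Walk-up", "Monthly Pass"],
})

# dayofweek is 0 for Monday through 6 for Sunday; 5 and 6 are the weekend.
is_weekend = trips["Start Time"].dt.dayofweek >= 5
trips["Day Type"] = is_weekend.map({True: "Weekend", False: "Weekday"})

# Trips per pass type, split by weekday and weekend.
by_day_type = pd.crosstab(trips["Passholder Type"], trips["Day Type"])

print(by_day_type)
```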

Is there a net change of bikes over the course of a day? If so, when and where should bikes be transported in order to make sure bikes match travel patterns?

There will surely be a net change at each station over the course of a day. Because there are so many days and stations to analyze, we think it is more practical to make monthly recommendations about transporting bikes, based on the net change at each station in each month.

Let's draw a heatmap for the bikes net change of each station monthly.

From the heatmap, we can find the max and min values for each month, and then move bikes from the station with the max value (the largest net inflow of bikes) to the station with the min value (the largest net outflow). For example, in 2016-07, the number of bikes at station 3042 increased by 106 and the number of bikes at station 3068 decreased by 153, so we would recommend that at the end of July, bikes be transported from 3042 to 3068.
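The table behind such a heatmap can be built as arrivals minus departures per station and month. This is a sketch under that assumed definition of net change, on stand-in rows; the names `departures`, `arrivals`, and `net` are hypothetical.

```python
import pandas as pd

trips = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2016-07-07 04:17", "2016-07-07 06:00",
        "2016-07-08 10:32", "2016-08-01 09:00",
    ]),
    "Starting Station ID": [3068, 3068, 3042, 3042],
    "Ending Station ID":   [3042, 3042, 3042, 3068],
})
trips["Month"] = trips["Start Time"].dt.strftime("%Y-%m")

# Trips leaving and arriving at each station, per month.
departures = pd.crosstab(trips["Starting Station ID"], trips["Month"])
arrivals = pd.crosstab(trips["Ending Station ID"], trips["Month"])

# Positive values mean bikes accumulate; negative values mean bikes drain away.
# fill_value=0 covers stations that appear in only one of the two tables.
net = arrivals.sub(departures, fill_value=0)

# For one month, move bikes from the largest gainer to the largest loser.
source_station = net["2016-07"].idxmax()
target_station = net["2016-07"].idxmin()

print(net)
print(source_station, target_station)
```

The `net` table has one row per station and one column per month, which is the shape a heatmap plotting function typically expects.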

Over the long run, we find that the number of bikes at some stations kept increasing or decreasing at a relatively high rate.

Increasing stations: 3005, 3014, 3022, 3023, 3031, 3032, 3042, 3063, 3082
Decreasing stations: 3007, 3024, 3027, 3028, 3029, 3030, 3052, 3055, 3068

Based on this analysis, we recommend transporting bikes monthly from the increasing stations to the decreasing stations. The specific transport routes should be decided by the distance between stations and the exact size of each change.

What is the breakdown of Trip Route Category-Passholder type combinations? What might make a particular combination more popular?

Trip Route Category Passholder Type Number of trips
0 One Way Flex Pass 8974
1 One Way Monthly Pass 77054
2 One Way Staff Annual 230
3 One Way Walk-up 32296
4 Round Trip Flex Pass 483
5 Round Trip Monthly Pass 3769
6 Round Trip Staff Annual 83
7 Round Trip Walk-up 8447

Trip Route Category One Way Round Trip
Passholder Type
Flex Pass 8974 483
Monthly Pass 77054 3769
Staff Annual 230 83
Walk-up 32296 8447
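Both layouts above, the long list and the cross-tabulated table, can be derived from the two categorical columns. A minimal sketch on stand-in rows (`combo_counts`, `pivot`, and `round_trip_share` are assumed names):

```python
import pandas as pd

trips = pd.DataFrame({
    "Trip Route Category": ["One Way", "One Way", "Round Trip",
                            "One Way", "Round Trip"],
    "Passholder Type": ["Monthly Pass", "Monthly Pass", "Walk-up",
                        "Walk-up", "Walk-up"],
})

# Long format: one row per observed combination.
combo_counts = (
    trips.groupby(["Trip Route Category", "Passholder Type"])
         .size()
         .reset_index(name="Number of trips")
)

# Wide format: pass types as rows, route categories as columns.
pivot = pd.crosstab(trips["Passholder Type"], trips["Trip Route Category"])

# Fraction of each pass type's trips that are round trips.
round_trip_share = pivot["Round Trip"] / pivot.sum(axis=1)

print(combo_counts)
print(pivot)
print(round_trip_share)
```

Comparing shares rather than raw counts helps when the groups differ greatly in size, as they do here.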


From the figure we can see that one-way trips are more popular than round trips for every passholder type.

The people who most often ride one-way trips are Monthly Pass members. They use bikes as a regular way to commute. They probably ride one way to work or school, stay there for a long time, and then ride another one-way trip back home.

Round trips are most common, in absolute numbers, among walk-up users: 8447 trips, about 21% of all walk-up trips, compared with about 5% of Monthly Pass trips. This may be because walk-up users tend to use bikes only occasionally, such as for a short trip to pick up or buy something, or to go sightseeing. In these situations, they are more likely to come back to the same station to return the bike. Another reason may be that they are not familiar with the station locations, so going back to where they got the bike is the simplest way to return it.