This analysis was done by Xinyao Zhang of the University of Maryland, College Park. It analyzes and visualizes bike share data from the city of Los Angeles.
First, let's take a look at the data:
|  | Trip ID | Duration | Start Time | End Time | Starting Station ID | Starting Station Latitude | Starting Station Longitude | Ending Station ID | Ending Station Latitude | Ending Station Longitude | Bike ID | Plan Duration | Trip Route Category | Passholder Type | Starting Lat-Long | Ending Lat-Long |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1912818 | 180 | 2016-07-07T04:17:00 | 2016-07-07T04:20:00 | 3014.0 | 34.056610 | -118.23721 | 3014.0 | 34.056610 | -118.23721 | 6281.0 | 30.0 | Round Trip | Monthly Pass | {'longitude': '-118.23721', 'latitude': '34.05... | {'longitude': '-118.23721', 'latitude': '34.05... |
1 | 1919661 | 1980 | 2016-07-07T06:00:00 | 2016-07-07T06:33:00 | 3014.0 | 34.056610 | -118.23721 | 3014.0 | 34.056610 | -118.23721 | 6281.0 | 30.0 | Round Trip | Monthly Pass | {'longitude': '-118.23721', 'latitude': '34.05... | {'longitude': '-118.23721', 'latitude': '34.05... |
2 | 1933383 | 300 | 2016-07-07T10:32:00 | 2016-07-07T10:37:00 | 3016.0 | 34.052898 | -118.24156 | 3016.0 | 34.052898 | -118.24156 | 5861.0 | 365.0 | Round Trip | Flex Pass | {'longitude': '-118.24156', 'latitude': '34.05... | {'longitude': '-118.24156', 'latitude': '34.05... |
3 | 1944197 | 10860 | 2016-07-07T10:37:00 | 2016-07-07T13:38:00 | 3016.0 | 34.052898 | -118.24156 | 3016.0 | 34.052898 | -118.24156 | 5861.0 | 365.0 | Round Trip | Flex Pass | {'longitude': '-118.24156', 'latitude': '34.05... | {'longitude': '-118.24156', 'latitude': '34.05... |
4 | 1940317 | 420 | 2016-07-07T12:51:00 | 2016-07-07T12:58:00 | 3032.0 | 34.049889 | -118.25588 | 3032.0 | 34.049889 | -118.25588 | 6674.0 | 0.0 | Round Trip | Walk-up | {'longitude': '-118.25588', 'latitude': '34.04... | {'longitude': '-118.25588', 'latitude': '34.04... |
We can see that four columns describe where a trip starts: the starting station ID, the starting station latitude, the starting station longitude, and the starting Lat-Long.
All four carry the same geographic information, so they can back each other up when some values are missing.
Next, we check that each trip is unique and count how many values are missing from the dataset.
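The two checks can be sketched with pandas as follows. This is a minimal illustration on a hypothetical four-row table (the variable name `trips` and the missing values are made up for the example), not the exact code used in the analysis.

```python
import pandas as pd

# Hypothetical miniature of the trips table; the gaps are invented for illustration.
trips = pd.DataFrame({
    "Trip ID": [1912818, 1919661, 1933383, 1944197],
    "Starting Station ID": [3014.0, 3014.0, None, 3016.0],
    "Starting Station Latitude": [34.056610, None, 34.052898, 34.052898],
})

# Every trip should appear exactly once.
ids_unique = trips["Trip ID"].is_unique

# Number of missing values in each column.
missing_per_column = trips.isna().sum()

print(ids_unique)
print(missing_per_column)
```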
For each trip that is missing one or more attributes, we either fill in the missing value when it can be inferred from other trips, or delete the trip when it cannot.
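One way to do this kind of fill is to look up a station's coordinates from other trips that use the same station. The sketch below shows the idea for the starting latitude only, on hypothetical rows; the exact rules applied in the analysis may differ.

```python
import pandas as pd

# Hypothetical rows: trip 2 lacks a latitude, trip 4 lacks both station and latitude.
trips = pd.DataFrame({
    "Trip ID": [1, 2, 3, 4],
    "Starting Station ID": [3014.0, 3014.0, 3016.0, None],
    "Starting Station Latitude": [34.056610, None, 34.052898, None],
})

# Known latitude per station, taken from trips where both values are present.
known_lat = (
    trips.dropna(subset=["Starting Station ID", "Starting Station Latitude"])
         .groupby("Starting Station ID")["Starting Station Latitude"]
         .first()
)

# Fill gaps by looking the station up in the known values.
trips["Starting Station Latitude"] = trips["Starting Station Latitude"].fillna(
    trips["Starting Station ID"].map(known_lat)
)

# Rows that still have gaps cannot be inferred, so we drop them.
clean = trips.dropna().reset_index(drop=True)
print(clean)
```

Here trip 2 is repaired from trip 1, while trip 4 is dropped because nothing identifies its station.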
Now the data is clean and can be used!
Let's count the number of one-way and round trips.
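A count like this is a one-liner in pandas. The rows below are hypothetical, and the label "One Way" is an assumption, since the sample above only shows "Round Trip".

```python
import pandas as pd

# Hypothetical values; "One Way" is an assumed label for non-round trips.
trips = pd.DataFrame({
    "Trip Route Category": ["Round Trip", "One Way", "One Way", "Round Trip", "One Way"],
})

# Number of trips in each route category.
route_counts = trips["Trip Route Category"].value_counts()
print(route_counts)
```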
We create a new column 'Distance' from the distance between the starting and ending stations, and a new column 'Speed' from the distance and the duration.
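As a sketch of how the two columns could be computed, the example below uses the haversine (great-circle) distance in kilometres and a speed in km/h. Both the distance formula and the units are assumptions for illustration. Duration is treated as seconds, which matches the sample rows (180 for a three-minute trip).

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Hypothetical trips: a round trip and a one-way trip between two stations.
trips = pd.DataFrame({
    "Duration": [180, 600],  # seconds
    "Starting Station Latitude": [34.056610, 34.056610],
    "Starting Station Longitude": [-118.23721, -118.23721],
    "Ending Station Latitude": [34.056610, 34.049889],
    "Ending Station Longitude": [-118.23721, -118.25588],
})

trips["Distance"] = haversine_km(
    trips["Starting Station Latitude"], trips["Starting Station Longitude"],
    trips["Ending Station Latitude"], trips["Ending Station Longitude"],
)
# Speed in km/h: distance divided by duration in hours.
trips["Speed"] = trips["Distance"] / (trips["Duration"] / 3600)
print(trips[["Distance", "Speed"]])
```

Note that a round trip starts and ends at the same station, so its station-to-station distance and speed come out as zero under this definition.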
Here we can draw a pie chart that shows the proportion of each passholder type.
This chart can also be combined with time series or location data.
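The numbers behind such a pie chart are just the share of trips per passholder type. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows for illustration.
trips = pd.DataFrame({
    "Passholder Type": ["Monthly Pass", "Monthly Pass", "Flex Pass", "Walk-up"],
})

# Fraction of trips made under each passholder type.
share = trips["Passholder Type"].value_counts(normalize=True)
print(share)
```

With matplotlib installed, `share.plot.pie()` would draw the chart from this series.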
We can't find a clear pattern in how the daily average duration changes over time. What stands out, though, is that there are three days on which the duration is extremely high. There might have been some large events on those days that affected traffic, so people tended to choose bikes instead of cars.
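The daily averages, and the days that stand out, can be computed along these lines. The rows are hypothetical, and the cutoff of three times the median daily average is an arbitrary assumption chosen for the example, not a rule from the analysis.

```python
import pandas as pd

# Hypothetical trips over three days; the last day has very long rides.
trips = pd.DataFrame({
    "Start Time": [
        "2016-07-07T04:17:00", "2016-07-07T06:00:00",
        "2016-07-08T10:32:00",
        "2016-07-09T10:37:00", "2016-07-09T12:51:00",
    ],
    "Duration": [180, 420, 300, 10860, 9000],  # seconds
})
trips["Start Time"] = pd.to_datetime(trips["Start Time"])

# Average trip duration for each calendar day.
daily_avg = trips.groupby(trips["Start Time"].dt.date)["Duration"].mean()

# Flag days far above the typical level (assumed cutoff: 3x the median).
unusual = daily_avg[daily_avg > 3 * daily_avg.median()]
print(unusual)
```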
[Figure: time series by passholder type; legend: Monthly Pass, Flex Pass, Walk-up]
The Flex Pass type stayed relatively flat the whole time.
Monthly Pass and Walk-up, however, show some patterns! It seems that when one went down, the other went up.
[Figure: time series for Monthly Pass and Walk-up]
There is a weekly pattern in the use of Monthly Pass and Walk-up!
From the plot above we can see that the number of passes used dropped and rose periodically. On weekdays, the number of Monthly Pass trips went up and stayed high. At the weekend, it decreased quickly and reached its lowest point on Sunday. The number of Walk-up trips, however, behaved in the opposite way: it went up at weekends and dropped on weekdays.
A likely reason for this pattern is that people who use bike sharing as a regular part of their commute, such as Monthly Pass users, ride to work or school more often on weekdays. On the other hand, people who don't regularly use shared bikes, such as Walk-up users, are more likely to ride at weekends and on holidays, when they go out travelling or sightseeing.
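A simple way to check a weekly pattern like this is to count trips per day of the week for each passholder type. The sketch below uses hypothetical rows that were made up to mimic the pattern described above.

```python
import pandas as pd

# Hypothetical trips on a Thursday (2016-07-07) and a Sunday (2016-07-10).
trips = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2016-07-07T08:00:00", "2016-07-07T17:30:00", "2016-07-07T12:00:00",
        "2016-07-10T11:00:00", "2016-07-10T14:00:00", "2016-07-10T09:00:00",
    ]),
    "Passholder Type": [
        "Monthly Pass", "Monthly Pass", "Walk-up",
        "Walk-up", "Walk-up", "Monthly Pass",
    ],
})

# Name of the weekday on which each trip started.
trips["Weekday"] = trips["Start Time"].dt.day_name()

# Trip counts: one row per weekday, one column per passholder type.
by_weekday = pd.crosstab(trips["Weekday"], trips["Passholder Type"])
print(by_weekday)
```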
There will surely be a net change at each station over the course of a day. Because there are so many days and stations to analyze, we think it is more practical to make monthly recommendations for moving bikes, based on each station's net change for the month.
Let's draw a heatmap of the monthly net change in bikes at each station.
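The table behind such a heatmap can be built by counting each trip as one bike leaving its starting station and one bike arriving at its ending station. This is a sketch on hypothetical trips; the helper column names `month`, `station` and `change` are made up for the example.

```python
import pandas as pd

# Hypothetical trips between two stations.
trips = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2016-07-07T08:00:00", "2016-07-08T09:00:00",
        "2016-07-20T18:00:00", "2016-08-02T08:30:00",
    ]),
    "End Time": pd.to_datetime([
        "2016-07-07T08:20:00", "2016-07-08T09:15:00",
        "2016-07-20T18:25:00", "2016-08-02T08:50:00",
    ]),
    "Starting Station ID": [3068, 3068, 3042, 3042],
    "Ending Station ID": [3042, 3042, 3068, 3068],
})

# Each departure removes one bike from the starting station...
departures = pd.DataFrame({
    "month": trips["Start Time"].dt.strftime("%Y-%m"),
    "station": trips["Starting Station ID"],
    "change": -1,
})
# ...and each arrival adds one bike to the ending station.
arrivals = pd.DataFrame({
    "month": trips["End Time"].dt.strftime("%Y-%m"),
    "station": trips["Ending Station ID"],
    "change": 1,
})

# Rows are stations, columns are months, values are net bikes gained.
net_change = pd.concat([departures, arrivals]).pivot_table(
    index="station", columns="month", values="change",
    aggfunc="sum", fill_value=0,
)
print(net_change)
```

A positive value means more bikes arrived than left during that month, and a negative value means the opposite.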
From the heatmap, we can find the max and min values for each month, and then move bikes from the station with the max value (the one with the most net bikes coming in) to the station with the min value (the one with the most net bikes going out).
For example, in 2016-07, the number of bikes at station 3042 increased by 106 and the number of bikes at station 3068 decreased by 153, so we would recommend that at the end of July, bikes be transported from 3042 to 3068.
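Picking the source and destination station for each month can be sketched as below. The values for stations 3042 and 3068 in 2016-07 are the ones quoted above; the third station and its value are hypothetical, added only so that the max and min are not trivial.

```python
import pandas as pd

# Net change per station for one month; station 3014's value is made up.
net_change = pd.DataFrame(
    {"2016-07": [106, -153, 12]},
    index=pd.Index([3042, 3068, 3014], name="station"),
)

# For each month: the station that gained the most and the one that lost the most.
plan = pd.DataFrame({
    "move_from": net_change.idxmax(),
    "move_to": net_change.idxmin(),
})
print(plan)
```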
For the long-run pattern, we find that the number of bikes at some stations kept increasing or decreasing at a relatively high level.
Increasing stations: 3005, 3014, 3022, 3023, 3031, 3032, 3042, 3063, 3082
Decreasing stations: 3007, 3024, 3027, 3028, 3029, 3030, 3052, 3055, 3068
Based on our analysis, we recommend transporting bikes monthly from the increasing stations to the decreasing stations.
The specific transportation routes will be decided by the distances between stations and the exact net changes.
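One possible way to pick out such long-run stations is to look for stations whose net change has the same sign in every month. This criterion is an assumption for illustration and checks only the sign, not the size of the change. In the sketch below, only the 2016-07 values for stations 3042 and 3068 come from the text; everything else is hypothetical.

```python
import pandas as pd

# Monthly net change per station; all values except 2016-07 for 3042/3068 are made up.
net_change = pd.DataFrame(
    {
        "2016-07": [106, -153, 5],
        "2016-08": [80, -120, -7],
        "2016-09": [95, -140, 3],
    },
    index=pd.Index([3042, 3068, 3016], name="station"),
)

# Stations that gained bikes in every month, and stations that lost bikes in every month.
increasing = net_change.index[(net_change > 0).all(axis=1)].tolist()
decreasing = net_change.index[(net_change < 0).all(axis=1)].tolist()
print(increasing, decreasing)
```

A minimum size for the monthly change could be added to the two conditions to keep only stations that change "at a relatively high level".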