vignettes/traveltimeCLT.Rmd
Mobility is vital to human activities: it is an integral component of our economic and trade networks, social interactions, political ties, and quality of life. Large-scale trip-level data, with temporal and spatial coverage, enable us to better diagnose transportation problems and develop efficient solutions to, for instance, increased congestion levels. Such data are increasingly available from global positioning systems (GPS) on mobile phones and other devices. At the heart of many such transportation problems is the estimation of travel time between locations.
The `traveltimeCLT` package provides a novel method, introduced in Elmasri et al. (2020), for generating prediction intervals for travel time along a given route. By accounting for temporal dependency in vehicle speeds, the parameters of the Gaussian asymptotic distribution of travel time normalized by distance are estimated with less (or negligible) empirical bias than with alternative methods.
trips
This package includes a small data set (`trips`) that aggregates map-matched, anonymized mobile phone GPS data collected in Quebec City in 2014 using the Mon Trajet smartphone application developed by Brisk Synergies Inc. The precise duration of the time period is kept confidential.
`trips` is used to explore the methods offered in this package. Data similar to `trips` can be generated from other sources of GPS data; however, a preprocessing stage is necessary to map-match and aggregate the information up to the road segment (link) level. Such preprocessing has already been performed on `trips`.
`traveltimeCLT` does not perform this preprocessing stage, and the software cannot make use of raw GPS data.
library(traveltimeCLT)
data(trips)
head(trips)
#> tripID linkID timeBin speed duration_secs distance_meters
#> 1 2700 10469 Weekday 5.431914 13.000000 70.61488
#> 2 2700 10444 Weekday 9.219505 18.927792 174.50487
#> 3 2700 10460 Weekday 9.052796 8.589937 77.76295
#> 4 2700 10462 Weekday 6.850282 14.619859 100.15015
#> 5 2700 10512 Weekday 6.075674 5.071986 30.81574
#> 6 2700 5890 Weekday 10.771731 31.585355 340.22893
#> entry_time
#> 1 2014-04-28 11:07:27
#> 2 2014-04-28 11:07:41
#> 3 2014-04-28 11:07:58
#> 4 2014-04-28 11:08:07
#> 5 2014-04-28 11:08:21
#> 6 2014-04-28 11:08:26
Travel data is organized around the notions of trips and links. A link is a traversable road segment with well-defined beginning and end points. A vehicle performs a trip when it travels from a start point to an end point through a sequence of links. Thus trips can be considered as ordered sequences of links. `trips` includes data for a collection of trips.
Field `tripID` contains each trip’s ID, whereas field `linkID` contains the ID of each link making up a trip. Both fields need to be numeric. It is assumed that, in the data set, all rows of a given trip are grouped together and that the links of a given trip appear in the order in which they are traversed (no verification is performed to that effect). A given link corresponds to some physical entity such as a road segment, a portion of a road segment, or any combination of those. Hence, it is expected that links are used in more than one trip.
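Since the grouping assumption is not verified by the package, it can be worth checking it before fitting. The following is a minimal base R sketch on made-up data; `toy` and the helper `is_grouped` are hypothetical names, not part of `traveltimeCLT`. It only checks that each trip ID occupies one contiguous block of rows; it cannot confirm that links appear in traversal order.

```r
# Toy data in the same layout as `trips` (values are made up)
toy <- data.frame(
  tripID = c(1, 1, 1, 2, 2, 3, 3, 3),
  linkID = c(10, 11, 12, 11, 13, 10, 12, 14)
)

# Hypothetical helper: TRUE when every trip ID forms a single contiguous run
is_grouped <- function(trip_ids) {
  runs <- rle(as.vector(trip_ids))
  anyDuplicated(runs$values) == 0L
}

is_grouped(toy$tripID)

# Number of distinct trips that use each link (links are expected to be shared)
trips_per_link <- tapply(toy$tripID, toy$linkID, function(v) length(unique(v)))
trips_per_link
```

On the packaged data, the same check would be `is_grouped(trips$tripID)`.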
Field `timeBin` (character string or factor) refers to the time “category” in which the traversal of a given link occurred. As far as possible, time bins should correspond to periods of the week with similar traffic conditions. In `trips` we define five time bins: `Weekday`, `MorningRush`, `EveningRush`, `EveningNight` and `Weekendday`.
Period of the week | Time bin |
---|---|
Mon - Fri outside rush hour | Weekday |
Mon - Fri, 7AM - 9AM | MorningRush |
Mon - Fri, 3PM - 6PM | EveningRush |
Sat 9AM - 9PM + Sun 9AM - 7PM | Weekendday |
Sat - Sun otherwise | EveningNight |
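The table can be read as a small set of rules on the day of the week and the hour of the day. The sketch below encodes it in base R for illustration only. `toy_time_bin` is a hypothetical helper, not the package's own `time_bins()`; the half-open intervals (start included, end excluded) and the use of UTC are assumptions, since the table does not state how boundaries or time zones are handled.

```r
# Hypothetical re-implementation of the table above (not the package function)
toy_time_bin <- function(x, tz = "UTC") {
  lt <- as.POSIXlt(x, tz = tz)
  wday <- lt$wday                                  # 0 = Sunday, ..., 6 = Saturday
  hour <- lt$hour + lt$min / 60 + lt$sec / 3600    # fractional hour of day
  weekday <- wday %in% 1:5

  # Defaults: Mon-Fri outside rush hour, and Sat-Sun "otherwise"
  bin <- ifelse(weekday, "Weekday", "EveningNight")
  bin[weekday & hour >= 7 & hour < 9] <- "MorningRush"
  bin[weekday & hour >= 15 & hour < 18] <- "EveningRush"
  bin[wday == 6 & hour >= 9 & hour < 21] <- "Weekendday"   # Saturday 9AM - 9PM
  bin[wday == 0 & hour >= 9 & hour < 19] <- "Weekendday"   # Sunday 9AM - 7PM
  bin
}

x <- as.POSIXct(c("2014-04-28 11:07:27",   # Monday, late morning
                  "2014-04-28 08:00:00",   # Monday, morning rush
                  "2014-05-03 12:00:00",   # Saturday, midday
                  "2014-05-04 20:00:00"),  # Sunday, after 7PM
                tz = "UTC")
toy_time_bin(x)
```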
Field `logspeed` contains the natural logarithm of the speed (in meters per second) of traversal of each link. This information is central to the estimation algorithm.
Field `traveltime` refers to the traversal time (in seconds) of each link. This field is mostly for reference and is not used directly in the package.
Field `length` refers to the length (in meters) of each link. This information is used by the prediction algorithm.
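The three link-level quantities are tied together: speed is link length divided by traversal time, and the log-speed is its natural logarithm. The short check below uses values copied by hand from the first rows of the printed data; the variable names are illustrative only.

```r
# Length (m) and traversal time (s) of the first three links of trip 2700
length_m     <- c(70.61488, 174.50487, 77.76295)
traveltime_s <- c(13.000000, 18.927792, 8.589937)

speed    <- length_m / traveltime_s   # meters per second
logspeed <- log(speed)                # natural logarithm of the speed

round(speed, 4)
round(logspeed, 4)
```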
Field `time` gives the entry time, in POSIXct format, for the traversal of a given link. Individual values are not used directly; however, the entry time of the very first traversal is likely to be useful for providing the start time of a whole trip to the prediction function. Field `time` determines the time bin for each link, as illustrated in Table 1.
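As a small illustration of the point about trip start times, the base R sketch below extracts the entry time of the first link of each trip. It relies on the assumption stated earlier that rows are grouped by trip and ordered by traversal; `toy` and `trip_start` are made-up names.

```r
# Toy link-level data: two trips, rows already in traversal order
toy <- data.frame(
  tripID = c(1, 1, 1, 2, 2),
  time = as.POSIXct(c("2014-04-28 11:07:27", "2014-04-28 11:07:41",
                      "2014-04-28 11:07:58", "2014-04-29 08:15:00",
                      "2014-04-29 08:15:20"), tz = "UTC")
)

# The first row of each trip carries the start time of the whole trip
trip_start <- toy[!duplicated(toy$tripID), c("tripID", "time")]
trip_start
```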
`traveltimeCLT` includes functionality to convert time stamps to alternative time bins; please refer to Time bins.
This section should include a detailed description of the statistical/mathematical methods implemented in the package.
traveltimeCLT
The estimation algorithm can be executed by calling the `traveltimeCLT` function. The function’s interface is as follows.
traveltimeCLT(data,
              model = c('trip-specific', 'population'),
              estimate = c('both', 'mean-only'),
              lag = 1L,
              nsamples = 500L,
              min.links = 5L,
              timebin_rules = NULL)
For example, `traveltimeCLT` can be called using the `trips` data frame as
fit <- traveltimeCLT(data = trips, model = 'trip-specific', estimate = 'both', lag = 1, nsamples = 500, min.links = 5)
Table 3 provides a description for each parameter.
Parameter | Description |
---|---|
`data` | A data frame of trips and their road-level travel information, formatted as `trips`; see `trips` or `data(trips); View(trips)`. |
`model` | Specifies whether the `trip-specific` or `population` model should be used. |
`estimate` | Specifies whether to estimate `both` the mean and variance or `mean-only`. Only applies when `model = 'trip-specific'`. |
`lag` | Maximum lag at which to calculate the autocorrelations. Default is 1, for the first-order autocorrelations. |
`nsamples` | The number of trips to sample for parameter estimation. |
`min.links` | The minimum number of links in each sampled trip. |
`timebin_rules` | A list containing `start`, `end`, `days` and `tag` for each time bin of the data set (see the example in `time_bins`). |
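To make the `lag` parameter concrete, the sketch below computes the lag-1 sample autocorrelation of a short sequence of link log-speeds, once by hand and once with `stats::acf`. The sequence is made up (rounded from the sample rows shown earlier). This only illustrates the quantity involved; how `traveltimeCLT` computes and pools autocorrelations internally is not shown here and may differ.

```r
# Log-speeds along the consecutive links of one hypothetical trip
x <- c(1.69, 2.22, 2.20, 1.92, 1.80, 2.38)
n <- length(x)
m <- mean(x)

# Lag-1 sample autocorrelation computed by hand
rho_manual <- sum((x[-n] - m) * (x[-1] - m)) / sum((x - m)^2)

# Same quantity from stats::acf (element 1 is lag 0, element 2 is lag 1)
rho_acf <- acf(x, lag.max = 1, plot = FALSE)$acf[2]

all.equal(rho_manual, rho_acf)
```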
This section is taken from the HMM repo, and must be checked for applicability to the new method.
The implementation allows imputing for combinations of road links and time bins, for which the lack of sufficient observations prevents reliable parameter estimation. When imputation is required for a given combination, estimates for \(\mu\), \(\sigma\), \(\Gamma\) and \(\gamma\) are calculated on the basis of data available for the time bin involved for all road links.
This approach differs from the one in Woodard et al. (2017), where imputation is performed on the basis of road classification data (e.g. “arterial” or “primary collector road”). This implementation does not handle road classification data.
Imputation is performed in the following three cases:
Case 1: when a combination (`links x timebins`) has fewer than `L` total observations;
Case 2: when a combination has fewer than `L` initial state observations only;
Case 3: when a combination has only initial states.
The number of observations specified in parameter `L` determines the threshold below which imputation occurs for cases 1 and 2.
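The thresholding idea behind Case 1 can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: it uses the mean log-speed as a stand-in for \(\mu\), ignores the other parameters and Cases 2 and 3, and all object names (`toy`, `est`, `mu_bin`) are made up.

```r
# Toy link-level observations
toy <- data.frame(
  linkID   = c(1, 1, 1, 2, 3, 3),
  timeBin  = c("MR", "MR", "MR", "MR", "ER", "ER"),
  logspeed = c(2.0, 2.2, 2.1, 1.5, 2.4, 2.6)
)
L <- 2  # threshold on the number of observations per (link, time bin)

# Count and mean per (link, time bin) combination
n_obs <- aggregate(logspeed ~ linkID + timeBin, data = toy, FUN = length)
names(n_obs)[3] <- "n"
mu_combo <- aggregate(logspeed ~ linkID + timeBin, data = toy, FUN = mean)
names(mu_combo)[3] <- "mu"
est <- merge(n_obs, mu_combo)

# Fallback: mean over all links within the same time bin
mu_bin <- tapply(toy$logspeed, toy$timeBin, mean)

# Impute where a combination has fewer than L observations
est$imputed <- est$n < L
est$mu[est$imputed] <- unname(mu_bin[as.character(est$timeBin[est$imputed])])
est
```

Here link 2 has a single morning-rush observation, so its estimate is replaced by the morning-rush mean taken over all links.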
The execution of `traveltimeCLT` returns a list of the parameters in Table 4.
Name | Description |
---|---|
network parameters | to complete. |
rho | to complete. |
residual variance | to complete. |
`nsamples` | An integer corresponding to the number of trips sampled for parameter estimation, equal to the parameter `nsamples` that was passed in the function call. |
`min.links` | An integer corresponding to the minimum number of links for each sampled trip, equal to the parameter `min.links` that was passed in the function call. |
`lag` | An integer corresponding to the maximum lag for calculating the autocorrelations, equal to the parameter `lag` that was passed in the function call. |
`timebin_rules` | A list containing details of the time bins used. Equals `NULL` if the default time bins are used. |
`estimate` | A string specifying whether `both` the mean and variance or `mean-only` were calculated. Same as the parameter `estimate` that was passed in the function call. |
`model` | A string indicating whether the `trip-specific` or `population` model was used. Same as the parameter `model` that was passed in the function call. |
This section should include a description of the methods used for prediction.
predict.traveltime
or predict
The prediction algorithm is executed by calling `predict.traveltime`, or equivalently the S3 method `predict`, which has the following interface:
predict(object,
newdata,
level,
...)
For example, we can randomly select 10 trips from the data set `trips` to use as the test data for prediction. Once these trips are selected, we use the remaining trips to estimate the model by calling `traveltimeCLT`. The output of `traveltimeCLT` and the test data are then passed to `predict` to generate predictions for the 10 randomly sampled trips.
test_trips = sample_trips(trips, 10)
train = trips[!trips$tripID %in% test_trips,]
test = trips[trips$tripID %in% test_trips,]
fit <- traveltimeCLT(train, lag = 1)
pred <- predict(fit, test, 0.95)
Table 5 provides a description for each parameter.
Parameter | Description |
---|---|
`object` | A list object of class `traveltimeCLT`, corresponding to the output of `traveltimeCLT`. |
`newdata` | A data frame of new trips and their road-level travel information, formatted as `trips`; see `trips` or `data(trips); View(trips)`. |
`level` | Level of the interval, e.g. 0.95 for a 95% interval. Default is 0.95. |
`...` | For passing extra parameters, for example time bin rules to be passed to `rules2timebins`; see `rules2timebins` in the trip-specific method. |
`predict.traveltime` returns n rows, where n is the number of trips passed to `predict`. For each trip, `predict` returns the tripID, the ETA (trip duration in seconds), the variance, and the lower and upper bounds of the interval for travel time at the specified level. The output for the above example is provided below.
pred <- predict(fit, test, 0.95)
print(pred)
tripID ETA variance lwr upr
1 11273 1734.00888 30990.369 1343.40284 2124.6149
2 11396 784.61539 26200.146 425.46409 1143.7667
3 13922 1277.75451 152802.053 410.41290 2145.0961
4 21978 1266.31193 23227.051 928.15163 1604.4722
5 1000165 55.26015 1175.211 -20.80462 131.3249
6 1002992 947.83252 6023.799 775.62154 1120.0435
7 1003992 1931.34743 62290.712 1377.56748 2485.1274
8 1005895 1247.47739 23665.694 906.13895 1588.8158
9 2001730 1549.57224 39551.002 1108.30231 1990.8422
10 2004090 2244.24231 695368.988 393.98024 4094.5044
This section should include a discussion on the trip-specific vs population models.
To specify different time bins than the ones supplied, `traveltimeCLT` provides a functional method called `rules2timebins()` that translates human-readable weekly time bins to a conversion function.
`rules2timebins()` takes a list of lists of time bin rules, where each sub-list specifies four variables: `start` and `end` as the start and end time of a time bin in 24h format; `days` as a vector specifying the weekdays this time bin applies to, `1` for Sunday, and `2:5` for weekdays; and finally a `tag` to specify a name for the time bin.
For example,
rules = list(
list(start='6:30', end= '9:00', days = 1:5, tag='MR'),
list(start='15:00', end= '18:00', days = 2:5, tag='ER')
)
specifies two time bins called `MR`, for morning rush, and `ER`, for evening rush. The former is for all weekdays and the latter is for Tuesdays to Fridays. All other time intervals are assigned a time bin called `Other`.
Passing `rules` to `rules2timebins()` returns an easy-to-use functional, for example
time_bins <- rules2timebins(rules)
time_bins("2019-08-16 15:25:00 EDT") ## Friday
[1] "ER"
time_bins("2019-08-17 21:25:58 EDT") ## Saturday
[1] "Other"
To change the time bins in `trips`, run the code
trips$timeBin <- time_bins(trips$time)
head(trips)
tripID linkID timeBin logspeed traveltime length time
1 2700 10469 Other 1.692292 13.000000 70.61488 2014-04-28 06:07:27
2 2700 10444 Other 2.221321 18.927792 174.50487 2014-04-28 06:07:41
3 2700 10460 Other 2.203074 8.589937 77.76295 2014-04-28 06:07:58
4 2700 10462 Other 1.924290 14.619859 100.15015 2014-04-28 06:08:07
5 2700 10512 Other 1.804293 5.071986 30.81574 2014-04-28 06:08:21
6 2700 5890 Other 2.376925 31.585355 340.22893 2014-04-28 06:08:26
Remark: at package loading, a default `time_bins()` functional is created, which constructs the default time bins of `trips`.
Running the `trip-specific` model in `traveltimeCLT` with user-defined time bins requires several steps. First, the user defines the time bin rules and converts them to a functional using `rules2timebins`:
rules = list(
list(start='6:30', end= '9:00', days = 1:5, tag='MR'),
list(start='15:00', end= '18:00', days = 2:5, tag='ER')
)
time_bins <- rules2timebins(rules)
Second, the time bins are applied to the data, and trips are divided into training and testing sets.
trips$timeBin <- time_bins(trips$time)
test_trips = sample_trips(trips, 10)
train = trips[!trips$tripID %in% test_trips,]
test = trips[trips$tripID %in% test_trips,]
Third, the `timebin_rules` are passed to `traveltimeCLT`.
fit <- traveltimeCLT(train, lag = 1, timebin_rules = rules)
Fourth, the `timebin_rules` are again passed to `predict`.
pred <- predict(fit, test, 0.95, rules)
To estimate the `population` model with user-defined time bins, the data must first be subset to the desired time bin before fitting with `traveltimeCLT`. This is necessary because the `population` model randomly samples the data without consideration for time bins. Once user-defined time bins are applied to the data, the `population` model can be fit as below.
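# Keep only the link traversals that fall in the morning rush time bin
morning_rush_trips <- trips[trips$timeBin == "MR", ]
# Sample test trips from the subset, then split the subset (not the full data)
test_trips <- sample_trips(morning_rush_trips, 10)
train <- morning_rush_trips[!morning_rush_trips$tripID %in% test_trips, ]
test <- morning_rush_trips[morning_rush_trips$tripID %in% test_trips, ]
# The model name must be passed as a string
fit <- traveltimeCLT(train, lag = 1, model = 'population', timebin_rules = rules)
pred <- predict(fit, test, 0.95, rules)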
Elmasri, M., Labbe, A., Larocque, D., Charlin, L., 2020. “Predictive inference for travel time on transportation networks”. https://arxiv.org/abs/2004.11292
Woodard, D., Nogin, G., Koch, P., Racz, D., Goldszmidt, M., Horvitz, E., 2017. “Predicting travel time reliability using mobile phone GPS data”. Transportation Research Part C, 75, 30-44. http://dx.doi.org/10.1016/j.trc.2016.10.011