How to sum values in a column associated with duplicates in another column in R? -


working in r. have data frame taxi rides i've sorted vehicle id , time stamp (earliest latest) looks below:

taxi = arrange(taxi, taxi$vehicle_id, taxi$timestamp) 

image of data frame example in excel

enter image description here

you notice "distance" column covers distances between each ride every taxi vehicle. utilize formula sums distances each taxi. likewise repeat process on time stamp subtracting latest date earliest date calculate time on road each vehicle. hope use both sum distance , time difference calculate speed each taxi. final product matrix looks following (pending conversion meters , seconds):

image of final product example in excel

enter image description here

i believe loop in r useful here, identifying each vehicle id, running through associated values distance , summing them, i'm unsure start. started data set of unique taxi vehicle ids r identify them. below have far summing distances; i'm not sure how tell r sum each vehicle id range:

taxi.uniques = unique(taxi$vehicle_id) (i in taxi) {     look.at = taxi$vehicle_id ==      sum(all distances vehicle id) }  

and here have subtracting latest timestamp each vehicle it's earliest timestamp:

taxi.uniques = unique(taxi$vehicle_id) (i in taxi) {     look.at = taxi$vehicle_id ==      max(taxi$timestamp[look.at]) - min(taxi$timestamp[look.at])  } 

not sure if i'm on right track , appreciate help!

if aren't opposed using data.table, in single step.

require(data.table) taxi<-data.table(vehicleid=c('taxi1','taxi2','taxi3','taxi1','taxi2','taxi3'),                  timestamp=sys.time()+sample(60:600,6),                  distance=sample(1:50,6))  taxi[vehicleid %in% c('taxi1','taxi3'),list(timeonroad=max(timestamp)-min(timestamp),            distance=sum(distance)),      by='vehicleid'] 

data.table structure dt[i, j, by]. used subsetting condition, j used running different operations on variables or selecting columns, grouping. adding subset need.


Comments