Using dplyr


2024-08-06 15:48:39, 214 views

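For data preprocessing I have long used Hadley Wickham's plyr package; once the data gets a bit larger, I mostly switch to the data.table package. Hadley Wickham's dplyr package has been out for a while now and improves performance further. These are some notes for future use.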

These five functions provide the basis of a language of data manipulation. At the most basic level, you can only alter a tidy data frame in five useful ways: you can reorder the rows (arrange()), pick observations and variables of interest (filter() and select()), add new variables that are functions of existing variables (mutate()) or collapse many values to a summary (summarise()). The remainder of the language comes from applying the five functions to different types of data, like to grouped data, as described next.
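As a rough illustration of the five verbs described above, here is a minimal sketch on a small made-up data frame. The data frame `flights_small` and its values are hypothetical and are not taken from the original post; only the column names mirror those used in the flights example.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration only).
flights_small <- data.frame(
  tailnum   = c("N1", "N1", "N2", "N2", "N3"),
  distance  = c(100, 300, 200, 400, 500),
  arr_delay = c(10, NA, -5, 15, 0)
)

# arrange(): reorder the rows, longest distance first
arranged <- arrange(flights_small, desc(distance))

# filter(): pick observations of interest
long_haul <- filter(flights_small, distance >= 300)

# select(): pick variables of interest
narrow <- select(flights_small, tailnum, distance)

# mutate(): add a new variable computed from existing ones
with_flag <- mutate(flights_small, late = arr_delay > 0)

# summarise(): collapse many values to a single summary row
overall <- summarise(flights_small, mean_delay = mean(arr_delay, na.rm = TRUE))
```

Each verb returns a new data frame and leaves `flights_small` unchanged, which is what makes the calls easy to chain.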

Example 1: comparing plyr::ddply and dplyr::group_by

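library(dplyr)
# Assumption: `flights` is the data set from the nycflights13 package; the original does not say.
library(nycflights13)

# dplyr: group by tail number, then summarise each group
system.time({
  plans <- group_by(flights, tailnum)
  delay <- summarise(plans,
    count = n(),
    dist  = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  )
})

   user  system elapsed
  0.092   0.003   0.097

# plyr: the same summary with ddply (called as plyr::ddply so that plyr does not mask dplyr's summarise)
system.time({
  plyr::ddply(flights, 'tailnum', function(x) data.frame(
    count = nrow(x),
    dist  = mean(x$distance, na.rm = TRUE),
    delay = mean(x$arr_delay, na.rm = TRUE)
  ))
})

   user  system elapsed
  2.467   0.016   2.500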
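To make it easier to see what the grouped summary in the timing comparison computes, here is a minimal sketch of the same `group_by()` plus `summarise()` pattern on a tiny made-up data frame, written with the `%>%` pipe that dplyr exports. The data frame `toy` and its values are hypothetical, chosen so the per-group results can be checked by hand.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration only).
toy <- data.frame(
  tailnum   = c("N1", "N1", "N2", "N2", "N3"),
  distance  = c(100, 300, 200, 400, 500),
  arr_delay = c(10, NA, -5, 15, 0)
)

# One output row per tailnum: row count, mean distance, mean arrival delay.
by_plane <- toy %>%
  group_by(tailnum) %>%
  summarise(
    count = n(),
    dist  = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  )

print(by_plane)
```

With `na.rm = TRUE`, the missing delay for N1 is dropped before averaging, so that group's mean is taken over its one remaining value.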
