Yesterday, Hadley Wickham commented on my post about how frequently various R functions are called, suggesting that it would be helpful to have the number of packages that call a function in addition to the number of times that the function is called. I compiled the relevant data last night: you can grab it here. This data set includes a row for every function I found, indexed by each of the packages and files in which it was used. At this higher level of resolution, I record the number of times each function was called.
To get a sense of the correspondence between these measures, below I’ve plotted the number of packages using each function in my data set against the log number of times each function is called:
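For readers who want to reproduce this kind of comparison, here is a minimal sketch in base R. It assumes a data frame with one row per function, package and file, plus a call count. The column names (`Function`, `Package`, `File`, `Calls`) and the toy values are hypothetical placeholders, not the actual layout of my data set.

```r
# Hypothetical stand-in for the data set: one row per (function, package, file),
# with the number of calls recorded at that level. Column names are assumptions.
calls <- data.frame(
  Function = c("c", "c", "c", "length", "length", "sqrt"),
  Package  = c("pkgA", "pkgA", "pkgB", "pkgA", "pkgC", "pkgB"),
  File     = c("a.R", "b.R", "x.R", "a.R", "m.R", "x.R"),
  Calls    = c(10, 5, 7, 3, 4, 2),
  stringsAsFactors = FALSE
)

# Total number of calls per function, summed over packages and files.
total_calls <- aggregate(Calls ~ Function, data = calls, FUN = sum)

# Number of distinct packages that use each function.
packages_using <- aggregate(Package ~ Function, data = calls,
                            FUN = function(p) length(unique(p)))
names(packages_using)[2] <- "Packages"

# One row per function with both measures side by side.
summary_df <- merge(total_calls, packages_using, by = "Function")

# Packages using each function against the log of total calls.
plot(log(summary_df$Calls), summary_df$Packages,
     xlab = "log(number of calls)",
     ylab = "Number of packages using function")
```

Counting distinct packages rather than rows matters here, because a package that calls a function in several files would otherwise be counted more than once.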

And here’s a new top 25 table, this time ranked by the number of packages using each function:
| Function | Packages Using Function |
|---|---|
| function | 1903 |
| if | 1846 |
| c | 1795 |
| length | 1791 |
| list | 1679 |
| for | 1656 |
| return | 1559 |
| stop | 1538 |
| paste | 1526 |
| rep | 1512 |
| matrix | 1419 |
| is.null | 1413 |
| sum | 1396 |
| max | 1368 |
| cat | 1308 |
| names | 1297 |
| is.na | 1241 |
| min | 1216 |
| cbind | 1175 |
| nrow | 1158 |
| sqrt | 1157 |
| t | 1134 |
| | 1120 |
| class | 1120 |
| seq | 1098 |
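A ranking like this can be produced by sorting a per-function summary and keeping the first rows. The sketch below is only illustrative: the data frame name `usage` and its column names are assumptions, and it uses four of the counts from the table above rather than the full data set.

```r
# Hypothetical per-function summary; the four counts are copied from the table,
# the object and column names are placeholders.
usage <- data.frame(
  Function = c("length", "function", "c", "if"),
  Packages = c(1791, 1903, 1795, 1846),
  stringsAsFactors = FALSE
)

# Sort by package count, largest first, and keep the top n rows.
n <- 3
top <- head(usage[order(-usage$Packages), ], n)
print(top, row.names = FALSE)
```

With the full summary, setting `n <- 25` would give a top 25 table.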
Very interesting. Did you use an R package to collect all this data, or is it a custom script or maybe another program? Looking at your first post, the only function I’ve never actually used is ?attr, but given its popularity I’m starting to think I should learn exactly what it is.
The data were collected with a set of Perl and Ruby scripts; the analysis was done in R. I’ll put everything on GitHub today and publish a link in a post when it’s up.
Very interesting!
On the day you published this post, I was thinking about a text mining approach (with the “tm” package) to perform a classification of R coding examples. This may be useful as a search criterion in a website that contains R examples. Better than tagging, I suppose.
(We are developing a RUG website for Italy.)
What do you think?