r - Error with ggplot2 mapping variable to y and using stat="bin"


I am using ggplot2 to make a histogram:

geom_histogram(aes(x=...), y="..ncount../sum(..ncount..)") 

and I get the error:

Mapping a variable to y and also using stat="bin". With stat="bin", it will attempt to set the y value to the count of cases in each group. This can result in unexpected behavior and will not be allowed in a future version of ggplot2. If you want y to represent counts of cases, use stat="bin" and don't map a variable to y. If you want y to represent values in the data, use stat="identity". See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)

What causes this in general? I am confused by the error because I'm not mapping a variable to y. I'm just histogram-ing x, and I would like the height of each histogram bar to represent a normalized fraction of the data (such that all the bar heights sum to 100% of the data).

Edit: if I want to make a density plot with geom_density instead of geom_histogram, do I use ..ncount../sum(..ncount..) or ..scaled..? I'm unclear on what ..scaled.. does.

The confusion here is a long-standing one (as evidenced by the verbose warning message), and it all starts with stat_bin.

But users don't typically realize that their confusion revolves around stat_bin, since they usually encounter problems while using either geom_bar or geom_histogram. Note the documentation for each: both use stat = "bin" by default (in current ggplot2 versions this stat has been split into stat_bin for continuous data and stat_count for discrete data).

But let's back up. geom_*'s control the actual rendering of data into some sort of geometric form. stat_*'s simply transform your data. The distinction is a bit confusing in practice, because adding a layer of stat_bin will, by default, invoke geom_bar, and so it can seem indistinguishable from geom_bar when you're learning.
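
To make the geom/stat pairing concrete, here is a minimal sketch. The data frame df and the choice of bins = 10 are made up for illustration; layer_data() is used only to inspect what the stat computed.

```r
library(ggplot2)

# Hypothetical sample data, used only for illustration.
set.seed(1)
df <- data.frame(x = rnorm(200))

# The same layer written two ways: geom first, or stat first.
p_geom <- ggplot(df, aes(x)) + geom_histogram(bins = 10)
p_stat <- ggplot(df, aes(x)) + stat_bin(bins = 10)

# layer_data() returns the data frame computed for a given layer.
d_geom <- layer_data(p_geom, 1)
d_stat <- layer_data(p_stat, 1)

# Compare the per-bin counts produced by each spelling.
identical(d_geom$count, d_stat$count)
```

Both spellings bin x with stat_bin and draw the result as bars, which is why they are so easy to mistake for one another.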

In any case, consider the "bar"-like geom's: histograms and bar charts. Both are clearly going to involve some binning of data somewhere along the line. But our data could be either pre-summarised or not. For instance, we might want a bar plot from:

x
a
a
a
b
b
b

or equivalently from

x  y
a  3
b  3

The first hasn't been binned yet. The second is pre-binned. The default behavior of both geom_bar and geom_histogram is to assume that you have not pre-binned your data. So they will attempt to call stat_bin (for histograms; now stat_count for bar charts) on your x values.
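
The relationship between the two forms can be sketched in base R. This is only an illustration of the counting step, not ggplot2's internal code; the names raw and binned are made up here.

```r
# Raw, un-binned data: one row per observation.
raw <- data.frame(x = c("a", "a", "a", "b", "b", "b"))

# Counting the cases in each group yields the pre-binned form,
# with the counts stored in a column named y.
binned <- as.data.frame(table(x = raw$x), responseName = "y")
binned
```

Handing raw to geom_bar leaves this counting to the stat; handing it binned means the counting has already been done.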

As the warning says, it will then try to map y for you to the resulting counts. If you also attempt to map y yourself to some other variable, you end up in Here There Be Dragons territory. Mapping y to functions of the variables returned by stat_bin (..count.., etc.) should be OK and should not throw that warning (it doesn't for me using @mnel's example above).
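
As a hedged sketch of the "function of a computed variable" case, the following maps y to each bin's share of the total so the bar heights sum to 1. The data frame df is made up, and after_stat() assumes ggplot2 3.3.0 or later; older code would write ..count../sum(..count..) instead. Note that the mapping sits inside aes().

```r
library(ggplot2)

# Hypothetical sample data, used only for illustration.
set.seed(1)
df <- data.frame(x = rnorm(200))

# y is computed from the count returned by stat_bin, not from a column of df.
# after_stat() assumes ggplot2 >= 3.3.0.
p <- ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 10) +
  ylab("fraction of data")

# Bar heights as actually computed for the layer.
frac <- layer_data(p, 1)$y
sum(frac)
```

Because ncount is just count rescaled by its maximum, ncount / sum(ncount) gives the same fractions as count / sum(count).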

The take-away here is that for geom_bar, if you've pre-computed the heights of the bars, always remember to use stat = "identity", or better yet use the newer geom_col, which uses stat = "identity" by default. For geom_histogram it's very unlikely that you will have pre-computed the bins, so in most cases you just need to remember not to map y to anything beyond what's returned from stat_bin.
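
A small sketch of the three spellings, using the toy data from earlier in this answer (the object names are made up, and geom_col assumes ggplot2 2.2.0 or later):

```r
library(ggplot2)

raw    <- data.frame(x = c("a", "a", "a", "b", "b", "b"))
binned <- data.frame(x = c("a", "b"), y = c(3, 3))

# Raw data: let the stat do the counting, and do not map y.
p_raw <- ggplot(raw, aes(x)) + geom_bar()

# Pre-computed heights: map y and switch the stat off.
p_ident <- ggplot(binned, aes(x, y)) + geom_bar(stat = "identity")
p_col   <- ggplot(binned, aes(x, y)) + geom_col()

# Bar heights computed for each plot's first layer.
heights <- lapply(list(p_raw, p_ident, p_col), function(p) layer_data(p, 1)$y)
heights
```

All three should draw the same two bars; the only difference is whether ggplot2 or you did the counting.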

geom_dotplot uses its own binning stat, stat_bindot, and this discussion applies there as well, I believe. This sort of thing generally hasn't been an issue with the 2d binning cases (geom_bin2d and geom_hex), since there hasn't been as much flexibility available in the z variable, which is the analogue of the binned y variable in the 1d case. If future updates start allowing fancier manipulations of the 2d binning cases, this could, I suppose, become something you have to watch out for there.

