I am using ggplot2 to make a histogram:
geom_histogram(aes(x=...), y="..ncount../sum(..ncount..)")
and I get the error:
Mapping a variable to y and also using stat="bin". With stat="bin", it will attempt to set the y value to the count of cases in each group. This can result in unexpected behavior and will not be allowed in a future version of ggplot2. If you want y to represent counts of cases, use stat="bin" and don't map a variable to y. If you want y to represent values in the data, use stat="identity". See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
What causes this in general? I am confused by the error because I'm not mapping a variable to y. I'm just histogram-ing x, and I would like the height of each histogram bar to represent a normalized fraction of the data (such that all the bar heights sum to 100% of the data).
Edit: if I want to make a density plot with geom_density instead of geom_histogram, do I use ..ncount../sum(..ncount..) or ..scaled..? I'm unclear about what ..scaled.. does.
The confusion here is a long-standing one (as evidenced by the verbose warning message), and it all starts with stat_bin.
But users don't typically realize that their confusion revolves around stat_bin, since they typically encounter problems while using either geom_bar or geom_histogram. Note the documentation for each: both use stat = "bin" by default (in current ggplot2 versions this stat has been split into stat_bin for continuous data and stat_count for discrete data).
But let's back up. The geom_* functions control the actual rendering of data into some sort of geometric form. The stat_* functions simply transform your data. The distinction is a bit confusing in practice, because adding a layer of stat_bin will, by default, invoke geom_bar, so it can seem indistinguishable from geom_bar when you're learning.
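To make the geom/stat pairing concrete, here is a minimal sketch (the data frame df and the choice of bins = 10 are my own, purely for illustration). It builds the same histogram once from the geom side and once from the stat side, then uses layer_data() to pull out what the stat computed:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(100))  # hypothetical continuous data

# A geom whose default stat is "bin" ...
p_geom <- ggplot(df, aes(x)) + geom_histogram(bins = 10)
# ... and a stat whose default geom is "bar".
p_stat <- ggplot(df, aes(x)) + stat_bin(bins = 10)

# layer_data() returns the data frame computed for a layer,
# including the count column produced by the binning stat.
d_geom <- layer_data(p_geom)
d_stat <- layer_data(p_stat)

# Both layers were binned by the same stat, so the counts agree.
identical(d_geom$count, d_stat$count)
```

Either spelling ends up with the same stat and the same geom, which is why the two are so easy to mix up.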
In any case, consider the "bar"-like geoms: histograms and bar charts. Both are clearly going to involve some binning of data somewhere along the line. But our data could either be pre-summarised or not. For instance, we might want a bar plot from:
x
a
a
a
b
b
b
or equivalently from
x y
a 3
b 3
The first hasn't been binned yet. The second is pre-binned. The default behavior for both geom_bar and geom_histogram is to assume that you have not pre-binned your data. So they will attempt to call stat_bin (for histograms; now stat_count for bar charts) on your x values.
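As a rough illustration of that default, the sketch below feeds un-binned data to geom_bar and inspects the counts the stat produced. The data frame name raw is my own; the values are the six-row example (three a's, three b's) implied by the pre-binned table:

```r
library(ggplot2)

# Un-binned data: one row per observation.
raw <- data.frame(x = c("a", "a", "a", "b", "b", "b"))

# No y is mapped; geom_bar counts the rows per x value for us.
p <- ggplot(raw, aes(x)) + geom_bar()

# The computed counts are what ends up mapped to y.
counts <- layer_data(p)$count
counts
```

The counts come out as 3 and 3, which is exactly the y column of the pre-binned version of the data.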
As the warning says, it will then try to map y for you to the resulting counts. If you also attempt to map y yourself to some other variable, you end up in Here There Be Dragons territory. Mapping y to functions of the variables returned by stat_bin (..count.., etc.) should be OK and should not throw that warning (it doesn't for me using @mnel's example above).
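For the normalized histogram asked about in the question, a sketch along these lines should work. I am assuming ggplot2 3.3.0 or later, where after_stat(count) is the current spelling of ..count..; the data frame df and bins = 20 are made up for the example:

```r
library(ggplot2)

set.seed(42)
df <- data.frame(x = rnorm(200))  # hypothetical data

# Map y to a function of the count computed by stat_bin,
# inside aes(), rather than to a variable of our own.
p <- ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
  ylab("fraction of observations")

# Bar heights after the mapping has been applied.
heights <- layer_data(p)$y
sum(heights)
```

On older versions the equivalent mapping would be aes(y = ..count../sum(..count..)). Note that the expression goes inside aes() and is not quoted. Also, sum(count) here runs over the whole layer, so with several groups or facets the heights sum to 1 across all of them together, not within each one.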
The take-away here is that for geom_bar, if you've pre-computed the heights of the bars, always remember to use stat = "identity", or better yet use the newer geom_col, which uses stat = "identity" by default. For geom_histogram it's very unlikely that you will have pre-computed the bins, so in most cases you just need to remember not to map y to anything beyond what's returned from stat_bin.
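Here is a small sketch of the pre-computed case, using the two-row table from earlier (the name pre is my own). It draws the bars both ways and checks that the resulting heights match:

```r
library(ggplot2)

# Pre-binned data: bar heights are already in y.
pre <- data.frame(x = c("a", "b"), y = c(3, 3))

# geom_col uses stat = "identity" by default ...
p_col <- ggplot(pre, aes(x, y)) + geom_col()
# ... which is what we have to request explicitly with geom_bar.
p_identity <- ggplot(pre, aes(x, y)) + geom_bar(stat = "identity")

h_col <- layer_data(p_col)$y
h_identity <- layer_data(p_identity)$y

# Same heights either way, taken straight from the data.
identical(h_col, h_identity)
```

In both cases y is left untouched by the stat, so mapping it yourself is exactly what you want.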
geom_dotplot uses its own binning stat, stat_bindot, and this discussion applies there as well, I believe. This sort of thing generally hasn't been an issue with the 2d binning cases (geom_bin2d and geom_hex), since there hasn't been as much flexibility available in the z variable, which is the 2d analogue of the binned y variable in the 1d case. If future updates start allowing fancier manipulations of the 2d binning cases, I suppose this could become something you have to watch out for there.