Ruby: Strange, unexpected behavior (disappearing/changing values) when using Hash default value, e.g. Hash.new([])


Consider this code:

```ruby
h = Hash.new(0)  # new hash pairs will by default have 0 as values
h[1] += 1  #=> {1=>1}
h[2] += 2  #=> {1=>1, 2=>2}
```

That’s all fine, but:

```ruby
h = Hash.new([])  # empty array as default value
h[1] <<= 1  #=> {1=>[1]}                  ← Ok
h[2] <<= 2  #=> {1=>[1,2], 2=>[1,2]}      ← Why did `1` change?
h[3] << 3   #=> {1=>[1,2,3], 2=>[1,2,3]}  ← Where is `3`?
```

At this point I expect the hash to be:

```ruby
{1=>[1], 2=>[2], 3=>[3]}
```

But it’s far from that. What’s happening, and how can I get the behavior I expect?

First, note that this behavior applies to any default value that is subsequently mutated (e.g. hashes and strings), not just arrays.
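A minimal sketch of the same trap with a String and a Hash as the default. The variable names are illustrative, and `''.dup` is used only so the default String is definitely unfrozen regardless of any frozen-string-literal setting:

```ruby
# Shared mutable default, this time a String (''.dup gives an unfrozen copy).
s = Hash.new(''.dup)
s[:a] << 'x'          # mutates the one default String in place
s[:b] << 'y'          # ...and again: it is the same object
p s[:c]               # "xy" (any missing key returns the shared String)
p s.empty?            # true: nothing was ever assigned

# The same thing happens with a Hash as the default value.
n = Hash.new({})
n[:a][:k] = 1         # writes into the shared default Hash
p n[:z][:k]           # 1
p n.empty?            # true
```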

TL;DR: Use Hash.new { |h, k| h[k] = [] } if you want the simplest, most idiomatic solution.


What doesn’t work

Why Hash.new([]) doesn’t work

Let’s look more in-depth at why Hash.new([]) doesn’t work:

```ruby
h = Hash.new([])
h[0] << 'a'  #=> ["a"]
h[1] << 'b'  #=> ["a", "b"]
h[1]         #=> ["a", "b"]

h[0].object_id == h[1].object_id  #=> true
h  #=> {}
```

We can see that our default object is being reused and mutated (this is because it is passed as the one and only default value; the hash has no way of getting a fresh, new default value), but why are there no keys or values in the hash, despite h[1] still giving us a value? Here’s a hint:

```ruby
h[42]  #=> ["a", "b"]
```

The array returned by each [] call is just the default value, which we’ve been mutating all this time, so it now contains our new values. Since << doesn’t assign to the hash (there can never be assignment in Ruby without an = present), we’ve never put anything into our actual hash. Instead we have to use <<= (which is to << as += is to +):

```ruby
h[2] <<= 'c'  #=> ["a", "b", "c"]
h             #=> {2=>["a", "b", "c"]}
```

This is the same as:

```ruby
h[2] = (h[2] << 'c')
```
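Note that <<= only fixes the assignment half of the problem. With Hash.new([]) every key that gets assigned this way still ends up pointing at the one shared default Array, which is exactly what produced the surprising output in the question. A small sketch reproducing it:

```ruby
h = Hash.new([])
h[1] <<= 1            # h[1] = (default << 1): key 1 now points at the default Array
h[2] <<= 2            # appends to that same Array, then stores it under key 2 as well
p h[1].equal?(h[2])   # true: both keys hold one and the same Array
p h[1]                # [1, 2]
p h.default           # [1, 2]: the default itself was mutated
```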

Why Hash.new { [] } doesn’t work

Using Hash.new { [] } solves the problem of reusing and mutating the original default value (as the block given is called each time, returning a new array), but not the assignment problem:

```ruby
h = Hash.new { [] }
h[0] << 'a'   #=> ["a"]
h[1] <<= 'b'  #=> ["b"]
h             #=> {1=>["b"]}
```
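A quick way to see both halves of that statement is to compare the objects returned by two lookups of the same missing key. This is only an illustrative sketch:

```ruby
h = Hash.new { [] }   # the block runs on every missing-key lookup
a = h[:x]
b = h[:x]
p a.equal?(b)         # false: each lookup builds a brand-new Array
h[:x] << 'lost'       # appended to a throwaway Array
p h[:x]               # []
p h.empty?            # true: nothing was ever stored
```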

What does work

The assignment way

If we remember to use <<=, then Hash.new { [] } is a viable solution, but it’s a bit odd and non-idiomatic (I’ve never seen <<= used in the wild). It’s also prone to subtle bugs if << is inadvertently used.
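The bug is especially subtle because << does appear to work once a key has already been assigned, so it can slip through casual testing. A minimal sketch (the name counts is purely illustrative):

```ruby
counts = Hash.new { [] }
counts[:a] <<= 1      # assigns: :a now maps to [1]
counts[:a] << 2       # happens to work, because :a already exists
counts[:b] << 3       # silently lost: :b is missing, so 3 goes into a throwaway Array
p counts.keys         # [:a]
p counts[:a]          # [1, 2]
```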

The mutable way

The documentation for Hash.new states:

If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.

So we must store the default value in the hash from within the block if we wish to use << instead of <<=:

```ruby
h = Hash.new { |h, k| h[k] = [] }
h[0] << 'a'  #=> ["a"]
h[1] << 'b'  #=> ["b"]
h            #=> {0=>["a"], 1=>["b"]}
```

This effectively moves the assignment from our individual calls (which would use <<=) to the block passed to Hash.new, removing the burden of unexpected behavior when using <<.
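A typical use of this pattern is building lists incrementally, for example grouping words by their first letter. The word list and the name by_letter are illustrative assumptions:

```ruby
words = %w[apple avocado banana blueberry cherry]
by_letter = Hash.new { |hash, key| hash[key] = [] }  # store a fresh Array per new key
words.each { |w| by_letter[w[0]] << w }              # plain << is safe here
p by_letter['a']   # ["apple", "avocado"]
p by_letter.keys   # ["a", "b", "c"]
```

For a one-shot grouping like this, Enumerable#group_by gives the same contents; the default-block hash is more useful when entries arrive one at a time.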

Note that there is one functional difference between this method and the others: this way assigns the default value upon reading (as the assignment always happens inside the block). For example:

```ruby
h1 = Hash.new { |h, k| h[k] = [] }
h1[:x]
h1  #=> {:x=>[]}

h2 = Hash.new { [] }
h2[:x]
h2  #=> {}
```
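If that side effect is unwanted when merely checking for a key, Hash#key? and Hash#fetch can be used, since neither of them runs the default block. A short sketch:

```ruby
h1 = Hash.new { |hash, key| hash[key] = [] }
p h1.key?(:x)          # false: key? never runs the default block
p h1.fetch(:x, nil)    # nil: fetch ignores the default block as well
h1[:x]                 # a plain read through [] runs the block and stores []
p h1.key?(:x)          # true
```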

The immutable way

You may be wondering why Hash.new([]) doesn’t work while Hash.new(0) works just fine. The key is that Numerics in Ruby are immutable, so we naturally never end up mutating them in-place. If we treated our default value as immutable, we could use Hash.new([]) just fine too:

```ruby
h = Hash.new([].freeze)
h[0] += ['a']  #=> ["a"]
h[1] += ['b']  #=> ["b"]
h[2]           #=> []
h              #=> {0=>["a"], 1=>["b"]}
```
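Freezing the default also turns an accidental in-place mutation into an immediate error instead of a silent bug. A sketch, assuming Ruby 2.5 or later, where the raised exception is FrozenError:

```ruby
h = Hash.new([].freeze)
begin
  h[0] << 'a'                   # tries to mutate the shared frozen default
rescue FrozenError => e         # FrozenError exists since Ruby 2.5
  puts "caught: #{e.class}"     # caught: FrozenError
end
h[0] += ['a']                   # non-mutating: builds a new Array, then assigns it
p h[0]                          # ["a"]
p h.default                     # []: the default is untouched
p h.default.frozen?             # true
```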

Of all these ways, I prefer the immutable way: immutability generally makes reasoning about things much simpler (this is, after all, the only method that has no possibility of hidden or subtle unexpected behavior).


Note: the claim above that assignment requires an = isn’t strictly true. Methods like instance_variable_set bypass this, but they must exist for metaprogramming since the l-value in = cannot be dynamic.

