python - Pandas.concat on Sparse DataFrames... a mystery?


Why, when concatenating two sparse DataFrames, is the result sparse... but in a weird way? And how can I evaluate the memory occupied by the concatenated DataFrame?

I wrote this code sample to better understand the issue:

import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'b': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'c': [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
                    'd': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'e': [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0]},
                   index=['a','b','c','d','e','f','g','h','i','j','k','l']).to_sparse(fill_value=0)

df2 = pd.DataFrame({'f': [0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0],
                    'g': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0],
                    'h': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'j': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6]},
                   index=['a','b','c','d','e','f','g','h','i','j','k','l']).to_sparse(fill_value=0)

print("df1 sparse size =", df1.memory_usage().sum(), "bytes, density =", df1.density)
print(type(df1))
print('default_fill_value =', df1.default_fill_value)
print(df1.values)

print("df2 sparse size =", df2.memory_usage().sum(), "bytes, density =", df2.density)
print(type(df2))
print('default_fill_value =', df2.default_fill_value)
print(df2.values)

result = pd.concat([df1, df2], axis=1)

print(type(result))                                       # seems alright
print('default_fill_value =', result.default_fill_value)  # why is the default fill value not 0?
print(result.values)                                      # what are these "nan" blocks?
# result.density        # throws an error
# result.memory_usage   # throws an error
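For what it is worth, the only workaround I have found so far is to densify the result, put the zeros back, and re-sparsify with an explicit fill_value, which at least makes density and memory_usage usable again. This is only a minimal sketch, assuming the pre-1.0 pandas API used above (to_dense / to_sparse) and that the NaN blocks in result really stand for the original zeros:

# Workaround sketch: densify, restore the zeros, and re-sparsify with fill_value=0
# so that default_fill_value is 0 again and density/memory_usage stop erroring.
result_refilled = result.to_dense().fillna(0).to_sparse(fill_value=0)

print('default_fill_value =', result_refilled.default_fill_value)  # 0 again
print('density =', result_refilled.density)
print('refilled sparse size =', result_refilled.memory_usage().sum(), 'bytes')

But that round-trip through a dense frame defeats the point of keeping the data sparse in the first place, so it is not a real answer to the question.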

And more to the point: does anyone know what is happening here?

Is this a known problem? Is there an open issue for it?
