Why, when concatenating two sparse dataframes, is the result sparse in a weird way? And how can I evaluate the memory occupied by the concatenated dataframe?
I wrote a code sample so you guys can better understand the issue:
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'b': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'c': [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
                    'd': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'e': [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0]},
                   index=['a','b','c','d','e','f','g','h','i','j','k','l']).to_sparse(fill_value=0)

df2 = pd.DataFrame({'f': [0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0],
                    'g': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0],
                    'h': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'i': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    'j': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6]},
                   index=['a','b','c','d','e','f','g','h','i','j','k','l']).to_sparse(fill_value=0)

print("df1 sparse size =", df1.memory_usage().sum(), "bytes, density =", df1.density)
print(type(df1))
print('default_fill_value =', df1.default_fill_value)
print(df1.values)

print("df2 sparse size =", df2.memory_usage().sum(), "bytes, density =", df2.density)
print(type(df2))
print('default_fill_value =', df2.default_fill_value)
print(df2.values)

result = pd.concat([df1, df2], axis=1)
print(type(result))  # seems alright
print('default_fill_value =', result.default_fill_value)  # default fill value is not 0 ???
print(result.values)  # what are the "nan" blocks?
# result.density       # throws an error
# result.memory_usage  # throws an error
And more importantly: does anyone know what's happening here?
This is a known problem, and there is an issue open for it.
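For what it's worth, here is a minimal sketch of how the same check might look on pandas 1.0 or later, where `to_sparse` and `SparseDataFrame` no longer exist and sparsity is carried per column through `pd.SparseDtype`. This is not a fix for the issue above, only an assumption about how the question would be approached with the newer API; the frames are shortened versions of the ones in the question.

```python
import pandas as pd

idx = list("abcdefghijkl")

# Shortened variants of df1/df2 from the question, stored with sparse
# column dtypes (fill_value=0) instead of the removed to_sparse().
df1 = pd.DataFrame({"a": [0, 1] + [0] * 10,
                    "b": [0] * 12,
                    "c": [0] * 4 + [2] + [0] * 7},
                   index=idx).astype(pd.SparseDtype("int64", 0))
df2 = pd.DataFrame({"f": [0] * 4 + [4] + [0] * 7,
                    "g": [0] * 10 + [5, 0]},
                   index=idx).astype(pd.SparseDtype("int64", 0))

result = pd.concat([df1, df2], axis=1)

# Each column keeps its own sparse dtype and fill value.
print(result.dtypes)

# Fraction of stored (non-fill) values, averaged over the columns.
print("density =", result.sparse.density)

# Memory of the sparse columns versus a dense copy of the same data.
sparse_bytes = result.memory_usage(index=False).sum()
dense_bytes = result.sparse.to_dense().memory_usage(index=False).sum()
print("sparse =", sparse_bytes, "bytes, dense =", dense_bytes, "bytes")
```

Because both frames share the same index here, no alignment is needed and no missing values are introduced by the concatenation.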