python - Concat DataFrame Reindexing only valid with uniquely valued Index objects -


i trying concat following:

df1

    price   side    timestamp timestamp            2016-01-04 00:01:15.631331072   0.7286  2   1451865675631331 2016-01-04 00:01:15.631399936   0.7286  2   1451865675631400 2016-01-04 00:01:15.631860992   0.7286  2   1451865675631861 2016-01-04 00:01:15.631866112   0.7286  2   1451865675631866 

and

df2

    bid bid_size    offer   offer_size timestamp                2016-01-04 00:00:31.331441920   0.7284  4000000 0.7285  1000000 2016-01-04 00:00:53.631324928   0.7284  4000000 0.7290  4000000 2016-01-04 00:01:03.131234048   0.7284  5000000 0.7286  4000000 2016-01-04 00:01:12.131444992   0.7285  1000000 0.7286  4000000 2016-01-04 00:01:15.631364096   0.7285  4000000 0.7290  4000000 

with

 data = pd.concat([df1,df2], axis=1)   

but follwing output:

invalidindexerror                         traceback (most recent call last) <ipython-input-38-2e88458f01d7> in <module>() ----> 1 data = pd.concat([df1,df2], axis=1)       2 data = data.fillna(method='pad')       3 data = data.fillna(method='bfill')       4 data['timestamp'] =  data.index.values#converting datetime       5 data['timestamp'] = pd.to_datetime(data['timestamp'])#converting datetime  /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)     810                        keys=keys, levels=levels, names=names,     811                        verify_integrity=verify_integrity, --> 812                        copy=copy)     813     return op.get_result()     814   /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)     947         self.copy = copy     948  --> 949         self.new_axes = self._get_new_axes()     950      951     def get_result(self):  /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_new_axes(self)    1013                 if == self.axis:    1014                     continue -> 1015                 new_axes[i] = self._get_comb_axis(i)    1016         else:    1017             if len(self.join_axes) != ndim - 1:  /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_comb_axis(self, i)    1039                 raise typeerror("cannot concatenate list of %s" % types)    1040  -> 1041         return _get_combined_index(all_indexes, intersect=self.intersect)    1042     1043     def _get_concat_axis(self):  /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in _get_combined_index(indexes, intersect)    6120             index = index.intersection(other)    6121         return index -> 6122     union = _union_indexes(indexes)    6123     return _ensure_index(union)    6124   /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in _union_indexes(indexes)    6149     6150         if hasattr(result, 'union_many'): -> 6151             return result.union_many(indexes[1:])    6152         else:    6153             other in indexes[1:]:  /usr/local/lib/python2.7/site-packages/pandas/tseries/index.pyc in union_many(self, others)     959             else:     960                 tz = this.tz --> 961                 = index.union(this, other)     962                 if isinstance(this, datetimeindex):     963                     this.tz = tz  /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in union(self, other)    1553                 result.extend([x x in other._values if x not in value_set])    1554         else: -> 1555             indexer = self.get_indexer(other)    1556             indexer, = (indexer == -1).nonzero()    1557   /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in get_indexer(self, target, method, limit, tolerance)    1890     1891         if not self.is_unique: -> 1892             raise invalidindexerror('reindexing valid uniquely'    1893                                     ' valued index objects')    1894   invalidindexerror: reindexing valid uniquely valued index objects   

i have removed additional columns , removed duplicates , na there conflict - not know whats wrong.

please thanks

pd.concat requires indices unique. remove rows duplicate indices, use

df = df.loc[~df.index.duplicated(keep='first')] 

import pandas pd pandas import timestamp  df1 = pd.dataframe(     {'price': [0.7286, 0.7286, 0.7286, 0.7286],      'side': [2, 2, 2, 2],      'timestamp': [1451865675631331, 1451865675631400,                   1451865675631861, 1451865675631866]},     index=pd.datetimeindex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))   df2 = pd.dataframe(     {'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285],      'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000],      'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729],      'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]},     index=pd.datetimeindex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))   df1 = df1.loc[~df1.index.duplicated(keep='first')] #              price  side         timestamp # 2000-01-01  0.7286     2  1451865675631331 # 2001-01-01  0.7286     2  1451865675631861 # 2002-01-01  0.7286     2  1451865675631866  df2 = df2.loc[~df2.index.duplicated(keep='first')] #                bid  bid_size   offer  offer_size # 2000-01-01  0.7284   4000000  0.7285     1000000 # 2001-01-01  0.7284   4000000  0.7290     4000000 # 2002-01-01  0.7284   5000000  0.7286     4000000 # 2003-01-01  0.7285   1000000  0.7286     4000000 # 2004-01-01  0.7285   4000000  0.7290     4000000  result = pd.concat([df1, df2], axis=0) print(result)                bid  bid_size   offer  offer_size   price  side     timestamp 2000-01-01     nan       nan     nan         nan  0.7286     2  1.451866e+15 2001-01-01     nan       nan     nan         nan  0.7286     2  1.451866e+15 2002-01-01     nan       nan     nan         nan  0.7286     2  1.451866e+15 2000-01-01  0.7284   4000000  0.7285     1000000     nan   nan           nan 2001-01-01  0.7284   4000000  0.7290     4000000     nan   nan           nan 2002-01-01  0.7284   5000000  0.7286     4000000     nan   nan           nan 2003-01-01  0.7285   1000000  0.7286     4000000     nan   nan           nan 2004-01-01  0.7285   4000000  0.7290     4000000     nan   nan           nan 

note there pd.join, can join dataframes based on indices, , handle non-unique indices based on how parameter. rows duplicate index not removed.

in [94]: df1.join(df2) out[94]:               price  side         timestamp     bid  bid_size   offer  \ 2000-01-01  0.7286     2  1451865675631331  0.7284   4000000  0.7285    2000-01-01  0.7286     2  1451865675631400  0.7284   4000000  0.7285    2001-01-01  0.7286     2  1451865675631861  0.7284   4000000  0.7290    2002-01-01  0.7286     2  1451865675631866  0.7284   5000000  0.7286                 offer_size   2000-01-01     1000000   2000-01-01     1000000   2001-01-01     4000000   2002-01-01     4000000    in [95]: df1.join(df2, how='outer') out[95]:               price  side     timestamp     bid  bid_size   offer  offer_size 2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000 2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000 2001-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7290     4000000 2002-01-01  0.7286     2  1.451866e+15  0.7284   5000000  0.7286     4000000 2003-01-01     nan   nan           nan  0.7285   1000000  0.7286     4000000 2004-01-01     nan   nan           nan  0.7285   4000000  0.7290     4000000 

Comments