I have a large text file (400 MB) containing data in the following format:
805625228 linked 670103907:0.981545 805829325 linked 901909901:0.981545 803485795 linked 1030404117:0.981545 805865780 linked 811300706:0.981545
id linked id:probability_of_link
... ... .... ... ...
The text file contains millions of such entries, and I have several such text files. As part of analysing the data, I parse it multiple times (each of the text files is in a different format). While parsing and working with the data in Python, I notice memory usage shooting up to 3 GB at times.
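For reference, my parsing is roughly along these lines (a simplified sketch; the file name and the dictionary I build here are placeholders, not my actual code):

    import collections

    # Read the whole file and build an in-memory mapping from
    # id -> list of (linked_id, probability). Holding millions of
    # Python ints/tuples like this is what pushes memory into the GBs.
    links = collections.defaultdict(list)

    with open("links.txt") as f:          # placeholder file name
        for line in f:
            tokens = line.split()
            # tokens come in groups of three: id, "linked", id:probability
            for i in range(0, len(tokens) - 2, 3):
                source = int(tokens[i])
                target, prob = tokens[i + 2].split(":")
                links[source].append((int(target), float(prob)))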
What would be a better approach than dumping the data into text files? Should I store it in a JSON or SQL database, and roughly how much of a performance boost would that give me? What kind of database is best suited to this data?
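For example, is a plain SQLite table along these lines the kind of thing that would help? (This is just a guess at a schema, not something I have set up yet.)

    import sqlite3

    # Rough idea of the SQL route: one row per link, loaded once,
    # then queried instead of re-parsing the text files each time.
    conn = sqlite3.connect("links.db")    # placeholder database name
    conn.execute("""
        CREATE TABLE IF NOT EXISTS links (
            source_id   INTEGER,
            target_id   INTEGER,
            probability REAL
        )
    """)

    def rows(path):
        # Same tokenising as above, yielding one (source, target, prob) per link.
        with open(path) as f:
            for line in f:
                tokens = line.split()
                for i in range(0, len(tokens) - 2, 3):
                    target, prob = tokens[i + 2].split(":")
                    yield int(tokens[i]), int(target), float(prob)

    conn.executemany("INSERT INTO links VALUES (?, ?, ?)", rows("links.txt"))
    conn.commit()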
FYI, the data shown above was produced from structured .csv files containing millions of rows.