I want to parse an XML file with the following structure:
<?xml version="1.0" encoding="utf-8"?>
<toplevel fileformat = "config">
 <objectlist objecttype = "type1">
  <o><a>value111</a><b>value121</b><c>value131</c></o>
  <o><a>value112</a><b>value122</b><c>value132</c></o>
  ...
 </objectlist>
 <objectlist objecttype = "type2">
  <o><a>value21</a><b>value22</b><c>value23</c></o>
  ...
 </objectlist>
 <objectlist objecttype = "type3">
  <o><a>value31</a><b>value32</b><c>value33</c></o>
  ...
 </objectlist>
 ...
 <objectlist objecttype = "typen">
  <o><a>valuen1</a><b>valuen2</b><c>valuen3</c></o>
  ...
 </objectlist>
</toplevel>
I need the data from one node, e.g. 'objectlist objecttype = "type3"'. It may not be the node in the 3rd position, so I have to select it based on its name. Finally, the children of that node (a, b, c) should be stored in a data frame.
- How can I retrieve the node?
- How can I extract the child data into a data frame?
Any ideas? Thanks in advance!
Use the XML package to parse the XML:
library(XML)
### load XML
d <- xmlTreeParse("test.xml")
top <- xmlRoot(d)
Use XPath to query for what you need, here the objectlist nodes with the objecttype='type3' attribute:
n <- getNodeSet(top, "//objectlist[@objecttype='type3']")
> n
[[1]]
<objectlist objecttype="type3">
 <o>
  <a>value31</a>
  <b>value32</b>
  <c>value33</c>
 </o>
</objectlist>
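Since the node has to be picked by name rather than by position, the XPath string can be built from the type name. The following is only a sketch: the helper name get_objectlist and the inline sample document are assumptions made for illustration (the document is parsed with xmlParse from a string instead of reading test.xml), and type3 is deliberately placed first to show that position does not matter.

```r
library(XML)

# Hypothetical sample document; type3 is deliberately NOT in third position.
xml_text <- '<toplevel fileformat="config">
  <objectlist objecttype="type3">
    <o><a>value31</a><b>value32</b><c>value33</c></o>
  </objectlist>
  <objectlist objecttype="type1">
    <o><a>value111</a><b>value121</b><c>value131</c></o>
  </objectlist>
</toplevel>'

# Parse the string into an internal document that XPath can query.
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: select objectlist nodes by their objecttype attribute.
# The type name is pasted into the XPath as-is, so it must not contain quotes.
get_objectlist <- function(doc, type) {
  getNodeSet(doc, sprintf("//objectlist[@objecttype='%s']", type))
}

n <- get_objectlist(doc, "type3")
length(n)                         # number of matching objectlist nodes
xmlGetAttr(n[[1]], "objecttype")  # attribute of the first match
```

A type name that does not occur in the file simply yields an empty node set, so it is worth checking length(n) before indexing into it.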
Convert the structure inside each object into a matrix:
m <- lapply(n, function(o) t(sapply(xmlChildren(o), function(x) xmlSApply(x, xmlValue))))
> m
[[1]]
  a         b         c
o "value31" "value32" "value33"
You can combine all of them (i.e. if you have multiple matching objectlist objects) into a data frame:
d <- as.data.frame(do.call("rbind", m))
> d
        a       b       c
o value31 value32 value33
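A possible shortcut, not part of the approach above, is to select the o nodes directly and hand them to xmlToDataFrame from the same XML package. The sketch below assumes that every o element has the same flat set of children (a, b, c), and it uses a small inline document as a stand-in for test.xml.

```r
library(XML)

# Hypothetical inline document standing in for test.xml.
xml_text <- '<?xml version="1.0" encoding="utf-8"?>
<toplevel fileformat="config">
  <objectlist objecttype="type1">
    <o><a>value111</a><b>value121</b><c>value131</c></o>
  </objectlist>
  <objectlist objecttype="type3">
    <o><a>value31</a><b>value32</b><c>value33</c></o>
    <o><a>value34</a><b>value35</b><c>value36</c></o>
  </objectlist>
</toplevel>'

doc <- xmlParse(xml_text, asText = TRUE)

# Select every <o> row inside the objectlist whose objecttype is type3.
rows <- getNodeSet(doc, "//objectlist[@objecttype='type3']/o")

# One data frame row per <o>, one column per child element.
df <- xmlToDataFrame(nodes = rows, stringsAsFactors = FALSE)
df
```

All columns come back as character vectors here, so numeric fields would still need an explicit conversion such as as.numeric().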