Parse XML data in R


I want to parse an XML file with the following structure:

    <?xml version="1.0" encoding="utf-8"?>
    <toplevel fileformat = "config">
        <objectlist objecttype = "type1">
            <o><a>value111</a><b>value121</b><c>value131</c></o>
            <o><a>value112</a><b>value122</b><c>value132</c></o>
            ...
        </objectlist>
        <objectlist objecttype = "type2">
            <o><a>value21</a><b>value22</b><c>value23</c></o>
            ...
        </objectlist>
        <objectlist objecttype = "type3">
            <o><a>value31</a><b>value32</b><c>value33</c></o>
            ...
        </objectlist>
        ...
        <objectlist objecttype = "typen">
            <o><a>valuen1</a><b>valuen2</b><c>valuen3</c></o>
            ...
        </objectlist>
    </toplevel>

I need the data from one node, e.g. 'objectlist objecttype = "type3"'. It may not be the node in the 3rd position, so I have to select it based on its name. Finally, the children of that node (a, b, c) should be stored in a data frame.

  • How can I retrieve the node?
  • How can I extract the child data into a data frame?

Any ideas? Thanks in advance!

Use the XML package to parse the XML:

    library(XML)
    ### load xml
    d <- xmlTreeParse("test.xml")
    top <- xmlRoot(d)

Use an XPath query to get what you need, e.g. all objectlist nodes with the objecttype='type3' attribute:

    n <- getNodeSet(top, "//objectlist[@objecttype='type3']")

    [[1]]
    <objectlist objecttype="type3">
     <o>
      <a>value31</a>
      <b>value32</b>
      <c>value33</c>
     </o>
    </objectlist>
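To illustrate that the attribute predicate selects by value rather than by position, here is a minimal self-contained sketch. The inline document `xml_text` is made up for this example (it mirrors the structure in the question but puts "type3" first), and it uses `xmlParse(..., asText = TRUE)` instead of reading "test.xml" so that it runs without a file.

```r
library(XML)

# Hypothetical sample document: "type3" is deliberately NOT in the third position.
xml_text <- '<?xml version="1.0" encoding="utf-8"?>
<toplevel fileformat="config">
  <objectlist objecttype="type3">
    <o><a>value31</a><b>value32</b><c>value33</c></o>
  </objectlist>
  <objectlist objecttype="type1">
    <o><a>value111</a><b>value121</b><c>value131</c></o>
    <o><a>value112</a><b>value122</b><c>value132</c></o>
  </objectlist>
</toplevel>'

# Parse the string directly into an internal document.
doc <- xmlParse(xml_text, asText = TRUE)

# Select by attribute value; the node's position in the file does not matter.
hits <- getNodeSet(doc, "//objectlist[@objecttype='type3']")

# Inspect the match: how many nodes, which attribute, and the <a> values inside.
n_hits   <- length(hits)
hit_type <- xmlGetAttr(hits[[1]], "objecttype")
a_values <- xpathSApply(doc, "//objectlist[@objecttype='type3']/o/a", xmlValue)
```

If no node carries the requested attribute value, `getNodeSet` returns an empty list, so it is worth checking `length(hits)` before indexing into it.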

Convert the structure inside each object into a matrix:

    m <- lapply(n, function(o)
      t(sapply(xmlChildren(o),
               function(x) xmlSApply(x, xmlValue))))

    > m
    [[1]]
      a         b         c
    o "value31" "value32" "value33"

You can then combine all of them (i.e. if you have multiple matching objectlist objects) into a data frame:

    d <- as.data.frame(do.call("rbind", m))

    > d
            a       b       c
    o value31 value32 value33
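As a possible shortcut, the XML package also provides `xmlToDataFrame`, which can build the data frame directly from the selected `<o>` nodes. The sketch below is an alternative to the matrix step, not the method used in the answer; the inline document `xml_text` is invented for illustration, and it assumes every `<o>` has the same simple children (a, b, c).

```r
library(XML)

# Hypothetical sample document mirroring the structure in the question.
xml_text <- '<toplevel fileformat="config">
  <objectlist objecttype="type1">
    <o><a>value111</a><b>value121</b><c>value131</c></o>
    <o><a>value112</a><b>value122</b><c>value132</c></o>
  </objectlist>
  <objectlist objecttype="type3">
    <o><a>value31</a><b>value32</b><c>value33</c></o>
  </objectlist>
</toplevel>'

doc <- xmlParse(xml_text, asText = TRUE)

# Select the <o> children of the wanted objectlist: one node per future row.
rows <- getNodeSet(doc, "//objectlist[@objecttype='type1']/o")

# One row per <o>, one column per child element; keep values as character.
df <- xmlToDataFrame(nodes = rows, stringsAsFactors = FALSE)
```

All columns come back as character here; convert them afterwards (e.g. with `as.numeric`) if the real data holds numbers.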
