reshape - Unique rows with multiple comma-separated entries in R


Background: I am in the process of annotating SNPs from a GWAS in an organism without annotation. I am using the chained tBLASTn table from UCSC along with BioMart to map each SNP to its probable gene(s).

I have a data frame that looks like this:

    snp            hu_mrna   gene
    chr1.111642529 nm_002107 h3f3a
    chr1.111642529 nm_005324 h3f3b
    chr1.111801684 bc098118  <na>
    chr1.111925084 nm_020435 gjc2
    chr1.11801605  ak027740  <na>
    chr1.11801605  nm_032849 c13orf33
    chr1.151220354 nm_018913 pcdhga10
    chr1.151220354 nm_018918 pcdhga5

What I want to end up with is a single row for each SNP, with the genes and hu_mrnas comma-delimited. Here is what I am after:

    snp            hu_mrna             gene
    chr1.111642529 nm_002107,nm_005324 h3f3a
    chr1.111801684 bc098118,nm_020435  gjc2
    chr1.11801605  ak027740,nm_032849  c13orf33
    chr1.151220354 nm_018913,nm_018918 pcdhga10,pcdhga5

Now, I know I can do this with a flick of the wrist in Perl, but I want to do it in R. Any suggestions?

You can use aggregate to paste each one, and merge at the end:

    # Example data (dput output of the data frame from the question)
    x <- structure(list(
      snp = structure(c(1L, 1L, 2L, 3L, 4L, 4L, 5L, 5L),
        .Label = c("chr1.111642529", "chr1.111801684", "chr1.111925084",
                   "chr1.11801605", "chr1.151220354"), class = "factor"),
      hu_mrna = structure(c(3L, 4L, 2L, 7L, 1L, 8L, 5L, 6L),
        .Label = c("ak027740", "bc098118", "nm_002107", "nm_005324",
                   "nm_018913", "nm_018918", "nm_020435", "nm_032849"),
        class = "factor"),
      gene = structure(c(4L, 5L, 1L, 3L, 1L, 2L, 6L, 7L),
        .Label = c("<na>", "c13orf33", "gjc2", "h3f3a", "h3f3b",
                   "pcdhga10", "pcdhga5"), class = "factor")),
      .Names = c("snp", "hu_mrna", "gene"), class = "data.frame",
      row.names = c(NA, -8L))

    # collapse = "," (rather than sep = ",") joins the values of each
    # group into a single comma-delimited string
    a1 <- aggregate(hu_mrna ~ snp, data = x, paste, collapse = ",")
    a2 <- aggregate(gene ~ snp, data = x, paste, collapse = ",")
    merge(a1, a2)

This gives:

                 snp             hu_mrna             gene
    1 chr1.111642529 nm_002107,nm_005324      h3f3a,h3f3b
    2 chr1.111801684            bc098118             <na>
    3 chr1.111925084           nm_020435             gjc2
    4  chr1.11801605  ak027740,nm_032849    <na>,c13orf33
    5 chr1.151220354 nm_018913,nm_018918 pcdhga10,pcdhga5
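One caveat with the approach above: the formula interface of aggregate uses na.action = na.omit by default, so if missing genes are stored as real NA values (rather than the string "<na>"), those rows are dropped and merge would then lose any SNP that has no gene at all. A minimal sketch of a single-call alternative, assuming real NA values; the toy data frame `snps` and the helper `collapse_unique` are hypothetical names made up for this illustration, not part of the question:

```r
# Hypothetical toy data in the same shape as the question's table;
# missing genes are real NA values here, not the string "<na>".
snps <- data.frame(
  snp     = c("chr1.111642529", "chr1.111642529", "chr1.111801684",
              "chr1.11801605", "chr1.11801605"),
  hu_mrna = c("nm_002107", "nm_005324", "bc098118",
              "ak027740", "nm_032849"),
  gene    = c("h3f3a", "h3f3b", NA, NA, "c13orf33"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop NAs and duplicates, then join what is left
# into one comma-delimited string (NA if nothing is left).
collapse_unique <- function(v) {
  v <- unique(as.character(v[!is.na(v)]))
  if (length(v) == 0) NA_character_ else paste(v, collapse = ",")
}

# The data-frame interface of aggregate handles both columns in one
# call and does not drop rows that have NA in the aggregated columns.
res <- aggregate(snps[c("hu_mrna", "gene")], by = snps["snp"],
                 FUN = collapse_unique)
res
```

Because both columns are aggregated in one call, no merge step is needed, and a SNP whose genes are all missing keeps its row with NA in the gene column.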
