Background: I am in the process of annotating SNPs from a GWAS in an organism without annotation. I am using the chained tBLASTn table from UCSC along with BioMart to map each SNP to its probable gene(s).
I have a data frame that looks like this:
snp             hu_mrna    gene
chr1.111642529  nm_002107  h3f3a
chr1.111642529  nm_005324  h3f3b
chr1.111801684  bc098118   <na>
chr1.111925084  nm_020435  gjc2
chr1.11801605   ak027740   <na>
chr1.11801605   nm_032849  c13orf33
chr1.151220354  nm_018913  pcdhga10
chr1.151220354  nm_018918  pcdhga5
What I want to end up with is a single row for each SNP, with the genes and hu_mrnas comma-delimited. Here is what I am after:
snp             hu_mrna              gene
chr1.111642529  nm_002107,nm_005324  h3f3a
chr1.111801684  bc098118,nm_020435   gjc2
chr1.11801605   ak027740,nm_032849   c13orf33
chr1.151220354  nm_018913,nm_018918  pcdhga10,pcdhga5
Now, I know I can do this with a flick of the wrist in Perl, but I want to do it in R. Any suggestions?
You can use aggregate with paste for each column, and merge the results at the end:
# Example data, as produced by dput()
x <- structure(list(
  snp = structure(c(1L, 1L, 2L, 3L, 4L, 4L, 5L, 5L),
    .Label = c("chr1.111642529", "chr1.111801684", "chr1.111925084",
               "chr1.11801605", "chr1.151220354"),
    class = "factor"),
  hu_mrna = structure(c(3L, 4L, 2L, 7L, 1L, 8L, 5L, 6L),
    .Label = c("ak027740", "bc098118", "nm_002107", "nm_005324",
               "nm_018913", "nm_018918", "nm_020435", "nm_032849"),
    class = "factor"),
  gene = structure(c(4L, 5L, 1L, 3L, 1L, 2L, 6L, 7L),
    .Label = c("<na>", "c13orf33", "gjc2", "h3f3a", "h3f3b",
               "pcdhga10", "pcdhga5"),
    class = "factor")),
  .Names = c("snp", "hu_mrna", "gene"),
  class = "data.frame",
  row.names = c(NA, -8L))

# collapse = "," joins all values within a group into one string;
# sep only separates multiple arguments and would not collapse a single vector.
a1 <- aggregate(hu_mrna ~ snp, data = x, paste, collapse = ",")
a2 <- aggregate(gene ~ snp, data = x, paste, collapse = ",")

# Combine the two per-SNP summaries on their shared snp column
merge(a1, a2)

             snp             hu_mrna             gene
1 chr1.111642529 nm_002107,nm_005324      h3f3a,h3f3b
2 chr1.111801684            bc098118             <na>
3 chr1.111925084           nm_020435             gjc2
4  chr1.11801605  ak027740,nm_032849    <na>,c13orf33
5 chr1.151220354 nm_018913,nm_018918 pcdhga10,pcdhga5
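The same idea can probably be done in one aggregate call by using its data frame method on both columns at once, which avoids the separate merge step. The sketch below is only an illustration, not part of the answer above: it rebuilds the example data as plain character columns, and collapse_values is a hypothetical helper name. It also assumes that the literal "<na>" string is just a placeholder that should be dropped from the joined output, which the question's desired result suggests but does not state.

```r
# Toy re-creation of the question's data as plain character columns
x <- data.frame(
  snp     = c("chr1.111642529", "chr1.111642529", "chr1.111801684",
              "chr1.111925084", "chr1.11801605", "chr1.11801605",
              "chr1.151220354", "chr1.151220354"),
  hu_mrna = c("nm_002107", "nm_005324", "bc098118", "nm_020435",
              "ak027740", "nm_032849", "nm_018913", "nm_018918"),
  gene    = c("h3f3a", "h3f3b", "<na>", "gjc2",
              "<na>", "c13orf33", "pcdhga10", "pcdhga5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: de-duplicate, drop missing values and the "<na>"
# placeholder (an assumption about the data), then join with commas.
# Returns NA when nothing is left for a group.
collapse_values <- function(v) {
  v <- unique(as.character(v))
  v <- v[!is.na(v) & v != "<na>"]
  if (length(v) == 0) NA_character_ else paste(v, collapse = ",")
}

# One call summarises both columns per SNP; no merge needed
res <- aggregate(x[c("hu_mrna", "gene")],
                 by = list(snp = x$snp),
                 FUN = collapse_values)
print(res)
```

Whether dropping the placeholder is appropriate depends on how the missing genes should be reported downstream; removing the filtering line in collapse_values keeps them in the joined string.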