I can't grab the contents of multiple HTML tags into one file. Here's the last part of my code, which parses the text and prints it to one file, one row for each file it reads:
```perl
my $h = HTTP::Headers->new;
my $p = HTML::HeadParser->new($h);
$p->parse($text);
for ($h->header_field_names) {
    my @values = split ',', $h->header($_);
    if (/keywords/i or /description/i or /title/i) {
        $csv1->print($fh1, \@values);
    #} elsif (/description/i) {
    #    $csv1->print($fh1, \@values);
    #} elsif (/title/i) {
    #    $csv1->print($fh1, \@values);
    }
}
}   # closes the outer per-file loop (not shown)
```
I can get it to write the first tag to the file, but nothing more. I'd like a CSV or tab-delimited row with multiple values.
I made a basic HTML file like this:

```html
<head><keyword>test</keyword> <description>test2</description> <title>test3</title></head>
```

I've tried a few different ways with no luck.
I can extract some of the content but never all of it, and a . in front of it in the HTML file seems to stop it from seeing the content. Real-life HTML files, as opposed to the ones I made, seem to stump it.
Your HTML is invalid: <description> and <keyword> are not valid elements.
```perl
use strict;
use warnings;
use HTML::HeadParser;
use HTTP::Headers;

my $text = <<'EOF';
<head>
<meta charset="utf-8">
<title>foo bar baz</title>
<meta name="description" content="foo">
<meta name="author" content="bar">
</head>
EOF

my $h = HTTP::Headers->new;
my $p = HTML::HeadParser->new($h);
$p->parse($text);
for ($h->header_field_names) {
    printf("%s: %s\n", $_, $h->header($_));
}
```
Output:

```
Title: foo bar baz
X-Meta-Author: bar
X-Meta-Charset: utf-8
X-Meta-Description: foo
```
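For comparison, here is a minimal sketch of the asker's test file rewritten with valid markup. It assumes the keyword and description values belong in <meta name="keywords"> and <meta name="description"> elements, which HTML::HeadParser maps to X-Meta-Keywords and X-Meta-Description headers:

```perl
use strict;
use warnings;
use HTML::HeadParser;
use HTTP::Headers;

# Assumed corrected version of the test file: <keyword> and <description>
# replaced by <meta> elements that HTML::HeadParser understands.
my $text = <<'EOF';
<head>
<title>test3</title>
<meta name="keywords" content="test">
<meta name="description" content="test2">
</head>
EOF

my $h = HTTP::Headers->new;
my $p = HTML::HeadParser->new($h);
$p->parse($text);

# HTTP::Headers field lookups are case-insensitive; '//' supplies an
# empty string for any field the page does not define.
for my $field ('Title', 'X-Meta-Keywords', 'X-Meta-Description') {
    printf("%s: %s\n", $field, $h->header($field) // '');
}
```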
Update

If you want to create a CSV file, use Text::CSV and change the print loop to this:
```perl
use Text::CSV;

my $csv = Text::CSV->new({eol => $/});
my @fields = ('Title', 'X-Meta-Author', 'X-Meta-Description');
$csv->print(*STDOUT, [map { $h->header($_) } @fields]);
```
Which produces:

```
"foo bar baz",bar,foo
```
I'll leave iterating over multiple input files and printing to a different filehandle to you.
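As one possible way to fill in that part, here is a minimal sketch that reads each HTML file named on the command line and writes one tab-delimited row per file to STDOUT. The field list and the helper name head_row are illustrative choices, not part of the original answer; drop the sep_char option to get plain CSV instead:

```perl
use strict;
use warnings;
use IO::Handle;
use HTML::HeadParser;
use HTTP::Headers;
use Text::CSV;

# Fields to extract (illustrative); HTTP::Headers lookups are case-insensitive.
my @fields = ('Title', 'X-Meta-Keywords', 'X-Meta-Description');

# Hypothetical helper: parse the <head> of one HTML document and return
# its values in @fields order, using '' when a field is missing.
sub head_row {
    my ($html) = @_;
    my $h = HTTP::Headers->new;
    my $p = HTML::HeadParser->new($h);
    $p->parse($html);
    $p->eof;
    return [map { scalar($h->header($_)) // '' } @fields];
}

# Tab-delimited output; remove sep_char for comma-separated output.
my $csv = Text::CSV->new({ sep_char => "\t", eol => "\n", binary => 1 })
    or die Text::CSV->error_diag;

# Header row first, then one row per input file.
$csv->print(\*STDOUT, ['File', @fields]);

for my $file (@ARGV) {
    open my $in, '<', $file or die "$file: $!";
    my $html = do { local $/; <$in> };   # slurp the whole file
    close $in;
    $csv->print(\*STDOUT, [$file, @{ head_row($html) }]);
}
```

Run it as `perl heads.pl *.html > heads.tsv` (the script name is hypothetical). The scalar() around $h->header matters: in list context, as inside map, a field the page lacks returns an empty list, which would shift the later columns of that row to the left.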