HTML tag parsing and writing to a file with Perl


I can't grab the content of multiple HTML tags into one file. Here's the last part of the code, which parses the text and prints it to one file, one row for each file it reads:

    my $h = HTTP::Headers->new;
    my $p = HTML::HeadParser->new($h);
    $p->parse($text);

    for ($h->header_field_names) {
        my @values = split ',', $h->header($_);
        if (/keywords/i , /description/i , /title/i) {
            $csv1->print($fh1, \@values);
        #} elsif (/description/i) {
        #    $csv1->print($fh1, \@values);
        #} elsif (/title/i) {
        #    $csv1->print($fh1, \@values);
        }
    }
    }

I can write the first tag to the file and nothing more. I'd like a CSV or tab-delimited row with multiple values.

I made basic HTML files like this:

<head><keyword>test</keyword> <description>test2</description> <title>test3</title></head> 

I've tried a few different ways with no luck.

I can extract some of the content, but never all of it. Something earlier in the HTML file seems to cause it not to see the content. Real-life HTML files, as opposed to the ones I made, seem to stump it.

Your HTML is invalid. (<description> and <keyword> are not valid elements.)

    use strict;
    use warnings;

    use HTML::HeadParser;
    use HTTP::Headers;

    my $text = <<'EOF';
    <head>
      <meta charset="utf-8">
      <title>foo bar baz</title>
      <meta name="description" content="foo">
      <meta name="author" content="bar">
    </head>
    EOF

    my $h = HTTP::Headers->new;
    my $p = HTML::HeadParser->new($h);
    $p->parse($text);

    for ($h->header_field_names) {
        printf("%s: %s\n", $_, $h->header($_));
    }

Output:

    Title: foo bar baz
    X-Meta-Author: bar
    X-Meta-Charset: utf-8
    X-Meta-Description: foo

Update

If you want to create a CSV file, use Text::CSV and change the print loop to this:

    my $csv = Text::CSV->new({eol => $/});
    my @fields = ('Title', 'X-Meta-Author', 'X-Meta-Description');
    $csv->print(*STDOUT, [map { $h->header($_) } @fields]);

Which produces:

"foo bar baz",bar,foo 

I'll leave the part about iterating over multiple input files and printing to a different filehandle to you.
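For anyone who wants a starting point for that remaining part, here is a minimal sketch, not a definitive implementation. The helper names `head_row` and `write_rows` are made up for illustration, the input files are assumed to come from the command line, and encoding handling is left out. It wraps each header lookup in `scalar` so that every file yields exactly one column per field, even when a header is missing or repeated.

```perl
use strict;
use warnings;

use HTML::HeadParser;
use HTTP::Headers;
use Text::CSV;

# Header names as HTML::HeadParser reports them for <title> and <meta name="...">.
my @fields = ('Title', 'X-Meta-Author', 'X-Meta-Description');

# Hypothetical helper: parse one HTML string and return an array ref
# holding one value per entry in @fields.
sub head_row {
    my ($html) = @_;
    my $h = HTTP::Headers->new;
    my $p = HTML::HeadParser->new($h);
    $p->parse($html);
    # scalar() forces a single value per field, so the columns stay aligned;
    # a missing header becomes undef, which Text::CSV writes as an empty field.
    return [ map { scalar $h->header($_) } @fields ];
}

# Hypothetical helper: write one CSV row per input file to the given filehandle.
sub write_rows {
    my ($out, @files) = @_;
    my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

    for my $file (@files) {
        my $in;
        unless (open $in, '<', $file) {
            warn "Skipping $file: $!\n";
            next;
        }
        my $html = do { local $/; <$in> };   # slurp the whole file
        close $in;
        $csv->print($out, head_row($html));
    }
}

# Assumed usage: perl script.pl page1.html page2.html > out.csv
write_rows(\*STDOUT, @ARGV);
```

To write somewhere other than STDOUT, open a filehandle with `open my $fh, '>', $name` and pass `$fh` as the first argument. For tab-delimited output, adding `sep_char => "\t"` to the `Text::CSV->new` options should be enough.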

