Nokogiri HTML Parsing


Q: How do you parse an HTML document with Nokogiri?

Answer: In a Rails model, follow these steps:

1) Add this line to your Gemfile, then run bundle install.

    gem 'nokogiri'

2) Require these after installing the nokogiri gem in your project.

    require 'open-uri' # only needed when opening a URL; a local file does not need it
    require 'net/http' # not needed for this example; used elsewhere in my project
    require 'uri'      # not needed for this example; used elsewhere in my project
    require 'rubygems' # not needed in a Rails project, where gems are loaded automatically
    require 'nokogiri' # not needed in a Rails project, where gems are loaded automatically

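3) Add this class method to your model.

    def self.html_parser
      # Read the HTML file from the application root.
      html = File.read(Rails.root.join('abc.html'))
      doc = Nokogiri::HTML(html)

      # Each .faci element inside div.facility is one facility entry.
      doc.css('div.facility .faci').each do |facility|
        puts facility.css('.facAltText').text
        # at_css returns nil when nothing matches, so use safe navigation.
        puts facility.at_css('img')&.[](:src)
        puts facility.at_css('.prodLink')&.[](:href)
      end
    end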

4) Call the method from the Rails console:

    ModelName.html_parser