Changelog
Felipe Lima edited this page Mar 5, 2015 · 23 revisions
- Updates gem dependencies
- Adds `user_agent` and `user_agent_alias` config options to `Wombat.configure`
- Updates gem dependencies
- Adds content-type=text/html header to Mechanize if missing
- Retry page.click on relative links
- Adds ability to crawl a prefetched Mechanize page (thanks to @dsjbirch)
- Added support for hash-based property selectors (e.g. `css: 'header'` instead of `'css=.header'`)
- Updated gem dependencies
- Added header properties (thanks to @kdridi)
- Fixed bug in selectors that used XPath functions like `concat` (thanks to @viniciusdaniel)
- Added proxy settings configuration (thanks to @phortx)
- Fixed minor bug in HTML property locator
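
To make the two selector styles mentioned above easier to compare, here is a minimal sketch that reduces both to the same type and expression pair. `normalize_selector` is a hypothetical helper written for this illustration; it is not part of Wombat and says nothing about how the gem parses selectors internally. The example uses `.header` in both forms so the results can be compared directly.

```ruby
# Hypothetical helper, not part of Wombat: turns either selector style into
# a [type, expression] pair so the two forms can be compared.
def normalize_selector(selector)
  case selector
  when Hash
    # Hash style, e.g. { css: '.header' }: the first pair is type => expression.
    type, expression = selector.first
    [type.to_sym, expression.to_s]
  when String
    # String style, e.g. 'css=.header': split on the first '=' only, so an
    # XPath such as 'xpath=//a[@rel="next"]' keeps its own '=' signs.
    type, expression = selector.split('=', 2)
    raise ArgumentError, "missing '=' in #{selector.inspect}" if expression.nil?

    [type.to_sym, expression]
  else
    raise ArgumentError, "unsupported selector: #{selector.inspect}"
  end
end

string_form = normalize_selector('css=.header')
hash_form   = normalize_selector(css: '.header')
xpath_form  = normalize_selector('xpath=//a[@rel="next"]')

puts string_form == hash_form # both forms describe the same selector
```

Splitting on the first `=` only is a deliberate choice in this sketch: XPath predicates often contain `=` themselves, and the hash style avoids that ambiguity altogether.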
This version contains breaking changes (not backwards compatible). Most notably, iteration that was declared with `for_each` is now specified through the `:iterator` option, and nested blocks no longer take parameters.
- Added syntactic sugar methods `Wombat.scrape` and `Crawler#scrape`, which alias their respective `crawl` implementations;
- Gem internals went through a big refactoring that removed code duplication;
- DSL syntax simplified for nested properties. Now the nested block takes no arguments;
- DSL syntax changed for iterated properties. Iterators are now specified through the `:iterator` option, can be named just like other properties, and are no longer automatically named `iterator#{i}`;
- `Crawler#list_page` is now called `Crawler#path`;
- Added new `:follow` property type that crawls links in pages.
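
As a rough illustration of what "the nested block takes no arguments" and "iterators can be named like other properties" can mean for a DSL, the sketch below builds a property tree with `instance_eval`. `PropertyGroup` and every property name in it are hypothetical; this is a generic Ruby pattern under those assumptions, not Wombat's actual implementation.

```ruby
# Hypothetical mini-DSL, not Wombat's implementation: every unknown method
# call declares a property named after the method.
class PropertyGroup
  attr_reader :properties

  def initialize
    @properties = {}
  end

  def method_missing(name, selector = nil, type = nil, &block)
    entry = { selector: selector, type: type || (block ? :container : :text) }
    if block
      # The nested block takes no arguments: it is evaluated with a fresh
      # group as self, so inner calls declare the child properties.
      group = PropertyGroup.new
      group.instance_eval(&block)
      entry[:children] = group.properties
    end
    @properties[name] = entry
  end
end

root = PropertyGroup.new
root.instance_eval do
  headline 'css=h1'

  # Nested properties: no block parameter is needed.
  links do
    home 'css=a.home'
  end

  # The iterator gets the name chosen by the caller ("repos") instead of an
  # automatically generated one.
  repos 'css=li.repo', :iterator do
    title 'css=h3'
  end
end

puts root.properties.keys.inspect
```

Evaluating the block with the group as `self` is what removes the need for a block argument; the trade-off is that property names which collide with existing methods never reach `method_missing`.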
- Breaking change: `Metadata#format` renamed to `Metadata#document_format` due to method name clash with `Kernel#format`
- Fixed a bug on malformed selectors
- Fixed a bug where multiple calls to `#crawl` did not clean up previously iterated array results and therefore yielded repeated results
- Added utility method `Wombat.crawl` that eliminates the need to have a Ruby class instance to use Wombat. Now you can just call `Wombat.crawl` and start working. The class-based format still works as before.
- Added the ability to provide a block to `Crawler#crawl` and override the default crawler properties for a one-off run (thanks to @danielnc)
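
The changelog does not say how the clash between `Metadata#format` and `Kernel#format` showed up, so the following is only a generic sketch of why such a name is risky in a Ruby DSL. `SettingsRecorder` is a hypothetical class that records unknown method calls as settings; it is not taken from Wombat. Because `Kernel#format` already exists on every object, a bare `format` call never reaches `method_missing`.

```ruby
# Hypothetical stand-in for a DSL object that records unknown method calls
# as settings via method_missing (not Wombat's actual implementation).
class SettingsRecorder
  attr_reader :settings

  def initialize
    @settings = {}
  end

  def method_missing(name, *args)
    @settings[name] = args.first
  end
end

recorder = SettingsRecorder.new
kernel_result = nil

recorder.instance_eval do
  document_format 'xml'           # unknown method: recorded as a setting
  kernel_result = format('xml')   # resolves to Kernel#format (sprintf) instead
end

puts recorder.settings.key?(:document_format) # the renamed setting is recorded
puts recorder.settings.key?(:format)          # the clashing name is not
```

In this sketch the `format` call simply returns the formatted string and leaves the settings untouched, which is the kind of silent misbehavior a rename to `document_format` avoids.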