Crawler Engine
-
class engine.CrawlerEngine.CrawlerEngine
-
class CustomSpider(*a, **kw)
-
allowed_domains = ['en.wikipedia.org']
-
config = {'start_urls': 'http://en.wikipedia.org/wiki/Programming_language', 'allowed_domains': 'en.wikipedia.org'}
-
config_file = <closed file '/home/docs/checkouts/readthedocs.org/user_builds/iosr-crawler/checkouts/latest/src/engine/conf.crawler', mode 'r'>
-
config_path = '/home/docs/checkouts/readthedocs.org/user_builds/iosr-crawler/checkouts/latest/src/engine/conf.crawler'
-
crawler
-
handles_request(request)
-
log(message, level=10, **kw) Log the given message at the given log level. Always use this method to send log messages from your spider.
-
make_requests_from_url(url)
-
name = 'spider'
-
parse(response)
-
parse_start_url(response)
-
process_results(response, results)
-
rules = (<scrapy.contrib.spiders.crawl.Rule object at 0x7fa6cf2d3c50>,)
-
set_crawler(crawler)
-
settings
-
start_requests()
-
start_urls = ['http://en.wikipedia.org/wiki/Programming_language']
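The allowed_domains and start_urls values listed for CustomSpider suggest that the crawl starts at one page and stays on one host. The following is a minimal sketch of such a host check, written for Python 3. It is not the project's or Scrapy's implementation: the helper name is_allowed is hypothetical, and treating subdomains of an allowed domain as allowed is an assumption.

```python
from urllib.parse import urlparse

# Values copied from the attribute listing above.
ALLOWED_DOMAINS = ['en.wikipedia.org']
START_URLS = ['http://en.wikipedia.org/wiki/Programming_language']


def is_allowed(url, allowed_domains=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one.

    Hypothetical helper for illustration only.
    """
    # hostname is None for strings that do not parse as URLs with a host.
    host = (urlparse(url).hostname or '').lower()
    return any(host == d or host.endswith('.' + d) for d in allowed_domains)
```

With these values, the start URL passes the check, while a link to another Wikipedia language edition such as de.wikipedia.org would be rejected.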
-
-
CrawlerEngine.get_urls(query) Retrieves all URLs associated with the given query from the database.
Returns: list of URLs.
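The lookup described for get_urls can be sketched as a standalone function. The database backend and schema are not documented here, so everything below is an assumption: SQLite stands in for the real database, the table results(query, url) is hypothetical, and the function takes an explicit connection instead of being a CrawlerEngine method. The sample rows are made up for illustration.

```python
import sqlite3


def get_urls(connection, query):
    """Return every URL stored for `query` as a list of strings.

    Assumes a hypothetical table results(query TEXT, url TEXT).
    """
    rows = connection.execute(
        "SELECT url FROM results WHERE query = ? ORDER BY rowid",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]


# Usage with an in-memory database and illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (query TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [
        ("python", "http://en.wikipedia.org/wiki/Python_(programming_language)"),
        ("python", "http://en.wikipedia.org/wiki/Programming_language"),
        ("haskell", "http://en.wikipedia.org/wiki/Haskell_(programming_language)"),
    ],
)
urls = get_urls(conn, "python")
```

Passing the query as a bound parameter keeps user-supplied text out of the SQL string, and a query with no stored results yields an empty list.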
-
class