Crawler Engine
class engine.CrawlerEngine.CrawlerEngine[source]

   class CustomSpider(*a, **kw)[source]

      allowed_domains = ['en.wikipedia.org']

      config = {'start_urls': 'http://en.wikipedia.org/wiki/Programming_language', 'allowed_domains': 'en.wikipedia.org'}

      config_file = <closed file '/home/docs/checkouts/readthedocs.org/user_builds/iosr-crawler/checkouts/latest/src/engine/conf.crawler', mode 'r'>

      config_path = '/home/docs/checkouts/readthedocs.org/user_builds/iosr-crawler/checkouts/latest/src/engine/conf.crawler'

      crawler

      handles_request(request)

      log(message, level=10, **kw)
         Log the given messages at the given log level. Always use this
         method to send log messages from your spider (a usage sketch
         follows this listing).
      make_requests_from_url(url)

      name = 'spider'

      parse(response)

      parse_start_url(response)

      process_results(response, results)

      rules = (<scrapy.contrib.spiders.crawl.Rule object at 0x7fa6cf2d3c50>,)

      set_crawler(crawler)

      settings

      start_requests()

      start_urls = ['http://en.wikipedia.org/wiki/Programming_language']
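The attribute values above are read from conf.crawler at documentation build
time. What follows is a minimal sketch of an equivalent CrawlSpider
definition, assuming a pre-1.0 Scrapy layout (scrapy.contrib, matching the
Rule object shown in rules); the link-extractor pattern and the parse_item
callback are illustrative assumptions, not the project's actual source::

   import logging

   from scrapy.contrib.spiders import CrawlSpider, Rule
   from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

   class CustomSpider(CrawlSpider):
       """Sketch of a spider with the attributes listed above."""

       name = 'spider'
       allowed_domains = ['en.wikipedia.org']
       start_urls = ['http://en.wikipedia.org/wiki/Programming_language']

       # One Rule, as in the rules tuple above; the allow pattern and
       # the callback name are assumptions made for illustration.
       rules = (
           Rule(SgmlLinkExtractor(allow=r'/wiki/'),
                callback='parse_item', follow=True),
       )

       def parse_item(self, response):
           # Use Spider.log (see log above) rather than print();
           # the default level=10 corresponds to logging.DEBUG.
           self.log("Visited %s" % response.url, level=logging.INFO)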
   CrawlerEngine.get_urls(query)[source]
      Retrieves all URLs associated with the given query from the database.

      Returns: list of URLs.
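A hypothetical usage sketch; the CrawlerEngine constructor arguments and the
underlying database are not documented here, so only the call shape of
get_urls is shown::

   from engine.CrawlerEngine import CrawlerEngine

   engine = CrawlerEngine()  # assumed no-argument construction
   urls = engine.get_urls('programming language')
   for url in urls:
       print(url)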
class