Crawler Engine

class engine.CrawlerEngine.CrawlerEngine[source]
class CustomSpider(*a, **kw)[source]
allowed_domains = ['en.wikipedia.org']
config = {'start_urls': 'http://en.wikipedia.org/wiki/Programming_language', 'allowed_domains': 'en.wikipedia.org'}
config_file = <closed file '/home/docs/checkouts/readthedocs.org/user_builds/iosr-crawler/checkouts/latest/src/engine/conf.crawler', mode 'r'>
config_path = '/home/docs/checkouts/readthedocs.org/user_builds/iosr-crawler/checkouts/latest/src/engine/conf.crawler'
crawler
handles_request(request)
log(message, level=10, **kw)

Log the given message at the given log level. Always use this method to send log messages from your spider.

make_requests_from_url(url)
name = 'spider'
parse(response)
static parse_page(response)[source]
parse_start_url(response)
process_results(response, results)
rules = (<scrapy.contrib.spiders.crawl.Rule object at 0x7fa6cf2d3c50>,)
set_crawler(crawler)
settings
start_requests()
start_urls = ['http://en.wikipedia.org/wiki/Programming_language']
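The config attribute above suggests the spider's start_urls and allowed_domains are read from a plain key/value file (conf.crawler). The actual file format is not shown in this documentation; the sketch below assumes simple key = value lines and is purely illustrative:

```python
def parse_crawler_config(text):
    """Parse simple 'key = value' lines into a dict; blank lines and comments are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition('=')
        config[key.strip()] = value.strip()
    return config

sample = """
# hypothetical conf.crawler contents
start_urls = http://en.wikipedia.org/wiki/Programming_language
allowed_domains = en.wikipedia.org
"""
config = parse_crawler_config(sample)
```

Parsed this way, the values come out as plain strings, matching the string-valued config dict shown above.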
CrawlerEngine.add_query(user_id, query)[source]

Add a crawling query for the given user.

Parameters:
  • user_id (int) – ID of the user associated with the query.
  • query (str) – User’s query.
CrawlerEngine.get_urls(query)[source]

Retrieves all URLs associated with the given query from the database.

Returns: list of URLs.
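get_urls maps a query string to the URLs collected for it. A minimal in-memory stand-in illustrating that contract (the real CrawlerEngine reads from a database; the dict-backed storage and the add_url helper here are hypothetical):

```python
class InMemoryUrlStore:
    """Illustrative stand-in: maps a query string to the list of URLs found for it."""

    def __init__(self):
        self._urls = {}

    def add_url(self, query, url):
        """Record a crawled URL under the given query (hypothetical helper)."""
        self._urls.setdefault(query, []).append(url)

    def get_urls(self, query):
        """Retrieve all URLs associated with the given query; empty list if none."""
        return list(self._urls.get(query, []))

store = InMemoryUrlStore()
store.add_url("programming language",
              "http://en.wikipedia.org/wiki/Programming_language")
urls = store.get_urls("programming language")
```

Returning a copy of the stored list keeps callers from mutating the store's internal state.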
CrawlerEngine.get_user_queries(user_id)[source]

Retrieves user queries from the database.

Parameters: user_id (int) – ID of the user associated with the queries.
Returns: list of user queries.
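add_query and get_user_queries form a pair: queries are stored per user and later retrieved by user ID. A minimal in-memory stand-in showing the documented signatures (the real CrawlerEngine persists queries in a database; the dict storage here is purely illustrative):

```python
class InMemoryQueryStore:
    """Illustrative stand-in: maps user_id -> list of that user's query strings."""

    def __init__(self):
        self._queries = {}

    def add_query(self, user_id, query):
        """Add a crawling query for the given user."""
        self._queries.setdefault(user_id, []).append(query)

    def get_user_queries(self, user_id):
        """Retrieve the user's queries; empty list if the user has none."""
        return list(self._queries.get(user_id, []))

store = InMemoryQueryStore()
store.add_query(1, "programming languages")
store.add_query(1, "compilers")
queries = store.get_user_queries(1)
```

A user with no stored queries yields an empty list rather than raising, which keeps the retrieval side simple for callers.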
static CrawlerEngine.notify_agents()[source]

Notifies agents about a new crawling query.

CrawlerEngine.start_crawling()[source]

Notifies all agents and, if the crawling process is not already running, starts it.
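start_crawling combines notify_agents with an idempotent start: agents are always notified, but the crawl is launched only once. A minimal sketch of that notify-then-start-once pattern (the agent callbacks and the crawling flag are assumptions for illustration, not the engine's actual transport):

```python
class CrawlController:
    """Illustrative stand-in for the notify-then-start-once behaviour."""

    def __init__(self):
        self.crawling = False
        self.agents = []        # hypothetical agent-notification callbacks
        self.start_count = 0    # tracks how many times the crawl was actually started

    def notify_agents(self):
        """Notify every registered agent about a new crawling query."""
        for agent in self.agents:
            agent()

    def start_crawling(self):
        """Notify all agents; start the crawl only if it is not already running."""
        self.notify_agents()
        if not self.crawling:
            self.crawling = True
            self.start_count += 1

notified = []
ctl = CrawlController()
ctl.agents.append(lambda: notified.append("new query"))
ctl.start_crawling()
ctl.start_crawling()  # second call notifies again but does not restart the crawl
```

Guarding the start with a flag means repeated calls are safe: every call fans out notifications, while the crawl itself is launched at most once.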