Extractor

class nlp.extractor.NLPExtractor[source]
build_stop_word_regex()[source]

Creates stop word regex.

Returns:stop word pattern.
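
The page does not show how the pattern is assembled. A minimal sketch, assuming the stop words arrive as a list and are joined into one case-insensitive alternation; `build_stop_word_regex_sketch` is a hypothetical standalone helper, not the method itself, and it returns a compiled pattern, which may differ from what the method returns:

```python
import re

def build_stop_word_regex_sketch(stop_words):
    """Join stop words into a single alternation (hypothetical helper)."""
    # \b keeps "a" from matching inside "data"; re.escape guards special characters.
    alternatives = [r"\b" + re.escape(word) + r"\b" for word in stop_words]
    return re.compile("|".join(alternatives), re.IGNORECASE)

pattern = build_stop_word_regex_sketch(["a", "of", "the"])
# Splitting on the pattern leaves the text found between stop words.
pieces = [p.strip() for p in pattern.split("the scoring of a data set") if p.strip()]
print(pieces)
```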
static calculate_word_scores(phrase_list)[source]

Calculates word scores based on their frequency and degree.

Parameters:phrase_list (list) – List of phrases to be processed.
Returns:mapping between word and its score.
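
The exact formula is not given here. The sketch below assumes a RAKE-style score, (degree + frequency) / frequency, where a word's degree counts the other words it shares a phrase with; both the formula and the name `word_scores_sketch` are assumptions made for illustration:

```python
from collections import defaultdict

def word_scores_sketch(phrase_list):
    """Score each word from its frequency and degree (assumed formula)."""
    frequency = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrase_list:
        words = phrase.split()
        for word in words:
            frequency[word] += 1
            # Degree grows by the number of other words in the same phrase.
            degree[word] += len(words) - 1
    return {w: (degree[w] + frequency[w]) / frequency[w] for w in frequency}

scores = word_scores_sketch(["keyword extraction", "extraction"])
print(scores)
```

Under this assumption, a word that mostly appears inside longer phrases scores higher than one that mostly appears alone.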
static generate_candidate_keyword_scores(phrase_list, word_score)[source]

Generates scores for candidate keywords.

Parameters:
  • phrase_list (list) – List of phrases to be processed.
  • word_score (map) – Mapping between word and its score.
Returns:

mapping between phrases and their scores.
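
One plausible way to combine the two inputs, assuming a phrase's score is the sum of the scores of its words (the page does not confirm this); `phrase_scores_sketch` is a hypothetical name:

```python
def phrase_scores_sketch(phrase_list, word_score):
    """Map each phrase to the sum of its word scores (assumed rule)."""
    return {
        phrase: sum(word_score.get(word, 0) for word in phrase.split())
        for phrase in phrase_list
    }

word_score = {"keyword": 2.0, "extraction": 1.5}
phrase_scores = phrase_scores_sketch(["keyword extraction", "extraction"], word_score)
print(phrase_scores)
```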

static generate_candidate_keywords(sentence_list, stopword_pattern)[source]

Generates a list of keyword candidates.

Parameters:
  • sentence_list (list) – List of sentences to be processed.
  • stopword_pattern (str) – Stop words pattern.
Returns:

list of candidate keywords.
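
A minimal sketch of how candidates could be produced, assuming each sentence is split wherever the stop word pattern matches and the remaining chunks become candidates; the function name and the sample pattern are illustrative only:

```python
import re

def candidate_keywords_sketch(sentence_list, stopword_pattern):
    """Split sentences on stop words; keep the non-empty chunks (assumed behavior)."""
    phrases = []
    for sentence in sentence_list:
        for chunk in re.split(stopword_pattern, sentence, flags=re.IGNORECASE):
            phrase = chunk.strip().lower()
            if phrase:
                phrases.append(phrase)
    return phrases

pattern = r"\b(?:is|a|for)\b"  # sample pattern, not the library's stop word list
candidates = candidate_keywords_sketch(["Python is a language for text mining"], pattern)
print(candidates)
```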

static is_number(word)[source]

Checks whether word is a number.

Parameters:word (str) – Word to be checked.
Returns:True if word is a number, False otherwise.
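
A check like this is commonly written by attempting a numeric conversion. The sketch below assumes that approach; `is_number_sketch` is a hypothetical stand-in, and the real method may treat edge cases differently:

```python
def is_number_sketch(word):
    """Return True if the string parses as a float (assumed definition of 'number')."""
    try:
        float(word)
        return True
    except ValueError:
        return False

print(is_number_sketch("3.14"), is_number_sketch("word"))
```

Note that `float` also accepts strings such as "1e5" and "nan", so a stricter definition would need extra checks.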
load_stop_words()[source]

Utility function to load stop words from a file and return as a list of words.

Returns:A list of stop words.
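
The file location and format are not documented on this page. As a sketch only, assuming one stop word per line with `#` marking comment lines, and taking the path as an argument (the real method takes none):

```python
import os
import tempfile

def load_stop_words_sketch(path):
    """Read one stop word per line, skipping blanks and '#' comments (assumed format)."""
    stop_words = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if word and not word.startswith("#"):
                stop_words.append(word)
    return stop_words

# Write a small sample file so the sketch can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("# sample list\nthe\nof\n\nand\n")
loaded = load_stop_words_sketch(tmp.name)
os.remove(tmp.name)
print(loaded)
```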
run(text)[source]

Extracts keywords from the text.

Parameters:text (str) – Text to be processed.
Returns:list of keywords.
static separate_words(text, min_word_return_size)[source]

Utility function to return a list of all words that have a length greater than a specified number of characters.

Parameters:
  • text (str) – The text that must be split into words.
  • min_word_return_size (int) – The minimum number of characters a word must have to be included.
Returns:

list of separated words.
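
A minimal sketch of the filtering described above, assuming words are split on non-word characters and lowercased; the summary says "greater than", so the sketch uses a strict comparison, which may not match the parameter description exactly:

```python
import re

def separate_words_sketch(text, min_word_return_size):
    """Return lowercased words longer than min_word_return_size (assumed tokenization)."""
    tokens = re.split(r"\W+", text)
    return [token.lower() for token in tokens if len(token) > min_word_return_size]

words = separate_words_sketch("Fast, simple keyword scoring", 4)
print(words)
```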

static split_sentences(text)[source]

Utility function to return a list of sentences.

Parameters:text (str) – The text that must be split into sentences.
Returns:List of sentences produced by the split.
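
The delimiters used for splitting are not listed on this page. The sketch below assumes common punctuation marks and newlines act as boundaries; `split_sentences_sketch` and its delimiter set are illustrative choices:

```python
import re

def split_sentences_sketch(text):
    """Split text on an assumed set of sentence delimiters and drop empty pieces."""
    parts = re.split(r"[.!?,;:\n]+", text)
    return [part.strip() for part in parts if part.strip()]

sentences = split_sentences_sketch("Load the text. Split it; score the phrases!")
print(sentences)
```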