venv/lib/python3.7/site-packages/nltk/__pycache__/text.cpython-37.pyc

B
D(<28>]<5D>j<00>	@srdZddlmZmZmZmZddlmZddlm	Z	m
Z
mZddlm
Z
ddlZddlZddlmZddlmZdd	lmZdd
lmZddlmZddlmZmZdd
lmZmZddl m!Z!ddl"m#Z#ddl$m%Z%eddddddddg<07>Z&Gdd<1A>de'<27>Z(e#Gdd<1C>de'<27><03>Z)Gdd<1E>de'<27>Z*e#Gdd <20>d e'<27><03>Z+Gd!d"<22>d"e+<2B>Z,d#d$<24>Z-e.d%k<02>r`e-<2D>dddd d"gZ/dS)&a
This module brings together a variety of NLTK functionality for
text analysis, and provides simple, interactive interfaces.
Functionality includes: concordancing, collocation discovery,
regular expression search over tokenized strings, and
distributional similarity.
<EFBFBD>)<04>print_function<6F>division<6F>unicode_literals<6C>absolute_import)<01>log)<03>defaultdict<63>Counter<65>
namedtuple)<01>reduceN)<01>	text_type)<01>MLE)<01>padded_everygram_pipeline)<01>FreqDist)<01>ConditionalFreqDist)<02>	tokenwrap<61>LazyConcatenation)<02>	f_measure<72>BigramAssocMeasures)<01>BigramCollocationFinder)<01>python_2_unicode_compatible)<01>
sent_tokenize<7A>ConcordanceLine<6E>left<66>query<72>right<68>offset<65>
left_print<EFBFBD>right_print<6E>linec@sTeZdZdZedd<03><00>Zdddd<06>fdd<08>Zd	d
<EFBFBD>Zdd<0C>Zddd<0F>Z	ddd<12>Z
dS)<15>ContextIndexa
    A bidirectional index between words and their 'contexts' in a text.
    The context of a word is usually defined to be the words that occur
    in a fixed window around the word; but other definitions may also
    be used by providing a custom context function.
    cCsH|dkr||d<00><00>nd}|t|<00>dkr<||d<00><00>nd}||fS)z;One left token and one right token, normalized to lowercaser<00>z*START*z*END*)<02>lower<65>len)<04>tokens<6E>irr<00>r%<00>+/tmp/pip-install-4m6m_5d_/nltk/nltk/text.py<70>_default_context2s$zContextIndex._default_contextNcCs|S)Nr%)<01>xr%r%r&<00><lambda>9<00>zContextIndex.<lambda>csv|<04>_<00><02>_|r|<02>_n<08>j<03>_<02>r6<72>fdd<02><08>D<00><01>t<04><01>fdd<04>t<05><02>D<00><01><01>_t<04><01>fdd<04>t<05><02>D<00><01><01>_dS)Ncsg|]}<01>|<01>r|<01>qSr%r%)<02>.0<EFBFBD>t)<01>filterr%r&<00>
<listcomp>Asz)ContextIndex.__init__.<locals>.<listcomp>c3s(|] \}}<02><00>|<02><01><00><01>|<01>fVqdS)N)<02>_key<65>
_context_func)r+r$<00>w)<02>selfr#r%r&<00>	<genexpr>Csz(ContextIndex.__init__.<locals>.<genexpr>c3s(|] \}}<02><00><00>|<01><02><00>|<02>fVqdS)N)r0r/)r+r$r1)r2r#r%r&r3Fs)r/<00>_tokensr0r'<00>CFD<46>	enumerate<74>_word_to_contexts<74>_context_to_words)r2r#Zcontext_funcr-<00>keyr%)r-r2r#r&<00>__init__9szContextIndex.__init__cCs|jS)zw
        :rtype: list(str)
        :return: The document that this context index was
            created from.
        )r4)r2r%r%r&r#IszContextIndex.tokenscCsJ|<00>|<01>}t|j|<00>}i}x(|j<02><03>D]\}}t|t|<05><01>||<q(W|S)z<>
        Return a dictionary mapping from words to 'similarity scores,'
        indicating how often these two words occur in the same
        context.
        )r/<00>setr7<00>itemsr)r2<00>wordZ
word_contexts<74>scoresr1Z
w_contextsr%r%r&<00>word_similarity_dictQs
z!ContextIndex.word_similarity_dict<63>cCs~tt<01>}x\|j|<00>|<01>D]H}xB|j|D]4}||kr*|||j|||j||7<q*WqWt||jdd<02>d|<02>S)NT)r9<00>reverse)r<00>intr7r/r8<00>sorted<65>get)r2r=<00>nr><00>cr1r%r%r&<00>
similar_words`s(zContextIndex.similar_wordsFcs<><00>fdd<02><08>D<00><01><03>fdd<02><08>D<00><01><01><01>fdd<02>tt<01><03><01>D<00>}ttj<04><01><02>|rf|rftdd<06><06><03><01><02>n&<26>spt<07>St<07><00>fdd<08><08>D<00><01>}|Sd	S)
a<EFBFBD>
        Find contexts where the specified words can all appear; and
        return a frequency distribution mapping each context to the
        number of times that context was used.

        :param words: The words used to seed the similarity search
        :type words: str
        :param fail_on_unknown: If true, then raise a value error if
            any of the given words do not occur at all in the index.
        csg|]}<01><00>|<01><01>qSr%)r/)r+r1)r2r%r&r.usz0ContextIndex.common_contexts.<locals>.<listcomp>csg|]}t<00>j|<00><01>qSr%)r;r7)r+r1)r2r%r&r.vscsg|]}<01>|s<04>|<00>qSr%r%)r+r$)<02>contexts<74>wordsr%r&r.wsz%The following word(s) were not found:<3A> c3s*|]"}<01>j|D]}|<02>kr|VqqdS)N)r7)r+r1rF)<02>commonr2r%r&r3<00>sz/ContextIndex.common_contexts.<locals>.<genexpr>N)<08>ranger"r
r;<00>intersection<6F>
ValueError<EFBFBD>joinr)r2rIZfail_on_unknown<77>empty<74>fdr%)rKrHr2rIr&<00>common_contextsjszContextIndex.common_contexts)r@)F)<0B>__name__<5F>
__module__<EFBFBD>__qualname__<5F>__doc__<5F>staticmethodr'r:r#r?rGrRr%r%r%r&r*s

rc@sLeZdZdZdd<03>fdd<05>Zdd<07>Zdd	<09>Zd
d<0B>Zdd
d<0E>Zddd<11>Z	dS)<15>ConcordanceIndexzs
    An index that can be used to look up the offset locations at which
    a given word occurs in a document.
    cCs|S)Nr%)r(r%r%r&r)<00>r*zConcordanceIndex.<lambda>cCsJ||_||_tt<03>|_x.t|<01>D]"\}}|<00>|<04>}|j|<00>|<03>q WdS)a<>
        Construct a new concordance index.

        :param tokens: The document (list of tokens) that this
            concordance index was created from.  This list can be used
            to access the context of a given word occurrence.
        :param key: A function that maps each token to a normalized
            version that will be used as a key in the index.  E.g., if
            you use ``key=lambda s:s.lower()``, then the index will be
            case-insensitive.
        N)r4r/r<00>list<73>_offsetsr6<00>append)r2r#r9<00>indexr=r%r%r&r:<00>s

zConcordanceIndex.__init__cCs|jS)z{
        :rtype: list(str)
        :return: The document that this concordance index was
            created from.
        )r4)r2r%r%r&r#<00>szConcordanceIndex.tokenscCs|<00>|<01>}|j|S)z<>
        :rtype: list(int)
        :return: A list of the offset positions at which the given
            word occurs.  If a key function was specified for the
            index, then given word's key will be looked up.
        )r/rZ)r2r=r%r%r&<00>offsets<74>s
zConcordanceIndex.offsetscCsdt|j<01>t|j<02>fS)Nz+<ConcordanceIndex for %d tokens (%d types)>)r"r4rZ)r2r%r%r&<00>__repr__<5F>szConcordanceIndex.__repr__<5F>Pc	Cs<>|t|<01>dd}|d}g}|<00>|<01>}|r<>x<EFBFBD>|D]<5D>}|j|}|jtd||<00>|<07>}	|j|d||<00>}
d<05>|	<09>|d<06>}d<05>|
<EFBFBD>d|<03>}d<05>|||g<03>}
t|	||
||||
<0A>}|<05>|<0E>q4W|S)zB
        Find all concordance lines given the query word.
        <20><00>rr rJN)r"r]r4<00>maxrOrr[)r2r=<00>widthZ
half_width<EFBFBD>context<78>concordance_listr]r$Z
query_wordZleft_contextZ
right_contextrrZ
line_print<EFBFBD>concordance_liner%r%r&<00>find_concordance<63>s,


z!ConcordanceIndex.find_concordance<63>cCsj|j||d<01>}|std<02>nJt|t|<04><01>}td<03>|t|<04><01><02>x&t|d|<03><00>D]\}}t|j<06>qPWdS)a<>
        Print concordance lines given the query word.
        :param word: The target word
        :type word: str
        :param lines: The number of lines to display (default=25)
        :type lines: int
        :param width: The width of each line, in characters (default=80)
        :type width: int
        :param save: The option to save the concordance.
        :type save: bool
        )rcz
no matcheszDisplaying {} of {} matches:N)rg<00>print<6E>minr"<00>formatr6r)r2r=rc<00>linesrer$rfr%r%r&<00>print_concordance<63>s
z"ConcordanceIndex.print_concordanceN)r_)r_rh)
rSrTrUrVr:r#r]r^rgrmr%r%r%r&rX<00>s

"rXc@s eZdZdZdd<03>Zdd<05>ZdS)<07>
TokenSearchera<72>
    A class that makes it easier to use regular expressions to search
    over tokenized strings.  The tokenized string is converted to a
    string where tokens are marked with angle brackets -- e.g.,
    ``'<the><window><is><still><open>'``.  The regular expression
    passed to the ``findall()`` method is modified to treat angle
    brackets as non-capturing parentheses, in addition to matching the
    token boundaries; and to have ``'.'`` not match the angle brackets.
    cCsd<01>dd<03>|D<00><01>|_dS)N<>css|]}d|dVqdS)<03><<3C>>Nr%)r+r1r%r%r&r3sz)TokenSearcher.__init__.<locals>.<genexpr>)rO<00>_raw)r2r#r%r%r&r:szTokenSearcher.__init__cCs<>t<00>dd|<01>}t<00>dd|<01>}t<00>dd|<01>}t<00>dd|<01>}t<00>||j<03>}x(|D] }|<03>d<03>sL|<03>d<05>rLtd	<09><01>qLWd
d<0B>|D<00>}|S)a"
        Find instances of the regular expression in the text.
        The text is a list of tokens, and a regexp pattern to match
        a single token must be surrounded by angle brackets.  E.g.

        >>> from nltk.text import TokenSearcher
        >>> print('hack'); from nltk.book import text1, text5, text9
        hack...
        >>> text5.findall("<.*><.*><bro>")
        you rule bro; telling you bro; u twizted bro
        >>> text1.findall("<a>(<.*>)<man>")
        monied; nervous; dangerous; white; white; white; pious; queer; good;
        mature; white; Cape; great; wise; wise; butterless; white; fiendish;
        pale; furious; better; certain; complete; dismasted; younger; brave;
        brave; brave; brave
        >>> text9.findall("<th.*>{3,}")
        thread through those; the thought that; that the thing; the thing
        that; that that thing; through these than through; them that the;
        through the thick; them that they; thought that the

        :param regexp: A regular expression
        :type regexp: str
        z\srorpz(?:<(?:rqz)>)z	(?<!\\)\.z[^>]z$Bad regexp for TokenSearcher.findallcSsg|]}|dd<01><00>d<02><01>qS)r <00><><EFBFBD><EFBFBD><EFBFBD>z><)<01>split)r+<00>hr%r%r&r.,sz)TokenSearcher.findall.<locals>.<listcomp>)<07>re<72>sub<75>findallrr<00>
startswith<EFBFBD>endswithrN)r2<00>regexp<78>hitsrur%r%r&rxs
zTokenSearcher.findallN)rSrTrUrVr:rxr%r%r%r&rn<00>s	rnc@s<>eZdZdZdZd6dd<05>Zdd<07>Zdd	<09>Zd7dd
<0A>Zd8dd<0F>Z	d9dd<13>Z
d:dd<15>Zdd<17>Zdd<19>Z
dd<1B>Zd;dd<1D>Zd<dd<1F>Zd d!<21>Zd=d#d$<24>Zd>d'd(<28>Zd)d*<2A>Zd+d,<2C>Zd-d.<2E>Ze<17>d/<2F>Zd0d1<64>Zd2d3<64>Zd4d5<64>ZdS)?<3F>Texta<74>
    A wrapper around a sequence of simple (string) tokens, which is
    intended to support initial exploration of texts (via the
    interactive console).  Its methods perform a variety of analyses
    on the text's contexts (e.g., counting, concordancing, collocation
    discovery), and display the results.  If you wish to write a
    program which makes use of these analyses, then you should bypass
    the ``Text`` class, and use the appropriate analysis function or
    class directly instead.

    A ``Text`` is typically initialized from a given document or
    corpus.  E.g.:

    >>> import nltk.corpus
    >>> from nltk.text import Text
    >>> moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))

    TNcCs<>|jrt|<01>}||_|r ||_ndd|dd<03>krb|dd<03><00>d<01>}d<04>dd<06>|d|<03>D<00><01>|_n"d<04>dd<06>|dd	<09>D<00><01>d
|_dS)zv
        Create a Text object.

        :param tokens: The source text.
        :type tokens: sequence of str
        <20>]Nr@rJcss|]}t|<01>VqdS)N)r)r+<00>tokr%r%r&r3Zsz Text.__init__.<locals>.<genexpr>r css|]}t|<01>VqdS)N)r)r+rr%r%r&r3\s<00>z...)<06>_COPY_TOKENSrYr#<00>namer\rO)r2r#r<><00>endr%r%r&r:Ks z
Text.__init__cCs
|j|S)N)r#)r2r$r%r%r&<00>__getitem__bszText.__getitem__cCs
t|j<01>S)N)r"r#)r2r%r%r&<00>__len__eszText.__len__<5F>OrhcCs.d|jkrt|jdd<03>d<04>|_|j<03>|||<03>S)a<>
        Prints a concordance for ``word`` with the specified context window.
        Word matching is not case-sensitive.

        :param word: The target word
        :type word: str
        :param width: The width of each line, in characters (default=80)
        :type width: int
        :param lines: The number of lines to display (default=25)
        :type lines: int

        :seealso: ``ConcordanceIndex``
        <20>_concordance_indexcSs|<00><00>S)N)r!)<01>sr%r%r&r)|r*z"Text.concordance.<locals>.<lambda>)r9)<05>__dict__rXr#r<>rm)r2r=rcrlr%r%r&<00>concordancels
zText.concordancecCs4d|jkrt|jdd<03>d<04>|_|j<03>||<02>d|<03>S)a<>
        Generate a concordance for ``word`` with the specified context window.
        Word matching is not case-sensitive.

        :param word: The target word
        :type word: str
        :param width: The width of each line, in characters (default=80)
        :type width: int
        :param lines: The number of lines to display (default=25)
        :type lines: int

        :seealso: ``ConcordanceIndex``
        r<>cSs|<00><00>S)N)r!)r<>r%r%r&r)<00>r*z'Text.concordance_list.<locals>.<lambda>)r9N)r<>rXr#r<>rg)r2r=rcrlr%r%r&re<00>s
zText.concordance_listr@r`cs<>d|jkr|j|kr|j|ks<>||_||_ddlm}|<03>d<04><01>t<06>|j|<02>}|<04>	d<05>|<04>
<EFBFBD>fdd<07><08>t<0B>}|<04>|j
|<01>|_dd	<09>|jD<00>S)
a
        Return collocations derived from the text, ignoring stopwords.

        :param num: The maximum number of collocations to return.
        :type num: int
        :param window_size: The number of tokens spanned by a collocation (default=2)
        :type window_size: int
        <20>
_collocationsr)<01>	stopwords<64>englishr`cst|<00>dkp|<00><01><00>kS)N<>)r"r!)r1)<01>
ignored_wordsr%r&r)<00>r*z'Text.collocation_list.<locals>.<lambda>cSsg|]\}}|d|<00>qS)rJr%)r+<00>w1<77>w2r%r%r&r.<00>sz)Text.collocation_list.<locals>.<listcomp>)r<>Z_numZ_window_size<7A>nltk.corpusr<73>rIrZ
from_wordsr#Zapply_freq_filterZapply_word_filterrZnbestZlikelihood_ratior<6F>)r2<00>num<75>window_sizer<65><00>finderZbigram_measuresr%)r<>r&<00>collocation_list<73>s


zText.collocation_listcCs*dd<02>|<00>||<02>D<00>}tt|dd<04><02>dS)a
        Print collocations derived from the text, ignoring stopwords.

        :param num: The maximum number of collocations to print.
        :type num: int
        :param window_size: The number of tokens spanned by a collocation (default=2)
        :type window_size: int
        cSsg|]\}}|d|<00>qS)rJr%)r+r<>r<>r%r%r&r.<00>sz%Text.collocations.<locals>.<listcomp>z; )<01>	separatorN)r<>rir)r2r<>r<>Zcollocation_stringsr%r%r&<00>collocations<6E>szText.collocationscCs|j<00>|<01>S)zJ
        Count the number of times this word appears in the text.
        )r#<00>count)r2r=r%r%r&r<><00>sz
Text.countcCs|j<00>|<01>S)zQ
        Find the index of the first occurrence of the word in the text.
        )r#r\)r2r=r%r%r&r\<00>sz
Text.indexcCst<00>dS)N)<01>NotImplementedError)r2<00>methodr%r%r&<00>readability<74>szText.readabilitycs<>d|jkr$t|jdd<03>dd<03>d<05>|_<03><02><04><00>|jj<05><01><02><01><06>kr<>t<07><01><00><01>t<08><00><01>fdd<07><08><01><06>D<00><01>}dd	<09>|<03>	|<02>D<00>}t
t|<04><01>nt
d
<EFBFBD>dS)a~
        Distributional similarity: find other words which appear in the
        same contexts as the specified word; list most similar words first.

        :param word: The word used to seed the similarity search
        :type word: str
        :param num: The number of words to generate (default=20)
        :type num: int
        :seealso: ContextIndex.similar_words()
        <20>_word_context_indexcSs|<00><00>S)N)<01>isalpha)r(r%r%r&r)<00>r*zText.similar.<locals>.<lambda>cSs|<00><00>S)N)r!)r<>r%r%r&r)<00>r*)r-r9c3s0|](}<01>|D]}|<02>kr|<01>ks|VqqdS)Nr%)r+r1rF)rH<00>wcir=r%r&r3<00>szText.similar.<locals>.<genexpr>cSsg|]\}}|<01>qSr%r%)r+r1<00>_r%r%r&r.<00>sz Text.similar.<locals>.<listcomp>z
No matchesN)r<>rr#r<>r!r7Z
conditionsr;r<00>most_commonrir)r2r=r<>rQrIr%)rHr<>r=r&<00>similar<61>s
zText.similarc
Cs<>d|jkrt|jdd<03>d<04>|_yJ|j<03>|d<05>}|s<td<06>n*dd<08>|<03>|<02>D<00>}ttd	d
<EFBFBD>|D<00><01><01>Wn*tk
r<EFBFBD>}zt|<05>Wdd}~XYnXdS)aY
        Find contexts where the specified words appear; list
        most frequent common contexts first.

        :param words: The words used to seed the similarity search
        :type words: str
        :param num: The number of words to generate (default=20)
        :type num: int
        :seealso: ContextIndex.common_contexts()
        r<>cSs|<00><00>S)N)r!)r<>r%r%r&r)r*z&Text.common_contexts.<locals>.<lambda>)r9TzNo common contexts were foundcSsg|]\}}|<01>qSr%r%)r+r1r<>r%r%r&r.sz(Text.common_contexts.<locals>.<listcomp>css|]\}}|d|VqdS)r<>Nr%)r+r<>r<>r%r%r&r3	sz'Text.common_contexts.<locals>.<genexpr>N)	r<>rr#r<>rRrir<>rrN)r2rIr<>rQZranked_contexts<74>er%r%r&rR<00>s

zText.common_contextscCsddlm}|||<01>dS)z<>
        Produce a plot showing the distribution of the words through the text.
        Requires pylab to be installed.

        :param words: The words to be plotted
        :type words: list(str)
        :seealso: nltk.draw.dispersion_plot()
        r)<01>dispersion_plotN)Z	nltk.drawr<77>)r2rIr<>r%r%r&r<>s	zText.dispersion_plotr<74>cCs(t||<01>\}}t|d<01>}|<05>||<04>|S)N)<01>order)r
rZfit)r2Ztokenized_sentsrEZ
train_dataZpadded_sents<74>modelr%r%r&<00>_train_default_ngram_lms
zText._train_default_ngram_lm<6C>d<00>*c	Cs<>dd<02>td<03>|j<02><01>D<00>|_t|d<04>sFtdtjd<06>|j|jdd<08>|_	g}|d	ksZt
d
<EFBFBD><01>xZt|<04>|kr<>x@t|j	j
|||d<0B><03>D]&\}}|dkr<>q<EFBFBD>|d
kr<>P|<04>|<06>q<>W|d7}q\W|r<>d<03>|<02>dnd}|t|d|<01><00>}t|<08>|S)a
        Print random text, generated using a trigram language model.
        See also `help(nltk.lm)`.

        :param length: The length of text to generate (default=100)
        :type length: int

        :param text_seed: Generation can be conditioned on preceding context.
        :type text_seed: list(str)

        :param random_seed: A random seed or an instance of `random.Random`. If provided,
        makes the random sampling part of generation reproducible. (default=42)
        :type random_seed: int

        cSsg|]}|<01>d<00><01>qS)rJ)rt)r+<00>sentr%r%r&r.3sz!Text.generate.<locals>.<listcomp>rJZ
trigram_modelzBuilding ngram index...)<01>filer<65>)rErz!The `length` must be more than 0.)<02>	text_seed<65>random_seedz<s>z</s>r roN)rrOr#Z_tokenized_sents<74>hasattrri<00>sys<79>stderrr<72>Z_trigram_model<65>AssertionErrorr"r6<00>generater[r)	r2<00>lengthr<68>r<>Zgenerated_tokens<6E>idx<64>token<65>prefixZ
output_strr%r%r&r<>!s*
z
Text.generatecGs|<00><00>j|<01>dS)zc
        See documentation for FreqDist.plot()
        :seealso: nltk.prob.FreqDist.plot()
        N)<02>vocab<61>plot)r2<00>argsr%r%r&r<>Psz	Text.plotcCsd|jkrt|<00>|_|jS)z.
        :seealso: nltk.prob.FreqDist
        <20>_vocab)r<>rr<>)r2r%r%r&r<>Ws

z
Text.vocabcCs@d|jkrt|<00>|_|j<02>|<01>}dd<03>|D<00>}tt|d<04><02>dS)a<>
        Find instances of the regular expression in the text.
        The text is a list of tokens, and a regexp pattern to match
        a single token must be surrounded by angle brackets.  E.g.

        >>> print('hack'); from nltk.book import text1, text5, text9
        hack...
        >>> text5.findall("<.*><.*><bro>")
        you rule bro; telling you bro; u twizted bro
        >>> text1.findall("<a>(<.*>)<man>")
        monied; nervous; dangerous; white; white; white; pious; queer; good;
        mature; white; Cape; great; wise; wise; butterless; white; fiendish;
        pale; furious; better; certain; complete; dismasted; younger; brave;
        brave; brave; brave
        >>> text9.findall("<th.*>{3,}")
        thread through those; the thought that; that the thing; the thing
        that; that that thing; through these than through; them that the;
        through the thick; them that they; thought that the

        :param regexp: A regular expression
        :type regexp: str
        <20>_token_searchercSsg|]}d<00>|<01><01>qS)rJ)rO)r+rur%r%r&r.|sz Text.findall.<locals>.<listcomp>z; N)r<>rnr<>rxrir)r2r{r|r%r%r&rx`s


zText.findallz\w+|[\.\!\?]cCs<>|d}x$|dkr,|j<00>||<00>s,|d8}q
W|dkr>||nd}|d}x(|t|<01>krr|j<00>||<00>sr|d7}qLW|t|<01>kr<>||nd}||fS)z<>
        One left & one right token, both case-normalized.  Skip over
        non-sentence-final punctuation.  Used by the ``ContextIndex``
        that is created for ``similar()`` and ``common_contexts()``.
        r rz*START*z*END*)<03>_CONTEXT_RE<52>matchr")r2r#r$<00>jrrr%r%r&<00>_context<78>sz
Text._contextcCs
d|jS)Nz
<Text: %s>)r<>)r2r%r%r&<00>__str__<5F>szText.__str__cCs
d|jS)Nz
<Text: %s>)r<>)r2r%r%r&r^<00>sz
Text.__repr__)N)r<>rh)r<>rh)r@r`)r@r`)r@)r@)r<>)r<>Nr<4E>)rSrTrUrVr<>r:r<>r<>r<>rer<>r<>r<>r\r<>r<>rRr<>r<>r<>r<>r<>rxrv<00>compiler<65>r<>r<>r^r%r%r%r&r}0s0


"


/	#
r}c@s0eZdZdZdd<03>Zdd<05>Zdd<07>Zdd	<09>Zd
S)<0B>TextCollectionaVA collection of texts, which can be loaded with list of texts, or
    with a corpus consisting of one or more texts, and which supports
    counting, concordancing, collocation discovery, etc.  Initialize a
    TextCollection as follows:

    >>> import nltk.corpus
    >>> from nltk.text import TextCollection
    >>> print('hack'); from nltk.book import text1, text2, text3
    hack...
    >>> gutenberg = TextCollection(nltk.corpus.gutenberg)
    >>> mytexts = TextCollection([text1, text2, text3])

    Iterating over a TextCollection produces all the tokens of all the
    texts in order.
    cs@t<00>d<01>r <20>fdd<03><08><00><01>D<00><01><00>|_t<03>|t<05><00><01>i|_dS)NrIcsg|]}<01><00>|<01><01>qSr%)rI)r+<00>f)<01>sourcer%r&r.<00>sz+TextCollection.__init__.<locals>.<listcomp>)r<>Zfileids<64>_textsr}r:r<00>
_idf_cache)r2r<>r%)r<>r&r:<00>s

zTextCollection.__init__cCs|<02>|<01>t|<02>S)z$ The frequency of the term in text. )r<>r")r2<00>term<72>textr%r%r&<00>tf<74>szTextCollection.tfcsj|j<00><01><00>}|dkrft<02>fdd<03>|jD<00><01>}t|j<03>dkrBtd<05><01>|rXtt|j<03>|<00>nd}||j<00><|S)z<> The number of texts in the corpus divided by the
        number of texts that the term appears in.
        If a term does not appear in the corpus, 0.0 is returned. Ncsg|]}<01>|krd<00>qS)Tr%)r+r<>)r<>r%r&r.<00>sz&TextCollection.idf.<locals>.<listcomp>rz+IDF undefined for empty document collectiong)r<>rDr"r<>rNr)r2r<><00>idf<64>matchesr%)r<>r&r<><00>s
zTextCollection.idfcCs|<00>||<02>|<00>|<01>S)N)r<>r<>)r2r<>r<>r%r%r&<00>tf_idf<64>szTextCollection.tf_idfN)rSrTrUrVr:r<>r<>r<>r%r%r%r&r<><00>s
r<>cCs<>ddlm}t|jdd<04><01>}t|<01>t<04>td<05>|<01>d<03>t<04>td<06>|<01>d<03>t<04>td<07>|<01><07>t<04>td<08>|<01>dd	d
dg<04>t<04>td<0C>|<01>	d
<0A>t<04>td<0E>td|d<00>td|dd<12><00>td|<01>
<EFBFBD>d<00>dS)Nr)<01>brown<77>news)<01>
categorieszConcordance:zDistributionally similar words:z
Collocations:zDispersion plot:<3A>reportZsaidZ	announcedzVocabulary plot:<3A>2z	Indexing:ztext[3]:r<>z
text[3:5]:<3A>ztext.vocab()['news']:)r<>r<>r}rIrir<>r<>r<>r<>r<>r<>)r<>r<>r%r%r&<00>demo<6D>s.


r<><00>__main__)0rV<00>
__future__rrrr<00>mathr<00>collectionsrrr	<00>	functoolsr
rvr<><00>sixrZnltk.lmrZnltk.lm.preprocessingr
Znltk.probabilityrrr5Z	nltk.utilrrZnltk.metricsrrZnltk.collocationsrZnltk.compatrZ
nltk.tokenizerr<00>objectrrXrnr}r<>r<>rS<00>__all__r%r%r%r&<00><module>sH[q9v/