old-nlp/venv/lib/python3.7/site-packages/nltk/__pycache__/text.cpython-37.pyc

2019-10-20 13:16:49 +02:00
B
D(<28>]<5D>j<00> @srdZddlmZmZmZmZddlmZddlm Z m
Z
m Z ddl m Z ddlZddlZddlmZddlmZdd lmZdd
lmZdd lmZdd lmZmZdd lmZmZddl m!Z!ddl"m#Z#ddl$m%Z%e ddddddddg<07>Z&Gdd<1A>de'<27>Z(e#Gdd<1C>de'<27><03>Z)Gdd<1E>de'<27>Z*e#Gdd <20>d e'<27><03>Z+Gd!d"<22>d"e+<2B>Z,d#d$<24>Z-e.d%k<02>r`e-<2D>dddd d"gZ/dS)&a 
This module brings together a variety of NLTK functionality for
text analysis, and provides simple, interactive interfaces.
Functionality includes: concordancing, collocation discovery,
regular expression search over tokenized strings, and
distributional similarity.
<EFBFBD>)<04>print_function<6F>division<6F>unicode_literals<6C>absolute_import)<01>log)<03> defaultdict<63>Counter<65>
namedtuple)<01>reduceN)<01> text_type)<01>MLE)<01>padded_everygram_pipeline)<01>FreqDist)<01>ConditionalFreqDist)<02> tokenwrap<61>LazyConcatenation)<02> f_measure<72>BigramAssocMeasures)<01>BigramCollocationFinder)<01>python_2_unicode_compatible)<01> sent_tokenize<7A>ConcordanceLine<6E>left<66>query<72>right<68>offset<65>
left_print<EFBFBD> right_print<6E>linec@sTeZdZdZedd<03><00>Zdddd<06>fdd<08>Zd d
<EFBFBD>Zd d <0C>Zddd<0F>Z ddd<12>Z
dS)<15> ContextIndexa
A bidirectional index between words and their 'contexts' in a text.
The context of a word is usually defined to be the words that occur
in a fixed window around the word; but other definitions may also
be used by providing a custom context function.
cCsH|dkr||d<00><00>nd}|t|<00>dkr<||d<00><00>nd}||fS)z;One left token and one right token, normalized to lowercaser<00>z*START*z*END*)<02>lower<65>len)<04>tokens<6E>irr<00>r%<00>+/tmp/pip-install-4m6m_5d_/nltk/nltk/text.py<70>_default_context2s$zContextIndex._default_contextNcCs|S)Nr%)<01>xr%r%r&<00><lambda>9<00>zContextIndex.<lambda>csv|<04>_<00><02>_|r|<02>_n<08>j<03>_<02>r6<72>fdd<02><08>D<00><01>t<04><01>fdd<04>t<05><02>D<00><01><01>_t<04><01>fdd<04>t<05><02>D<00><01><01>_dS)Ncsg|]}<01>|<01>r|<01>qSr%r%)<02>.0<EFBFBD>t)<01>filterr%r&<00>
<listcomp>Asz)ContextIndex.__init__.<locals>.<listcomp>c3s(|] \}}<02><00>|<02><01><00><01>|<01>fVqdS)N)<02>_key<65> _context_func)r+r$<00>w)<02>selfr#r%r&<00> <genexpr>Csz(ContextIndex.__init__.<locals>.<genexpr>c3s(|] \}}<02><00><00>|<01><02><00>|<02>fVqdS)N)r0r/)r+r$r1)r2r#r%r&r3Fs)r/<00>_tokensr0r'<00>CFD<46> enumerate<74>_word_to_contexts<74>_context_to_words)r2r#Z context_funcr-<00>keyr%)r-r2r#r&<00>__init__9szContextIndex.__init__cCs|jS)zw
:rtype: list(str)
:return: The document that this context index was
created from.
)r4)r2r%r%r&r#IszContextIndex.tokenscCsJ|<00>|<01>}t|j|<00>}i}x(|j<02><03>D]\}}t|t|<05><01>||<q(W|S)z<>
Return a dictionary mapping from words to 'similarity scores,'
indicating how often these two words occur in the same
context.
)r/<00>setr7<00>itemsr)r2<00>wordZ word_contexts<74>scoresr1Z
w_contextsr%r%r&<00>word_similarity_dictQs 
z!ContextIndex.word_similarity_dict<63>cCs~tt<01>}x\|j|<00>|<01>D]H}xB|j|D]4}||kr*|||j|||j||7<q*WqWt||jdd<02>d|<02>S)NT)r9<00>reverse)r<00>intr7r/r8<00>sorted<65>get)r2r=<00>nr><00>cr1r%r%r&<00> similar_words`s(zContextIndex.similar_wordsFcs<><00>fdd<02><08>D<00><01><03>fdd<02><08>D<00><01><01><01>fdd<02>tt<01><03><01>D<00>}ttj<04><01><02>|rf|rftdd<06><06><03><01><02>n&<26>spt<07>St<07><00>fdd<08><08>D<00><01>}|Sd S)
a<EFBFBD>
Find contexts where the specified words can all appear; and
return a frequency distribution mapping each context to the
number of times that context was used.
:param words: The words used to seed the similarity search
:type words: str
:param fail_on_unknown: If true, then raise a value error if
any of the given words do not occur at all in the index.
csg|]}<01><00>|<01><01>qSr%)r/)r+r1)r2r%r&r.usz0ContextIndex.common_contexts.<locals>.<listcomp>csg|]}t<00>j|<00><01>qSr%)r;r7)r+r1)r2r%r&r.vscsg|]}<01>|s<04>|<00>qSr%r%)r+r$)<02>contexts<74>wordsr%r&r.wsz%The following word(s) were not found:<3A> c3s*|]"}<01>j|D]}|<02>kr|VqqdS)N)r7)r+r1rF)<02>commonr2r%r&r3<00>sz/ContextIndex.common_contexts.<locals>.<genexpr>N)<08>ranger"r
r;<00> intersection<6F>
ValueError<EFBFBD>joinr)r2rIZfail_on_unknown<77>empty<74>fdr%)rKrHr2rIr&<00>common_contextsjs  zContextIndex.common_contexts)r@)F) <0B>__name__<5F>
__module__<EFBFBD> __qualname__<5F>__doc__<5F> staticmethodr'r:r#r?rGrRr%r%r%r&r*s 
rc@sLeZdZdZdd<03>fdd<05>Zdd<07>Zdd <09>Zd
d <0B>Zdd d<0E>Zddd<11>Z dS)<15>ConcordanceIndexzs
An index that can be used to look up the offset locations at which
a given word occurs in a document.
cCs|S)Nr%)r(r%r%r&r)<00>r*zConcordanceIndex.<lambda>cCsJ||_||_tt<03>|_x.t|<01>D]"\}}|<00>|<04>}|j|<00>|<03>q WdS)a<>
Construct a new concordance index.
:param tokens: The document (list of tokens) that this
concordance index was created from. This list can be used
to access the context of a given word occurrence.
:param key: A function that maps each token to a normalized
version that will be used as a key in the index. E.g., if
you use ``key=lambda s:s.lower()``, then the index will be
case-insensitive.
N)r4r/r<00>list<73>_offsetsr6<00>append)r2r#r9<00>indexr=r%r%r&r:<00>s 

zConcordanceIndex.__init__cCs|jS)z{
:rtype: list(str)
:return: The document that this concordance index was
created from.
)r4)r2r%r%r&r#<00>szConcordanceIndex.tokenscCs|<00>|<01>}|j|S)z<>
:rtype: list(int)
:return: A list of the offset positions at which the given
word occurs. If a key function was specified for the
index, then the given word's key will be looked up.
)r/rZ)r2r=r%r%r&<00>offsets<74>s
zConcordanceIndex.offsetscCsdt|j<01>t|j<02>fS)Nz+<ConcordanceIndex for %d tokens (%d types)>)r"r4rZ)r2r%r%r&<00>__repr__<5F>szConcordanceIndex.__repr__<5F>Pc Cs<>|t|<01>dd}|d}g}|<00>|<01>}|r<>x<EFBFBD>|D]<5D>}|j|}|jtd||<00>|<07>} |j|d||<00>}
d<05>| <09>| d<06>} d<05>|
<EFBFBD>d|<03>} d<05>| || g<03>} t| ||
|| | | <0A>}|<05>|<0E>q4W|S)zB
Find all concordance lines given the query word.
<20><00>rr rJN)r"r]r4<00>maxrOrr[)r2r=<00>widthZ
half_width<EFBFBD>context<78>concordance_listr]r$Z
query_wordZ left_contextZ right_contextrrZ
line_print<EFBFBD>concordance_liner%r%r&<00>find_concordance<63>s,


z!ConcordanceIndex.find_concordance<63>cCsj|j||d<01>}|std<02>nJt|t|<04><01>}td<03>|t|<04><01><02>x&t|d|<03><00>D]\}}t|j<06>qPWdS)a<>
Print concordance lines given the query word.
:param word: The target word
:type word: str
:param lines: The number of lines to display (default=25)
:type lines: int
:param width: The width of each line, in characters (default=80)
:type width: int
:param save: The option to save the concordance.
:type save: bool
)rcz
no matcheszDisplaying {} of {} matches:N)rg<00>print<6E>minr"<00>formatr6r)r2r=rc<00>linesrer$rfr%r%r&<00>print_concordance<63>s 
z"ConcordanceIndex.print_concordanceN)r_)r_rh)
rSrTrUrVr:r#r]r^rgrmr%r%r%r&rX<00>s

"rXc@s eZdZdZdd<03>Zdd<05>ZdS)<07> TokenSearchera<72>
A class that makes it easier to use regular expressions to search
over tokenized strings. The tokenized string is converted to a
string where tokens are marked with angle brackets -- e.g.,
``'<the><window><is><still><open>'``. The regular expression
passed to the ``findall()`` method is modified to treat angle
brackets as non-capturing parentheses, in addition to matching the
token boundaries; and to have ``'.'`` not match the angle brackets.
cCsd<01>dd<03>|D<00><01>|_dS)N<>css|]}d|dVqdS)<03><<3C>>Nr%)r+r1r%r%r&r3sz)TokenSearcher.__init__.<locals>.<genexpr>)rO<00>_raw)r2r#r%r%r&r:szTokenSearcher.__init__cCs<>t<00>dd|<01>}t<00>dd|<01>}t<00>dd|<01>}t<00>dd|<01>}t<00>||j<03>}x(|D] }|<03>d<03>sL|<03>d<05>rLtd <09><01>qLWd
d <0B>|D<00>}|S) a"
Find instances of the regular expression in the text.
The text is a list of tokens, and a regexp pattern to match
a single token must be surrounded by angle brackets. E.g.
>>> from nltk.text import TokenSearcher
>>> print('hack'); from nltk.book import text1, text5, text9
hack...
>>> text5.findall("<.*><.*><bro>")
you rule bro; telling you bro; u twizted bro
>>> text1.findall("<a>(<.*>)<man>")
monied; nervous; dangerous; white; white; white; pious; queer; good;
mature; white; Cape; great; wise; wise; butterless; white; fiendish;
pale; furious; better; certain; complete; dismasted; younger; brave;
brave; brave; brave
>>> text9.findall("<th.*>{3,}")
thread through those; the thought that; that the thing; the thing
that; that that thing; through these than through; them that the;
through the thick; them that they; thought that the
:param regexp: A regular expression
:type regexp: str
z\srorpz(?:<(?:rqz)>)z (?<!\\)\.z[^>]z$Bad regexp for TokenSearcher.findallcSsg|]}|dd<01><00>d<02><01>qS)r <00><><EFBFBD><EFBFBD><EFBFBD>z><)<01>split)r+<00>hr%r%r&r.,sz)TokenSearcher.findall.<locals>.<listcomp>)<07>re<72>sub<75>findallrr<00>
startswith<EFBFBD>endswithrN)r2<00>regexp<78>hitsrur%r%r&rxs
 zTokenSearcher.findallN)rSrTrUrVr:rxr%r%r%r&rn<00>s rnc@s<>eZdZdZdZd6dd<05>Zdd<07>Zdd <09>Zd7d d <0A>Zd8dd<0F>Z d9dd<13>Z
d:dd<15>Z dd<17>Z dd<19>Z dd<1B>Zd;dd<1D>Zd<dd<1F>Zd d!<21>Zd=d#d$<24>Zd>d'd(<28>Zd)d*<2A>Zd+d,<2C>Zd-d.<2E>Ze<17>d/<2F>Zd0d1<64>Zd2d3<64>Zd4d5<64>ZdS)?<3F>Texta<74>
A wrapper around a sequence of simple (string) tokens, which is
intended to support initial exploration of texts (via the
interactive console). Its methods perform a variety of analyses
on the text's contexts (e.g., counting, concordancing, collocation
discovery), and display the results. If you wish to write a
program which makes use of these analyses, then you should bypass
the ``Text`` class, and use the appropriate analysis function or
class directly instead.
A ``Text`` is typically initialized from a given document or
corpus. E.g.:
>>> import nltk.corpus
>>> from nltk.text import Text
>>> moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))
TNcCs<>|jrt|<01>}||_|r ||_ndd|dd<03>krb|dd<03><00>d<01>}d<04>dd<06>|d|<03>D<00><01>|_n"d<04>dd<06>|dd <09>D<00><01>d
|_dS) zv
Create a Text object.
:param tokens: The source text.
:type tokens: sequence of str
<20>]Nr@rJcss|]}t|<01>VqdS)N)r )r+<00>tokr%r%r&r3Zsz Text.__init__.<locals>.<genexpr>r css|]}t|<01>VqdS)N)r )r+rr%r%r&r3\s<00>z...)<06> _COPY_TOKENSrYr#<00>namer\rO)r2r#r<><00>endr%r%r&r:Ks z Text.__init__cCs
|j|S)N)r#)r2r$r%r%r&<00> __getitem__bszText.__getitem__cCs
t|j<01>S)N)r"r#)r2r%r%r&<00>__len__esz Text.__len__<5F>OrhcCs.d|jkrt|jdd<03>d<04>|_|j<03>|||<03>S)a<>
Prints a concordance for ``word`` with the specified context window.
Word matching is not case-sensitive.
:param word: The target word
:type word: str
:param width: The width of each line, in characters (default=80)
:type width: int
:param lines: The number of lines to display (default=25)
:type lines: int
:seealso: ``ConcordanceIndex``
<20>_concordance_indexcSs|<00><00>S)N)r!)<01>sr%r%r&r)|r*z"Text.concordance.<locals>.<lambda>)r9)<05>__dict__rXr#r<>rm)r2r=rcrlr%r%r&<00> concordancels
zText.concordancecCs4d|jkrt|jdd<03>d<04>|_|j<03>||<02>d|<03>S)a<>
Generate a concordance for ``word`` with the specified context window.
Word matching is not case-sensitive.
:param word: The target word
:type word: str
:param width: The width of each line, in characters (default=80)
:type width: int
:param lines: The number of lines to display (default=25)
:type lines: int
:seealso: ``ConcordanceIndex``
r<>cSs|<00><00>S)N)r!)r<>r%r%r&r)<00>r*z'Text.concordance_list.<locals>.<lambda>)r9N)r<>rXr#r<>rg)r2r=rcrlr%r%r&re<00>s
zText.concordance_listr@r`cs<>d|jkr|j|kr|j|ks<>||_||_ddlm}|<03>d<04><01>t<06>|j|<02>}|<04> d<05>|<04>
<EFBFBD>fdd<07><08>t <0B>}|<04> |j |<01>|_dd <09>|jD<00>S)
a
Return collocations derived from the text, ignoring stopwords.
:param num: The maximum number of collocations to return.
:type num: int
:param window_size: The number of tokens spanned by a collocation (default=2)
:type window_size: int
<20> _collocationsr)<01> stopwords<64>englishr`cst|<00>dkp|<00><01><00>kS)N<>)r"r!)r1)<01> ignored_wordsr%r&r)<00>r*z'Text.collocation_list.<locals>.<lambda>cSsg|]\}}|d|<00>qS)rJr%)r+<00>w1<77>w2r%r%r&r.<00>sz)Text.collocation_list.<locals>.<listcomp>)r<>Z_numZ _window_size<7A> nltk.corpusr<73>rIrZ
from_wordsr#Zapply_freq_filterZapply_word_filterrZnbestZlikelihood_ratior<6F>)r2<00>num<75> window_sizer<65><00>finderZbigram_measuresr%)r<>r&<00>collocation_list<73>s


 

zText.collocation_listcCs*dd<02>|<00>||<02>D<00>}tt|dd<04><02>dS)a
Print collocations derived from the text, ignoring stopwords.
:param num: The maximum number of collocations to print.
:type num: int
:param window_size: The number of tokens spanned by a collocation (default=2)
:type window_size: int
cSsg|]\}}|d|<00>qS)rJr%)r+r<>r<>r%r%r&r.<00>sz%Text.collocations.<locals>.<listcomp>z; )<01> separatorN)r<>rir)r2r<>r<>Zcollocation_stringsr%r%r&<00> collocations<6E>s zText.collocationscCs |j<00>|<01>S)zJ
Count the number of times this word appears in the text.
)r#<00>count)r2r=r%r%r&r<><00>sz
Text.countcCs |j<00>|<01>S)zQ
Find the index of the first occurrence of the word in the text.
)r#r\)r2r=r%r%r&r\<00>sz
Text.indexcCst<00>dS)N)<01>NotImplementedError)r2<00>methodr%r%r&<00> readability<74>szText.readabilitycs<>d|jkr$t|jdd<03>dd<03>d<05>|_<03><02><04><00>|jj<05><01><02><01><06>kr<>t<07><01><00><01>t<08><00><01>fdd<07><08><01><06>D<00><01>}dd <09>|<03> |<02>D<00>}t
t |<04><01>nt
d
<EFBFBD>d S) a~
Distributional similarity: find other words which appear in the
same contexts as the specified word; list most similar words first.
:param word: The word used to seed the similarity search
:type word: str
:param num: The number of words to generate (default=20)
:type num: int
:seealso: ContextIndex.similar_words()
<20>_word_context_indexcSs|<00><00>S)N)<01>isalpha)r(r%r%r&r)<00>r*zText.similar.<locals>.<lambda>cSs|<00><00>S)N)r!)r<>r%r%r&r)<00>r*)r-r9c3s0|](}<01>|D]}|<02>kr|<01>ks|VqqdS)Nr%)r+r1rF)rH<00>wcir=r%r&r3<00>s zText.similar.<locals>.<genexpr>cSsg|] \}}|<01>qSr%r%)r+r1<00>_r%r%r&r.<00>sz Text.similar.<locals>.<listcomp>z
No matchesN) r<>rr#r<>r!r7Z
conditionsr;r<00> most_commonrir)r2r=r<>rQrIr%)rHr<>r=r&<00>similar<61>s
  z Text.similarc
Cs<>d|jkrt|jdd<03>d<04>|_yJ|j<03>|d<05>}|s<td<06>n*dd<08>|<03>|<02>D<00>}ttd d
<EFBFBD>|D<00><01><01>Wn*tk
r<EFBFBD>}z t|<05>Wd d }~XYnXd S) aY
Find contexts where the specified words appear; list
most frequent common contexts first.
:param words: The words used to seed the similarity search
:type words: str
:param num: The number of words to generate (default=20)
:type num: int
:seealso: ContextIndex.common_contexts()
r<>cSs|<00><00>S)N)r!)r<>r%r%r&r)r*z&Text.common_contexts.<locals>.<lambda>)r9TzNo common contexts were foundcSsg|] \}}|<01>qSr%r%)r+r1r<>r%r%r&r.sz(Text.common_contexts.<locals>.<listcomp>css|]\}}|d|VqdS)r<>Nr%)r+r<>r<>r%r%r&r3 sz'Text.common_contexts.<locals>.<genexpr>N) r<>rr#r<>rRrir<>rrN)r2rIr<>rQZranked_contexts<74>er%r%r&rR<00>s

zText.common_contextscCsddlm}|||<01>dS)z<>
Produce a plot showing the distribution of the words through the text.
Requires pylab to be installed.
:param words: The words to be plotted
:type words: list(str)
:seealso: nltk.draw.dispersion_plot()
r)<01>dispersion_plotN)Z nltk.drawr<77>)r2rIr<>r%r%r&r<>s zText.dispersion_plotr<74>cCs(t||<01>\}}t|d<01>}|<05>||<04>|S)N)<01>order)r r Zfit)r2Ztokenized_sentsrEZ
train_dataZ padded_sents<74>modelr%r%r&<00>_train_default_ngram_lms
 zText._train_default_ngram_lm<6C>d<00>*c Cs<>dd<02>td<03>|j<02><01>D<00>|_t|d<04>sFtdtjd<06>|j|jdd<08>|_ g}|d ksZt
d
<EFBFBD><01>xZt |<04>|kr<>x@t |j j |||d <0B><03>D]&\}}|d kr<>q<EFBFBD>|d kr<>P|<04>|<06>q<>W|d7}q\W|r<>d<03>|<02>dnd}|t|d|<01><00>}t|<08>|S)a 
Print random text, generated using a trigram language model.
See also `help(nltk.lm)`.
:param length: The length of text to generate (default=100)
:type length: int
:param text_seed: Generation can be conditioned on preceding context.
:type text_seed: list(str)
:param random_seed: A random seed or an instance of `random.Random`. If provided,
makes the random sampling part of generation reproducible. (default=42)
:type random_seed: int
cSsg|]}|<01>d<00><01>qS)rJ)rt)r+<00>sentr%r%r&r.3sz!Text.generate.<locals>.<listcomp>rJZ trigram_modelzBuilding ngram index...)<01>filer<65>)rErz!The `length` must be more than 0.)<02> text_seed<65> random_seedz<s>z</s>r roN)rrOr#Z_tokenized_sents<74>hasattrri<00>sys<79>stderrr<72>Z_trigram_model<65>AssertionErrorr"r6<00>generater[r) r2<00>lengthr<68>r<>Zgenerated_tokens<6E>idx<64>token<65>prefixZ
output_strr%r%r&r<>!s*
 z Text.generatecGs|<00><00>j|<01>dS)zc
See documentation for FreqDist.plot()
:seealso: nltk.prob.FreqDist.plot()
N)<02>vocab<61>plot)r2<00>argsr%r%r&r<>Psz Text.plotcCsd|jkrt|<00>|_|jS)z.
:seealso: nltk.prob.FreqDist
<20>_vocab)r<>rr<>)r2r%r%r&r<>Ws

z
Text.vocabcCs@d|jkrt|<00>|_|j<02>|<01>}dd<03>|D<00>}tt|d<04><02>dS)a<>
Find instances of the regular expression in the text.
The text is a list of tokens, and a regexp pattern to match
a single token must be surrounded by angle brackets. E.g.
>>> print('hack'); from nltk.book import text1, text5, text9
hack...
>>> text5.findall("<.*><.*><bro>")
you rule bro; telling you bro; u twizted bro
>>> text1.findall("<a>(<.*>)<man>")
monied; nervous; dangerous; white; white; white; pious; queer; good;
mature; white; Cape; great; wise; wise; butterless; white; fiendish;
pale; furious; better; certain; complete; dismasted; younger; brave;
brave; brave; brave
>>> text9.findall("<th.*>{3,}")
thread through those; the thought that; that the thing; the thing
that; that that thing; through these than through; them that the;
through the thick; them that they; thought that the
:param regexp: A regular expression
:type regexp: str
<20>_token_searchercSsg|]}d<00>|<01><01>qS)rJ)rO)r+rur%r%r&r.|sz Text.findall.<locals>.<listcomp>z; N)r<>rnr<>rxrir)r2r{r|r%r%r&rx`s


 z Text.findallz \w+|[\.\!\?]cCs<>|d}x$|dkr,|j<00>||<00>s,|d8}q
W|dkr>||nd}|d}x(|t|<01>krr|j<00>||<00>sr|d7}qLW|t|<01>kr<>||nd}||fS)z<>
One left & one right token, both case-normalized. Skip over
non-sentence-final punctuation. Used by the ``ContextIndex``
that is created for ``similar()`` and ``common_contexts()``.
r rz*START*z*END*)<03> _CONTEXT_RE<52>matchr")r2r#r$<00>jrrr%r%r&<00>_context<78>s  z Text._contextcCs
d|jS)Nz
<Text: %s>)r<>)r2r%r%r&<00>__str__<5F>sz Text.__str__cCs
d|jS)Nz
<Text: %s>)r<>)r2r%r%r&r^<00>sz Text.__repr__)N)r<>rh)r<>rh)r@r`)r@r`)r@)r@)r<>)r<>Nr<4E>)rSrTrUrVr<>r:r<>r<>r<>rer<>r<>r<>r\r<>r<>rRr<>r<>r<>r<>r<>rxrv<00>compiler<65>r<>r<>r^r%r%r%r&r}0s0





"


/ #
r}c@s0eZdZdZdd<03>Zdd<05>Zdd<07>Zdd <09>Zd
S) <0B>TextCollectionaVA collection of texts, which can be loaded with list of texts, or
with a corpus consisting of one or more texts, and which supports
counting, concordancing, collocation discovery, etc. Initialize a
TextCollection as follows:
>>> import nltk.corpus
>>> from nltk.text import TextCollection
>>> print('hack'); from nltk.book import text1, text2, text3
hack...
>>> gutenberg = TextCollection(nltk.corpus.gutenberg)
>>> mytexts = TextCollection([text1, text2, text3])
Iterating over a TextCollection produces all the tokens of all the
texts in order.
cs@t<00>d<01>r <20>fdd<03><08><00><01>D<00><01><00>|_t<03>|t<05><00><01>i|_dS)NrIcsg|]}<01><00>|<01><01>qSr%)rI)r+<00>f)<01>sourcer%r&r.<00>sz+TextCollection.__init__.<locals>.<listcomp>)r<>Zfileids<64>_textsr}r:r<00>
_idf_cache)r2r<>r%)r<>r&r:<00>s

zTextCollection.__init__cCs|<02>|<01>t|<02>S)z$ The frequency of the term in text. )r<>r")r2<00>term<72>textr%r%r&<00>tf<74>szTextCollection.tfcsj|j<00><01><00>}|dkrft<02>fdd<03>|jD<00><01>}t|j<03>dkrBtd<05><01>|rXtt|j<03>|<00>nd}||j<00><|S)z<> The number of texts in the corpus divided by the
number of texts that the term appears in.
If a term does not appear in the corpus, 0.0 is returned. Ncsg|]}<01>|krd<00>qS)Tr%)r+r<>)r<>r%r&r.<00>sz&TextCollection.idf.<locals>.<listcomp>rz+IDF undefined for empty document collectiong)r<>rDr"r<>rNr)r2r<><00>idf<64>matchesr%)r<>r&r<><00>s 
zTextCollection.idfcCs|<00>||<02>|<00>|<01>S)N)r<>r<>)r2r<>r<>r%r%r&<00>tf_idf<64>szTextCollection.tf_idfN)rSrTrUrVr:r<>r<>r<>r%r%r%r&r<><00>s
r<>cCs<>ddlm}t|jdd<04><01>}t|<01>t<04>td<05>|<01>d<03>t<04>td<06>|<01>d<03>t<04>td<07>|<01><07>t<04>td<08>|<01>dd d
d g<04>t<04>td <0C>|<01> d <0A>t<04>td<0E>td|d<00>td|dd<12><00>td|<01>
<EFBFBD>d<00>dS)Nr)<01>brown<77>news)<01>
categoriesz Concordance:zDistributionally similar words:z Collocations:zDispersion plot:<3A>reportZsaidZ announcedzVocabulary plot:<3A>2z Indexing:ztext[3]:r<>z
text[3:5]:<3A>ztext.vocab()['news']:) r<>r<>r}rIrir<>r<>r<>r<>r<>r<>)r<>r<>r%r%r&<00>demo<6D>s. 


r<><00>__main__)0rV<00>
__future__rrrr<00>mathr<00> collectionsrrr <00> functoolsr
rvr<><00>sixr Znltk.lmr Znltk.lm.preprocessingr Znltk.probabilityrrr5Z nltk.utilrrZ nltk.metricsrrZnltk.collocationsrZ nltk.compatrZ nltk.tokenizerr<00>objectrrXrnr}r<>r<>rS<00>__all__r%r%r%r&<00><module>sH          [ q9 v/