old-nlp/venv/lib/python3.7/site-packages/nltk/__pycache__/util.cpython-37.pyc

352 lines
21 KiB

2019-10-20 13:16:49 +02:00
B
D(<28>]<5D>e<00>@s
ddlmZddlZddlZddlZddlZddlZddlZddlZddl Z ddl
Z
ddl m Z m Z mZddlmZddlmZmZddlmZddlmZmZmZddlmZmZmZmZmZmZmZdd l m!Z!m"Z"dd
l#Tdd l$m%Z%dQd d<0E>Z&dd<10>Z'dRdd<12>Z(dSdd<15>Z)dTdd<18>Z*dd<1A>Z+dd<1C>Z,dd<1E>Z-Gdd <20>d e<11>Z.dUd#d$<24>Z/d%d&<26>Z0e1d'fd(d)<29>Z2d*d+<2B>Z3d,d-<2D>Z4d.d/<2F>Z5dVd1d2<64>Z6d3d4<64>Z7d5d6<64>Z8d7d8<64>Z9d9d:<3A>Z:dWd;d<<3C>Z;dXd=d><3E>Z<d?d@<40>Z=dAdB<64>Z>dYdDdE<64>Z?dFdG<64>Z@id'fdHdI<64>ZAdZdKdL<64>ZBd[dMdN<64>ZCdOdP<64>ZDdS)\<5C>)<01>print_functionN)<03>islice<63>chain<69> combinations)<01>pprint)<02> defaultdict<63>deque)<01> version_info)<03> class_types<65> string_types<65> text_type)<07> build_opener<65>install_opener<65>
getproxies<EFBFBD> ProxyHandler<65>ProxyBasicAuthHandler<65>ProxyDigestAuthHandler<65>HTTPPasswordMgrWithDefaultRealm)<02> slice_bounds<64>raise_unorderable_types)<01>*)<01>python_2_unicode_compatible<6C>selfc
Cst|<00>t|t<02>s|j}td|j<00>x<>tt<07>|<00><01> <09><00>D]<5D>\}}|<02>
d<02>rNq:t |dd<04>r\q:t j ddkrrtj}ntj}||<03>dd<07>\}}}}|r<>|ddkr<>|dks<>t|<05>t|<08>kr<>|d d<00>}d
||f}t<0E>||||<08>} ttjd || fd d t|<02>dd<0F><03>q:WdS)Nz%%s supports the following operations:<3A>_Z__deprecated__Fr<00><00>r<00>z%s.%sz%s%sz - <20> <20>)<02>initial_indent<6E>subsequent_indent)<15>str<74>
isinstancer
<00> __class__<5F>print<6E>__name__<5F>sorted<65>pydocZ
allmethods<EFBFBD>items<6D>
startswith<EFBFBD>getattr<74>sysr <00>inspect<63>getfullargspec<65>
getargspec<EFBFBD>len<65> formatargspec<65>textwrap<61>fill)
<EFBFBD>objZselfname<6D>name<6D>methodr.<00>args<67>varargs<67>varkw<6B>defaultsZargspec<65>r:<00>+/tmp/pip-install-4m6m_5d_/nltk/nltk/util.py<70>usage-s0

    
r<cCsddl}|jjjdkS)a<>
Return True if this function is run within idle. Tkinter
programs that are run in idle should never call ``Tk.mainloop``; so
this function should be used to gate all calls to ``Tk.mainloop``.
:warning: This function works by checking ``sys.stdin``. If the
user has modified ``sys.stdin``, then it may return incorrect
results.
:rtype: bool
rN)ZPyShellZRPCProxy)r+<00>stdinr#r%)r+r:r:r;<00>in_idleUs r>cCsttt|||<02><03><01>dS)z<>
Pretty print a sequence of data items
:param data: the data stream to print
:type data: sequence or iter
:param start: the start position
:type start: int
:param end: the end position
:type end: int
N)r<00>listr)<03>data<74>start<72>endr:r:r;<00>prjs rC<00>FcCstd<01>tj||d<02><02><01>dS)z<>
Pretty print a string, breaking lines on whitespace
:param s: the string to print, consisting of words and spaces
:type s: str
:param width: the display width
:type width: int
<20>
)<01>widthN)r$<00>joinr1<00>wrap)<02>srFr:r:r;<00> print_stringxs rJrcCsd<01>tj|<01>|<00>|d<02><02>S)a#
Pretty print a list of text tokens, breaking lines on whitespace
:param tokens: the tokens to print
:type tokens: list
:param separator: the string to use to separate tokens
:type separator: str
:param width: the display width (default=70)
:type width: int
rE)rF)rGr1rH)<03>tokens<6E> separatorrFr:r:r;<00> tokenwrap<61>s rMcCstddkotddkS)Nr<00>rr)r r:r:r:r;<00>py25<32>srOcCstddkotddkS)NrrNr<00>)r r:r:r:r;<00>py26<32>srQcCstddkotddkS)NrrNr<00>)r r:r:r:r;<00>py27<32>srSc@seZdZdd<02>ZdS)<04>IndexcCs0t<00>|t<02>x|D]\}}||<00>|<03>qWdS)N)r<00>__init__r?<00>append)r<00>pairs<72>key<65>valuer:r:r;rU<00>s zIndex.__init__N)r%<00>
__module__<EFBFBD> __qualname__rUr:r:r:r;rT<00>srT<00>{<7B>}cCs*tt<01>|tj<03><02>|d||<01><05><00><02>dS)a3
Return a string with markers surrounding the matched substrings.
Search str for substrings matching ``regexp`` and wrap the matches
with braces. This is convenient for learning about regular expressions.
:param regexp: The regular expression.
:type regexp: str
:param string: The string being matched.
:type string: str
:param left: The left delimiter (printed before the matched substring)
:type left: str
:param right: The right delimiter (printed after the matched substring)
:type right: str
:rtype: str
z\g<0>N)r$<00>re<72>compile<6C>M<>sub<75>rstrip)<04>regexp<78>string<6E>left<66>rightr:r:r;<00>re_show<6F>srgc CsDt|d<01>r|<00><01>St|t<03>r8t|d<02><02>
}|<01><01>SQRXntd<03><01>dS)N<>read<61>rz2Must be called with a filename or file-like object)<06>hasattrrhr"r <00>open<65>
ValueError)<02>f<>infiler:r:r;<00>
filestring<EFBFBD>s 

 ro<00><><EFBFBD><EFBFBD><EFBFBD>c#slt|dfg<01>}xX|rf|<03><01>\}<04>|V<00>|kry |<03><02>fdd<03>||<04>D<00><01>Wqtk
rbYqXqWdS)aTraverse the nodes of a tree in breadth-first order.
(No need to check for cycles.)
The first argument should be the tree root;
children should be a function taking as argument a tree node
and returning an iterator of the node's children.
rc3s|]}|<01>dfVqdS)rNr:)<02>.0<EFBFBD>c)<01>depthr:r;<00> <genexpr><3E>sz breadth_first.<locals>.<genexpr>N)r<00>popleft<66>extend<6E> TypeError)<05>tree<65>childrenZmaxdepth<74>queue<75>noder:)rsr;<00> breadth_first<73>s  r|c
Csd}dg}y|<02>t<01>tj<03><01>Wntk
r4YnXy|<02>t<01><05>d<00>Wnttfk
rdYnXy|<02>t<01><07>d<00>Wnttfk
r<EFBFBD>YnX|<02>d<04>x@|D]8}|s<>q<EFBFBD>yt||<03>}|}Wnt t
fk
r<EFBFBD>Yq<>XPq<>W|<01>st dd<06> dd<08>|D<00><01><00><01>n||fSdS) at
Given a byte string, attempt to decode it.
Tries the standard 'utf-8' and 'latin-1' encodings,
plus several gathered from locale information.
The calling program *must* first call::
locale.setlocale(locale.LC_ALL, '')
If successful it returns ``(decoded_unicode, successful_encoding)``.
If unsuccessful it raises a ``UnicodeError``.
Nzutf-8rzlatin-1z?Unable to decode input data. Tried the following encodings: %s.z, cSsg|]}|rt|<01><01>qSr:)<01>repr)rq<00>encr:r:r;<00>
<listcomp>)sz"guess_encoding.<locals>.<listcomp>) rV<00>locale<6C> nl_langinfo<66>CODESET<45>AttributeError<6F> getlocale<6C>
IndexError<EFBFBD>getdefaultlocaler <00> UnicodeError<6F> LookupErrorrG)r@Zsuccessful_encoding<6E> encodingsr~<00>decodedr:r:r;<00>guess_encoding<6E>s: 


r<>cst<00><00><00>fdd<02>|D<00>S)Ncs"g|]}|<01>kr<04><00>|<01>s|<01>qSr:)<01>add)rq<00>x)<01>seenr:r;r7szunique_list.<locals>.<listcomp>)<01>set)<01>xsr:)r<>r;<00> unique_list4sr<>cCsVtt<01>}xH|D]@}t||d<01>rBx,||D]}||<00>|<02>q*Wq||||<qW|S)N<>__iter__)rr?rjrV)<04>dZ inverted_dictrXZtermr:r:r;<00> invert_dict?s
r<>Fcs<>|rdd<02><00>ndd<02><00>t<00>fdd<05><08>D<00><01>}t<00>fdd<05><08>D<00><01>}xh<78>D]`}||}||}xJ|r<>|<05><01>}|<06>|<07>||<03>|<07>|<07><01>O}||<02>|<07>|<07><01>O}||8}q^WqHW|S)a<>
Calculate the transitive closure of a directed graph,
optionally the reflexive transitive closure.
The algorithm is a slight modification of the "Marking Algorithm" of
Ioannidis & Ramakrishnan (1998) "Efficient Transitive Closure Algorithms".
:param graph: the initial graph, represented as a dictionary of sets
:type graph: dict(set)
:param reflexive: if set, also make the closure reflexive
:type reflexive: bool
:rtype: dict(set)
cSs
t|g<01>S)N)r<>)<01>kr:r:r;<00><lambda>_<00>z$transitive_closure.<locals>.<lambda>cSst<00>S)N)r<>)r<>r:r:r;r<>ar<>c3s|]}|<01>|<00><00>fVqdS)N)<01>copy)rqr<>)<01>graphr:r;rtcsz%transitive_closure.<locals>.<genexpr>c3s|]}|<01>|<01>fVqdS)Nr:)rqr<>)<01>base_setr:r;rtes)<05>dict<63>popr<70><00>
setdefault<EFBFBD>get)r<>Z reflexiveZ agenda_graphZ closure_graph<70>iZagendaZclosure<72>jr:)r<>r<>r;<00>transitive_closurePs


r<>cCs<i}x2|D]*}x$||D]}|<01>|t<01><00><02>|<02>qWq
W|S)z<>
Inverts a directed graph.
:param graph: the graph, represented as a dictionary of sets
:type graph: dict(set)
:return: the inverted graph
:rtype: dict(set)
)r<>r<>r<>)r<><00>invertedrXrYr:r:r;<00> invert_graphrs

r<>cCs td<01><01>dS)Nz>To remove HTML markup, use BeautifulSoup's get_text() function)<01>NotImplementedError)<01>htmlr:r:r;<00>
clean_html<EFBFBD>sr<>cCs td<01><01>dS)Nz>To remove HTML markup, use BeautifulSoup's get_text() function)r<>)<01>urlr:r:r;<00> clean_url<72>sr<>cGs`g}xV|D]N}t|ttf<02>s"|g}x4|D],}t|ttf<02>rJ|<01>t|<03><01>q(|<01>|<03>q(Wq
W|S)z<>
Flatten a list.
>>> from nltk.util import flatten
>>> flatten(1, 2, ['b', 'a' , ['c', 'd']], 3)
[1, 2, 'b', 'a', 'c', 'd', 3]
:param args: items and lists to be combined into a single list
:rtype: list
)r"r?<00>tuplerv<00>flattenrV)r6r<><00>l<>itemr:r:r;r<><00>s 

r<>cCs<t|<00>}|r t|f|d|<00>}|r8t||f|d<00>}|S)a
Returns a padded sequence of items before ngram extraction.
>>> list(pad_sequence([1,2,3,4,5], 2, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>'))
['<s>', 1, 2, 3, 4, 5, '</s>']
>>> list(pad_sequence([1,2,3,4,5], 2, pad_left=True, left_pad_symbol='<s>'))
['<s>', 1, 2, 3, 4, 5]
>>> list(pad_sequence([1,2,3,4,5], 2, pad_right=True, right_pad_symbol='</s>'))
[1, 2, 3, 4, 5, '</s>']
:param sequence: the source data to be padded
:type sequence: sequence or iter
:param n: the degree of the ngrams
:type n: int
:param pad_left: whether the ngrams should be left-padded
:type pad_left: bool
:param pad_right: whether the ngrams should be right-padded
:type pad_right: bool
:param left_pad_symbol: the symbol to use for left padding (default is None)
:type left_pad_symbol: any
:param right_pad_symbol: the symbol to use for right padding (default is None)
:type right_pad_symbol: any
:rtype: sequence or iter
r)<02>iterr)<06>sequence<63>n<>pad_left<66> pad_right<68>left_pad_symbol<6F>right_pad_symbolr:r:r;<00> pad_sequence<63>s r<>c cs<>t||||||<05>}g}x@|dkrVy t|<00>}Wntk
r@dSX|<06>|<07>|d8}qWx&|D]}|<06>|<08>t|<06>V|d=q^WdS)a<>
Return the ngrams generated from a sequence of items, as an iterator.
For example:
>>> from nltk.util import ngrams
>>> list(ngrams([1,2,3,4,5], 3))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]
Wrap with list for a list version of this function. Set pad_left
or pad_right to true in order to get additional ngrams:
>>> list(ngrams([1,2,3,4,5], 2, pad_right=True))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, None)]
>>> list(ngrams([1,2,3,4,5], 2, pad_right=True, right_pad_symbol='</s>'))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, '</s>')]
>>> list(ngrams([1,2,3,4,5], 2, pad_left=True, left_pad_symbol='<s>'))
[('<s>', 1), (1, 2), (2, 3), (3, 4), (4, 5)]
>>> list(ngrams([1,2,3,4,5], 2, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>'))
[('<s>', 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, '</s>')]
:param sequence: the source data to be converted into ngrams
:type sequence: sequence or iter
:param n: the degree of the ngrams
:type n: int
:param pad_left: whether the ngrams should be left-padded
:type pad_left: bool
:param pad_right: whether the ngrams should be right-padded
:type pad_right: bool
:param left_pad_symbol: the symbol to use for left padding (default is None)
:type left_pad_symbol: any
:param right_pad_symbol: the symbol to use for right padding (default is None)
:type right_pad_symbol: any
:rtype: sequence or iter
rNr)r<><00>next<78> StopIterationrVr<>) r<>r<>r<>r<>r<>r<><00>historyZ next_itemr<6D>r:r:r;<00>ngrams<6D>s+
 
 


r<>cks"xt|df|<01>D]
}|VqWdS)a<>
Return the bigrams generated from a sequence of items, as an iterator.
For example:
>>> from nltk.util import bigrams
>>> list(bigrams([1,2,3,4,5]))
[(1, 2), (2, 3), (3, 4), (4, 5)]
Wrap with list for a list version of this function.
:param sequence: the source data to be converted into bigrams
:type sequence: sequence or iter
:rtype: iter(tuple)
rNN)r<>)r<><00>kwargsr<73>r:r:r;<00>bigramssr<>cks"xt|df|<01>D]
}|VqWdS)a<>
Return the trigrams generated from a sequence of items, as an iterator.
For example:
>>> from nltk.util import trigrams
>>> list(trigrams([1,2,3,4,5]))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]
Wrap with list for a list version of this function.
:param sequence: the source data to be converted into trigrams
:type sequence: sequence or iter
:rtype: iter(tuple)
rN)r<>)r<>r<>r<>r:r:r;<00>trigrams3sr<>rcksJ|dkrt|<00>}x4t||d<00>D]"}xt||f|<03>D]
}|Vq4Wq WdS)a<>
Returns all possible ngrams generated from a sequence of items, as an iterator.
>>> sent = 'a b c'.split()
>>> list(everygrams(sent))
[('a',), ('b',), ('c',), ('a', 'b'), ('b', 'c'), ('a', 'b', 'c')]
>>> list(everygrams(sent, max_len=2))
[('a',), ('b',), ('c',), ('a', 'b'), ('b', 'c')]
:param sequence: the source data to be converted into ngrams
:type sequence: sequence or iter
:param min_len: minimum length of the ngrams, aka. n-gram order/degree of ngram
:type min_len: int
:param max_len: maximum length of the ngrams (set to length of sequence by default)
:type max_len: int
:rtype: iter(tuple)
rprN)r/<00>ranger<65>)r<>Zmin_len<65>max_lenr<6E>r<>Zngr:r:r;<00>
everygramsGs
r<>c ks<>d|ksd|krt||f|<03>}t<01>}xdt|||d|d<04>D]L}|dd<06>}|dd<05>}x.t||d<00>D]}|d|krxqf||VqfWq:WdS)a<>
Returns all possible skipgrams generated from a sequence of items, as an iterator.
Skipgrams are ngrams that allow tokens to be skipped.
Refer to http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf
>>> sent = "Insurgents killed in ongoing fighting".split()
>>> list(skipgrams(sent, 2, 2))
[('Insurgents', 'killed'), ('Insurgents', 'in'), ('Insurgents', 'ongoing'), ('killed', 'in'), ('killed', 'ongoing'), ('killed', 'fighting'), ('in', 'ongoing'), ('in', 'fighting'), ('ongoing', 'fighting')]
>>> list(skipgrams(sent, 3, 2))
[('Insurgents', 'killed', 'in'), ('Insurgents', 'killed', 'ongoing'), ('Insurgents', 'killed', 'fighting'), ('Insurgents', 'in', 'ongoing'), ('Insurgents', 'in', 'fighting'), ('Insurgents', 'ongoing', 'fighting'), ('killed', 'in', 'ongoing'), ('killed', 'in', 'fighting'), ('killed', 'ongoing', 'fighting'), ('in', 'ongoing', 'fighting')]
:param sequence: the source data to be converted into skipgrams
:type sequence: sequence or iter
:param n: the degree of the ngrams
:type n: int
:param k: the skip distance
:type k: int
:rtype: iter(tuple)
r<>r<>T)r<>r<>Nrrp)r<><00>objectr<74>r) r<>r<>r<>r<>ZSENTINELZngram<61>head<61>tailZ skip_tailr:r:r;<00> skipgramsas   r<>c Cs<>|d}t|<01>}d}d}t|d<03>r6t<02>|j<04>jd}n"|<00>dd<05>|<00><07>d}|<00>d<02><00>xT||k<00>r<>||f}||d} |<02>| <09>r<>|| \}
} nzd} x^|<00>t d| d<00><02>| dkr<>|<00>
<EFBFBD>|<00><07>}
|<00> <0B>} | dkr<>P|| d} | |dkr<>dSq<>W||k<00>r |
| f|| <|
|k<04>r6|| dk<03>s,t d<08><01>| d}nZ| d|<04>|k<02>rL| S| |k<04>rv|| dk<03>slt d<08><01>| d}n| |k<00>r<>|
t| <0B>d}|d7}||f} || kr\dSq\WdS) a
Return the line from the file with first word key.
Searches through a sorted file using the binary search algorithm.
:type file: file
:param file: the file to be searched through.
:type key: str
:param key: the identifier we are searching for.
rrr4rrN<00>Nz infinite loop) r/rj<00>os<6F>statr4<00>st_size<7A>seek<65>tellr<6C><00>maxZ discard_line<6E>readline<6E>AssertionError) <0A>filerX<00>cacheZ
cacheDepthZkeylenrAZ currentDepthrBZ lastStateZmiddle<6C>offset<65>lineZ thisStater:r:r;<00>binary_search_file<6C>sV 
  
 
  
 




r<>r<>cCs<>ddlm}|dkr@yt<02>d}Wntk
r>td<05><01>YnXt||d<06><02>}t|<04>}|dk r<>t<07>}|jd|||d<07>|<05> t
|<06><01>|<05> t |<06><01>t |<05>dS)a<>
Set the HTTP proxy for Python to download through.
If ``proxy`` is None then tries to set proxy from environment or system
settings.
:param proxy: The HTTP proxy server to use. For example:
'http://proxy.example.com:3128/'
:param user: The username to authenticate with. Use None to disable
authentication.
:param password: The password to authenticate with.
r)<01>compatN<74>httpz'Could not detect default proxy settings)<02>httpsr<73>)<04>realm<6C>uri<72>user<65>passwd) Znltkr<6B>r<00>KeyErrorrlrr r<00> add_password<72> add_handlerrrr)<07>proxyr<79><00>passwordr<64>Z proxy_handler<65>openerZpassword_managerr:r:r;<00> set_proxy<78>s r<>cCs<>d|d}t|<00>rb|jr$|j<01><02>s.|d|_x|D]}t||d<00>q4W|jrZ|j<04><02>s|||_n|r||jrv|j<04><02>s|||_dS)a<>
Recursive function to indent an ElementTree._ElementInterface
used for pretty printing. Run indent on elem and then output
in the normal way.
:param elem: element to be indented. will be modified.
:type elem: ElementTree._ElementInterface
:param level: level of indentation for this element
:type level: nonnegative integer
:rtype: ElementTree._ElementInterface
:return: Contents of elem indented to reflect its structure
rEz rN)r/<00>text<78>strip<69>elementtree_indentr<74>)<03>elem<65>levelr<6C>r:r:r;r<>s 

r<>cCsjd|kr|krbnnJd\}}x8tdt|||<00>d<00>D]}||9}||9}|d8}q:W||SdSdS)a9
This function is a fast way to calculate binomial coefficients, commonly
known as nCk, i.e. the number of combinations of n things taken k at a time.
(https://en.wikipedia.org/wiki/Binomial_coefficient).
This gives the same result as *scipy.special.comb()* with long integer
computation, but this implementation is faster; see https://github.com/nltk/nltk/issues/1181
>>> choose(4, 2)
6
>>> choose(6, 2)
15
:param n: The number of things.
:type n: int
:param k: The number of things taken at a time.
:type k: int
r)rrrN)r<><00>min)r<>r<>ZntokZktok<6F>tr:r:r;<00>choose s r<>)r)rN)rD)rrD)r\r])F)FFNN)FFNN)rrp)Nr<4E>)r)E<>
__future__rr+r,r<>r^<00>typesr1r'<00>bisectr<74><00> itertoolsrrrr<00> collectionsrrr <00>sixr
r r Zsix.moves.urllib.requestr rrrrrrZnltk.internalsrrZnltk.collectionsZ nltk.compatrr<r>rCrJrMrOrQrSrTrgror<>r|r<>r<>r<>r<>r<>r<>r<>r<>r<>r<>r<>r<>r<>r<>r<>r<>r<>r<>r:r:r:r;<00><module>sl   $
 
(

 
= 
"  
(
9
+J
*