=========================================
NLTK Python 2.x - 3.x Compatibility Layer
=========================================

NLTK comes with a Python 2.x/3.x compatibility layer, ``nltk.compat``
(which is loosely based on `six <http://packages.python.org/six/>`_)::

    >>> from nltk import compat
    >>> compat.PY3
    False
    >>> # and so on
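
Code that has to run on both interpreters usually branches on a flag like ``compat.PY3``. The sketch below shows how such flags and matching text/bytes aliases are typically defined in the style of six. It assumes ``nltk.compat.PY3`` is computed along the same lines; the names ``text_type`` and ``binary_type`` and the ``ensure_text`` helper are illustrative and do not describe what ``nltk.compat`` actually exports.

```python
import sys

# Six-style version flags (assumption: nltk.compat.PY3 is computed along these
# lines; this is a sketch, not NLTK's source).
PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] == 3

if PY3:
    text_type = str        # unicode text
    binary_type = bytes    # raw byte strings
else:  # only reached on Python 2
    text_type = unicode    # noqa: F821 (this name exists only on Python 2)
    binary_type = str


def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: return ``value`` as text, decoding bytes if needed."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return text_type(value)
```

With these aliases, calling code can test ``isinstance(x, text_type)`` once instead of repeating version checks.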

@python_2_unicode_compatible
----------------------------

Under Python 2.x, ``__str__`` and ``__repr__`` methods must return
bytestrings.

The ``@python_2_unicode_compatible`` decorator lets you write these methods
in a way that is also compatible with Python 3.x:

1) decorate the class with ``@python_2_unicode_compatible``;
2) define ``__str__`` and ``__repr__`` methods that return unicode text
   (which is what they must return under Python 3.x).

Under Python 2.x, the decorator then fixes these methods so that they
return bytestrings::

    >>> from nltk.compat import python_2_unicode_compatible

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __str__(self):
    ...         return u'__str__ is called'
    ...     def __repr__(self):
    ...         return u'__repr__ is called'

    >>> foo = Foo()
    >>> foo.__str__().__class__
    <type 'str'>
    >>> foo.__repr__().__class__
    <type 'str'>
    >>> print(foo)
    __str__ is called
    >>> foo
    __repr__ is called

The original versions of ``__str__`` and ``__repr__`` remain available as
the ``__unicode__`` and ``unicode_repr`` methods::

    >>> foo.__unicode__().__class__
    <type 'unicode'>
    >>> foo.unicode_repr().__class__
    <type 'unicode'>
    >>> unicode(foo)
    u'__str__ is called'
    >>> foo.unicode_repr()
    u'__repr__ is called'
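
The rewriting described above can be sketched with a simplified, hypothetical decorator. This is not NLTK's implementation, which may differ in details such as how non-ASCII output is encoded. ``unicode_compatible`` and its ``py2`` parameter are invented here so that the Python 2 branch can be exercised from Python 3.

```python
import sys


def unicode_compatible(klass, py2=sys.version_info[0] == 2):
    """Hypothetical, simplified stand-in for ``python_2_unicode_compatible``.

    With ``py2`` true, a ``__str__``/``__repr__`` defined on ``klass`` itself is
    kept as ``__unicode__``/``unicode_repr`` and replaced by a wrapper that
    encodes the text result to UTF-8 bytes. With ``py2`` false (Python 3), the
    class is returned unchanged.
    """
    if not py2:
        return klass
    # Only touch methods defined on this class, never inherited ones, so that
    # already-fixed methods of a base class are not wrapped twice.
    if "__str__" in klass.__dict__:
        klass.__unicode__ = klass.__str__
        klass.__str__ = lambda self: self.__unicode__().encode("utf-8")
    if "__repr__" in klass.__dict__:
        klass.unicode_repr = klass.__repr__
        klass.__repr__ = lambda self: self.unicode_repr().encode("utf-8")
    return klass
```

Used as ``@unicode_compatible``, the sketch behaves like a no-op on Python 3. The explicit ``unicode_compatible(Foo, py2=True)`` call exists only to inspect the Python 2 branch. Calling ``str()`` on such an object under Python 3 would raise ``TypeError``, because ``__str__`` now returns bytes.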

A subclass that overrides neither ``__str__`` nor ``__repr__`` does not
need to be decorated with ``@python_2_unicode_compatible``::

    >>> class Bar(Foo):
    ...     pass
    >>> bar = Bar()
    >>> bar.__str__().__class__
    <type 'str'>

However, if a subclass overrides ``__str__`` or ``__repr__``, decorate it
again; otherwise the overriding method keeps returning unicode under
Python 2.x::

    >>> class BadBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = BadBaz()
    >>> baz.__str__().__class__  # this is incorrect!
    <type 'unicode'>

    >>> @python_2_unicode_compatible
    ... class GoodBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = GoodBaz()
    >>> baz.__str__().__class__
    <type 'str'>
    >>> baz.__unicode__().__class__
    <type 'unicode'>

Applying ``@python_2_unicode_compatible`` to a subclass should not break
the methods it does not override::

    >>> baz.__repr__().__class__
    <type 'str'>
    >>> baz.unicode_repr().__class__
    <type 'unicode'>
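
The rule of thumb from this section can be checked mechanically: decorate a subclass again only when it defines ``__str__`` or ``__repr__`` itself. Below is a minimal sketch in which ``needs_redecoration`` is a hypothetical helper. The classes mirror the doctests above, minus the decorator.

```python
def needs_redecoration(klass):
    """Return True if ``klass`` itself defines ``__str__`` or ``__repr__``.

    Hypothetical helper: only such subclasses need the decorator again;
    methods inherited from an already-decorated base are left alone.
    """
    return any(name in vars(klass) for name in ("__str__", "__repr__"))


class Foo(object):
    def __str__(self):
        return u'__str__ is called'

    def __repr__(self):
        return u'__repr__ is called'


class Bar(Foo):        # overrides nothing: no need to decorate again
    pass


class GoodBaz(Foo):    # overrides __str__: must be decorated again
    def __str__(self):
        return u'Baz.__str__'
```

``vars(klass)`` only lists attributes defined on the class body itself, which is exactly the distinction between "overrides" and "inherits" that matters here.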

unicode_repr
------------

Under Python 3.x, ``repr(unicode_string)`` has no leading "u" letter,
while under Python 2.x it does.

The ``nltk.compat.unicode_repr`` function can be used instead of ``repr``
and ``"%r" % obj`` to make the output consistent between Python 2.x and 3.x::

    >>> from nltk.compat import unicode_repr
    >>> print(repr(u"test"))
    u'test'
    >>> print(unicode_repr(u"test"))
    'test'

It can also be used to get the original, unescaped repr (as unicode) of
objects whose class was fixed by the ``@python_2_unicode_compatible``
decorator::

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __repr__(self):
    ...         return u'<Foo: foo>'

    >>> foo = Foo()
    >>> repr(foo)
    '<Foo: foo>'
    >>> unicode_repr(foo)
    u'<Foo: foo>'

For other objects, it returns the same value as ``repr``::

    >>> unicode_repr(5)
    '5'
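
The three cases above (text, decorated objects, everything else) suggest a simple dispatch. The sketch below is hypothetical and is not the code of ``nltk.compat.unicode_repr``. It assumes that a decorated instance exposes its original repr via a ``unicode_repr`` method, as the earlier doctests show.

```python
import sys

PY2 = sys.version_info[0] == 2


def unicode_repr(obj):
    """Hypothetical sketch of a ``unicode_repr``-style helper (not NLTK's code)."""
    # Instances of decorated classes keep their original text repr here.
    # Classes themselves are skipped: calling an unbound method would fail.
    method = getattr(obj, "unicode_repr", None)
    if callable(method) and not isinstance(obj, type):
        return method()
    text = repr(obj)
    # Python 2 prefixes unicode reprs with "u"; drop it to match Python 3.
    if PY2 and text[:2] in ("u'", 'u"'):
        return text[1:]
    return text
```

Under Python 3 the prefix branch never runs, so for plain objects the helper reduces to ``repr``.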

Inside the ``__repr__`` or ``__str__`` methods of classes fixed by
``@python_2_unicode_compatible``, it may be a good idea to use
``unicode_repr`` instead of the ``%r`` string formatting specifier, so that
the output is consistent between Python 2.x and 3.x.
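
As an illustration of this advice, a hypothetical ``Token`` class can build its ``__repr__`` from ``unicode_repr`` rather than ``%r``. To keep the sketch self-contained and runnable on Python 3, ``unicode_repr`` is aliased to the built-in ``repr``. That assumption holds for plain text on Python 3, where ``repr`` adds no "u" prefix. Python 2 code would instead import the real helper from ``nltk.compat`` and decorate the class.

```python
# Stand-in for nltk.compat.unicode_repr (assumption): on Python 3 the built-in
# repr() of a str already has no "u" prefix.
unicode_repr = repr


class Token(object):
    """Hypothetical class; in Python 2 code it would also carry
    @python_2_unicode_compatible."""

    def __init__(self, text, tag):
        self.text = text
        self.tag = tag

    def __repr__(self):
        # unicode_repr(...) with %s instead of %r: with the real helper, the
        # fields would not get a leading "u" under Python 2.x.
        return "Token(%s, %s)" % (unicode_repr(self.text), unicode_repr(self.tag))
```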