=========================================
NLTK Python 2.x - 3.x Compatibility Layer
=========================================

NLTK comes with a Python 2.x/3.x compatibility layer, ``nltk.compat``
(which is loosely based on `six <http://packages.python.org/six/>`_)::

    >>> from nltk import compat
    >>> compat.PY3
    False
    >>> # and so on

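As a rough illustration of what a six-style compatibility layer provides, the sketch below derives a version flag and version-independent text and byte types. The names ``text_type``, ``binary_type`` and ``to_text`` are hypothetical and chosen for this example; they are not claimed to match what ``nltk.compat`` exports.

```python
import sys

# True on Python 3.x, False on Python 2.x.
PY3 = sys.version_info[0] >= 3

# Derive the text and byte types from literals, so no name that exists
# on only one Python version is needed.
text_type = type(u"")    # unicode on 2.x, str on 3.x
binary_type = type(b"")  # str on 2.x, bytes on 3.x


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings (hypothetical helper)."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value


print(PY3)
print(to_text(b"caf\xc3\xa9") == u"caf\xe9")
```

Code that consults a single flag or type alias like this avoids scattering version checks through the rest of a package.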
@python_2_unicode_compatible
----------------------------

Under Python 2.x, ``__str__`` and ``__repr__`` methods must
return byte strings.

The ``@python_2_unicode_compatible`` decorator lets you write these methods
in a way that is compatible with Python 3.x:

1) wrap a class with this decorator,
2) define ``__str__`` and ``__repr__`` methods that return unicode text
   (which is what they must return under Python 3.x),

and under Python 2.x the decorator fixes them to return byte strings::

    >>> from nltk.compat import python_2_unicode_compatible

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __str__(self):
    ...         return u'__str__ is called'
    ...     def __repr__(self):
    ...         return u'__repr__ is called'

    >>> foo = Foo()
    >>> foo.__str__().__class__
    <type 'str'>
    >>> foo.__repr__().__class__
    <type 'str'>
    >>> print(foo)
    __str__ is called
    >>> foo
    __repr__ is called

Original versions of ``__str__`` and ``__repr__`` are available as
``__unicode__`` and ``unicode_repr``::

    >>> foo.__unicode__().__class__
    <type 'unicode'>
    >>> foo.unicode_repr().__class__
    <type 'unicode'>
    >>> unicode(foo)
    u'__str__ is called'
    >>> foo.unicode_repr()
    u'__repr__ is called'

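The behaviour shown above can be approximated with a small class decorator. This is a minimal sketch under stated assumptions, not NLTK's actual implementation: the name ``unicode_compatible_sketch`` is hypothetical, and the encodings used on Python 2.x (UTF-8 for ``__str__``, ASCII with backslash escapes for ``__repr__``) are choices made for this illustration.

```python
import sys

PY3 = sys.version_info[0] >= 3


def unicode_compatible_sketch(cls):
    """Hypothetical stand-in for a decorator like python_2_unicode_compatible."""
    if PY3:
        # On Python 3.x text-returning methods are already correct.
        return cls

    # Python 2.x only: patch the methods defined directly on this class.
    if "__str__" in cls.__dict__:
        text_str = cls.__dict__["__str__"]
        cls.__unicode__ = text_str  # keep the original, text-returning version
        cls.__str__ = lambda self: text_str(self).encode("utf-8")
    if "__repr__" in cls.__dict__:
        text_repr = cls.__dict__["__repr__"]
        cls.unicode_repr = text_repr  # keep the original, unescaped version
        cls.__repr__ = lambda self: text_repr(self).encode(
            "ascii", "backslashreplace")
    return cls


@unicode_compatible_sketch
class Foo(object):
    def __str__(self):
        return u"__str__ is called"

    def __repr__(self):
        return u"__repr__ is called"


print(str(Foo()))
print(repr(Foo()))
```

Under Python 3.x the sketch returns the class unchanged, so the same class definition works on both versions without any conditional code in the class body.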
There is no need to wrap a subclass with ``@python_2_unicode_compatible``
if it doesn't override ``__str__`` and ``__repr__``::

    >>> class Bar(Foo):
    ...     pass
    >>> bar = Bar()
    >>> bar.__str__().__class__
    <type 'str'>

However, if a subclass overrides ``__str__`` or ``__repr__``,
wrap it again::

    >>> class BadBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = BadBaz()
    >>> baz.__str__().__class__  # this is incorrect!
    <type 'unicode'>

    >>> @python_2_unicode_compatible
    ... class GoodBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = GoodBaz()
    >>> baz.__str__().__class__
    <type 'str'>
    >>> baz.__unicode__().__class__
    <type 'unicode'>

Applying ``@python_2_unicode_compatible`` to a subclass
shouldn't break methods that were not overridden::

    >>> baz.__repr__().__class__
    <type 'str'>
    >>> baz.unicode_repr().__class__
    <type 'unicode'>

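The subclass rules follow from ordinary attribute lookup, which the version-independent analogy below tries to show. The decorator ``wrap_own_str`` is hypothetical: instead of encoding to bytes it prepends a marker, so the effect is visible on any Python version. It assumes, as the examples above suggest, that such a decorator only patches methods defined directly on the class it is applied to.

```python
def wrap_own_str(cls):
    """Hypothetical decorator: post-process __str__ only if cls defines it."""
    if "__str__" in cls.__dict__:
        original = cls.__dict__["__str__"]
        cls.__str__ = lambda self: "[wrapped] " + original(self)
    return cls


@wrap_own_str
class Foo(object):
    def __str__(self):
        return "Foo.__str__"


class Bar(Foo):
    # No override: the wrapped method is inherited from Foo.
    pass


class BadBaz(Foo):
    # Override without the decorator: this method shadows Foo's wrapper.
    def __str__(self):
        return "Baz.__str__"


@wrap_own_str
class GoodBaz(Foo):
    # Override with the decorator: the wrapper is applied again.
    def __str__(self):
        return "Baz.__str__"


print(str(Bar()))      # [wrapped] Foo.__str__
print(str(BadBaz()))   # Baz.__str__
print(str(GoodBaz()))  # [wrapped] Baz.__str__
```

Because the decorator checks the class's own ``__dict__``, re-applying it to a subclass leaves inherited methods alone and avoids wrapping them twice.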
unicode_repr
------------

Under Python 3.x ``repr(unicode_string)`` doesn't have a leading "u" letter.

The ``nltk.compat.unicode_repr`` function may be used instead of ``repr`` and
``"%r" % obj`` to make the output more consistent under Python 2.x and 3.x::

    >>> from nltk.compat import unicode_repr
    >>> print(repr(u"test"))
    u'test'
    >>> print(unicode_repr(u"test"))
    'test'

It may also be used to get the original unescaped repr (as unicode)
of objects whose class was fixed by the ``@python_2_unicode_compatible``
decorator::

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __repr__(self):
    ...         return u'<Foo: foo>'

    >>> foo = Foo()
    >>> repr(foo)
    '<Foo: foo>'
    >>> unicode_repr(foo)
    u'<Foo: foo>'

For other objects it returns the same value as ``repr``::

    >>> unicode_repr(5)
    '5'

It may be a good idea to use ``unicode_repr`` instead of the ``%r``
string formatting specifier inside ``__repr__`` or ``__str__``
methods of classes fixed by ``@python_2_unicode_compatible``
to make the output consistent between Python 2.x and 3.x.
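To make this advice concrete, here is a minimal sketch of a ``unicode_repr``-like helper used inside ``__repr__``. The function ``unicode_repr_sketch`` and the class ``Token`` are hypothetical; the helper only mirrors the behaviour described in this section and is not NLTK's actual code.

```python
import sys

PY3 = sys.version_info[0] >= 3
text_type = type(u"")  # unicode on 2.x, str on 3.x


def unicode_repr_sketch(obj):
    """Return a repr without the Python 2.x "u" prefix (illustrative only)."""
    if PY3:
        return repr(obj)
    if hasattr(obj, "unicode_repr"):
        # Class fixed by a decorator: use its original, unescaped repr.
        return obj.unicode_repr()
    if isinstance(obj, text_type):
        return repr(obj)[1:]  # drop the leading "u"
    return repr(obj)


class Token(object):
    # Under Python 2.x this class would also be wrapped with the decorator.
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # Using "%r" here would give u'cat' on 2.x and 'cat' on 3.x.
        return u"<Token: %s>" % unicode_repr_sketch(self.text)


print(repr(Token(u"cat")))
```

Formatting with ``%s`` plus the helper keeps the ``u`` prefix out of the result, which is what makes doctest output identical across versions.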