=========================================
NLTK Python 2.x - 3.x Compatibility Layer
=========================================
NLTK comes with a Python 2.x/3.x compatibility layer, nltk.compat
(which is loosely based on `six <http://packages.python.org/six/>`_)::

    >>> from nltk import compat
    >>> compat.PY3
    False
    >>> # and so on

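A compatibility layer of this kind usually exposes a version flag plus a few aliases that hide naming differences between the two major versions. The block below is a minimal sketch of that idea, not the actual contents of ``nltk.compat``; the names ``text_type``, ``binary_type`` and ``is_text`` are assumptions chosen for illustration (the first two follow the naming used by six).

```python
import sys

# True when running under Python 3.x, False under Python 2.x.
PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str        # unicode text
    binary_type = bytes    # raw bytes
else:
    text_type = unicode    # noqa: F821 (name exists only under Python 2.x)
    binary_type = str


def is_text(value):
    """Return True if value is a unicode text string on either version."""
    return isinstance(value, text_type)
```

Code that needs to branch on the interpreter version can then test ``PY3`` once instead of repeating ``sys.version_info`` checks.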
@python_2_unicode_compatible
----------------------------
Under Python 2.x, ``__str__`` and ``__repr__`` methods must
return byte strings.

The ``@python_2_unicode_compatible`` decorator allows writing these methods
in a way that is compatible with Python 3.x:

1) wrap a class with this decorator,
2) define ``__str__`` and ``__repr__`` methods returning unicode text
   (that's what they must return under Python 3.x),

and they will be fixed under Python 2.x to return byte strings::

    >>> from nltk.compat import python_2_unicode_compatible

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __str__(self):
    ...         return u'__str__ is called'
    ...     def __repr__(self):
    ...         return u'__repr__ is called'

    >>> foo = Foo()
    >>> foo.__str__().__class__
    <type 'str'>
    >>> foo.__repr__().__class__
    <type 'str'>
    >>> print(foo)
    __str__ is called
    >>> foo
    __repr__ is called

The original versions of ``__str__`` and ``__repr__`` are available as
``__unicode__`` and ``unicode_repr``::

    >>> foo.__unicode__().__class__
    <type 'unicode'>
    >>> foo.unicode_repr().__class__
    <type 'unicode'>
    >>> unicode(foo)
    u'__str__ is called'
    >>> foo.unicode_repr()
    u'__repr__ is called'

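The behaviour described so far (byte-string ``__str__``/``__repr__`` under Python 2.x, with the text versions kept as ``__unicode__`` and ``unicode_repr``) can be sketched roughly as follows. This is an illustrative stand-in, not NLTK's implementation: the names ``unicode_compatible_sketch``, ``_encoded`` and ``_was_fixed`` are hypothetical, and the ``ascii``/``backslashreplace`` encoding is an assumption.

```python
import sys

PY3 = sys.version_info[0] >= 3


def _encoded(method):
    """Wrap a text-returning method so that it returns a byte string."""
    def wrapper(self):
        # Assumption: encode to ASCII and escape anything that does not fit.
        return method(self).encode('ascii', 'backslashreplace')
    wrapper._was_fixed = True  # marker: this method is already a wrapper
    return wrapper


def unicode_compatible_sketch(cls):
    """Hypothetical stand-in for ``@python_2_unicode_compatible``."""
    for name, alias in (('__str__', '__unicode__'),
                        ('__repr__', 'unicode_repr')):
        method = getattr(cls, name)
        if getattr(method, '_was_fixed', False):
            continue  # inherited from a class that was already wrapped
        setattr(cls, alias, method)  # keep the original text version
        if not PY3:
            setattr(cls, name, _encoded(method))  # bytes under Python 2.x
    return cls


@unicode_compatible_sketch
class Foo(object):
    def __str__(self):
        return u'__str__ is called'

    def __repr__(self):
        return u'__repr__ is called'


foo = Foo()
```

Under Python 3.x the sketch only adds the two aliases and leaves ``__str__`` and ``__repr__`` untouched, since they already return text there.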
There is no need to wrap a subclass with ``@python_2_unicode_compatible``
if it doesn't override ``__str__`` and ``__repr__``::

    >>> class Bar(Foo):
    ...     pass
    >>> bar = Bar()
    >>> bar.__str__().__class__
    <type 'str'>

However, if a subclass overrides ``__str__`` or ``__repr__``,
wrap it again::

    >>> class BadBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = BadBaz()
    >>> baz.__str__().__class__ # this is incorrect!
    <type 'unicode'>

    >>> @python_2_unicode_compatible
    ... class GoodBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = GoodBaz()
    >>> baz.__str__().__class__
    <type 'str'>
    >>> baz.__unicode__().__class__
    <type 'unicode'>

Applying ``@python_2_unicode_compatible`` to a subclass
shouldn't break methods that were not overridden::

    >>> baz.__repr__().__class__
    <type 'str'>
    >>> baz.unicode_repr().__class__
    <type 'unicode'>

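One way to see why an overriding subclass has to be wrapped again is that a class decorator runs once, at class definition time, so anything it records about ``__str__`` is a snapshot. The sketch below uses a deliberately simplified, hypothetical decorator (``alias_text_methods``) that only stores the aliases; it is an assumption made for illustration and does not reproduce what ``nltk.compat`` does internally.

```python
def alias_text_methods(cls):
    """Hypothetical stand-in: snapshot __str__/__repr__ under alias names."""
    cls.__unicode__ = cls.__str__
    cls.unicode_repr = cls.__repr__
    return cls


@alias_text_methods
class Foo(object):
    def __str__(self):
        return u'Foo.__str__'


class BadBaz(Foo):            # overrides __str__ but is not wrapped again
    def __str__(self):
        return u'Baz.__str__'


@alias_text_methods
class GoodBaz(Foo):           # overrides __str__ and is wrapped again
    def __str__(self):
        return u'Baz.__str__'


bad_alias = BadBaz().__unicode__()    # still resolves to Foo.__str__
good_alias = GoodBaz().__unicode__()  # resolves to GoodBaz.__str__
```

In this sketch the unwrapped subclass ends up with an alias that no longer matches its own ``__str__``, while the re-wrapped subclass stays consistent.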
unicode_repr
------------
Under Python 3.x, ``repr(unicode_string)`` doesn't have a leading "u" letter.

The ``nltk.compat.unicode_repr`` function may be used instead of ``repr`` and
``"%r" % obj`` to make the output more consistent under Python 2.x and 3.x::

    >>> from nltk.compat import unicode_repr

    >>> print(repr(u"test"))
    u'test'
    >>> print(unicode_repr(u"test"))
    'test'

It may also be used to get the original unescaped repr (as unicode)
of objects whose class was fixed by the ``@python_2_unicode_compatible``
decorator::

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __repr__(self):
    ...         return u'<Foo: foo>'

    >>> foo = Foo()
    >>> repr(foo)
    '<Foo: foo>'
    >>> unicode_repr(foo)
    u'<Foo: foo>'

For other objects it returns the same value as ``repr``::

    >>> unicode_repr(5)
    '5'

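The three cases above (unicode strings, objects of decorated classes, everything else) suggest a simple dispatch. The following is a rough sketch under that reading, not the function shipped in ``nltk.compat``; the name ``unicode_repr_sketch`` is hypothetical.

```python
import sys

PY3 = sys.version_info[0] >= 3


def unicode_repr_sketch(obj):
    """Hypothetical stand-in for ``nltk.compat.unicode_repr``."""
    if PY3:
        return repr(obj)               # no "u" prefix to remove
    if hasattr(obj, 'unicode_repr'):
        return obj.unicode_repr()      # original repr kept by the decorator
    text = repr(obj)
    if isinstance(obj, unicode) and text.startswith('u'):  # noqa: F821
        return text[1:]                # drop the Python 2.x "u" prefix
    return text
```

The Python 2.x branches are never reached under Python 3.x, so the reference to ``unicode`` is only evaluated where that name exists.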
It may be a good idea to use ``unicode_repr`` instead of the ``%r``
string formatting specifier inside the ``__repr__`` or ``__str__``
methods of classes fixed by ``@python_2_unicode_compatible``
to make the output consistent between Python 2.x and 3.x.
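As an illustration of that advice, the sketch below formats a nested string through a ``unicode_repr`` callable rather than ``%r``. To keep the block self-contained, ``unicode_repr`` is bound to the built-in ``repr`` here, which is an assumption that only holds under Python 3.x; the ``Token`` class is hypothetical.

```python
unicode_repr = repr  # stand-in; assumed equivalent under Python 3.x only


class Token(object):
    """Hypothetical class used only for illustration."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # "<Token: %r>" % self.text would include a u'...' prefix under
        # Python 2.x; going through unicode_repr is meant to avoid that.
        return u'<Token: %s>' % unicode_repr(self.text)


token = Token(u'test')
```

With the real ``nltk.compat.unicode_repr`` in place of the stand-in, the intent is that ``repr(token)`` reads the same on both major versions.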