This is a new selection of tips and tricks about Python and programming from my Telegram channel @pythonetc.

Previous publications.

Comparing structures


Sometimes you want to compare complex structures in tests while ignoring some values. Usually, this is done by checking particular values of the structure one by one:

>>> d = dict(a=1, b=2, c=3)
>>> assert d['a'] == 1
>>> assert d['c'] == 3

However, you can create a special value that reports being equal to any other value:

>>> assert d == dict(a=1, b=ANY, c=3)

That can be easily done by defining the __eq__ method:

>>> class AnyClass:
...     def __eq__(self, another):
...         return True
...
>>> ANY = AnyClass()
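
As a self-contained sketch of the same idea, the snippet below compares a nested structure while ignoring two of its values. The response dictionary and its keys are made up for illustration. The standard library also ships a ready-made object of this kind, unittest.mock.ANY; it is imported here under the alias MOCK_ANY (a name chosen only for this example) to avoid a clash with the hand-written ANY.

```python
from unittest.mock import ANY as MOCK_ANY


class AnyClass:
    """Compares equal to everything; handy for "don't care" slots in tests."""

    def __eq__(self, other):
        return True

    def __repr__(self):
        return '<ANY>'


ANY = AnyClass()

# Hypothetical data: the timestamp and the last tag are not predictable.
response = {'id': 42, 'created_at': '2018-10-01T12:00:00', 'tags': ['a', 'b']}

# str.__eq__ returns NotImplemented for AnyClass, so Python falls back
# to the reflected AnyClass.__eq__, which returns True.
matches_custom = response == {'id': 42, 'created_at': ANY, 'tags': ['a', ANY]}
matches_mock = response == {'id': 42, 'created_at': MOCK_ANY, 'tags': ['a', MOCK_ANY]}

# A value that is not ignored still has to match.
mismatch = response == {'id': 43, 'created_at': ANY, 'tags': ['a', ANY]}
```

Note that a class defining __eq__ without __hash__ becomes unhashable, so such an object works as a dictionary value or list item but not as a dictionary key.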

stdout


sys.stdout is a wrapper that allows you to write strings instead of raw bytes. The string is encoded automatically using sys.stdout.encoding:

>>> _ = sys.stdout.write('Straße\n')
Straße
>>> sys.stdout.encoding
'UTF-8'

sys.stdout.encoding is read-only. By default, Python chooses it from the environment (typically the locale), and it can be overridden by setting the PYTHONIOENCODING environment variable before the interpreter starts:

$ PYTHONIOENCODING=cp1251 python3
Python 3.6.6 (default, Aug 13 2018, 18:24:23)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp1251'
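
The same check can be scripted. The sketch below starts a child interpreter with PYTHONIOENCODING set only for that process and reads back the encoding it reports. The variable names are illustrative, and it assumes that sys.executable points to a working Python 3 interpreter.

```python
import os
import subprocess
import sys

# Copy the current environment and override the I/O encoding for the child only.
env = dict(os.environ, PYTHONIOENCODING='cp1251')

completed = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

# The encoding name is plain ASCII, so decoding it is safe in any locale.
child_encoding = completed.stdout.decode('ascii').strip()
print(child_encoding)
```

The parent process keeps its own encoding: the variable is read once at interpreter startup, so changing os.environ inside a running program has no effect on its own sys.stdout.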

If you want to write bytes to stdout, you can bypass automatic encoding by accessing the wrapped buffer with sys.stdout.buffer:

>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1251'>
>>> sys.stdout.buffer
<_io.BufferedWriter name='<stdout>'>
>>> _ = sys.stdout.buffer.write(b'Stra\xc3\x9fe\n')
Straße

sys.stdout.buffer is also a wrapper: it does buffering for you. It can be bypassed by accessing the raw file object with sys.stdout.buffer.raw:

>>> _ = sys.stdout.buffer.raw.write(b'Stra\xc3\x9fe')
Straße
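
The layering can be reproduced without touching the real stdout. In the sketch below, an in-memory io.BytesIO stands in for the byte stream and io.TextIOWrapper plays the role of sys.stdout. This is a simplified model: there is no separate buffered layer here, so the wrapper's buffer attribute is the BytesIO object itself.

```python
import io

# Stand-in for the underlying byte stream (an assumption of this sketch).
raw = io.BytesIO()

# newline='\n' disables newline translation so the bytes are the same on every OS.
out = io.TextIOWrapper(raw, encoding='utf-8', newline='\n')

# Text goes through the encoder...
out.write('Straße\n')
out.flush()  # push pending text down before writing bytes directly

# ...while bytes written to the wrapped object bypass it.
out.buffer.write(b'Stra\xc3\x9fe\n')

written = raw.getvalue()
print(written)
```

Flushing before switching layers matters: each layer may hold data of its own, so mixing text and byte writes without a flush can reorder the output.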

Ellipsis constant


Python has a very short list of built-in constants. One of them is Ellipsis, which can also be written as three dots (...). This constant has no special meaning for the interpreter but is used in places where such syntax looks appropriate.

numpy supports Ellipsis as a __getitem__ argument, e.g. x[...] returns all elements of x.

PEP 484 defines an additional meaning: Callable[..., type] is a way to define a type of callables with no argument types specified.

Finally, you can use ... to indicate that a function is not yet implemented. This is completely valid Python code:

def x():
    ...

However, in Python 2, Ellipsis can't be written as ... in arbitrary expressions. The only exception is a[...], which means a[Ellipsis].

All of the following lines are valid in Python 3, but only the first one is valid in Python 2:

a[...]
a[...:2:...]
[..., ...]
{...:...}
a = ...
... is ...
def a(x=...): ...
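
To show how these uses fit together, here is a small Python 3 sketch. The Grid class and the apply function are hypothetical names invented for this illustration; Grid only mimics the numpy convention of x[...] and is not how numpy implements it.

```python
from typing import Callable


class Grid:
    """Hypothetical container that treats grid[...] as "give me everything"."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, key):
        # The three dots arrive as the Ellipsis singleton.
        if key is Ellipsis:
            return [list(row) for row in self.rows]
        return self.rows[key]


def apply(callback: Callable[..., int]) -> int:
    # Callable[..., int]: any arguments are accepted, the result is an int.
    return callback(1, 2)


grid = Grid([[1, 2], [3, 4]])
everything = grid[...]
first_row = grid[0]
total = apply(lambda a, b: a + b)
same_object = ... is Ellipsis
```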

Reimporting modules


Modules that are already imported are not loaded again: a repeated import foo does not re-execute the module. However, reimporting modules is useful while working in an interactive environment. The proper way to do this in Python 3.4+ is to use importlib:

In [1]: import importlib
In [2]: with open('foo.py', 'w') as f:
   ...:     f.write('a = 1')
   ...:

In [3]: import foo
In [4]: foo.a
Out[4]: 1
In [5]: with open('foo.py', 'w') as f:
   ...:     f.write('a = 2')
   ...:
In [6]: foo.a
Out[6]: 1
In [7]: import foo
In [8]: foo.a
Out[8]: 1
In [9]: importlib.reload(foo)
Out[9]: <module 'foo' from '/home/v.pushtaev/foo.py'>
In [10]: foo.a
Out[10]: 2
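
The same sequence can be run as a plain script. The sketch below writes a throwaway module into a temporary directory; the module name reload_demo_mod is made up for this example. Bytecode writing is switched off for the duration, on the assumption that a cached .pyc could otherwise be reused when the file is rewritten within the same second and keeps the same size.

```python
import importlib
import pathlib
import sys
import tempfile

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # do not leave a .pyc that could go stale

with tempfile.TemporaryDirectory() as tmp:
    module_path = pathlib.Path(tmp) / 'reload_demo_mod.py'
    module_path.write_text('a = 1\n')
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()  # make the import system notice the new file
        mod = importlib.import_module('reload_demo_mod')
        first = mod.a

        module_path.write_text('a = 2\n')

        # A second import returns the cached module without re-executing it.
        again = importlib.import_module('reload_demo_mod')
        after_import = again.a

        # reload() re-executes the source in the existing module object.
        importlib.reload(mod)
        after_reload = mod.a
    finally:
        sys.path.remove(tmp)
        sys.modules.pop('reload_demo_mod', None)
        sys.dont_write_bytecode = old_flag

print(first, after_import, after_reload)
```

Because reload() updates the existing module object in place, names bound earlier with from reload_demo_mod import a would still point to the old value.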

IPython also has the autoreload extension that automatically reimports modules when necessary:

In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: with open('foo.py', 'w') as f:
   ...:     f.write('print("LOADED"); a=1')
   ...:
In [4]: import foo
LOADED
In [5]: foo.a
Out[5]: 1
In [6]: with open('foo.py', 'w') as f:
   ...:     f.write('print("LOADED"); a=2')
   ...:
In [7]: import foo
LOADED
In [8]: foo.a
Out[8]: 2
In [9]: with open('foo.py', 'w') as f:
   ...:     f.write('print("LOADED"); a=3')
   ...:
In [10]: foo.a
LOADED
Out[10]: 3

\G


In some languages, you can use the \G assertion. It matches at the position where the previous match ended. That allows writing finite automata that walk through a string word by word (where a word is defined by the regex).

However, Python's re module has no such assertion. The proper workaround is to track the position manually and pass the remaining substring to the regex functions:

import re
import json

text = '<a><b>foo</b><c>bar</c></a><z>bar</z>'
regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'

stack = []
tree = []

pos = 0
while len(text) > pos:
    error = f'Error at {text[pos:]}'
    found = re.search(regex, text[pos:])
    assert found, error
    pos += len(found[0])
    start, stop, data = found.groups()

    if start:
        tree.append(dict(
            tag=start,
            children=[],
        ))
        stack.append(tree)
        tree = tree[-1]['children']
    elif stop:
        tree = stack.pop()
        assert tree[-1]['tag'] == stop, error
        if not tree[-1]['children']:
            tree[-1].pop('children')
    elif data:
        stack[-1][-1]['data'] = data


print(json.dumps(tree, indent=4))

In the previous example, we can save some time by not slicing the string again and again and instead asking the re module to start searching from a given position.

That requires some changes. First, re.search doesn't support searching from a custom position, so we have to compile the regular expression and call the search method of the compiled pattern. Second, ^ means the real start of the string, not the position where the search started, so we have to drop ^ and manually check that the match begins at the expected position.

import re
import json


text = '<a><b>foo</b><c>bar</c></a><z>bar</z>' * 10


def print_tree(tree):
   print(json.dumps(tree, indent=4))


def xml_to_tree_slow(text):
   regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'

   stack = []
   tree = []

   pos = 0
   while len(text) > pos:
       error = f'Error at {text[pos:]}'
       found = re.search(regex, text[pos:])
       assert found, error
       pos += len(found[0])
       start, stop, data = found.groups()

       if start:
           tree.append(dict(
               tag=start,
               children=[],
           ))
           stack.append(tree)
           tree = tree[-1]['children']
       elif stop:
           tree = stack.pop()
           assert tree[-1]['tag'] == stop, error
           if not tree[-1]['children']:
               tree[-1].pop('children')
       elif data:
           stack[-1][-1]['data'] = data

   return tree

_regex = re.compile('(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))')
def _error_message(text, pos):
   return text[pos:]
  
def xml_to_tree_fast(text):

   stack = []
   tree = []

   pos = 0
   while len(text) > pos:
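       found = _regex.search(text, pos=pos)
       # Check for a match first: search() returns None when nothing is found.
       assert found, _error_message(text, pos)
       begin, end = found.span(0)
       # search() may skip ahead, so make sure the match starts exactly at pos.
       assert begin == pos, _error_message(text, pos)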
       pos += len(found[0])
       start, stop, data = found.groups()

       if start:
           tree.append(dict(
               tag=start,
               children=[],
           ))
           stack.append(tree)
           tree = tree[-1]['children']
       elif stop:
           tree = stack.pop()
           assert tree[-1]['tag'] == stop, _error_message(text, pos)
           if not tree[-1]['children']:
               tree[-1].pop('children')
       elif data:
           stack[-1][-1]['data'] = data

   return tree

print_tree(xml_to_tree_fast(text))

Result:

In [1]: from example import *

In [2]: %timeit xml_to_tree_slow(text)
356 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: %timeit xml_to_tree_fast(text)
294 µs ± 6.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
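
A related variation, not the approach used above: compiled patterns also have a match method that accepts a starting position and only succeeds if the match begins exactly there, which removes the need for the manual position check. The sketch below uses it for a simple tokenizer; the tokenize function and the token labels are hypothetical names for this illustration, and no timing is claimed for it.

```python
import re

# Same alternatives as in the article, without the ^ anchor.
TOKEN = re.compile(r'<([a-z]+)>|</([a-z]+)>|([a-z]+)')


def tokenize(text):
    """Walk through text token by token, failing on the first gap."""
    tokens = []
    pos = 0
    while pos < len(text):
        # match() with a position is anchored at that position.
        found = TOKEN.match(text, pos)
        if found is None:
            raise ValueError(f'Unexpected input at position {pos}')
        start, stop, data = found.groups()
        if start:
            tokens.append(('open', start))
        elif stop:
            tokens.append(('close', stop))
        else:
            tokens.append(('data', data))
        pos = found.end()
    return tokens


tokens = tokenize('<a>foo</a>')
print(tokens)
```

Raising ValueError instead of using assert is a design choice for this sketch: assertions are skipped when Python runs with -O, while malformed input should always be reported.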

Round function


Today's post is written by orsinium, the author of @itgram_channel.

The round function rounds a number to a given precision in decimal digits.

>>> round(1.2)
1
>>> round(1.8)
2
>>> round(1.228, 1)
1.2

You can also pass a negative precision:

>>> round(413.77, -1)
410.0
>>> round(413.77, -2)
400.0

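When a precision is given, round returns a value of the same type as the input number (without a precision, it returns an int for the built-in numeric types, as in the first examples):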

>>> from decimal import Decimal
>>> from fractions import Fraction
>>> type(round(2, 1))
<class 'int'>

>>> type(round(2.0, 1))
<class 'float'>

>>> type(round(Decimal(2), 1))
<class 'decimal.Decimal'>

>>> type(round(Fraction(2), 1))
<class 'fractions.Fraction'>
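
The difference between calling round with and without a precision can be checked side by side. This is a small sketch with sample values chosen for illustration:

```python
from decimal import Decimal
from fractions import Fraction

samples = [2.7, Decimal('2.7'), Fraction(27, 10)]

# For each sample: (input type, type of round(x), type of round(x, 1)).
type_table = [
    (type(x).__name__, type(round(x)).__name__, type(round(x, 1)).__name__)
    for x in samples
]

for row in type_table:
    print(row)
```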

For your own classes, you can define how rounding works with the __round__ method:

>>> class Number(int):
...   def __round__(self, p=-1000):
...     return p
...
>>> round(Number(2))
-1000
>>> round(Number(2), -2)
-2
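
A slightly more realistic sketch: a hypothetical Meters wrapper (invented for this example) that delegates to the built-in rounding and keeps its own type. round passes the precision through to __round__ only when it is given, so the method needs a default for it.

```python
class Meters:
    """Hypothetical wrapper around a number, used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __round__(self, ndigits=None):
        # round(x) calls __round__() with no arguments,
        # round(x, n) calls __round__(n).
        return Meters(round(self.value, ndigits))

    def __repr__(self):
        return f'Meters({self.value!r})'


whole = round(Meters(12.345))
tenths = round(Meters(12.345), 1)
tens = round(Meters(12.345), -1)
print(whole, tenths, tens)
```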

Values are rounded to the closest multiple of 10 ** (-precision). For example, with precision=1 the value is rounded to a multiple of 0.1: round(0.63, 1) returns 0.6. If two multiples are equally close, rounding is done toward the even choice:

>>> round(0.5)
0
>>> round(1.5)
2
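
The tie-breaking rule is easier to see on a series of values. In this sketch, all the floats are exact halves (which binary floating point can represent exactly), and the integers are exact ties between two multiples of ten:

```python
# Exact ties between two integers: the even neighbour wins.
halves = [0.5, 1.5, 2.5, 3.5, -0.5, -1.5]
rounded_halves = [round(x) for x in halves]

# The same rule applies to integers with a negative precision.
fives = [5, 15, 25, 35]
rounded_fives = [round(n, -1) for n in fives]

print(rounded_halves)
print(rounded_fives)
```

Rounding ties to even avoids the upward drift that always rounding halves up would introduce when many rounded values are summed.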

Sometimes rounding of floats can be a little bit surprising:

>>> round(2.85, 1)
2.9

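This is because most decimal fractions can't be represented exactly as a float (https://docs.python.org/3.7/tutorial/floatingpoint.html). The value actually stored for 2.85 is slightly greater than 2.85, so it is not a tie and gets rounded up: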

>>> format(2.85, '.64f')
'2.8500000000000000888178419700125232338905334472656250000000000000'

If you want to round half up, you can use decimal.Decimal:

>>> from decimal import Decimal, ROUND_HALF_UP
>>> Decimal(1.5).quantize(0, ROUND_HALF_UP)
Decimal('2')
>>> Decimal(2.85).quantize(Decimal('1.0'), ROUND_HALF_UP)
Decimal('2.9')
>>> Decimal(2.84).quantize(Decimal('1.0'), ROUND_HALF_UP)
Decimal('2.8')
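
One caveat worth illustrating: a Decimal built from a float inherits the float's representation error, while a Decimal built from a string is exact. The sketch below uses 2.675, whose nearest float is slightly below 2.675; the helper name quantize_cents is made up for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENTS = Decimal('0.01')


def quantize_cents(value, rounding):
    """Round a Decimal to two places with the given rounding mode."""
    return value.quantize(CENTS, rounding=rounding)


# Built from a string: exactly 2.675, a genuine tie.
exact_half_up = quantize_cents(Decimal('2.675'), ROUND_HALF_UP)

# Built from a float: slightly below 2.675, so it is not a tie at all.
float_half_up = quantize_cents(Decimal(2.675), ROUND_HALF_UP)

# The two modes differ only on genuine ties.
tie_half_up = quantize_cents(Decimal('2.665'), ROUND_HALF_UP)
tie_half_even = quantize_cents(Decimal('2.665'), ROUND_HALF_EVEN)

print(exact_half_up, float_half_up, tie_half_up, tie_half_even)
```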
