This is a new selection of tips and tricks about Python and programming from my Telegram channel @pythonetc.
Previous publications.
Comparing structures
Sometimes you want to compare complex structures in tests while ignoring some values. Usually, this is done by checking particular values of the structure one by one:
>>> d = dict(a=1, b=2, c=3)
>>> assert d['a'] == 1
>>> assert d['c'] == 3
However, you can create a special value that reports being equal to any other value:
>>> assert d == dict(a=1, b=ANY, c=3)
That can easily be done by defining the __eq__ method:
>>> class AnyClass:
...     def __eq__(self, another):
...         return True
...
>>> ANY = AnyClass()
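The standard library ships a ready-made object of this kind, unittest.mock.ANY. The sketch below uses it to ignore a timestamp and one list item inside a nested structure; the response dictionary and its field names are made up for illustration.

```python
from unittest.mock import ANY

# Hypothetical structure returned by some code under test.
response = {
    'id': 42,
    'created_at': '2018-11-05T12:00:00',
    'tags': ['a', 'b'],
}

# ANY compares equal to everything, so it works at any nesting level.
expected = {'id': 42, 'created_at': ANY, 'tags': ['a', ANY]}

matches = response == expected
differs = response == {'id': 43, 'created_at': ANY, 'tags': ANY}
```

Here matches is True and differs is False, because only the values replaced by ANY are ignored.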
sys.stdout is a wrapper that allows you to write strings instead of raw bytes. The string is encoded automatically using sys.stdout.encoding:
>>> _ = sys.stdout.write('Straße\n')
Straße
>>> sys.stdout.encoding
'UTF-8'
sys.stdout.encoding is read-only and equals Python's default encoding, which can be changed by setting the PYTHONIOENCODING environment variable:
$ PYTHONIOENCODING=cp1251 python3
Python 3.6.6 (default, Aug 13 2018, 18:24:23)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp1251'
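The same effect can be observed from a script by starting a child interpreter with the variable set. This is only a sketch: it assumes that sys.executable points to a working Python 3 interpreter and that nothing else in the environment overrides the setting.

```python
import os
import subprocess
import sys

# Copy the current environment and override the I/O encoding for the child.
env = dict(os.environ, PYTHONIOENCODING='cp1251')

# Ask the child interpreter which encoding its stdout wrapper uses.
output = subprocess.check_output(
    [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
    env=env,
    universal_newlines=True,  # decode the child's output to str
)
child_encoding = output.strip()
print(child_encoding)
```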
If you want to write bytes to stdout, you can bypass automatic encoding by accessing the wrapped buffer with sys.stdout.buffer:
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1251'>
>>> sys.stdout.buffer
<_io.BufferedWriter name='<stdout>'>
>>> _ = sys.stdout.buffer.write(b'Stra\xc3\x9fe\n')
Straße
sys.stdout.buffer is also a wrapper that does buffering for you. It can be bypassed by accessing the raw file handler with sys.stdout.buffer.raw:
>>> _ = sys.stdout.buffer.raw.write(b'Stra\xc3\x9fe')
Straße
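The layering described above can be reproduced without touching the real stdout. The sketch below wraps an in-memory io.BytesIO (standing in for the underlying byte stream) with io.TextIOWrapper, the same class that sys.stdout is an instance of; the variable names are illustrative.

```python
import io

# In-memory byte stream playing the role of the underlying buffer.
raw = io.BytesIO()

# newline='\n' disables newline translation so the bytes are the same on every OS.
wrapper = io.TextIOWrapper(raw, encoding='utf-8', newline='\n')

wrapper.write('Straße\n')   # text goes in...
wrapper.flush()             # ...and is encoded when pushed down to the buffer
encoded = raw.getvalue()

# Writing bytes directly to the wrapped buffer skips the encoding step.
wrapper.buffer.write(b'Stra\xc3\x9fe\n')
both = raw.getvalue()
```

After this runs, encoded holds the UTF-8 bytes of the string, and both contains the same byte sequence twice.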
Ellipsis constant
Python has a very short list of built-in constants. One of them is Ellipsis, which can also be written as three dots (...). This constant has no special meaning for the interpreter but is used in places where such syntax looks appropriate.
numpy supports Ellipsis as a __getitem__ argument, e.g. x[...] returns all elements of x.
PEP 484 defines an additional meaning: Callable[..., type] is a way to define a type of callables with no argument types specified.
Finally, you can use ... to indicate that a function is not yet implemented. This is completely valid Python code:
def x():
    ...
However, in Python 2, Ellipsis can't be written as ... in general. The only exception is a[...], which means a[Ellipsis].
All of the following syntaxes are valid in Python 3, but only the first line is valid in Python 2:
a[...]
a[...:2:...]
[..., ...]
{...:...}
a = ...
... is ...
def a(x=...): ...
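The three uses mentioned above can be put together in one short Python 3 sketch. The Grid class, call_with and not_ready_yet are hypothetical names invented for this example; only Ellipsis and typing.Callable come from Python itself.

```python
from typing import Callable


class Grid:
    """Toy container where grid[...] returns a copy of every row."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, key):
        # The literal ... arrives here as the Ellipsis singleton.
        if key is Ellipsis:
            return [list(row) for row in self.rows]
        return self.rows[key]


def call_with(func: Callable[..., int], *args) -> int:
    # Callable[..., int]: any arguments are accepted, an int is returned.
    return func(*args)


def not_ready_yet():
    ...  # placeholder body; the function simply returns None


grid = Grid([[1, 2], [3, 4]])
all_rows = grid[...]
first_row = grid[0]
largest = call_with(max, 3, 7, 5)
placeholder_result = not_ready_yet()
```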
Reimporting modules
Already imported modules will not be loaded again: a repeated import foo just does nothing. However, reimporting modules proves useful while working in an interactive environment. The proper way to do this in Python 3.4+ is to use importlib:
In [1]: import importlib
In [2]: with open('foo.py', 'w') as f:
   ...:     f.write('a = 1')
   ...:
In [3]: import foo
In [4]: foo.a
Out[4]: 1
In [5]: with open('foo.py', 'w') as f:
   ...:     f.write('a = 2')
   ...:
In [6]: foo.a
Out[6]: 1
In [7]: import foo
In [8]: foo.a
Out[8]: 1
In [9]: importlib.reload(foo)
Out[9]: <module 'foo' from '/home/v.pushtaev/foo.py'>
In [10]: foo.a
Out[10]: 2
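The same session can be written as a standalone script. This is a sketch under a few assumptions: the module name reload_demo and the temporary directory are invented for the example, importlib.import_module stands in for the import statement, and bytecode writing is switched off for the duration so that a cached .pyc file cannot mask an edit made within the same second.

```python
import importlib
import pathlib
import sys
import tempfile

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # keep a stale .pyc from hiding the quick edit

with tempfile.TemporaryDirectory() as tmp:
    module_path = pathlib.Path(tmp) / 'reload_demo.py'
    module_path.write_text('a = 1\n')
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module('reload_demo')
        before = module.a

        module_path.write_text('a = 20\n')

        # A second import returns the cached module object unchanged.
        same_module = importlib.import_module('reload_demo')
        after_import = same_module.a

        # reload() re-executes the source in the existing module object.
        importlib.reload(module)
        after_reload = module.a
    finally:
        sys.path.remove(tmp)
        sys.modules.pop('reload_demo', None)
        sys.dont_write_bytecode = old_flag
```

Note that reload updates the existing module object in place, so names imported earlier with from reload_demo import a would still point to the old value.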
ipython also has the autoreload extension that automatically reimports modules if necessary:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: with open('foo.py', 'w') as f:
   ...:     f.write('print("LOADED"); a=1')
   ...:
In [4]: import foo
LOADED
In [5]: foo.a
Out[5]: 1
In [6]: with open('foo.py', 'w') as f:
   ...:     f.write('print("LOADED"); a=2')
   ...:
In [7]: import foo
LOADED
In [8]: foo.a
Out[8]: 2
In [9]: with open('foo.py', 'w') as f:
   ...:     f.write('print("LOADED"); a=3')
   ...:
In [10]: foo.a
LOADED
Out[10]: 3
\G
In some languages, you can use the \G assertion. It matches at the position where the previous match ended. That allows writing finite automata that walk through a string word by word (where a word is defined by the regex).
However, there is no such thing in Python. The proper workaround is to manually track the position and pass the substring to regex functions:
import re
import json
text = '<a><b>foo</b><c>bar</c></a><z>bar</z>'
regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'
stack = []
tree = []
pos = 0
while len(text) > pos:
    error = f'Error at {text[pos:]}'
    found = re.search(regex, text[pos:])
    assert found, error
    pos += len(found[0])
    start, stop, data = found.groups()
    if start:
        tree.append(dict(
            tag=start,
            children=[],
        ))
        stack.append(tree)
        tree = tree[-1]['children']
    elif stop:
        tree = stack.pop()
        assert tree[-1]['tag'] == stop, error
        if not tree[-1]['children']:
            tree[-1].pop('children')
    elif data:
        stack[-1][-1]['data'] = data
print(json.dumps(tree, indent=4))
In the previous example, we can save some time by not slicing the string again and again and asking the re module to search starting from a different position instead.
That requires some changes. First, re.search doesn't support searching from a custom position, so we have to compile the regular expression manually. Second, ^ means the real start of the string, not the position where the search started, so we have to manually check that the match happened at the expected position.
import re
import json

text = '<a><b>foo</b><c>bar</c></a><z>bar</z>' * 10

def print_tree(tree):
    print(json.dumps(tree, indent=4))

def xml_to_tree_slow(text):
    regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'
    stack = []
    tree = []
    pos = 0
    while len(text) > pos:
        error = f'Error at {text[pos:]}'
        # Slicing copies the rest of the string on every iteration.
        found = re.search(regex, text[pos:])
        assert found, error
        pos += len(found[0])
        start, stop, data = found.groups()
        if start:
            tree.append(dict(
                tag=start,
                children=[],
            ))
            stack.append(tree)
            tree = tree[-1]['children']
        elif stop:
            tree = stack.pop()
            assert tree[-1]['tag'] == stop, error
            if not tree[-1]['children']:
                tree[-1].pop('children')
        elif data:
            stack[-1][-1]['data'] = data
    return tree

# No ^ here: it would only match at the real start of the string.
_regex = re.compile('(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))')

def _error_message(text, pos):
    return text[pos:]

def xml_to_tree_fast(text):
    stack = []
    tree = []
    pos = 0
    while len(text) > pos:
        found = _regex.search(text, pos=pos)
        # Check for a match first: found.span() would fail on None.
        assert found, _error_message(text, pos)
        begin, end = found.span(0)
        # The match must start exactly where the previous one ended.
        assert begin == pos, _error_message(text, pos)
        pos += len(found[0])
        start, stop, data = found.groups()
        if start:
            tree.append(dict(
                tag=start,
                children=[],
            ))
            stack.append(tree)
            tree = tree[-1]['children']
        elif stop:
            tree = stack.pop()
            assert tree[-1]['tag'] == stop, _error_message(text, pos)
            if not tree[-1]['children']:
                tree[-1].pop('children')
        elif data:
            stack[-1][-1]['data'] = data
    return tree

print_tree(xml_to_tree_fast(text))
Result:
In [1]: from example import *
In [2]: %timeit xml_to_tree_slow(text)
356 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [3]: %timeit xml_to_tree_fast(text)
294 µs ± 6.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
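A compiled pattern's match method also accepts a start position and, unlike search, only succeeds if the match begins exactly there, which makes the manual position check unnecessary. The sketch below uses it for a simplified tokenizer over the same mini-markup; the tokenize function and the token names are made up for this example, and it was not part of the timing above.

```python
import re

# Same three alternatives as in the examples above, without the ^ anchor.
TOKEN = re.compile(r'<([a-z]+)>|</([a-z]+)>|([a-z]+)')


def tokenize(text):
    """Split text into (kind, value) pairs, failing on unexpected input."""
    tokens = []
    pos = 0
    while pos < len(text):
        # match(text, pos) succeeds only if the match starts exactly at pos.
        found = TOKEN.match(text, pos)
        if found is None:
            raise ValueError(f'Unexpected input at position {pos}: {text[pos:]!r}')
        start, stop, data = found.groups()
        if start:
            tokens.append(('open', start))
        elif stop:
            tokens.append(('close', stop))
        else:
            tokens.append(('data', data))
        pos = found.end()  # continue where this match ended
    return tokens


tokens = tokenize('<a>foo</a>')
```

Raising ValueError instead of using assert keeps the check active even when Python runs with the -O flag.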
Round function
Today's post is written by orsinium, the author of @itgram_channel.
The round function rounds a number to a given precision in decimal digits.
>>> round(1.2)
1
>>> round(1.8)
2
>>> round(1.228, 1)
1.2
You can also pass a negative precision:
>>> round(413.77, -1)
410.0
>>> round(413.77, -2)
400.0
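When a precision is given, round returns a value of the same type as the input number:
>>> from decimal import Decimal
>>> from fractions import Fraction
>>> type(round(2, 1))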
<class 'int'>
>>> type(round(2.0, 1))
<class 'float'>
>>> type(round(Decimal(2), 1))
<class 'decimal.Decimal'>
>>> type(round(Fraction(2), 1))
<class 'fractions.Fraction'>
For your own classes, you can define how rounding works with the __round__ method:
>>> class Number(int):
...     def __round__(self, p=-1000):
...         return p
...
>>> round(Number(2))
-1000
>>> round(Number(2), -2)
-2
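A slightly more realistic sketch of __round__ is a value type that rounds its payload and returns a new instance of itself. The Celsius class is hypothetical and exists only for this illustration.

```python
class Celsius:
    """Hypothetical temperature wrapper that supports round()."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __round__(self, ndigits=None):
        # round(obj) calls __round__() with no argument,
        # round(obj, n) calls __round__(n).
        return Celsius(round(self.degrees, ndigits))

    def __repr__(self):
        return f'Celsius({self.degrees!r})'


one_digit = round(Celsius(21.456), 1)
whole = round(Celsius(21.456))
```

Here one_digit wraps the float 21.5, while whole wraps the int 21, because rounding a float without a precision returns an int.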
Values are rounded to the closest multiple of 10 ** (-precision). For example, for precision=1 the value will be rounded to a multiple of 0.1: round(0.63, 1) returns 0.6. If two multiples are equally close, rounding is done toward the even choice:
>>> round(0.5)
0
>>> round(1.5)
2
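The tie-breaking rule is easier to see on a series of values. The sketch below only uses the built-in round; the halves chosen here are exactly representable as floats, so no representation error is involved.

```python
# Ties go to the nearest even integer, not always up.
halves = [0.5, 1.5, 2.5, 3.5, 4.5]
rounded_halves = [round(x) for x in halves]

# The same rule applies to integers with a negative precision.
tens = [round(n, -1) for n in (15, 25, 35)]

print(rounded_halves)
print(tens)
```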
Sometimes rounding of floats can be a little bit surprising. Rounding toward the even choice would suggest 2.8 here:
>>> round(2.85, 1)
2.9
This is because most decimal fractions can't be represented exactly as a float (https://docs.python.org/3.7/tutorial/floatingpoint.html):
>>> format(2.85, '.64f')
'2.8500000000000000888178419700125232338905334472656250000000000000'
If you want to round half up, you can use decimal.Decimal:
>>> from decimal import Decimal, ROUND_HALF_UP
>>> Decimal(1.5).quantize(0, ROUND_HALF_UP)
Decimal('2')
>>> Decimal(2.85).quantize(Decimal('1.0'), ROUND_HALF_UP)
Decimal('2.9')
>>> Decimal(2.84).quantize(Decimal('1.0'), ROUND_HALF_UP)
Decimal('2.8')
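One caveat is that Decimal built from a float inherits the float's binary representation error, so it can help to build it from a string instead. The round_half_up helper below is a hypothetical convenience function, a minimal sketch that assumes the value is passed as a decimal string.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, digits=0):
    """Round a decimal string half up to the given number of digits."""
    # Decimal(1).scaleb(-2) is Decimal('0.01'): the quantum to round to.
    quantum = Decimal(1).scaleb(-digits)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


from_string = round_half_up('2.675', 2)

# The float 2.675 is stored as a value slightly below 2.675,
# so the tie never happens and the result is rounded down.
from_float = Decimal(2.675).quantize(Decimal('0.01'), ROUND_HALF_UP)

half = round_half_up('2.5')
```

With this input, from_string is Decimal('2.68'), from_float is Decimal('2.67'), and half is Decimal('3').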