Tutorial

Simple mapping

The most basic (and possibly the most useful) function is smap(), which applies a function to each element of a sequence, much as map() does for iterables:

>>> data = [3, 5, 1, 4]
>>> y = seqtools.smap(lambda x: x * 2, data)
>>> [y[i] for i in range(4)]
[6, 10, 2, 8]

To understand the effect of lazy evaluation, let’s add a notification when the function is called:

>>> def f(x):
...     print("processing {}".format(x))
...     return x * 2
...
>>> y1 = seqtools.smap(f, data)
>>> # nothing happened so far
>>>
>>> y1[0]  # f will be called now and specifically on item 0
processing 3
6

Note

No caching or memoization mechanism is included, so repeated accesses to the same element call the mapping function each time:

>>> y1[0]
processing 3
6
>>> y1[0]
processing 3
6

See add_cache() for a simple form of caching mechanism.
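As a rough illustration of the behaviour described above, the following plain-Python sketch reproduces the idea without SeqTools. The names LazyMap and cached_get are hypothetical and introduced only for this example; this is not the library's implementation, and the lru_cache wrapper merely mimics the idea behind add_cache() rather than its actual mechanism.

```python
from functools import lru_cache


class LazyMap:
    """Hypothetical stand-in for a lazily mapped sequence (not SeqTools code)."""

    def __init__(self, func, sequence):
        self.func = func
        self.sequence = sequence

    def __len__(self):
        return len(self.sequence)

    def __getitem__(self, index):
        # The function runs only when an item is requested.
        return self.func(self.sequence[index])


calls = []  # records every evaluation of f


def f(x):
    calls.append(x)
    return x * 2


y = LazyMap(f, [3, 5, 1, 4])
calls_after_construction = list(calls)  # [] : building the mapping evaluates nothing

y[0]
y[0]
calls_without_cache = list(calls)  # [3, 3] : f ran twice for item 0

# Memoized item access (illustrative only, assumes hashable integer indices).
cached_get = lru_cache(maxsize=2)(y.__getitem__)
cached_get(1)
cached_get(1)
calls_with_cache = list(calls)  # [3, 3, 5] : the second access hit the cache
```

A bounded cache such as this one trades memory for speed, which is why it is kept separate from the mapping itself in this sketch.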

If the transformation is slow to compute and/or the sequence is large, lazy evaluation can dramatically reduce the delay to obtain any particular item. Furthermore, one can chain several transformations in a pipeline. This is particularly convenient when intermediate transformations are memory heavy, because SeqTools only stores intermediate results for one element at a time:

>>> def f(x):
...     # This intermediate result takes a lot of space...
...     return [x] * 10000
...
>>> def g(x):
...     return sum(x) / len(x)
...
>>> data = list(range(2000))
>>>
>>> # construct pipeline without computing anything
>>> y1 = seqtools.smap(f, data)
>>> y2 = seqtools.smap(g, y1)
>>>
>>> # computing one output value only holds a single 10000-element intermediate list
>>> # whereas explicitly computing all of y1 would hold 2000 such lists at once
>>> y2[2]
2.0
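The memory claim can be checked informally with the standard tracemalloc module. The sketch below does not use SeqTools: it composes f and g by hand for one item, and compares that with materializing every intermediate list first. The helper peak_bytes is a hypothetical name, the dataset is shrunk to 200 items to keep the run short, and the exact byte counts depend on the Python build.

```python
import tracemalloc


def f(x):
    return [x] * 10000  # large intermediate result


def g(x):
    return sum(x) / len(x)


def peak_bytes(work):
    """Run work() and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        work()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also resets the recorded peak
    return peak


data = list(range(200))  # smaller than the tutorial's 2000 items

# On demand: only the intermediate list for item 2 exists, and only briefly.
lazy_peak = peak_bytes(lambda: g(f(data[2])))

# Eager: every intermediate list is alive at the same time.
eager_peak = peak_bytes(lambda: [g(v) for v in [f(x) for x in data]])

print(lazy_peak, eager_peak)  # the eager peak is far larger
```

The ratio between the two peaks grows with the number of items, since the eager variant keeps one intermediate list per item alive.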

Indexing

Most functions in this library, including smap(), try to preserve the simplicity of Python list indexing, including negative indexing and slicing:

>>> data = [3, 5, 1, 4]
>>> y = seqtools.smap(lambda x: x * 2, data)
>>> list(y)
[6, 10, 2, 8]
>>> z = y[1:-1]  # on-demand slicing ⇒ z values aren't computed yet
>>> len(z)  # deduced without evaluating z
2
>>> list(z)
[10, 2]
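One way such on-demand slicing could work is to keep a range of source positions and slice that range instead of the data. The sketch below is an assumption about the general technique, not SeqTools' actual code; LazyMapView is a hypothetical class name.

```python
class LazyMapView:
    """Hypothetical lazy mapping with list-like indexing (not SeqTools code)."""

    def __init__(self, func, sequence, indices=None):
        self.func = func
        self.sequence = sequence
        # A range records which source positions this view exposes.
        self.indices = range(len(sequence)) if indices is None else indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing a range is cheap and evaluates no item.
            return LazyMapView(self.func, self.sequence, self.indices[key])
        # range resolves negative indices and raises IndexError out of bounds.
        return self.func(self.sequence[self.indices[key]])


calls = []  # records every evaluation of double


def double(x):
    calls.append(x)
    return x * 2


y = LazyMapView(double, [3, 5, 1, 4])
z = y[1:-1]
length = len(z)                   # 2, deduced from the range alone
calls_before_access = list(calls)  # [] : slicing computed nothing
values = list(z)                  # [10, 2]
last = y[-1]                      # 8
```

Because the view has no `__iter__`, `list(z)` falls back to calling `__getitem__` with 0, 1, 2, ... until IndexError, which is enough for list-like iteration.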

Where it makes sense, transformed sequences also support index- and slice-based assignment, so that the objects truly behave like lists. For example, with the gather() function:

>>> data = [0, 1, 2, 3, 4, 5]
>>> y = seqtools.gather(data, [1, 1, 3, 4])
>>> list(y)
[1, 1, 3, 4]
>>> y[0] = -1
>>> data
[0, -1, 2, 3, 4, 5]
>>> y[-2:] = [-3, -4]
>>> data
[0, -1, 2, -3, -4, 5]
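The write-through behaviour shown above can be pictured as a view that translates its own positions into positions of the underlying list. The following is a minimal sketch under that assumption; GatherView is a hypothetical class and does not reflect how gather() is implemented in SeqTools.

```python
class GatherView:
    """Hypothetical view over selected positions of a list (not SeqTools code)."""

    def __init__(self, sequence, indices):
        self.sequence = sequence
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return GatherView(self.sequence, self.indices[key])
        return self.sequence[self.indices[key]]

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            targets = self.indices[key]
            values = list(value)
            if len(values) != len(targets):
                # A view cannot grow or shrink the underlying list.
                raise ValueError("slice assignment must preserve length")
            for target, item in zip(targets, values):
                self.sequence[target] = item
        else:
            # Writes land in the original list at the gathered position.
            self.sequence[self.indices[key]] = value


data = [0, 1, 2, 3, 4, 5]
y = GatherView(data, [1, 1, 3, 4])
y[0] = -1          # writes data[1]
y[-2:] = [-3, -4]  # writes data[3] and data[4]
print(data)        # [0, -1, 2, -3, -4, 5]
```

Note that positions 0 and 1 of the view both point at `data[1]`, so after the first assignment both read back as -1.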

Multivariate mapping

Similarly to map(), if more than one sequence is passed, they are zipped together and fed as distinct arguments to the function:

>>> data1 = [3, 5, 1, 4]
>>> data2 = [4, 5, 7, 2]
>>> y = seqtools.smap(lambda x1, x2: x1 + x2, data1, data2)
>>> list(y)
[7, 10, 8, 6]
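A plain-Python sketch of the same idea follows. LazyZipMap is a hypothetical name, and truncating to the shortest input (as zip() does) is an assumption made for this example; the text above does not say how smap() treats sequences of unequal length.

```python
class LazyZipMap:
    """Hypothetical multi-sequence lazy mapping (not SeqTools code)."""

    def __init__(self, func, *sequences):
        if not sequences:
            raise ValueError("at least one sequence is required")
        self.func = func
        self.sequences = sequences

    def __len__(self):
        # Assumption: stop at the shortest input, like zip().
        return min(len(s) for s in self.sequences)

    def __getitem__(self, index):
        # Resolve the index against the common length so that negative
        # indices refer to the same position in every sequence.
        position = range(len(self))[index]
        return self.func(*(s[position] for s in self.sequences))


data1 = [3, 5, 1, 4]
data2 = [4, 5, 7, 2]
y = LazyZipMap(lambda x1, x2: x1 + x2, data1, data2)
print(list(y))  # [7, 10, 8, 6]
print(y[-1])    # 6
```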

Going further

To evaluate all the values of a sequence, prefetch() provides a wrapper backed by multiple workers that computes them more quickly.
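The general idea of evaluating items with several workers can be sketched with the standard library alone. This is not prefetch() and says nothing about its signature or behaviour; evaluate_all is a hypothetical helper that fans item lookups out to a thread pool and returns the results in order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_double(x):
    time.sleep(0.01)  # stand-in for an expensive transformation
    return x * 2


def evaluate_all(get_item, length, max_workers=2):
    """Evaluate get_item(0..length-1) concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(get_item, range(length)))


data = [3, 5, 1, 4]
values = evaluate_all(lambda i: slow_double(data[i]), len(data))
print(values)  # [6, 10, 2, 8]
```

Threads help when the work releases the GIL (I/O, many NumPy operations); CPU-bound pure-Python functions would need processes instead.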

To see the library in practice, read how to build, debug and run a transformation pipeline, or how to write a multiprocessing-capable data loader.

The library is quite small for now, so why not take a quick look at the API Reference?