Error handling and debugging

Mistakes and programming errors are relatively frequent while designing a transformation pipeline. SeqTools tries to recover from them and to report as much useful information as possible. This tutorial reviews how errors are managed internally, which should make your debugging sessions easier.

Tracing mapping errors

Due to on-demand execution, an error generated by mapping a function to an item is not raised when the mapping is created, but when the problematic element is read.

[2]:
import math
import random
import seqtools

def f1(x):
    return math.sqrt(x)  # this will fail for negative values

data = [0, 4, 6, 7, 2, 4, 4, -1]  # sqrt(-1) raises ValueError

out = seqtools.smap(f1, data)

At this point no error has been raised, because the last item has not been evaluated yet.

As soon as it is evaluated, SeqTools raises an EvaluationError and sets the original exception as its cause.

[3]:
list(out)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/checkouts/readthedocs.org/user_builds/seqtools-doc/envs/stable/lib/python3.7/site-packages/seqtools/mapping.py in __iter__(self)
     22             for args in zip(*self.sequences):
---> 23                 yield self.f(*args)
     24                 i += 1

/tmp/ipykernel_146/3124324496.py in f1(x)
      5 def f1(x):
----> 6     return math.sqrt(x)  # this will fail for negative values
      7

ValueError: math domain error

The above exception was the direct cause of the following exception:

EvaluationError                           Traceback (most recent call last)
/tmp/ipykernel_146/2084533763.py in <module>
----> 1 list(out)

~/checkouts/readthedocs.org/user_builds/seqtools-doc/envs/stable/lib/python3.7/site-packages/seqtools/mapping.py in __iter__(self)
     30                 msg = "Failed to evaluate item {} in {} created at:\n{}".format(
     31                     i, self.__class__.__name__, self.stack)
---> 32                 raise EvaluationError(msg) from error
     33
     34     @basic_getitem

EvaluationError: Failed to evaluate item 7 in Mapping created at:

The ValueError that caused the failure is detailed first.

The EvaluationError message adds context: it tells which item caused the error and where the mapping was defined, which is crucial debugging information when the mapping function is used at multiple locations in the code.
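
The wrapping mechanism can be illustrated with a minimal sketch built on the standard library only. The `LazyMap` class and the `EvaluationError` class defined here are hypothetical stand-ins written for this example, not SeqTools' own code; they only assume that the wrapper is raised with `raise ... from`, as the traceback above shows, so that the original exception stays reachable through `__cause__`.

```python
import math
import traceback


class EvaluationError(Exception):
    """Stand-in wrapper error for this sketch (not SeqTools' own class)."""


class LazyMap:
    """Hypothetical minimal lazy mapping, for illustration only."""

    def __init__(self, func, items):
        self.func = func
        self.items = items
        # Remember where the mapping was created (drop this frame itself).
        self.stack = "".join(traceback.format_stack()[:-1])

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        item = self.items[index]
        try:
            # The function only runs when an item is read.
            return self.func(item)
        except Exception as error:
            msg = "Failed to evaluate item {} in {} created at:\n{}".format(
                index, type(self).__name__, self.stack)
            # Chain the wrapper to the original exception.
            raise EvaluationError(msg) from error


out_sketch = LazyMap(math.sqrt, [0, 4, -1])  # nothing is computed yet

try:
    out_sketch[2]
except EvaluationError as error:
    original = error.__cause__  # the ValueError raised by math.sqrt
    print(type(original).__name__)
```

Catching the wrapper and reading `__cause__` is one way to reach the original exception without changing any global setting.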

If you prefer to work with the original exception directly and skip the EvaluationError wrapper, you can enable the ‘passthrough’ error mode, which does just that:

[4]:
seqtools.seterr(evaluation='passthrough')

list(out)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_146/3141806584.py in <module>
      1 seqtools.seterr(evaluation='passthrough')
      2
----> 3 list(out)

~/checkouts/readthedocs.org/user_builds/seqtools-doc/envs/stable/lib/python3.7/site-packages/seqtools/mapping.py in __iter__(self)
     21         try:
     22             for args in zip(*self.sequences):
---> 23                 yield self.f(*args)
     24                 i += 1
     25

/tmp/ipykernel_146/3124324496.py in f1(x)
      4
      5 def f1(x):
----> 6     return math.sqrt(x)  # this will fail for negative values
      7
      8 data = [0, 4, 6, 7, 2, 4, 4, -1]  # sqrt(-1) raises ValueError

ValueError: math domain error
[5]:
seqtools.seterr(evaluation='wrap')  # revert to normal behaviour
[5]:
'wrap'

Errors inside workers

Background workers used by prefetch do not share the execution space of the main program and exceptions raised while evaluating elements will happen asynchronously.

To facilitate troubleshooting, SeqTools silently stores the exception data and sends it back to the main process, where it is re-raised when the failed item is read. In practice, exceptions appear to happen at the moment the items are read.

[6]:
out = seqtools.smap(f1, data)
out = seqtools.prefetch(out, max_buffered=10)

# evaluate all elements but the last
for i in range(len(out) - 1):
    out[i]

# evaluate the final one: the error raised in the worker is re-raised here
out[-1]

Note that the workers will continue processing other items just fine after an error.
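
The same store-then-re-raise pattern exists in the standard library, which makes it easy to observe in isolation. The sketch below uses `concurrent.futures.ThreadPoolExecutor` as an analogy only; it is not how `prefetch` is implemented, and the input list is made up for the example so that the failing item sits in the middle.

```python
import math
from concurrent.futures import ThreadPoolExecutor

values = [4, -1, 9]  # illustrative data: the second item fails

with ThreadPoolExecutor(max_workers=2) as pool:
    # Every item is submitted up front. The worker that hits -1 stores the
    # ValueError inside its future instead of raising in the main thread.
    futures = [pool.submit(math.sqrt, x) for x in values]

    # Items around the failing one are computed and read normally.
    results = [futures[0].result(), futures[2].result()]

    try:
        futures[1].result()  # the stored exception is re-raised here
    except ValueError as error:
        failure = error

print(results)
```

The item submitted after the failing one is still evaluated, which mirrors the behaviour described above: one failed item does not stop the workers.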

Transferring exceptions back to the parent process has some notable limitations:

  • Process-based workers cannot save errors that cannot be pickled, in particular exception types defined inside a function. These will be replaced by a text message.

  • Error tracebacks cannot be completely serialized so debuggers won’t be able to explore the whole error context.
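
Both limitations can be reproduced with `pickle` alone. The sketch below is an illustration under stated assumptions, not SeqTools' actual fallback code: `serialize_error` and `make_local_error` are hypothetical helpers that try to pickle an exception and fall back to a formatted text message when that fails.

```python
import pickle
import traceback


def make_local_error():
    # Exception type defined inside a function: pickle cannot find it by name.
    class LocalError(Exception):
        pass
    return LocalError("something went wrong")


def serialize_error(error):
    """Return (kind, payload): pickled bytes if possible, else a text message."""
    try:
        return "pickled", pickle.dumps(error)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type raised by pickle depends on the Python version.
        text = "".join(traceback.format_exception(
            type(error), error, error.__traceback__))
        return "text", text


try:
    raise make_local_error()
except Exception as error:
    local_kind, local_payload = serialize_error(error)  # falls back to text

try:
    raise ValueError("math domain error")
except ValueError as error:
    kind, payload = serialize_error(error)  # a module-level type pickles fine
    restored = pickle.loads(payload)

print(local_kind, kind, restored.__traceback__)
```

The restored `ValueError` keeps its type and message but carries no traceback object, which is why a debugger attached in the parent process cannot step back into the frames of the worker.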