The process method of a DoFn can either return values or yield values. When it returns, it is expected to return a list (or other iterable) of elements. When returning a single value, it is easy to forget this and return the bare value instead.
Correct way:
    import apache_beam as beam

    class SomeDoFn(beam.DoFn):
        def process(self, elem):
            # Wrap the single output element in a list.
            return ['a']
Incorrect way:
    class SomeDoFn(beam.DoFn):
        def process(self, elem):
            # Returns the bare element instead of a list containing it.
            return 'a'
A pipeline with the incorrect DoFn will fail with a cryptic error message that gives no direct indication that the actual cause is SomeDoFn returning an element instead of a list containing that element. This makes the issue very time-consuming to track down.
It would be good if the pipeline raised an exception, or otherwise indicated, that the DoFn is returning a bare element instead of a list, so that the error is easier to identify.
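As a rough illustration of the kind of check being requested, here is a minimal sketch in plain Python. The helper name check_process_output and its dofn_name parameter are hypothetical and are not part of the Beam API. The sketch assumes the check would be applied to whatever process returns, before the runner iterates over it. Rejecting str and bytes is also an assumption: they are iterable, but iterating over them is rarely what the author intended.

```python
from collections.abc import Iterable


def check_process_output(result, dofn_name):
    """Validate the return value of a DoFn's process method.

    Hypothetical helper for illustration only; not a Beam API.
    """
    # None means "no output"; a generator (from yield) is an Iterable.
    if result is None:
        return result
    # Reject bare values. str/bytes are iterable, but treating them as a
    # collection of elements is assumed here to be a mistake.
    if isinstance(result, (str, bytes)) or not isinstance(result, Iterable):
        raise TypeError(
            f"{dofn_name}.process returned a bare {type(result).__name__} "
            f"({result!r}); expected an iterable of elements. "
            f"Use 'return [{result!r}]' or 'yield {result!r}' instead."
        )
    return result
```

With a check like this, the incorrect SomeDoFn above would fail with a message naming the DoFn and suggesting the fix, rather than failing further downstream.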
Imported from Jira BEAM-3530. Original Jira may contain additional context.
Reported by: chuanyu.