DoFn.process should raise exception if something other than a List is returned #18712

@kennknowles

Description

The process method of a DoFn can either return values or yield them. When it returns, it is expected to return a List (more generally, an iterable) of elements. When there is only a single value to emit, it is easy to forget this and return the bare value instead.

Correct way:

import apache_beam as beam

class SomeDoFn(beam.DoFn):
  def process(self, elem):
    # Wrap the single output element in a list.
    return ['a']

Incorrect way:

class SomeDoFn(beam.DoFn):
  def process(self, elem):
    # Returns the bare element rather than a list containing it.
    return 'a'

A pipeline with the incorrect DoFn will fail with a cryptic error message that gives no direct indication that the actual cause is SomeDoFn returning an element instead of a List containing that element. This issue is very time-consuming to track down.

It would be good if the pipeline could raise an exception or otherwise indicate that the DoFn is incorrectly returning an element instead of a List to make it easier to identify the error.
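As a rough illustration of the kind of check being requested, here is a minimal sketch in plain Python. The helper name check_process_output and the stand-in SomeDoFn class are hypothetical and are not part of Beam's API. The sketch assumes that None (no output) is acceptable and that a bare str, bytes, or dict is almost always a single element returned by mistake, even though those types are technically iterable.

```python
from collections.abc import Iterable


def check_process_output(result, dofn_name):
    """Hypothetical helper: validate what a DoFn's process() returned.

    Not Beam's implementation; it only illustrates the requested check.
    """
    # Returning None means "no output" and is allowed.
    if result is None:
        return []
    # str, bytes, and dict are iterable, but are assumed here to be a
    # single element returned by mistake rather than a collection.
    if isinstance(result, (str, bytes, dict)) or not isinstance(result, Iterable):
        raise TypeError(
            f"{dofn_name}.process returned a {type(result).__name__}; "
            "expected an iterable of elements. Did you mean to return "
            "[value] or use yield?"
        )
    return result


class SomeDoFn:  # stand-in for beam.DoFn so the sketch runs without Beam
    def process(self, elem):
        return 'a'  # the mistake described in this issue


if __name__ == "__main__":
    try:
        check_process_output(SomeDoFn().process(None), "SomeDoFn")
    except TypeError as err:
        print(err)
```

With a check along these lines, the error message names the offending DoFn directly instead of surfacing later in the pipeline.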

Imported from Jira BEAM-3530. Original Jira may contain additional context.
Reported by: chuanyu.
