ArrowCube Format #5

@echeipesh

Description

This epic covers developing an Arrow schema that is well suited to representing multi-dimensional rasters as tensors.

Design goals

  1. Records in this schema can be accessed as zero-copy ndarrays
  • This may require a NumPy extension
  2. The schema supports efficient dimension slicing by virtue of being paged and columnar
  • Arrow pages can map onto bands or other "dimension-slices"
  • It should be cheap to slice/select along at least one dimension of the array
  3. Records in this schema map well onto common image encoding formats
  • pixel-interleave, band-interleave
  4. Some subset of the GeoTrellis Tile interface or the ND4J interface can be backed by these records
  • We're willing to give up efficient random access
  • We more or less get this one for "free"; it's not a big restriction

It's not clear that we can achieve all of this, so treat it as a wish list that can be negotiated down.
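
To make goals 1 to 3 a bit more concrete, here is a rough NumPy-only sketch of what "cheap slicing in at least one dimension" could look like for the two layouts mentioned above. It is not a proposed schema: the flat array only stands in for the values buffer of an Arrow column, and the dimensions and the `summarize` helper are hypothetical names made up for illustration.

```python
import numpy as np

# Hypothetical raster dimensions, chosen only for illustration.
BANDS, ROWS, COLS = 3, 4, 5

# A flat, contiguous buffer standing in for the values buffer of one Arrow column.
flat = np.arange(BANDS * ROWS * COLS, dtype=np.float32)

# Band-interleave layout: shape (bands, rows, cols).
# reshape on a contiguous array returns a view, so no pixel data is copied.
band_major = flat.reshape(BANDS, ROWS, COLS)

# Selecting one band is a contiguous, zero-copy slice of the flat buffer.
band1 = band_major[1]

# Pixel-interleave layout: shape (rows, cols, bands) over the same buffer.
pixel_major = flat.reshape(ROWS, COLS, BANDS)

# Selecting one band is still a view, but it is strided rather than contiguous.
band1_strided = pixel_major[..., 1]


def summarize(view, base):
    """Return (shares memory with base, is C-contiguous) for a view."""
    return bool(np.shares_memory(view, base)), bool(view.flags["C_CONTIGUOUS"])


print(summarize(band1, flat))
print(summarize(band1_strided, flat))
```

In this sketch both band selections avoid a copy, but only the band-interleave one is a single contiguous range, which is the kind of slice that could plausibly line up with an Arrow page. Whether a real Arrow record can expose its buffer to NumPy this directly is exactly the open question in goal 1.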
