
add_samplet: feature_names allows dimension mismatch, order isn't paired -- will overwrite #45

@WillForan

I had a few bugs (using wrong variable name), and realized I never got yelled at for providing bad feature names.

A few observations:

  1. the number of feature names doesn't have to match the number of features.

there can be too many (x, y, z and an additional "DNE" name)

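# assuming RegrDataset is pyradigm's RegressionDataset
from pyradigm import RegressionDataset as RegrDataset

ds = RegrDataset()
ds.description = "extra feature names"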
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[4,5,6], feature_names=['x','y','z','DNE'])
(x, _, _) = ds.data_and_targets()
print(ds.feature_names)
print(x)

['x' 'y' 'z' 'DNE']
[[1. 2. 3.]
 [4. 5. 6.]]

or too few (only x, but have x, y, and z)

ds = RegrDataset()
ds.description = "too few feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x'])
ds.add_samplet('id2', target=200, features=[6,5,4], feature_names=['x'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['x']
[[1. 2. 3.]
 [6. 5. 4.]]
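
For illustration, a length check along these lines would reject both cases above. This is only a sketch of the kind of validation I expected: `check_feature_names` is a hypothetical helper, not something that exists in the library, and where such a check would live inside add_samplet is an assumption.

```python
def check_feature_names(features, feature_names):
    """Raise if the number of names does not match the number of features.

    Hypothetical helper for illustration only; not part of the library.
    """
    n_features = len(features)
    n_names = len(feature_names)
    if n_names != n_features:
        raise ValueError(
            f"got {n_names} feature names for {n_features} features"
        )
    return list(feature_names)


# matching lengths pass through unchanged
check_feature_names([1, 2, 3], ['x', 'y', 'z'])

# too many names and too few names are both rejected
for bad_names in (['x', 'y', 'z', 'DNE'], ['x']):
    try:
        check_feature_names([4, 5, 6], bad_names)
    except ValueError as err:
        print(err)
```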

  2. specifying feature names for one samplet changes names everywhere?
ds = RegrDataset()
ds.description = "different feature names for one samplet"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[4,5,6], feature_names=['y','y','z'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['y' 'y' 'z']
[[1. 2. 3.]
 [4. 5. 6.]]

this is potentially surprising when features are given to add_samplet in a different order -- even if features and feature_names are paired correctly (@raamana -- a thing you warned me to check. good eye!)

ds = RegrDataset()
ds.description = "reordered feature names"
ds.add_samplet('id1', target=100, features=[1,2,3], feature_names=['x','y','z'])
ds.add_samplet('id2', target=200, features=[6,5,4], feature_names=['z','y','x'])
[x, _, _] = ds.data_and_targets()
print(ds.feature_names)
print(x)

['z' 'y' 'x']
[[1. 2. 3.]
 [6. 5. 4.]]
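
One possible behavior, sketched below, is to pair each value with its name and reorder it to match the names already stored in the dataset, raising on duplicates or unknown names instead of overwriting. `align_features` and `reference_names` are hypothetical names for illustration, not existing library API, and whether reordering or simply raising is the better behavior is an open question.

```python
def align_features(features, feature_names, reference_names):
    """Reorder `features` so they follow the order of `reference_names`.

    Hypothetical helper for illustration only; not part of the library.
    """
    if len(features) != len(feature_names):
        raise ValueError("features and feature_names differ in length")
    if len(set(feature_names)) != len(feature_names):
        raise ValueError(f"duplicate feature names: {list(feature_names)}")
    if set(feature_names) != set(reference_names):
        raise ValueError(
            f"feature names {list(feature_names)} do not match "
            f"existing names {list(reference_names)}"
        )
    # pair each value with its name, then read values back in reference order
    value_by_name = dict(zip(feature_names, features))
    return [value_by_name[name] for name in reference_names]


reference = ['x', 'y', 'z']  # names set by the first samplet
print(align_features([6, 5, 4], ['z', 'y', 'x'], reference))  # [4, 5, 6]
```

With this, the 'id2' row from the last example would be stored as [4, 5, 6] under ['x', 'y', 'z'], and the ['y', 'y', 'z'] case would raise instead of silently renaming the first column.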
