
WIP, ENH: Add support for gamma scaling and iterative solvers.#70

Open
nray wants to merge 5 commits into main from nray/gamma_scaling

Conversation

@nray (Collaborator) commented Mar 19, 2026

This draft PR contains the changes needed for the gamma scaling experiments.

Addresses issue: #71

@nray nray self-assigned this Mar 19, 2026
return out


class GFDLRegressor(RegressorMixin, MultiOutputMixin, GFDL):
Collaborator

Has something changed about GFDLRegressor? The diff seems quite messy here, making it hard to judge what has actually changed and what hasn't. This class was deleted below and then pasted here?

Collaborator Author

Yes, the class was at the end of the file, but I moved it after GFDLClassifier so that similar estimators sit one after the other. The only other changes are the extra "gamma" argument to the constructor and the "solver" argument to the fit method.

Collaborator

I think it would be best not to reorganize the classes for now, so we can focus on what actually changed in the diff.

Collaborator Author

Reverted to the old organization.

return np.random.default_rng(seed)


def stochastic_gradient_descent(self, X, y):
Collaborator

What's the motivation for rolling our own SGD instead of using upstream SGDRegressor from sklearn, which has learning_rate, eta0, etc.? Does its verbose argument not provide enough output?

You mention it briefly in the matching issue, but my intuition would always be to try the upstream/well-tested solution first before investing time on a hand-rolled version.
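To make the comparison concrete, here is a minimal sketch of what using the upstream estimator might look like. The data and hyperparameter values are illustrative only, not taken from this PR:

```python
# Sketch: sklearn's upstream SGDRegressor already exposes the knobs a
# hand-rolled SGD loop would need (learning_rate, eta0, max_iter, verbose).
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# learning_rate="constant" with eta0 mirrors a fixed-step SGD;
# setting verbose=1 would print per-epoch loss if more output is needed.
est = SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=1000,
                   tol=1e-6, random_state=0)
est.fit(X, y)
print(est.coef_.shape)
```

If the custom loop is only needed for extra logging, the upstream `verbose` flag may already cover that.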

Collaborator Author

Created a sub-issue.


return self

def gradient_descent(self, X, y):
Collaborator

Would also be good to clarify why we can't use, e.g., scipy.optimize.minimize (which has some output options and custom callback support), and whether we ever need mini-batch/partial_fit() support for these experiments.
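As a point of reference, here is a minimal sketch of the kind of thing scipy.optimize.minimize can do with a callback; the quadratic objective is illustrative, not this project's actual loss:

```python
# Sketch: scipy.optimize.minimize on a least-squares objective, with a
# callback collecting per-iteration loss (one thing a custom GD loop provides).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true  # noiseless target, so the exact minimizer is w_true

def loss(w):
    r = X @ w - y
    return 0.5 * r @ r

def grad(w):
    return X.T @ (X @ w - y)

history = []  # the callback receives the current iterate after each step
res = minimize(loss, x0=np.zeros(3), jac=grad, method="L-BFGS-B",
               callback=lambda w: history.append(loss(w)))
print(res.x)
```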

@tylerjereddy tylerjereddy added the enhancement New feature or request label Mar 27, 2026
seed: int = None,
reg_alpha: float = None,
rtol: float | None = None,
gamma: float = None,
Collaborator

Kostas' email on March 29, 2026 suggests there may be interest in providing different scaling values for different layers and/or applying scaling only to a subset of layers, so the API design may need careful consideration.

Collaborator Author

The API can now accept a different gamma for each layer.
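One way such an API is commonly normalized internally is sketched below; the names (`normalize_gamma`, `n_layers`) are illustrative and not necessarily what this PR uses:

```python
# Sketch: broadcast a scalar gamma to every layer, validate an array gamma.
import numpy as np

def normalize_gamma(gamma, n_layers):
    """Return an array of one gamma per layer, or None for no scaling."""
    if gamma is None:
        return None
    gamma = np.asarray(gamma, dtype=float)
    if gamma.ndim == 0:
        # a scalar means the same gamma for every layer
        return np.full(n_layers, float(gamma))
    if gamma.shape != (n_layers,):
        raise ValueError(f"expected {n_layers} gammas, got shape {gamma.shape}")
    return gamma

print(normalize_gamma(0.5, 3))              # same value for each layer
print(normalize_gamma([0.1, 0.2, 0.3], 3))  # one value per layer
```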

@tylerjereddy
Collaborator

Beyond the various cleanups and clarifications noted above, this will also need regression tests before it is seriously ready for review.

Given the amount of confusion I've seen around gamma scaling, the documentation should also explain how it works with crystal clarity.
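A regression test for the scaling could look something like the sketch below. Since GFDLRegressor's final API is still in flux in this PR, the test is written against a stand-in forward pass rather than the real class:

```python
# Sketch of a pytest-style regression test for gamma scaling; forward_layer
# is a stand-in for one hidden-layer step, not the PR's actual code.
import numpy as np

def forward_layer(H_prev, w, b, gamma=None):
    Z = H_prev @ w.T + b
    H = np.tanh(Z)
    if gamma is not None:
        H = H * (1.0 / (H.shape[1] ** gamma))
    return H

def test_gamma_scaling_shrinks_activations():
    rng = np.random.default_rng(0)
    H_prev = rng.normal(size=(8, 5))
    w = rng.normal(size=(10, 5))
    b = rng.normal(size=10)
    unscaled = forward_layer(H_prev, w, b, gamma=None)
    scaled = forward_layer(H_prev, w, b, gamma=1.0)
    # gamma=1 divides every activation by n_hidden (10 here)
    assert np.allclose(scaled, unscaled / 10.0)

test_gamma_scaling_shrinks_activations()
print("ok")
```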

Z = H_prev @ w.T + b  # (n_samples, n_hidden)
H_prev = self._activation_fn(Z)
if self.gamma is not None:
    H_prev *= 1.0 / (H_prev.shape[1] ** self.gamma)
Collaborator Author

@shahyadk-bu, @eiviani-lanl: I would appreciate some feedback on the gamma scaling API implemented here. To my understanding, this is equivalent to Shahyad's implementation here, but it would still be nice to get confirmation from both of you.
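For reviewers, a quick numeric check of what the scaling in the diff does (illustrative shapes and gamma, not values from the experiments):

```python
# Each hidden activation is multiplied by 1 / n_hidden**gamma.
import numpy as np

n_samples, n_hidden, gamma = 4, 16, 0.5
H = np.ones((n_samples, n_hidden))
H_scaled = H * (1.0 / (H.shape[1] ** gamma))
print(H_scaled[0, 0])  # 1 / 16**0.5 = 0.25
```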

@shahyadk-bu

shahyadk-bu commented Apr 8, 2026 via email

Hi Ray, this seems correct. I have also begun using the non-training method of solving the RVFL directly, and that code is in my Git as well if you want another reference: the folder is called "RVFL_Solved", and it contains a Python script called "RVFL_model.py" which holds my RVFL class; the scaling is implemented there too. Sincerely, Shahyad

@nray (Collaborator Author) commented Apr 9, 2026

Hi Shahyad,
Thanks for the confirmation; I will also take a look at the RVFL implementation you pointed to.

seed: int = None,
reg_alpha: float = None,
rtol: float | None = None,
gamma: float | np.typing.ArrayLike | None = None,
Collaborator

here and elsewhere, note that ArrayLike already includes floats and other scalars: https://numpy.org/devdocs/reference/typing.html#numpy.typing.ArrayLike
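A quick illustration of this point, using only the annotation and np.asarray:

```python
# A plain float already satisfies numpy.typing.ArrayLike, so an annotation
# like "float | ArrayLike" is redundant; "ArrayLike | None" would suffice.
import numpy as np
from numpy.typing import ArrayLike

def takes_arraylike(x: ArrayLike) -> np.ndarray:
    return np.asarray(x)

print(takes_arraylike(0.5).ndim)        # a scalar becomes a 0-d array
print(takes_arraylike([0.1, 0.2]).ndim) # a sequence becomes a 1-d array
```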

before inputting to the next layer. None implies no
scaling is applied. A single value implies applying
the scaling each layer with the same value. For different
gammas per layer, pass an array like type.
Collaborator

a float is also an array-like in NumPy speak, so we should probably reword a bit
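One possible rewording along numpydoc lines (an illustrative suggestion, not the PR's final text):

```python
# Suggested docstring fragment for the gamma parameter, avoiding the
# float-vs-array-like ambiguity by naming the accepted shapes explicitly.
GAMMA_DOC = """gamma : float, sequence of float, or None, default=None
    Scaling exponent applied to each hidden layer's output before it is
    fed to the next layer. None disables scaling; a scalar applies the
    same exponent to every layer; a sequence gives one exponent per layer.
"""
print(GAMMA_DOC.splitlines()[0])
```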


Labels

enhancement New feature or request


3 participants