
Custom kernel implementation for invariants computation and message passing. #167

Open

ngorski9 wants to merge 16 commits into lanl:development from ngorski9:development-mine

Conversation

@ngorski9

Overview

For invariant polynomials, allowed the user to translate arbitrary invariants into sparsely defined polynomials. Then, implemented a Triton kernel to evaluate these polynomials. Wrote a wrapper layer that uses this strategy when able and falls back on the previous strategy otherwise.
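As a rough illustration of what "sparsely defined polynomials" means here, a pure-PyTorch reference for evaluating such a collection could look like the following (tensor names are made up for illustration; the Triton kernel itself is not shown):

```python
import torch

def eval_sparse_polynomials(x, exponents, coeffs, poly_index):
    """Reference evaluation of sparsely defined multivariate polynomials.

    x:          (batch, n_vars)   input invariant features
    exponents:  (n_terms, n_vars) exponent of each variable in each monomial
    coeffs:     (n_terms,)        coefficient of each monomial
    poly_index: (n_terms,)        which output polynomial each monomial belongs to
    returns:    (batch, n_polys)  evaluated polynomials
    """
    n_polys = int(poly_index.max()) + 1
    # Evaluate every monomial coeff * prod_j x_j ** e_j for the whole batch.
    monomials = coeffs * torch.prod(x.unsqueeze(1) ** exponents, dim=-1)  # (batch, n_terms)
    # Scatter-add each monomial into the polynomial it belongs to.
    out = torch.zeros(x.shape[0], n_polys, dtype=x.dtype, device=x.device)
    out.index_add_(1, poly_index, monomials)
    return out
```

The Triton kernel in the PR performs this evaluation on the GPU; the sketch above only shows the kind of sparse (exponent/coefficient/index) representation being consumed.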

For message passing, wrote a Triton kernel that performs the various tensor operations required for message passing in both HIP-HOP-NN and HIP-NN-TS. Wrote wrapper functions that use this strategy when able and fall back on the previous strategy otherwise.
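For context, the gather/weight/scatter pattern that such a fused kernel replaces looks roughly like this in plain PyTorch (shapes and names are illustrative, not the PR's exact contraction):

```python
import torch

def message_passing_reference(features, sensitivities, pair_first, pair_second):
    """Plain-PyTorch sketch of a message passing accumulation over atom pairs.

    features:      (n_atoms, n_features)       per-atom features
    sensitivities: (n_pairs, n_sensitivities)  per-pair radial weights
    pair_first:    (n_pairs,) index of the receiving atom of each pair
    pair_second:   (n_pairs,) index of the sending atom of each pair
    returns:       (n_atoms, n_sensitivities, n_features)
    """
    # Gather sender features per pair, weight them by the pair's sensitivities,
    # then scatter-add the messages onto the receiving atoms.
    messages = sensitivities.unsqueeze(2) * features[pair_second].unsqueeze(1)
    env = torch.zeros(features.shape[0], sensitivities.shape[1], features.shape[1],
                      dtype=features.dtype, device=features.device)
    env.index_add_(0, pair_first, messages)
    return env
```

A fused kernel can avoid materializing the per-pair `messages` tensor, which is the main memory cost of the naive version.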

Automated tests have also been added for each of these.

Summary of changes:

hippynn:

custom_kernels/__init__.py: Added wrapper functions for the message passing layers for HIP-HOP-NN and HIP-NN-TS. The wrapper functions check whether Triton and CUDA are available and whether the settings permit the Triton version of message passing, then select the appropriate message passing function.
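The dispatch pattern is roughly the following (function and setting names here are hypothetical; the real ones are defined in the PR's diff):

```python
import torch

try:
    import triton  # noqa: F401
    _TRITON_AVAILABLE = True
except ImportError:
    _TRITON_AVAILABLE = False

def select_message_passing(setting_enabled, torch_impl, triton_impl):
    """Return the Triton implementation when Triton and CUDA are available and
    the user's settings allow it; otherwise fall back to the torch version."""
    if setting_enabled and _TRITON_AVAILABLE and torch.cuda.is_available():
        return triton_impl
    return torch_impl
```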

custom_kernels/message_passing_torch.py: Moved the old message passing logic into individual functions defined in this file. Previously, the message passing logic was hard-coded into HIP-HOP-NN and HIP-NN-TS.

custom_kernels/message_passing_triton.py: Defines custom kernel for message passing. Also defines several wrapper functions that call that kernel to implement the message passing logic for HIP-HOP-NN and HIP-NN-TS.

custom_kernels/poly_triton.py: Defines a custom kernel for evaluating a collection of multivariate polynomials. Also defines a class PolynomialCollection, which stores a collection of polynomials and caches derivatives when they must be computed.
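A rough sketch of what such a collection might hold, in sparse monomial form with derivative caching (illustrative only, not the PR's actual class):

```python
import torch

class SparsePolynomialCollection:
    """Stores polynomials as sparse monomials and caches partial derivatives.

    Monomial i contributes coeffs[i] * prod_j x_j ** exponents[i, j]
    to polynomial number poly_index[i].
    """

    def __init__(self, exponents, coeffs, poly_index):
        self.exponents = exponents      # (n_terms, n_vars) integer exponents
        self.coeffs = coeffs            # (n_terms,) coefficients
        self.poly_index = poly_index    # (n_terms,) owning polynomial of each term
        self._derivatives = {}          # cache: variable index -> derivative collection

    def derivative(self, var):
        """Return (and cache) the collection of partial derivatives w.r.t. x_var."""
        if var not in self._derivatives:
            exps = self.exponents.clone()
            # d/dx (c * x**e) = (c * e) * x**(e - 1); terms with e == 0 drop out.
            new_coeffs = self.coeffs * exps[:, var]
            exps[:, var] = torch.clamp(exps[:, var] - 1, min=0)
            keep = new_coeffs != 0
            self._derivatives[var] = SparsePolynomialCollection(
                exps[keep], new_coeffs[keep], self.poly_index[keep]
            )
        return self._derivatives[var]
```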

layers/hiplayers/interactions.py: Replaced message passing logic for HIP-HOP-NN and HIP-NN-TS with the custom message passing functions defined in custom_kernels/__init__.py.

layers/hiplayers/invariants.py: Defines a function to compute polynomials based on invariants. Implements a torch layer called HopInvariantLayer which computes invariants, using the polynomial strategy if Triton and CUDA are available and the settings permit it; otherwise, it falls back on the previous implementation.

layers/hiplayers/tensors.py: Renamed HopInvariantLayer to HopInvariantLayerTorch.

_settings_setup.py: Added new settings that allow the user to enable/disable these new implementations for invariant computation and message passing. Also added a setting that allows the new message passing kernel to be used for the backward pass only and not the forward pass.
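The exact setting names are not quoted in this thread; hypothetically, toggling the new paths from user code might look like this (assuming hippynn's usual environment-variable settings mechanism, with made-up variable names):

```python
import os

# Hypothetical names for illustration only; see docs/source/user_guide/settings.rst
# in this PR for the actual settings that were added.
os.environ["HIPPYNN_USE_TRITON_INVARIANTS"] = "true"        # polynomial/Triton invariants on
os.environ["HIPPYNN_USE_TRITON_MESSAGE_PASSING"] = "false"  # keep the torch message passing

import hippynn  # settings are read when hippynn is imported
```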

tests:

test_polynomial_invariants.py: Tests the validity of the polynomial invariants layer. Separately tests the kernel itself as well as the layer HopInvariantLayer.

test_tensor_messagePassing.py: Tests the validity of the new message passing implementation. Separately tests the kernel itself, as well as the various wrappers for HIP-HOP-NN and HIP-NN-TS.

docs:

source/user_guide/settings.rst: Added new settings to the settings table.

@karellat
Contributor

karellat commented Dec 8, 2025

I noted that @ngorski9's code uses Python 3.11 syntactic sugar, just flagging it in case it leads to any compatibility problems.

https://github.com/ngorski9/hippynn-optimizations/blob/54634c8eb974f19c998d4333947471b564684a63/hippynn/layers/hiplayers/invariants.py#L166

@lubbersnick
Collaborator

Good catch. I think we can require 3.11 at this point.

@karellat
Contributor

karellat commented Dec 9, 2025

I don’t have a strong opinion, but we should adjust pyproject.toml as needed.

requires-python=">=3.9"

@karellat
Contributor

@ngorski9
Author

Is there any specific issue that you have in mind?

@karellat
Contributor

Oh, yes - sorry. We hit an out-of-memory error, most likely due to memory fragmentation.

I was trying to identify where we repeatedly allocate small blocks.

It helped to:

  • Clean the GPU cache at epoch ends (torch.cuda.empty_cache())
  • Tune PYTORCH_CUDA_ALLOC_CONF
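For example, roughly:

```python
import torch

# Allocator tuning must happen before the first CUDA allocation, e.g. when launching:
#   PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py

def train(model, dataloader, optimizer, num_epochs):
    """Toy loop showing the cache-clearing mitigation; the real training loop differs."""
    for epoch in range(num_epochs):
        for batch in dataloader:
            optimizer.zero_grad()
            loss = model(batch)
            loss.backward()
            optimizer.step()
        # Release cached-but-unused GPU blocks between epochs to ease fragmentation.
        torch.cuda.empty_cache()
```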

Best,
Tomas
