Commit b4d31f0

Mircus and claude committed
Add hyperstructure modeling, social analysis, and temporal windowing
- SHEHyperstructure: decorated higher-order relational object with entity attrs, typed relations, from_csv/from_jsonl ingestion
- Social analysis: rank_diffusers, find_bridge_simplices, group_cohesion, rank_influencers (graph vs simplex comparison)
- Temporal: window() and rolling_windows() for time-sliced analysis
- Fixed diffusion scoring: rank-percentile normalisation, stronger Laplacian coupling — entity scores now discriminate properly
- Worked example: two-community social-media diffuser scenario
- Notebook: temporal bridge formation from JSONL interaction data
- README repositioned around social/group-level analysis mission
- 31 tests passing (13 new)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 55e4305 commit b4d31f0

14 files changed, +1469 -50 lines changed

README.md

Lines changed: 80 additions & 39 deletions
@@ -6,77 +6,116 @@
 <img src="she_logo.png" alt="SHE logo" width="280">
 </p>
 
-SHE is a source-available Python toolkit for building **weighted simplicial
-complexes** from relational data, computing **Hodge Laplacians**, and running
-**diffusion / spectral analysis** on higher-order structures.
+SHE is a source-available research library for modeling and analyzing
+**decorated higher-order relational structures**, with a current computational
+focus on weighted simplicial representations and **social / group-level
+diffusion analysis**.
 
-This is a **v0.1 Research Preview** — useful for exploration, not yet hardened
-for production. Released under the non-commercial
+This is a **Research Preview** released under the non-commercial
 [HNCL v1.0](LICENCE.md) license (not OSI open-source).
 
+## Why SHE?
+
+Standard graph analysis collapses every interaction to a pairwise edge.
+When the signal you care about lives in **small-group structure** — triads,
+co-engagement cliques, collaborative clusters — graph methods wash it out.
+
+Use SHE when:
+
+- **Triads and higher simplices matter.** A co-amplification group of three
+  accounts is a different object from three pairwise edges.
+- **Group-level diffusion matters.** The question is not "who is central?"
+  but "which small group is the diffusion bottleneck?"
+- **Bridge groups matter.** You want to find the cross-community triad, not
+  just the cross-community edge.
+- **Decorations matter.** Relations carry weight, type, topic, and metadata
+  that you want to query and analyze — not just adjacency.
+
+SHE does not replace graph libraries. It adds a layer for the cases where
+graphs are not enough.
+
 ## What v0.1 includes
 
+**Modeling layer**
+- `SHEHyperstructure` — decorated, weighted higher-order relational object
+  with entity attributes, typed relations, and bulk record ingestion
+
+**Social analysis**
+- `rank_diffusers` / `rank_entity_diffusers` / `rank_simplex_diffusers`
+- `find_bridge_simplices` — cross-community higher-order bridges
+- `group_cohesion` — structural cohesion scoring for candidate groups
+- `rank_influencers` — graph centrality vs. simplex diffusion comparison
+
+**Core simplicial engine**
 - Simplicial-complex construction (wraps [TopoNetX](https://github.com/pyt-team/TopoNetX))
 - Graph-to-simplicial lifting via clique detection
-- Hodge-Laplacian computation, spectral analysis, and harmonic-form extraction
+- Hodge-Laplacian spectral analysis and harmonic-form extraction
 - Diffusion centrality ranking
 - Minimal matplotlib visualisation
-- Three runnable examples and a small test suite
 
 ## Installation
 
 ```bash
-# core only
-pip install -e .
-
-# with test tooling
-pip install -e ".[dev]"
-
-# with optional TDA support (gudhi, giotto-tda)
-pip install -e ".[tda]"
+pip install -e . # core only
+pip install -e ".[dev]" # with pytest / ruff
+pip install -e ".[tda]" # with gudhi / giotto-tda
 ```
 
 Requires **Python >= 3.10**.
 
 ## Quickstart
 
 ```python
-import networkx as nx
-from she import SHEDataLoader, SHEHodgeDiffusion, SHEConfig
+from she import SHEHyperstructure, rank_diffusers, find_bridge_simplices
 
-# 1. Start from a NetworkX graph
-G = nx.karate_club_graph()
+# Build a decorated hyperstructure from interaction records
+hs = SHEHyperstructure("demo")
+hs.add_entity("alice", community="A")
+hs.add_entity("bob", community="A")
+hs.add_entity("carol", community="B")
 
-# 2. Lift to a simplicial complex (cliques become higher-order simplices)
-sc = SHEDataLoader.from_weighted_networkx(G)
+hs.add_relation(["alice", "bob"], weight=1.0, kind="reply")
+hs.add_relation(["alice", "bob", "carol"], weight=2.5, kind="co_amplification")
 
-# 3. Run diffusion analysis
-config = SHEConfig(max_dimension=2, spectral_k=5)
-analyzer = SHEHodgeDiffusion(config)
-result = analyzer.analyze_diffusion(sc)
+# Who are the key diffusers?
+for r in rank_diffusers(hs, top_k=3):
+    print(f"dim={r.dimension} {r.target} score={r.score:.3f}")
 
-# 4. Inspect top diffusers
-for dim, diffusers in result.key_diffusers.items():
-    print(f"Dimension {dim}: top diffuser = {diffusers[0]}")
+# Which simplices bridge communities?
+for b in find_bridge_simplices(hs):
+    print(f"{sorted(b.members)} communities={b.communities_spanned}")
 ```
 
+## Worked use case: social-media diffusers
+
+`examples/social_media_diffusers.py` builds a two-community social scenario
+where a high-degree hub dominates graph centrality, but a cross-community
+triad is the actual diffusion engine. The example compares graph-only
+ranking with simplex-level analysis and shows where they disagree.
+
+```bash
+python examples/social_media_diffusers.py
+```
+
+Output highlights:
+- **Graph centrality** ranks the hub (u0) first.
+- **Bridge detection** surfaces the {u3, u5, u7} triad as the top
+  cross-community structure.
+- **Group cohesion** scores the triad as structurally tight despite
+  containing no individually prominent member.
+
 ## Examples
 
 | Script | Description |
 |--------|-------------|
+| `examples/social_media_diffusers.py` | Graph vs. simplex ranking on a two-community scenario |
 | `examples/toy_triangle.py` | Smallest nontrivial complex — Hodge Laplacian printout |
 | `examples/social_group_lift.py` | Lift a small social graph to simplices via cliques |
-| `examples/group_diffusion_demo.py` | Weighted Karate Club diffusion analysis with plot |
-
-Run any example with:
-
-```bash
-python examples/toy_triangle.py
-```
+| `examples/group_diffusion_demo.py` | Weighted Karate Club diffusion analysis |
 
 ## Experimental modules
 
-The following are **not** part of the stable v0.1 API and require extra
+The following are **not** part of the stable API and require extra
 dependencies:
 
 | Module | Requires | Install extra |
@@ -87,10 +126,12 @@ dependencies:
 ## Limitations
 
 This is a **Research Preview**. The API may change between releases.
-It has not been optimised or audited for production use.
 
-- Hodge analysis currently computes the **harmonic component** only; exact and
-  coexact parts of the decomposition are not yet implemented.
+- Hodge analysis computes the **harmonic component** only; exact/coexact
+  decomposition is not yet implemented.
+- Bridge detection uses a heuristic (community-span weighted by relation
+  weight), not a topological invariant.
+- Group cohesion is a simple composite score, not a formal measure.
 - Tested with TopoNetX 0.2.x on Python 3.11.
 - Not OSI open-source — see License section below.

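The README's Limitations section says only the harmonic component of the Hodge decomposition is computed. As a rough, library-independent illustration of what that component measures: for the 1-dimensional Hodge Laplacian, the dimension of the harmonic space equals the first Betti number, `n_edges - rank(B1) - rank(B2)`, where `B1` and `B2` are the boundary matrices. The sketch below uses only the standard library; `rank` and `betti_1` are hypothetical helper names and are not part of SHE or TopoNetX.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a small integer matrix via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def betti_1(n_edges, b1, b2):
    """Dimension of the harmonic 1-forms: n_edges - rank(B1) - rank(B2)."""
    return n_edges - rank(b1) - rank(b2)


# Vertices a, b, c with oriented edges ab, bc, ac.
# B1: rows are vertices, columns are edges.
B1 = [
    [-1, 0, -1],  # a
    [1, -1, 0],   # b
    [0, 1, 1],    # c
]
# B2: boundary of the filled triangle abc is ab + bc - ac.
B2 = [[1], [1], [-1]]

hollow = betti_1(3, B1, [])   # three edges, no 2-simplex
filled = betti_1(3, B1, B2)   # same edges plus the triangle
print(hollow, filled)
```

The hollow triangle carries one harmonic cycle, while filling it with a 2-simplex removes that cycle. This is the sense in which a triad is a different object from three pairwise edges.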
data/interactions.jsonl

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+{"users": ["anna", "ben"], "weight": 1.0, "kind": "reply", "topic": "energy", "time": 1}
+{"users": ["anna", "carol"], "weight": 0.8, "kind": "reply", "topic": "energy", "time": 1}
+{"users": ["ben", "carol"], "weight": 0.6, "kind": "like", "topic": "energy", "time": 1}
+{"users": ["anna", "ben", "carol"], "weight": 2.0, "kind": "co_retweet", "topic": "energy", "time": 1}
+{"users": ["dave", "elena"], "weight": 1.2, "kind": "reply", "topic": "housing", "time": 1}
+{"users": ["dave", "frank"], "weight": 0.9, "kind": "reply", "topic": "housing", "time": 1}
+{"users": ["elena", "frank"], "weight": 1.0, "kind": "like", "topic": "housing", "time": 1}
+{"users": ["dave", "elena", "frank"], "weight": 1.8, "kind": "co_retweet", "topic": "housing", "time": 1}
+{"users": ["carol", "dave"], "weight": 0.3, "kind": "mention", "topic": "energy", "time": 1}
+{"users": ["anna", "ben"], "weight": 1.2, "kind": "reply", "topic": "energy", "time": 2}
+{"users": ["anna", "carol"], "weight": 1.0, "kind": "reply", "topic": "energy", "time": 2}
+{"users": ["ben", "carol"], "weight": 0.8, "kind": "reply", "topic": "energy", "time": 2}
+{"users": ["anna", "ben", "carol"], "weight": 2.5, "kind": "co_retweet", "topic": "energy", "time": 2}
+{"users": ["dave", "elena"], "weight": 0.5, "kind": "like", "topic": "housing", "time": 2}
+{"users": ["carol", "dave"], "weight": 1.5, "kind": "reply", "topic": "energy", "time": 2}
+{"users": ["carol", "dave", "elena"], "weight": 2.8, "kind": "co_retweet", "topic": "energy", "time": 2}
+{"users": ["anna", "ben"], "weight": 0.6, "kind": "like", "topic": "energy", "time": 3}
+{"users": ["carol", "dave"], "weight": 2.0, "kind": "reply", "topic": "energy", "time": 3}
+{"users": ["carol", "dave", "elena"], "weight": 3.2, "kind": "co_retweet", "topic": "energy", "time": 3}
+{"users": ["carol", "elena"], "weight": 1.5, "kind": "reply", "topic": "energy", "time": 3}
+{"users": ["dave", "elena", "frank"], "weight": 1.0, "kind": "co_retweet", "topic": "housing", "time": 3}
+{"users": ["ben", "frank"], "weight": 0.4, "kind": "mention", "topic": "housing", "time": 3}
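The commit message mentions `from_jsonl`, `window()`, and `rolling_windows()`, but their signatures are not visible in this diff, so the following sketch deliberately avoids calling them. It shows, with the standard library only, how records of this shape could be sliced by the `time` field to watch the {carol, dave, elena} group emerge. The helpers `parse_jsonl`, `window`, and `group_weight_by_time` are hypothetical, and the inclusive window bounds are an assumption.

```python
import json
from collections import defaultdict

# A few records copied from data/interactions.jsonl.
SAMPLE = """\
{"users": ["anna", "ben"], "weight": 1.0, "kind": "reply", "topic": "energy", "time": 1}
{"users": ["anna", "ben", "carol"], "weight": 2.0, "kind": "co_retweet", "topic": "energy", "time": 1}
{"users": ["carol", "dave"], "weight": 0.3, "kind": "mention", "topic": "energy", "time": 1}
{"users": ["carol", "dave"], "weight": 1.5, "kind": "reply", "topic": "energy", "time": 2}
{"users": ["carol", "dave", "elena"], "weight": 2.8, "kind": "co_retweet", "topic": "energy", "time": 2}
{"users": ["carol", "dave", "elena"], "weight": 3.2, "kind": "co_retweet", "topic": "energy", "time": 3}
"""


def parse_jsonl(text):
    """One JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def window(records, start, end):
    """Records with start <= time <= end (inclusive bounds are an assumption)."""
    return [r for r in records if start <= r["time"] <= end]


def group_weight_by_time(records):
    """Total weight per (time, sorted member tuple)."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["time"], tuple(sorted(r["users"])))] += r["weight"]
    return dict(totals)


records = parse_jsonl(SAMPLE)
totals = group_weight_by_time(records)
bridge = ("carol", "dave", "elena")
for t in (1, 2, 3):
    print(t, totals.get((t, bridge), 0.0))
```

In this sample the cross-topic triad is absent at time 1 and gains weight at times 2 and 3, which is the kind of temporal bridge formation the data file appears designed to exhibit.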
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+# Use case: simplicial diffusers in social media
+
+## Problem
+
+Standard social-network analysis ranks individuals by graph centrality
+(degree, betweenness, eigenvector). This misses a common real-world
+phenomenon: **small groups that co-amplify content are often more important
+for information diffusion than any single prominent individual**.
+
+A triad of mid-reach accounts that consistently co-retweet or co-engage on a
+topic can act as a diffusion engine that a node-level centrality measure will
+never surface, because the signal lives in the *group structure*, not in any
+one member's connectivity.
+
+## Graph baseline limitation
+
+In a standard graph:
+
+- Each interaction is collapsed to a pairwise edge.
+- A triad {A, B, C} that always acts together is indistinguishable from three
+  independent pairwise interactions A-B, B-C, A-C.
+- Centrality metrics rank nodes; there is no object corresponding to the group.
+
+## Event-to-hyperstructure construction
+
+SHE lifts interaction records into a **weighted simplicial complex** where:
+
+- Each entity (account, user) is a 0-simplex.
+- Each pairwise interaction is a 1-simplex (edge) with weight.
+- Each co-engagement group of size k is a (k-1)-simplex with its own weight,
+  kind label, and metadata (topic, timestamp window, ...).
+
+This preserves the *higher-order grouping* that graph projection destroys.
+
+## Analysis outputs
+
+Given this structure, SHE computes:
+
+1. **Simplex-level diffusion centrality** via the Hodge Laplacian, ranking
+   groups (not just individuals) by their structural role in diffusion.
+2. **Bridge simplices** that span community boundaries.
+3. **Group cohesion scores** measuring how tightly a candidate group is bound.
+4. **Graph vs. simplex ranking comparison** showing where the two disagree.
+
+## What higher-order signal we expect to uncover
+
+In a scenario with two communities and one cross-community triad:
+
+- **Graph centrality** highlights a high-degree hub node.
+- **Simplex diffusion** highlights the cross-community triad as the actual
+  diffusion bottleneck, because information must pass through that group
+  structure to bridge the communities.
+
+The triad may contain no individually prominent member, yet it dominates the
+diffusion pathway. SHE makes this visible.

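The "Graph baseline limitation" and "Event-to-hyperstructure construction" sections of the use-case document can be made concrete with a small standard-library sketch. It is not SHE code: `simplex_dimension` and `graph_projection` are hypothetical helpers that only illustrate the size-k to (k-1)-simplex rule and the information lost under pairwise projection.

```python
from itertools import combinations


def simplex_dimension(members):
    """A group of k distinct entities is a (k-1)-simplex."""
    return len(set(members)) - 1


def graph_projection(relations):
    """Collapse every relation to the set of pairwise edges it implies."""
    edges = set()
    for members in relations:
        for pair in combinations(sorted(set(members)), 2):
            edges.add(frozenset(pair))
    return edges


# Three independent pairwise interactions ...
pairwise_only = [["A", "B"], ["B", "C"], ["A", "C"]]
# ... versus the same pairs plus a genuine three-way interaction.
with_triad = pairwise_only + [["A", "B", "C"]]

same_graph = graph_projection(pairwise_only) == graph_projection(with_triad)
top_dims = (
    max(simplex_dimension(m) for m in pairwise_only),
    max(simplex_dimension(m) for m in with_triad),
)
print(same_graph, top_dims)
```

Both inputs project to the same three edges, yet only the second contains a 2-simplex. Keeping the group as its own object is what lets a weight, kind label, and metadata be attached to it.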
examples/social_media_diffusers.py

Lines changed: 143 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,143 @@
1+
"""Social media diffusers -- why SHE instead of graph-only analysis?
2+
3+
This example builds a synthetic social scenario with two communities and one
4+
cross-community triad that acts as the real diffusion engine. It then
5+
compares graph centrality with simplex-level diffusion ranking to show what
6+
higher-order analysis reveals that node-level metrics miss.
7+
8+
Scenario
9+
--------
10+
Community A: five tightly connected accounts (u0-u4).
11+
Community B: five tightly connected accounts (u5-u9).
12+
Hub: u0 has high degree -- connected to many in A and a few in B.
13+
Bridge triad: {u3, u5, u7} repeatedly co-amplify a topic across A and B.
14+
15+
Graph centrality will favour u0 (high degree).
16+
Simplex diffusion will surface the {u3, u5, u7} triad as structurally more
17+
important for cross-community information flow.
18+
"""
19+
20+
from she import (
21+
SHEHyperstructure,
22+
SHEConfig,
23+
rank_diffusers,
24+
rank_entity_diffusers,
25+
rank_influencers,
26+
find_bridge_simplices,
27+
group_cohesion,
28+
)
29+
30+
31+
def build_scenario() -> SHEHyperstructure:
32+
config = SHEConfig(max_dimension=2, spectral_k=6)
33+
hs = SHEHyperstructure("social_media", config=config)
34+
35+
# -- Community A (u0-u4) ----------------------------------------------
36+
for i in range(5):
37+
hs.add_entity(f"u{i}", community="A", role="regular")
38+
# make u0 a high-degree hub
39+
hs._entity_attrs["u0"]["role"] = "hub"
40+
41+
# dense, high-weight internal edges in A — u0 is the star
42+
a_pairs = [
43+
("u0", "u1", 2.5), ("u0", "u2", 2.5), ("u0", "u3", 2.0), ("u0", "u4", 2.0),
44+
("u1", "u2", 1.0), ("u1", "u3", 0.8), ("u2", "u3", 0.8), ("u2", "u4", 0.6),
45+
("u3", "u4", 0.6),
46+
]
47+
for a, b, w in a_pairs:
48+
hs.add_relation([a, b], weight=w, kind="engagement")
49+
50+
# internal triads in A centred on u0
51+
hs.add_relation(["u0", "u1", "u2"], weight=2.0, kind="co_engagement")
52+
hs.add_relation(["u0", "u2", "u4"], weight=1.5, kind="co_engagement")
53+
54+
# -- Community B (u5-u9) ----------------------------------------------
55+
for i in range(5, 10):
56+
hs.add_entity(f"u{i}", community="B", role="regular")
57+
58+
b_pairs = [
59+
("u5", "u6", 1.0), ("u5", "u7", 1.0), ("u5", "u8", 0.8),
60+
("u6", "u7", 0.8), ("u6", "u8", 0.6), ("u7", "u8", 0.8),
61+
("u7", "u9", 0.6), ("u8", "u9", 0.6),
62+
]
63+
for a, b, w in b_pairs:
64+
hs.add_relation([a, b], weight=w, kind="engagement")
65+
66+
hs.add_relation(["u5", "u6", "u7"], weight=1.3, kind="co_engagement")
67+
68+
# -- Cross-community links --------------------------------------------
69+
# hub u0 has moderately strong cross-community edges (boosts graph centrality)
70+
hs.add_relation(["u0", "u5"], weight=1.2, kind="mention")
71+
hs.add_relation(["u0", "u6"], weight=1.0, kind="mention")
72+
hs.add_relation(["u0", "u7"], weight=0.8, kind="mention")
73+
74+
# the bridge triad: u3 (A), u5 (B), u7 (B) — moderate pairwise, heavy group
75+
hs.add_relation(["u3", "u5"], weight=1.0, kind="co_amplification", topic="climate")
76+
hs.add_relation(["u3", "u7"], weight=0.9, kind="co_amplification", topic="climate")
77+
# the key triad — its *group* weight is what matters
78+
hs.add_relation(
79+
["u3", "u5", "u7"], weight=4.0,
80+
kind="co_amplification", topic="climate",
81+
)
82+
83+
return hs
84+
85+
86+
def main():
87+
# suppress noisy ARPACK warnings from small-matrix spectral solves
88+
import logging
89+
logging.getLogger("she.diffusion").setLevel(logging.ERROR)
90+
91+
hs = build_scenario()
92+
print(f"Built: {hs!r}\n")
93+
94+
# -- 1. Graph vs simplex ranking --------------------------------------
95+
comparison = rank_influencers(hs, top_k=5)
96+
97+
print("=== Graph centrality (1-skeleton only) ===")
98+
for r in comparison["graph_centrality"]:
99+
label = r.target[0] if len(r.target) == 1 else r.target
100+
role = r.metadata.get("role", "")
101+
comm = r.metadata.get("community", "")
102+
print(f" {label:>4s} score={r.score:.4f} community={comm} role={role}")
103+
104+
print("\n=== Simplex diffusion (all dimensions) ===")
105+
for r in comparison["simplex_diffusion"]:
106+
kind = r.metadata.get("kind", "")
107+
print(f" dim={r.dimension} {r.target} score={r.score:.4f} kind={kind}")
108+
109+
# -- 2. Top entity diffusers ------------------------------------------
110+
print("\n=== Top entity diffusers (dim 0) ===")
111+
for r in rank_entity_diffusers(hs, top_k=5):
112+
label = r.target[0] if len(r.target) == 1 else r.target
113+
comm = r.metadata.get("community", "")
114+
print(f" {label:>4s} score={r.score:.4f} community={comm}")
115+
116+
# -- 3. Bridge simplices ----------------------------------------------
117+
print("\n=== Bridge simplices (cross-community) ===")
118+
bridges = find_bridge_simplices(hs)
119+
for b in bridges[:5]:
120+
print(
121+
f" {sorted(b.members, key=str)} dim={b.dimension} "
122+
f"communities={b.communities_spanned} bridge_score={b.bridge_score:.3f} "
123+
f"kind={b.metadata.get('kind', '')}"
124+
)
125+
126+
# -- 4. Group cohesion comparison -------------------------------------
127+
print("\n=== Group cohesion comparison ===")
128+
triad = ["u3", "u5", "u7"]
129+
hub_group = ["u0", "u1", "u2"]
130+
for group in [triad, hub_group]:
131+
cs = group_cohesion(hs, group)
132+
print(f" {sorted(cs.members, key=str)} score={cs.score:.4f} {cs.components}")
133+
134+
# -- 5. The punchline ------------------------------------------------
135+
print("\n=== Why this matters ===")
136+
print("Graph centrality highlights u0 (the high-degree hub).")
137+
print("Simplex diffusion highlights the {u3, u5, u7} triad -- a cross-community")
138+
print("co-amplification group that no node-level metric would surface.")
139+
print("The triad is the actual diffusion bottleneck between communities A and B.")
140+
141+
142+
if __name__ == "__main__":
143+
main()

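To see why the scenario is expected to separate the hub from the bridge triad, the relations from `build_scenario()` can be scored by hand without SHE. The sketch below is an illustration under stated assumptions: `weighted_degree` is a plain node-level baseline, and `bridge_score` is a guess at what "community-span weighted by relation weight" might look like (relation weight times number of communities spanned). It is not the formula used by `find_bridge_simplices`, which this diff does not show.

```python
from collections import defaultdict

# u0-u4 belong to community A, u5-u9 to community B (as in build_scenario()).
community = {f"u{i}": ("A" if i < 5 else "B") for i in range(10)}

# (members, weight) pairs copied from build_scenario().
relations = [
    (("u0", "u1"), 2.5), (("u0", "u2"), 2.5), (("u0", "u3"), 2.0), (("u0", "u4"), 2.0),
    (("u1", "u2"), 1.0), (("u1", "u3"), 0.8), (("u2", "u3"), 0.8), (("u2", "u4"), 0.6),
    (("u3", "u4"), 0.6),
    (("u0", "u1", "u2"), 2.0), (("u0", "u2", "u4"), 1.5),
    (("u5", "u6"), 1.0), (("u5", "u7"), 1.0), (("u5", "u8"), 0.8),
    (("u6", "u7"), 0.8), (("u6", "u8"), 0.6), (("u7", "u8"), 0.8),
    (("u7", "u9"), 0.6), (("u8", "u9"), 0.6),
    (("u5", "u6", "u7"), 1.3),
    (("u0", "u5"), 1.2), (("u0", "u6"), 1.0), (("u0", "u7"), 0.8),
    (("u3", "u5"), 1.0), (("u3", "u7"), 0.9),
    (("u3", "u5", "u7"), 4.0),
]


def weighted_degree(rels):
    """Node-level baseline: sum of pairwise edge weights per node."""
    deg = defaultdict(float)
    for members, weight in rels:
        if len(members) == 2:  # only the 1-skeleton counts here
            for m in members:
                deg[m] += weight
    return dict(deg)


def bridge_score(members, weight):
    """Hypothetical heuristic: weight times communities spanned, 0 if only one."""
    spanned = {community[m] for m in members}
    return weight * len(spanned) if len(spanned) > 1 else 0.0


deg = weighted_degree(relations)
top_node = max(deg, key=deg.get)
top_bridge = max(relations, key=lambda r: bridge_score(*r))[0]
print(top_node, top_bridge)
```

Under these assumptions the node-level baseline picks the hub `u0`, while the group-level score picks the `("u3", "u5", "u7")` triad. This matches the disagreement the example script is built to demonstrate, although the script's actual scores come from SHE's own ranking functions.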