Skip to content

Minimize e2e brittleness#7406

Open
christianbaroni wants to merge 11 commits intodevelopfrom
@christian/fix-e2e-brittleness
Open

Minimize e2e brittleness#7406
christianbaroni wants to merge 11 commits intodevelopfrom
@christian/fix-e2e-brittleness

Conversation

@christianbaroni
Copy link
Copy Markdown
Member

@christianbaroni christianbaroni commented Apr 28, 2026

What changed (plus any additional context for devs)

Fixes several sources of e2e brittleness that were showing up as intermittent Android and iOS shard failures.

E2E setup

  • Runs transaction flows against a pinned Foundry/Anvil setup shared across shards
  • Downloads workflow artifacts through a retrying helper that validates the zip before extraction
  • Waits for Anvil RPC readiness before Maestro starts a transaction flow
  • Resets Anvil to ANVIL_FORK_BLOCK_NUMBER before each transaction attempt, with an Anvil restart if reset fails
  • Handles test wallet import URLs delivered at launch through Linking.getInitialURL()
  • Transaction setup waits on app-level state exposed through E2EStatusMarker:
    • wallet import exposes e2e-wallet-importing, e2e-wallet-ready, and e2e-wallet-error
    • Anvil connection exposes e2e-anvil-connected
    • Anvil funding exposes e2e-anvil-funded after the provider is reachable and the funding transaction confirms

Maestro flows

  • Transaction and onboarding flows wait on stable screen/testID anchors instead of rendered balance text
  • Android PIN entry uses the numpad testID rather than relying on visible-text matching against 1
  • Swap and wrap iOS command blocks use Maestro's case-sensitive iOS platform value
  • Send and wrap flows wait for target asset/NFT rows, action buttons, and toast contents before continuing

NFT collection loading

The SendNft recordings showed the Rainbow Pooly collection with the collection row visible, but Rainbow Pooly 16 was missing from the expanded collection.

  • fetchNftCollection(...) accepts { expectedCount, force }
  • Non-forced collection fetches retry until the response includes the requested collection and at least the expected number of NFTs
  • Collection open/mount paths pass the known collection total as expectedCount
  • Forced refreshes after NFT sends and ENS registration bypass expectedCount, because those refreshes can legitimately remove NFTs from the collection
  • buildBriefUniqueTokenList(...) normalizes collection totals before building placeholder NFT rows

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 28, 2026

🧪 Flashlight Performance Report (AWS Device Farm)

🔀 Commit: 9ac8297

📎 View Artifacts

Metric Current Δ vs Baseline
Time to Interactive (TTI) 5610 ms
Average FPS 56.66
Average RAM 398.5 MB

@github-actions
Copy link
Copy Markdown

Launch in simulator or device for e2cb651

@github-actions
Copy link
Copy Markdown

Launch in simulator or device for 3d92e7d

@christianbaroni christianbaroni changed the title [Test] Attempt to minimize e2e brittleness [Test] Minimize e2e brittleness Apr 28, 2026
@github-actions
Copy link
Copy Markdown

Launch in simulator or device for d3c5a64

@github-actions
Copy link
Copy Markdown

Launch in simulator or device for 82a89f7

@github-actions
Copy link
Copy Markdown

Launch in simulator or device for 61dde53

@github-actions
Copy link
Copy Markdown

Launch in simulator or device for 6ebf168

@github-actions
Copy link
Copy Markdown

Launch in simulator or device for 2ca541e

Copy link
Copy Markdown
Contributor

@janicduplessis janicduplessis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I remember seeing random failures related to anvil download. Left a few minor comments, but also fine as is.

Comment thread src/helpers/RainbowContext.tsx Outdated
try {
const currentValue = useConnectedToAnvilStore.getState().connectedToAnvil;
setConnectedToAnvil(!currentValue);
setConnectedToAnvil(true);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prob not a big deal, but this means we can't disconnect from anvil if enabled in dev?

Comment thread src/components/TestDeeplinkHandler.tsx Outdated
return;
}

if (handledUrls.current.has(url)) return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? My idea for commands is it could be used for anything so it would make sense to be able to execute the same one multiple times.

uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955

- name: Cache Foundry binaries
if: github.event_name != 'issue_comment' || startsWith(github.event.comment.body, '/e2e')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might not be needed, also iOS doesnt have it.

@olerass
Copy link
Copy Markdown
Contributor

olerass commented Apr 30, 2026

Overall looks good with the specific fixes and foundry centralization, but I wonder if there's a better approach do achieve the same fixes in some of the cases?

re: remoteConfig short-circuit
We're stubbing out initializeRemoteConfig / fetchRemoteConfig entirely under IS_TEST, which means e2e no longer covers the boot-pat fetch at all (e.g init crashes, bad responses, the 3s timeout race against welcome-screen, any flag-gated behavior). Afaics the underlying issue that config fetch can race the welcome-screen render is a real production bug, not a test bug. Acuals users with bad networks would see the same race we're now "hiding" here. So I'd actually argue the e2e suite is the oppsiite of flaky in this regard, it's surfacing a real issue that the app itself is flaky (?)

We may want to consider either making config init non-blocking liek I think has been discussed previpusly wth a proper fallback UI meanwhile, which fixes it for both tests AND users, or we could inject a test confnig client at the boundary or use MSW so the init code still runs against known fixtures. Both are bigger lifts ofc, but stubbing it out doesn't fix the race, it just hides it from this one consumer, and I'm not usre it's the right path tbh. Curious what your thoughts are here @christianbaroni ?

re: the new e2e marker views
they're a bit ugly but tbh I can't find alternatives that are bviously better and afaiu the production cost is zero since they're behind IS_TEST && status !== 'idle' right?

re: the broader direction
IMO we should be generally trying to reduce IS_TEST usage in the app, not grow it. The markers in this regard are a small step the wrong way; the remoteConfig stub is a bigger step the wrong way. Worth being explicit about whether we're ok with that tradeoff for the immediate flake win (I;d argue no, but curious what others think), especially given we'll soon be running the e2e suite outside PRs entirely, so flakes will ofc still be a thing but much less prevalent and blocking.

@github-actions
Copy link
Copy Markdown

Launch in simulator or device for ddce447

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Launch in simulator or device for 9ac8297

@christianbaroni christianbaroni mentioned this pull request May 1, 2026
@christianbaroni christianbaroni changed the title [Test] Minimize e2e brittleness Minimize e2e brittleness May 1, 2026
@christianbaroni
Copy link
Copy Markdown
Member Author

Overall looks good with the specific fixes and foundry centralization, but I wonder if there's a better approach do achieve the same fixes in some of the cases?

re: remoteConfig short-circuit

We're stubbing out initializeRemoteConfig / fetchRemoteConfig entirely under IS_TEST, which means e2e no longer covers the boot-pat fetch at all (e.g init crashes, bad responses, the 3s timeout race against welcome-screen, any flag-gated behavior). Afaics the underlying issue that config fetch can race the welcome-screen render is a real production bug, not a test bug. Acuals users with bad networks would see the same race we're now "hiding" here. So I'd actually argue the e2e suite is the oppsiite of flaky in this regard, it's surfacing a real issue that the app itself is flaky (?)

We may want to consider either making config init non-blocking liek I think has been discussed previpusly wth a proper fallback UI meanwhile, which fixes it for both tests AND users, or we could inject a test confnig client at the boundary or use MSW so the init code still runs against known fixtures. Both are bigger lifts ofc, but stubbing it out doesn't fix the race, it just hides it from this one consumer, and I'm not usre it's the right path tbh. Curious what your thoughts are here @christianbaroni ?

re: the new e2e marker views

they're a bit ugly but tbh I can't find alternatives that are bviously better and afaiu the production cost is zero since they're behind IS_TEST && status !== 'idle' right?

re: the broader direction

IMO we should be generally trying to reduce IS_TEST usage in the app, not grow it. The markers in this regard are a small step the wrong way; the remoteConfig stub is a bigger step the wrong way. Worth being explicit about whether we're ok with that tradeoff for the immediate flake win (I;d argue no, but curious what others think), especially given we'll soon be running the e2e suite outside PRs entirely, so flakes will ofc still be a thing but much less prevalent and blocking.

Removed the remote config changes which weren't needed. The current loading design there works pretty well in production — it's only blocking (up to 3s) when the app version changes which is generally the only time it needs to block, and even then it's typically fast to load.

Also cleaned up the marker component code so it's at least not scattered within production code.

Regarding IS_TEST usage, generally agree we should avoid scattering it all over and prefer lower-level control points. The remaining uses are I think justified.

With the recent commits the brittleness seems to be resolved, or at least is far less common, so broadly I think these changes are worthwhile.

There were a couple real NFT bugs that showed up in the run artifact recordings. Might be useful to have those posted somewhere on failure rather than being buried in the CI views.

@olerass
Copy link
Copy Markdown
Contributor

olerass commented May 4, 2026

Thanks for simplifying @christianbaroni! The core E2E fixes here look really good!

A few more things worth considering before merge, mostly around minimizing what's actually in this PR vs. what could ship separately or even get dropped.

1.Split out the NFT fixes
These are real bug fixes not really related to the E2E suite right? Just because the e2e suite found them, doesn't mean we should fix it in an E2E PR. Pulling them into their own fix PR will make review more focused, will also attribute this correctly in our changelog generation for releases, and make revert easier (if we need to).

2. platform: iosplatform: iOS casing fix
Afaiu this is not really reducing flakiness but more so activating what was previousluy dead test code that noop before? Couldn't this in theory cause more flakiness (or finding more real issues) since it's now asserting properly?
E.g if iOS shards start failing on those specific assertions you want bisect to land here directly, not folded into a 32-file e2e infra PR.

3. Dormant marker states
Afaics only e2e-anvil-connected and e2e-anvil-funded are used anywhere? The e2e-wallet-* ones look like leftover from the prev PR shape, but not sure about the rest?

What if we make it such that:

  • TestDeeplinkHandler: drop the commandStatus state machine, E2ECommandStatus type, and marker render. Component returns null again. Keep getInitialURL, param validation etc since those are the real fixes.
  • RainbowContext: type unions shrink to 'idle' | 'connected' and 'idle' | 'funded'.

Removes a bunch of code that nothing reads.

4. Used markers -> dev button state?
For the 2 actually used markers, the dev buttons that trigger them already exist in the dev/test surface. Could those buttons just encode the operation state in their label/style? Like "Connect to Anvil" → "Connecting…" → "Anvil ✓" / "Anvil ✗"; same for fund.

If we do this maestro will be asserting on the existing button testID (with the new label state), no need for invisible 2x2 marker views anywhere, devs pressing the buttons get visible feedback they don't have today as a bonus even. Alsow ould remove the the IS_TEST code.

Bigger refactor ofc, and fine to defer if you'd rather keep the markers, but if we do this it looks to m elike we can drop e2e markers entirely and solve it simpler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants