refactor: split .index file into multiple #389

Open
dkharms wants to merge 15 commits into main from 336-sealing-split

@dkharms
Member

@dkharms dkharms commented Mar 31, 2026

Description

This is the first pull request in the series for #336. The changes in this pull request will allow us to efficiently merge several fractions into one when performing compaction.

To that end, the .index file is split into several files:

  • .info -- contains one info block;
  • .offsets -- contains one block with offsets of DocBlock inside .docs file;
  • .ids -- contains triplets of seq.MID, seq.RID and seq.DocPos blocks;
  • .tokens -- contains tokens and token table;
  • .lids -- contains chunks of seq.LID;
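
As a rough illustration of the layout listed above, here is a minimal Go sketch that enumerates the split files for one fraction. The suffixes come from the list; the names `sealedSuffixes` and `sealedPaths` and the example base path are hypothetical and are not taken from this pull request.

```go
package main

import "fmt"

// sealedSuffixes lists the files that replace the single .index file.
// The variable name is hypothetical; the suffixes are from the description.
var sealedSuffixes = []string{".info", ".offsets", ".ids", ".tokens", ".lids"}

// sealedPaths returns the paths of the split index files for a fraction
// whose files share the given base path (hypothetical helper).
func sealedPaths(base string) []string {
	paths := make([]string, 0, len(sealedSuffixes))
	for _, s := range sealedSuffixes {
		paths = append(paths, base+s)
	}
	return paths
}

func main() {
	// The base path below is made up for the example.
	for _, p := range sealedPaths("/data/fraction-0001") {
		fmt.Println(p)
	}
}
```

Keeping the suffixes in one slice means that code which opens, renames, or deletes a fraction can iterate over the same list instead of repeating the names.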

As a consequence, we expect file descriptor usage to increase by 3x.

Several things I need to polish:

  • Reintroduce statistics reporting on sealing;
  • Backwards compatibility for sealed fractions which were offloaded to remote storage;
  • Delete all files of a sealed fraction if at least one of them has a tmp suffix;
  • Handle tmp .index files;

  • I have read and followed all requirements in CONTRIBUTING.md;
  • I used LLM/AI assistance to make this pull request;

If you have used LLM/AI assistance please provide model name and full prompt:

Model: Claude Sonnet 4.6
Context: I used an LLM to fix issues with the index analyzer binary

@github-actions
Contributor

PR Title Validation Failed
Please refer to CONTRIBUTING.md

@dkharms dkharms changed the title from "336 sealing split" to "refactor: split .index file into multiple" on Mar 31, 2026
@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot up main bulk

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - f70e7779.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@codecov-commenter

codecov-commenter commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 61.08911% with 393 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.73%. Comparing base (124d460) to head (4acc3e9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| frac/sealed.go | 50.25% | 79 Missing and 19 partials ⚠️ |
| frac/sealed_loader.go | 41.86% | 67 Missing and 8 partials ⚠️ |
| cmd/index_analyzer/main.go | 0.00% | 50 Missing ⚠️ |
| frac/remote.go | 64.07% | 25 Missing and 12 partials ⚠️ |
| frac/sealed/sealing/index.go | 60.46% | 13 Missing and 21 partials ⚠️ |
| frac/active_sealing_source.go | 79.81% | 10 Missing and 12 partials ⚠️ |
| frac/sealed/sealing/sealer.go | 67.16% | 14 Missing and 8 partials ⚠️ |
| frac/sealed/sealing/blocks_builder.go | 86.08% | 9 Missing and 7 partials ⚠️ |
| frac/sealed/sealing/writer.go | 70.37% | 8 Missing and 8 partials ⚠️ |
| fracmanager/frac_manifest.go | 71.42% | 14 Missing and 2 partials ⚠️ |
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #389      +/-   ##
==========================================
- Coverage   71.40%   70.73%   -0.68%     
==========================================
  Files         219      219              
  Lines       16454    16845     +391     
==========================================
+ Hits        11749    11915     +166     
- Misses       3834     4029     +195     
- Partials      871      901      +30     


@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot down f70e7779

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator f70e7779 was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 75.47 | 75.95 | +0.64% |
| stddev (ms) | 31.14 | 32.51 | +4.41% |
| p(50) (ms) | 67.00 | 66.00 | -1.49% |
| p(95) (ms) | 140.00 | 144.00 | +2.86% |
| p(99) (ms) | 191.00 | 196.00 | +2.62% |
| iterations | 56767.00 | 56645.00 | -0.21% |

Have a great time!
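
For readers of these summaries: the `diff` column appears consistent with the relative change of `comp` against `base`. The Go sketch below shows that arithmetic. The formula is inferred from the reported numbers and is not taken from the bot's source, and `relDiff` is a hypothetical name. Because the bot presumably works on unrounded values, recomputing from the rounded table entries can differ in the last digit.

```go
package main

import "fmt"

// relDiff returns the relative change of comp versus base, in percent.
// This is an assumed reconstruction of the "diff" column.
func relDiff(base, comp float64) float64 {
	return (comp - base) / base * 100
}

func main() {
	// Mean latency from the summary above: base 75.47 ms, comp 75.95 ms.
	fmt.Printf("%+.2f%%\n", relDiff(75.47, 75.95)) // prints: +0.64%
}
```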

@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot up main mixed

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - e47604c9.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot down e47604c9

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator e47604c9 was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 75.44 | 75.74 | +0.40% |
| stddev (ms) | 30.28 | 30.81 | +1.74% |
| p(50) (ms) | 67.00 | 68.00 | +1.49% |
| p(95) (ms) | 136.50 | 140.00 | +2.56% |
| p(99) (ms) | 188.00 | 191.00 | +1.60% |
| iterations | 21032.00 | 21063.00 | +0.15% |

Query: `service:payment-backend-eu AND k8s_namespace:prod AND level:[0 to 3] AND (message:'failed' OR message:'timeout')` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 71.49 | 71.26 | -0.32% |
| stddev (ms) | 25.23 | 24.94 | -1.14% |
| p(50) (ms) | 65.00 | 65.00 | 0.00% |
| p(95) (ms) | 124.00 | 121.00 | -2.42% |
| p(99) (ms) | 173.50 | 171.00 | -1.44% |
| iterations | 3459.00 | 3466.00 | +0.20% |

Have a great time!

@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot up main bulk

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - 3a13f7af.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot down 3a13f7af

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator 3a13f7af was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 70.93 | 70.93 | -0.00% |
| stddev (ms) | 26.40 | 27.66 | +4.80% |
| p(50) (ms) | 64.00 | 63.00 | -1.56% |
| p(95) (ms) | 124.50 | 127.00 | +2.01% |
| p(99) (ms) | 165.00 | 170.00 | +3.03% |
| iterations | 13088.00 | 13154.00 | +0.50% |

Have a great time!

@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot up main bulk

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - 29e733d8.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@dkharms dkharms force-pushed the 336-sealing-split branch from b0caec8 to 20ecd57 on March 31, 2026 13:34
@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot down 29e733d8

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator 29e733d8 was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 69.95 | 69.78 | -0.25% |
| stddev (ms) | 26.18 | 26.57 | +1.50% |
| p(50) (ms) | 63.00 | 63.00 | 0.00% |
| p(95) (ms) | 122.00 | 123.00 | +0.82% |
| p(99) (ms) | 162.00 | 166.00 | +2.47% |
| iterations | 13679.00 | 13623.00 | -0.41% |

Have a great time!

@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot up main bulk

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - b1727103.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@dkharms
Member Author

dkharms commented Mar 31, 2026

@seqbenchbot down b1727103

@seqbenchbot
Collaborator

seqbenchbot commented Mar 31, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator b1727103 was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 72.81 | 72.37 | -0.60% |
| stddev (ms) | 29.14 | 28.83 | -1.06% |
| p(50) (ms) | 65.00 | 64.00 | -1.54% |
| p(95) (ms) | 132.00 | 130.00 | -1.52% |
| p(99) (ms) | 175.00 | 179.00 | +2.29% |
| iterations | 40040.00 | 39925.00 | -0.29% |

Have a great time!

@github-actions
Contributor

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.
Click on Show table button to see full list of degraded benchmarks.

Show table
| Name | Previous | Current | Ratio | Verdict |
|---|---|---|---|---|
| MutexListAppend-4 | 5115f7 | 674dbc | | |
| | 196.07 MB/s | 168.78 MB/s | 0.86 | 🔴 |
| | 83287562.00 ns/op | 94799666.00 ns/op | 1.14 | 🔴 |

@dkharms dkharms force-pushed the 336-sealing-split branch from d73ef96 to 47289a3 on April 2, 2026 08:45
@dkharms dkharms marked this pull request as ready for review April 2, 2026 08:46
@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.
Click on Show table button to see full list of degraded benchmarks.

Show table
| Name | Previous | Current | Ratio | Verdict |
|---|---|---|---|---|
| MutexListAppend-4 | 5115f7 | 30cda4 | | |
| | 196.07 MB/s | 174.63 MB/s | 0.89 | 🔴 |

@eguguchkin eguguchkin self-requested a review April 6, 2026 10:21
@dkharms
Member Author

dkharms commented Apr 16, 2026

@seqbenchbot down cfc11182

@seqbenchbot
Collaborator

seqbenchbot commented Apr 16, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator cfc11182 was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 62.06 | 61.62 | -0.70% |
| stddev (ms) | 21.63 | 21.82 | +0.88% |
| p(50) (ms) | 56.00 | 56.00 | 0.00% |
| p(95) (ms) | 105.00 | 105.00 | 0.00% |
| p(99) (ms) | 143.00 | 143.00 | 0.00% |
| iterations | 42231.00 | 42250.00 | +0.04% |

Have a great time!

@dkharms
Member Author

dkharms commented Apr 16, 2026

@seqbenchbot up main bulk

@seqbenchbot
Collaborator

seqbenchbot commented Apr 16, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - 3dd6ec65.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@dkharms dkharms force-pushed the 336-sealing-split branch from daaec7e to f0f2fbe on April 16, 2026 14:28
@dkharms
Member Author

dkharms commented Apr 16, 2026

@seqbenchbot down 3dd6ec65

@seqbenchbot
Collaborator

seqbenchbot commented Apr 16, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator 3dd6ec65 was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 61.94 | 61.88 | -0.10% |
| stddev (ms) | 22.16 | 23.01 | +3.84% |
| p(50) (ms) | 56.00 | 56.00 | 0.00% |
| p(95) (ms) | 104.00 | 108.00 | +3.85% |
| p(99) (ms) | 146.00 | 150.00 | +2.74% |
| iterations | 10872.00 | 10818.00 | -0.50% |

Have a great time!

@dkharms
Member Author

dkharms commented Apr 16, 2026

@seqbenchbot up main bulk

@seqbenchbot
Collaborator

seqbenchbot commented Apr 16, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - ddbdf38d.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@github-actions
Contributor

🔴 Performance Degradation

Some benchmarks have degraded compared to the previous run.
Click on Show table button to see full list of degraded benchmarks.

Show table
| Name | Previous | Current | Ratio | Verdict |
|---|---|---|---|---|
| AndTree/size=1000000-4 | 124d46 | 0e59a4 | | |
| | 4.56 ns/op | 5.11 ns/op | 1.12 | 🔴 |
| Sealing_NoSort-4 | 124d46 | 0e59a4 | | |
| | 1155972995.00 ns/op | 1329586811.00 ns/op | 1.15 | 🔴 |
| Sealing_WithSort-4 | 124d46 | 0e59a4 | | |
| | 2189462576.00 ns/op | 2513488819.00 ns/op | 1.15 | 🔴 |

@dkharms
Member Author

dkharms commented Apr 16, 2026

@seqbenchbot down ddbdf38d

@seqbenchbot
Collaborator

seqbenchbot commented Apr 16, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator ddbdf38d was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 63.38 | 62.43 | -1.50% |
| stddev (ms) | 23.95 | 22.96 | -4.13% |
| p(50) (ms) | 57.00 | 56.00 | -1.75% |
| p(95) (ms) | 112.00 | 108.00 | -3.57% |
| p(99) (ms) | 153.50 | 151.00 | -1.63% |
| iterations | 130229.00 | 130316.00 | +0.07% |

Have a great time!

@dkharms
Member Author

dkharms commented Apr 16, 2026

@seqbenchbot up main bulk

@seqbenchbot
Collaborator

seqbenchbot commented Apr 16, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - 12d95a0e.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@dkharms
Member Author

dkharms commented Apr 17, 2026

@seqbenchbot down 12d95a0e

@seqbenchbot
Collaborator

seqbenchbot commented Apr 17, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator 12d95a0e was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 64.26 | 63.66 | -0.93% |
| stddev (ms) | 24.23 | 23.58 | -2.68% |
| p(50) (ms) | 58.00 | 57.00 | -1.72% |
| p(95) (ms) | 112.00 | 111.00 | -0.89% |
| p(99) (ms) | 160.00 | 153.50 | -4.06% |
| iterations | 1553308.00 | 1553207.00 | -0.01% |

Have a great time!

@dkharms
Member Author

dkharms commented Apr 17, 2026

@seqbenchbot up main bulk

@seqbenchbot
Collaborator

seqbenchbot commented Apr 17, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - d1043cea.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@dkharms
Member Author

dkharms commented Apr 17, 2026

@seqbenchbot down d1043cea

@seqbenchbot
Collaborator

seqbenchbot commented Apr 17, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator d1043cea was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 63.77 | 62.88 | -1.40% |
| stddev (ms) | 23.26 | 23.02 | -1.02% |
| p(50) (ms) | 58.00 | 57.00 | -1.72% |
| p(95) (ms) | 111.00 | 109.50 | -1.35% |
| p(99) (ms) | 152.00 | 151.50 | -0.33% |
| iterations | 123732.00 | 123857.00 | +0.10% |

Have a great time!

@dkharms
Member Author

dkharms commented Apr 17, 2026

@seqbenchbot up main search-keyword-exact-match-warm

@seqbenchbot
Collaborator

seqbenchbot commented Apr 17, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - caad43de.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@dkharms
Member Author

dkharms commented Apr 17, 2026

@seqbenchbot down caad43de

@seqbenchbot
Collaborator

seqbenchbot commented Apr 17, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator caad43de was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 61.38 | 60.93 | -0.73% |
| stddev (ms) | 19.34 | 18.90 | -2.28% |
| p(50) (ms) | 57.00 | 56.00 | -1.75% |
| p(95) (ms) | 100.00 | 98.00 | -2.00% |
| p(99) (ms) | 131.50 | 130.00 | -1.14% |
| iterations | 2450.00 | 2450.00 | 0.00% |

Query: `service:payment-backend-eu AND k8s_namespace:prod` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 98.10 | 96.44 | -1.69% |
| stddev (ms) | 53.46 | 54.54 | +2.03% |
| p(50) (ms) | 90.00 | 91.00 | +1.11% |
| p(95) (ms) | 206.00 | 205.00 | -0.49% |
| p(99) (ms) | 250.50 | 255.00 | +1.80% |
| iterations | 11285.00 | 11273.00 | -0.11% |

Have a great time!

@dkharms
Member Author

dkharms commented Apr 17, 2026

@seqbenchbot up main mixed

@seqbenchbot
Collaborator

seqbenchbot commented Apr 17, 2026

Nice, @dkharms <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - 1d4eb6bd.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@dkharms
Member Author

dkharms commented Apr 18, 2026

@seqbenchbot down 1d4eb6bd

@seqbenchbot
Collaborator

seqbenchbot commented Apr 18, 2026

Nice, @dkharms <(-^,^-)=b!

The benchmark with identificator 1d4eb6bd was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: `bulk` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 76.50 | 77.35 | +1.11% |
| stddev (ms) | 64.51 | 65.92 | +2.19% |
| p(50) (ms) | 59.00 | 59.00 | 0.00% |
| p(95) (ms) | 174.00 | 181.00 | +4.02% |
| p(99) (ms) | 384.00 | 385.00 | +0.26% |
| iterations | 1836773.00 | 1834692.00 | -0.11% |

Query: `service:payment-backend-eu AND k8s_namespace:prod AND level:[0 to 3] AND (message:'failed' OR message:'timeout')` (warm)

| Metric | base | comp | diff |
|---|---|---|---|
| mean (ms) | 105.47 | 105.03 | -0.42% |
| stddev (ms) | 127.19 | 124.46 | -2.15% |
| p(50) (ms) | 70.00 | 68.00 | -2.86% |
| p(95) (ms) | 323.50 | 328.50 | +1.55% |
| p(99) (ms) | 714.00 | 707.50 | -0.91% |
| iterations | 458701.00 | 458676.00 | -0.01% |

Have a great time!

5 participants