feat(msfs): warm-read fixes + per-inode disk cache backend#68
feat(msfs): warm-read fixes + per-inode disk cache backend#68ankitmaster08 wants to merge 1 commit into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (15)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (10)
📝 WalkthroughWalkthroughAdds selectable disk-backed per-inode cache lines with hole-punching, updates FUSE open/read to use optimistic unlocked copies and evict failed fetches (returning EIO), and lets AIStore delegate LIST/STAT-DIR (manifest generation) to a configured non-AIStore backend; includes config, tests, docs, and lock-site label updates. ChangesMSFS Runtime Storage and Manifest Delegation
Disk-Backed Cache Backend
FUSE/Open/Read Changes
Sequence Diagram (DoRead optimistic copy): sequenceDiagram
participant DoRead
participant globalsLock
participant dataCacheLineTracker
participant diskFile
DoRead->>globalsLock: acquire & inspect cache line
DoRead->>dataCacheLineTracker: snapshot contentGeneration + bounds
DoRead->>globalsLock: release for copy
alt cache_backend == "disk"
DoRead->>diskFile: ReadAt() into reply buffer (unlocked)
else memory backend
DoRead->>dataCacheLineTracker: read in-memory content (unlocked)
end
DoRead->>globalsLock: re-acquire and validate generation
alt fetchFailed
DoRead->>dataCacheLineTracker: evict/free
DoRead->>DoRead: return syscall.EIO
else success
DoRead->>DoRead: advance offset / continue
end
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
multi-storage-file-system/config_test.go (1)
708-809: ⚡ Quick winAdd a SIGHUP regression case for
manifest_gen_backendupdates on existing backends.The new tests cover resolve/missing paths, but not reload behavior for already-mounted AIStore backends. A dedicated update test would catch stale delegation routing regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multi-storage-file-system/config_test.go` around lines 708 - 809, Add a new test that simulates sending SIGHUP (or calling the config reload path used by the program) to exercise updating an already-mounted AIStore backend's manifest_gen_backend; reuse the setup from TestManifestGenBackendReference to mount an "ais" backend, then modify the on-disk config to change ais.AIStore.manifest_gen_backend to a different existing backend (and also add a case where it points to a non-existent backend) and trigger the same reload handler (the SIGHUP path or the reload function the code uses) so that globals.backendsToMount["ais"].backendTypeSpecifics.(*backendConfigAIStoreStruct).manifestGenBackendName and manifestGenBackend are updated/validated; assert that after reload the manifestGenBackendName and resolved manifestGenBackend (and its dirName/backendType) match the new config, and that reload rejects a dangling reference as in TestManifestGenBackendMissing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@multi-storage-file-system/cache.go`:
- Around line 381-385: When recycling a clean cache line, increment/bump the
line's contentGeneration before calling punchHoleDisk so unlocked optimistic
readers can't observe a stale "stable" generation after the line's bytes are
released; locate the code path that handles eviction/reassignment (references:
dataCacheLineTracker, punchHoleDisk, contentGeneration and the reassignment
branch around the current punchHoleDisk call) and move or add a
contentGeneration bump immediately prior to invoking punchHoleDisk (do the same
change in the reassignment code path around the other occurrence referenced at
the reassignment section).
- Around line 528-535: The disk-store call can fail and return 0 which currently
overwrites contentLength and leaves fetchFailed false, causing DoRead to treat
it as EOF; change the logic around
dataCacheLineTracker.storeContentDisk(readFileOutput.buf) so you capture its
return (e.g. savedLen := dataCacheLineTracker.storeContentDisk(...)) and if
savedLen <= 0 mark dataCacheLineTracker.fetchFailed = true and do not overwrite
a previously valid dataCacheLineTracker.contentLength with 0 (keep the prior
length or set an explicit error sentinel), ensuring fetch failures from
storeContentDisk are propagated to DoRead; reference dataCacheLineTracker,
storeContentDisk, and DoRead when making the change.
In `@multi-storage-file-system/README.md`:
- Line 150: The README statement that manifest_gen_backend "must" match the same
bucket/prefix is inaccurate; update the documentation for the
manifest_gen_backend parameter to say the system will only warn on bucket/prefix
mismatches (not enforce) and describe the operational implications of a warning
(e.g., listing may target a different underlying store while reads use AIS
cache). Modify the manifest_gen_backend entry text to replace "must" language
with a note that mismatches trigger a runtime warning and include a short line
about expected behavior when a mismatch occurs.
---
Nitpick comments:
In `@multi-storage-file-system/config_test.go`:
- Around line 708-809: Add a new test that simulates sending SIGHUP (or calling
the config reload path used by the program) to exercise updating an
already-mounted AIStore backend's manifest_gen_backend; reuse the setup from
TestManifestGenBackendReference to mount an "ais" backend, then modify the
on-disk config to change ais.AIStore.manifest_gen_backend to a different
existing backend (and also add a case where it points to a non-existent backend)
and trigger the same reload handler (the SIGHUP path or the reload function the
code uses) so that
globals.backendsToMount["ais"].backendTypeSpecifics.(*backendConfigAIStoreStruct).manifestGenBackendName
and manifestGenBackend are updated/validated; assert that after reload the
manifestGenBackendName and resolved manifestGenBackend (and its
dirName/backendType) match the new config, and that reload rejects a dangling
reference as in TestManifestGenBackendMissing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d670e934-3885-4458-aaa4-be290649b3b7
📒 Files selected for processing (15)
agent-learnings.mdmulti-storage-file-system/README.mdmulti-storage-file-system/backend_aistore.gomulti-storage-file-system/cache.gomulti-storage-file-system/cache_disk.gomulti-storage-file-system/cache_disk_linux.gomulti-storage-file-system/cache_disk_other.gomulti-storage-file-system/cache_disk_test.gomulti-storage-file-system/config.gomulti-storage-file-system/config_test.gomulti-storage-file-system/fission.gomulti-storage-file-system/fission_test.gomulti-storage-file-system/fs.gomulti-storage-file-system/globals.gomulti-storage-file-system/globals_lock.go
56f1193 to
5672e58
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
multi-storage-file-system/config_test.go (1)
708-809: ⚡ Quick winAdd tests for the remaining manifest_gen_backend validation branches.
These additions cover valid reference and missing backend, but
checkConfigFile()also enforces (1) self-reference rejection and (2) AIStore→AIStore rejection. Adding those two cases would lock the full contract and prevent silent regressions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@multi-storage-file-system/config_test.go` around lines 708 - 809, Add two tests exercising the remaining manifest_gen_backend validation branches: one that sets manifest_gen_backend to the same dir_name (self-reference) and one that configures an AIStore backend pointing its manifest_gen_backend to another backend whose backend_type is also AIStore (AIStore→AIStore). For each test (modeled on TestManifestGenBackendReference/TestManifestGenBackendMissing) call initGlobals(...), write a YAML to globals.configFilePath that creates the specific invalid reference, then call checkConfigFile() and assert it returns an error; for the self-reference case you can additionally load globals.backendsToMount and assert the resolver would have matched the same dir_name if it ran, but the primary expectation is checkConfigFile() must fail. Use the same symbols from the diff: checkConfigFile, manifest_gen_backend, backendConfigAIStoreStruct, globals.backendsToMount, and dir_name to locate where to add these tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@multi-storage-file-system/cache_disk_test.go`:
- Around line 13-25: Add a t.Cleanup registration in diskTestSetup so the disk
cache is always torn down after each test: after successfully calling
diskCacheUp() in diskTestSetup, call t.Cleanup with a closure that invokes
punchHoleDisk() (or the appropriate disk-cache teardown function) and logs any
error (using t.Logf) rather than calling t.Fatal; this ensures
diskCacheUp()/punchHoleDisk() state and file handles are cleaned even if the
test aborts.
---
Nitpick comments:
In `@multi-storage-file-system/config_test.go`:
- Around line 708-809: Add two tests exercising the remaining
manifest_gen_backend validation branches: one that sets manifest_gen_backend to
the same dir_name (self-reference) and one that configures an AIStore backend
pointing its manifest_gen_backend to another backend whose backend_type is also
AIStore (AIStore→AIStore). For each test (modeled on
TestManifestGenBackendReference/TestManifestGenBackendMissing) call
initGlobals(...), write a YAML to globals.configFilePath that creates the
specific invalid reference, then call checkConfigFile() and assert it returns an
error; for the self-reference case you can additionally load
globals.backendsToMount and assert the resolver would have matched the same
dir_name if it ran, but the primary expectation is checkConfigFile() must fail.
Use the same symbols from the diff: checkConfigFile, manifest_gen_backend,
backendConfigAIStoreStruct, globals.backendsToMount, and dir_name to locate
where to add these tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: dcd65572-821e-49c3-a462-f11cd1e5fbf2
📒 Files selected for processing (15)
agent-learnings.mdmulti-storage-file-system/README.mdmulti-storage-file-system/backend_aistore.gomulti-storage-file-system/cache.gomulti-storage-file-system/cache_disk.gomulti-storage-file-system/cache_disk_linux.gomulti-storage-file-system/cache_disk_other.gomulti-storage-file-system/cache_disk_test.gomulti-storage-file-system/config.gomulti-storage-file-system/config_test.gomulti-storage-file-system/fission.gomulti-storage-file-system/fission_test.gomulti-storage-file-system/fs.gomulti-storage-file-system/globals.gomulti-storage-file-system/globals_lock.go
✅ Files skipped from review due to trivial changes (1)
- multi-storage-file-system/README.md
🚧 Files skipped from review as they are similar to previous changes (11)
- multi-storage-file-system/cache_disk_other.go
- multi-storage-file-system/cache_disk_linux.go
- multi-storage-file-system/fs.go
- agent-learnings.md
- multi-storage-file-system/fission_test.go
- multi-storage-file-system/config.go
- multi-storage-file-system/backend_aistore.go
- multi-storage-file-system/cache_disk.go
- multi-storage-file-system/globals_lock.go
- multi-storage-file-system/cache.go
- multi-storage-file-system/fission.go
| func diskTestSetup(t *testing.T, cacheLineSize, cacheLines uint64) { | ||
| t.Helper() | ||
| globals.logger = log.New(os.Stderr, "", 0) | ||
| globals.cacheDir = t.TempDir() | ||
| globals.config = &configStruct{ | ||
| cacheBackend: "disk", | ||
| cacheLineSize: cacheLineSize, | ||
| cacheLines: cacheLines, | ||
| } | ||
| if err := diskCacheUp(); err != nil { | ||
| t.Fatalf("diskCacheUp() failed: %v", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Register teardown in setup to avoid cross-test state/file-handle leaks.
Line 22 brings disk cache up, but setup never guarantees a matching teardown. If a test aborts before punchHoleDisk(), disk-cache state can persist and cause follow-on flakiness. Please add a t.Cleanup(...) teardown path in diskTestSetup.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@multi-storage-file-system/cache_disk_test.go` around lines 13 - 25, Add a
t.Cleanup registration in diskTestSetup so the disk cache is always torn down
after each test: after successfully calling diskCacheUp() in diskTestSetup, call
t.Cleanup with a closure that invokes punchHoleDisk() (or the appropriate
disk-cache teardown function) and logs any error (using t.Logf) rather than
calling t.Fatal; this ensures diskCacheUp()/punchHoleDisk() state and file
handles are cleaned even if the test aborts.
Improves the MSFS data-cache warm-read path and adds an opt-in disk cache
backend, validated against the EC2 MSFS->S3 baseline on OCI/AIStore.
Contention / optimistic-lock copy:
- Move the cache-line -> FUSE reply copy out of the global lock using the data
cache's contentGeneration counter (snapshot under lock, copy unlocked,
re-check + retry on mismatch), removing warm-read lock serialization.
Per-inode disk cache backend (cache_backend: disk, Option B, opt-in):
- Store each inode's cache lines in a contiguous per-inode file; serve warm
reads via pread issued outside the global lock and drop FOPEN_DIRECT_IO so
kernel readahead/page cache apply. Default in-memory data cache unchanged.
- cache_disk.go + cache_disk_{linux,other}.go (fallocate PUNCH_HOLE on Linux).
- Bump contentGeneration before punching recycled clean lines so in-flight
optimistic readers retry instead of reading punched bytes.
- Treat a failed storeContentDisk on a non-empty read as a fetch failure
(EIO via DoRead) rather than EOF.
Results (ws-fits, P4.8 warm): baseline 1,861 -> optimistic 2,866 -> disk 4,496
MiB/s (0.99x EC2 4,536).
Other:
- Do not log backend-derived values in the backend setup-failure and
manifest_gen mismatch warnings (CodeQL go/clear-text-logging): the secret
taints those structures, so the warnings log a fixed message only.
- Reject changing AIStore.manifest_gen_backend via SIGHUP.
- config.go/globals.go: cache_backend knob and config plumbing.
- README.md: document cache_backend; align manifest_gen_backend wording.
- tests: config_test.go, fission_test.go, cache_disk_test.go.
5672e58 to
0f8ae01
Compare
Summary
Improves the MSFS data-cache warm-read path and adds an opt-in per-inode disk cache backend, validated against the EC2 MSFS→S3 baseline on OCI/AIStore.
contentGenerationcounter (snapshot under lock, copy unlocked, re-check + retry on mismatch). Removes warm-read lock serialization.cache_backend: disk, opt-in, defaultmemory): stores each inode's cache lines in a contiguous per-inode file and serves warm reads viapreadissued outside the global lock, droppingFOPEN_DIRECT_IOso kernel readahead / page cache apply. Restores sequential locality the fixed-slot LRU gave up.Results (ws-fits, P4.8 warm — 80×1 GiB, 64 KiB seq, 8 threads)
Full 24-cell ws-fits matrix; warm = 0 backend fetches (pure local cache). Disk closes the residual large-file gap optimistic-copy could not (+22–57% on P4/P6).
Files
cache_disk.go,cache_disk_{linux,other}.go— per-inode file store / pread-serve / punch (fallocatePUNCH_HOLEon Linux, no-op elsewhere)cache.go,fission.go— disk store-through + pread serve path + open flagsconfig.go,globals.go—cache_backendknob (validated, defaultmemory)README.md— documentcache_backendcache_disk_test.go,config_test.go,fission_test.goTest plan
go build ./...,go test .(disk-cache unit tests + memory-path FUSE tests) greengofmt/go vetcleancache_backend: memory) path unchangedSummary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests