Skip to content

CASSANDRA-21133 Make bin/sstableupgrade functionally on par with nodetool upgradesstables#4825

Closed
arvindKandpal-ksolves wants to merge 2 commits into
apache:trunkfrom
arvindKandpal-ksolves:CASSANDRA-21133
Closed

CASSANDRA-21133 Make bin/sstableupgrade functionally on par with nodetool upgradesstables#4825
arvindKandpal-ksolves wants to merge 2 commits into
apache:trunkfrom
arvindKandpal-ksolves:CASSANDRA-21133

Conversation

@arvindKandpal-ksolves
Copy link
Copy Markdown
Contributor

Make bin/sstableupgrade functionally on par with nodetool upgradesstables

What this PR does / why we need it:

This PR brings functional parity between bin/sstableupgrade and nodetool upgradesstables by introducing the -a / --include-all-sstables flag to the offline tool.

Previously, bin/sstableupgrade would automatically skip SSTables that were already on the latest version. This patch allows users to force a rewrite of all SSTables even if they are on the current format.

Technical Details & Safety:

Adding this flag to an offline tool introduced a known edge case: If the offline tool is run while the Cassandra node is online, it consumes a new SSTable ID on disk. However, the live node's internal sstableIdGenerator remains unaware of this. Upon the next flush, the live node attempts to use the same ID, which previously triggered a java.lang.AssertionError in ColumnFamilyStore.newSSTableDescriptor().

To safely resolve this without introducing regressions:

  1. Replaced the strict assert !newDescriptor.fileFor(Components.DATA).exists(); check in newSSTableDescriptor with a safe while(true) loop. It checks if the generated ID already exists on disk. If a collision is detected, it logs a warning and advances the generator, self-healing the state.
  2. Updated StandaloneUpgrader to parse and apply the -a option consistently.
  3. Added a robust test (testNewSSTableDescriptorCollision) in ColumnFamilyStoreTest that probes the current ID, uses SSTableIdFactory to pre-create files for future IDs (N+1, N+2), and verifies that the loop correctly skips them and yields N+3.
  4. Updated flag assertions in StandaloneUpgraderTest.
  5. Updated documentation in sstableupgrade.adoc.

patch by Arvind Kandpal; reviewed by TBD for CASSANDRA-21133

Comment thread src/java/org/apache/cassandra/tools/StandaloneUpgrader.java Outdated
Comment thread src/java/org/apache/cassandra/db/ColumnFamilyStore.java Outdated
sstableIdGenerator.get());
assert !newDescriptor.fileFor(Components.DATA).exists();
Descriptor newDescriptor;
while (true)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might not happen in practice but it is worth saying that there is no limit as in how many times we loop, if there is some buggy ID supplier or we exhausted the namespace of these ids and all were on disk then we would spin for ever. Currently we have sequence based generator on integer and UUID generator on ... uuid, the namespace is pretty much "infinite" in practical terms so this we will not hit it, one would have to have like ~2 billions sstables on disk to exhaust (and then it would start to overflow)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. With ~2 billion for sequence IDs and practically infinite for UUIDs, we shouldn't ever hit an infinite loop in the real case. Thanks for pointing this out.

}

@Test
public void testNewSSTableDescriptorCollision() throws Exception
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just saying we are not testing UUID-based generator but I dont think doing that is the absolute must given how it behaves.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if needed we can try to add , otherwise I also thought there is no absolute required.

sstableIdGenerator.get());
assert !newDescriptor.fileFor(Components.DATA).exists();
Descriptor newDescriptor;
while (true)
Copy link
Copy Markdown
Contributor

@smiklosovic smiklosovic May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that thread safety is just delegated to how sstableIdGenerator.get() returns the new id, right? That is AtomicInteger, hence increased atomically, so two threads calling this stuff will always have different ids to check the existence of a file against.

@arvindKandpal-ksolves could you put a comment on while(true) about this and that this loop is meant to be a defense against offline / online generation for tooling's sake instead of in-process flow?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. The atomic increment takes care of thread safety in-process. I've added a comment above the while(true) loop to explain that this is purely a defense against offline tool generation.

@arvindKandpal-ksolves
Copy link
Copy Markdown
Contributor Author

Hi @smiklosovic , I have addressed most of the points, Please take a look.

@smiklosovic smiklosovic requested a review from netudima May 20, 2026 08:34
@netudima
Copy link
Copy Markdown
Contributor

Hi, I am a bit confused the the description/intention of the patch:

  1. sstableupgrade is suppose to be offline tool (means it should be executed when a Cassandra process is stopped), are we trying to change it as a part of this patch?
  2. "Adding this flag to an offline tool introduced a known edge case" - it is unclear why the flag introduction changes anything in the context of the SSTable IDs: we just change the set of SSTables to convert but the overall mechanism is the same, so the id collision issue is not something new, introduced in this patch.

@smiklosovic
Copy link
Copy Markdown
Contributor

smiklosovic commented May 25, 2026

@netudima currently there is no check if a tool operates against Cassandra turned off or not. It just fails or the behavior is unpredictable, in general. I was commenting on that here.

https://issues.apache.org/jira/browse/CASSANDRA-21133?focusedCommentId=18053921&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-18053921

From my perspective this is the correct solution to disparity of the functionality regardless whether this tool is run against online or offline Cassandra.

If you want to not make it functionally on par, then we should prevent offline tools to run against running Cassandra.

to your 2) because both tools run same code, but offline tool runs in a completely different process and it cant see what Cassandra state is. Generators of IDs are isolated, per process. We need to allocate new ID but allocation of an ID in one process does not guarantee that it is not already taken by another process.

@netudima
Copy link
Copy Markdown
Contributor

netudima commented May 25, 2026

ok, then the MR description seems incorrect and confusing from this point of view, it is not a new logic which we are trying to fix, it is an attempt to adjust the existing logic.

If you want to not make it functionally on par

For me it looks as quite independent activity. I understand the danger of the current offline tools and agree that a protection is needed but it is not related to adding a new parameter.. Also I suppose the ID conflict is not only the issue and just adding the optimistic locking loop would not make the tool safe to use, you still have other issues like (but not limited to):

  • interference of the offiline process with the SSTable transactions are executed by DB process (like compaction), for example the tool reads an SSTable and at the same time it is removed by a compaction activity, or vice versa
  • visibility for Cassandra DB of the new SSTables (the tool renders new files and Cassandra process is not aware about them -> so it will look like a data loss)

So, for me it is a kind of separate independent story with a proper design to check if DB process exists or to implement proper ability to execute the tool and DB process concurrently.., mixing it with an offline tool flag adding is a bit confusing (at least for me).
I also want to avoid the illusion that only this change is enough to make the logic safe, as a result people may start to think that it is ok to avoid the warn, start to use the tool and get all kinds of issues hard to troubleshoot

@smiklosovic
Copy link
Copy Markdown
Contributor

Fair enough. Your opinion then renders this PR irrelevant and OP should try to direct his effort on the check there is a running process or not before offline tools are executed.

I can image such an endeavour would cause issues where we run offline tools when Cassandra is online regardless and change of this behavior might be disruptive to the stuff we are not aware of yet. We would need to replace such usages with their online counterparts which might not necessarily exist.

@netudima
Copy link
Copy Markdown
Contributor

netudima commented May 25, 2026

I would create a separate story about adding to offine tools a logic to check if Cassandra process is alive: using PID file or a file lock for example

UPD: I have found one - https://issues.apache.org/jira/browse/CASSANDRA-9555

@netudima
Copy link
Copy Markdown
Contributor

netudima commented May 27, 2026

jut to be clear: my concern is only about the ad-hoc attempt to make offline tool working safe in parallel with a running Cassandra process (the id loop logic). I do not have a problem with the functional change about adding a new option to sstableupgrade if it was the main goal of the patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants