fuzz: model chanmon mempool mining#4657
Conversation
|
👋 Thanks for assigning @wpaulino as a reviewer! |
|
@wpaulino FYI, this may intersect with the existing splice fuzzing failures you’re looking at. This PR makes splice transaction mining in |
4e6f262 to
d02194d
Compare
|
Fuzz failure is pre-existing |
0ab5882 to
ef0be5b
Compare
| // Connects this node from its tracked height to target_height, delivering | ||
| // each relevant chain callback to both ChainMonitor and ChannelManager. | ||
| fn connect_chain_range(&mut self, chain_state: &ChainState, target_height: u32) { | ||
| // Mining commands can advance the harness chain by more than one block. | ||
| // Transaction blocks must be connected explicitly so LDK learns about | ||
| // on-chain spends, while empty depth blocks still need best-block | ||
| // updates so CSV/CLTV-sensitive logic can run. This is the normal sync | ||
| // path, so both the raw ChainMonitor and the ChannelManager receive the | ||
| // callbacks and the node's tracked height advances to the target. | ||
| let mut height = self.height; | ||
| while height < target_height { | ||
| let mut next_height = height + 1; | ||
| while next_height <= target_height && chain_state.block_at(next_height).1.is_empty() { | ||
| next_height += 1; | ||
| } | ||
| if next_height > target_height { | ||
| // The rest of the range is empty. One best-block update to the | ||
| // final height is enough because LDK's Confirm API explicitly | ||
| // allows best_block_updated to skip intermediary blocks, and | ||
| // empty blocks have no transactions_confirmed calls whose chain | ||
| // order must be preserved. | ||
| height = target_height; | ||
| let (header, _) = chain_state.block_at(height); | ||
| self.monitor.best_block_updated(header, height); | ||
| self.node.best_block_updated(header, height); | ||
| break; | ||
| } | ||
| if next_height > height + 1 { | ||
| // Advance across the empty prefix before the next transaction | ||
| // block. Confirm::best_block_updated may skip intermediary | ||
| // blocks, so this compressed update still lets height-triggered | ||
| // LDK work run at the correct tip before the transaction | ||
| // confirmations are connected. | ||
| height = next_height - 1; | ||
| let (header, _) = chain_state.block_at(height); | ||
| self.monitor.best_block_updated(header, height); | ||
| self.node.best_block_updated(header, height); | ||
| } | ||
| height = next_height; | ||
| let (header, txn) = chain_state.block_at(height); | ||
| let txdata: Vec<_> = txn.iter().enumerate().map(|(i, tx)| (i + 1, tx)).collect(); | ||
| if !txdata.is_empty() { | ||
| // Transaction blocks need the explicit confirmation callback | ||
| // before the best-block update so watched spends are delivered | ||
| // in chain order before the node advances to that tip. | ||
| self.monitor.transactions_confirmed(header, &txdata, height); | ||
| self.node.transactions_confirmed(header, &txdata, height); | ||
| } | ||
| self.monitor.best_block_updated(header, height); | ||
| self.node.best_block_updated(header, height); | ||
| } | ||
| self.height = target_height; | ||
| } |
There was a problem hiding this comment.
Good improvement: the old sync_with_chain_state only notified self.node (ChannelManager), never self.monitor (ChainMonitor). This means ChannelMonitors weren't receiving transactions_confirmed / best_block_updated callbacks during normal fuzz-loop sync, which would have hidden any bugs in monitor chain-tracking logic. The new connect_chain_range fixing this is an important correctness improvement.
One behavioral change worth noting: the old code called best_block_updated for every individual block height. The new code batches consecutive empty blocks into a single best_block_updated call at the last empty height before the next tx block (or at the target). This is allowed by the Confirm trait ("May be skipped for intermediary blocks"), but it means height-triggered logic inside ChainMonitor/ChannelManager runs at fewer checkpoints than before. For a fuzzer seeking maximum coverage of height-sensitive paths (e.g., HTLC timeout detection), individual block delivery would exercise more code paths.
There was a problem hiding this comment.
You could also argue that skipping exercises different paths. I think in practice, we'll get a bit from both.
|
I've reviewed the PR against my prior review notes and re-examined the areas not previously covered, particularly the The two new No new issues found beyond those already flagged in prior passes. The following previously-flagged issues remain present in the current code and were intentionally not re-posted per the no-duplicate rule:
Note (correction carried forward): the earlier first-pass claim at |
ef0be5b to
4036d6a
Compare
4036d6a to
57960f1
Compare
|
When we are good with the changes, I can tack on one more commit that moves the chain/mempool code into |
| const DEFAULT_TX_CONFIRMATION_BLOCKS: u32 = 6; | ||
| // Single fuzz bytes can mine more than one block so a corpus entry does not | ||
| // need long runs of identical "mine one block" commands to reach CSV or CLTV | ||
| // boundaries. Mining is clipped below if unresolved HTLCs are near expiry. |
There was a problem hiding this comment.
What is meant by "clipped below"?
There was a problem hiding this comment.
That mining stops before anything would expire. Changed it to 'capped'
| // A mined transaction is considered deeply confirmed after this many blocks. | ||
| // This confirms the transaction in one block and then mines five empty depth | ||
| // blocks. | ||
| const DEFAULT_TX_CONFIRMATION_BLOCKS: u32 = 6; |
| events::Event::PaymentClaimed { payment_hash, .. } => { | ||
| if payments.payment_preimages.contains_key(&payment_hash) { | ||
| payments.claimed_payment_hashes.insert(payment_hash); | ||
| } | ||
| }, |
There was a problem hiding this comment.
Should this (moved from line 1943) be a separate commit?
There was a problem hiding this comment.
Hm yes, that is already living in #4635. Removed code from this PR.
| assert!( | ||
| current_tip < timeout_deadline, | ||
| "pending HTLC with expiry {} and timeout deadline {} is already unsafe at tip {}", | ||
| expiry, | ||
| timeout_deadline, | ||
| current_tip | ||
| ); |
There was a problem hiding this comment.
This fails with the following input:
printf '\x20\xdf\xdf\xb0\x0e\x10\x18\x10\x18\x10\x18\x30\xd8' | ./target/release/chanmon_consistency_target
Claude's succinct summary:
The persisted view can drift arbitrarily behind chain_state.tip because
Harness::mine_blocksnever re-checkpoints, so after a deferred reload the node builds an HTLC against a stale view with a deadline already below the tip. Fix it by either soft-capping the past-deadline branch insafe_mine_block_countto return 0, or — better — also capping mining against the lowest persisted-manager view so a future reload-then-send can't produce an immediately-unsafe HTLC.
There was a problem hiding this comment.
Thanks, this was the missing startup-sync piece from the FC branch that I forgot to cherry-pick. Reload now catches ChainMonitor up from the oldest monitor height and ChannelManager from its own best block before returning to the fuzz loop.
The repro passes now
| assert!( | ||
| self.mine_blocks(DEFAULT_TX_CONFIRMATION_BLOCKS) > 0, | ||
| "finish cannot mine pending mempool transactions without crossing an unresolved HTLC timeout deadline" | ||
| ); |
There was a problem hiding this comment.
This can be reached using:
printf '\xc9\xa6\xff\xde\xdf' | ./target/release/chanmon_consistency_target
There was a problem hiding this comment.
This one the the ones below contain splice opcodes. As that is currently not yet stable, I don't want to go into debugging that preferably 😬 Without the splicing cfg flag, those strings pass.
| @@ -2818,13 +3128,7 @@ | |||
| .funding_transaction_signed(&channel_id, &counterparty_node_id, signed_tx) | |||
| .unwrap(); | |||
There was a problem hiding this comment.
This is now reachable:
printf '\xa7\xa0\xff\xd5\xd8\xa0\xff' | ./target/release/chanmon_consistency_target
| panic!( | ||
| "It may take may iterations to settle the state, but it should not take forever" | ||
| ); |
There was a problem hiding this comment.
Able to hit this now but looks like it was preexisting, too.
printf '\xa7\xa0\xff\xd5\xd8\xa0\xff' | ./target/release/chanmon_consistency_target
| let is_quiescent_msg = msg.data.contains("already sent splice_locked, cannot RBF"); | ||
| if !msg.data.contains("Disconnecting due to timeout awaiting response") && !is_quiescent_msg | ||
| { | ||
| panic!("Unexpected disconnect case: {}", msg.data); |
There was a problem hiding this comment.
Looks like this can be hit prior to your change.
printf '\xc7\x3a\xa2\x32\x3a\xff\xff\xa7\x35\x33\x45\xff' | ./target/release/chanmon_consistency_target
We may need to expand the allowed reason.
57960f1 to
771e8a5
Compare
771e8a5 to
534bb75
Compare
|
Review comments addressed: https://github.com/lightningdevkit/rust-lightning/compare/57960f1..3b8a78e1d9c61e184e4ccb7ea9279b9f832df580
|
9458dc7 to
3b8a78e
Compare
| confirmed_txids: HashSet<Txid>, | ||
| /// Unconfirmed transactions (e.g., splice txs). Conflicting RBF candidates may coexist; | ||
| /// `confirm_pending_txs` determines which one confirms. | ||
| /// Unconfirmed transactions admitted by the harness mempool. Admission keeps |
There was a problem hiding this comment.
Made it 'admitted to'
| /// created by an earlier transaction in this vector. | ||
| pending_txs: Vec<(Txid, Transaction)>, | ||
| /// Tracks unspent outputs created by confirmed transactions. Admission builds | ||
| /// a temporary package view from this set plus earlier mempool transactions, |
There was a problem hiding this comment.
What is meant by "earlier mempool transactions" here? Do you mean the state of pending_txs at time of insertion into utxos?
| /// Confirm pending transactions in a single block, selecting deterministically among | ||
| /// conflicting RBF candidates. Sorting by txid ensures the winner is determined by fuzz input | ||
| /// content. Transactions that double-spend an already-confirmed outpoint are skipped. |
There was a problem hiding this comment.
We want to retain this behavior for splicing at least. Otherwise, we won't exercise code confirming an RBF that isn't the most recent.
There was a problem hiding this comment.
Which candidate confirms is still fuzz-chosen, just via relay order instead of txid sort: admission keeps whichever conflicting tx was relayed last (no fee-rate policy), and mining confirms whatever is in the mempool at that point. So an input can confirm a non-latest candidate either by mining before the replacement is relayed, or by relaying an older copy from another node's queue after the newer one.
| // that conflicting transaction itself signals RBF. The harness does not | ||
| // model fee-rate policy, so fuzz-controlled relay order chooses between | ||
| // otherwise valid RBF candidates. |
There was a problem hiding this comment.
I don't know what is meant by "fuzz-controlled relay order chooses between otherwise valid RBF candidates". How can an "order" choose? What criteria is it using for choosing? Is saying "does not model fee-rate policy" supposed to indicate that the fee rate isn't considered during replacement?
There was a problem hiding this comment.
Fuzz-controlled relay order is what I try to explain here: #4657 (comment)
Indeed, replacement is based on relay order, with fee being ignored.
| // need this to keep double-spends and unknown-prevout spends from producing | ||
| // impossible on-chain state. | ||
| fn has_invalid_inputs(tx: &Transaction, utxos: &HashSet<BitcoinOutPoint>) -> bool { | ||
| // The tiny UTXO set protects the chain model from two false positives: |
There was a problem hiding this comment.
What is "The tiny UTXO set"?
There was a problem hiding this comment.
AI weirdness. It's just a dupe-checking set.
| // Coinbase transactions have a null input, and synthetic funding | ||
| // transactions have no inputs, but neither consumes a modeled UTXO. | ||
| // Normal transactions consume their inputs before exposing outputs to | ||
| // later transactions. |
There was a problem hiding this comment.
Honestly, I find the AI-generated comments like this and others jarring to read. I need to parse it multiple times to determine what it's trying to say and why it's worth saying. i.e. Why a "modeled" UTXO and not just a UTXO? What value does the last sentence provide?
There was a problem hiding this comment.
Removed this whole section.
|
Ouch, so much AI weirdness. Been experimenting with this extensive comment style. It is extra info, but it also needs to be right, which increases the review burden for author and reviewer to get there. Been looking over all kinds of change sets so often, and I did think I reviewed this one thoroughly, but clearly not. I found the insights obtained via AI-added comments quite useful, to avoid asking it to explain things repeatedly, but perhaps it is better reserved for the dev stage and removed before review. Will address this and clean up. |
f012c8f to
9c64644
Compare
95b628a to
cc04e72
Compare
|
Working on getting this to pass with splicing. Opened a reference branch to see changes since last review excluding rebases: https://github.com/joostjager/rust-lightning/compare/chanmon-mempool-mining-ref..chanmon-mempool-mining
|
Route chanmon broadcasts through an explicit harness mempool so relay, mining, wallet updates, and chain delivery share one path. This lets splice, anchor, and claim transactions enter the mempool before mining. On restart, sync loaded monitors and managers from their own persisted best blocks so raw monitors catch up without rewinding ChannelManager state. Cap modeled mining before unresolved HTLC timeout deadlines and use the LDK anti-reorg depth for setup confirmations. Accept the splice-reserve disconnect a node emits when a splice would drop a balance below the channel reserve, exiting quiescence as with other splice-negotiation disconnects. AI tools were used in preparing this commit.
cc04e72 to
c04cef6
Compare
| .iter() | ||
| .any(|funding| funding.funding_tx_confirmation_height != 0) | ||
| { | ||
| return Err(ChannelError::WarnAndDisconnect(format!( |
There was a problem hiding this comment.
We should respond with tx_abort, there's no need to disconnect.
| // A queued RBF signing request can lose the race against a | ||
| // transaction confirming with one of its wallet inputs. | ||
| match nodes[node_idx] | ||
| .cancel_funding_contributed(&channel_id, &counterparty_node_id) |
There was a problem hiding this comment.
This seems like something end users would need to handle as well, which doesn't seem ideal. We could instead cancel an ongoing negotiation if we see a splice confirm (even if not locked). If the confirmation is reorged out, then the user could re-attempt their negotiation, though there isn't any way currently for the user to detect that case.
| || msg.data.contains("smaller than their selected v2 reserve") | ||
| || msg.data.contains("smaller than our selected v2 reserve"); |
There was a problem hiding this comment.
If these are resulting from processing a counterparty's splice_init/ack/tx_init/ack_rbf, then we should be responding with tx_abort instead.
| // breach the channel reserve. Like the RBF case, this leaves the channel | ||
| // quiescent, so both sides must exit quiescence afterwards. | ||
| let is_quiescent_msg = msg.data.contains("already sent splice_locked, cannot RBF") | ||
| || msg.data.contains("has a confirmed pending splice, cannot RBF") |
There was a problem hiding this comment.
Is this happening in can_initiate_rbf as the sender, validate_tx_init_rbf as the receiver, or both?
This prepares
chanmon_consistencyfor force-close fuzzing by making its chain model closer to the environment LDK sees in normal operation.Force-close scenarios depend heavily on transaction timing: claims may be broadcast, replaced, confirmed, followed by additional claims, and later become spendable only after more blocks. The previous harness mostly folded transaction confirmation into sync-style actions, which made it harder to express those flows accurately and made future force-close coverage depend on shortcuts in the test harness.
The updated model gives the harness an explicit mempool and block-mining path. Broadcast transactions can be relayed into the modeled mempool, mined into harness blocks, and then replayed to both monitors and managers through chain callbacks. The harness also tracks confirmed UTXOs and wallet change so later splice, anchor, and claim transactions have a realistic view of what can be spent.
This should make upcoming force-close fuzzing changes easier to review: first establish a more faithful chain environment, then add the force-close-specific scenarios and invariants on top of it.