Files
bal-electrum-plugin/REPORT_NETWORKING_PARALLELO.md
2026-06-20 09:50:43 -04:00

254 lines
12 KiB
Markdown

# Technical report — Parallel networking (Will-Executor anti-freeze) + UI feedback
**Audience:** external programmer / plugin maintainer
**Author:** AI refactoring work (on GitHub `Bitcoin-after-life/test`)
**Date:** 2026-06-15
**Branch:** `feature/networking-parallelo` (Pull Request #4)
**Private Gitea repo `kaibot/bal-plugin-ai`: NOT modified** (per explicit request; cloned read-only only to run the official tests).
---
## 1. Problem
When the plugin contacts the Will-Executor servers (pushing transactions,
pinging/refreshing the inheritance, downloading the list, and **checking**
transactions), it used to do it **sequentially**. If a server did not answer,
the thread stayed blocked on the connection timeouts and, worse, on the
**retries**:
- `send_request` retried up to **10 times** with `time.sleep(3)` on every
timeout → roughly **~140 seconds per unreachable server**, summed one after
another.
Consequences:
- Noticeable already with a few servers; with **20 servers** it became
unusable.
- The user saw "Stay waiting — Not responding" with no idea what was happening.
- A single dead server blocked the whole operation.
---
## 2. Solution (overview)
1. **Parallelism** with `ThreadPoolExecutor`: servers are contacted
concurrently. Total time ≈ the **slowest** server, not the **sum**.
2. **Fast-fail** for interactive operations (ping/info/download): no retry
storm, a single short timeout and the server is marked "KO".
3. **Aggressive timeouts + global deadline** for push/check: a short per-server
retry budget is kept (a real transaction must survive a transient hiccup),
but a wall-clock **global deadline** caps the whole batch so a dialog never
freezes behind one unresponsive server.
4. **Live feedback + reliable elapsed-time counter**: `on_each(...)` updates the
dialog as results arrive, and `on_tick()` refreshes an elapsed-time counter
(`Xs / DEADLINEs`) so the user always knows progress and the maximum wait.
### Why it is thread-safe
`Network.send_http_on_proxy()` uses `asyncio.run_coroutine_threadsafe(coro,
loop)` and then `coro.result()`: every call schedules its own coroutine on
Electrum's shared asyncio loop and blocks **only its own worker thread**.
Multiple concurrent calls are therefore safe → `ThreadPoolExecutor` gives true
parallelism.
UI updates go through `BalWaitingDialog.update()` / the dialog's `pyqtSignal`,
which marshals to the GUI thread automatically. Callbacks from worker threads
can therefore update the dialog safely.
### Why the counter is driven from the calling thread (important)
An earlier attempt refreshed the elapsed-time counter from a separate raw
`threading.Thread` heartbeat that emitted the `pyqtSignal`. That proved
**unreliable**: a `pyqtSignal` emitted from a raw (non-Qt) Python thread inside
the wizard's `TaskThread` was not reliably marshalled and the dialog never
repainted — the counter was invisible.
The fix: the parallel helpers accept an **`on_tick` callback that is invoked
periodically from the CALLING thread** (the same thread that already drives
`on_each` and successfully repaints). The helpers poll the futures in short
slices (`concurrent.futures.wait(..., timeout=tick_interval)`) and call
`on_tick()` between waits. No heartbeat thread is used anymore.
---
## 3. Modified files (all on GitHub `Bitcoin-after-life/test`, branch `feature/networking-parallelo`)
### 3.1 `bal/core/willexecutors.py`
**Networking constants** (module level, also exposed as `Willexecutors` class
attributes for a single source of truth in the GUI):
```python
DEFAULT_TIMEOUT = 5 # interactive ops (ping/info/list)
PUSH_TIMEOUT = 8 # broadcast (pushtxs)
PUSH_MAX_RETRIES = 2
PUSH_RETRY_SLEEP = 1
PUSH_GLOBAL_DEADLINE = 30 # wall-clock cap for the whole parallel push
CHECK_TIMEOUT = 8 # check (searchtx)
CHECK_MAX_RETRIES = 1
CHECK_RETRY_SLEEP = 1
CHECK_GLOBAL_DEADLINE = 30 # wall-clock cap for the whole parallel check
```
Worst case per server is now ~26s (push) / ~17s (check) instead of ~140s, and
the global deadline guarantees the dialog proceeds within 30s regardless.
**`send_request(...)`** — keyword-only retry controls:
```python
def send_request(method, url, data=None, *, timeout=10, handle_response=None,
count_reply=0, max_retries=10, retry_sleep=3):
```
- Defaults unchanged → callers that need the historical behaviour are
unaffected.
- Interactive callers pass `max_retries=0` → fast-fail.
**`get_info_task(...)`** — fast-fail by default (`max_retries=0`); a
timeout/empty response yields `status="KO"`.
**`check_transaction(...)`** — now accepts `timeout`/`max_retries`/`retry_sleep`
(defaults from the `CHECK_*` constants) and forwards them to `send_request`,
replacing the old ~140s default storm.
**NEW `ping_servers_parallel(willexecutors, *, on_each=None, max_workers=8,
timeout=DEFAULT_TIMEOUT, on_tick=None, tick_interval=1.0)`**
- `ThreadPoolExecutor`; polls futures in slices and calls `on_tick()` from the
calling thread; mutates `willexecutors` in place; invokes
`on_each(url, we, ok)` as results arrive; a worker exception never blocks the
others (defensive try/except).
**NEW `push_transactions_parallel(willexecutors, *, on_each=None, max_workers=8,
deadline=PUSH_GLOBAL_DEADLINE, on_timeout=None, on_tick=None,
tick_interval=1.0)`**
- Parallel push only to entries that have a `"txs"` key; each server keeps its
short retry budget.
- `on_each(url, we, ok, exc)` per server; `on_timeout(url, we)` for servers
still pending when the global deadline elapses; `on_tick()` for the counter.
- Manual pool (no `with`) so `shutdown(wait=False, cancel_futures=True)` does
not block on a hung worker once the deadline is reached.
- Returns `{url: (ok, exc)}` for the servers that answered in time.
**NEW `check_transactions_parallel(items, *, on_each=None, max_workers=8,
deadline=CHECK_GLOBAL_DEADLINE, on_timeout=None, on_tick=None,
tick_interval=1.0)`**
- Same design as the push helper but for the **Check** (searchtx) operation.
- `items` is an iterable of `(wid, url)` pairs; `_check_one` calls
`check_transaction`.
- `on_each(wid, url, result_or_None, exc)`, `on_timeout(wid, url)`, `on_tick()`.
- Returns `{wid: (result_or_None, exc)}`.
### 3.2 `bal/gui/qt/window.py`
- **`ping_willexecutors_task(self, wes)`** rewritten on `ping_servers_parallel`
with live feedback and a counter `Ping Will-Executors: 2/3 (3s / 30s)` driven
by `on_tick` from the calling thread.
- **`push_transactions_to_willexecutors(self, force=False)`** rewritten on
`push_transactions_parallel`; `on_each` does thread-safe book-keeping + UI
update; "already present" servers are verified afterwards (original check
logic intact).
- **`check_transactions_task(self, will)`** rewritten on
`check_transactions_parallel`; shows `Checking transactions: 2/5 (4s / 30s)`,
reusing the original `set_check_willexecutor(...)` per-item logic inside
`on_each` (and `set_check_willexecutor(None)` on `on_timeout`).
- **`fetch_will_executors_list(...)`** fast-fail download
(`timeout=10, max_retries=1, retry_sleep=1`); the download dialog shows
`Downloading will-executors list... (Xs / 45s)`.
### 3.3 `bal/gui/qt/dialogs.py`
- **`BalBuildWillDialog.loop_push`** (the "Building Will" wizard broadcast step)
rewritten on `push_transactions_parallel` with the `on_tick` counter
`Broadcasting 2/3 (5s / 30s)`. The previous raw heartbeat thread was removed.
### 3.4 `bal/gui/qt/plugin.py` — status-bar icon (restored)
`create_status_bar` re-adds the BAL `StatusBarButton` (bottom-right of the
Electrum status bar). It shows that the plugin is installed and opens the plugin
settings on click; it also de-duplicates the button per window. (Comments in
English.)
### 3.5 `bal/gui/qt/lists.py` and `bal/gui/qt/widgets.py` — GUI usability
- **Tooltips** (hover) on the Will toolbar icons, all in English:
Wizard (`Wizard - Build your will`), Delivery time (truck), Check Alive
(siren), Calendar, Check (refresh).
- **Toolbar order** changed to:
`Wizard | Delivery time | Check Alive | Calendar | Check`; layout margins
tightened so everything fits the Will window.
### 3.6 `bal/core/util.py` — BUGFIX (pre-existing regression)
In `get_value_amount` (line 324) `Util.in_output(...)` (returns `bool`) had been
used instead of `Util.din_output(...)` (returns the tuple
`(same_amount, same_address)`), causing:
```
TypeError: cannot unpack non-iterable bool object
```
**Fixed** by restoring `din_output`. Found by running the official Gitea tests
(`tests/test_core_util.py::test_get_value_amount`).
---
## 4. Verification (ruff + official tests)
### 4.1 ruff (lint / PEP8)
- `ruff check` on the new code: **no new issues** introduced. The `F403/F405/
F401` warnings come from the original `from .common import *` pattern;
per-file counts are identical between HEAD and the working tree.
- The new parallel functions add **0 `E501`** (line-length) issues; in
`window.py` the count actually decreased after the rewrite.
- `ruff check tests/parallel_ping_test.py` → no new issues.
### 4.2 Official tests from the Gitea repo `kaibot/bal-plugin-ai/tests`
Run against the refactored code (with all the networking + UI changes):
| Suite | Result |
|-------|--------|
| `test_core_*` + `test_gui_*` (pytest) | **182 passed** |
| `smoke_test.py` | OK |
| `external_zip_test.py` | OK |
| `windows_overflow_test.py` | OK |
| `gui_fixes_test.py` | OK |
| `parallel_ping_test.py` (new) | OK — parallel ping/push/check ~`0.50s` for 8 servers (sequential would be ~`4.00s`); global deadline enforced; `on_tick` fired from the calling thread; static checks that the dialogs use the parallel helpers + the `Xs / Ns` counter |
Commands (as per README):
```bash
QT_QPA_PLATFORM=offscreen PYTHONPATH=<electrum-src> \
python3 -m pytest tests/ -q
QT_QPA_PLATFORM=offscreen PYTHONPATH=<electrum-src> \
python3 tests/smoke_test.py electrum.plugins.bal
QT_QPA_PLATFORM=offscreen PYTHONPATH=<electrum-src> \
python3 tests/external_zip_test.py bal-electrum-plugin.zip
QT_QPA_PLATFORM=offscreen PYTHONPATH=<electrum-src> \
python3 tests/parallel_ping_test.py bal
```
---
## 5. Integration notes / risks
- **No change to the server protocol**: only the *how* (parallel) and the
*when* (retries/deadline) of the calls changed, not the payloads.
- **Push transactions**: per-server retries are intentionally kept so a real
transaction is not lost to a transient hiccup; only ping/info/download use
fast-fail. The global deadline marks unanswered servers as failed (`on_timeout`)
so the user can retry later.
- **`max_workers=8`** is conservative; with many servers (e.g. 20) it can be
raised, but 8 workers already collapse the total time to the slowest server.
- **Thread/UI**: all UI updates from workers go through `pyqtSignal`-based
dialog updates; the periodic counter is driven by `on_tick` from the calling
thread. Do **not** reintroduce a raw heartbeat thread emitting signals — it
does not repaint reliably.
- **Compatibility**: signatures are backward compatible (new parameters are
keyword-only with defaults that preserve the old behaviour).
---
## 6. How to test
1. Install `bal-electrum-plugin.zip` (Tools → Plugins → install from file).
Fully close and reopen Electrum to avoid the cached zip import.
2. Configure several Will-Executors, including **at least one unreachable**.
3. Run push / ping / Check: each dialog shows per-server status plus a counter
`N/total (Xs / 30s)` and **no longer freezes** on the dead server — within
the global deadline the operation reports the dead server and proceeds.
The SHA-256 of the zip is printed by `build_zip.py` at the end of the build
(use it to verify integrity).