forked from bitcoinafterlife/bal-electrum-plugin
254 lines
12 KiB
Markdown
254 lines
12 KiB
Markdown
# Technical report — Parallel networking (Will-Executor anti-freeze) + UI feedback
|
|
|
|
**Audience:** external programmer / plugin maintainer
|
|
**Author:** AI refactoring work (on GitHub `Bitcoin-after-life/test`)
|
|
**Date:** 2026-06-15
|
|
**Branch:** `feature/networking-parallelo` (Pull Request #4)
|
|
**Private Gitea repo `kaibot/bal-plugin-ai`: NOT modified** (per explicit request; cloned read-only only to run the official tests).
|
|
|
|
---
|
|
|
|
## 1. Problem
|
|
|
|
When the plugin contacts the Will-Executor servers (pushing transactions,
|
|
pinging/refreshing the inheritance, downloading the list, and **checking**
|
|
transactions), it used to do it **sequentially**. If a server did not answer,
|
|
the thread stayed blocked on the connection timeouts and, worse, on the
|
|
**retries**:
|
|
|
|
- `send_request` retried up to **10 times** with `time.sleep(3)` on every
|
|
timeout → roughly **~140 seconds per unreachable server**, summed one after
|
|
another.
|
|
|
|
Consequences:
|
|
- Noticeable already with a few servers; with **20 servers** it became
|
|
unusable.
|
|
- The user saw "Stay waiting — Not responding" with no idea what was happening.
|
|
- A single dead server blocked the whole operation.
|
|
|
|
---
|
|
|
|
## 2. Solution (overview)
|
|
|
|
1. **Parallelism** with `ThreadPoolExecutor`: servers are contacted
|
|
concurrently. Total time ≈ the **slowest** server, not the **sum**.
|
|
2. **Fast-fail** for interactive operations (ping/info/download): no retry
|
|
storm, a single short timeout and the server is marked "KO".
|
|
3. **Aggressive timeouts + global deadline** for push/check: a short per-server
|
|
retry budget is kept (a real transaction must survive a transient hiccup),
|
|
but a wall-clock **global deadline** caps the whole batch so a dialog never
|
|
freezes behind one unresponsive server.
|
|
4. **Live feedback + reliable elapsed-time counter**: `on_each(...)` updates the
|
|
dialog as results arrive, and `on_tick()` refreshes an elapsed-time counter
|
|
(`Xs / DEADLINEs`) so the user always knows progress and the maximum wait.
|
|
|
|
### Why it is thread-safe
|
|
`Network.send_http_on_proxy()` uses `asyncio.run_coroutine_threadsafe(coro,
|
|
loop)` and then `coro.result()`: every call schedules its own coroutine on
|
|
Electrum's shared asyncio loop and blocks **only its own worker thread**.
|
|
Multiple concurrent calls are therefore safe → `ThreadPoolExecutor` gives true
|
|
parallelism.
|
|
|
|
UI updates go through `BalWaitingDialog.update()` / the dialog's `pyqtSignal`,
|
|
which marshals to the GUI thread automatically. Callbacks from worker threads
|
|
can therefore update the dialog safely.
|
|
|
|
### Why the counter is driven from the calling thread (important)
|
|
An earlier attempt refreshed the elapsed-time counter from a separate raw
|
|
`threading.Thread` heartbeat that emitted the `pyqtSignal`. That proved
|
|
**unreliable**: a `pyqtSignal` emitted from a raw (non-Qt) Python thread inside
|
|
the wizard's `TaskThread` was not reliably marshalled and the dialog never
|
|
repainted — the counter was invisible.
|
|
|
|
The fix: the parallel helpers accept an **`on_tick` callback that is invoked
|
|
periodically from the CALLING thread** (the same thread that already drives
|
|
`on_each` and successfully repaints). The helpers poll the futures in short
|
|
slices (`concurrent.futures.wait(..., timeout=tick_interval)`) and call
|
|
`on_tick()` between waits. No heartbeat thread is used anymore.
|
|
|
|
---
|
|
|
|
## 3. Modified files (all on GitHub `Bitcoin-after-life/test`, branch `feature/networking-parallelo`)
|
|
|
|
### 3.1 `bal/core/willexecutors.py`
|
|
|
|
**Networking constants** (module level, also exposed as `Willexecutors` class
|
|
attributes for a single source of truth in the GUI):
|
|
```python
|
|
DEFAULT_TIMEOUT = 5 # interactive ops (ping/info/list)
|
|
|
|
PUSH_TIMEOUT = 8 # broadcast (pushtxs)
|
|
PUSH_MAX_RETRIES = 2
|
|
PUSH_RETRY_SLEEP = 1
|
|
PUSH_GLOBAL_DEADLINE = 30 # wall-clock cap for the whole parallel push
|
|
|
|
CHECK_TIMEOUT = 8 # check (searchtx)
|
|
CHECK_MAX_RETRIES = 1
|
|
CHECK_RETRY_SLEEP = 1
|
|
CHECK_GLOBAL_DEADLINE = 30 # wall-clock cap for the whole parallel check
|
|
```
|
|
Worst case per server is now ~26s (push) / ~17s (check) instead of ~140s, and
|
|
the global deadline guarantees the dialog proceeds within 30s regardless.
|
|
|
|
**`send_request(...)`** — keyword-only retry controls:
|
|
```python
|
|
def send_request(method, url, data=None, *, timeout=10, handle_response=None,
|
|
count_reply=0, max_retries=10, retry_sleep=3):
|
|
```
|
|
- Defaults unchanged → callers that need the historical behaviour are
|
|
unaffected.
|
|
- Interactive callers pass `max_retries=0` → fast-fail.
|
|
|
|
**`get_info_task(...)`** — fast-fail by default (`max_retries=0`); a
|
|
timeout/empty response yields `status="KO"`.
|
|
|
|
**`check_transaction(...)`** — now accepts `timeout`/`max_retries`/`retry_sleep`
|
|
(defaults from the `CHECK_*` constants) and forwards them to `send_request`,
|
|
replacing the old ~140s default storm.
|
|
|
|
**NEW `ping_servers_parallel(willexecutors, *, on_each=None, max_workers=8,
|
|
timeout=DEFAULT_TIMEOUT, on_tick=None, tick_interval=1.0)`**
|
|
- `ThreadPoolExecutor`; polls futures in slices and calls `on_tick()` from the
|
|
calling thread; mutates `willexecutors` in place; invokes
|
|
`on_each(url, we, ok)` as results arrive; a worker exception never blocks the
|
|
others (defensive try/except).
|
|
|
|
**NEW `push_transactions_parallel(willexecutors, *, on_each=None, max_workers=8,
|
|
deadline=PUSH_GLOBAL_DEADLINE, on_timeout=None, on_tick=None,
|
|
tick_interval=1.0)`**
|
|
- Parallel push only to entries that have a `"txs"` key; each server keeps its
|
|
short retry budget.
|
|
- `on_each(url, we, ok, exc)` per server; `on_timeout(url, we)` for servers
|
|
still pending when the global deadline elapses; `on_tick()` for the counter.
|
|
- Manual pool (no `with`) so `shutdown(wait=False, cancel_futures=True)` does
|
|
not block on a hung worker once the deadline is reached.
|
|
- Returns `{url: (ok, exc)}` for the servers that answered in time.
|
|
|
|
**NEW `check_transactions_parallel(items, *, on_each=None, max_workers=8,
|
|
deadline=CHECK_GLOBAL_DEADLINE, on_timeout=None, on_tick=None,
|
|
tick_interval=1.0)`**
|
|
- Same design as the push helper but for the **Check** (searchtx) operation.
|
|
- `items` is an iterable of `(wid, url)` pairs; `_check_one` calls
|
|
`check_transaction`.
|
|
- `on_each(wid, url, result_or_None, exc)`, `on_timeout(wid, url)`, `on_tick()`.
|
|
- Returns `{wid: (result_or_None, exc)}`.
|
|
|
|
### 3.2 `bal/gui/qt/window.py`
|
|
|
|
- **`ping_willexecutors_task(self, wes)`** rewritten on `ping_servers_parallel`
|
|
with live feedback and a counter `Ping Will-Executors: 2/3 (3s / 30s)` driven
|
|
by `on_tick` from the calling thread.
|
|
- **`push_transactions_to_willexecutors(self, force=False)`** rewritten on
|
|
`push_transactions_parallel`; `on_each` does thread-safe book-keeping + UI
|
|
update; "already present" servers are verified afterwards (original check
|
|
logic intact).
|
|
- **`check_transactions_task(self, will)`** rewritten on
|
|
`check_transactions_parallel`; shows `Checking transactions: 2/5 (4s / 30s)`,
|
|
reusing the original `set_check_willexecutor(...)` per-item logic inside
|
|
`on_each` (and `set_check_willexecutor(None)` on `on_timeout`).
|
|
- **`fetch_will_executors_list(...)`** fast-fail download
|
|
(`timeout=10, max_retries=1, retry_sleep=1`); the download dialog shows
|
|
`Downloading will-executors list... (Xs / 45s)`.
|
|
|
|
### 3.3 `bal/gui/qt/dialogs.py`
|
|
|
|
- **`BalBuildWillDialog.loop_push`** (the "Building Will" wizard broadcast step)
|
|
rewritten on `push_transactions_parallel` with the `on_tick` counter
|
|
`Broadcasting 2/3 (5s / 30s)`. The previous raw heartbeat thread was removed.
|
|
|
|
### 3.4 `bal/gui/qt/plugin.py` — status-bar icon (restored)
|
|
|
|
`create_status_bar` re-adds the BAL `StatusBarButton` (bottom-right of the
|
|
Electrum status bar). It shows that the plugin is installed and opens the plugin
|
|
settings on click; it also de-duplicates the button per window. (Comments in
|
|
English.)
|
|
|
|
### 3.5 `bal/gui/qt/lists.py` and `bal/gui/qt/widgets.py` — GUI usability
|
|
|
|
- **Tooltips** (hover) on the Will toolbar icons, all in English:
|
|
Wizard (`Wizard - Build your will`), Delivery time (truck), Check Alive
|
|
(siren), Calendar, Check (refresh).
|
|
- **Toolbar order** changed to:
|
|
`Wizard | Delivery time | Check Alive | Calendar | Check`; layout margins
|
|
tightened so everything fits the Will window.
|
|
|
|
### 3.6 `bal/core/util.py` — BUGFIX (pre-existing regression)
|
|
|
|
In `get_value_amount` (line 324) `Util.in_output(...)` (returns `bool`) had been
|
|
used instead of `Util.din_output(...)` (returns the tuple
|
|
`(same_amount, same_address)`), causing:
|
|
```
|
|
TypeError: cannot unpack non-iterable bool object
|
|
```
|
|
**Fixed** by restoring `din_output`. Found by running the official Gitea tests
|
|
(`tests/test_core_util.py::test_get_value_amount`).
|
|
|
|
---
|
|
|
|
## 4. Verification (ruff + official tests)
|
|
|
|
### 4.1 ruff (lint / PEP8)
|
|
- `ruff check` on the new code: **no new issues** introduced. The `F403/F405/
|
|
F401` warnings come from the original `from .common import *` pattern;
|
|
per-file counts are identical between HEAD and the working tree.
|
|
- The new parallel functions add **0 `E501`** (line-length) issues; in
|
|
`window.py` the count actually decreased after the rewrite.
|
|
- `ruff check tests/parallel_ping_test.py` → no new issues.
|
|
|
|
### 4.2 Official tests from the Gitea repo `kaibot/bal-plugin-ai/tests`
|
|
Run against the refactored code (with all the networking + UI changes):
|
|
|
|
| Suite | Result |
|
|
|-------|--------|
|
|
| `test_core_*` + `test_gui_*` (pytest) | **182 passed** |
|
|
| `smoke_test.py` | OK |
|
|
| `external_zip_test.py` | OK |
|
|
| `windows_overflow_test.py` | OK |
|
|
| `gui_fixes_test.py` | OK |
|
|
| `parallel_ping_test.py` (new) | OK — parallel ping/push/check ~`0.50s` for 8 servers (sequential would be ~`4.00s`); global deadline enforced; `on_tick` fired from the calling thread; static checks that the dialogs use the parallel helpers + the `Xs / Ns` counter |
|
|
|
|
Commands (as per README):
|
|
```bash
|
|
QT_QPA_PLATFORM=offscreen PYTHONPATH=<electrum-src> \
|
|
python3 -m pytest tests/ -q
|
|
QT_QPA_PLATFORM=offscreen PYTHONPATH=<electrum-src> \
|
|
python3 tests/smoke_test.py electrum.plugins.bal
|
|
QT_QPA_PLATFORM=offscreen PYTHONPATH=<electrum-src> \
|
|
python3 tests/external_zip_test.py bal-electrum-plugin.zip
|
|
QT_QPA_PLATFORM=offscreen PYTHONPATH=<electrum-src> \
|
|
python3 tests/parallel_ping_test.py bal
|
|
```
|
|
|
|
---
|
|
|
|
## 5. Integration notes / risks
|
|
|
|
- **No change to the server protocol**: only the *how* (parallel) and the
|
|
*when* (retries/deadline) of the calls changed, not the payloads.
|
|
- **Push transactions**: per-server retries are intentionally kept so a real
|
|
transaction is not lost to a transient hiccup; only ping/info/download use
|
|
fast-fail. The global deadline marks unanswered servers as failed (`on_timeout`)
|
|
so the user can retry later.
|
|
- **`max_workers=8`** is conservative; with many servers (e.g. 20) it can be
|
|
raised, but 8 workers already collapse the total time to the slowest server.
|
|
- **Thread/UI**: all UI updates from workers go through `pyqtSignal`-based
|
|
dialog updates; the periodic counter is driven by `on_tick` from the calling
|
|
thread. Do **not** reintroduce a raw heartbeat thread emitting signals — it
|
|
does not repaint reliably.
|
|
- **Compatibility**: signatures are backward compatible (new parameters are
|
|
keyword-only with defaults that preserve the old behaviour).
|
|
|
|
---
|
|
|
|
## 6. How to test
|
|
|
|
1. Install `bal-electrum-plugin.zip` (Tools → Plugins → install from file).
|
|
Fully close and reopen Electrum to avoid the cached zip import.
|
|
2. Configure several Will-Executors, including **at least one unreachable**.
|
|
3. Run push / ping / Check: each dialog shows per-server status plus a counter
|
|
`N/total (Xs / 30s)` and **no longer freezes** on the dead server — within
|
|
the global deadline the operation reports the dead server and proceeds.
|
|
|
|
The SHA-256 of the zip is printed by `build_zip.py` at the end of the build
|
|
(use it to verify integrity).
|