Each archive scraper now has its own class with a hardcoded URL and its own
parsing logic; config now carries only auto_queue, timeout, and rate_limit_seconds.
- html_scraper: refactor to base class with public shared utilities
(YEAR_RE, AUTHOR_PREFIX_PAT, cls_inner_texts, img_alts)
- rusneb.py (new): RusnebPlugin extracts year per list item rather than
globally, eliminating wrong page-level dates
- alib.py (new): AlibPlugin extracts year from within each <p><b> entry
rather than globally, fixing nonsensical year values
- shpl.py (new): ShplPlugin retains the dead ШПИЛ endpoint with hardcoded
params; config type updated from html_scraper to shpl
- config: remove the nested config: subsections from the rusneb, alib_web,
and shpl entries; set their type fields to rusneb, alib_web, and shpl respectively
- plugins/__init__.py: register new specific types, remove html_scraper
- tests: use specific plugin classes; assert all CandidateRecord fields
(source, title, author, year, isbn, publisher) with appropriate constraints
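A rough illustration of the per-item year extraction described for rusneb.py
and alib.py above. This is a sketch, not the plugin code: the YEAR_RE pattern
shown here, the function names, and the sample strings are all assumptions
made up for the example.

```python
import re

# Assumed pattern: a standalone four-digit year from 1500 to 2099.
# The real YEAR_RE in html_scraper may differ.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def year_global(page_text: str) -> str:
    """Page-level lookup: the first year anywhere on the page."""
    m = YEAR_RE.search(page_text)
    return m.group(1) if m else ""


def years_per_item(items: list[str]) -> list[str]:
    """Per-item lookup: each entry is searched only within its own text."""
    years = []
    for text in items:
        m = YEAR_RE.search(text)
        years.append(m.group(1) if m else "")
    return years


# Hypothetical result entries plus an unrelated date in the page header.
items = [
    "War and Peace. Moscow, 1869",
    "Dead Souls (no date)",
    "The Idiot. St. Petersburg, 1874",
]
page = "Catalogue updated 2024. " + " ".join(items)

print(year_global(page))      # picks up the header date for every record
print(years_per_item(items))  # one year (or empty string) per entry
```

With a page-level search every record inherits the unrelated header date;
searching inside each entry leaves undated entries empty instead.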
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- html_scraper: add img_alt strategy (НЭБ titles from <img alt>), bold_text
strategy (Alib entries from <p><b>), Windows-1251 encoding support,
_cls_inner_texts() helper that strips inner HTML tags
- rsl: rewrite search as a POST of SearchFilterForm[search] with a CSRF
token, using the CQL query format title:(words) AND author:(word)
- config: update rusneb (img_alt + correct author_class) and alib_web
(encoding + bold_text) to match fixed plugin strategies
- tests: add tests/test_archives.py with network-marked tests for all six
archive plugins; НЛР and ШПИЛ marked xfail (endpoints return HTTP 404)
- presubmit: exclude network tests from default run (-m "not network")
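A minimal sketch of how the CQL query and form payload from the rsl item above
might be assembled. Only the SearchFilterForm[search] field name and the
title:(words) AND author:(word) shape come from this commit; the function
names, the "_csrf" field name, and the choice of the first author token are
assumptions for illustration.

```python
def build_cql(title: str, author: str = "") -> str:
    """Build a query of the form title:(words) AND author:(word)."""
    parts = []
    title_words = title.split()
    if title_words:
        parts.append("title:(" + " ".join(title_words) + ")")
    author_words = author.split()
    if author_words:
        # Assumption: only the first token of the author string is used.
        parts.append("author:(" + author_words[0] + ")")
    return " AND ".join(parts)


def build_form(query: str, csrf_token: str, csrf_field: str = "_csrf") -> dict[str, str]:
    """Form body for the POST; csrf_field is a hypothetical field name."""
    return {csrf_field: csrf_token, "SearchFilterForm[search]": query}


query = build_cql("Война и мир", "Толстой Л. Н.")
form = build_form(query, csrf_token="token-from-search-page")
print(form)
```

The resulting dict could be passed as the form data of an HTTP POST once the
CSRF token has been read from the search page.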
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When deleting a shelf or book, also remove the corresponding boundary from
the parent's boundary list so that the invariant
len(boundaries) == len(children) - 1 still holds. Add API-level tests
covering deletion of the first, middle, and last child.
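The invariant above can be sketched as follows. This is an illustration under
assumptions, not the backend code: the delete_child name, the list-based data
model, and the rule for which neighbouring boundary is dropped are all
hypothetical.

```python
def delete_child(children: list[str], boundaries: list[float], index: int) -> None:
    """Remove children[index] and one adjacent boundary, in place.

    Assumed model: boundaries[i] separates children[i] from children[i + 1].
    """
    del children[index]
    if boundaries:
        # First or middle child: drop the boundary after it.
        # Last child: there is none after it, so drop the one before.
        del boundaries[min(index, len(boundaries) - 1)]


def check(children: list[str], boundaries: list[float]) -> bool:
    """Invariant from the commit message, for a non-empty parent."""
    return len(boundaries) == len(children) - 1


for index in (0, 1, 2):  # first, middle, last
    children = ["a", "b", "c"]
    boundaries = [0.3, 0.7]
    delete_child(children, boundaries, index)
    print(index, children, boundaries, check(children, boundaries))
```

Deleting the only remaining child leaves both lists empty; how the invariant
is read for an empty parent is not stated here, so the sketch leaves that case
out of check().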
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Photo-based book cataloger with AI identification.
Room → Cabinet → Shelf → Book hierarchy; FastAPI + SQLite backend;
vanilla JS SPA; OpenAI-compatible plugin system for boundary
detection, text recognition, and archive search.