* fix(document-extract): render PDF image fallback per page so multi-page scans don't starve later pages
clawpdf's mode:"images" extract applies a single maxPixels budget across
every page, so the first page consumes it and later pages collapse to ~1x1
PNGs that vision OCR models reject. Render each selected page in its own
extract() call so the pixel budget resets per page and every page yields a
usable image.
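
A toy model of the starvation and the per-page fix, as a sketch only. `ExtractImages`, `sharedBudgetExtract`, `renderPerPage`, and the assumed 1,000,000-pixel page size are hypothetical and are not clawpdf's real API. The sketch does not model any aggregate cap across pages.

```typescript
// Hypothetical shape of an image-mode extract call (not clawpdf's signature).
type RenderedPage = { page: number; pixels: number };
type ExtractImages = (pages: number[], maxPixels: number) => RenderedPage[];

// Assumed native size of every page, for illustration only.
const PAGE_PIXELS = 1000000;

// Models the bug: one budget is shared by all pages in a single call,
// so pages rendered after the budget runs out collapse to a single pixel.
const sharedBudgetExtract: ExtractImages = (pages, maxPixels) => {
  let remaining = maxPixels;
  const out: RenderedPage[] = [];
  for (const page of pages) {
    const granted = Math.min(PAGE_PIXELS, remaining);
    remaining -= granted;
    out.push({ page, pixels: Math.max(1, granted) });
  }
  return out;
};

// Models the fix: one extract call per page, so the budget resets each time.
function renderPerPage(
  extract: ExtractImages,
  pages: number[],
  maxPixels: number,
): RenderedPage[] {
  const out: RenderedPage[] = [];
  for (const page of pages) {
    out.push(...extract([page], maxPixels));
  }
  return out;
}

const starved = sharedBudgetExtract([1, 2, 3], PAGE_PIXELS);
const fixed = renderPerPage(sharedBudgetExtract, [1, 2, 3], PAGE_PIXELS);
console.log(starved.map((p) => p.pixels), fixed.map((p) => p.pixels));
```

In this model the single call gives page 1 the whole budget and leaves pages 2 and 3 at one pixel each, while the per-page loop renders all three at full size.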
* fix(document-extract): preserve aggregate PDF render budget
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* fix(pdf): resolve standard fonts from pdfjs package root
Resolve PDF.js standard fonts via pdfjs-dist/package.json instead of a
relative ../../node_modules path so the fallback renderer does not depend
on emitted dist chunk layout.
Add focused regression coverage that asserts the forwarded
standardFontDataUrl matches the installed pdfjs-dist package root and
exists on disk.
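
A minimal sketch of deriving the font URL from the package root. The helper name `standardFontDataUrlFrom` is hypothetical, and the sketch assumes the fonts live in a `standard_fonts` directory directly under the pdfjs-dist package root. In Node the input path would typically come from something like `createRequire(import.meta.url).resolve("pdfjs-dist/package.json")`; that call is left out here so the example runs without pdfjs-dist installed.

```typescript
// Hypothetical helper: turn the resolved path of pdfjs-dist/package.json
// into a value suitable for PDF.js's standardFontDataUrl option.
function standardFontDataUrlFrom(packageJsonPath: string): string {
  // Drop the trailing "package.json" segment to get the package root,
  // accepting either "/" or "\" as the separator on the way in.
  const root = packageJsonPath.replace(/[\\/]package\.json$/, "");
  // Always join with "/" and keep the trailing slash: PDF.js appends the
  // font file name directly to this string.
  return `${root}/standard_fonts/`;
}

console.log(
  standardFontDataUrlFrom("/app/node_modules/pdfjs-dist/package.json"),
);
```

Because the result depends only on where the package resolver finds pdfjs-dist, it stays valid regardless of how the bundler lays out emitted chunks.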
* fix(pdf): resolve pdfjs standard fonts from package root
* fix(pdf): use PDF.js font URL separator
---------
Co-authored-by: Dr JCai <jingxiao.cai@gmail.com>
Co-authored-by: vincentkoc <25068+vincentkoc@users.noreply.github.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>