mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-06 15:10:52 +00:00
refactor(pdf): move document extraction to plugin
* refactor(pdf): move document extraction to plugin
* fix(deps): sync document extract lockfile
* fix(pdf): harden document extraction plugin
@@ -172,8 +172,9 @@ Current behavior:
 rasterized into images and passed to the model, and the injected file block uses
 the placeholder `[PDF content rendered to images]`.
 
-PDF parsing uses the Node-friendly `pdfjs-dist` legacy build (no worker). The modern
-PDF.js build expects browser workers/DOM globals, so it is not used in the Gateway.
+PDF parsing is provided by the bundled `document-extract` plugin, which uses the
+Node-friendly `pdfjs-dist` legacy build (no worker). The modern PDF.js build
+expects browser workers/DOM globals, so it is not used in the Gateway.
 
 URL fetch defaults:
 
@@ -112,7 +112,9 @@ Fallback details:
 - If text extraction succeeds but image extraction would require vision on a
 text-only model, OpenClaw drops the rendered images and continues with the
 extracted text.
-- Extraction fallback requires `pdfjs-dist` (and `@napi-rs/canvas` for image rendering).
+- Extraction fallback uses the bundled `document-extract` plugin. The plugin owns
+`pdfjs-dist`; `@napi-rs/canvas` is used only when image rendering fallback is
+available.
 
 ## Config
 
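The fallback rule in the "Fallback details" hunk (text extracted, but images would need vision on a text-only model, so the images are dropped) can be sketched as a small decision function. This is a minimal TypeScript sketch under stated assumptions: `PdfExtraction`, `ModelInfo`, `PdfPayload`, and `selectPdfPayload` are hypothetical names, not OpenClaw's or the `document-extract` plugin's actual API, and the diff does not show the real payload shape.

```typescript
// Hypothetical shapes for illustration only; not OpenClaw's real types.
interface PdfExtraction {
  text: string;         // text pulled out via pdfjs-dist (may be empty)
  pageImages: string[]; // rendered pages; empty when canvas rendering did not run
}

interface ModelInfo {
  supportsVision: boolean;
}

interface PdfPayload {
  text: string;
  pageImages: string[];
}

// Decide what to forward to the model once extraction has finished.
function selectPdfPayload(extraction: PdfExtraction, model: ModelInfo): PdfPayload {
  const hasText = extraction.text.trim().length > 0;
  if (!model.supportsVision && hasText) {
    // Text-only model and text extraction succeeded:
    // drop the rendered images and continue with the extracted text.
    return { text: extraction.text, pageImages: [] };
  }
  // Every other case forwards what was extracted unchanged. Cases the docs do
  // not describe (e.g. a text-only model with no extracted text) are not
  // handled specially in this sketch.
  return { text: extraction.text, pageImages: [...extraction.pageImages] };
}
```

Keeping the decision in one pure function makes the rule easy to test in isolation from `pdfjs-dist` and `@napi-rs/canvas`, which only affect what ends up in `PdfExtraction`.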