openclaw/extensions/feishu/src/docx-batch-insert.ts
Elarwei 0740fb83d7 feat(feishu): add markdown tables, positional insert, color_text, and table ops (#29411)
* feat(feishu): add markdown tables, insert, color_text, table ops, and image fixes

Extends feishu_doc on top of #20304 with capabilities that are not yet covered:

Markdown → native table rendering:
- write/append now use the Descendant API instead of Children API,
  enabling GFM markdown tables (block_type 31/32) to render as native
  Feishu tables automatically
- Adaptive column widths calculated from cell content (CJK chars 2x weight)
- Batch insertion for large documents (>1000 blocks, docx-batch-insert.ts)
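
Roughly, the adaptive column width heuristic described above amounts to the following (hypothetical helper name and character ranges; the real implementation may differ):

```ts
// Sketch of the adaptive column width idea: CJK characters count double,
// and the widest cell in each column wins.
function estimateColumnWeights(rows: string[][]): number[] {
  const cellWeight = (text: string): number => {
    let weight = 0;
    for (const ch of text) {
      // Rough CJK / fullwidth range check; wide glyphs count as two units.
      weight += /[\u3000-\u9fff\uff00-\uffef]/.test(ch) ? 2 : 1;
    }
    return weight;
  };
  const weights: number[] = [];
  for (const row of rows) {
    row.forEach((cell, col) => {
      weights[col] = Math.max(weights[col] ?? 0, cellWeight(cell));
    });
  }
  return weights;
}
```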

New actions:
- insert: positional markdown insertion after a given block_id
- color_text: apply color/bold to a text block via [red]...[/red] markup
- insert_table_row / insert_table_column: add rows or columns to a table
- delete_table_rows / delete_table_columns: remove rows or columns
- merge_table_cells: merge a rectangular cell range
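
For color_text, the markup in practice looks like this (argument names below are placeholders, not the exact tool schema; only the [tag]...[/tag] syntax is documented above):

```ts
// Placeholder call shape for illustration.
const colorTextArgs = {
  action: "color_text",
  block_id: "<target text block id>",
  text: "[red]Overdue[/red] items moved to [bold]Q3[/bold]",
};
```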

Image upload fixes (affects write, append, and upload_image):
- upload_image now accepts data URI and plain base64 in addition to
  url/file_path, covering DALL-E b64_json, canvas screenshots, etc.
- Fix: pass Buffer directly to drive.media.uploadAll instead of
  Readable.from(), which caused Content-Length mismatch for large images
- Fix: apply the same Readable fix in upload_file
- Fix: pass drive_route_token via extra field for correct multi-datacenter
  routing (per API docs: required when parent_node is a document block ID)
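
The upload call after these fixes is roughly as follows (payload field names and the extra serialization follow the Feishu drive upload_all docs as I understand them; treat them as assumptions):

```ts
import type * as Lark from "@larksuiteoapi/node-sdk";

// Sketch only: pass the Buffer itself (not Readable.from(buffer)) so the SDK
// can set Content-Length correctly, and pass drive_route_token via `extra`
// when parent_node is a document block ID.
async function uploadImageBuffer(
  client: Lark.Client,
  docToken: string,
  blockId: string,
  fileName: string,
  buffer: Buffer,
) {
  return client.drive.media.uploadAll({
    data: {
      file_name: fileName,
      parent_type: "docx_image",
      parent_node: blockId,
      size: buffer.length,
      file: buffer, // previously Readable.from(buffer) → Content-Length mismatch
      extra: JSON.stringify({ drive_route_token: docToken }),
    },
  });
}
```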

* fix(feishu): add documentBlockDescendant mock to docx.test.ts

write/append now use the Descendant API (documentBlockDescendant.create)
instead of Children API. The existing test mock was missing this SDK
method, so processImages was never reached and fetchRemoteMedia was
never called.

Added blockDescendantCreateMock returning an image block so the
'skips image upload when markdown image URL is blocked' test flows
through processImages as expected.

* fix(feishu): address bot review feedback

- resolveUploadInput: remove length < 1024 guard on file path detection.
  Prefix patterns (isAbsolute / ~ / ./ / ../) already correctly distinguish
  file paths from base64 strings at any length. The old guard caused file
  paths ≥1024 chars to fall through to the base64 branch incorrectly.

- parseColorMarkup: add comment clarifying that mismatched closing tags
  (e.g. [red]text[/green]) are intentional — opening tag style is applied,
  closing tag is consumed regardless of name.
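
A sketch of the path-vs-base64 detection from the first point (hypothetical helper name):

```ts
import { isAbsolute } from "node:path";

// A string is treated as a file path purely by prefix, regardless of length;
// anything else falls through to the base64 branch.
function looksLikeFilePath(input: string): boolean {
  return (
    isAbsolute(input) ||
    input.startsWith("~") ||
    input.startsWith("./") ||
    input.startsWith("../")
  );
}
```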

* fix(feishu): address second-round codex bot review feedback

P1 - Reject single oversized subtrees in batch insert (docx-batch-insert.ts):
  A first-level block whose descendant count exceeds BATCH_SIZE (1000) cannot
  be split atomically (e.g. a very large table). Previously such a block was
  silently added to the current batch and sent as an oversized request,
  violating the API limit. Now throws a descriptive error so callers know to
  reduce the content size.

P2 - Preserve unmatched brackets in color markup parser (docx-color-text.ts):
  Text like 'Revenue [Q1] up' contains a bracket pair with no matching '[/...]'
  closer. The original regex dropped the '[' character in this case, silently
  corrupting the text. Fixed by appending '|\[' to the plain-text alternative
  so any '[' that does not open a complete tag is captured as literal text.
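
The shape of the fix, sketched (the real regex in docx-color-text.ts is more involved; the trailing '|\[' alternative is the relevant part):

```ts
// The final '|\[' alternative captures a stray '[' as literal text instead of
// dropping it.
const tokenRe = /\[(\w+)\]([\s\S]*?)\[\/\w+\]|[^[]+|\[/g;

// "Revenue [Q1] up" has no closing tag, so the '[' falls through to the
// literal alternative and the text round-trips unchanged:
const tokens = "Revenue [Q1] up".match(tokenRe); // ["Revenue ", "[", "Q1] up"]
```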

* fix(feishu): address third-round codex bot review feedback

P2 - Throw ENOENT for non-existing absolute image paths (docx.ts):
  Previously a non-existing absolute path like /tmp/missing.png fell
  through to Buffer.from(..., 'base64') and uploaded garbage bytes.
  Now throws a descriptive ENOENT error and hints at data URI format
  for callers intending to pass JPEG binary data (which starts with /9j/).
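
Sketch of the guard (hypothetical helper; the actual check in docx.ts may differ):

```ts
import { existsSync } from "node:fs";
import { isAbsolute } from "node:path";

// An absolute path that does not exist should fail loudly instead of being
// decoded as base64 and uploaded as garbage bytes.
function assertImagePathExists(input: string): void {
  if (isAbsolute(input) && !existsSync(input)) {
    throw new Error(
      `ENOENT: no such file: ${input}. If this was meant to be raw JPEG data ` +
        `(base64 starting with "/9j/"), pass it as a data URI instead.`,
    );
  }
}
```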

P2 - Fail clearly when insert anchor block is not found (docx.ts):
  insertDoc previously set insertIndex to -1 (append) when after_block_id
  was absent from the parent's child list, silently inserting at the wrong
  position. Two fixes:
  1. Paginate through all children (documentBlockChildren.get returns up to
     200 per page) before searching for the anchor.
  2. Throw a descriptive error if after_block_id is still not found after
     full pagination, instead of silently falling back to append.
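
Sketch of the pagination, assuming the documentBlockChildren.get response exposes items / has_more / page_token (field names not verified here):

```ts
import type * as Lark from "@larksuiteoapi/node-sdk";

// Collect every child ID (up to 200 per page) before resolving the anchor.
async function findAnchorIndex(
  client: Lark.Client,
  documentId: string,
  parentBlockId: string,
  afterBlockId: string,
): Promise<number> {
  const childIds: string[] = [];
  let pageToken: string | undefined;
  do {
    const res = await client.docx.documentBlockChildren.get({
      path: { document_id: documentId, block_id: parentBlockId },
      params: { page_size: 200, ...(pageToken ? { page_token: pageToken } : {}) },
    });
    for (const item of res.data?.items ?? []) {
      if (item.block_id) childIds.push(item.block_id);
    }
    pageToken = res.data?.has_more ? res.data?.page_token : undefined;
  } while (pageToken);

  const index = childIds.indexOf(afterBlockId);
  if (index === -1) {
    throw new Error(
      `after_block_id "${afterBlockId}" not found among ${childIds.length} children of ${parentBlockId}`,
    );
  }
  return index;
}
```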

* fix(feishu): address fourth-round codex bot review feedback

- Enforce mutual exclusivity across all three upload sources (url, file_path,
  image): throw immediately when more than one is provided, instead of silently
  preferring the image branch and ignoring the others.
- Validate plain base64 payloads before decoding: reject strings that contain
  characters outside the standard base64 alphabet ([A-Za-z0-9+/=]) so that
  malformed inputs fail fast with a clear error rather than decoding to garbage
  bytes and producing an opaque Feishu API failure downstream.
  Also throw if the decoded buffer is empty.
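
In sketch form (helper names are hypothetical):

```ts
const BASE64_ALPHABET = /^[A-Za-z0-9+/=]+$/;

function assertSingleSource(args: { url?: string; file_path?: string; image?: string }): void {
  const provided = [args.url, args.file_path, args.image].filter((v) => v !== undefined);
  if (provided.length > 1) {
    throw new Error("Provide exactly one of url, file_path, or image");
  }
}

function decodeBase64Strict(payload: string): Buffer {
  if (!BASE64_ALPHABET.test(payload)) {
    throw new Error("image payload contains characters outside the base64 alphabet");
  }
  const buffer = Buffer.from(payload, "base64");
  if (buffer.length === 0) {
    throw new Error("image payload decoded to an empty buffer");
  }
  return buffer;
}
```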

* fix(feishu): address fifth-round codex bot review feedback

- parseColorMarkup: restrict opening tag regex to known color/style names
  (bg:*, bold, red, orange, yellow, green, blue, purple, grey/gray) so that
  ordinary bracket tokens like [Q1] can no longer consume a subsequent real
  closing tag ([/red]) and corrupt the surrounding styled spans. Unknown tags
  now fall through to the plain-text alternatives and are emitted literally.
- resolveUploadInput: estimate decoded byte count from base64 input length
  (ceil(len * 3 / 4)) BEFORE allocating the full Buffer, preventing oversized
  payloads from spiking memory before the maxBytes limit is enforced. Applies
  to both the data-URI branch and the plain-base64 branch.
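
The size pre-check from the second point boils down to this (hypothetical helper; maxBytes is whatever limit the caller passes in):

```ts
// Estimate decoded size from encoded length (base64 packs 3 bytes into 4
// chars) before allocating the Buffer.
function decodeBase64WithLimit(payload: string, maxBytes: number): Buffer {
  const estimatedBytes = Math.ceil((payload.length * 3) / 4);
  if (estimatedBytes > maxBytes) {
    throw new Error(`image is ~${estimatedBytes} bytes, above the ${maxBytes}-byte limit`);
  }
  return Buffer.from(payload, "base64");
}
```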

* fix(feishu): address sixth-round codex bot review feedback

- docx-table-ops: apply MIN/MAX_COLUMN_WIDTH clamping in the empty-table
  branch so tables with 15+ columns don't produce sub-50 widths that Feishu
  rejects as invalid column_width values.
- docx.ts (data URI branch): validate the ';base64' marker before decoding
  so plain/URL-encoded data URIs are rejected with a clear error; also validate
  the payload against the base64 alphabet (same guard already applied in the
  plain-base64 branch) so malformed inputs fail fast rather than producing
  opaque downstream Feishu errors.
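
The clamp from the first point, in sketch form (the MAX value here is an assumption; only the ~50 lower bound is implied by the review, and the real MIN/MAX_COLUMN_WIDTH constants live in docx-table-ops.ts):

```ts
// Bound whatever per-column width the empty-table branch computes, the same
// way the content-based path already does.
const MIN_COLUMN_WIDTH = 50; // Feishu rejects sub-50 widths
const MAX_COLUMN_WIDTH = 600; // illustrative upper bound

const clampColumnWidth = (width: number): number =>
  Math.min(MAX_COLUMN_WIDTH, Math.max(MIN_COLUMN_WIDTH, width));
```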

* Feishu: align docx descendant insertion tests and changelog

---------

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-28 09:58:56 -06:00


/**
 * Batch insertion for large Feishu documents (>1000 blocks).
 *
 * The Feishu Descendant API has a limit of 1000 blocks per request.
 * This module handles splitting large documents into batches while
 * preserving parent-child relationships between blocks.
 */
import type * as Lark from "@larksuiteoapi/node-sdk";
import { cleanBlocksForDescendant } from "./docx-table-ops.js";

export const BATCH_SIZE = 1000; // Feishu API limit per request

type Logger = { info?: (msg: string) => void };

/**
 * Collect all descendant blocks for a given set of first-level block IDs.
 * Recursively traverses the block tree to gather all children.
 */
// eslint-disable-next-line @typescript-eslint/no-explicit-any -- SDK block types
function collectDescendants(blocks: any[], firstLevelIds: string[]): any[] {
  const blockMap = new Map<string, any>();
  for (const block of blocks) {
    blockMap.set(block.block_id, block);
  }
  const result: any[] = [];
  const visited = new Set<string>();

  function collect(blockId: string) {
    if (visited.has(blockId)) return;
    visited.add(blockId);
    const block = blockMap.get(blockId);
    if (!block) return;
    result.push(block);
    // Recursively collect children
    const children = block.children;
    if (Array.isArray(children)) {
      for (const childId of children) {
        collect(childId);
      }
    } else if (typeof children === "string") {
      collect(children);
    }
  }

  for (const id of firstLevelIds) {
    collect(id);
  }
  return result;
}

/**
 * Insert a single batch of blocks using Descendant API.
 *
 * @param parentBlockId - Parent block to insert into (defaults to docToken)
 * @param index - Position within parent's children (-1 = end)
 */
// eslint-disable-next-line @typescript-eslint/no-explicit-any -- SDK block types
async function insertBatch(
  client: Lark.Client,
  docToken: string,
  blocks: any[],
  firstLevelBlockIds: string[],
  parentBlockId: string = docToken,
  index: number = -1,
): Promise<any[]> {
  const descendants = cleanBlocksForDescendant(blocks);
  if (descendants.length === 0) {
    return [];
  }
  const res = await client.docx.documentBlockDescendant.create({
    path: { document_id: docToken, block_id: parentBlockId },
    data: {
      children_id: firstLevelBlockIds,
      descendants,
      index,
    },
  });
  if (res.code !== 0) {
    throw new Error(`${res.msg} (code: ${res.code})`);
  }
  return res.data?.children ?? [];
}

/**
 * Insert blocks in batches for large documents (>1000 blocks).
 *
 * Batches are split to ensure BOTH children_id AND descendants
 * arrays stay under the 1000 block API limit.
 *
 * @param client - Feishu API client
 * @param docToken - Document ID
 * @param blocks - All blocks from Convert API
 * @param firstLevelBlockIds - IDs of top-level blocks to insert
 * @param logger - Optional logger for progress updates
 * @param parentBlockId - Parent block to insert into (defaults to docToken = document root)
 * @param startIndex - Starting position within parent (-1 = end). For multi-batch inserts,
 *   each batch advances this by the number of first-level IDs inserted so far.
 * @returns Inserted children blocks and any skipped block IDs
 */
// eslint-disable-next-line @typescript-eslint/no-explicit-any -- SDK block types
export async function insertBlocksInBatches(
  client: Lark.Client,
  docToken: string,
  blocks: any[],
  firstLevelBlockIds: string[],
  logger?: Logger,
  parentBlockId: string = docToken,
  startIndex: number = -1,
): Promise<{ children: any[]; skipped: string[] }> {
  const allChildren: any[] = [];

  // Build batches ensuring each batch has ≤1000 total descendants
  const batches: { firstLevelIds: string[]; blocks: any[] }[] = [];
  let currentBatch: { firstLevelIds: string[]; blocks: any[] } = { firstLevelIds: [], blocks: [] };
  const usedBlockIds = new Set<string>();

  for (const firstLevelId of firstLevelBlockIds) {
    const descendants = collectDescendants(blocks, [firstLevelId]);
    const newBlocks = descendants.filter((b) => !usedBlockIds.has(b.block_id));

    // A single block whose subtree exceeds the API limit cannot be split
    // (a table or other compound block must be inserted atomically).
    if (newBlocks.length > BATCH_SIZE) {
      throw new Error(
        `Block "${firstLevelId}" has ${newBlocks.length} descendants, which exceeds the ` +
          `Feishu API limit of ${BATCH_SIZE} blocks per request. ` +
          `Please split the content into smaller sections.`,
      );
    }

    // If adding this first-level block would exceed limit, start new batch
    if (
      currentBatch.blocks.length + newBlocks.length > BATCH_SIZE &&
      currentBatch.blocks.length > 0
    ) {
      batches.push(currentBatch);
      currentBatch = { firstLevelIds: [], blocks: [] };
    }

    // Add to current batch
    currentBatch.firstLevelIds.push(firstLevelId);
    for (const block of newBlocks) {
      currentBatch.blocks.push(block);
      usedBlockIds.add(block.block_id);
    }
  }

  // Don't forget the last batch
  if (currentBatch.blocks.length > 0) {
    batches.push(currentBatch);
  }

  // Insert each batch, advancing index for position-aware inserts.
  // When startIndex == -1 (append to end), each batch appends after the previous.
  // When startIndex >= 0, each batch starts at startIndex + count of first-level IDs already inserted.
  let currentIndex = startIndex;
  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i];
    logger?.info?.(
      `feishu_doc: Inserting batch ${i + 1}/${batches.length} (${batch.blocks.length} blocks)...`,
    );
    const children = await insertBatch(
      client,
      docToken,
      batch.blocks,
      batch.firstLevelIds,
      parentBlockId,
      currentIndex,
    );
    allChildren.push(...children);

    // Advance index only for explicit positions; -1 always means "after last inserted"
    if (currentIndex !== -1) {
      currentIndex += batch.firstLevelIds.length;
    }
  }

  return { children: allChildren, skipped: [] };
}