mirror of
https://github.com/openclaw/openclaw.git
synced 2026-03-15 12:00:43 +00:00
* feat(feishu): add markdown tables, insert, color_text, table ops, and image fixes

Extends feishu_doc on top of #20304 with capabilities that are not yet covered:

Markdown → native table rendering:
- write/append now use the Descendant API instead of the Children API, enabling GFM markdown tables (block_type 31/32) to render as native Feishu tables automatically
- Adaptive column widths calculated from cell content (CJK chars get 2x weight)
- Batch insertion for large documents (>1000 blocks, docx-batch-insert.ts)

New actions:
- insert: positional markdown insertion after a given block_id
- color_text: apply color/bold to a text block via [red]...[/red] markup
- insert_table_row / insert_table_column: add rows or columns to a table
- delete_table_rows / delete_table_columns: remove rows or columns
- merge_table_cells: merge a rectangular cell range

Image upload fixes (affects write, append, and upload_image):
- upload_image now accepts data URIs and plain base64 in addition to url/file_path, covering DALL-E b64_json, canvas screenshots, etc.
- Fix: pass the Buffer directly to drive.media.uploadAll instead of wrapping it in Readable.from(), which caused a Content-Length mismatch for large images
- Fix: the same Readable bug fixed in upload_file
- Fix: pass drive_route_token via the extra field for correct multi-datacenter routing (per the API docs, it is required when parent_node is a document block ID)

* fix(feishu): add documentBlockDescendant mock to docx.test.ts

write/append now use the Descendant API (documentBlockDescendant.create) instead of the Children API. The existing test mock was missing this SDK method, causing processImages to never be reached and fetchRemoteMedia to go uncalled. Added blockDescendantCreateMock returning an image block so the 'skips image upload when markdown image URL is blocked' test flows through processImages as expected.

* fix(feishu): address bot review feedback

- resolveUploadInput: remove the length < 1024 guard on file path detection.
Prefix patterns (isAbsolute / ~ / ./ / ../) already correctly distinguish file paths from base64 strings at any length. The old guard caused file paths ≥1024 chars to fall through to the base64 branch incorrectly.
- parseColorMarkup: add a comment clarifying that mismatched closing tags (e.g. [red]text[/green]) are intentional: the opening tag's style is applied, and the closing tag is consumed regardless of name.

* fix(feishu): address second-round codex bot review feedback

P1 - Reject single oversized subtrees in batch insert (docx-batch-insert.ts): A first-level block whose descendant count exceeds BATCH_SIZE (1000) cannot be split atomically (e.g. a very large table). Previously such a block was silently added to the current batch and sent as an oversized request, violating the API limit. Now throws a descriptive error so callers know to reduce the content size.

P2 - Preserve unmatched brackets in the color markup parser (docx-color-text.ts): Text like 'Revenue [Q1] up' contains a bracket pair with no matching '[/...]' closer. The original regex dropped the '[' character in this case, silently corrupting the text. Fixed by appending '|\[' to the plain-text alternative so any '[' that does not open a complete tag is captured as literal text.

* fix(feishu): address third-round codex bot review feedback

P2 - Throw ENOENT for non-existing absolute image paths (docx.ts): Previously a non-existing absolute path like /tmp/missing.png fell through to Buffer.from(..., 'base64') and uploaded garbage bytes. Now throws a descriptive ENOENT error and hints at the data URI format for callers intending to pass JPEG binary data (which starts with /9j/).

P2 - Fail clearly when the insert anchor block is not found (docx.ts): insertDoc previously set insertIndex to -1 (append) when after_block_id was absent from the parent's child list, silently inserting at the wrong position. Two fixes:
1. Paginate through all children (documentBlockChildren.get returns up to 200 per page) before searching for the anchor.
2. Throw a descriptive error if after_block_id is still not found after full pagination, instead of silently falling back to append.

* fix(feishu): address fourth-round codex bot review feedback

- Enforce mutual exclusivity across all three upload sources (url, file_path, image): throw immediately when more than one is provided, instead of silently preferring the image branch and ignoring the others.
- Validate plain base64 payloads before decoding: reject strings that contain characters outside the standard base64 alphabet ([A-Za-z0-9+/=]) so that malformed inputs fail fast with a clear error rather than decoding to garbage bytes and producing an opaque Feishu API failure downstream. Also throw if the decoded buffer is empty.

* fix(feishu): address fifth-round codex bot review feedback

- parseColorMarkup: restrict the opening tag regex to known colour/style names (bg:*, bold, red, orange, yellow, green, blue, purple, grey/gray) so that ordinary bracket tokens like [Q1] can no longer consume a subsequent real closing tag ([/red]) and corrupt the surrounding styled spans. Unknown tags now fall through to the plain-text alternatives and are emitted literally.
- resolveUploadInput: estimate the decoded byte count from the base64 input length (ceil(len * 3 / 4)) BEFORE allocating the full Buffer, preventing oversized payloads from spiking memory before the maxBytes limit is enforced. Applies to both the data-URI branch and the plain-base64 branch.

* fix(feishu): address sixth-round codex bot review feedback

- docx-table-ops: apply MIN/MAX_COLUMN_WIDTH clamping in the empty-table branch so tables with 15+ columns don't produce sub-50 widths that Feishu rejects as invalid column_width values.
- docx.ts (data URI branch): validate the ';base64' marker before decoding so plain/URL-encoded data URIs are rejected with a clear error; also validate the payload against the base64 alphabet (the same guard already applied in the plain-base64 branch) so malformed inputs fail fast rather than producing opaque downstream Feishu errors.

* Feishu: align docx descendant insertion tests and changelog

---------

Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
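The base64 guards described above (alphabet validation, pre-decode size estimation, and the empty-buffer check) can be sketched as follows. This is an illustrative model, not the actual resolveUploadInput code: the names `decodeBase64Strict`, `estimateDecodedBytes`, and the `maxBytes` parameter are hypothetical.

```typescript
// Only the standard base64 alphabet, with up to two '=' padding chars at the end.
const BASE64_RE = /^[A-Za-z0-9+/]+={0,2}$/;

/** Estimate the decoded byte count from the input length (ceil(len * 3 / 4))
 *  WITHOUT allocating a Buffer, so oversized payloads are rejected cheaply. */
function estimateDecodedBytes(b64: string): number {
  return Math.ceil((b64.length * 3) / 4);
}

/** Decode plain base64 with the fail-fast guards described in the commit message. */
function decodeBase64Strict(b64: string, maxBytes: number): Buffer {
  if (!BASE64_RE.test(b64)) {
    throw new Error("input contains characters outside the base64 alphabet");
  }
  if (estimateDecodedBytes(b64) > maxBytes) {
    throw new Error(`payload would exceed the ${maxBytes}-byte limit`);
  }
  const buf = Buffer.from(b64, "base64");
  if (buf.length === 0) {
    throw new Error("decoded buffer is empty");
  }
  return buf;
}
```

The explicit alphabet check matters because Node's `Buffer.from(..., "base64")` is lenient and silently skips invalid characters rather than throwing.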
191 lines
5.9 KiB
TypeScript
/**
 * Batch insertion for large Feishu documents (>1000 blocks).
 *
 * The Feishu Descendant API has a limit of 1000 blocks per request.
 * This module handles splitting large documents into batches while
 * preserving parent-child relationships between blocks.
 */

import type * as Lark from "@larksuiteoapi/node-sdk";
import { cleanBlocksForDescendant } from "./docx-table-ops.js";

export const BATCH_SIZE = 1000; // Feishu API limit per request

type Logger = { info?: (msg: string) => void };

/**
 * Collect all descendant blocks for a given set of first-level block IDs.
 * Recursively traverses the block tree to gather all children.
 */
// eslint-disable-next-line @typescript-eslint/no-explicit-any -- SDK block types
function collectDescendants(blocks: any[], firstLevelIds: string[]): any[] {
  const blockMap = new Map<string, any>();
  for (const block of blocks) {
    blockMap.set(block.block_id, block);
  }

  const result: any[] = [];
  const visited = new Set<string>();

  function collect(blockId: string) {
    if (visited.has(blockId)) return;
    visited.add(blockId);

    const block = blockMap.get(blockId);
    if (!block) return;

    result.push(block);

    // Recursively collect children
    const children = block.children;
    if (Array.isArray(children)) {
      for (const childId of children) {
        collect(childId);
      }
    } else if (typeof children === "string") {
      collect(children);
    }
  }

  for (const id of firstLevelIds) {
    collect(id);
  }

  return result;
}
/**
 * Insert a single batch of blocks using the Descendant API.
 *
 * @param parentBlockId - Parent block to insert into (defaults to docToken)
 * @param index - Position within parent's children (-1 = end)
 */
// eslint-disable-next-line @typescript-eslint/no-explicit-any -- SDK block types
async function insertBatch(
  client: Lark.Client,
  docToken: string,
  blocks: any[],
  firstLevelBlockIds: string[],
  parentBlockId: string = docToken,
  index: number = -1,
): Promise<any[]> {
  const descendants = cleanBlocksForDescendant(blocks);

  if (descendants.length === 0) {
    return [];
  }

  const res = await client.docx.documentBlockDescendant.create({
    path: { document_id: docToken, block_id: parentBlockId },
    data: {
      children_id: firstLevelBlockIds,
      descendants,
      index,
    },
  });

  if (res.code !== 0) {
    throw new Error(`${res.msg} (code: ${res.code})`);
  }

  return res.data?.children ?? [];
}
/**
 * Insert blocks in batches for large documents (>1000 blocks).
 *
 * Batches are split to ensure BOTH the children_id AND descendants
 * arrays stay under the 1000-block API limit.
 *
 * @param client - Feishu API client
 * @param docToken - Document ID
 * @param blocks - All blocks from the Convert API
 * @param firstLevelBlockIds - IDs of top-level blocks to insert
 * @param logger - Optional logger for progress updates
 * @param parentBlockId - Parent block to insert into (defaults to docToken = document root)
 * @param startIndex - Starting position within parent (-1 = end). For multi-batch inserts,
 *   each batch advances this by the number of first-level IDs inserted so far.
 * @returns Inserted children blocks and any skipped block IDs
 */
// eslint-disable-next-line @typescript-eslint/no-explicit-any -- SDK block types
export async function insertBlocksInBatches(
  client: Lark.Client,
  docToken: string,
  blocks: any[],
  firstLevelBlockIds: string[],
  logger?: Logger,
  parentBlockId: string = docToken,
  startIndex: number = -1,
): Promise<{ children: any[]; skipped: string[] }> {
  const allChildren: any[] = [];

  // Build batches ensuring each batch has ≤1000 total descendants
  const batches: { firstLevelIds: string[]; blocks: any[] }[] = [];
  let currentBatch: { firstLevelIds: string[]; blocks: any[] } = { firstLevelIds: [], blocks: [] };
  const usedBlockIds = new Set<string>();

  for (const firstLevelId of firstLevelBlockIds) {
    const descendants = collectDescendants(blocks, [firstLevelId]);
    const newBlocks = descendants.filter((b) => !usedBlockIds.has(b.block_id));

    // A single block whose subtree exceeds the API limit cannot be split
    // (a table or other compound block must be inserted atomically).
    if (newBlocks.length > BATCH_SIZE) {
      throw new Error(
        `Block "${firstLevelId}" has ${newBlocks.length} descendants, which exceeds the ` +
          `Feishu API limit of ${BATCH_SIZE} blocks per request. ` +
          `Please split the content into smaller sections.`,
      );
    }

    // If adding this first-level block would exceed the limit, start a new batch
    if (
      currentBatch.blocks.length + newBlocks.length > BATCH_SIZE &&
      currentBatch.blocks.length > 0
    ) {
      batches.push(currentBatch);
      currentBatch = { firstLevelIds: [], blocks: [] };
    }

    // Add to the current batch
    currentBatch.firstLevelIds.push(firstLevelId);
    for (const block of newBlocks) {
      currentBatch.blocks.push(block);
      usedBlockIds.add(block.block_id);
    }
  }

  // Don't forget the last batch
  if (currentBatch.blocks.length > 0) {
    batches.push(currentBatch);
  }

  // Insert each batch, advancing the index for position-aware inserts.
  // When startIndex == -1 (append to end), each batch appends after the previous.
  // When startIndex >= 0, each batch starts at startIndex plus the count of
  // first-level IDs already inserted.
  let currentIndex = startIndex;
  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i];
    logger?.info?.(
      `feishu_doc: Inserting batch ${i + 1}/${batches.length} (${batch.blocks.length} blocks)...`,
    );

    const children = await insertBatch(
      client,
      docToken,
      batch.blocks,
      batch.firstLevelIds,
      parentBlockId,
      currentIndex,
    );
    allChildren.push(...children);

    // Advance index only for explicit positions; -1 always means "after last inserted"
    if (currentIndex !== -1) {
      currentIndex += batch.firstLevelIds.length;
    }
  }

  return { children: allChildren, skipped: [] };
}
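The greedy packing rule in insertBlocksInBatches can be sketched in isolation. This is a simplified model, not part of the module: plain descendant counts stand in for block subtrees, and `planBatches` is a hypothetical name.

```typescript
// Greedy batch planning: pack first-level subtrees into batches so that no
// batch exceeds the per-request limit; a single subtree larger than the
// limit cannot be split atomically and is rejected outright.
function planBatches(subtreeSizes: number[], limit: number): number[][] {
  const batches: number[][] = [];
  let current: number[] = [];
  let currentTotal = 0;

  for (const size of subtreeSizes) {
    if (size > limit) {
      throw new Error(`subtree of ${size} blocks exceeds the ${limit}-block limit`);
    }
    // Close the current batch if this subtree would overflow it.
    if (currentTotal + size > limit && current.length > 0) {
      batches.push(current);
      current = [];
      currentTotal = 0;
    }
    current.push(size);
    currentTotal += size;
  }
  // Don't forget the last batch
  if (current.length > 0) batches.push(current);
  return batches;
}
```

For example, subtrees of 600, 600, and 300 blocks against a limit of 1000 yield two batches: [600] and [600, 300], mirroring how the real code flushes a batch before it would overflow.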