name: process-datasheets
description: "Batch-process datasheets from the shared queue. Claims items, parses PDFs into wiki markdown, publishes to the Adom Wiki, and loops until a time budget expires or the queue is empty. Use with: /process-datasheets --until 17:00"
user-invocable: true
Process Datasheets
Batch-process datasheets from the shared parsing queue. Claims the next highest-priority item, parses it using the datasheet-parser workflow, publishes to the Adom Wiki, then loops until time runs out or the queue is empty.
Prerequisites
- datasheet-queue MCP server configured (provides claim_datasheet, submit_result, release_datasheet, fail_datasheet, check_time_budget, heartbeat_datasheet tools)
- poppler-utils and imagemagick installed
- adom-wiki CLI available
Usage
/process-datasheets --until 17:00
/process-datasheets --until 17:00 --interval 30m
/process-datasheets --count 3
Arguments
- --until <HH:MM or ISO>: Stop processing at this time (default: 1 hour from now)
- --count <N>: Process at most N datasheets, then stop
- --interval <Nm>: (For use with /loop) Pause between runs; the skill processes one item per invocation when this is set
Agent Identity
Determine your agent ID from the container hostname:
AGENT_ID=$(hostname)
Use this as agent_id in all MCP tool calls.
Workflow
Step 0: Parse Arguments
Extract --until, --count, and --interval from the skill arguments. Defaults:
- until: 1 hour from now
- count: unlimited
- interval: not set (process continuously)
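The defaults above can be sketched as a small parser. This is a minimal illustration, not the skill's actual implementation: RunOptions, parse_until, parse_interval, and parse_args are hypothetical names, and reading a bare HH:MM as "today" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RunOptions:
    until: datetime                 # stop time
    count: Optional[int]            # None means unlimited
    interval: Optional[timedelta]   # None means process continuously


def parse_until(value: str, now: datetime) -> datetime:
    """Accept HH:MM (read as today, an assumption) or an ISO 8601 timestamp."""
    try:
        clock = datetime.strptime(value, "%H:%M")
    except ValueError:
        return datetime.fromisoformat(value)
    return now.replace(hour=clock.hour, minute=clock.minute, second=0, microsecond=0)


def parse_interval(value: str) -> timedelta:
    """Accept the <Nm> form, for example 30m."""
    if not value.endswith("m") or not value[:-1].isdigit():
        raise ValueError(f"expected <Nm>, got {value!r}")
    return timedelta(minutes=int(value[:-1]))


def parse_args(argv: list[str], now: datetime) -> RunOptions:
    # Start from the documented defaults, then override flag by flag.
    opts = RunOptions(until=now + timedelta(hours=1), count=None, interval=None)
    pairs = iter(argv)
    for flag in pairs:
        value = next(pairs, None)
        if value is None:
            raise ValueError(f"{flag} needs a value")
        if flag == "--until":
            opts.until = parse_until(value, now)
        elif flag == "--count":
            opts.count = int(value)
        elif flag == "--interval":
            opts.interval = parse_interval(value)
        else:
            raise ValueError(f"unknown argument {flag!r}")
    return opts
```

Passing now explicitly keeps the parser deterministic, which makes the default of "1 hour from now" easy to check.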
Step 1: Check Time Budget
Before claiming anything, verify there's time left:
check_time_budget({ stop_time: "<until value>", min_minutes_needed: 10 })
If continue is false, stop immediately — do not claim an item you can't finish.
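The server's logic for check_time_budget is not documented here. A plausible local equivalent, useful for reasoning about when the skill will refuse to claim, might look like the sketch below. The function name local_time_budget and the minutes_remaining field are hypothetical; only the continue field comes from the tool description above, and treating exactly min_minutes_needed as sufficient is an assumption.

```python
from datetime import datetime


def local_time_budget(now: datetime, stop_time: datetime,
                      min_minutes_needed: int = 10) -> dict:
    """Hypothetical stand-in for check_time_budget; the real response may differ."""
    remaining = (stop_time - now).total_seconds() / 60
    return {
        # Continue only if a full item can plausibly be finished before stop_time.
        "continue": remaining >= min_minutes_needed,
        "minutes_remaining": max(0, int(remaining)),
    }
```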
Step 2: Claim Next Datasheet
claim_datasheet({ agent_id: "<AGENT_ID>" })
If the queue is empty, stop — there's nothing to do.
Save the returned item.id, item.part_name, and item.pdf_url for the next steps.
Step 3: Parse the Datasheet
Follow the datasheet-parser skill workflow (Steps 1–7):
- Download the PDF: use item.pdf_url if provided, otherwise WebSearch for it
- Extract text with pdftotext
- Extract and classify images with pdfimages
- Optimize images for web upload (resize, compress)
- Generate wiki markdown: structured .md matching the wiki renderer format
- Prepare metadata JSON: manufacturer, part number, packages, etc.
- Publish to wiki: adom-wiki page publish + metadata API call + asset uploads
During long parses, send a heartbeat every 10 minutes to prevent claim timeout:
heartbeat_datasheet({ item_id: <id>, agent_id: "<AGENT_ID>" })
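One way to keep heartbeats on schedule is to check a timer between parse stages rather than run a background thread. The sketch below assumes that approach. HeartbeatTimer is a hypothetical helper, and send stands in for whatever actually issues the heartbeat_datasheet call.

```python
import time
from typing import Callable


class HeartbeatTimer:
    """Calls send() once at least interval_s seconds have passed since the last beat."""

    def __init__(self, send: Callable[[], None], interval_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._interval_s = interval_s
        self._clock = clock
        # Assumption: the claim itself counts as the first sign of life.
        self._last = clock()

    def tick(self) -> bool:
        """Call between parse stages; returns True if a heartbeat was sent."""
        now = self._clock()
        if now - self._last >= self._interval_s:
            self._send()
            self._last = now
            return True
        return False
```

Calling tick() after each parse step (download, text extraction, image extraction, and so on) keeps the claim alive as long as no single step runs longer than the claim timeout.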
Step 4: Submit Result
After successful publish:
submit_result({ item_id: <id>, agent_id: "<AGENT_ID>", wiki_slug: "datasheets/<partname>" })
Step 5: Handle Failures
If parsing fails at any step:
Recoverable (network timeout, temp file issue) → release back to queue:
release_datasheet({ item_id: <id>, agent_id: "<AGENT_ID>" })
Permanent (corrupt PDF, no extractable data, part doesn't exist) → mark failed:
fail_datasheet({ item_id: <id>, agent_id: "<AGENT_ID>", reason: "Detailed error description" })
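The recoverable/permanent split can be made explicit in code. The following is a sketch under assumptions: RecoverableError, PermanentError, and failure_call are hypothetical names, and defaulting unknown errors to release is a design choice of this example, not something the queue requires.

```python
class RecoverableError(Exception):
    """Transient problem: network timeout, temp file issue."""


class PermanentError(Exception):
    """Retrying cannot help: corrupt PDF, no extractable data, part doesn't exist."""


def failure_call(item_id: int, agent_id: str, error: Exception) -> tuple[str, dict]:
    """Return the (tool name, arguments) pair to send for a failed parse."""
    args = {"item_id": item_id, "agent_id": agent_id}
    if isinstance(error, PermanentError):
        return "fail_datasheet", {**args, "reason": str(error)}
    # Assumption: unclassified errors are treated as recoverable, so the item
    # goes back to the queue instead of being marked failed on a guess.
    return "release_datasheet", args
```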
Step 6: Loop or Stop
After completing (or failing) one datasheet:
- Decrement --count if set. If count reaches 0, stop.
- If --interval is set, stop (the /loop scheduler will re-invoke).
- Call check_time_budget again. If continue is false, stop.
- If the queue had items and time remains, go back to Step 2.
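Taken together, Steps 1 through 6 form a single loop with four exit conditions. A minimal sketch follows; run_loop and its callback parameters are hypothetical, with claim, process, and has_time standing in for the MCP tool calls and the parse workflow.

```python
from typing import Callable, Optional


def run_loop(
    claim: Callable[[], Optional[dict]],
    process: Callable[[dict], None],
    has_time: Callable[[], bool],
    count: Optional[int] = None,
    interval_set: bool = False,
) -> int:
    """Run Steps 1 to 6 and return how many items were attempted."""
    attempted = 0
    while has_time():              # Step 1, and the re-check in Step 6
        item = claim()             # Step 2
        if item is None:           # queue is empty
            break
        process(item)              # Steps 3 to 5, including failure handling
        attempted += 1
        if count is not None:
            count -= 1
            if count <= 0:         # --count exhausted
                break
        if interval_set:           # --interval: one item per invocation
            break
    return attempted
```

Folding the Step 1 and Step 6 budget checks into the loop condition guarantees that no item is claimed without a fresh check immediately before it.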
Error Recovery
| Situation | Action |
|---|---|
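| PDF download fails with a permanent error (404) | fail_datasheet with reason |
| PDF download times out | release_datasheet (network timeouts are recoverable, per Step 5) |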
| pdftotext produces no output | fail_datasheet (likely a scanned/image PDF) |
| adom-wiki page publish fails | release_datasheet (might be a transient wiki issue) |
| Agent crashes mid-parse | Queue auto-releases after 30min timeout |
| Time budget expired mid-parse | release_datasheet — let another agent finish later |
Example Session
> /process-datasheets --until 17:00
Checking time budget... 2h 45m remaining, 8 items pending.
Claiming next datasheet...
Claimed #5 — STM32F103 [P10]
PDF: https://www.st.com/resource/en/datasheet/stm32f103c8.pdf
Downloading PDF... done (1.2MB)
Extracting text... done (45 pages)
Extracting images... 23 images found
Classifying and optimizing... 12 key diagrams selected
Generating wiki markdown... done
Publishing to wiki... done (datasheets/stm32f103)
Uploading 12 diagram assets... done
Submitted result for #5.
Checking time budget... 2h 12m remaining, 7 items pending.
Claiming next datasheet...
Claimed #8 — ESP32-S3 [P20]
...
Scheduling
Pattern A: Single session with /loop
/loop 30m /process-datasheets --until 17:00 --interval 30m
Every 30 minutes, the skill claims and processes one item. Stops when the clock hits 17:00.
Pattern B: Scheduled task (persistent)
/schedule every 30 minutes /process-datasheets --until 17:00 --interval 30m
Survives session restarts. Each trigger processes one item.
Pattern C: Continuous until done
/process-datasheets --until 17:00
Processes items back-to-back until time runs out or the queue is empty.
MCP Server Configuration
Users add this to their .mcp.json or settings:
{
  "mcpServers": {
    "datasheet-queue": {
      "url": "https://6kgcmtonzymg.adom.cloud/mcp"
    }
  }
}
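To confirm the entry is in place before a run, a short check can read the configured URL back out. This is an illustrative sketch; queue_server_url is a hypothetical helper, and it assumes the config has exactly the shape shown above.

```python
import json


def queue_server_url(config_text: str, name: str = "datasheet-queue") -> str:
    """Return the URL configured for the named MCP server, or raise KeyError."""
    config = json.loads(config_text)
    return config["mcpServers"][name]["url"]
```

For example, passing the contents of .mcp.json to queue_server_url raises KeyError if the datasheet-queue entry is missing, which surfaces a misconfiguration before any item is claimed.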