Reads the manufacturer's datasheet PDF and produces everything needed to create a chip from scratch: symbol JSON, footprint with per-pad positions + JEDEC mechanical dimensions, 3D chip geometry + lead annotations for OCCT /create-chip, two SVGs, multi-variant discovery, and field-level provenance. The only purely-datasheet-derived source in chipsmith's cross-validation.
ds2sf
Datasheet → symbol + footprint + provenance JSON.
A small Rust CLI that runs downstream of chip-fetcher: it reads a chip's datasheet PDF and produces the JSON input for Adom's sym_create MCP, a footprint metadata file, and a field-level provenance audit trail. Built for both human use and agent-to-agent calls from upstream automation (chip-fetcher, batch pipelines).
Mirrors the step2glb naming convention.
What it produces
For a chip directory like ~/project/chip-fetcher/library/<MPN>/, ds2sf writes four JSONs:
<MPN>-symbol.extracted.json # sym_create-shaped (drop-in MCP input)
<MPN>-footprint.extracted.json # padCount, bodyDimensions, leadDimensions,
# padDescriptions, symbolPinMap, kicad_baseline
<MPN>-extraction.provenance.json # per-pin {page, table_caption}
# per-dimension {field, value, page, figure_label}
<MPN>-extraction.result.json # agent-to-agent contract: status + exit_code + hints
The provenance JSON is the audit trail. Pick any pin or any dimension, follow the page field, open the PDF on that page, and confirm the value is visibly there. Same loop for footprint dimensions via figure_label.
How it works
Two focused claude -p calls per datasheet:
- Symbol pass — pinout tables, pin types, Adom logical groups (POWER / I2C / GPIO / etc.), per-pin page provenance.
- Footprint pass — package outline, recommended land pattern, body + lead dimensions, per-dimension page provenance.
Each pass has its own JSON schema and its own validator. A failed validation triggers one retry, with the validation error fed back to the model. The footprint pass receives the canonical pin-name → pin-number map from the symbol pass, so the two outputs cannot disagree with each other.
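The retry-once behavior described above can be sketched as follows. This is a minimal illustration, not ds2sf's actual code: `run_pass`, `call_model`, and `validate` are hypothetical names, and the closures stand in for the real `claude -p` call and the per-pass schema validator.

```rust
/// Runs one extraction pass, retrying once if validation fails.
/// `call_model` receives the previous validation error (if any) so the
/// caller can append it to the prompt; `validate` checks the raw response.
/// Both closures are hypothetical stand-ins for ds2sf internals.
fn run_pass<C, V>(mut call_model: C, validate: V) -> Result<String, String>
where
    C: FnMut(Option<&str>) -> String,
    V: Fn(&str) -> Result<(), String>,
{
    // First attempt: there is no feedback to pass yet.
    let first = call_model(None);
    let err = match validate(&first) {
        Ok(()) => return Ok(first),
        Err(e) => e,
    };

    // Single retry, with the validation error fed back to the model.
    let second = call_model(Some(err.as_str()));
    match validate(&second) {
        Ok(()) => Ok(second),
        // Two failures in a row: the caller would report `unrecoverable`.
        Err(e) => Err(format!("validation failed twice: {}", e)),
    }
}
```

Capping the loop at one retry keeps cost bounded; a second failure maps naturally onto the unrecoverable status rather than another attempt.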
After Claude finishes, ds2sf normalizes the package name against the service-kicad standard library (Package_SO, Package_QFP, Package_DFN_QFN, Package_LQFP, Package_TO_SOT_SMD, Package_DIP, …). When the package matches, downstream tools just pull the .kicad_mod from there. When it doesn't (custom OLGA packages, modules, optical sensors), bodyDimensions + leadDimensions carry the datasheet-extracted values and kicad_baseline is null.
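To make the matched and unmatched outcomes concrete, here is a tiny stand-in for the normalization step. It is only a sketch: the real lookup goes through service-kicad, `normalize_package` is a hypothetical name, and the table holds just the one mapping this document confirms.

```rust
/// Hypothetical stand-in for the service-kicad lookup: maps a datasheet
/// package string to a stdlib footprint id, or returns None for custom
/// packages (the case where kicad_baseline would be null).
fn normalize_package(package: &str) -> Option<&'static str> {
    // Normalize whitespace and case before matching.
    let key = package.trim().to_ascii_uppercase();
    match key.as_str() {
        // The only mapping confirmed by this document.
        "VSSOP-10" => Some("Package_SO:MSOP-10_3x3mm_P0.5mm"),
        // Everything else is treated as custom in this sketch.
        _ => None,
    }
}
```

A caller would use the `Some` value to pull the .kicad_mod from the stdlib, and fall back to bodyDimensions + leadDimensions on `None`.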
Sample output
Top of <MPN>-symbol.extracted.json (ADS1115IDGSR)
{
"symbolName": "ADS1115",
"manufacturer": "Texas Instruments",
"package": "VSSOP-10",
"description": "Ultra-small 16-bit sigma-delta ADC with PGA, internal reference, …",
"category": "ic",
"referencePrefix": "U",
"datasheetUrl": "https://www.ti.com/lit/ds/symlink/ads1115.pdf",
"pins": [
{ "number": "8", "name": "VDD", "type": "power_in", "group": "POWER", "description": "…" },
{ "number": "3", "name": "GND", "type": "passive", "group": "GROUND", "description": "…" },
{ "number": "9", "name": "SDA", "type": "bidirectional","group": "I2C", "description": "…" },
{ "number": "4", "name": "AIN0","type": "input", "group": "ADC", "description": "…" }
],
"groups": [ { "name": "POWER", "description": "…" }, … ]
}
Top of <MPN>-extraction.provenance.json
{
"datasheet": {
"path": "/…/ADS1115IDGSR.pdf",
"url": "https://www.ti.com/lit/ds/symlink/ads1115.pdf",
"sha256": "65231e81…",
"source_tier": "mfr",
"source_host": "ti.com/lit",
"page_count": 57
},
"symbol": {
"pin_provenance": [
{ "pin_number": "1", "pin_name": "ADDR",
"page": 3, "table_caption": "Table 4-1. Pin Functions: RUG, DYN, and DGS Packages",
"confidence": "high" }
]
},
"footprint": {
"dimension_provenance": [
{ "field": "bodyDimensions.x", "value": 3.1,
"page": 52, "figure_label": "DGS0010A VSSOP - 1.1 mm max height Package Outline",
"confidence": "high" }
]
}
}
Agent-to-agent contract
Every invocation writes <MPN>-extraction.result.json and exits with one of three codes:
| Exit | status | Caller action |
|---|---|---|
| 0 | ok | Accept the outputs. review_required[] may have informational entries. |
| 2 | needs_better_pdf | Try one of hints[].candidates and re-call with --pdf <new>. |
| 3 | unrecoverable | Surface to a human; inspect .ds2sf-cache/ for the raw Claude responses. |
hints[] is a tagged-union list:
- try_alternate_pdf: {reason, candidates: PathBuf[]}. ds2sf already ranked the chip-dir's other PDFs; pick the first one not yet tried.
- update_upstream_metadata: {file, key, current, extracted}. For example, info.json's package disagrees with the datasheet; patch it and re-run.
- use_datasheet_extracted_footprint: {package}. Custom package; downstream footprint authoring should use the captured dimensions.
- human_review: {reason, notes}. Engineering nuance the schematic designer should see.
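One way the tagged union could look on the caller's side is sketched below. The variant and field names follow the list above, but the concrete Rust types (`String`, `PathBuf`, `Vec`) and the helper `next_candidate` are assumptions for illustration, not ds2sf's published types.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Assumed Rust shape of one hints[] entry; field types are guesses.
#[derive(Debug, Clone, PartialEq)]
enum Hint {
    TryAlternatePdf { reason: String, candidates: Vec<PathBuf> },
    UpdateUpstreamMetadata { file: PathBuf, key: String, current: String, extracted: String },
    UseDatasheetExtractedFootprint { package: String },
    HumanReview { reason: String, notes: String },
}

/// Returns the first candidate PDF, in ranked order, that has not been
/// tried yet. Hypothetical helper, not part of ds2sf.
fn next_candidate<'a>(hints: &'a [Hint], tried: &HashSet<PathBuf>) -> Option<&'a PathBuf> {
    hints
        .iter()
        // Keep only the try_alternate_pdf hints and expose their candidates.
        .filter_map(|h| match h {
            Hint::TryAlternatePdf { candidates, .. } => Some(candidates),
            _ => None,
        })
        .flatten()
        // Skip anything the caller has already fed to ds2sf.
        .find(|c| !tried.contains(*c))
}
```

Modelling the hints as an enum lets the compiler enforce that each action carries exactly the fields its handler needs.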
This means an upstream agent like chip-fetcher can call ds2sf, walk hints[].candidates until it gets status: ok, and accept the extraction without human input.
let mut tried = HashSet::new();
let mut r = run_ds2sf(chip_dir, None);
tried.insert(r.datasheet_used.clone());
while r.status == "needs_better_pdf" {
let next = r.hints.iter()
.filter_map(|h| if let Hint::TryAlternatePdf{candidates, ..} = h { Some(candidates) } else { None })
.flatten().find(|c| !tried.contains(*c));
let Some(next) = next else { break };
tried.insert(next.clone());
r = run_ds2sf(chip_dir, Some(next));
}
Full reason / action enums and the calling recipe are in SKILL.md.
CLI
ds2sf extract <CHIP-DIR> # read PDF, emit symbol + footprint + provenance + result JSONs
ds2sf svg <CHIP-DIR> # regenerate <MPN>-footprint.authoritative.svg from cached
# footprint JSON; back-stamps outputs.footprint_svg onto
# an existing <MPN>-extraction.result.json so chipsmith
# & other downstream agents can find it via the contract
ds2sf normalize <PACKAGE> # resolve a package string against service-kicad stdlib (debug)
ds2sf inspect <CHIP-DIR> # print summary of an existing extraction
ds2sf health # probe `claude` CLI + service-kicad reachability
ds2sf install # drop SKILL.md to ~/.claude/skills/ds2sf/
extract flags: --pdf, --mpn, --manufacturer, --datasheet-url, --source-tier, --out-dir, --model (default claude-opus-4-7), --force, --symbol-only, --footprint-only, --json-only.
Validated chips (initial release)
| MPN | Package | Stdlib | Pins | Note |
|---|---|---|---|---|
| ADS1115IDGSR | VSSOP-10 | ✅ Package_SO:MSOP-10_3x3mm_P0.5mm | 10 | clean |
| DRV8316RRGFR | VQFN-40 (7×5) | ❌ custom | 40 | caught chip-fetcher's info.json lying about package size |
| VL53L8CX | OLGA-16 | ❌ custom | 17 | all 5 muxed pins flagged, thermal pad called out |
| ESP32-S3-WROOM-1-N4 | SMD module | ❌ custom | 41 | castellated pads, boot-strapping pins, PSRAM variant conflicts |
Failure path also verified: supplying the SoC datasheet for a module product correctly returned status: needs_better_pdf with five ranked candidate PDFs in hints[].
Build
cargo build --release
sudo cp target/release/ds2sf /usr/local/bin/
ds2sf install
Or just ./build.sh.
Cost & latency (Haiku 4.5 baseline)
- ~$0.20–0.30 per chip end-to-end
- ~2–3 min wall time per chip
- Anthropic prompt cache keeps the PDF warm across the symbol→footprint calls
name: ds2sf
description: Datasheet → symbol + footprint + provenance JSON. Reads a chip's datasheet PDF and emits sym_create-shaped symbol JSON, footprint metadata, a field-level provenance audit trail, and an agent-to-agent result.json with stable exit codes for programmatic re-search-and-retry. Trigger words — ds2sf, datasheet to symbol, datasheet to footprint, extract symbol from datasheet, extract footprint from datasheet, datasheet pinout extract, sym extracted json, fp extracted json, datasheet sym fp, chip pin grouping, pin provenance, footprint provenance, sym_create input from datasheet, ds2sf result.json, datasheet extractor agent contract.
ds2sf — datasheet → symbol + footprint + provenance
Pipeline: read a chip-fetcher chip directory, send the datasheet PDF to Claude in two focused passes (symbol → footprint), normalize the package against the service-kicad standard library, and emit four JSON files alongside the existing chip-fetcher artifacts.
⚠ Always Opus, never Haiku
Production runs MUST use the default --model claude-opus-4-7. EEs rely on this output to wire schematics; Haiku's chip-data misreads are not acceptable in that pipeline. Concur (the cross-source consensus check) does catch most Haiku errors via 4-source consensus, but the right answer is to not produce them in the first place. Override the model only for development experiments where a follow-up Opus pass is acceptable.
When to use
- After chip-fetcher lands a chip into ~/project/chip-fetcher/library/<MPN>/.
- When you want an Adom-grouped symbol (POWER / I2C / GPIO / etc.); the UL bundle's .kicad_sym is flat and ungrouped.
- When you need a downstream-ready sym_create input JSON.
- When you need page-level provenance to deeplink the Adom user back to the datasheet PDF page each fact came from.
Module products: pass --pdf explicitly
For module products (ESP32-WROOM, NORA, ATWINC, SARA-R5, etc.), chip-fetcher's auto-named <MPN>.pdf may be the internal SoC datasheet, which has the SoC's pinout rather than the module's external castellated-pad pinout. In that case the extractor refuses to guess: it writes a result.json with status: "needs_better_pdf" and a list of candidate PDFs already on disk.
Look for the module-specific PDF in the chip-dir (<MPN>-original.pdf, <MPN>-datasheethardware-guideline.pdf, etc.) and pass it via --pdf:
ds2sf extract /home/adom/project/chip-fetcher/library/ESP32-S3-WROOM-1-N4 \
--pdf /home/adom/project/chip-fetcher/library/ESP32-S3-WROOM-1-N4/ESP32-S3-WROOM-1-N4-original.pdf
Commands
ds2sf extract <CHIP-DIR>
Run the full pipeline against a chip directory.
| Flag | Description |
|---|---|
| --pdf <path> | Override auto-detected datasheet (default <dir>/<basename>.pdf). |
| --mpn <id> | Override the MPN. |
| --manufacturer <name> | Override the manufacturer. |
| --datasheet-url <url> | Override the datasheet URL. |
| --source-tier <mfr\|snapmagic\|mouser\|digikey\|arrow\|cse\|lcsc\|unknown> | Override the upstream provenance tier. |
| --out-dir <path> | Output directory (default: chip-dir). |
| --model <id> | Claude model (default claude-opus-4-7). Always use Opus for production EE work. Haiku produces frequent misreads on chip data: pin-order swaps (DMG2305 GDS↔GSD), pad undercounts (ICE40UP5K saw 24 of 48), package-variant confusion (WSL2010 → WSL2512), and JTAG-letter typos (MTDO → MTD0). Concur catches these via cross-source consensus, but the right answer is to never let them happen: use Opus. Only override to a cheaper model for development testing where a follow-up Opus run is acceptable. |
| --force | Bypass cache, re-run Claude. |
| --symbol-only | Skip footprint pass. |
| --footprint-only | Re-run only footprint pass against cached symbol. |
| --json-only | Suppress progress; only OK: / ERROR: lines. |
Example:
ds2sf extract /home/adom/project/chip-fetcher/library/ADS1115IDGSR
ds2sf normalize <PACKAGE>
Resolve a package string against service-kicad's standard library (debug aid).
ds2sf normalize "VSSOP-10"
# OK: VSSOP-10 → Package_SO:MSOP-10_3x3mm_P0.5mm (service-kicad stdlib)
ds2sf inspect <CHIP-DIR>
Print a summary of an existing extraction without re-running anything.
ds2sf health
Probe claude CLI + service-kicad reachability.
ds2sf install
Drop this SKILL.md into ~/.claude/skills/ds2sf/.
Outputs
<chip-dir>/
<MPN>-symbol.extracted.json # sym_create-shaped (only on status=ok)
<MPN>-footprint.extracted.json # fp-metadata-shaped (only on status=ok)
<MPN>-extraction.provenance.json # audit trail (only on status=ok)
<MPN>-extraction.result.json # agent-to-agent contract (always written)
.ds2sf-cache/ # raw Claude responses for re-emit / debug
The provenance JSON is the audit trail. Open the file, pick a pin, follow pin_provenance[i].page → open that page in the datasheet PDF → confirm the pin name + description are visibly there. Same for footprint dimensions via dimension_provenance[i].page + figure_label.
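The audit loop can be partly automated by turning each provenance entry into a link that opens the datasheet at the cited page. The sketch below assumes the `page` field is 1-based and that the PDF viewer honors the common `#page=N` URL fragment; `PinProvenance`, `page_link`, and `audit_line` are hypothetical names, and the struct carries only the fields used here.

```rust
/// One pin_provenance entry, reduced to the fields used in this sketch.
struct PinProvenance {
    pin_number: String,
    pin_name: String,
    page: u32,
    table_caption: String,
}

/// Builds a link that opens the datasheet at the cited page.
/// Assumption: the viewer supports the `#page=N` fragment.
fn page_link(datasheet_url: &str, p: &PinProvenance) -> String {
    format!("{}#page={}", datasheet_url, p.page)
}

/// Formats one human-readable audit line for a reviewer.
fn audit_line(datasheet_url: &str, p: &PinProvenance) -> String {
    format!(
        "pin {} ({}): {} -> {}",
        p.pin_number,
        p.pin_name,
        p.table_caption,
        page_link(datasheet_url, p)
    )
}
```

The same pattern would apply to dimension_provenance entries, substituting figure_label for table_caption.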
Agent-to-agent contract
ds2sf is built so an upstream agent (chip-fetcher, automation) can call it programmatically, read the result, and decide what to do next without human input.
Exit codes
| Exit | status | Meaning | Caller action |
|---|---|---|---|
| 0 | ok | Extraction completed. review_required[] may still have entries — they're informational, not blocking. | Accept the outputs. |
| 2 | needs_better_pdf | Extractor refused because the supplied PDF doesn't have the right data (typically: SoC datasheet was passed for a module, or wrong package variant). | Try one of hints[].candidates and re-call with --pdf <new>. |
| 3 | unrecoverable | Hard failure (Claude returned malformed output twice; pre-flight failure; emit error). | Surface to a human; inspect .ds2sf-cache/. |
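A caller that prefers to branch on the process exit code rather than parse result.json could mirror the table above with a small enum. This is a sketch; the `Status` type and its method names are illustrative and not part of ds2sf.

```rust
/// The three documented outcomes of a ds2sf invocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    NeedsBetterPdf,
    Unrecoverable,
}

impl Status {
    /// Maps a process exit code to a status; any undocumented code
    /// yields None so the caller can treat it as an unexpected failure.
    fn from_exit_code(code: i32) -> Option<Status> {
        match code {
            0 => Some(Status::Ok),
            2 => Some(Status::NeedsBetterPdf),
            3 => Some(Status::Unrecoverable),
            _ => None,
        }
    }

    /// The string used in result.json's `status` field.
    fn as_str(self) -> &'static str {
        match self {
            Status::Ok => "ok",
            Status::NeedsBetterPdf => "needs_better_pdf",
            Status::Unrecoverable => "unrecoverable",
        }
    }
}
```

Returning `None` for unknown codes, such as a crash or a signal, keeps them from being silently mistaken for one of the three contract states.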
<MPN>-extraction.result.json shape
Always written. Stable schema (schema_version: "1"):
{
"schema_version": "1",
"status": "ok | needs_better_pdf | unrecoverable",
"exit_code": 0 | 2 | 3,
"mpn": "ADS1115IDGSR",
"manufacturer": "TI",
"datasheet_used": "/home/.../ADS1115IDGSR.pdf",
"datasheet_sha256": "65231e81…",
"summary": "ADS1115IDGSR — 10 pins, package VSSOP-10 …",
"review_required": [ … merged from symbol + footprint passes … ],
"hints": [
{ "action": "try_alternate_pdf",
"reason": "module_external_pinout_missing",
"candidates": ["/path/to/X-original.pdf", "/path/to/X-hardware-guidelines.pdf"] },
{ "action": "update_upstream_metadata",
"file": "/path/to/info.json",
"key": "package",
"current": "VQFN-32 (5×5×0.9 mm)",
"extracted": "VQFN-40 (7×5×0.9 mm)" },
{ "action": "use_datasheet_extracted_footprint",
"package": "Optical LGA-16" },
{ "action": "human_review",
"reason": "muxed_pin_protocol_select",
"notes": "…" }
],
"outputs": {
"symbol_json": "/path/to/X-symbol.extracted.json | null",
"footprint_json": "/path/to/X-footprint.extracted.json | null",
"provenance_json": "/path/to/X-extraction.provenance.json | null"
},
"extractor_version": "0.1.0",
"claude_model": "claude-opus-4-7"
}
Canonical review_required.reason values
| Reason | When | Caller action |
|---|---|---|
| module_external_pinout_missing | Empty pins[] returned because the supplied PDF was a SoC datasheet for a module product. | hints[].try_alternate_pdf → re-call with one of the candidate PDFs. |
| package_mismatch_in_hint | info.json's package disagrees with the datasheet's actual package. | hints[].update_upstream_metadata → patch info.json's package to match extracted, then accept. |
| non_standard_package_no_kicad_baseline | The package isn't in service-kicad's stdlib (custom OLGA, modules, optical). | hints[].use_datasheet_extracted_footprint → use <MPN>-footprint.extracted.json's bodyDimensions + leadDimensions directly. |
| muxed_pin_protocol_select | Some pins (e.g. SDA/MOSI) are protocol-multiplexed. | Informational. Pin descriptions in provenance explain how. |
| pad_dimensions_from_example_layout | Footprint dims came from an "Example Board Layout" rather than a formal recommended land pattern. | Informational; recommend cross-check vs IPC-7351. |
| boot_strapping_pins, vdd_spi_voltage_dependency, psram_variant_conflict, non_uniform_pad_pitch, power_supply_configuration, thermal_management, i2c_address_multidevice | Engineering nuances Claude flagged for human review. | Surface to schematic designer. |
| unrecoverable_extraction_error | Claude returned malformed JSON twice in a row. | Surface to a human; inspect raw responses in .ds2sf-cache/. |
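The table above amounts to a dispatch on the reason string, which a caller might encode as below. The `ReasonAction` enum and `action_for_reason` are hypothetical; routing unknown reasons to the schematic designer is an assumption of this sketch, not documented ds2sf behavior.

```rust
/// What the caller should do for a given review_required reason.
#[derive(Debug, PartialEq, Eq)]
enum ReasonAction {
    RetryWithAlternatePdf,
    PatchUpstreamMetadata,
    UseExtractedFootprint,
    Informational,
    SurfaceToDesigner,
    EscalateToHuman,
}

/// Maps a canonical reason string to a caller action.
fn action_for_reason(reason: &str) -> ReasonAction {
    match reason {
        "module_external_pinout_missing" => ReasonAction::RetryWithAlternatePdf,
        "package_mismatch_in_hint" => ReasonAction::PatchUpstreamMetadata,
        "non_standard_package_no_kicad_baseline" => ReasonAction::UseExtractedFootprint,
        "muxed_pin_protocol_select" | "pad_dimensions_from_example_layout" => {
            ReasonAction::Informational
        }
        "unrecoverable_extraction_error" => ReasonAction::EscalateToHuman,
        // Covers the listed engineering nuances (boot_strapping_pins, etc.)
        // and, by assumption, any reason this sketch does not recognize.
        _ => ReasonAction::SurfaceToDesigner,
    }
}
```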
Agent calling recipe
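// (Pseudo-Rust for chip-fetcher's caller side. ResultDoc, Hint, basename, accept,
// patch_json, and log_for_review are assumed to be defined by the caller.)
use std::{collections::HashSet, fs, path::{Path, PathBuf}, process::Command};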
fn run_ds2sf(chip_dir: &Path, pdf_override: Option<&Path>) -> ResultDoc {
let mut cmd = Command::new("ds2sf");
cmd.arg("extract").arg(chip_dir).arg("--force").arg("--json-only");
if let Some(p) = pdf_override { cmd.arg("--pdf").arg(p); }
let _ = cmd.status(); // exit code mirrored in result.json
let path = chip_dir.join(format!("{}-extraction.result.json", basename(chip_dir)));
serde_json::from_str(&fs::read_to_string(path).unwrap()).unwrap()
}
let mut tried: HashSet<PathBuf> = HashSet::new();
let mut r = run_ds2sf(chip_dir, None);
tried.insert(r.datasheet_used.clone());
while r.status == "needs_better_pdf" {
// Pull the first try_alternate_pdf hint we haven't already tried.
let next = r.hints.iter()
.filter_map(|h| if let Hint::TryAlternatePdf { candidates, .. } = h { Some(candidates) } else { None })
.flatten()
.find(|c| !tried.contains(*c));
let Some(next) = next else {
// Out of candidates — escalate to human.
log_for_review(&r); break;
};
tried.insert(next.clone());
r = run_ds2sf(chip_dir, Some(next));
}
match r.status.as_str() {
"ok" => {
accept(r.outputs);
// Apply update_upstream_metadata hints (e.g. fix info.json's package).
for h in &r.hints {
if let Hint::UpdateUpstreamMetadata { file, key, extracted, .. } = h {
patch_json(file, key, extracted);
}
}
}
"unrecoverable" => log_for_review(&r),
// Still needs_better_pdf: candidates ran out and the loop above already logged it.
_ => {}
}
Caching behavior
If <MPN>-extraction.provenance.json already exists and --force is not passed, ds2sf skips the Claude calls entirely and exits with the cached result.json's exit code. Re-running is idempotent.
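The cache-hit rule reduces to a single predicate, sketched here under the assumption that the provenance file sits directly in the chip directory. `provenance_path` and `is_cache_hit` are hypothetical helpers, not ds2sf functions.

```rust
use std::path::{Path, PathBuf};

/// Path of the provenance file whose presence signals a cache hit.
fn provenance_path(chip_dir: &Path, mpn: &str) -> PathBuf {
    chip_dir.join(format!("{}-extraction.provenance.json", mpn))
}

/// True when the Claude calls can be skipped: the provenance file
/// already exists and the caller did not pass --force.
fn is_cache_hit(chip_dir: &Path, mpn: &str, force: bool) -> bool {
    !force && provenance_path(chip_dir, mpn).exists()
}
```

Because `--force` short-circuits the check, a forced run never touches the filesystem to decide; it always re-runs both passes.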