pyconveyor¶
Deterministic YAML pipeline engine for structured LLM extraction.
pyconveyor lets you describe an extraction workflow in YAML, write prompts in Jinja2, and define step logic in plain Python. The runner handles model calls, retries, schema validation, parallel execution, and structured result summaries.
Get started in 60 seconds¶
# 1. Bootstrap a project (interactive — no Python files needed)
pyconveyor init my_pipeline/ --interactive
cd my_pipeline/
# 2. Set your API key
export OPENAI_API_KEY=sk-...
# 3. Run
pyconveyor run pipeline.yaml --input '{"paper": "Smith et al. demonstrate that..."}'
Or go deeper with the Quickstart guide →
Key features¶
| Feature | What it means |
|---|---|
| YAML-first | The whole pipeline — models, steps, schemas, prompts — lives in one YAML file |
| CLI-first | pyconveyor init, run, batch, benchmark — no Python needed to get started |
| OpenAI-compat-first | Works with Ollama, vLLM, LM Studio, and any hosted endpoint |
| Self-correcting retries | Schema and parse errors are fed back to the model so it can fix itself |
| Vocabularies on fields | Declare controlled vocabularies on schema fields; automatic fuzzy matching and suggestion capture |
| Benchmarking built in | Compare pipeline versions against golden-standard cases; get per-step accuracy |
| HTML/PDF reports | One command produces a shareable report with tables, graphs, and charts |
| Extraction-focused | Optimised for classification, annotation, and structured record extraction |
| Explicit DAG | Every step, dependency, and control flow branch is visible in one YAML file |
| Comprehensible in one sitting | The entire runner is one file; the YAML format has a one-page reference |
Navigation¶
Getting started
- Quickstart — up and running in 5 minutes
- Concepts — how pipelines, steps, and context fit together
Guides
- Step Types —
llm,ensemble,transform,validate,io,parallel,condition - YAML Schema — inline schemas, field descriptions, validators, nested objects
- Validation Feedback — self-correcting retry loops
- Batch Processing — process thousands of documents in parallel
- Benchmarking — measure accuracy, compare pipelines, generate reports
- Vocabulary Fields — constrained extraction with fuzzy matching
- Response Caching — speed up development with cached LLM responses
- Providers — OpenAI, Anthropic, Ollama, custom providers
- Hooks — callbacks for observability and side effects
Reference
- YAML Schema — every field, type, and default
- CLI Reference —
init,run,batch,validate,schema,benchmark,visualise,vocab review