# API Reference

## LangStruct API

The `LangStruct` class is the primary entry point for building extractors, parsing natural language queries, and exporting results. All examples below assume:
```python
from langstruct import LangStruct

ls = LangStruct(example={
    "company": "Apple",
    "revenue": 125.3,
    "quarter": "Q3 2024",
})
```
## Initialization

```python
LangStruct(
    schema: Optional[Type[Schema]] = None,
    model: Optional[Union[str, dspy.LM]] = None,
    optimizer: str = "miprov2",
    chunking_config: Optional[ChunkingConfig] = None,
    use_sources: bool = True,
    example: Optional[Dict[str, Any]] = None,
    examples: Optional[List[Dict[str, Any]]] = None,
    schema_name: str = "GeneratedSchema",
    descriptions: Optional[Dict[str, str]] = None,
    refine: Union[bool, Refine, Dict[str, Any]] = False,
    **llm_kwargs,
)
```

- Provide a Pydantic `schema`, or pass one or more example dicts for automatic schema generation (both styles are sketched below).
- Pass `model="gpt-4o-mini"`, `model=dspy.LM(...)`, or omit the argument to auto-detect a model from configured API keys.
- Set `refine=True` or pass a `Refine` config to boost accuracy with additional model calls.
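As a minimal sketch of the two initialization styles (assuming `Schema` is exported by `langstruct`; the `Earnings` class and its fields are our own illustration):

```python
from langstruct import LangStruct, Schema  # assumption: Schema is exported by langstruct

# Style 1: schema generated automatically from an example dict.
ls = LangStruct(example={"company": "Apple", "revenue": 125.3, "quarter": "Q3 2024"})

# Style 2: an explicit Pydantic-style schema, with refinement enabled
# at the cost of additional model calls.
class Earnings(Schema):
    company: str
    revenue: float
    quarter: str

ls = LangStruct(schema=Earnings, model="gpt-4o-mini", refine=True)
```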
## Extraction

### extract
```python
result = ls.extract(
    text_or_texts,
    confidence_threshold: float = 0.0,
    validate: bool = True,
    debug: bool = False,
    return_sources: Optional[bool] = None,
    max_workers: Optional[int] = None,
    show_progress: bool = False,
    rate_limit: Optional[int] = None,
    retry_failed: bool = True,
    refine: Union[bool, Refine, Dict[str, Any], None] = None,
    **kwargs,
)
```

- Accepts either a single string or a list of strings; lists are parallelized automatically.
- `validate=True` runs LangStruct's validator; combine with `debug=True` to surface suggestions.
- Override `return_sources` to force-enable or disable character-level grounding per call.
- Use `refine=True` or a custom dict (e.g., `{"strategy": "bon", "n_candidates": 5}`).
- Returns an `ExtractionResult` for single inputs or a `List[ExtractionResult]` for batches, as in the sketch below.
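A typical call, reusing the `ls` instance from above (the `entities`, `confidence`, and `sources` accessors on `ExtractionResult` are assumptions for illustration):

```python
# Single document -> one ExtractionResult.
result = ls.extract(
    "Apple reported Q3 2024 revenue of $125.3 billion.",
    return_sources=True,
)
print(result.entities)    # assumed accessor: the extracted field values
print(result.confidence)  # assumed accessor: overall confidence score
print(result.sources)     # assumed accessor: character-level grounding spans

# List input -> List[ExtractionResult], parallelized automatically.
docs = ["Report A ...", "Report B ..."]
results = ls.extract(docs, max_workers=8, show_progress=True)
```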
### extract_batch

An explicit parallel API for when you want access to failure details:
```python
results = ls.extract_batch(
    texts,
    max_workers: int = 10,
    show_progress: bool = True,
    rate_limit: Optional[int] = None,
    return_failures: bool = False,
)
```

- Set `return_failures=True` to receive a `ProcessingResult` with `successful`/`failed` collections, as shown below.
- Honors the same validation/refinement arguments as `extract`.
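A sketch of the failure-handling flow, using the `ProcessingResult` accessors listed under Related Utilities below:

```python
docs = ["Report A ...", "Report B ...", "Report C ..."]
batch = ls.extract_batch(docs, return_failures=True)

print(batch.success_rate)  # fraction of documents that extracted cleanly
for result in batch.successful:
    ...  # each is an ExtractionResult

batch.raise_if_failed()  # or raise if any document failed
```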
## Query Parsing

```python
parsed = ls.query(
    query_or_queries,
    explain: bool = True,
    max_workers: Optional[int] = None,
    show_progress: bool = False,
    rate_limit: Optional[int] = None,
    retry_failed: bool = True,
)
```

- Converts natural language RAG queries into `ParsedQuery` objects containing `semantic_terms`, `structured_filters`, `confidence`, and optional explanations (see the example below).
- Accepts strings or lists; lists parallelize just like `extract`.
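For example (the parsed terms and filters shown in the comments are illustrative, not guaranteed outputs):

```python
parsed = ls.query("Q3 2024 reports where Apple revenue exceeded $100B")
print(parsed.semantic_terms)      # e.g. ["reports", "revenue"]
print(parsed.structured_filters)  # e.g. {"company": "Apple", "quarter": "Q3 2024"}
print(parsed.confidence)
```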
### query_batch

Same spirit as `extract_batch`, returning either parsed queries or a `ProcessingResult` when `return_failures=True`.

```python
results = ls.query_batch(
    queries,
    max_workers: int = 10,
    show_progress: bool = True,
    rate_limit: Optional[int] = None,
    return_failures: bool = False,
)
```
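Usage mirrors `extract_batch`:

```python
queries = ["Apple Q3 2024 revenue", "Microsoft cloud growth"]
outcome = ls.query_batch(queries, return_failures=True)
for parsed in outcome.successful:
    ...  # each is a ParsedQuery
outcome.raise_if_failed()
```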
## Optimization & Evaluation

### optimize
```python
ls.optimize(
    texts: List[str],
    expected_results: Optional[List[Dict]] = None,
    validation_split: float = 0.2,
)
```

- Initializes a DSPy optimizer (MIPROv2 by default, or GEPA if `optimizer="gepa"`).
- Provide `expected_results` for supervised optimization; otherwise LangStruct uses metric-free improvements.
- Returns the same `LangStruct` instance for chaining, as sketched below.
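Because `optimize` returns the instance, it chains naturally with `save` (a sketch; `train_texts` and `gold_labels` stand in for your own data):

```python
train_texts = ["Apple reported Q3 2024 revenue of $125.3B."]
gold_labels = [{"company": "Apple", "revenue": 125.3, "quarter": "Q3 2024"}]

# optimize returns the same instance, so it chains with save.
ls.optimize(
    texts=train_texts,
    expected_results=gold_labels,
    validation_split=0.2,
).save("./tuned_extractor")
```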
### evaluate

```python
scores = ls.evaluate(
    texts: List[str],
    expected_results: List[Dict],
    metrics: Optional[List[str]] = None,
)
```

- Computes accuracy/F1 by default, with optional `precision` and `recall`.
- Uses the current extractor pipeline, so run `optimize` beforehand if you want tuned prompts.
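A sketch of a held-out evaluation (the exact metric name strings are assumptions based on the note above):

```python
test_texts = ["Microsoft reported Q1 2025 revenue of $65.6B."]
test_labels = [{"company": "Microsoft", "revenue": 65.6, "quarter": "Q1 2025"}]

scores = ls.evaluate(
    texts=test_texts,
    expected_results=test_labels,
    metrics=["precision", "recall"],  # assumed metric names
)
print(scores)
```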
## Exporting & Visualization

### export_batch

```python
ls.export_batch(
    results: List[ExtractionResult],
    file_path: str,
    format: str = "csv",
    include_metadata: bool = False,
    include_sources: bool = False,
    **kwargs,
)
```

- Supports `csv`, `json`, `excel`, and `parquet` outputs.
- Set `include_sources=True` to embed grounding spans beside values, as in the example below.
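For example, to write a Parquet file with grounding spans included (the file name and format pairing are our choice):

```python
results = ls.extract(["Report A ...", "Report B ..."])
ls.export_batch(
    results,
    "extractions.parquet",
    format="parquet",
    include_sources=True,  # embed grounding spans beside values
)
```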
### save_annotated_documents / load_annotated_documents

- Persist extractions to JSONL with `save_annotated_documents(results, "extractions.jsonl")`.
- Rehydrate them via `load_annotated_documents(path)` to continue processing or visualize later, as in the round trip below.
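The round trip is symmetric (reusing `results` from the export example above):

```python
ls.save_annotated_documents(results, "extractions.jsonl")

# Later, possibly in another session:
restored = ls.load_annotated_documents("extractions.jsonl")
```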
### visualize

```python
html = ls.visualize(results_or_jsonl, file_path: Optional[str] = None, **kwargs)
```

- Generates the interactive HTML viewer used across LangStruct demos.
- Pass a file path to save the visualization, or omit it to receive the HTML string.
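Both calling styles in one sketch (reusing `results` and the JSONL file from the examples above):

```python
# Save an interactive HTML report to disk...
ls.visualize(results, file_path="report.html")

# ...or get the HTML string back, here straight from a JSONL file.
html = ls.visualize("extractions.jsonl")
```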
## Saving & Loading Extractors

- `ls.save("./my_extractor")` writes the schema, pipeline state, optimizer config, and refinement options to disk.
- `LangStruct.load(path)` reconstructs the extractor (API keys must still be set in the environment).
## Schema Introspection

- Access `ls.schema_info` to retrieve field descriptions, the JSON Schema, and an example payload structure. Useful for generating docs or debugging auto-generated schemas.
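A quick way to inspect an auto-generated schema (the shape of `schema_info` beyond what is described above is not assumed here):

```python
info = ls.schema_info
print(info)  # field descriptions, JSON Schema, and an example payload;
             # the exact layout is not guaranteed by this sketch
```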
## Related Utilities

- `ls.save_annotated_documents`, `ls.load_annotated_documents`, and `ls.visualize` share formats with LangExtract, which makes them handy for migrating annotated corpora.
- `ProcessingResult` objects (returned when `return_failures=True`) expose `successful`, `failed`, `success_rate`, and helper methods such as `raise_if_failed()`.