LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows
Updated Jun 11, 2026 - Python
Turn Chaos Into Structure. A Type-Safe AI Agent that extracts valid JSON from unstructured data using PydanticAI, FastHTML, and Gemini 2.5.
Universal prompt library for structured outputs & ready-to-use content. Teachers get lesson plans, developers get reliable JSON/CSV. Works across GPT-4, Claude, Gemini.
n8n workflow templates extractor
Fine-tune Qwen3-0.6B for resume parsing using LoRA
Structured JSON extraction from LLMs with validation, repair, and streaming.
AI-powered structured web scraper that visually builds JSON schemas and uses Gemini 2.5 Flash & Playwright to extract clean, validated JSON with >80% DOM noise reduction.
Google Shopping Ads scraper and e-commerce product data extraction API. Extract live paid listings, merchant domains, pricing, and discount stickers from Google Shopping with this Apify Actor. Free tier available.
HIJOBS scraper and Scotland Highlands & Islands regional job data extraction. Extract salaries, locations, apply emails, and full descriptions from hijobs.net without Cloudflare blocking. Free tier available.
Japan Company scraper and B2B corporate data extraction API. Extract 4.5M+ Japanese corporate registries from METI gBizINFO. Get capital, headcounts, addresses, representatives, certifications, subsidies, and procurement data. Free tier available.
Apna.co scraper and Indian blue-collar job data extraction API. Extract job listings, salary ranges, recruiter WhatsApp and call preferences, company addresses, and coordinate mappings from apna.co with this Apify Actor. Free tier available.
Production-style fine-tuning project for schema-constrained JSON extraction using QLoRA + DPO, with reproducible evals, training curves, and vLLM benchmarks.
arXiv scraper and research paper API. Extract titles, authors, abstracts, PDFs from arXiv with this Apify actor. Free tier available.
Welcome to the Jungle scraper and job data extraction API. Extract job listings, salary, and company data from WTTJ with this Apify actor. Free tier available.
Mercari Japan scraper and product data extraction API. Extract listings, prices from Mercari Japan marketplace with this Apify actor. Free tier available.
Fine-tuned Qwen2.5-7B on Fireworks AI for structured JSON extraction from job postings. LoRA SFT + DPO | FastAPI | +47% over baseline.
CourtListener scraper and legal data extraction API. Extract court opinions, cases from CourtListener with this Apify actor. Free tier available.
Document ingestion and chunking agent that extracts and validates typed JSON against a strict schema.
A JSON analysis tool
Professional-grade AI logistics pipeline built with Java 17 and Spring Boot 3. Converts unstructured documents into validated JSON via non-blocking WebClient adapters (Groq/Llama 3.1). Features real-time dashboard, Slack/Email notifications, and secure error handling.