There are two ways to get extraction results from a document processing API. The first is polling: you submit a document, receive an ID, then periodically call a status endpoint until the result is ready. The second is webhooks: you submit a document, and the API calls your server when the result is ready — you don't ask, you get told.
For anything beyond simple synchronous requests, webhooks are the better architecture. Here's how to use them.
Polling vs Webhooks
Polling:
Your server → POST /extract → receives document_id
Your server → GET /documents/{id} → status: "processing"
Your server → GET /documents/{id} → status: "processing"
Your server → GET /documents/{id} → status: "completed" + result
Problems: unnecessary requests, added latency, and wasted compute. The polling interval also forces a trade-off: poll often and you waste requests, poll rarely and results arrive late.
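For comparison, here is roughly what the polling side looks like. This is a sketch under assumptions: `get_status` is a hypothetical callable that wraps `GET /documents/{id}` and returns the parsed JSON, and any status other than "processing" is treated as final. The request counter makes the cost of waiting visible.

```python
import time
from typing import Callable, Dict, Tuple


def poll_until_done(
    get_status: Callable[[str], Dict],
    document_id: str,
    interval: float = 2.0,
    timeout: float = 120.0,
) -> Tuple[Dict, int]:
    """Call get_status until the document is no longer "processing".

    Returns the final status payload and how many requests it took.
    """
    deadline = time.monotonic() + timeout
    requests_made = 0
    while True:
        status = get_status(document_id)  # one GET /documents/{id}
        requests_made += 1
        if status.get("status") != "processing":
            return status, requests_made
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{document_id} still processing after {timeout}s")
        time.sleep(interval)  # the responsiveness vs. efficiency knob


# Usage with a fake status endpoint that finishes on the third call.
fake_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "result": {"total": "3850.00"}},
])
result, n = poll_until_done(lambda _id: next(fake_responses), "doc_clx7abc123", interval=0.01)
print(n, result["status"])  # 3 completed
```

Every "processing" response is a request that did no useful work, and halving `interval` roughly doubles that count for the same wait.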
Webhooks:
Your server → POST /extract → receives document_id
[processing happens]
DocuParseAPI → POST your-server.com/webhooks/docuparse → result delivered
Your server does nothing until the result arrives. No polling loop. No wasted requests.
Webhooks are particularly valuable for:
- Processing high volumes of documents in background queues
- Mobile or web applications where the user doesn't stay on the page
- Event-driven pipelines: extract → validate → write to ERP
- Batch operations where many documents are submitted at once
The Webhook Payload
document.completed
Fired when a document finishes processing successfully:
{
"event": "document.completed",
"document_id": "doc_clx7abc123",
"timestamp": "2026-05-19T14:32:00Z",
"data": {
"document_type": "invoice",
"merchant": "Riverside Consulting LLC",
"invoice_id": "INV-2026-0091",
"date": "2026-05-01",
"due_date": "2026-05-31",
"currency": "USD",
"subtotal": "3500.00",
"tax": "350.00",
"total": "3850.00",
"payment_method": null,
"line_items": [
{
"description": "Strategy Consulting — May",
"quantity": 35,
"unit_price": "100.00",
"total": "3500.00"
}
]
}
}
document.failed
Fired when extraction was attempted but could not produce a result:
{
"event": "document.failed",
"document_id": "doc_clx7abc456",
"timestamp": "2026-05-19T14:32:05Z",
"error": {
"code": "EXTRACTION_FAILED",
"message": "Document extraction failed for this file."
}
}
batch.completed
Fired after all documents in a batch upload have finished processing:
{
"event": "batch.completed",
"batch_id": "batch_abc789",
"timestamp": "2026-05-19T14:33:00Z",
"summary": {
"total": 12,
"succeeded": 11,
"failed": 1
}
}
Signature Verification
DocuParseAPI signs each webhook delivery with an HMAC-SHA256 signature. The signature is in the X-DocuParse-Signature header as sha256=<hex_digest>.
Why signature verification matters: Without it, anyone who knows your webhook URL can send fake events to your server. A malicious request could trigger your invoice processing pipeline with fabricated data.
How to get your webhook secret:
- Go to Dashboard → Settings → Webhooks
- Add your endpoint URL
- Copy the generated secret and store it as DOCUPARSE_WEBHOOK_SECRET in your environment variables
The verification logic:
import hmac
import hashlib

def is_valid_signature(payload_bytes: bytes, header: str, secret: str) -> bool:
    # Recompute the signature over the raw request body using the shared secret
    expected = "sha256=" + hmac.new(
        secret.encode(), payload_bytes, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison against the X-DocuParse-Signature header value
    return hmac.compare_digest(expected, header)
Use hmac.compare_digest (Python) or crypto.timingSafeEqual (Node.js), not ==, to prevent timing attacks.
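The same HMAC computation, run on the sending side, is handy for producing a valid header when you fire test requests at your own endpoint. A minimal sketch: `sign_payload` and `sign_file` are hypothetical helper names, and the secret is read from DOCUPARSE_WEBHOOK_SECRET by default. Because the digest covers the exact bytes, it is safest to keep the test body in a file, sign that file, and send the same file (for example with curl's `--data-binary @payload.json`); re-indenting or re-serializing the JSON changes the signature.

```python
import hashlib
import hmac
import os
from typing import Optional


def sign_payload(payload_bytes: bytes, secret: str) -> str:
    """Return an X-DocuParse-Signature value ("sha256=<hex>") for these exact bytes."""
    digest = hmac.new(secret.encode(), payload_bytes, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def sign_file(path: str, secret: Optional[str] = None) -> str:
    """Sign a request body stored on disk; the secret defaults to the environment variable."""
    if secret is None:
        secret = os.environ["DOCUPARSE_WEBHOOK_SECRET"]
    with open(path, "rb") as f:  # raw bytes: no decoding or reformatting
        return sign_payload(f.read(), secret)
```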
Testing Without a Live Server
Option 1 — webhook.site (fastest):
- Go to webhook.site
- Copy your unique URL
- Set it as your webhook endpoint in the DocuParseAPI dashboard
- Upload a test document
- Watch the payload arrive in real time in your browser
Option 2 — ngrok (for local development):
# Start your local server on port 3000
node server.js

# In another terminal, expose it publicly
ngrok http 3000
# Gives you: https://abc123.ngrok.io
# Use https://abc123.ngrok.io/webhooks/docuparse as your endpoint
Option 3 — simulate locally:
# Simulate a webhook delivery to your local server
curl -X POST http://localhost:3000/webhooks/docuparse \
-H "Content-Type: application/json" \
-H "X-DocuParse-Signature: sha256=YOUR_COMPUTED_SIGNATURE" \
-d '{
"event": "document.completed",
"document_id": "doc_test123",
"timestamp": "2026-05-19T14:00:00Z",
"data": {
"merchant": "Test Vendor",
"total": "100.00",
"currency": "USD",
"date": "2026-05-19"
}
}'
Common Mistakes
Doing slow work before responding: Database writes, external API calls, and email sends should not run inside the request before you send the 200 response; hand them off to background work instead. If your handler takes too long to respond (10 seconds, say), the delivery can time out and be retried, and you may end up processing the same event twice.
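One way to keep slow work out of the request path is to parse the event, put it on a queue, and acknowledge immediately while a background worker does the rest. The sketch below uses only the standard library. `handle_delivery` is a hypothetical function you would call from whatever web framework you use, passing the raw body and sending back the status it returns; `slow_work` stands in for your database, ERP, or email calls. An in-process `queue.Queue` loses pending jobs if the process dies, so a durable queue is the safer choice in production.

```python
import json
import queue
import threading
import time
from typing import Dict, List

# Events waiting for slow processing, shared by the handler and the worker.
jobs: "queue.Queue[Dict]" = queue.Queue()
processed: List[str] = []  # stand-in record of finished work


def slow_work(event: Dict) -> None:
    """Placeholder for database writes, ERP calls, emails, and so on."""
    time.sleep(0.1)
    processed.append(event["event"])


def worker() -> None:
    while True:
        event = jobs.get()
        try:
            slow_work(event)
        except Exception as exc:  # keep the worker alive if one job fails
            print(f"processing failed for {event.get('document_id')}: {exc!r}")
        finally:
            jobs.task_done()


def handle_delivery(raw_body: bytes) -> int:
    """Return the HTTP status to send back. Does no slow work itself."""
    # Verify the X-DocuParse-Signature header against raw_body here (is_valid_signature).
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400
    if not isinstance(event, dict):
        return 400
    jobs.put(event)  # hand off and return immediately
    return 200


threading.Thread(target=worker, daemon=True).start()
```

Note that a 200 only means "received": per the retry behavior described here, DocuParseAPI retries on non-200 responses and timeouts, so once you have acknowledged, failures in the worker are yours to log and retry.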
Not handling retries: If your server returns a non-200 status or times out, DocuParseAPI will retry the delivery. Your handlers should be idempotent — processing the same event twice should have the same result as processing it once. The document_id field is your deduplication key.
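A minimal in-memory sketch of that deduplication, keyed on document_id. Two assumptions: batch events, which carry batch_id rather than document_id, are keyed on batch_id; and the class name `DeliveryDeduplicator` is hypothetical. A real deployment would keep the keys somewhere that survives restarts and is shared between workers, such as a database table with a unique constraint.

```python
import threading
from typing import Callable, Dict, Set


class DeliveryDeduplicator:
    """Runs processing at most once per key, so retried deliveries become no-ops."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._lock = threading.Lock()

    def handle(self, event: Dict, process: Callable[[Dict], None]) -> bool:
        """Process the event unless its key was already handled. Returns True if it ran."""
        # document_id is the dedup key; batch events use batch_id (assumption).
        key = event.get("document_id") or event["batch_id"]
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)  # claim the key before processing
        try:
            process(event)
        except Exception:
            with self._lock:
                self._seen.discard(key)  # let a later retry try again
            raise
        return True
```

Releasing the key on failure is a deliberate choice: a delivery that crashed halfway should not block the retry that might succeed.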
Ignoring document.failed events: Every submission that fails extraction fires a document.failed event. If you only handle document.completed, failed documents disappear silently. Always handle both.
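Routing on the event field with an explicit table makes a missing type easy to spot. A sketch: the handler bodies are reduced to returned strings that stand in for real actions (storing results, flagging a document for review, reporting on a batch), and both the handler names and the choice to ignore unknown event types are assumptions, not documented DocuParseAPI behavior. Monetary fields arrive as strings, so the completed handler converts them with `Decimal` rather than `float` to keep them exact.

```python
from decimal import Decimal
from typing import Callable, Dict, Optional


def on_completed(event: Dict) -> str:
    data = event["data"]
    total = Decimal(data["total"])  # "3850.00" stays exact, no float rounding
    return f"store {event['document_id']}: {data.get('currency')} {total}"


def on_failed(event: Dict) -> str:
    error = event.get("error", {})
    return f"flag {event['document_id']} for review: {error.get('code')}"


def on_batch_completed(event: Dict) -> str:
    summary = event["summary"]
    return f"batch {event['batch_id']}: {summary['succeeded']}/{summary['total']} succeeded"


HANDLERS: Dict[str, Callable[[Dict], str]] = {
    "document.completed": on_completed,
    "document.failed": on_failed,
    "batch.completed": on_batch_completed,
}


def dispatch(event: Dict) -> Optional[str]:
    handler = HANDLERS.get(event.get("event", ""))
    if handler is None:
        return None  # unknown event type: acknowledge and ignore (assumption)
    return handler(event)
```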