When batch executions fail, diagnose and resolve issues using execution logs, error patterns, and retry strategies.
Identifying failures
After a batch completes:
- Go to Logs → Batch Jobs
- Open the batch execution
- Check the Failed count
- Click View Failures
You’ll see all failed items with error messages.
Common failure types
Missing required fields
Error: Missing required field: customerId
Cause: Input data missing expected fields.
Fix:
- Review CSV headers or JSON keys
- Ensure all required fields are present
- Add validation at the start of the flow to handle missing data
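As a rough illustration of the header and field check described above, here is a minimal standalone Python sketch that you could run on a CSV before uploading it. It is not part of the product: the function name and the `email` field are hypothetical, and only `customerId` comes from the error message above.

```python
import csv
import io

# Hypothetical list of required fields; replace with the fields your flow expects.
REQUIRED_FIELDS = ["customerId", "email"]

def find_missing_fields(csv_text, required=REQUIRED_FIELDS):
    """Return (missing_headers, rows_with_blanks) for a CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    # Required fields that do not appear in the header row at all.
    missing_headers = [f for f in required if f not in headers]
    rows_with_blanks = []
    # Data rows start at line 2 because line 1 holds the headers.
    for line_no, row in enumerate(reader, start=2):
        blanks = [
            f for f in required
            if f in headers and not (row.get(f) or "").strip()
        ]
        if blanks:
            rows_with_blanks.append((line_no, blanks))
    return missing_headers, rows_with_blanks

sample = "customerId,email\nC-1,a@example.com\n,b@example.com\n"
print(find_missing_fields(sample))
```

Running a check like this locally catches missing headers and blank values before any batch items are consumed.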
API rate limits
Error: Rate limit exceeded
Cause: Too many concurrent requests to an external API or AI model provider.
Fix:
- Reduce batch concurrency in settings
- Add delay steps between API calls
- Split batch into smaller chunks
- Contact the provider to increase rate limits
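If you call the rate-limited API from your own code, for example in a script that prepares or submits batches, the "smaller chunks" and "delay" ideas above can be sketched as follows. This is a generic pattern under stated assumptions, not the platform's API: `RateLimitError`, `chunked`, and `call_with_backoff` are hypothetical names, and the delay values are placeholders.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for whatever error your provider raises."""

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def call_with_backoff(func, item, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(item), waiting longer after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return func(item)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller record the failure
            # Exponential backoff (1s, 2s, 4s, ...) plus a little jitter
            # so that parallel workers do not retry in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

The `sleep` parameter exists only so the waiting can be swapped out when testing. In normal use you would leave it at its default.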
Timeout
Error: Execution timed out
Cause: Individual items take too long to process (the default step timeout is 5 minutes).
Fix:
- Optimize slow steps (faster models, simpler prompts)
- Remove items that consistently time out
- Adjust step timeout if needed
Invalid data
Error: Invalid JSON or Type error
Cause: Data doesn’t match expected format.
Fix:
- Add data validation steps
- Clean input data before batch processing
- Use transform steps to normalize data
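To make "clean and normalize" concrete, the sketch below parses one JSON record and coerces its fields, returning an error message instead of raising. It is plain Python for pre-processing input outside the flow. The `amount` field and the exact error wording are assumptions for illustration.

```python
import json

def normalize_record(raw_line):
    """Parse one JSON line and coerce fields; return (record, error)."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError as exc:
        return None, f"Invalid JSON: {exc.msg}"
    if not isinstance(record, dict):
        return None, "Type error: expected a JSON object"
    cleaned = dict(record)
    # Store customerId as a trimmed string, whatever type it arrived as.
    if "customerId" in cleaned:
        cleaned["customerId"] = str(cleaned["customerId"]).strip()
    # `amount` is a hypothetical numeric field used only for this example.
    if "amount" in cleaned:
        try:
            cleaned["amount"] = float(cleaned["amount"])
        except (TypeError, ValueError):
            return None, "Type error: amount is not a number"
    return cleaned, None

print(normalize_record('{"customerId": " 42 ", "amount": "19.90"}'))
print(normalize_record('{"customerId": 7, "amount": "n/a"}'))
```

Returning the error as data lets you set bad records aside and still submit the clean ones.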
Downloading failed items
- On the batch job page, click Download Failed Items
- Save the CSV file
- Review error messages in the error column
This file includes original inputs plus error details.
Fixing and retrying
Fix the flow
If failures are due to flow logic:
- Update the flow to handle edge cases
- Add error handling (try/catch in transform steps)
- Publish the updated flow
- Re-run the batch with failed items
Fix the data
If failures are due to bad input:
- Open the failed items CSV
- Correct the problematic data
- Remove the error column
- Upload the corrected CSV as a new batch
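For large files, removing the error column by hand is tedious. A small script along these lines can do it, assuming the column is literally named `error`, as described above. All other column names in the example are made up.

```python
import csv
import io

def drop_error_column(csv_text, column="error"):
    """Return the CSV text with the given column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in (reader.fieldnames or []) if name != column]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped column in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

failed = "customerId,email,error\nC-1,a@example.com,Missing required field\n"
print(drop_error_column(failed))
```

To use it on a real download, read the file's contents into a string, pass it to `drop_error_column`, and write the result to a new file for upload.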
Retry with adjustments
Re-run failed items with different settings:
- Go to the batch job page
- Click Retry Failed Items
- Adjust the settings:
  - Higher timeout
  - Different model (if prompt steps failed)
- Click Start Retry
Look for patterns in failures. If a large share of items (for example, 20%) fail with the same error, fix the root cause rather than retrying individual items.
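One quick way to spot such patterns is to count the distinct messages in the failed items CSV. The sketch below assumes the file has an `error` column, as described under Downloading failed items. The function name is hypothetical, and each share is a fraction of the failed items, not of the whole batch.

```python
import csv
import io
from collections import Counter

def summarize_errors(csv_text, column="error"):
    """Return [(message, count, share_of_failures)], most common first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row.get(column) or "").strip() for row in reader)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(msg, n, n / total) for msg, n in counts.most_common()]

failed_items = (
    "customerId,error\n"
    "C-1,Rate limit exceeded\n"
    "C-2,Execution timed out\n"
    "C-3,Rate limit exceeded\n"
)
for message, count, share in summarize_errors(failed_items):
    print(f"{count:>3}  {share:.0%}  {message}")
```

If one message dominates the list, fix that cause first. Retrying before it is fixed will mostly reproduce the same failures.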
Partial success handling
Design flows to handle partial failures gracefully: catch errors at the item level and record them instead of letting them abort the run.
This prevents the entire batch from failing on individual errors.
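The idea can be illustrated outside the product with a short Python sketch: each item is processed inside its own try/except, so one bad item is recorded and the rest continue. `process_batch` and `handler` are hypothetical names, and how you express this inside a flow depends on the step types available to you.

```python
def process_batch(items, handler):
    """Run handler on every item and collect successes and failures separately."""
    succeeded, failed = [], []
    for item in items:
        try:
            succeeded.append({"input": item, "output": handler(item)})
        except Exception as exc:
            # Record the failure and move on to the next item.
            failed.append({"input": item, "error": f"{type(exc).__name__}: {exc}"})
    return succeeded, failed

def double_positive(n):
    """Toy handler that rejects negative numbers."""
    if n < 0:
        raise ValueError("negative input")
    return n * 2

ok, bad = process_batch([1, -2, 3], double_positive)
print(len(ok), "succeeded;", len(bad), "failed:", bad)
```

Keeping the original input next to each error message mirrors the failed items file and makes a later retry straightforward.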
Monitoring during execution
Watch batch progress in real time:
- Go to the batch job page
- Enable Live Updates
- Monitor success/failure counts
- Cancel the batch if the failure rate is too high
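What counts as "too high" depends on your workload. One way to make the decision explicit is a simple rule like the sketch below, where both the 20% threshold and the minimum of 50 processed items are arbitrary assumptions that you should tune.

```python
def should_cancel(success_count, failure_count,
                  max_failure_rate=0.2, min_processed=50):
    """Suggest cancelling once enough items ran and too many of them failed."""
    processed = success_count + failure_count
    # Avoid reacting to a handful of early failures.
    if processed < min_processed:
        return False
    return failure_count / processed > max_failure_rate

print(should_cancel(45, 5))    # 10% failures after 50 items
print(should_cancel(30, 20))   # 40% failures after 50 items
```

The minimum sample size keeps a few unlucky items at the start of a run from triggering a cancellation.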
Debugging individual failures
To investigate a specific failed item:
- Find the execution ID in the failed items list
- Go to Logs
- Search for the execution ID
- Review full execution trace
See Working with Logs for detailed troubleshooting.
Preventing future failures
- Validate inputs: Add validation steps at flow start
- Set realistic timeouts: Allow enough time for complex operations
- Handle errors gracefully: Use try/catch and conditional error handling
- Test with edge cases: Run small batches with problematic data before scaling
- Monitor API quotas: Ensure you have sufficient rate limits
When to give up
Some items may be impossible to process:
- Fundamentally corrupted data
- External services permanently unavailable
- Items that consistently time out despite optimization
Document these items, exclude them from future batches, and investigate root causes separately.
Next steps
- Running flows in batch for batch execution basics
- Debugging flows for flow-level troubleshooting
- Automating batch processing to schedule batches
- Working with Logs for production error investigation