Handling batch failures

When batch executions fail, diagnose and resolve issues using execution logs, error patterns, and retry strategies.

Identifying failures

After a batch completes:

Go to Logs → Batch Jobs
Open the batch execution
Check the Failed count
Click View Failures

You’ll see all failed items with error messages.

Common failure types

Input validation errors

Error: Missing required field: customerId

Cause: Input data is missing expected fields.

Fix:

Review CSV headers or JSON keys
Ensure all required fields are present
Add validation at the start of the flow to handle missing data
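
As a rough sketch of what validation at the start of a flow might look like, the snippet below checks one input item for required fields and returns a structured result instead of throwing. The REQUIRED_FIELDS list and the function names are illustrative assumptions, not part of the product; only the customerId field and the error wording come from the example above.

```javascript
// Hypothetical list of required fields; replace with the fields your flow expects.
const REQUIRED_FIELDS = ["customerId"];

// Returns the names of required fields that are absent or blank in one input item.
function findMissingFields(item, requiredFields = REQUIRED_FIELDS) {
  return requiredFields.filter((field) => {
    const value = item == null ? undefined : item[field];
    return value === undefined || value === null || String(value).trim() === "";
  });
}

// Returns a tagged result so a bad item is reported rather than crashing the step.
function validateInput(item) {
  const missing = findMissingFields(item);
  if (missing.length > 0) {
    return { valid: false, error: `Missing required field: ${missing.join(", ")}` };
  }
  return { valid: true, item };
}
```

A later step can branch on the valid flag and skip or flag items that fail the check.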

API rate limits

Error: Rate limit exceeded

Cause: Too many concurrent requests to an external API or AI model provider.

Fix:

Reduce batch concurrency in settings
Add delay steps between API calls
Split the batch into smaller chunks
Contact the provider to increase rate limits
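
If you prepare batch inputs with a script, splitting and pacing can be sketched as follows. This is a minimal illustration, assuming the items are already loaded into an array; submitChunk is a hypothetical callback standing in for however you start a batch, and the chunk size and delay are values you would tune to your provider's limits.

```javascript
// Splits an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Submits chunks one after another, pausing between them.
// `submitChunk` is a hypothetical async callback, not a product API.
async function processInChunks(items, size, delayMs, submitChunk) {
  const results = [];
  const chunks = chunk(items, size);
  for (let i = 0; i < chunks.length; i++) {
    results.push(await submitChunk(chunks[i]));
    if (i < chunks.length - 1) await sleep(delayMs); // no pause after the last chunk
  }
  return results;
}
```

Running chunks sequentially trades total run time for a lower peak request rate, which is usually the right trade when the provider is rejecting requests.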

Timeout

Error: Execution timed out

Cause: Individual items take too long to process (the default step timeout is 5 minutes).

Fix:

Optimize slow steps (faster models, simpler prompts)
Remove items that consistently time out
Adjust step timeout if needed

Invalid data

Error: Invalid JSON or Type error

Cause: Data doesn’t match expected format.

Fix:

Add data validation steps
Clean input data before batch processing
Use transform steps to normalize data
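
The following sketch shows one way a transform step could guard against both error types. It assumes the step can run ordinary JavaScript; the record shape (customerId, amount) and the function names are made up for illustration.

```javascript
// Parses a JSON string without throwing; returns a tagged result instead.
function safeParseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (error) {
    return { ok: false, error: `Invalid JSON: ${error.message}` };
  }
}

// Normalizes one record: trims the ID and converts a numeric string to a number.
// Assumes required fields were already checked; field names are illustrative.
function normalizeRecord(record) {
  const raw = typeof record.amount === "string" ? record.amount.trim() : record.amount;
  let amount = NaN;
  if (typeof raw === "number") {
    amount = raw;
  } else if (typeof raw === "string" && raw !== "") {
    amount = Number(raw);
  }
  if (!Number.isFinite(amount)) {
    return { ok: false, error: "Type error: amount is not a number" };
  }
  return { ok: true, value: { customerId: String(record.customerId).trim(), amount } };
}
```

Returning a tagged result rather than throwing lets later steps decide whether to skip, repair, or report the item.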

Downloading failed items

On the batch job page, click Download Failed Items
Save the CSV file
Review error messages in the error column

This file includes original inputs plus error details.

Fixing and retrying

Fix the flow

If failures are due to flow logic:

Update the flow to handle edge cases
Add error handling (try/catch in transform steps)
Publish the updated flow
Re-run the batch with failed items

Fix the data

If failures are due to bad input:

Open the failed items CSV
Correct the problematic data
Remove the error column
Upload the corrected CSV as a new batch

Retry with adjustments

Re-run failed items with different settings:

Go to the batch job page
Click Retry Failed Items
Adjust:
- Lower concurrency
- Higher timeout
- Different model (if prompt step failures)

Click Start Retry

Look for patterns in failures. If 20% of items fail with the same error, fix the root cause rather than retrying individual items.
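
To spot such patterns quickly, you could tally failures by error message. The sketch below assumes the failed items have already been parsed into objects with an error property (for example, from the failed items CSV); the property name and function name are assumptions.

```javascript
// Counts failed items per error message and reports each message's share of the batch.
// `failedItems`: array of objects with an `error` string (assumed shape).
// `totalItems`: number of items in the whole batch, including successes.
function summarizeFailures(failedItems, totalItems) {
  const counts = new Map();
  for (const item of failedItems) {
    counts.set(item.error, (counts.get(item.error) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([error, count]) => ({ error, count, share: count / totalItems }))
    .sort((a, b) => b.count - a.count); // most frequent error first
}
```

An error message that accounts for a large share of the batch is a candidate for a root-cause fix rather than a retry.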

Partial success handling

Design flows to handle partial failures gracefully:

// In a transform step
try {
  const result = riskyOperation(input);
  return { success: true, result };
} catch (error) {
  return { success: false, error: error.message, input };
}

This prevents the entire batch from failing on individual errors.
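
Once items carry a success flag like the one above, a later step or a post-processing script can separate the two groups. A minimal sketch, assuming the batch results are available as an array of those returned objects (how you obtain that array depends on your setup):

```javascript
// Splits transform-step results into succeeded and failed groups
// based on the `success` flag returned by the try/catch pattern.
function partitionResults(results) {
  const succeeded = [];
  const failed = [];
  for (const result of results) {
    (result.success ? succeeded : failed).push(result);
  }
  return { succeeded, failed };
}
```

Because failed results keep the original input, the failed group can be corrected and resubmitted without going back to the source data.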

Monitoring during execution

Watch batch progress in real-time:

Go to the batch job page
Enable Live Updates
Monitor success/failure counts
Cancel the batch if failure rate is too high
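
What counts as "too high" depends on your workload. One way to make the decision explicit is a simple threshold rule like the sketch below; the 20% limit and the minimum of 50 processed items are illustrative values, not product defaults.

```javascript
// Decides whether a running batch should be cancelled based on its failure rate.
// `maxFailureRate` and `minProcessed` are illustrative thresholds (assumptions).
function shouldCancel({ succeeded, failed }, maxFailureRate = 0.2, minProcessed = 50) {
  const processed = succeeded + failed;
  if (processed < minProcessed) return false; // too few items to judge reliably
  return failed / processed > maxFailureRate;
}
```

The minimum-processed guard avoids cancelling a batch because of a few early failures.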

Debugging individual failures

To investigate a specific failed item:

Find the execution ID in the failed items list
Go to Logs
Search for the execution ID
Review full execution trace

See Working with Logs for detailed troubleshooting.

Preventing future failures

Validate inputs: Add validation steps at flow start
Set realistic timeouts: Allow enough time for complex operations
Handle errors gracefully: Use try/catch and conditional error handling
Test with edge cases: Run small batches with problematic data before scaling
Monitor API quotas: Ensure you have sufficient rate limits

When to give up

Some items may be impossible to process:

Fundamentally corrupted data
External services permanently unavailable
Items that consistently time out despite optimization

Document these items, exclude them from future batches, and investigate root causes separately.

Next steps

Running flows in batch for batch execution basics
Debugging flows for flow-level troubleshooting
Automating batch processing to schedule batches
Working with Logs for production error investigation
