When a CloudRadial AutomationAI run fails or times out, the run detail in History tells you which step stopped it and why. This article covers reading step logs, how step timeouts work, common failure causes, splitting long work into multiple nodes, and rerunning. It is for Admin and Owner roles.
- Reading Step Logs
- Step Timeouts
- Common Failure Causes
- Splitting Long Work Into Multiple Nodes
- Rerunning
Reading Step Logs
Open the run from History (/runs) to see its steps in order. The failing step carries a failed or timed-out status; open it and read its sections:
- Output — what the step produced, if anything, before it stopped
- Log — any log text the step captured during execution
- Error — the error message, shown when the step failed
Work down the steps until you reach the first one that did not succeed — that is where the run stopped and where the Error and Log sections explain the cause.
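The scan described above can be sketched in a few lines of Python. This is only an illustration: the `Step` record, its field names, and the status strings ("succeeded", "failed", "timed_out") are assumptions made for the example, not AutomationAI's actual run data format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Step:
    """Hypothetical shape of one step in a run's detail view."""
    name: str
    status: str  # assumed values: "succeeded", "failed", "timed_out"
    output: str = ""
    log: str = ""
    error: str = ""


def first_unsuccessful(steps: Sequence[Step]) -> Optional[Step]:
    """Return the first step, in run order, that did not succeed."""
    for step in steps:
        if step.status != "succeeded":
            return step
    return None


# Example run: the second step is where the run stopped.
steps = [
    Step("fetch", "succeeded", output="42 records"),
    Step("transform", "failed", log="parsing row 7", error="unexpected null"),
]

culprit = first_unsuccessful(steps)
if culprit is not None:
    print(f"Run stopped at '{culprit.name}' ({culprit.status}): {culprit.error}")
```

The point of the sketch is the order of inspection: read the steps in sequence and stop at the first non-success, because that is the step whose Error and Log sections explain the cause.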
Step Timeouts
Each step has a time budget set by its configured timeoutSeconds. The default is five minutes and the hard cap is 30 minutes; the Designer will not let a node's timeout exceed the cap. If a step runs past its budget, AutomationAI marks the run timed out and stops it. The budget is enforced as a lease: when a runner leases a step, the control plane sets the lease to expire at the lease time plus the step's timeout plus a short grace period, and a periodic sweep fails any run still running past that point.
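The lease arithmetic can be illustrated with a minimal Python sketch. The default and hard-cap values come from this article; the 30-second grace value and the function names are assumptions for the example, since the article says only that the grace is short.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TIMEOUT_SECONDS = 5 * 60   # default stated in this article
HARD_CAP_SECONDS = 30 * 60         # hard cap stated in this article
GRACE_SECONDS = 30                 # ASSUMPTION: the real grace value is not documented here


def lease_expiry(leased_at: datetime,
                 timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS,
                 grace_seconds: int = GRACE_SECONDS) -> datetime:
    """Lease expiry = lease time + step timeout + short grace."""
    if not 0 < timeout_seconds <= HARD_CAP_SECONDS:
        raise ValueError("timeoutSeconds must be between 1 and the 30-minute hard cap")
    return leased_at + timedelta(seconds=timeout_seconds + grace_seconds)


def is_past_lease(leased_at: datetime, now: datetime,
                  timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS,
                  grace_seconds: int = GRACE_SECONDS) -> bool:
    """What a periodic sweep would check for a step that is still running."""
    return now > lease_expiry(leased_at, timeout_seconds, grace_seconds)


leased_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(lease_expiry(leased_at))  # lease time plus 5 minutes plus the assumed grace
```

Under these assumptions, a step leased at 12:00:00 with the default timeout would be swept only after 12:05:30, which is why a timed-out run can appear slightly later than the configured timeout alone would suggest.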
Common Failure Causes
When a run fails or times out, check for:
- A script error in the step — read the Error section for the message the runner reported
- A step that needs longer than its timeout allows, which surfaces as a timed-out run rather than a script error
- The deployment's runner being offline or revoked, so the work is never picked up — confirm the runner is healthy on the Runners page
- Missing input or a binding that did not resolve, so the step received nothing to act on
Splitting Long Work Into Multiple Nodes
Because a single step cannot exceed the 30-minute hard cap, work that needs longer must be modeled as several nodes rather than one long-running job. The Azure Functions HTTP path the runner uses would not survive a single step that long anyway. Break the work into stages, let each node finish well inside its timeout, and pass results between nodes so the next stage continues where the last left off.
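One way to reason about stage sizes is sketched below in Python. It is a planning aid only, not part of AutomationAI: the function name, the per-item time estimate, and the 50% safety factor are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_stages(items: Sequence[T],
                seconds_per_item: float,
                timeout_seconds: int = 300,
                safety_factor: float = 0.5) -> List[List[T]]:
    """Split items into stages sized to finish well inside one node's timeout.

    safety_factor is the fraction of the timeout a stage is allowed to use
    (an assumed planning margin, not a product setting).
    """
    if seconds_per_item <= 0:
        raise ValueError("seconds_per_item must be positive")
    budget = timeout_seconds * safety_factor
    per_stage = max(1, math.floor(budget / seconds_per_item))
    return [list(items[i:i + per_stage]) for i in range(0, len(items), per_stage)]


# 1,000 items at roughly 2 seconds each, against the default 5-minute timeout.
stages = plan_stages(list(range(1000)), seconds_per_item=2.0)
print(len(stages), len(stages[0]), len(stages[-1]))
```

With these example numbers each stage uses at most half of the five-minute budget. Each resulting stage would map to one node, with the previous node's output passed forward so the next stage continues where the last left off.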
Rerunning
After you have addressed the cause, start the deployment again with the Run action. Each run is a fresh execution, so a new run picks up the current pinned version on the runner's next poll. If you changed the workflow itself, publish the new version and re-pin the deployment to it before rerunning, because a deployment always runs its pinned published version.