Monitoring¶
molq separates submission from observation. Jobs are persisted in
SQLite, refreshed by reconciliation, and surfaced through handles, record
lists, CLI commands, and a Rich dashboard. Reconciliation, polling, and
events all live on Submitor; live scheduler introspection lives on
Cluster.
Two views of "what is running"¶
| View | Source | Includes other users? |
|---|---|---|
submitor.list_jobs() |
molq's persisted SQLite records | No — only your jobs |
cluster.get_queue() |
live squeue --me / qstat / bjobs |
Scheduler-dependent |
Use list_jobs when you want molq's own state machine view (with retries,
transitions, dependencies). Use get_queue when you want what the
scheduler client itself shows.
Job Lifecycle¶
Jobs move through a terminal-aware state enum:
from molq import JobState
JobState.CREATED
JobState.SUBMITTED
JobState.QUEUED
JobState.RUNNING
JobState.SUCCEEDED
JobState.FAILED
JobState.CANCELLED
JobState.TIMED_OUT
JobState.LOST
All terminal states report True from is_terminal.
Persistence¶
JobStore persists all job records in SQLite with WAL mode enabled. By
default the database is stored at ~/.molq/jobs.db. Multiple Submitors
(across multiple Clusters) share this store and filter by their target's
name.
Each record contains:
- stable
job_id cluster_name(the Submitor's target name)scheduler(the scheduler kind)- scheduler identity (
scheduler_job_id) - cached state
- timestamps such as
submitted_at,started_at, andfinished_at - exit code and failure reason
- execution metadata including default stdout and stderr paths
Reconciliation¶
JobReconciler is responsible for syncing persisted state with the
scheduler. Its main loop is:
- Load active jobs for a cluster.
- Batch-query the scheduler (via the Cluster's Transport).
- Compare cached state with reported state.
- Persist transitions and updated timestamps.
submitor.refresh_jobs() runs a reconciliation pass for the current
target Cluster.
Blocking waits¶
JobMonitor provides higher-level waiting primitives on top of
reconciliation.
For multiple jobs:
Polling strategies¶
molq ships three public polling strategies:
| Strategy | Use case |
|---|---|
FixedStrategy |
Simple fixed-interval polling |
ExponentialBackoffStrategy |
Lower scheduler pressure for long jobs |
AdaptiveStrategy |
Polling cadence based on expected runtime |
These strategies live in molq.strategies and back the monitor layer.
Logs¶
Each submitted job gets default stdout and stderr paths unless you
override them in JobExecution. By default these files live under the
submission working directory at .molq/jobs/<job-id>/.
At runtime, those resolved paths are stored in JobRecord.metadata
under:
molq.stdout_pathmolq.stderr_pathmolq.job_dir
The CLI exposes them through:
Lifecycle events¶
Submitor emits typed events through an EventBus. Subscribe with
on_event / unsubscribe with off_event:
from molq import EventType
def on_terminated(payload):
print(f"{payload.job_id}: {payload.transition.new_state.value}")
submitor.on_event(EventType.JOB_COMPLETED, on_terminated)
submitor.on_event(EventType.JOB_FAILED, on_terminated)
# Later:
submitor.off_event(EventType.JOB_COMPLETED, on_terminated)
Background reconciliation¶
submitor.run_daemon(once=False, interval=5.0, run_cleanup=True) runs a
lightweight loop that calls refresh_jobs (and optionally
cleanup_jobs) on a fixed interval. Useful in long-running services that
want fresh state without explicit polling at every read.
Full-screen dashboard¶
The Rich dashboard is available in two forms:
RunDashboardfor a focused dashboard experienceMolqMonitorfor a full-screen view across all persisted jobs
From the CLI:
Closing the dashboard does not cancel running jobs.