TDM
Tetsuo Data Manager
Pull Primary Data for a Symbol
Fetches the configured retention window of minute-level OHLCV bars for one symbol from the configured market-data provider, validates them against the trading calendar, and persists them to local storage. Single-flight at both layers: only one primary pull runs at a time across the system.
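The single-flight behaviour can be pictured as one non-blocking lock per job kind. This is a minimal in-process sketch, not TDM's actual mechanism: the names single_flight and JobAlreadyRunning are hypothetical, rejecting (rather than queueing) a second caller is an assumption, and only one of the "both layers" mentioned above is modelled.

```python
import threading
from contextlib import contextmanager


class JobAlreadyRunning(RuntimeError):
    """Raised when a job of the same kind is already in flight (hypothetical)."""


_locks = {}                          # job kind -> lock (hypothetical registry)
_registry_guard = threading.Lock()   # protects the registry itself


@contextmanager
def single_flight(job_kind):
    # Look up or create the lock for this job kind under a guard, so two
    # threads cannot end up with separate locks for the same kind.
    with _registry_guard:
        lock = _locks.setdefault(job_kind, threading.Lock())
    # Non-blocking acquire: a second caller of the same kind is rejected
    # immediately (assumption: the real system may queue or report instead).
    if not lock.acquire(blocking=False):
        raise JobAlreadyRunning(job_kind)
    try:
        yield
    finally:
        lock.release()
```

Wrapping a job body in `with single_flight("primary_pull"):` rejects a second primary pull while the first is running, yet leaves a job of a different kind free to start.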
Update Primary Data for a Symbol
Fetches the most recent days of minute-level OHLCV bars for one symbol from the configured market-data provider, merges them into the locally stored primary CSV with deduplication (freshly fetched rows supersede stale rows at the same timestamp, so broker corrections land cleanly), then checks NYSE trading-day completeness across the update window. Any missing trading day after the merge, or the absence of an existing primary CSV on disk for the symbol, escalates to a full pull of the configured retention window. Cheaper than a full Pull when the symbol's data is already mostly current; recovers transparently via a full pull when integrity is broken. Single-flight per job kind: it can run concurrently with a job of a different kind (e.g. derivative generation).
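The merge-with-dedupe step and the completeness check can be sketched with pandas. This is an illustration under assumptions, not TDM's code: the column name timestamp, the function names, and passing the expected trading days in as a plain list (rather than deriving them from an NYSE calendar) are all hypothetical.

```python
import pandas as pd


def merge_update(existing, fresh):
    """Merge freshly fetched bars into stored ones; fresh rows win on ties."""
    # Fresh rows go last so keep="last" retains them over stale rows
    # that share the same timestamp.
    combined = pd.concat([existing, fresh], ignore_index=True)
    combined = combined.drop_duplicates(subset="timestamp", keep="last")
    return combined.sort_values("timestamp").reset_index(drop=True)


def missing_trading_days(bars, expected_days):
    """Return the expected trading days that have no bar at all."""
    present = set(bars["timestamp"].dt.date)
    return sorted(d for d in expected_days if d not in present)
```

A non-empty result from missing_trading_days after the merge corresponds to the case that escalates to a full pull.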
Pull All Primary Data
Fetches the current universe inventory from the UIP service (via ORK proxy), then iterates over every symbol, pulling its full retention window of minute-level OHLCV bars from the configured market-data provider, validating, and persisting to local storage. Sequential-path processes one symbol at a time; parallel-path uses a thread pool sized by Max workers (0 = auto-detect from CPU and broker rate). Per-symbol failures are best-effort: logged and counted, they do not abort the whole run. Hours of runtime; default concurrency-gate setting (off) holds the system exclusive while it runs. Fails immediately if no inventory is available; run the inventory build first.
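The parallel path with best-effort per-symbol failures can be sketched with the standard-library thread pool. The names run_all and resolve_workers are hypothetical, and the auto-detect rule here uses CPU count only; how the broker rate limit feeds into the real calculation is not stated, so it is omitted.

```python
import logging
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger("tdm.sketch")


def resolve_workers(max_workers):
    # 0 means auto-detect. Assumption: CPU count only; the broker-rate
    # input mentioned in the description is not modelled here.
    return max_workers if max_workers > 0 else (os.cpu_count() or 1)


def run_all(symbols, job, max_workers=0):
    """Run job(symbol) for every symbol; failures are logged and counted."""
    if not symbols:
        # Mirrors "fails immediately if no inventory is available".
        raise RuntimeError("no inventory available; run the inventory build first")
    ok, failed = 0, {}
    with ThreadPoolExecutor(max_workers=resolve_workers(max_workers)) as pool:
        futures = {pool.submit(job, sym): sym for sym in symbols}
        for fut in as_completed(futures):
            sym = futures[fut]
            try:
                fut.result()
                ok += 1
            except Exception as exc:
                # Best-effort: record the failure and keep going.
                log.warning("symbol %s failed: %s", sym, exc)
                failed[sym] = str(exc)
    return ok, failed
```

Catching the exception per future is what keeps one bad symbol from aborting the run; the caller gets a success count and a map of failed symbols to inspect afterwards.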
Update All Primary Data
Fetches the current universe inventory from UIP (via ORK), prunes any local primary CSV whose symbol is not in that inventory (so primaries for delisted / blacklisted / dropped symbols don't accumulate forever), then iterates over every inventory symbol running the cheap incremental update pipeline: it pulls only the recent update window, merges it with existing primary data, and escalates per symbol to a full pull only when that symbol's own diagnostics demand it (missing trading days, max-date drift, or no existing CSV on disk). Sequential-path processes one symbol at a time; parallel-path uses a thread pool sized by Max workers (0 = auto-detect from CPU). Per-symbol failures are best-effort: logged and counted, they do not abort the whole run. Hours of runtime worst case, much less when most symbols are already current; default concurrency-gate setting (off) holds the system exclusive while it runs. Waits for the inventory build to complete, then fails immediately if no UIP inventory is available.
Generate Derivative Data for a Symbol
Reads the locally-stored primary OHLCV bars for one symbol, builds a complete market-minute grid for every NYSE trading day in the bars' date range, forward-fills OHLC and zero-fills volume for any minute the source skipped, then resamples at the configured interval (anchored to 09:30 ET so each bin starts on a market-open multiple), drops bins outside market hours and partial bins at the day edges, and pivots long → daily-wide. The resulting dataset replaces any existing derivative file for the symbol. Single-flight per job kind — a primary pull running concurrently does not block this.
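The grid, fill, resample, and pivot steps can be sketched in pandas as follows. This is a sketch under assumptions rather than TDM's pipeline: timestamps are taken to be timezone-naive and already in ET, trading days are passed in as a list instead of coming from an NYSE calendar, every session is treated as a regular 390-minute day, and the wide column naming (field_HH:MM) is invented for illustration.

```python
import pandas as pd

OHLC = ["open", "high", "low", "close"]
OPEN_MINUTE = 9 * 60 + 30   # 09:30, the session open
SESSION_MINUTES = 390       # regular session: 09:30 through 15:59


def make_derivative(bars, trading_days, interval_min=30):
    """bars columns: timestamp (naive, assumed ET), open, high, low, close, volume."""
    # 1. Complete market-minute grid for every trading day.
    grid = pd.DatetimeIndex([
        ts
        for day in trading_days
        for ts in pd.date_range(
            pd.Timestamp(day) + pd.Timedelta(minutes=OPEN_MINUTE),
            periods=SESSION_MINUTES, freq="min")
    ])
    # 2. Reindex onto the grid; forward-fill OHLC, zero-fill volume.
    full = bars.set_index("timestamp").sort_index().reindex(grid)
    full[OHLC] = full[OHLC].ffill()
    full["volume"] = full["volume"].fillna(0)
    # 3. Bin number counted from 09:30, so each bin starts a whole
    #    number of intervals after the open.
    minute_of_session = full.index.hour * 60 + full.index.minute - OPEN_MINUTE
    full["day"] = full.index.normalize()
    full["bin"] = minute_of_session // interval_min
    agg = full.groupby(["day", "bin"]).agg(
        open=("open", "first"), high=("high", "max"), low=("low", "min"),
        close=("close", "last"), volume=("volume", "sum"),
        minutes=("volume", "count"),
    )
    # 4. Drop partial bins (fewer minutes than the interval). The grid
    #    already excludes minutes outside market hours.
    agg = agg[agg["minutes"] == interval_min].drop(columns="minutes")
    # 5. Pivot long to daily-wide: one row per day, one column per field and bin.
    wide = agg.unstack("bin")
    wide.columns = [
        "{}_{:02d}:{:02d}".format(field, *divmod(OPEN_MINUTE + int(b) * interval_min, 60))
        for field, b in wide.columns
    ]
    return wide
```

With a 60-minute interval the 390-minute session yields six full bins and one 30-minute remainder starting at 15:30; step 4 drops that remainder, which is the "partial bins at the day edges" case.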
Generate Derivative Data for All Symbols
Wipes every existing derivative CSV, then walks the on-disk primary inventory (the symbols TDM already has primary data for, NOT UIP's published universe) and runs the per-symbol derivative pipeline against each. Sequential-path processes one symbol at a time; parallel-path uses a thread pool sized by Max workers (0 = auto-detect from CPU). Per-symbol failures are best-effort: logged and counted, they do not abort the whole run. No upstream dependency is declared: this is the right job to fire when primary data is intentionally partial (testing, recovery from an aborted pull-all). Default concurrency-gate setting (off) holds the system exclusive while it runs.
Refresh All Training Data
Composite end-to-end training-data refresh in one button: wipes every existing derivative CSV, prunes any local primary CSV whose symbol is not in the current universe inventory (so dead symbols don't keep collecting derivative-generation work), runs the bulk incremental update over UIP's current inventory (the per-symbol pipeline does the cheap update window first and escalates to a full pull only on missing days, max-date drift, or no existing CSV), then regenerates derivative CSVs for every symbol with primary data on disk after the update. Sequential-path processes one symbol at a time within each phase; parallel-path uses a thread pool sized by Max workers (0 = auto-detect from CPU). Both phases share the same worker-pool size since they run sequentially on the same host. Per-symbol failures are best-effort within each phase (logged and counted, they do not abort the run). Waits for the inventory build to complete before starting the update phase. Default concurrency-gate setting (off) holds the system exclusive while it runs.
Wipe All Primary Data
Permanently deletes every primary OHLCV CSV under primary_storage_dir. Use when starting fresh or recovering from corrupted local storage. Cannot be undone — the locally-cached primary inventory disappears and must be re-pulled symbol by symbol via the Pull action above. Single-flight per job kind; a per-symbol pull running concurrently is not blocked but is the operator's race to avoid.
Prune Non-Inventory Primary Data
Removes every primary CSV whose symbol is not in the current universe inventory. Useful for reclaiming space and keeping local primary aligned with the published universe after symbols are delisted, blacklisted, or otherwise dropped from the inventory feed. Waits for the inventory build to complete before reading the keep-set, so the comparison reflects the most recent universe; refuses to run if no inventory is available or the inventory is empty (would be equivalent to wipe-all-primary, which is a separate action). Cannot be undone — pruned symbols must be re-pulled via Pull or Pull All if they re-enter inventory. Single-flight per job kind.
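The prune rule, including the refusal on an empty inventory, can be sketched as below. The function name and the one-file-per-symbol layout (SYMBOL.csv directly under the primary directory) are assumptions made for illustration; the actual file naming is not documented here.

```python
from pathlib import Path
from typing import List, Set


def prune_non_inventory(primary_dir: Path, inventory: Set[str]) -> List[str]:
    """Delete primary CSVs whose symbol is not in the inventory keep-set."""
    # An empty keep-set would delete everything, which is the separate
    # wipe-all action, so refuse instead of proceeding.
    if not inventory:
        raise ValueError("refusing to prune: inventory is missing or empty")
    removed = []
    # Assumption: one file per symbol, named <SYMBOL>.csv.
    for csv_path in sorted(primary_dir.glob("*.csv")):
        if csv_path.stem not in inventory:
            csv_path.unlink()
            removed.append(csv_path.stem)
    return removed
```

Returning the removed symbols gives the operator a record of what would need re-pulling if any of them re-enter the inventory.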
Wipe All Derivative Data
Permanently deletes every derivative (daily-wide) CSV under derivative_storage_dir. Cannot be undone: derivative data must be regenerated from the primary inventory via the Generate action above. Use when the configured derivative interval changes and existing derivatives no longer match, or when recovering from corrupted local storage. Single-flight per job kind; a per-symbol generation running concurrently is not blocked but is the operator's race to avoid.