Data Hygiene: The Unsexy Edge
- Oct 3, 2025
Clean data beats clever math.
Every systematic trader eventually learns that the scariest bugs don’t crash your code; they **improve your equity**. A missing bar here, a time zone shift there, a stock split adjusted incorrectly, or an indicator that quietly “peeks” at the current bar’s final high before the bar has closed: each of these can make results look better than they should. You won’t get an error message; you’ll get **flattery**. And flattery is expensive.
Data hygiene begins with **time**. Align your price series to the clock you intend to trade. Daylight-saving changes, server time zones, and exchange calendars can all shift timestamps and create invisible gaps. Next, hunt for **missing or duplicated bars**, and set explicit rules for whether you fill, drop, or deduplicate them. Adjust equities for **corporate actions** such as splits and dividends, so that a jump in your price history reflects a real gain or loss rather than a bookkeeping change. In multi-asset portfolios, avoid **survivorship bias** by including delisted symbols where relevant; otherwise, you’re testing on winners only.
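As a rough illustration of the missing-and-duplicated-bars check, here is a minimal sketch using only the Python standard library. The function name `audit_bars`, the hourly interval, and the sample timestamps are all made up for this example, and it assumes a market that trades around the clock with timestamps already normalized to UTC. A real check would also consult the exchange calendar so weekends and holidays are not reported as gaps.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def audit_bars(timestamps, interval):
    """Return (duplicates, missing) for a list of timezone-aware bar timestamps.

    Assumption: bars are expected at every `interval` step between the first
    and last timestamp (no session breaks, weekends, or holidays).
    """
    counts = Counter(timestamps)
    duplicates = sorted(ts for ts, n in counts.items() if n > 1)

    unique = sorted(counts)
    missing = []
    for prev, curr in zip(unique, unique[1:]):
        expected = prev + interval
        # Walk forward one interval at a time until we reach the next real bar.
        while expected < curr:
            missing.append(expected)
            expected += interval
    return duplicates, missing


# Hypothetical hourly bars in UTC: 02:00 is missing and 03:00 appears twice.
start = datetime(2025, 3, 9, 0, 0, tzinfo=timezone.utc)
bars = [start + timedelta(hours=h) for h in (0, 1, 3, 3, 4)]

duplicates, missing = audit_bars(bars, timedelta(hours=1))
print("duplicates:", duplicates)
print("missing:", missing)
```

Storing timestamps in UTC and converting to exchange time only for display is one common way to keep daylight-saving shifts from creating phantom gaps or overlaps; the sketch above takes that normalization as given.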
Guard your calculations. Many indicator libraries compute on the **current bar’s complete information**, which includes prices (the final high, low, and close) not known at decision time. That’s **look-ahead bias**. Ensure your entries and exits only use data available at the moment the order would be placed. A simple test is to **delay** every decision by one bar and see if your performance collapses. If it does, your model might be peeking.
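The one-bar delay test can be sketched on a toy series. Everything below is illustrative: the prices are invented, the "signal" is simply whether a bar closed up, and performance is a plain sum of simple returns rather than a compounded equity curve. With `delay=0` the position for a bar uses that same bar's close, which is not known until the bar is over; with `delay=1` it uses only the previous bar's signal.

```python
def bar_returns(closes):
    """Simple close-to-close return for each bar after the first."""
    return [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]


def up_bar_signal(closes):
    """1 if the bar closed above the previous close, else 0.

    This value is only known once the bar has finished.
    """
    return [1 if curr > prev else 0 for prev, curr in zip(closes, closes[1:])]


def strategy_return(returns, signals, delay):
    """Sum of bar returns earned when bar t is traded on the signal from bar t - delay."""
    total = 0.0
    for t in range(delay, len(returns)):
        total += signals[t - delay] * returns[t]
    return total


# Invented closes that alternate up and down.
closes = [100.0, 102.0, 101.0, 103.0, 102.0, 104.0, 103.0, 105.0]
returns = bar_returns(closes)
signals = up_bar_signal(closes)

peeking = strategy_return(returns, signals, delay=0)  # uses the bar's own close
honest = strategy_return(returns, signals, delay=1)   # uses the prior bar only

print(f"no delay: {peeking:+.4f}")
print(f"one-bar delay: {honest:+.4f}")
```

On this series the undelayed version collects every up bar and nothing else, while the delayed version ends up negative. A gap that large between the two runs is the kind of collapse worth investigating.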
Finally, **cross-verify**. Compare a small window of your data with an independent source. Run a trivial strategy—like buy-and-hold or a fixed schedule of trades—to confirm that P&L math, rollovers, and commissions are computed correctly. Data hygiene isn’t charming, but it is **character**. A clean mirror shows you your real face; only then can you decide if you like what you see.
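To make the buy-and-hold check concrete, here is a minimal sketch. The `run_backtest` function is a hypothetical stand-in for whatever engine you actually use, and the prices, share count, and per-share commission are made up. The idea is that buy-and-hold has a closed-form answer you can compute by hand, so any disagreement points to the P&L or commission math rather than to the strategy. Rollovers are not modeled here.

```python
import math


def run_backtest(closes, target_positions, commission_per_share):
    """Toy engine: trade to the target position at each close and pay a
    per-share commission on every share traded. Returns final P&L."""
    position = 0
    cash = 0.0
    for price, target in zip(closes, target_positions):
        traded = target - position
        cash -= traded * price                        # pay for buys, receive for sells
        cash -= abs(traded) * commission_per_share    # commission on both sides
        position = target
    # Mark any remaining position at the last close.
    return cash + position * closes[-1]


# Invented inputs: buy 10 shares on the first bar, hold, sell on the last bar.
closes = [100.0, 102.0, 101.0, 105.0]
shares = 10
commission = 0.005
targets = [shares] * (len(closes) - 1) + [0]

pnl = run_backtest(closes, targets, commission)

# Closed-form expectation: price change times shares, minus two commissions.
expected = (closes[-1] - closes[0]) * shares - 2 * shares * commission

print(f"engine: {pnl:.2f}  expected: {expected:.2f}")
```

If the two numbers differ by more than floating-point noise, fix the engine before trusting any result from a more elaborate strategy.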
Do you need help creating your own bot? Contact us.
