The Bottom Line: This article explains a rigorous, scientific approach to deciding when to enter and exit trades in stock markets. Unlike most trading advice, this framework accounts for the real costs of trading, uses proper statistical methods, and provides a clear formula for when action is justified. Whether you're a curious investor or a quantitative researcher, this guide bridges the gap between academic rigor and practical application.
Introduction: The Real Question Nobody Answers
Here's a scenario every investor faces: You've done your research. You believe a stock is going to rise. But when exactly should you buy it? Right now? At market open tomorrow? Should you wait for a dip?
Most trading books give you vague advice like "buy on weakness" or "follow the trend." But they never answer the fundamental question: How confident do you need to be before acting?
This article presents a complete framework that answers that question with mathematical precision—while remaining grounded in the messy reality of trading costs, uncertain information, and noisy markets.
What Makes This Different?
Traditional approaches treat timing as pattern recognition: "The market tends to go up on Mondays" or "Buy when price touches the 50-day moving average." But these approaches suffer from three fatal flaws:
- They ignore trading costs. A small statistical edge can easily be wiped out by the spread, fees, and market impact.
- They assume independence. Stock returns are not like coin flips—today's move affects tomorrow's.
- They never tell you when to act. A 55% win rate sounds good, but is it enough after costs?
Our framework addresses all three problems by treating timing as what it really is: a decision under uncertainty with frictions.
Part 1: Setting Up the Problem
The Prices You See vs. The Prices You Get
Before we can talk about timing, we need to be honest about something: the price you see on your screen is not the price you'll actually pay.
When you look at a stock quote, you typically see something called the mid-price—the average of the best bid (what buyers will pay) and the best ask (what sellers want):
$$m_t = \frac{a_t + b_t}{2}$$
where $m_t$ is the mid-price at time $t$, $a_t$ is the ask price, and $b_t$ is the bid price.
But here's the catch: You can't actually trade at the mid-price. If you want to buy immediately, you pay the ask. If you want to sell immediately, you receive the bid. The difference—called the spread—is your first cost of doing business.
What Is a Return, Really?
When we talk about how much a stock moved, we use returns—the percentage change in price. There are two common ways to measure this:
Simple return (what most people think of):
$$r_t^{\text{simple}} = \frac{m_t - m_{t-1}}{m_{t-1}}$$
Log return (what quants prefer because it has nicer mathematical properties):
$$r_t = \ln\!\left(\frac{m_t}{m_{t-1}}\right)$$
For small moves, these are nearly identical. Log returns have the advantage that they add up nicely over time: the log return from Monday to Wednesday equals the Monday-to-Tuesday return plus the Tuesday-to-Wednesday return.
For a prediction horizon of $H$ periods (say, 90 minutes), we define:
$$r_{t,t+H} = \ln\!\left(\frac{m_{t+H}}{m_t}\right)$$
This is what we're trying to predict: how much will the stock move over our chosen time horizon?
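To make this concrete, here's a minimal sketch of the horizon return as a prediction label, assuming mid-prices in a pandas Series at a fixed bar interval (the names `mid` and `H` are just illustrative):

```python
import numpy as np
import pandas as pd

def horizon_log_return(mid: pd.Series, H: int) -> pd.Series:
    """Log return over the next H bars: r_{t,t+H} = ln(m_{t+H} / m_t).

    The value at index t uses the *future* price m_{t+H}, so it is a
    prediction label, never a feature.
    """
    return np.log(mid.shift(-H) / mid)

# Usage sketch: 1-minute bars, 90-minute horizon
# mid = pd.Series(...)                 # mid-prices, one per minute
# y = horizon_log_return(mid, H=90)
```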
The Cardinal Rule: No Peeking at the Future
This sounds obvious, but it's where most timing research goes wrong: every calculation must use only information available at the time of the decision.
Mathematically, we express this using the concept of a filtration $\mathcal{F}_t$—a fancy term for "all the information you could possibly know at time $t$." This includes past prices, past trades, news that's already been released, and any indicators you've computed from historical data.
A valid timing signal must be $\mathcal{F}_t$-measurable, meaning it depends only on information available at time $t$. Anything else is cheating—and will produce spectacular backtests that fail miserably in live trading.
Part 2: The True Cost of Trading
Your Fill Price Is Not the Mid-Price
Let's model what actually happens when you trade. If you're buying, your actual fill price looks something like this:
$$P_t^{\text{buy}} = a_t \left(1 + \eta_t + \varepsilon_t + f_t\right)$$
And when selling:
$$P_t^{\text{sell}} = b_t \left(1 - \eta_t - \varepsilon_t - f_t\right)$$
Let's break down these terms:
| Component | What It Means | Typical Size |
|---|---|---|
| $a_t$ or $b_t$ | Ask or bid price | The starting point |
| $\eta_t$ | Market impact—your order moves the price against you | 1-10 basis points |
| $\varepsilon_t$ | Slippage from latency and execution delays | 1-5 basis points |
| $f_t$ | Brokerage fees and exchange costs | 0-10 basis points |
Note: A basis point (bp) is 0.01%, so 10 bp = 0.1%.
The Net Return: What You Actually Make
For a long trade (buying now, selling later), your net log return is:
$$r_t^{\text{net}} = \ln\!\left(\frac{P_{t+H}^{\text{sell}}}{P_t^{\text{buy}}}\right)$$
This is the number that matters. Not the mid-price return. Not the gross return. The net return after all costs.
Why this matters: Imagine you predict a 0.3% move with 60% accuracy. Sounds profitable, right? But if your total round-trip trading costs are 0.2%, two-thirds of that move is gone before you book any profit. This is why many "proven" strategies evaporate when you account for realistic execution.
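Here's a small numerical sketch of that erosion, using made-up prices and an assumed 10 bp of one-way costs per leg:

```python
import math

def net_log_return(buy_fill: float, sell_fill: float) -> float:
    """Net log return of a long round trip at executable fill prices."""
    return math.log(sell_fill / buy_fill)

# Hypothetical numbers: buy at an ask of 100.00, exit at a bid of 100.30
# (a 0.3% gross move), with 10 bp of one-way costs on each leg.
one_way_cost = 0.0010
buy_fill = 100.00 * (1 + one_way_cost)    # pay up on entry
sell_fill = 100.30 * (1 - one_way_cost)   # give back on exit

print(f"net log return: {net_log_return(buy_fill, sell_fill):.5f}")
# About 0.0010, i.e. only ~10 bp left of a 30 bp gross move.
```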
Part 3: From Prediction to Decision
Here's where most timing research stops: researchers build a model, measure its accuracy, and call it a day. But accuracy isn't action. How do you actually decide when to trade?
Two Ways to Think About It
Approach A: Full Distribution (Ideal)
If you can model the entire distribution of future returns—not just the average, but how spread out or skewed they might be—you can make optimal decisions using expected utility theory.
Approach B: Win/Loss Framework (Practical)
A simpler approach is to predict:
1. The probability of a profitable trade
2. The expected win size when you're right
3. The expected loss size when you're wrong
Let's define these precisely:
$$Y_t = \mathbf{1}\{\, r_t^{\text{net}} > 0 \,\}$$
This is a binary variable: 1 if the trade would be profitable, 0 otherwise. Our model estimates:
$$\hat{p}_t = \mathbb{P}\left(Y_t = 1 \mid \mathcal{F}_t\right)$$
This is the probability—given everything we know at time $t$—that the trade will be profitable.
We also estimate:
$$\hat{\mu}^{+}_t = \mathbb{E}\left[\, r_t^{\text{net}} \mid Y_t = 1,\ \mathcal{F}_t \,\right]$$
This is the expected gain when we win—a positive number.
$$\hat{\mu}^{-}_t = -\,\mathbb{E}\left[\, r_t^{\text{net}} \mid Y_t = 0,\ \mathcal{F}_t \,\right]$$
This is the expected loss when we lose—also expressed as a positive number for convenience.
The Million-Dollar Question: How Confident Is Confident Enough?
This is the heart of the framework. Given our probability estimate $\hat{p}_t$ and our win/loss magnitudes $\hat{\mu}^{+}_t$ and $\hat{\mu}^{-}_t$, when should we actually trade?
The expected value of trading is:
$$\mathbb{E}\left[\text{net return} \mid \mathcal{F}_t\right] = \hat{p}_t\,\hat{\mu}^{+}_t - (1 - \hat{p}_t)\,\hat{\mu}^{-}_t$$
In plain English: your expected profit equals (probability of winning × average win) minus (probability of losing × average loss).
But we shouldn't trade just because expected value is positive. We want a margin of safety—call it $\lambda_t$—to account for model uncertainty, risk limits, and operational constraints. So our rule becomes:
Trade only if:
$$\hat{p}_t\,\hat{\mu}^{+}_t - (1 - \hat{p}_t)\,\hat{\mu}^{-}_t > \lambda_t$$
The Magic Formula: Your Required Win Rate
Solving that inequality for $\hat{p}_t$, we get the minimum probability threshold:
$$\pi^{*}_t = \frac{\hat{\mu}^{-}_t + \lambda_t}{\hat{\mu}^{+}_t + \hat{\mu}^{-}_t}, \qquad \text{trade only if } \hat{p}_t > \pi^{*}_t$$
This is the key result. It tells you exactly how confident you need to be before acting.
Let's understand what this formula says:
| If this increases... | Then $\pi^{*}_t$... | Intuition |
|---|---|---|
| Expected loss $\hat{\mu}^{-}_t$ | Goes up | Bigger potential losses require more confidence |
| Expected win $\hat{\mu}^{+}_t$ | Goes down | Bigger potential wins justify acting with less certainty |
| Risk margin $\lambda_t$ | Goes up | More cautious stance raises the bar |
A Worked Example
Suppose your model gives you:
- $\hat{p}_t = 0.62$ (62% estimated probability of profit)
- $\hat{\mu}^{+} = 9$ bp (expected win of 9 basis points)
- $\hat{\mu}^{-} = 7$ bp (expected loss of 7 basis points)
- $\lambda = 1$ bp (risk margin of 1 basis point)
The required threshold is:
$$\pi^{*} = \frac{7 + 1}{9 + 7} = 0.50$$
Since $0.62 > 0.50$, you should trade.
But now imagine volatility spikes, and you increase your risk margin to, say, $\lambda = 4$ bp:
$$\pi^{*} = \frac{7 + 4}{9 + 7} \approx 0.69$$
Now $0.62 < 0.69$, so you should not trade.
This is why thresholding must depend on current market conditions—not be a fixed number.
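A minimal sketch of the decision rule, reproducing the worked example above (the function names are mine, not from any library):

```python
def required_threshold(mu_win: float, mu_loss: float, risk_margin: float) -> float:
    """Minimum win probability: pi* = (mu_loss + lambda) / (mu_win + mu_loss)."""
    return (mu_loss + risk_margin) / (mu_win + mu_loss)

def should_trade(p_hat: float, mu_win: float, mu_loss: float, risk_margin: float) -> bool:
    """Trade only if the calibrated probability clears the threshold."""
    return p_hat > required_threshold(mu_win, mu_loss, risk_margin)

# Worked example, magnitudes in basis points
print(required_threshold(9, 7, 1))      # 0.50
print(should_trade(0.62, 9, 7, 1))      # True  -> trade
print(required_threshold(9, 7, 4))      # ~0.69 after the risk margin rises
print(should_trade(0.62, 9, 7, 4))      # False -> stand aside
```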
Part 4: Finding Statistical Edges
Now that we know how to convert probabilities into decisions, where do those probabilities come from? The framework uses three main sources of information.
4.1 Temporal Patterns: Do Certain Times Work Better?
The Question: Are there certain days of the week or times of day when stocks tend to perform differently?
The Challenge: This seems simple, but it's a statistical minefield. Markets are not like coin flips—there's serial correlation (today's return affects tomorrow's) and heteroskedasticity (volatility clusters). Standard statistical tests assume away these features and give misleading results.
Weekday Effects
For each day $d$ (Monday through Friday), we want to estimate:
$$\mu_d = \mathbb{E}\left[\, r_{t,t+H} \mid \text{day}(t) = d \,\right]$$
and test whether it differs from other days.
The Right Way to Test This (a sketch of the HAC step follows the list):
- Calculate returns using a consistent definition
- Estimate differences using HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors—these account for the fact that returns are correlated and have changing volatility
- Use block permutation tests that preserve the time structure while breaking the day-of-week association
- Apply multiple testing corrections because you're testing many hypotheses (5 days × multiple stocks × multiple horizons)
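Here's a rough sketch of the HAC step only, using statsmodels' Newey-West covariance; the block permutation test and multiple-testing correction are omitted, and the column names are assumptions about your own data layout:

```python
import pandas as pd
import statsmodels.formula.api as smf

def weekday_effect_hac(df: pd.DataFrame, max_lags: int) -> pd.DataFrame:
    """Regress horizon returns on weekday dummies with HAC standard errors.

    Assumes `df` has a DatetimeIndex and a horizon-return column `ret`.
    """
    df = df.copy()
    df["weekday"] = df.index.dayofweek            # 0 = Monday ... 4 = Friday
    model = smf.ols("ret ~ C(weekday)", data=df)  # intercept absorbs the Monday mean
    # Newey-West (HAC) covariance accounts for autocorrelation and
    # volatility clustering in overlapping horizon returns.
    res = model.fit(cov_type="HAC", cov_kwds={"maxlags": max_lags})
    return pd.DataFrame({"coef": res.params, "p_value": res.pvalues})

# Usage sketch: max_lags should be at least the prediction horizon H
# print(weekday_effect_hac(returns_df, max_lags=90))
```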
Intraday Windows
Similarly, we can partition the trading day into windows $w$ (Open, Mid-morning, Midday, Afternoon, Close) and compute the return within each:
$$r_w(d) = \ln\!\left(\frac{m_{\text{end}(w,d)}}{m_{\text{start}(w,d)}}\right)$$
where $\text{start}(w,d)$ and $\text{end}(w,d)$ are the window boundaries on day $d$.
The same statistical discipline applies: HAC inference, permutation tests, multiple testing control.
Important Caveat: Temporal patterns are weak and regime-dependent. They should inform your model as features, not drive decisions on their own.
4.2 Price Levels: Support and Resistance
Technical analysts have long observed that prices seem to "bounce" off certain levels. Can we formalize this?
Rolling Quantiles as Levels
Instead of drawing arbitrary lines on a chart, we use statistical quantiles of recent prices:
$$S_t = \operatorname{Quantile}_{q}\left(m_{t-W+1}, \ldots, m_t\right), \qquad R_t = \operatorname{Quantile}_{1-q}\left(m_{t-W+1}, \ldots, m_t\right)$$
Here, $W$ is the lookback window (say, 5,000 to 20,000 bars) and $q$ is a small number like 0.1 or 0.15.
Translation: Support is roughly the 10th percentile of recent prices—a level the stock rarely trades below. Resistance is the 90th percentile—a level it rarely exceeds.
Generating Candidates
With a tolerance $\epsilon$ (to avoid exact-boundary whipsaws) and a momentum check $M_t$:
Long candidate: price near support ($m_t \le (1 + \epsilon)\, S_t$) AND momentum turning positive ($M_t > 0$)
Short candidate: price near resistance ($m_t \ge (1 - \epsilon)\, R_t$) AND momentum turning negative ($M_t < 0$)
The momentum check prevents you from trying to catch a falling knife—it waits for evidence that the bounce is actually happening.
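A compact sketch of the level and candidate logic, with illustrative parameter values (the window, quantile, tolerance, and momentum proxy are placeholders, not prescribed settings):

```python
import pandas as pd

def level_candidates(mid: pd.Series, W: int = 5000, q: float = 0.10,
                     tol: float = 0.0005, mom_window: int = 30) -> pd.DataFrame:
    """Rolling-quantile support/resistance with a simple momentum check."""
    support = mid.rolling(W).quantile(q)          # ~10th percentile of recent prices
    resistance = mid.rolling(W).quantile(1 - q)   # ~90th percentile
    momentum = mid.diff(mom_window)               # crude short-term momentum proxy

    long_candidate = (mid <= support * (1 + tol)) & (momentum > 0)
    short_candidate = (mid >= resistance * (1 - tol)) & (momentum < 0)

    return pd.DataFrame({
        "support": support,
        "resistance": resistance,
        "long_candidate": long_candidate,
        "short_candidate": short_candidate,
    })
```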
4.3 Multi-Horizon Momentum
Rather than relying on a single moving average crossover (which is noisy), we aggregate signals across multiple time horizons:
$$M_t = \sum_{k=1}^{K} w_k \,\operatorname{sign}\!\left(\mathrm{MA}^{\text{short}}_{k}(t) - \mathrm{MA}^{\text{long}}_{k}(t)\right)$$
This score is positive when short-term averages are above long-term averages across multiple horizons (bullish) and negative in the opposite case (bearish).
The weights $w_k$ should be estimated from past data and regularized to prevent overfitting.
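A minimal sketch of such an aggregated score, using equal placeholder weights and arbitrary horizon pairs rather than fitted values:

```python
import numpy as np
import pandas as pd

def momentum_score(mid: pd.Series, horizons=((5, 20), (20, 60), (60, 240)),
                   weights=None) -> pd.Series:
    """Aggregate momentum across several (short, long) moving-average pairs."""
    pairs = list(horizons)
    if weights is None:
        weights = np.full(len(pairs), 1.0 / len(pairs))   # placeholder equal weights
    score = pd.Series(0.0, index=mid.index)
    for w_k, (short, long_) in zip(weights, pairs):
        crossover = np.sign(mid.rolling(short).mean() - mid.rolling(long_).mean())
        score = score + w_k * crossover.fillna(0.0)
    return score
```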
Part 5: The Machine Learning Layer
What Are We Predicting?
The primary target is the binary outcome: will the trade be profitable after costs?
$$Y_t = \mathbf{1}\left\{ \ln\!\left(\frac{P_{t+H}^{\text{sell}}}{P_t^{\text{buy}}}\right) > 0 \right\}$$
Note that this uses executable fill prices, not mid-prices. This ensures we're predicting economic profitability, not just price direction.
Features (Inputs to the Model)
All features must be computed using only past information. Typical inputs include:
- Recent returns: $r_{t-\ell,\,t}$ for various lags $\ell$
- Volatility measures: Realized volatility, range-based estimators
- Temporal features: Day of week, time of day, rolling window estimates
- Level features: Distance to support/resistance
- Momentum: Aggregated score across horizons
- Microstructure: Spread, depth proxies, volume patterns
Model Choice
For tabular data like this, gradient boosting (XGBoost, LightGBM) is typically a strong baseline. More complex models like transformers or RNNs can be tried but must beat the simpler approach after costs—not just on accuracy metrics.
The model outputs:
$$\hat{p}_t = \mathbb{P}\left(Y_t = 1 \mid \mathcal{F}_t\right)$$
Optionally, it can also output magnitude estimates $\hat{\mu}^{+}_t$ and $\hat{\mu}^{-}_t$.
Why Calibration Matters More Than Accuracy
Here's a subtle but critical point: your model's probabilities must be trustworthy.
A model is well-calibrated if, among all the times it says "60% chance of profit," about 60% actually are profitable. Many models have good accuracy but terrible calibration—they might output "70%" when the true probability is only 55%.
Why does this matter? Because your threshold formula uses $\hat{p}_t$ as an actual probability. If your model's probabilities are wrong, you'll trade too much or too little.
Solution: After training, apply a calibration step (isotonic regression or Platt scaling) on held-out data to correct the probabilities.
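A sketch of that calibration step with scikit-learn's isotonic regression; the array names are placeholders for your own held-out scores and labels:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_calibrator(raw_scores_val: np.ndarray, y_val: np.ndarray) -> IsotonicRegression:
    """Fit a monotone map from raw model scores to calibrated probabilities
    on held-out (purged) validation data."""
    calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    calibrator.fit(raw_scores_val, y_val)
    return calibrator

# Usage sketch:
# calibrator = fit_calibrator(raw_scores_val, y_val)
# p_hat = calibrator.predict(raw_scores_live)   # calibrated probabilities for the threshold rule
```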
Part 6: Validation—Proving It Works
The Problem with Regular Train/Test Splits
In typical machine learning, you randomly shuffle data into training and test sets. This doesn't work for time series because:
- Future leakage: Random shuffling can put future observations in training
- Overlapping labels: If your prediction horizon is 90 minutes, observations 30 minutes apart share some of the same future returns
Walk-Forward Validation with Purging and Embargo
The solution is walk-forward validation:
- Train on historical data up to time $T$
- Validate on data from $T + E$ to $T + V$, where $E$ is the embargo length
- Roll forward and repeat
The embargo period ensures no information leakage. If your prediction uses returns over $H$ periods, then any training sample within $H$ periods of the validation start could leak information. A safe embargo is at least $H$ periods.
Purging removes training samples whose label periods overlap with validation/test periods.
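A simple sketch of purged, embargoed walk-forward splits, under the assumption that samples are evenly spaced in time and labels span `embargo` bars (so dropping the last `embargo` training samples acts as both purge and embargo):

```python
import numpy as np

def walk_forward_splits(n: int, train_size: int, val_size: int,
                        embargo: int, step: int):
    """Yield (train_idx, val_idx) index pairs for walk-forward validation."""
    start = train_size
    while start + val_size <= n:
        train_idx = np.arange(0, start - embargo)    # purged / embargoed training set
        val_idx = np.arange(start, start + val_size) # forward validation block
        yield train_idx, val_idx
        start += step

# Usage sketch:
# for tr, va in walk_forward_splits(n=len(X), train_size=50_000,
#                                   val_size=5_000, embargo=90, step=5_000):
#     model.fit(X[tr], y[tr]); evaluate(model, X[va], y[va])
```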
Metrics That Matter
Statistical Metrics:
- AUC (Area Under ROC Curve)
- Brier Score (measures probability accuracy)
- ECE (Expected Calibration Error)
Economic Metrics:
- Net P&L after all costs
- Sharpe Ratio (return per unit risk)
- Maximum Drawdown (worst peak-to-trough loss)
- Fill Rate (what percentage of intended trades actually execute)
Critical: All economic metrics must be computed net of execution costs. A strategy with great gross returns but poor execution is not a strategy—it's an illusion.
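For the calibration metrics, here's a small sketch of the Brier score and a simple equal-width-bin ECE (one of several ECE variants in use):

```python
import numpy as np

def expected_calibration_error(p_hat: np.ndarray, y: np.ndarray, n_bins: int = 10) -> float:
    """Bin-count-weighted average of |mean predicted prob - realized frequency| per bin."""
    bin_idx = np.minimum((p_hat * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            ece += mask.mean() * abs(p_hat[mask].mean() - y[mask].mean())
    return float(ece)

def brier_score(p_hat: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return float(np.mean((p_hat - y) ** 2))

# p_hat = calibrated probabilities on a held-out fold, y = realized 0/1 labels
# print(expected_calibration_error(p_hat, y), brier_score(p_hat, y))
```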
Part 7: Execution—Where Theory Meets Reality
The Execution Shortfall
You can have the best predictions in the world, but poor execution will destroy your edge. We measure execution quality using shortfall:
$$\text{Shortfall}_t = s_t \cdot \frac{P_t^{\text{fill}} - B_t}{B_t}$$
where $s_t$ is $+1$ for buys and $-1$ for sells, and $B_t$ is a benchmark price—typically VWAP (Volume-Weighted Average Price) over your execution window.
Positive shortfall means you did worse than the benchmark; negative means better.
Transaction Cost Analysis (TCA)
Total trading cost breaks down as:
$$\text{Total cost} = \text{spread} + \text{market impact} + \text{slippage} + \text{fees}$$
Market impact is often modeled with a square-root rule: impact scales with the square root of your participation rate (what fraction of volume you represent).
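A tiny sketch of both ideas: signed shortfall against a benchmark, and the square-root impact rule of thumb with an impact coefficient you would calibrate from your own fills:

```python
import math

def shortfall(fill_price: float, benchmark_price: float, side: int) -> float:
    """Signed shortfall vs. a benchmark (e.g. interval VWAP).
    side = +1 for buys, -1 for sells; positive means worse than the benchmark."""
    return side * (fill_price - benchmark_price) / benchmark_price

def sqrt_impact_bp(participation: float, daily_vol_bp: float, coeff: float = 1.0) -> float:
    """Square-root rule of thumb: impact ~ coeff * volatility * sqrt(participation rate).
    `coeff` is an assumed constant, not a universal value."""
    return coeff * daily_vol_bp * math.sqrt(participation)

# print(shortfall(100.05, 100.00, side=+1))                    # 0.0005 -> 5 bp worse than VWAP
# print(sqrt_impact_bp(participation=0.02, daily_vol_bp=150))  # ~21 bp estimated impact
```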
The Feedback Loop
Here's the key insight: your cost assumptions should be updated from realized execution data.
If your model assumed 5 bp of costs but you're consistently seeing 8 bp, your threshold is too low and you're overtrading. Update the conservative buffer based on actual shortfall statistics.
Part 8: Putting It All Together
The Complete Pipeline
Here's how everything fits together in real-time:
For each decision time t:
1. UPDATE FEATURES
- Compute all rolling statistics using only data up to t
- No future information allowed
2. GENERATE PREDICTION
- Feed features into calibrated model
- Output: probability p̂_t and magnitude estimates μ̂⁺, μ̂⁻
3. COMPUTE THRESHOLD
- Calculate π*_t = (μ̂⁻ + λ_t) / (μ̂⁺ + μ̂⁻)
- λ_t depends on current volatility and risk limits
4. DECIDE
- If p̂_t > π*_t: proceed to execution
- Otherwise: no action
5. EXECUTE (if trading)
- Select order type based on urgency and liquidity
- Use TWAP/VWAP/POV algorithm to minimize impact
- Respect participation limits
6. RECORD AND LEARN
- Log fill price, shortfall, latency
- Update cost buffer if systematic deviations
The Baseline Ladder: Complexity Must Earn Its Keep
Before deploying any sophisticated model, compare against simpler alternatives:
| Level | Strategy | Purpose |
|---|---|---|
| 1 | Random timing | Sanity check—anything should beat this |
| 2 | Simple momentum + fixed costs | Basic heuristic benchmark |
| 3 | Statistical timing only | Tests value of temporal patterns |
| 4 | Gradient boosting with features | Strong tabular baseline |
| 5 | Sequence model (Transformer/RNN) | Only if it beats Level 4 stably |
Rule: Never use a complex model unless it beats the simpler one in net-of-cost terms across multiple assets and time periods.
Part 9: What Can Go Wrong (And How to Avoid It)
Common Failure Modes
| Problem | Symptom | Solution |
|---|---|---|
| Leakage | Amazing backtest, terrible live performance | Audit all features; enforce purging and embargo |
| Multiple testing | "Discovered" patterns that don't replicate | Control false discovery rate; require effect sizes |
| Cost neglect | Profitable before costs, losing after | Use executable fills; maintain adaptive cost buffer |
| Miscalibration | Systematic overtrading or undertrading | Monitor ECE; recalibrate on fresh data |
| Regime change | Strategy works, then suddenly doesn't | Rolling re-estimation; regime detection |
What This Framework Does NOT Guarantee
Let's be clear: no methodology guarantees profits.
What this framework does guarantee:
- If you observe good performance, it's less likely to be an artifact of data snooping or cost-free fantasy
- Your decision rule is economically interpretable and auditable
- You have a systematic way to update beliefs and improve
Part 10: Practical Takeaways
For Individual Investors
Know your costs. Before evaluating any timing idea, understand your actual trading costs (spread + fees + slippage).
Demand calibration. If someone tells you their model has "70% accuracy," ask: "Are the predicted probabilities actually correct?" Accuracy without calibration is nearly useless for decision-making.
Use the threshold formula. Even without a fancy model, you can use:
$$\pi^{*} = \frac{\hat{\mu}^{-} + \lambda}{\hat{\mu}^{+} + \hat{\mu}^{-}}$$
Estimate average wins and losses from your trading history, add a risk margin, and you have a principled minimum confidence requirement.
Be skeptical of temporal patterns. "The market goes up on Mondays" might have been true historically but may not survive proper statistical scrutiny or persist in the future.
For Quantitative Researchers
Report everything net of costs. Gross returns are misleading. Always specify your execution model.
Use the baseline ladder. Force complex models to prove their worth against simpler alternatives.
Calibrate before thresholding. ECE matters more than AUC for actual trading decisions.
Document for reproducibility. Data sources, preprocessing, feature definitions, embargo rules, execution assumptions—all should be explicit.
For Portfolio Managers
The threshold is not a constant. $\pi^{*}_t$ should vary with volatility, liquidity, and risk budget. A fixed threshold is suboptimal.
Execution quality is alpha. Two identical prediction models can have vastly different P&L based on execution. Measure and optimize for shortfall.
Audit the loop. Regularly verify that predicted probabilities match realized frequencies and that cost assumptions match reality.
Conclusion: From Signals to Decisions
The core insight of this framework is simple but often overlooked: timing is not pattern recognition—it's decision-making under uncertainty with frictions.
This shift in perspective has profound implications:
- Statistical significance is necessary but not sufficient; you need economic significance after costs
- Probability estimates must be calibrated, not just accurate
- The threshold for action must be derived from costs and risk, not chosen arbitrarily
- Execution is not an afterthought; it's where alpha lives or dies
The formula $\pi^{*}_t = (\hat{\mu}^{-}_t + \lambda_t)/(\hat{\mu}^{+}_t + \hat{\mu}^{-}_t)$ encapsulates this philosophy: how confident you need to be depends on what's at stake.
By following the procedures outlined here—dependence-aware statistics, leakage-safe validation, execution-conscious evaluation, and explicit threshold derivation—you can build timing systems that are both scientifically credible and practically tradeable.
Or, equally valuable, you can quickly falsify timing ideas that don't survive scrutiny—before they cost you money.
Glossary
| Term | Definition |
|---|---|
| Basis point (bp) | 0.01%, or one-hundredth of a percent |
| Calibration | The property that predicted probabilities match actual frequencies |
| ECE | Expected Calibration Error—a measure of how well-calibrated a model is |
| Embargo | A gap between training and test data to prevent information leakage |
| Filtration ($\mathcal{F}_t$) | All information available at time $t$ |
| HAC | Heteroskedasticity and Autocorrelation Consistent (a type of robust standard error) |
| Log return | $\ln(P_t / P_{t-1})$; returns that add up over time |
| Market impact | The price move caused by your own trading |
| Purging | Removing training samples that overlap with test labels |
| Shortfall | The difference between your execution price and a benchmark |
| Slippage | Unfavorable price movement between decision and execution |
| VWAP | Volume-Weighted Average Price—a common execution benchmark |
Further Reading
1. Newey & West (1987) — The foundational paper on HAC standard errors
2. Benjamini & Hochberg (1995) — Controlling false discovery rate in multiple testing
3. Gneiting & Raftery (2007) — Proper scoring rules for probability forecasting
This article is for educational and research purposes only. It is not investment advice, and past performance of any strategy does not guarantee future results.