Prediction Models

Understanding how we predict End-of-Day volume for MATIF Wheat futures

Dataset Manager

Datasets & Profiles


Managing Profiles

Upload new historical data CSVs to create profiles.
Toggle the eye icon to show/hide profiles on the graph.
Set a profile as Active to use it for live predictions.

Profile Visualization

Intraday Volume Profile


System Overview

Our system uses two complementary models to predict the End-of-Day (EOD) volume for Euronext MATIF Wheat futures. Each model provides predictions with 95% confidence intervals, allowing you to assess prediction uncertainty as the trading day progresses.

Mathematical Foundation: Why It Works

The high accuracy of our backtest results is not accidental. It rests on the strong day-to-day consistency of intraday volume profiles.

The Stability Principle

Market microstructure theory (and our own simulated data) shows that while total daily volume fluctuates wildly, the distribution of that volume throughout the day is remarkably stable. If 15% of volume typically trades by 11:00, that ratio tends to hold whether the total day is 10k or 100k contracts.

Proof of Model Validity

Our walk-forward backtest provides the evidence: by strictly separating training and testing data, we showed that knowing the volume at 14:00 allows us to predict the End-of-Day total with ~98% accuracy on standard days. This indicates that the Volume-to-Date ratio is a statistically robust predictor.
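A walk-forward test of the volume-to-date ratio can be sketched as follows. This is a minimal illustration, not the system's backtest code: it assumes each day is stored as a chronological list of per-interval volumes, uses an expanding training window, and the names `walk_forward_mape`, `profile_at` and the `min_train` warm-up length are hypothetical.

```python
from statistics import mean


def profile_at(train_days, interval):
    """Average fraction of daily volume traded by the end of `interval`,
    computed from the training days only."""
    return mean(sum(day[: interval + 1]) / sum(day) for day in train_days)


def walk_forward_mape(days, interval, min_train=20):
    """Walk-forward backtest of the volume-to-date ratio at one interval.

    `days` is a chronological list of days, each a list of per-interval
    volumes. Every test day is predicted from a profile built only on the
    days before it, so no future data leaks into training.
    """
    if len(days) <= min_train:
        raise ValueError("need more days than the training warm-up")
    errors = []
    for t in range(min_train, len(days)):
        p_expected = profile_at(days[:t], interval)   # past days only
        v_current = sum(days[t][: interval + 1])      # volume known at `interval`
        predicted = v_current / p_expected
        actual = sum(days[t])
        errors.append(abs(predicted - actual) / actual)
    return 100 * mean(errors)                         # MAPE, in percent
```

If every day had exactly the same shape, only scaled up or down, this MAPE would be zero; real days deviate from the average profile, and that deviation is what the backtest measures.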

HCVP Model

Historical Cumulative Volume Profile (Baseline)

How It Works

The HCVP model uses a simple but effective approach: it learns the typical intraday volume pattern from historical data. For each 15-minute interval during the trading day, it calculates what percentage of the daily volume has typically been traded by that time.

Formula

V_EOD = V_current / P_expected

Where P_expected is the average fraction of daily volume (0 to 1) completed by the current time
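The HCVP calculation fits in a few lines. The sketch below is illustrative, assuming history is stored as one list of per-15-minute volumes per day; `build_cumulative_profile` and `hcvp_predict` are hypothetical names, not the system's API.

```python
from statistics import mean


def build_cumulative_profile(history):
    """HCVP profile: for each 15-minute interval index i, the average fraction
    (0-1) of a day's total volume traded by the end of interval i.

    `history` is a list of days, each a list of per-interval volumes with the
    same number of intervals.
    """
    n_intervals = len(history[0])
    return [
        mean(sum(day[: i + 1]) / sum(day) for day in history)
        for i in range(n_intervals)
    ]


def hcvp_predict(v_current, interval, profile):
    """V_EOD = V_current / P_expected, with P_expected taken from the profile."""
    p_expected = profile[interval]
    if p_expected <= 0:
        raise ValueError("profile expects no volume yet at this interval")
    return v_current / p_expected
```

For example, with two history days of [10, 30, 60] and [20, 60, 120] contracts, the profile is [0.1, 0.4, 1.0], so 1,200 contracts traded by the second interval extrapolates to about 3,000.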

Key Parameters

Cumulative Profile

Average % of volume traded at each 15-min interval across all historical days

Error Standard Deviation

Historical prediction error (~15%) used to calculate confidence intervals

✓ Strengths

• Simple and interpretable
• Fast computation
• Captures intraday W-shape pattern
• Reliable baseline

✗ Limitations

• Doesn't account for recent trends
• Ignores calendar effects
• Uses the same profile every day
• ~15% average error

CMEM Model

Component Multiplicative Error Model (Advanced)

Recommended

How It Works

The CMEM model combines three components to make more accurate predictions. It considers yesterday's volume, the 5-day average volume (the weekly trend), the time of day, and special calendar events such as USDA reports or roll periods.

Three Components

Daily Dynamics (μ_d)

Predicts today's total volume using learned weights for yesterday's volume and the 5-day moving average

μ_d = w₁×Lag1 + w₂×Rolling5 + Intercept
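The text says only that the weights are learned. A minimal sketch, assuming ordinary least squares and a Rolling5 window made of the five days before the forecast day (yesterday included), could look like this; `fit_daily_dynamics` and `forecast_mu_d` are hypothetical names. Centering the regressors removes the intercept from the normal equations and keeps the 2×2 system well conditioned.

```python
from statistics import mean


def fit_daily_dynamics(daily_volumes):
    """Ordinary least squares for mu_d = w1*Lag1 + w2*Rolling5 + Intercept.

    `daily_volumes` is a chronological list of daily totals.
    """
    lag1, roll5, target = [], [], []
    for t in range(5, len(daily_volumes)):
        lag1.append(daily_volumes[t - 1])
        roll5.append(mean(daily_volumes[t - 5:t]))   # five days before day t
        target.append(daily_volumes[t])
    # Centre each series so the intercept drops out of the 2x2 normal equations
    m1, m2, my = mean(lag1), mean(roll5), mean(target)
    x1 = [v - m1 for v in lag1]
    x2 = [v - m2 for v in roll5]
    y = [v - my for v in target]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if abs(det) <= 1e-12 * s11 * s22:
        raise ValueError("Lag1 and Rolling5 are collinear; cannot fit both weights")
    # Cramer's rule for [[s11, s12], [s12, s22]] @ [w1, w2] = [s1y, s2y]
    w1 = (s1y * s22 - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    intercept = my - w1 * m1 - w2 * m2
    return w1, w2, intercept


def forecast_mu_d(weights, recent_days):
    """Today's static forecast from fitted weights and recent daily totals (oldest first)."""
    w1, w2, intercept = weights
    return w1 * recent_days[-1] + w2 * mean(recent_days[-5:]) + intercept
```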

Seasonal Component (s_i)

Intraday volume pattern: the W-shaped curve of per-interval activity, with peaks at the open (10:45), the US open (15:30) and the close (18:30). The model uses its cumulative form.

s_i = cumulative fraction of daily volume expected by interval i

Calendar Multipliers (M)

Adjustments for special days: USDA reports (+20%), roll periods (+50%), US holidays (-30%), harvest season (+15%)

M ranges from 0.70 (US holiday) to 1.50 (roll period), with 1.00 on a standard day

Prediction Formula

V_EOD = (μ_d × M) × (1 - w) + (V_current / s_i) × w

Where w is the fraction of the trading day completed (0 at the open, 1 at the close). This weighted average blends the static forecast (μ_d × M) with the dynamic extrapolation (V_current / s_i), giving more weight to current data as the day progresses.
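The blend can be sketched as below. This assumes w is measured as elapsed clock time within the 10:45 to 18:30 session quoted above; the source does not say whether w is time-based or volume-based, and the function names are hypothetical.

```python
from datetime import datetime

SESSION_OPEN, SESSION_CLOSE = "10:45", "18:30"   # CET, from the seasonal component above


def day_fraction(now_hhmm: str) -> float:
    """w: fraction of the session elapsed, clamped to [0, 1]."""
    fmt = "%H:%M"
    open_t = datetime.strptime(SESSION_OPEN, fmt)
    close_t = datetime.strptime(SESSION_CLOSE, fmt)
    now = datetime.strptime(now_hhmm, fmt)
    frac = (now - open_t) / (close_t - open_t)   # timedelta / timedelta -> float
    return min(max(frac, 0.0), 1.0)


def cmem_predict(mu_d, multiplier, v_current, s_i, w):
    """V_EOD = (mu_d * M) * (1 - w) + (V_current / s_i) * w."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be between 0 and 1")
    static = mu_d * multiplier
    if s_i <= 0:              # no volume expected yet: use the static forecast alone
        return static
    dynamic = v_current / s_i
    return static * (1 - w) + dynamic * w
```

At the open (w = 0) the forecast is purely μ_d × M; by the close (w = 1) it is purely the volume-to-date extrapolation.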

Calendar Adjustments

Day Type	Multiplier	Effect	Reason
USDA Report	1.20	+20%	Monthly crop report at 18:00 CET
Roll Period	1.50	+50%	Contract expiry approaching
Harvest Season	1.15	+15%	July-August physical delivery activity
US Holiday	0.70	-30%	Reduced US trader participation
Standard	1.00	±0%	Normal trading day
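A lookup for M can be sketched from the table. The source does not say how overlapping events (for example a USDA report during a roll period) are combined, so the first-match priority order below is a hypothetical choice, as is the `calendar_multiplier` signature.

```python
from datetime import date

# Multipliers copied from the Calendar Adjustments table.
MULTIPLIERS = {
    "roll_period": 1.50,
    "usda_report": 1.20,
    "us_holiday": 0.70,
    "harvest_season": 1.15,
    "standard": 1.00,
}


def calendar_multiplier(d: date, usda_dates, roll_dates, us_holidays) -> float:
    """Return M for trading date d.

    Hypothetical rule: when several events fall on the same day, the first
    match in the order below wins. The real combination rule is not specified.
    """
    if d in roll_dates:
        return MULTIPLIERS["roll_period"]
    if d in usda_dates:
        return MULTIPLIERS["usda_report"]
    if d in us_holidays:
        return MULTIPLIERS["us_holiday"]
    if d.month in (7, 8):                 # July-August harvest season
        return MULTIPLIERS["harvest_season"]
    return MULTIPLIERS["standard"]
```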

✓ Strengths

• Considers recent volume trends
• Calendar-aware (USDA, holidays, etc.)
• Adapts throughout the day
• ~12% average error (vs 15% for HCVP)
• More accurate on special days

⚠ Considerations

• More complex to interpret
• Requires more historical data
• Needs periodic retraining
• Multipliers may need tuning

Euronext Data Delay

Euronext live market data is delayed by 15 minutes. When our scraper fetches volume at clock time T, the data actually reflects the market state at T−15 minutes.

Why This Matters for Predictions

Both models (HCVP and CMEM) look up a historical cumulative volume profile to determine what percentage of daily volume is expected at a given time. If we used the current clock time T with volume data from T−15min, the models would use a higher expected percentage than what the volume actually corresponds to, leading to systematic underestimation of End-of-Day volume.

Example

Scraper runs at 14:45 CET. Volume data is actually from 14:30 CET. At 14:45, the expected cumulative profile is 62%. At 14:30, it's 55%. Without correction: EOD = V / 0.62 (too low). With correction: EOD = V / 0.55 (accurate).

The system accounts for this by using the effective data time (clock time minus 15 minutes) for all profile lookups. This delay is configurable via the DATA_DELAY_MINUTES environment variable.
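The correction amounts to shifting the clock time back before the lookup. A minimal sketch, assuming the profile is keyed by "HH:MM" labels on 15-minute boundaries and that DATA_DELAY_MINUTES defaults to 15 when unset; the helper names are hypothetical.

```python
import os
from datetime import datetime, timedelta


def effective_data_time(clock_time: datetime, delay_minutes=None) -> datetime:
    """Clock time minus the feed delay (DATA_DELAY_MINUTES, assumed default 15)."""
    if delay_minutes is None:
        delay_minutes = int(os.environ.get("DATA_DELAY_MINUTES", "15"))
    return clock_time - timedelta(minutes=delay_minutes)


def profile_key(t: datetime, interval_minutes: int = 15) -> str:
    """Round t down to its interval boundary, formatted as 'HH:MM'."""
    floored = t.replace(minute=t.minute - t.minute % interval_minutes,
                        second=0, microsecond=0)
    return floored.strftime("%H:%M")


def expected_fraction(profile: dict, clock_time: datetime, delay_minutes=None) -> float:
    """Cumulative profile value for the time the data actually reflects."""
    return profile[profile_key(effective_data_time(clock_time, delay_minutes))]
```

With the values from the example above, a 14:45 scrape looks up the 14:30 entry (0.55) rather than the 14:45 entry (0.62).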

Understanding Confidence Intervals

Both models provide 95% confidence intervals: the actual EOD volume is expected to fall within the predicted range about 95% of the time. The interval narrows throughout the day as more data becomes available.

Example

At 11:00: Prediction = 30,000 ±6,000 (±20% uncertainty)
At 17:00: Prediction = 31,200 ±1,500 (±5% uncertainty)
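One way to turn the historical error standard deviation into such a band is sketched below. It assumes approximately normal relative errors (hence the 1.96 multiplier) and a separate σ estimated for each 15-minute interval, which is what would make the band narrow during the day; neither detail is confirmed by the text, and the function names are hypothetical.

```python
from statistics import stdev

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def relative_error_std(predictions, actuals):
    """Standard deviation of historical relative errors (predicted / actual - 1)."""
    return stdev(p / a - 1 for p, a in zip(predictions, actuals))


def confidence_interval(prediction, rel_std):
    """Symmetric 95% band: prediction +/- 1.96 * sigma * prediction."""
    half_width = Z_95 * rel_std * prediction
    return prediction - half_width, prediction + half_width
```

With σ = 0.10, a 30,000 prediction gives roughly 24,120 to 35,880; a smaller σ later in the day gives a proportionally narrower band.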

Metrics Definitions

MAPE (Mean Absolute % Error)

The average absolute difference between predicted and actual values, expressed as a percentage of the actual value. Lower is better.

RMSE (Root Mean Square Error)

The square root of the average squared difference between predicted and actual values, in volume units. Because errors are squared, it penalizes large errors more heavily. Lower is better.

Hit Rate

The percentage of predictions for which the actual value falls within the predicted 95% confidence interval. It should be close to 95%: well below means the intervals are too narrow, well above means they are wider than necessary.

Avg Error (Average Absolute Error)

The average absolute difference between predicted and actual values in volume units. Lower is better.
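The four metrics follow directly from their definitions. A plain-Python sketch, with hypothetical function names, taking parallel lists of predictions, actual values and interval bounds:

```python
from math import sqrt
from statistics import mean


def mape(pred, actual):
    """Mean absolute percentage error, in percent of the actual value."""
    return 100 * mean(abs(p - a) / a for p, a in zip(pred, actual))


def rmse(pred, actual):
    """Root mean square error, in volume units."""
    return sqrt(mean((p - a) ** 2 for p, a in zip(pred, actual)))


def hit_rate(lower, upper, actual):
    """Percentage of actual values inside their [lower, upper] interval."""
    return 100 * mean(1.0 if lo <= a <= hi else 0.0
                      for lo, hi, a in zip(lower, upper, actual))


def avg_error(pred, actual):
    """Average absolute error, in volume units."""
    return mean(abs(p - a) for p, a in zip(pred, actual))
```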

Which Model Should I Use?

Use CMEM for most trading decisions. It has a lower average error than HCVP (~12% vs ~15%) and accounts for market events.

Use HCVP as a baseline reference or when you want a simple, consistent prediction.

Compare both to gauge prediction uncertainty. Large differences suggest unusual market conditions.
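That comparison can be made routine with a simple divergence check. The 20% threshold and the function names below are hypothetical placeholders for illustration, not values used by the system.

```python
def divergence_pct(cmem_eod: float, hcvp_eod: float) -> float:
    """Gap between the two EOD forecasts, as a percentage of the CMEM forecast."""
    return 100 * abs(cmem_eod - hcvp_eod) / cmem_eod


def flag_unusual(cmem_eod: float, hcvp_eod: float, threshold_pct: float = 20.0) -> bool:
    """True when the models disagree by more than the (hypothetical) threshold."""
    return divergence_pct(cmem_eod, hcvp_eod) > threshold_pct
```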