Point-in-Time Equity Return Prediction Using Interpretable Machine Learning

  • rosy851018
  • March 29
  • 3 min read

Can We Predict Stock Returns with Data?

Introduction

Stock returns are often considered unpredictable, especially in the short term. Prices move quickly, and day-to-day changes are mostly noise.


In this project, I explore a key question:

Can we use data to predict whether stock prices will go up or down?

To answer this, I combined price data and company fundamentals, and built machine learning models to test predictability across different time horizons.


Step 1: Building a Clean Dataset

To avoid unrealistic results, I used a Point-in-Time (PIT) approach: each price row is paired only with fundamentals that had already been reported on or before that date, so the model never sees information from the future.

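import pandas as pd

# merge_asof requires both frames to be sorted by their time keys
prices = prices.sort_values("Date")
fundamentals = fundamentals.sort_values("Report Date")

# Attach to each price row the most recent report dated on or before it
merged = pd.merge_asof(
    prices,
    fundamentals,
    left_on='Date',
    right_on='Report Date',
    by='Ticker',
    direction='backward'
)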

This avoids data leakage from future information, which is a common mistake in financial modeling.
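
As a quick sanity check, the backward as-of join can be tried on a toy example. The ticker, dates, and values below are made up for illustration, and the sketch assumes that "Report Date" is the day the figures actually became public (some datasets store a separate publish date, which would be the safer key).

```python
import pandas as pd

# Hypothetical toy data: one ticker, three price dates, two quarterly reports
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-10", "2024-02-15", "2024-05-20"]),
    "Ticker": ["AAA", "AAA", "AAA"],
    "Close": [10.0, 11.0, 12.0],
})
fundamentals = pd.DataFrame({
    "Report Date": pd.to_datetime(["2024-02-01", "2024-05-01"]),
    "Ticker": ["AAA", "AAA"],
    "Revenue": [100.0, 120.0],
})

merged = pd.merge_asof(
    prices.sort_values("Date"),
    fundamentals.sort_values("Report Date"),
    left_on="Date",
    right_on="Report Date",
    by="Ticker",
    direction="backward",
)

# No row may carry a report dated after its own price date
leaks = merged["Report Date"] > merged["Date"]
print(merged[["Date", "Report Date", "Revenue"]])
print("rows using future data:", int(leaks.sum()))
```

The first price date has no earlier report, so its fundamentals stay empty rather than borrowing the February figures.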


Step 2: Turning Raw Data into Signals

Raw data alone is not enough. The key is to turn it into meaningful features.

I started with simple price movements:

# Daily percentage change in price and volume, computed within each ticker
panel["ret_1d"] = panel.groupby("Ticker")["Close"].pct_change()
panel["vol_chg_1d"] = panel.groupby("Ticker")["Volume"].pct_change()

Then I added momentum signals to capture trends:

# 5, 20, and 63 trading days are roughly one week, one month, and one quarter
for w in [5, 20, 63]:
    panel[f"momentum_{w}"] = (
        panel.groupby("Ticker")["Close"].pct_change(w)
    )

These features help the model understand how prices move over time.
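
The grouping by ticker matters more than it may look. The small sketch below uses made-up prices for two hypothetical tickers and a 2-day window to show what goes wrong when the panel is treated as one long series.

```python
import pandas as pd

# Hypothetical two-ticker panel, already sorted by ticker and date
panel = pd.DataFrame({
    "Ticker": ["AAA"] * 4 + ["BBB"] * 4,
    "Close": [100.0, 110.0, 121.0, 133.1, 50.0, 45.0, 40.5, 36.45],
})

# Grouped: each ticker's first two rows are NaN because no history exists yet
panel["momentum_2"] = panel.groupby("Ticker")["Close"].pct_change(2)

# Ungrouped (wrong for a panel): BBB's first rows get compared with AAA prices
panel["momentum_2_naive"] = panel["Close"].pct_change(2)

print(panel)
```

In the naive column, the first BBB rows show a large negative "momentum" that is only the price gap between two unrelated stocks.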


Step 3: Capturing Market Trends

Markets often follow trends, so I added moving average signals:

panel["ema_12"] = panel.groupby("Ticker")["Close"].transform(
    lambda s: s.ewm(span=12).mean()
)

panel["ema_26"] = panel.groupby("Ticker")["Close"].transform(
    lambda s: s.ewm(span=26).mean()
)

panel["ema_cross"] = panel["ema_12"] - panel["ema_26"]

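A positive ema_cross means the 12-day average sits above the 26-day average, which points to a short-term uptrend in that stock; a negative value points to a downtrend.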
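
To see the sign convention in action, here is a minimal sketch on synthetic data: one hypothetical ticker that rises steadily and one that falls steadily. The helper name add_ema is my own and not part of the project code.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: AAA rises in a straight line, BBB falls in a straight line
n = 60
panel = pd.DataFrame({
    "Ticker": ["AAA"] * n + ["BBB"] * n,
    "Close": np.concatenate([np.linspace(100, 160, n), np.linspace(100, 40, n)]),
})

def add_ema(df, span):
    """Exponential moving average of Close, computed separately per ticker."""
    return df.groupby("Ticker")["Close"].transform(
        lambda s: s.ewm(span=span).mean()
    )

panel["ema_12"] = add_ema(panel, 12)
panel["ema_26"] = add_ema(panel, 26)
panel["ema_cross"] = panel["ema_12"] - panel["ema_26"]

# Last value per ticker: positive for the rising stock, negative for the falling one
last = panel.groupby("Ticker")["ema_cross"].last()
print(last)
```

The shorter average reacts faster to recent prices, so it ends above the longer one in an uptrend and below it in a downtrend.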


Step 4: Adding Company Fundamentals

Stock prices are driven not only by market behavior but also by company performance.

So I added fundamental features:

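panel["eps"] = panel["Net Income"] / panel["Shares (Diluted)"]
panel["profit_margin"] = panel["Net Income"] / panel["Revenue"]
# Revenue repeats on every daily row, so take growth between report rows and carry it forward
is_new = panel["Report Date"] != panel.groupby("Ticker")["Report Date"].shift()
qoq = panel[is_new].groupby("Ticker")["Revenue"].pct_change()
panel["rev_growth_qoq"] = qoq.reindex(panel.index).groupby(panel["Ticker"]).ffill()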

This allows the model to learn from both price behavior and business fundamentals.


Step 5: Defining the Prediction Task

Instead of predicting exact returns, I simplified the problem:

Will the stock go up or down?

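def make_label(df, h):
    # Forward h-day return per ticker: Close[t+h] / Close[t] - 1
    future_ret = df.groupby("Ticker")["Close"].shift(-h) / df["Close"] - 1
    # Keep NaN where the future price is unknown instead of labelling it 0
    return (future_ret > 0).astype(float).where(future_ret.notna())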

This turns the problem into a binary classification task, which is less sensitive to extreme returns than predicting their exact size.
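
To make the horizon explicit, the label at day t can be written in terms of the close h trading days later. The sketch below is one possible way to do that on a made-up price series; forward_label is a hypothetical helper name, and rows whose future price is not yet known are left as NaN so they can be dropped before training.

```python
import pandas as pd

def forward_label(df, h):
    """1.0 if Close is higher h rows ahead (same ticker), 0.0 if not, NaN if unknown."""
    future_ret = df.groupby("Ticker")["Close"].shift(-h) / df["Close"] - 1
    return (future_ret > 0).astype(float).where(future_ret.notna())

# Hypothetical single-ticker price path
panel = pd.DataFrame({
    "Ticker": ["AAA"] * 5,
    "Close": [10.0, 11.0, 9.0, 12.0, 8.0],
})
panel["y_2d"] = forward_label(panel, h=2)
print(panel)
```

Each label depends only on prices after day t, so it shares no days with backward-looking features such as momentum.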


Step 6: Building the Model

I chose a simple but powerful model: logistic regression.

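from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scale features first so the L1 penalty treats them on a comparable footing
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(
        penalty="l1",
        solver="liblinear"
    ))
])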

L1 regularization helps:

  • Remove unimportant features by shrinking their coefficients to exactly zero

  • Keep the model interpretable
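
The feature-removal effect is easy to see on synthetic data. In the sketch below, only the first two of ten made-up features influence the label; the data, the seed, and the value C=0.01 are illustrative assumptions, not settings from the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
# Only the first two columns drive the label; the other eight are pure noise
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(2000) < 1 / (1 + np.exp(-logits))).astype(int)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    # C is the inverse regularization strength; smaller C means a stronger penalty
    ("clf", LogisticRegression(penalty="l1", solver="liblinear", C=0.01)),
])
pipe.fit(X, y)

coefs = pipe.named_steps["clf"].coef_.ravel()
print(np.round(coefs, 3))
print("features kept:", np.flatnonzero(coefs))
```

With a strong enough penalty, the noise columns end up with zero coefficients, leaving a short list of features that can be read and explained directly.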


Step 7: Comparing Models

I also tested other models:

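from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# All three models use their default hyperparameters
models = {
    "LogReg": LogisticRegression(),
    "RandomForest": RandomForestClassifier(),
    "XGBoost": XGBClassifier()
}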

Interestingly, all models performed similarly.

This suggests that good data and features matter more than model complexity.
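
For a comparison like this to be fair on time-ordered data, each model should be trained on the past and scored on the period that follows. Below is a minimal sketch of such a loop using scikit-learn's TimeSeriesSplit and ROC-AUC. The data is synthetic rather than the project's, and XGBoost is left out only to keep the example free of extra dependencies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(42)
# Synthetic stand-in for a time-ordered feature matrix (rows are in date order)
X = rng.normal(size=(1500, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=1500) > 0).astype(int)

models = {
    "LogReg": LogisticRegression(),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {name: [] for name in models}
# Each fold trains on earlier rows and tests on the block that follows them
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[test_idx])[:, 1]
        scores[name].append(roc_auc_score(y[test_idx], proba))

for name, vals in scores.items():
    print(f"{name}: mean AUC = {np.mean(vals):.3f}")
```

A random shuffle split would mix future rows into the training set, which tends to flatter every model equally and hide real differences.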


What Did I Find?

The results show that short-term predictions (1 day) are almost random, while performance improves significantly at the 20-day horizon and becomes highly predictable over 60–120 days. As the time horizon increases, market noise decreases, trends become clearer, and fundamentals play a more important role.


Turning Predictions into Strategy

To test the model in practice, the predictions were used to build a simple trading strategy: go long when the predicted probability is high, and stay in cash otherwise. The results show higher returns, lower drawdowns, and more stable performance compared to buy-and-hold. This suggests that the model captures useful patterns that can be applied in real trading.

# Go long (1) when the predicted probability of an up move exceeds 0.55, else hold cash (0)
signal = (proba > 0.55).astype(int)
strategy_return = signal * R_test.values
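
Total return and maximum drawdown can be computed from the per-period strategy returns as sketched below. The numbers are made up to show the mechanics and are not results from the project; max_drawdown and total_return are hypothetical helper names. The sketch also assumes each probability was produced before the return period it is applied to.

```python
import numpy as np
import pandas as pd

def total_return(returns):
    """Compounded return over the whole period."""
    return float((1 + returns).prod() - 1)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve (zero or negative)."""
    equity = (1 + returns).cumprod()
    return float((equity / equity.cummax() - 1).min())

# Hypothetical per-period test returns and predicted probabilities
R_test = pd.Series([0.02, -0.05, 0.01, 0.03, -0.04, 0.02])
proba = np.array([0.60, 0.40, 0.58, 0.70, 0.50, 0.56])

signal = (proba > 0.55).astype(int)  # 1 = long, 0 = cash
strategy_return = pd.Series(signal * R_test.values)

print("strategy:    ", total_return(strategy_return), max_drawdown(strategy_return))
print("buy-and-hold:", total_return(R_test), max_drawdown(R_test))
```

Transaction costs are ignored here; subtracting a small cost each time the signal changes would be a natural next refinement.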

Final Thoughts

Strong results don't always require complex models. Clean data, good features, and a clear problem matter more. Even in noisy markets, data-driven methods can still uncover useful signals, especially over longer horizons.


Key Takeaways

  • Stock returns are hard to predict in the short term

  • Predictability improves over longer horizons

  • Technical signals dominate short-term predictions

  • Fundamentals matter more in the long run

  • Simple models can still be very powerful


Technical Skills Used

  • Python & Data Analysis: pandas, numpy

  • Machine Learning: Logistic Regression (L1/L2), Random Forest, XGBoost

  • Quantitative Methods: Time series modeling, return forecasting, feature selection

  • Feature Engineering: Momentum, volatility, EMA signals, financial ratios

  • Data Engineering: Point-in-Time (PIT) data pipeline, data cleaning

  • Model Validation: Accuracy, ROC-AUC, cross-validation, backtesting

  • Visualization: Matplotlib, Seaborn






 
 

