Fix GA training window to include more data.

francisco3511 · Nov 25, 2024 · c10f147
1 parent 9b64e0d

Showing 5 changed files with 15 additions and 33 deletions.
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
 
-This project implements an intelligent dynamic stock selection system using a **Genetic Algorithm-optimized XGBoost** (GA-XGBoost) classifier to identify stocks with potential market outperformance. The model analyzes quarterly financial statements, market data, insider trading patterns and other external data to predict whether a stock will outperform the S&P 500 index over a one-year horizon over a large margin. The project includes a **Streamlit-based analytics dashboard** that provides comprehensive stock analysis tools, including technical indicators, financial metrics visualization, and model-driven insights.
+This project implements an intelligent dynamic stock selection system using an **Adaptive Genetic Algorithm-optimized XGBoost** (GA-XGBoost) classifier to identify stocks with potential market outperformance. The model analyzes quarterly financial statements, market data, insider trading patterns, and other external data to predict whether a stock will outperform the S&P 500 index by a large margin over a one-year horizon. The project includes a **Streamlit-based analytics dashboard** that provides comprehensive stock analysis tools, including technical indicators, financial metrics visualization, and model-driven insights.
 
 
 ## Table of Contents
@@ -23,7 +23,7 @@ This project implements an intelligent dynamic stock selection system using a **
 
 ## Project Overview
 
-The stock classifier is built using GA-XGBoost and trained on:
+The stock classifier is built using Adaptive GA-XGBoost and trained on:
 - Quarterly financial statements
 - Market data and technical indicators
 - Insider trading information
@@ -39,9 +39,9 @@ The model predicts whether a stock will outperform the S&P 500 over a one-year h
 ## Features
 
 - **Model Training**
-  - GA-XGBoost classifier with optimized hyperparameters
+  - Adaptive GA-XGBoost classifier with optimized hyperparameters
   - Feature engineering including growth ratios, financial metrics, price momentum, and volatility
-  - Cross-validation and performance metrics
+  - Expanding-window cross-validation and performance metrics
 
 - **Streamlit App**
   - Market overview dashboard
@@ -77,8 +77,8 @@ The model predicts whether a stock will outperform the S&P 500 over a one-year h
 
 2. Install pre-commit hooks:
    ```bash
-   chmod +x scripts/install-hooks.sh
-   ./scripts/install-hooks.sh
+   chmod +x install-hooks.sh
+   ./install-hooks.sh
    ```
 
 ## Usage
@@ -104,11 +104,13 @@ Score stocks for a given trade date:
    stocksense --score --trade-date YYYY-MM-DD
    ```
 
+To score stocks for the most recent trading date, omit `--trade-date`.
+
 ### Streamlit App
 
 To open the Streamlit app:
    ```bash
-   stocksense --app
+   stocksense-app
    ```
 
 
@@ -130,29 +132,6 @@ Ye, Z. J., & Schuller, B. W. (2023). Capturing dynamics of post-earnings-announc
 
 Liu, X. Y., Yang, H., & Chen, Q. (2019). A Sustainable Quantitative Stock Selection Strategy Based on Dynamic Factor Adjustment. *Columbia University*. [[paper]](add_link_if_available)
 
-```bibtex
-@article{yang2020practical,
-  title={A Practical Machine Learning Approach for Dynamic Stock Recommendation},
-  author={Yang, Hongyang and Liu, Xiao-Yang and Wu, Qingwei},
-  institution={Columbia University},
-  year={2020}
-}
-
-@article{ye2023capturing,
-  title={Capturing dynamics of post-earnings-announcement drift using a genetic algorithm-optimized XGBoost},
-  author={Ye, Zhengxin Joseph and Schuller, Bj{\"o}rn W.},
-  institution={Imperial College London},
-  year={2023}
-}
-
-@article{liu2019sustainable,
-  title={A Sustainable Quantitative Stock Selection Strategy Based on Dynamic Factor Adjustment},
-  author={Liu, Xiao-Yang and Yang, Hongyang and Chen, Qingwei},
-  institution={Columbia University},
-  year={2019}
-}
-```
-
 
 ## License
 

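The README change says that omitting `--trade-date` scores stocks for the last trading date. How the code picks that date is not part of this diff, so the sketch below only assumes one plausible behavior: fall back to the latest date present in the data. `resolve_trade_date` and `available_dates` are hypothetical names, not part of stocksense.

```python
from datetime import datetime
from typing import Optional


def resolve_trade_date(
    trade_date: Optional[datetime], available_dates: list[datetime]
) -> datetime:
    """Return the explicit trade date, or the most recent one in the data.

    Hypothetical helper: the real fallback logic is not shown in this commit.
    """
    if trade_date is not None:
        # click.DateTime has already parsed YYYY-MM-DD into a datetime.
        return trade_date
    if not available_dates:
        raise ValueError("no trade dates available to fall back on")
    # Assumed fallback: the latest date present in the dataset.
    return max(available_dates)
```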
diff --git a/scripts/install-hooks.sh b/install-hooks.sh
rename from scripts/install-hooks.sh
rename to install-hooks.sh
diff --git a/stocksense/main.py b/stocksense/main.py
@@ -16,12 +16,14 @@ def prepare_data():
 @click.option("-u", "--update", is_flag=True, help="Update stock data.")
 @click.option("-t", "--train", is_flag=True, help="Train model.")
 @click.option("-s", "--score", is_flag=True, help="Score stocks.")
+@click.option("-f", "--force", is_flag=True, default=False, help="Force model retraining.")
 @click.option(
+    "-tdq",
     "--trade-date",
     type=click.DateTime(formats=["%Y-%m-%d"]),
     help="Trade date for model operations (format: YYYY-MM-DD)",
 )
-def main(update, train, score, trade_date):
+def main(update, train, score, force, trade_date):
     """
     CLI handling.
     """
@@ -36,7 +38,7 @@ def main(update, train, score, trade_date):
         stocks = DatabaseHandler().fetch_sp500_stocks()
         handler = ModelHandler(stocks, trade_date)
         if train:
-            handler.train(data)
+            handler.train(data, retrain=force)
         if score:
             handler.score(data)
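The new `--force` flag is passed through as `handler.train(data, retrain=force)`. The part of `ModelHandler.train` that consumes `retrain` is outside this diff, so the following is only one plausible shape for that check, assuming training is skipped when a saved model file already exists. `train_if_needed` and the `fit` callback are hypothetical.

```python
from pathlib import Path
from typing import Callable


def train_if_needed(
    model_file: Path, retrain: bool, fit: Callable[[Path], None]
) -> bool:
    """Train and save a model unless a saved one exists and retrain is False.

    Hypothetical helper: `fit` stands in for the real training step and is
    expected to write the trained model to `model_file`.
    Returns True when training ran.
    """
    if model_file.exists() and not retrain:
        # Reuse the existing model; passing --force (retrain=True) skips this.
        return False
    model_file.parent.mkdir(parents=True, exist_ok=True)
    fit(model_file)
    return True
```

With this shape, `stocksense --train` would reuse an existing model, while `stocksense --train --force` would always retrain.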
 

diff --git a/stocksense/model/genetic_algorithm.py b/stocksense/model/genetic_algorithm.py
@@ -136,7 +136,7 @@ def get_train_val_splits(data: pl.DataFrame, stocks: list[str], min_train_years:
         if i + 1 < min_train_years:
             continue
 
-        train_years = years[: i + 1]
+        train_years = years[: i + 2]
         val_years = [years[i + 2], years[i + 3]]
 
         train = data.filter(pl.col("tdq").dt.year().is_in(train_years))
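The change from `years[: i + 1]` to `years[: i + 2]` is easiest to see on a plain list of years. This sketch replaces the Polars DataFrame with a list of integers, and its loop bound is an assumption since the full loop header of `get_train_val_splits` is not shown in the hunk; `expanding_year_splits` is a hypothetical name.

```python
def expanding_year_splits(
    years: list[int], min_train_years: int
) -> list[tuple[list[int], list[int]]]:
    """Build (train_years, val_years) pairs for an expanding-window split.

    Uses the slicing from this commit: training covers years[0] .. years[i + 1]
    and validation covers the next two years, years[i + 2] and years[i + 3].
    """
    years = sorted(set(years))
    splits = []
    # Assumed loop bound: stop while years[i + 3] still exists.
    for i in range(len(years) - 3):
        if i + 1 < min_train_years:
            continue
        train_years = years[: i + 2]
        val_years = [years[i + 2], years[i + 3]]
        splits.append((train_years, val_years))
    return splits


for train_years, val_years in expanding_year_splits(list(range(2015, 2021)), 2):
    print(train_years, "->", val_years)
# [2015, 2016, 2017] -> [2018, 2019]
# [2015, 2016, 2017, 2018] -> [2019, 2020]
```

With the previous slice, the first window would have trained on [2015, 2016] and validated on [2018, 2019], leaving 2017 unused; the new slice closes that gap. Because training now spans `i + 2` years while the `min_train_years` check still tests `i + 1`, the first retained window holds `min_train_years + 1` years.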

diff --git a/stocksense/model/model_handler.py b/stocksense/model/model_handler.py
@@ -89,6 +89,7 @@ def train(self, data: pl.DataFrame, retrain: bool = False):
             model.save_model(model_file)
         except Exception as e:
             logger.error(f"ERROR: failed to train model - {e}")
+            raise
 
     def score(self, data):
         """
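The only change in `model_handler.py` is a bare `raise` after the error log. Before it, a failed training run was logged and then swallowed, so `train` returned normally and execution continued with any later step such as scoring. The pattern in isolation, with `run_training` as a hypothetical stand-in for `ModelHandler.train`:

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def run_training(fit: Callable[[], T]) -> T:
    """Run a training step, logging and then re-raising any failure."""
    try:
        return fit()
    except Exception as e:
        logger.error(f"ERROR: failed to train model - {e}")
        # A bare `raise` re-raises the exception being handled, with its
        # original traceback, so the caller still sees the failure.
        raise
```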