MachineLearningForEuropeanOptions

Project for Financial Engineering Club at UIUC - Spring 2025 semester

Environment

I am currently working on this locally on my PC with a NVIDIA GeForce RTX 4060. I am using PyTorch version 2.6.0+cu126 for CUDA 12.6.

Project Overview

Undergraduate Student Project Proposal: Machine Learning for European Option Pricing

The goal of this project is to develop a machine learning-based system for pricing European options and estimating implied volatility using open-source data from OptionDX.com. The project will leverage state-of-the-art algorithms, including Random Forest, XGBoost, and Neural Networks, to analyze historical option data and predict market behavior.

Key Steps

Data Collection: Gather historical European option data from OptionDX.com (free), including strike price, expiration date, underlying asset price, and other relevant variables. SPX is a European style option instrument on the S&P500 Index.
Model Development: Implement three machine learning models:
- Random Forest – tree-based model, useful for feature importance analysis.
- XGBoost – ensembled model for efficient gradient boosting.
- Neural Networks (e.g., Multi-Layer Perceptron) - for capturing complex non-linear patterns in option pricing.
Model Evaluation: Compare the performance of the models using metrics such as RMSE, Rsquared, and computational time efficiency.
Visualization: Create visualizations to demonstrate model predictions versus actual implied volatility surfaces.
Documentation & Deployment: Document findings and deploy a user-friendly interface for real-world application.

Expected Outcomes

- A robust machine learning framework for European option pricing constructed in Python (Google Colab link).

- Insights into the relative performance of different algorithms for financial data analysis.

- A deployable tool for traders and risk managers to estimate implied volatility quickly.

Stretch Goals

- Compare the machine learning approaches to numerical methods for calculating implied volatility (Black-Scholes model). Is there a time advantage to using machine learning trained on this data compared to the numerical method? Is there a loss of accuracy?

- Compare the machine learning models trained on different sizes of training set. What is the loss or increase in accuracy due to training set size? What is the trade-off due to additional training time?

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
exploration		exploration
neural_network		neural_network
optionsdx/aapl_eod_2023q1-fslib7		optionsdx/aapl_eod_2023q1-fslib7
.gitignore		.gitignore
LICENSE		LICENSE
MachineLearningEuropeanOptionsProjectDescription.pdf		MachineLearningEuropeanOptionsProjectDescription.pdf
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MachineLearningForEuropeanOptions

Environment

Project Overview

Undergraduate Student Project Proposal: Machine Learning for European Option Pricing

Key Steps

Expected Outcomes

Stretch Goals

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MachineLearningForEuropeanOptions

Environment

Project Overview

Undergraduate Student Project Proposal: Machine Learning for European Option Pricing

Key Steps

Expected Outcomes

Stretch Goals

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages