About the Model
I take great pride in building completely proprietary models. 100% custom. Nothing is duplicated from other models out there, and no, I don't make API calls to ChatGPT. I pull my own data daily and rerun my Python code to generate the latest bet suggestions.
Data is key. Without good data, there is no way to derive intelligence, and no chance of achieving any real accuracy with your ML/AI models. That is why roughly 90% of my time with AI Picks Pro is spent on my data aggregation pipelines. I have meticulously built my data scraping code to be extremely robust, and I have decades of sports data stored that updates daily, so my models always retrain on the latest information.
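To give a flavor of the daily-update idea, here is a minimal sketch using pandas. The column names and the dedup key are illustrative assumptions, not my actual pipeline, which is far more involved:

```python
import pandas as pd

def update_history(history: pd.DataFrame, latest: pd.DataFrame) -> pd.DataFrame:
    """Append newly scraped games to the historical store.
    Dropping duplicates makes the daily rerun idempotent."""
    combined = pd.concat([history, latest], ignore_index=True)
    # Assume a game is uniquely identified by date plus the two teams;
    # keep="last" prefers the freshest scrape of the same game.
    return combined.drop_duplicates(subset=["date", "home", "away"], keep="last")

history = pd.DataFrame({"date": ["2024-01-01"], "home": ["A"],
                        "away": ["B"], "home_pts": [101]})
latest = pd.DataFrame({"date": ["2024-01-01", "2024-01-02"],
                       "home": ["A", "C"], "away": ["B", "D"],
                       "home_pts": [101, 95]})
updated = update_history(history, latest)
```

Because the merge is deduplicated, the scraper can safely re-pull overlapping date ranges without double-counting games.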
Exploratory Data Analysis is next. It is crucial to understand which data relates to the outcome of a game: some features carry signal, while others are just noise. A common pitfall in ML projects is aggregating tons of data and throwing it into a model without vetting each feature; only useful features belong in the model. A deep understanding of the data is essential. As humans, we carry biases about what information should predict the outcome of a game, but in ML/AI modeling we want statistical methods to answer these questions and, as much as possible, set those biases aside.
Preprocessing the data comes after EDA. Now that we know which data to include in our model, the features that have real relationships with our target, we need to clean it up before modeling. We may need to filter out outliers, impute missing values, and so on. The outcomes of EDA help inform this step in the process.
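A minimal sketch of those two cleanup steps, with made-up values: median-impute missing entries, then drop points far from the median using the median absolute deviation (MAD), which a single bad scrape cannot inflate the way a standard deviation can:

```python
from statistics import median

def impute_median(values):
    """Replace None with the median of the observed values."""
    med = median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

def drop_outliers(values, k=5.0):
    """Keep values within k median-absolute-deviations of the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [v for v in values if abs(v - med) <= k * mad]

# Hypothetical pace figures: one missing value, one bad scrape (400.0).
pace = [98.2, None, 101.5, 99.0, 400.0, 97.4]
clean = drop_outliers(impute_median(pace))
```

The thresholds (and whether to drop or winsorize) should come out of EDA, not be hard-coded blindly.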
Feature Engineering is the process of creating additional features from our dataset. It is often important to create interaction terms and game-specific information, for example, how the home team performs at home, or how an away team that usually plays in a dome performs outdoors. These sorts of insights can be hugely valuable: they have real relationships to the score outcome, and no data source provides them ready-made. We must write code to derive these types of features ourselves to extract as much information as we can before modeling. Feature engineering is what differentiates our models from others. Every model out there knows how many rebounds one team has and how many rebounds the other team has; deriving additional value from that information is the secret sauce. Some of my models require PCA, some don't. Some of my models require a lot of interaction terms between the home and away team, some require none. Predicting sports outcomes is not one-size-fits-all, so it has been important for me to treat each model like a new project and go through extensive exploration of different features.
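The kinds of derived features described above might look like the sketch below. Every field name here is hypothetical, and these are not my production features:

```python
def engineer_features(game):
    f = {}
    # Split performance: how the home team scores at home vs. how the
    # away team scores on the road, rather than overall averages.
    f["home_edge"] = game["home_pts_at_home"] - game["away_pts_on_road"]
    # Interaction: an away team that usually plays in a dome may be
    # affected at an outdoor venue; 1.0 flags that mismatch.
    f["dome_mismatch"] = float(game["away_home_is_dome"]
                               and not game["venue_is_dome"])
    # Relative rebounding share instead of two raw counts.
    f["reb_share"] = game["home_reb"] / (game["home_reb"] + game["away_reb"])
    return f

game = {"home_pts_at_home": 112.4, "away_pts_on_road": 105.1,
        "away_home_is_dome": True, "venue_is_dome": False,
        "home_reb": 46, "away_reb": 40}
feats = engineer_features(game)
```

Note that each derived feature encodes a relationship between raw columns; that relative structure is exactly what raw box-score counts leave on the table.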
Machine Learning Modeling. The best part of the process, where we get to try different model types and compare error metrics. I spend a great deal of time vetting my model choices: I backtest on years of historical data to see which models perform best on both bet suggestions and error metrics. Some of my models are ensembles of weak learners, some are multilayer perceptron neural networks, some are boosted trees. Each use case calls for a different model type depending on the number of features, the number of training examples, and many other factors. The key takeaway is that we owe it to ourselves to leave no stone unturned. I have tried deep neural networks, parametric and non-parametric models, linear and non-linear models, and many, many more. We are making sports wagers with real money, so we had better find the best model type for our use case.
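The backtesting loop itself can be sketched simply: walk forward through seasons, train each candidate on the past, score it on the next season, and compare average error. The two "models" below are deliberately trivial stand-ins (a mean baseline and a one-feature slope fit), and the data is made up; the harness shape is the point:

```python
def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def backtest(models, seasons):
    """Walk-forward evaluation: never score a model on data it trained on."""
    scores = {name: [] for name in models}
    for i in range(1, len(seasons)):
        train = [g for s in seasons[:i] for g in s]
        test = seasons[i]
        for name, fit in models.items():
            predict = fit(train)
            scores[name].append(mae([predict(g) for g in test],
                                    [g["margin"] for g in test]))
    return {name: sum(errs) / len(errs) for name, errs in scores.items()}

# Candidate A: always predict the historical mean margin (baseline).
def fit_mean(train):
    mu = sum(g["margin"] for g in train) / len(train)
    return lambda g: mu

# Candidate B: least-squares slope through the origin on one feature.
def fit_slope(train):
    k = (sum(g["edge"] * g["margin"] for g in train)
         / sum(g["edge"] ** 2 for g in train))
    return lambda g: k * g["edge"]

seasons = [
    [{"edge": 3.0, "margin": 6.0}, {"edge": -2.0, "margin": -5.0}],
    [{"edge": 1.0, "margin": 2.5}, {"edge": -4.0, "margin": -7.5}],
    [{"edge": 2.0, "margin": 4.0}, {"edge": -1.0, "margin": -2.0}],
]
results = backtest({"mean_baseline": fit_mean, "slope": fit_slope}, seasons)
best = min(results, key=results.get)
```

The walk-forward split matters more than the model zoo: it guarantees every score reflects genuinely out-of-sample performance, which is the only kind that pays.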
I love talking about this stuff. Please use the contact page and send over any questions. I’m happy to share info and discuss ideas together.