NFLPool Prototyping with MongoDB

Yesterday was a good day.

With the static pages for nflpool.xyz complete, I started thinking about the dynamic pages. These are going to require access to the database and I’ll be using MongoDB. I had started the MongoDB course from Talk Python, but put that aside to go back to the Python for Entrepreneurs course to get the site up using Pyramid.

I took a step back and did some brainstorming about the data model I’ll need for the database. I grabbed a spare whiteboard and started scribbling.

nflpool whiteboard brainstorming

Using MongoDB requires you to shift your mental model from traditional SQL and joining tables as MongoDB doesn’t technically do joins. I needed to think about what a collection looks like and do I embed more documents within a collection or have multiple collections?

I went back through the MongoDB course chapter on Modeling and Document Design. Mr. Kennedy has you ask 6 questions when it comes to embed or not embed:

Is the embedded data wanted 80% of the time?
How often do you want the embedded data without the containing document?
Is the embedded data a bounded set?
Is that bound small?
How varied are your queries?
Is this an integration DB or an application DB?

The answers that I came up (that are hopefully correct):

Yes
Almost always
I’m such a newbie I don’t even know what a bounded set is.
I think so.
Not very.
This is an application database.

Keeping in mind that MongoDB has a 16MB limit, which sounds a lot smaller than it really is as I’m only dealing with text and not embedding images or anything like that, the answer is to embed everything.

The next step was to go back through the MySportsFeeds APIs and figure out which ones I’ll be using. In no particular order:

Cumulative Player Stats (for individual player picks, such as the passing yards leader)
Roster Players (used for the league players to pick who the individual player leaders are)
Playoff Team Standings (Used for wildcard playoff picks)
Division Team Standings (Used for which teams will finish 1st, 2nd and last in each division)
Conference Team Standings (Used for the team that will lead its conference in points for as well as some team data, such as the team name, abbreviation, etc.)

I may want the full game schedule at some point, but that’s a bigger challenge than I need to get into right now.

I ended up deciding that I need two collections within my database:

Users: this will store the registration information for each player in the league and be used for logging into the website
Seasons: Here I will embed all of the data for each season of play and have a document for 2016, 2017, etc. Within each year I’ll embed the league player’s picks and then a document for each of the APIs above that has 17 embeds – one for each week of the NFL season. One of my goals is that player can go back to 2016 and look at their progress for each week and see their point total versus the rest of the league. Getting ahead of myself, this will not be available in MLBPool2 (if and when I ever build that) as that will only be a real time look at your score and then show the final year results.

So it may look something like this, with two collections: Users and Seasons:

nflpooldatabase

–Users (embed a document for each player in the league in this collection)

–Seasons (Example: “2016” – and then embed the following in the 2016 document:)

2016
- player-picks
  - 1 document for each player’s picks
- Week 1 through 17 (17 documents total) with the NFL stats for that week embedded here:
  - AFC East
  - 11 more documents for division picks like AFC East above
  - 6 documents for individual leaders
  - Tiebreaker
  - Documents for Points For, Wildcard, etc.
2017
- player-picks
- Weeks 1-17 (One document per week)
  - NFL stats with all the embedded documents above

Now it was time to re-build the functions to go get the data from the MySportsFeeds API. I had this working in the last iteration of the app when I was using SQLite, but I’ve never used MongoDB before. Over my lunch hour, I successfully prototyped taking one query and putting it into my MongoDB running locally.

Eureka moment: Spending the lunch hour successfully prototyping importing data from an API over the internet to a local MongoDB using Python

— Paul Cutler (@prcutler) July 19, 2017

The feeling of euphoria in successfully using MongoDB was huge.

Last night after work, I took the next step. Keeping in mind the size limitation of MongoDB, I could take steps to filter the API calls, especially for cumulative stats. I only need a few key stats for a subset of all players in the NFL. For example, I just need passing yards for quarterbacks. The MySportsFeeds API provides a ton of stats for every player – such as fumbles, passes over 20 yards, QB Rating, completions, and defensive stats (even though they’re an offensive player).

Thankfully, Brad Barkhouse of MySportsFeeds is always available in the MSF Slack channel. I couldn’t figure out how to build a filter for just certain positions and a specific stat. (It turns out it’s just an & sign). So if I just want sacks for defensive players, it looks like this:

https://api.mysportsfeeds.com/v1.1/pull/nfl/2016-2017-regular/cumulative_player_stats.json?position=LB,SS,DT,DE,CB,FS,SS&playerstats=Sacks

So my task for the weekend is to figure out if I want to just embed a document for each individual category or one document with all of the cumulative stats and then just build queries for each category I care about (sacks, interceptions, passing yards, etc.)

I probably should just focus on the next phase of the Python for Entrepreneurs training though, and get the user login and authentication built and then go through the Albums part of the training, which I’ll mimic for the the league players to submit their picks. I’m running out of time as I really have only a few weeks before I need the player picks to be submitted before the start of the season.

Prioritizing is fun. But I’m so happy with some of the breakthroughs and progress and not trying to think of all the challenges ahead and just take it one step at a time.