NFLPool Prototyping with MongoDB

Yesterday was a good day.

With the static pages for nflpool.xyz complete, I started thinking about the dynamic pages. These are going to require access to the database and I’ll be using MongoDB. I had started the MongoDB course from Talk Python, but put that aside to go back to the Python for Entrepreneurs course to get the site up using Pyramid.

I took a step back and did some brainstorming about the data model I’ll need for the database. I grabbed a spare whiteboard and started scribbling.

nflpool whiteboard brainstorming

Using MongoDB requires you to shift your mental model away from traditional SQL and joining tables, as MongoDB doesn’t technically do joins. I needed to think about what a collection looks like: do I embed more documents within a collection, or have multiple collections?

I went back through the MongoDB course chapter on Modeling and Document Design. Mr. Kennedy has you ask six questions when deciding whether to embed or not:

  1. Is the embedded data wanted 80% of the time?
  2. How often do you want the embedded data without the containing document?
  3. Is the embedded data a bounded set?
  4. Is that bound small?
  5. How varied are your queries?
  6. Is this an integration DB or an application DB?

The answers that I came up with (that are hopefully correct):

  1. Yes
  2. Almost always
  3. I’m such a newbie I don’t even know what a bounded set is.
  4. I think so.
  5. Not very.
  6. This is an application database.

Keeping in mind that MongoDB has a 16MB limit per document, which sounds a lot smaller than it really is as I’m only dealing with text and not embedding images or anything like that, the answer is to embed everything.
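To get a feel for how far text-only data is from that limit, here is a rough sketch using only the standard library. It measures the JSON encoding of a made-up season document, which is only a ballpark for the real BSON size, and the document shape and the 500-player count are assumptions for illustration.

```python
import json

# MongoDB's per-document cap, in bytes.
MAX_BSON_BYTES = 16 * 1024 * 1024


def approx_size_bytes(document: dict) -> int:
    """Rough size estimate: UTF-8 length of the JSON encoding.

    BSON and JSON sizes differ, so treat this as a ballpark only.
    """
    return len(json.dumps(document).encode("utf-8"))


# Hypothetical season document: 17 weeks, 500 players, one stat each.
season = {
    "season": 2016,
    "weeks": [
        {
            "week": week,
            "stats": [{"player_id": p, "passing_yards": 0} for p in range(500)],
        }
        for week in range(1, 18)
    ],
}

size = approx_size_bytes(season)
print(f"{size:,} bytes, {size / MAX_BSON_BYTES:.1%} of the 16MB cap")
```

If the estimate ever crept close to the cap, that would be the signal to split weeks out into their own documents.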

The next step was to go back through the MySportsFeeds APIs and figure out which ones I’ll be using. In no particular order:

  • Cumulative Player Stats (for individual player picks, such as the passing yards leader)
  • Roster Players (used for the league players to pick who the individual player leaders are)
  • Playoff Team Standings (Used for wildcard playoff picks)
  • Division Team Standings (Used for which teams will finish 1st, 2nd and last in each division)
  • Conference Team Standings (Used for the team that will lead its conference in points for as well as some team data, such as the team name, abbreviation, etc.)

I may want the full game schedule at some point, but that’s a bigger challenge than I need to get into right now.

I ended up deciding that I need two collections within my database:

  • Users: this will store the registration information for each player in the league and be used for logging into the website
  • Seasons: Here I will embed all of the data for each season of play and have a document for 2016, 2017, etc. Within each year I’ll embed the league players’ picks and then a document for each of the APIs above that has 17 embeds – one for each week of the NFL season. One of my goals is that a player can go back to 2016, look at their progress for each week, and see their point total versus the rest of the league. Getting ahead of myself, this will not be available in MLBPool2 (if and when I ever build that), as that will only be a real-time look at your score and then show the final year results.

So it may look something like this, with two collections: Users and Seasons:

nflpooldatabase
–Users (embed a document for each player in the league in this collection)
–Seasons (Example: “2016” – and then embed the following in the 2016 document:)

  • 2016
    • player-picks
      • 1 document for each player’s picks
    • Week 1 through 17 (17 documents total) with the NFL stats for that week embedded here:
      • AFC East
      • 11 more documents for division picks like AFC East above
      • 6 documents for individual leaders
      • Tiebreaker
      • Documents for Points For, Wildcard, etc.
  • 2017
    • player-picks
    • Weeks 1-17 (One document per week)
      • NFL stats with all the embedded documents above
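Written out as a plain Python dictionary, the outline above might look roughly like this. Every field name here is a guess for illustration, not a settled schema.

```python
# Hypothetical shape for one document in the Seasons collection.
# All field names are illustrative.
def empty_week(week_number: int) -> dict:
    """Build the skeleton for one week's worth of NFL stats."""
    return {
        "week": week_number,
        "division_standings": {},  # e.g. "AFC East": [team ids in order]
        "individual_leaders": {},  # e.g. "passing_yards": [player ids]
        "points_for": {},
        "wildcard": {},
        "tiebreaker": None,
    }


season_2016 = {
    "season": 2016,
    "player_picks": [],  # one embedded document per league player
    "weeks": [empty_week(n) for n in range(1, 18)],  # weeks 1 through 17
}

print(len(season_2016["weeks"]), "weeks in the 2016 document")
```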

Now it was time to re-build the functions to go get the data from the MySportsFeeds API. I had this working in the last iteration of the app when I was using SQLite, but I’ve never used MongoDB before. Over my lunch hour, I successfully prototyped taking one query and putting it into my MongoDB running locally.

The feeling of euphoria in successfully using MongoDB was huge.

Last night after work, I took the next step. Keeping in mind the size limitation of MongoDB, I could take steps to filter the API calls, especially for cumulative stats. I only need a few key stats for a subset of all players in the NFL. For example, I just need passing yards for quarterbacks. The MySportsFeeds API provides a ton of stats for every player – such as fumbles, passes over 20 yards, QB Rating, completions, and defensive stats (even though they’re an offensive player).

Thankfully, Brad Barkhouse of MySportsFeeds is always available in the MSF Slack channel. I couldn’t figure out how to build a filter for just certain positions and a specific stat. (It turns out you just join the two filters with an & sign.) So if I just want sacks for defensive players, it looks like this:

https://api.mysportsfeeds.com/v1.1/pull/nfl/2016-2017-regular/cumulative_player_stats.json?position=LB,SS,DT,DE,CB,FS,SS&playerstats=Sacks
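Rather than typing these URLs by hand for every category, a small helper can assemble them. This is a sketch with the standard library only; the function name and its parameters are my own, and only the URL pattern comes from the example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.mysportsfeeds.com/v1.1/pull/nfl"


def cumulative_stats_url(season: str, positions: list[str], stat: str) -> str:
    """Build a cumulative_player_stats URL filtered by position and one stat."""
    query = urlencode(
        {"position": ",".join(positions), "playerstats": stat},
        safe=",",  # keep the commas between positions readable
    )
    return f"{BASE_URL}/{season}/cumulative_player_stats.json?{query}"


url = cumulative_stats_url(
    "2016-2017-regular", ["LB", "SS", "DT", "DE", "CB", "FS"], "Sacks"
)
print(url)
```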

So my task for the weekend is to figure out if I want to just embed a document for each individual category or one document with all of the cumulative stats and then just build queries for each category I care about (sacks, interceptions, passing yards, etc.)

I probably should just focus on the next phase of the Python for Entrepreneurs training though, and get the user login and authentication built and then go through the Albums part of the training, which I’ll mimic for the league players to submit their picks. I’m running out of time as I really have only a few weeks before I need the player picks to be submitted before the start of the season.

Prioritizing is fun. But I’m so happy with some of the breakthroughs and progress, and I’m trying not to think of all the challenges ahead and just take it one step at a time.

Python for Entrepreneurs Progress

I sat down at dinner last night excited to share with my wife the two things I learned in my Talk Python course yesterday. The first was learning the basics of CSS, something I’ve avoided for years. I’m not going to even pretend I understand CSS, but it’s a base knowledge to work with and there is still a whole chapter of applied front-end frameworks, so I’m sure there will be more on CSS.

The second was a cache-busting technique, which makes it easy to develop a website and see changes right away without having to clear the browser cache. It’s also great for users, who will see updates in production as soon as they happen.

As I follow along in the Python for Entrepreneurs class, I’m trying something different this time. Rather than code along with the examples as Mr. Kennedy does them, I’m trying to build nflpool.xyz using code similar to what is in the training. There have been a couple of gotchas doing it this way, as you’ll start a chapter doing something one way and then learn a different and better way to do it. Overall, I kind of like doing it this way, as hands-on application is one of the ways I learn best.

Unfortunately, the cache-busting code broke Pyramid and I got a myriad of errors in my Chameleon templates. I didn’t realize this until after I had started the routing section of the training and then lost an hour or two trying to troubleshoot the cache-busting. I finally gave up and ripped out the cache-busting code from the templates and everything is working again. Well, working without the cache-busting code.
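For anyone curious what cache-busting looks like in general, here is a minimal sketch of the common content-hash approach. This is not the code from the course, just the general idea: add a short hash of the file to its URL so the URL changes whenever the file does. The function name and the /static/ prefix are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_busted_url(static_root: Path, relative_path: str) -> str:
    """Return a static URL with a short content hash as a query string."""
    file_bytes = (static_root / relative_path).read_bytes()
    digest = hashlib.sha256(file_bytes).hexdigest()[:10]
    return f"/static/{relative_path}?v={digest}"


# Demo with a throwaway stylesheet: editing the file changes the URL.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "site.css").write_text("body { color: black; }")
    before = cache_busted_url(root, "site.css")
    (root / "site.css").write_text("body { color: navy; }")
    after = cache_busted_url(root, "site.css")

print(before)
print(after)
```

Because the browser treats a different query string as a different resource, it fetches the new file instead of serving the cached one.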

As I worked on the routing section, I’m not going to say I truly understand it yet, but it started to click for me why using a framework like Pyramid with Python makes sense. When I’ve mentioned to a couple of people that I’m going to use Python and Pyramid to build a site, I’m usually asked why I wouldn’t just use JavaScript like everyone else these days. For me, focusing on one language at a time and not trying to learn too much is key. I’m pretty sure that I could do everything I want to do with nflpool in JavaScript (including both the game calculations and website), but Python’s readability and reputation for being a good language to start with really appeals to me. So why wouldn’t I build it in Python? It gives me more hands-on experience with the language, which I need, and I can include the code needed for the scoring right in the application and don’t have to build two things – the game scoring and a website.

I still have a lot to get through this week. I need to finish the applied web development including forms (yay! Maybe I will have the ability to take user picks up by next month. Now, don’t get ahead of myself…); then front-end frameworks with CSS and Bootstrap; the biggest of them all – databases (more on that in a second); as well as account management; and finally, deployment. I skimmed the deployment chapter Sunday night – lots of good stuff (even if they use an Ubuntu VPS in the training) and I’m excited to give Ansible a chance.

One of the nice things about the trainings from Talk Python is that Michael Kennedy offers office hours for a Q&A section if there is something you’re stuck on. The next one is tomorrow, which is perfect. I’m going to see if I can solve my cache-busting problem, and if not, maybe ask for help. I also want to ask him his thoughts on using MongoDB instead of SQLite after taking his MongoDB training last week, while Python for Entrepreneurs uses SQLite.

I’m really enjoying the class and glad I’m making time for it. I don’t know if I’m going to have a skeleton up by the end of the weekend or not, but it’s coming along.

Stay on Target (or why my Python app still isn’t built)

One of the things I’m not doing well is focusing on one task at a time. As I continue to learn Python, every time I come across a way to do something, I want to implement it right away without thinking ahead about how all the different things work together. Then I’ll get stuck, and frustrated, and my pace slows.

I need to find a task, stay on target, and just finish it, rather than jumping from feature to feature. With this in mind, I’ve taken a step back to think about what needs to get done.

There are two major things that need to be built:

nflpool.xyz

Using Pyramid, I need to get the website up – even if this is just a skeleton. The Python for Entrepreneurs course will get me there. I need to follow through and finish the course.

Major features for the website include:

  1. User creation / management: This includes creating an account, resetting passwords and login / logout. The course does an awesome job of how to properly hash and salt the passwords – I just watched and worked on this chapter yesterday.
  2. Yearly Player Picks: I will need to create a form for each player, after they have created an account, to submit their picks. This form will need to talk to the database to display the list of teams in each conference and the players available in each of the positions. I briefly looked at the Pyramid documentation Friday night and something like WTForms might work for this, but I really know nothing about it at this point. From there the player will need to hit submit, then review their picks or make changes, and then submit their picks which are stored in the database.
  3. Scoring: The last section of the website is the most important part to each player – how are their picks doing against everyone else? One of the reasons I’m using a database is that the cumulative player stats that MySportsFeeds provides are just that – cumulative through the season. There isn’t a way to just get the quarterback stats for week 5 of the 2016 season – so I need to store each week’s stats in the database. This way a player can track their progress in nflpool through the year. Want to see where they stand right now? Check. Versus two weeks ago? Check. So the website will need to default to the latest week and then let the player choose the year and the week to see past history.

The only open question at this time for creating the website is whether to use SQLite or MongoDB – I’d prefer to use MongoDB as I’m stuck on how to create the individual player picks table and wanted to try it as a key / value store in a MongoDB collection. The course is focused on SQLite with SQLAlchemy – something I’d like to learn, but I think MongoDB might also be easier for taking the JSON from MySportsFeeds and just sticking it right in the database.
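On the hashing and salting mentioned in the first feature above, here is a rough standard-library sketch of the idea. The course may well use a dedicated password library instead, so treat the function names and the iteration count as my assumptions, not the course’s code.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plain password is never stored."""
    salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare safely."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

Because each user gets their own salt, two players who pick the same password still end up with different stored hashes.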

nflpool app

The app has two major features that need to be completed:

  1. Import data via JSON from MySportsFeeds into the database: I had all of this done using SQLite. If I choose to switch databases, I’ll need to rewrite this code.
  2. Scoring calculations: This isn’t done at all. This depends on the player picks table in the database, which is where I was stuck a few months ago when I took a break. I can’t figure out the data model for it no matter how many times my wife tries to explain it to me and I don’t know if I’m just being optimistic when I think a key value store in mongodb would work better. I’m going to give this a bit more thought and actually write out what the document would look like. This would probably be an embedded document in each user’s account.

Next Steps

With all that said, I think working on the website and getting a skeleton up is the best next step. If I can get the site up, then start to work on the submission form for the picks (which will require a bit of importing team data into a database, so I’ll have to make a decision there), I think I’ll feel a lot better. In a perfect world – and I know this isn’t going to happen in the next 30 – 45 days when I really need it – I would have the submission form working prior to the 2017 season starting. Even if I have to still calculate the points weekly like I’ve been doing for the last two years, at least I’d have the picks in the database this time instead of having to work with the challenge of Google Sheets.

MongoDB for Python for Developers

I’m taking the latest training course that just launched a couple of weeks ago from Michael Kennedy at Talk Python: MongoDB for Python for Developers. This is my first exposure to NoSQL. Over the last year, I’ve searched Google a few different times trying to understand what NoSQL is without any success – it always went over my head. Within ten minutes of starting this course, I think I might understand what a document database is.

I took a break from coding the nflpool app a few months ago after my wife gave me some feedback on how I was designing the data model for the SQLite database I was using. I was pretty frustrated, not with her, but just with my lack of knowledge. I still hadn’t figured out how to import the individual player picks from the Google Sheet I was using, though I did find some open source code that did it perfectly. The challenge was that if I changed the data model, the import functionality was going to change significantly and I couldn’t figure it out.

Here it is, early July, and I feel the panic of not having the app built for the upcoming NFL season for the second year in a row. That’s ok – it’s a marathon, not a sprint, to learn Python and build the app. Everyone needs a hobby.

I’m going to create a new branch in nflpool and see if I can use MongoDB instead of SQLite. I need to sit down this weekend and give some thought and sketch out the data model for the Users collection, but I think it could (should?) work better than what I was planning. It’s already obvious that importing the NFL statistics from MySportsFeeds via JSON directly into MongoDB should be a slam dunk.

The challenge in switching is twofold: First, I’ll need to understand how that changes the Python For Entrepreneurs course – as I’m going to use Pyramid for the web framework, I’ll need to understand how those will work together. This is especially true for the user accounts and database sections of the course.

The second risk is that by switching to MongoDB from a SQL database, there will be no help available from my wife. I might drive her crazy with my questions and the way I ask them, but she has a lot of knowledge of SQL and it might be even more of a challenge doing the database on my own, in addition to the Python.

I’m enjoying the MongoDB for Python for Developers course. To be fair, it’s definitely over my head – I’m not a real developer nor do I have any kind of database experience or know any JavaScript, so I’m taking it slow and in chunks. I’m not coding the examples as I follow along yet – I’m going to audit the whole course, give some thought to confirm this is what I want to do, and then I’ll go through it again. It’s probably in my best interest to finish Python for Entrepreneurs and get the Pyramid web app up and running. I do enjoy Mr. Kennedy’s courses – the way the courses are structured, how each lecture builds on the others and his delivery makes them worth the money. Even for some of the topics where I don’t have the prerequisite knowledge I probably should, I find myself learning. I’m on vacation next week and plan to spend a good chunk of time going through both the Python for Entrepreneurs course and the new MongoDB course.

NFLPool 0.1 milestone completed

I followed through on my last blog post and made a lot of progress over the weekend – the best way to learn is by doing. I’ve updated my roadmap for nflpool and broken the development of the nflpool app into chunks:

  • 0.1: Database creation complete – write the Python code and SQL statements to create all the needed database tables using sqlite3. This includes using the requests module to import all players in the NFL into the database from MySportsFeeds.
  • 0.2: Import the 2016 statistics from MySportsFeeds into the database. This includes everything needed to calculate an NFLPool player’s score: individual player statistics, division standings, Wild Card seeds, etc.
  • 0.3: Scoring calculations are complete – the app works. The nflpool app can take every player’s picks, compare it to the final standings, and output everyone’s score for this past 2016 season.
  • 0.4: If 0.3 can calculate the final 2016 standings, 0.4 will add functionality to step through every week individually for 2016 from weeks 1 through 17. This will have to be different code as it won’t use the requests module to get real time data, it will use the JSON data I downloaded weekly last year. This will help me prepare for the 2017 season proving that it can calculate the score each week until the season ends.
  • 0.5: The nflpool app now lives on its website, nflpool.xyz. This will include an online form for the 2017 season where players can make their picks and these picks are inserted into the database. This will be built on Pyramid (after I complete the Python for Entrepreneurs course from Talk Python to do this.)
  • 1.0: Full nflpool.xyz integration. Players can browse by week for the current season and past seasons.

After this weekend, the 0.1 milestone is complete. I ran into a few challenges, but the database is complete and I even have cumulative NFL Player stats imported as part of the 0.2 milestone. The first challenge I ran into was that I could not get the CSV file imported into the sqlite3 database. We originally used a Google Form to capture each player’s picks. I saved that in Google Docs as a CSV file to be imported. I kept getting a “too many values to unpack” error. The SQL statement was expecting 47 values, and no matter how many times I compared the CSV columns to it, checking and re-checking, I couldn’t find my mistake. After doing some Google searches, I came across this Python script on GitHub to import a CSV into sqlite – and it worked!
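For reference, here is a small sketch of one way to import a CSV into sqlite3 so that a column-count mismatch names the offending line instead of failing mysteriously. This is not the GitHub script mentioned above; the table name, column names and sample rows are all made up for the example.

```python
import csv
import io
import sqlite3

# Stand-in for the exported picks file; columns and rows are hypothetical.
CSV_TEXT = """first_name,last_name,afc_east_first,passing_leader
Pat,Example,Team A,Player A
Sam,Sample,Team B,Player B
"""


def import_csv(conn: sqlite3.Connection, table: str, csv_file) -> int:
    """Create `table` from the CSV header and insert every row.

    `table` and the header come from a trusted file here; do not build
    SQL this way from user input.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")

    rows = []
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_number}: expected {len(header)} values, got {len(row)}"
            )
        rows.append(row)

    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = import_csv(conn, "player_picks", io.StringIO(CSV_TEXT))
print(count, "rows imported")
```

Generating the placeholders from the header means the SQL statement and the CSV can never disagree about how many columns there are.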

The second challenge I ran into today. I realized after importing the player’s picks and the NFL Player statistics that I was using NFL Player names in the CSV file but I was using the player_id, an integer, from MySportsFeeds for the database. Using the player_id is the correct way to do this, but I needed to modify the CSV and re-import. No problem, but after doing this, I realized I would need to do the same thing again for the Team picks – I need to use the team_id not the team name.
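The name-to-ID swap could also be done in code rather than by editing the CSV by hand. A minimal sketch, assuming a lookup table built from the MySportsFeeds roster data; the names and IDs below are placeholders, not real values.

```python
# Hypothetical lookup; real IDs would come from the MySportsFeeds data.
PLAYER_IDS = {"Player A": 101, "Player B": 102}


def names_to_ids(picks: dict[str, str], lookup: dict[str, int]) -> dict[str, int]:
    """Replace each picked name with its player_id, failing loudly on typos."""
    missing = sorted({name for name in picks.values() if name not in lookup})
    if missing:
        raise KeyError(f"no player_id found for: {', '.join(missing)}")
    return {category: lookup[name] for category, name in picks.items()}


converted = names_to_ids(
    {"passing_yards": "Player A", "sacks": "Player B"}, PLAYER_IDS
)
print(converted)
```

The same function would work for the team picks with a team_id lookup in place of the player one.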

This is all now done and I can move on to the 0.2 milestone. Starting with the five picks for individual stats (passing yards, rushing yards, receiving yards, sacks and interceptions – all already imported using requests!), I’ll write a function that will check whether the NFL player each nflpool player picked finished in the top three of that category and assign the correct points. I’ll then add an if statement to see if the nflpool player made a unique pick in that category, and if so, double the points earned.
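As a first pass at that logic, the scoring for one category might be sketched like this. The point values are placeholders I made up, and the league player names and IDs are examples only.

```python
from collections import Counter

# Assumed points for finishing 1st, 2nd and 3rd; not the real league values.
POINTS_BY_RANK = (30, 20, 10)


def score_category(picks: dict[str, int], top_three: list[int]) -> dict[str, int]:
    """Score one category.

    picks maps each nflpool player to the NFL player_id they picked;
    top_three lists the player_ids that finished 1st, 2nd and 3rd.
    """
    pick_counts = Counter(picks.values())
    scores = {}
    for league_player, player_id in picks.items():
        points = 0
        if player_id in top_three:
            points = POINTS_BY_RANK[top_three.index(player_id)]
            if pick_counts[player_id] == 1:  # nobody else made this pick
                points *= 2
        scores[league_player] = points
    return scores


picks = {"alice": 101, "bob": 101, "carol": 102, "dave": 999}
scores = score_category(picks, top_three=[101, 102, 103])
print(scores)
```

Here alice and bob share a first-place pick and get the base points, carol’s second-place pick is unique so it doubles, and dave’s pick missed the top three.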

From there I’ll move on to all the other categories such as Division Standings or Points For and use the same logic.

This is huge progress. The point calculations will be the hardest part of the app (outside of building the website) and now it’s time to see how much Python I’ve learned.