Where I get stats for MLBPool and NFLPool: MySportsFeeds (and it’s awesome)

A few years ago I started to look into how I could build apps to manage MLBPool and NFLPool. The key would be how to integrate all of the team and player statistics and where to get that data. I was floored when I saw the pricing of how much companies charge to provide those stats – it was hundreds to thousands of dollars per month to get access to baseball or football stats. The closer you got to real time statistics, the more it was. Most of these companies are providing statistics to commercial services that run fantasy leagues (I’m guessing), which is why fantasy leagues charge a fee to host your fantasy league.

It’s been a couple years now and I don’t remember how I came across MySportsFeeds, but they offer a commercial service for companies like I mentioned above, but they also have a key differentiator. For educational purposes, developers or research, they offer a free service to access stats for completed seaons. Best of all, you can also subscribe for a very low price to their Patreon to get access to live data. I’m happy to say I was one of the first Patreon subscribers and for $5 / month I get access with a 3 minute delay. (I really only need data overnight and not real time for my apps, but what an awesome price). MySportsFeeds currently offers statistics for the NHL, NBA, MLB and NFL and statistics are available in JSON, XML or CSV.

There are a few different things I love. One, there is a Slack channel where the owner and lead developer, Brad Barkhouse, helps out. He’s extremely responsive to community questions and is always around. Two, the service is always getting better. Last summer they launched wrapper libraries for popular programming languages including Python, Ruby, R, NodeJS and more. Three, they have fantastic documentation that includes all the parameters you can pass to the different feeds to help filter what information or statistics you might need.

There are a couple quirks. For NFLPool, the Team Standings feeds don’t account for tiebreakers. I can’t fault them for that as the NFL tiebreaker calculations can be complex. After the NFL season ended, I pinged Mr. Barkhouse and he quickly updated the feed to match the NFL standings, which I needed for my app.

In baseball, I started to enter all of the 2017 MLBPool picks for testing. I need to make sure that the app works and matches the scoring that was done by hand last year. When entering picks, one player had chosen Yu Darvish for one of the pitching categories. When I went to make the pick in MLBPool, he wasn’t available as an American League pitcher. MySportsFeeds showed him on the Los Angeles Dodgers – but he wasn’t traded from the Rangers to the Dodgers until July 31, 2017. Brad will fix his roster information, but MySportsFeeds uses a cumulative player statistics so the feed shows Darvish’s stats for the entire year. But in Major League Baseball by rule, a player’s stats when traded between leagues are NOT cumulative. This is obviously an edge case for MySportsFeeds, but something I’m going to have to account for before MLBPool launches. (I currently store all baseball player statistics in one table – I’m going to need to split this and have two tables, one for the American League and one for the National League to account for this).

MySportsFeeds is under constant development and always improving. Mr. Barkhouse and team updated the API last year from version 1.1 to 1.2 and work is underway for 2.0. They added Daily Fantasy data last year. Users can also file issues in Github or ping him in Slack to get items added to the roadmap. Overall I am extremely happy with the service and highly recommend MySportsFeeds.

MLBPool2 & MySQL / MariaDB

When I wrote yesterday introducing MLBPool2, I buried the lede. One of the biggest changes between NFLPool and MLBPool2 is the fact I’m now using MariaDB and MySQL as the backend instead of SQLite, which NFLPool uses. (I did look at PostgreSQL since so many Python developers seem to prefer it, but I’ve never been able to get a PostgreSQL server up and running on Linux or Mac. My sysadmin skills are nonexistent.)

Since I’m using SQLAlchemy for 90% of the SQL interactions, setting it up was pretty easy, I just needed to make sure when creating the tables I added things like string length where needed. A basic example that shows the difference between the two is the table that stores the division information. In football, it’s the NFC East, AFC North etc, and in baseball it’s the AL East, NL Central, etc.

In NFLPool it was:

class DivisionInfo(SqlAlchemyBase):
    __tablename__ = 'DivisionInfo'
    division_id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    division = sqlalchemy.Column(sqlalchemy.String)

And in MLBPool2:

class DivisionInfo(SqlAlchemyBase):
    __tablename__ = 'DivisionInfo'
    division_id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    division = sqlalchemy.Column(sqlalchemy.String(8))

Easy enough. But since SQLite is a persistent database, I learned the hard way that I need to close each session in MySQL with a session.close() statement or I see lots of fun errors like this:

OperationalError: (pymysql.err.OperationalError) (1040, 'Too many connections') (Background on this error at: http://sqlalche.me/e/e3q8)

It’s taken a lot of trial and error figuring out where I need these. I’ve learned they have to go before any return statements and even when I think I have them in all the right places, it turns out I don’t. Yesterday I was entering all of the picks for everyone who played in 2017 to do some testing (to see if the app’s results and scores match what was done by hand) and after entering six player’s picks, I ran into it again. Sure enough, in the PlayerPicks service, I didn’t have any session.close() statements when I returned all of the lists that make up the picks. I had just added Rollbar functionality to the site to keep track of errors and I was pleasantly surprised to learn that when you connect Rollbar to your Github repo, it automatically opens an issue for you on Github with the error. (Pretty cool, Rollbar!)

I’m still a little worried that after I deploy and the site has been up for a while that the “Too many connections” error is going to happen.

The other thing I forget to share was a link to the Github repo for MLBPool2. It’s open source under the MIT X11 license. I originally had NFLPool under the GPL but changed it to MIT as well. I liked the idea of it being under the GPL in case anyone ever used it and I could have access to the changes, but let’s be honest, the chances that anyone is going to use the codebase is slim to none and I’d rather be more permissive (and I have issues with the Free Software Foundation, but no need to get into that.) The key takeaway is I’m a big believer in open source and I think making it more permissive is the right thing to do.

I’m undecided if I’m going to port NFLPool to MySQL. I think it’s probably a better option, but the few how-to’s I’ve read give me pause on how to import the data from SQLite to MySQL. I’m not sure if it’s worth the effort considering all of the features I want to back port and / or add to NFLPool. (But that’s a different discussion for a different blog post).

Introducing MLBPool2

After learning Python and creating NFLPool, it was time for another project. This time it was building the site for MLBPool2, which inspired NFLPool. MLBPool was the brain child of former commissioner Jason Theros who created the league and rules.  Sadly, MLBPool came to an end after the 2011 season. The original site was written in ASP and none of the code was available and for the last few years after my friend resurrected the league he did almost everything by hand. I had created a Google spreadsheet / form to get all of the player’s pool picks, but scoring was a manual process and he was only able to do it a handful of times throughout the season. I had a WordPress site but as I wasn’t playing in MLBPool2, I never really updated it. (It’s still up for another week or so – and if you’re curious, you can look at the rules on how to play).

That all changes with MLBPool2. Like NFLPool, the app is written in Python (3.6) and Pyramid. I debated about starting from scratch or just modifying NFLPool and opted for the latter. I’ve been hip deep in development for the last two months and the finish line is almost in sight. I should have been writing about the development more, but that takes away from my coding time. (Shame on me!)

The major difference from NFLPool is that players have the ability to change up to 14 of their picks at baseball’s All-Star Break. This was much more complex than I thought it would be and I realized if I was going to do it, I might as well add more and more functionality to make it easier for the player. This included:

  • Players can change their picks before the season starts without penalty
  • When changing a pick, the drop down menu defaults to the original pick a player made
  • I added a column when changing a pick that shows if the original pick was unique or not

The hardest part was all of the datetime calculations around changing picks. If the season hadn’t started, let the user change their picks; if the season has started, redirect the user that it’s too late to change a pick; if it’s during the All-Star Break, let them change their pick and then have the system make that pick worth half the points; and if it’s after the All-Star Break, redirect them again that it’s too late to change anything.

For whatever reason, I have a hard time working with Python’s datetime module. I had planned to use Kenneth Reitz’s Maya – Datetime for Humans, but unfortunately the documentation is offline. I ended up going with the Pendulum module, which has been fantastic to work with and has excellent documentation. (It’s so good I emailed the developer a couple weeks ago with a thank you note). I even created a service just to deal with the date and time manipulations, rather than have Pendulum instances throughout the code. A great side benefit is that it makes testing so much easier.

class TimeService:
    def get_time():
        """Create a service to get the time - there were too many instances of getting the current time in
        the codebase making testing difficult."""
        # Change now_time for testing
        # Use this one for production:
        # now_time = pendulum.now(tz=pendulum.timezone('America/New_York')).to_datetime_string()
        # Use this one for testing:
        now_time = pendulum.create(2017, 3, 17, 18, 59, tz='America/New_York')

        return now_time

As you can see in the code above, I can just create one instance for testing and change the date to before the season starts, the All-Star Break or after the break. This also fixes an issue I had with NFLPool where I did not do the datetime manipulation correctly because of timezone differences with my web server and a user was locked out of submitting picks before the deadline. This worked out so well I even added an alert to the page where you submit picks showing how much time is left until picks are due:

There are two major pieces of functionality that need to be finished. There are two complex SQL queries. One to update the unique picks and one to calculate the scoring. I couldn’t figure out how to do this in SQLAlchemy and my wife Kelly wrote direct SQL statements in the code. I was able to re-write the first one to calculate unique picks after the season started but haven’t figured out how to update it for after the All-Star Break. I don’t have the patience to learn SQL right now, so she is going to help me with those when she’s on spring break from the University of Minnesota next week. From there, it will be time for deployment – and just in time, as players will have about ten days from deployment to when picks are due and the Major League Baseball season starts.

Pyramid is just a joy to work with and I’m so thankful for the Talk Python course that taught me to use it. (I wish Pyramid had 20% of mindshare that Flask does. Maybe it does where it matters, but there is just so much on the web about Flask that it feels like it doesn’t).

The best part about writing MLBPool2 though is my confidence level in coding in Python has increased greatly. I’m doing things in MLBPool2 that I didn’t do in NFLPool – from manipulating datetimes, string manipulation, a lot more if / else statements, Slack integration, and more advanced Chameleon templates. I’m sure there are lot of areas that are still not Pythonic enough, but I feel more confident and I know the learning won’t stop. I’ll try and write some more blog posts about what I’ve learned and how MLBPool2 differs from NFLPool (and what I want to add back into NFLPool.)

NFLPool 2017 Recap

The NFLPool 2017 season wrapped up a month ago. The application performed admirably. Every week I logged in, downloaded the weekly statistics from MySportsFeeds, and the scoring calculations updated and posted on the standings page. I emailed the players every other week with the update and link to the standings (and the reminder that the team standings points would not be final until the end of the season due to MySportsFeeds shows division standings doesn’t account for the correct tiebreakers). Everything looked good and it was working as expected.

Or so we thought.

After week 17 was complete, I ran the update again and sent out the preliminary results. I worked with Brad at MySportsFeeds to update the division standings feed to rank the teams correctly according to the NFL’s tiebreakers. Then, one player caught that some of his individual leaders weren’t assigning points correctly. Digging in, I saw that in my picks, some of my players weren’t having their points assigned either.

I was so focused on the team standings and not individual player standings that there was a bug in the code. Week 1 worked correctly, but weeks 2-17 did not calculate the individual player performance correctly – and none of us caught it! Kelly was able to fix the SQL query she wrote and voila, everything worked! The only catch was that a week had gone by and the way that I programmed the standings page to display the title it now says “Week 18” instead of “Week 17”.

I’m pretty proud of myself for creating my first Python application (even if I didn’t write the SQL queries that do the scoring calculations). Everything worked great and I’ve learned so much about Python (and still have so much to learn). I have a list of things to improve and enhance for the 2018 season. In no particular order:

  • Update / fix the datetime function when a user submits their picks. One player got locked out too soon.
  • The traversal to the standings is www.nflpool.xyz/standings – this shows the current standings. This needs to add a year, to both allow players to see a previous season’s history – such as www.nflpool.xyz/standings/2017 – I’ll probably need to add a template page for standings then to list out all the available years as well as figure out what I want to do in the navigation.
  • Write a function that if the week is 17, call it final on the template page’s header.
  • Figure out how curl can call the get request to update the stats – it has to use my login to access the admin panel’s URL to call the get request and I don’t know how to do that in curl.
  • Lots of other enhancement plans, such as porting the app from SQLite to MySQL, but the above list are the big ones. I’m trying to make sure I capture any bugs or enhancements on Github.

Speaking of MySQL, NFLPool was always intended to do two things:
1. Automate our NFLPool league
2. Serve as a testbed for MLBPool2

I’m happy to say that MLBPool2 is now under active development. I have exactly 8 weeks from today before the Major League Baseball season kicks off, so I have about 6 weeks to get it working. And I’ve already been able to get it to work with MySQL! But MLBPoo2 development is a different blog post for later this week.

What’s next for NFLPool

NFLPool has been up for six weeks and everything is working great. I’ve been updating the standings every Tuesday without any issues. I’ve taken the last month to catch my breath after my massive coding spree to get it launched and I’ve been thinking about what’s next. I have a few options:

  • Add some tests (all Python projects should have tests written, right?)
  • Adding documentation in reStructured Text to NFLPool
  • Re-visit some of the Python trainings now that I have a basic grasp of Python and learn some more “advanced” concepts (generators, list comprehensions, etc.)
  • Work more on the NFLPool admin panel (manage users better, add the ability to manage if a user has paid for the season, and a few more ideas)
  • Port NFLPool from SQLite to MySQL
  • Start working on MLBPool2 based on the NFLPool codebase

The choice became an easy one as a couple of weeks ago I was asked by Brian Okken if I would like to review a copy of his new book, Python Testing with pytest.

I shared with Brian that I’m coming at this from the perspective of someone new to Python – I’ll be digging into the book this weekend and I’ll be blogging my progress on adding tests to my project with what I’ve learned.

Thanks Brian for the review copy and I’m excited to learn about testing in Python!