Web Scraping and Python
I’m flying along in the Coursera course Python for Everybody, from the University of Michigan and taught by Dr. Charles Severance. I’ve completed the first two of four courses, which give you an introduction to Python.
I’m now on the third course, Using Python to Access Web Data. This course and the fourth, which focuses on databases, are the two key foundations for the web app I want to build. I just finished Chapter 12, which introduces the BeautifulSoup library for scraping web pages. This is going to be huge: I’ll be able to scrape ESPN to find which MLB or NFL teams lead their divisions or the wild card races.
Being on vacation this week, I’ve been able to complete a few chapters and am now a couple of weeks ahead of schedule. I’m tempted to pause and see if I can take what I’ve learned with BeautifulSoup and write some small Python programs that scrape a page and print the results. It would be good practice and would reinforce what I’ve learned.
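A small practice program along those lines might look like the sketch below. It is only an illustration under stated assumptions: the HTML snippet, the `standings` table id, the team names, and the records are all made up, and a real page such as ESPN's will have different markup that needs its own selectors. The helper names `parse_standings` and `division_leader` are hypothetical too.

```python
from bs4 import BeautifulSoup

# Made-up standings markup for practice; a real site's HTML will differ.
SAMPLE_HTML = """
<table id="standings">
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>90</td><td>60</td></tr>
  <tr><td>Team B</td><td>85</td><td>65</td></tr>
  <tr><td>Team C</td><td>92</td><td>58</td></tr>
</table>
"""


def parse_standings(html):
    """Return a list of (team, wins, losses) tuples from the sample table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="standings")
    rows = []
    # Skip the first row, which holds the <th> header cells.
    for tr in table.find_all("tr")[1:]:
        team, wins, losses = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append((team, int(wins), int(losses)))
    return rows


def division_leader(rows):
    """Pick the team with the best winning percentage."""
    return max(rows, key=lambda row: row[1] / (row[1] + row[2]))


standings = parse_standings(SAMPLE_HTML)
leader = division_leader(standings)
print(f"{leader[0]} leads at {leader[1]}-{leader[2]}")
```

Working from an inline string keeps the practice run offline and repeatable. Fetching a live page would be a separate step, and the site's terms of use are worth checking before scraping it.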
The next two chapters are key as well: XML, and then the one I’m most looking forward to, JSON. I’ve already signed up for a developer account with MySportsFeeds and am receiving JSON data for player stats, teams and conference standings. I’ve spoken in the past with one of their lead developers, and they don’t currently keep statistics for wild card or playoff standings, so I’m going to need to use BeautifulSoup in my app to get those. I’ll also need to decide whether to use that JSON data for player stats and query against it myself, or just use the nflgame or nfldb libraries that have already been built. The biggest challenge there is that both of those libraries are written in Python 2.7 and I really want to write my apps in Python 3.x.
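To get a feel for what querying the JSON myself would involve, here is a minimal Python 3 sketch using only the standard library's `json` module. The payload and its field names (`players`, `name`, `team`, `passing_yards`) are invented for illustration and are not the actual MySportsFeeds schema. The `top_passers` helper is hypothetical as well.

```python
import json

# Invented sample payload; NOT the real MySportsFeeds response format.
SAMPLE_JSON = """
{
  "players": [
    {"name": "Player A", "team": "AAA", "passing_yards": 3200},
    {"name": "Player B", "team": "BBB", "passing_yards": 4100},
    {"name": "Player C", "team": "AAA", "passing_yards": 2750}
  ]
}
"""


def top_passers(raw, minimum_yards):
    """Return players at or above minimum_yards, highest total first."""
    data = json.loads(raw)  # JSON text -> Python dicts and lists
    qualifying = [p for p in data["players"] if p["passing_yards"] >= minimum_yards]
    return sorted(qualifying, key=lambda p: p["passing_yards"], reverse=True)


leaders = top_passers(SAMPLE_JSON, 3000)
for player in leaders:
    print(player["name"], player["team"], player["passing_yards"])
```

Once the data is parsed, it is just ordinary dictionaries and lists, so simple filters and sorts like this may be enough before reaching for a dedicated library.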
I know I’m getting ahead of myself. Every time I learn something that will be applicable to the app I want to build and I talk to my wife about it, she tells me to slow down. My mind is always racing with how I can apply what I’m learning and how it will affect the architecture of the app. Some people say the best way to learn a programming language is to build something and learn as you go. I can’t wait to put all this Python learning into practice.