Update: April 4, 2018 – The Maddest March Ever?

 

 

Wow. What a tournament! Over the past few weeks we’ve been treated to everything college basketball has to offer: huge upsets, dagger-like buzzer beaters, an underdog story for the ages, and the warm glow of victory yet again for Philadelphia. Congrats to the Villanova Wildcats on a dominant tournament performance!! We are all witness to the birth of the next great dynasty.

Perhaps more importantly though, Chandran has been crowned champion of the Data Genius #ViztheMadness competition. By picking Villanova to win it all, despite bleeding that UNC baby blue, Chandran captured the top spot in this year’s competition. He narrowly beat our official Data Genius team bracket which had selected Virginia as the champion. Congratulations Chandran! Bragging rights are yours, until next year.

As discussed last week, the Data Genius team bracket built using SAP Analytics Cloud has stood up to the experts and heavy-hitters in the bracketology field. Our collective bracket correctly picked a total of 38 victors for a 60% success rate. We’re proud of what a ragtag band of aspiring citizen data scientists has been able to accomplish using the augmented analysis available in SAP Analytics Cloud.

We also knew we couldn’t predict all the upsets but were still interested in how our model would predict known matchups past Round 1. So, after each round was complete, we plugged the teams into our SAP Analytics Cloud story and tried to predict every game using the same model we used to select our bracket. This increased our success rate to 70% with 44 correct victors selected.
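For readers who want to check the arithmetic, here is a minimal sketch of how those success rates fall out of the pick counts. The total of 63 games is an assumption (a 64-team bracket with the play-in games left out), and the function name is ours, not anything from SAP Analytics Cloud.

```python
# Assumption: a 64-team bracket has 63 games (play-in games not counted).
TOTAL_GAMES = 63


def success_rate(correct_picks: int, total_games: int = TOTAL_GAMES) -> float:
    """Return the share of games picked correctly, as a percentage."""
    if total_games <= 0:
        raise ValueError("total_games must be positive")
    return 100.0 * correct_picks / total_games


bracket_rate = success_rate(38)         # original team bracket
round_by_round_rate = success_rate(44)  # re-predicted after each round

print(f"Bracket: {bracket_rate:.1f}%")
print(f"Round by round: {round_by_round_rate:.1f}%")
```

Rounded to whole numbers, these come out to the 60% and 70% quoted above.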

Luckily there were two well-regarded experts we could use to benchmark our results for this round-by-round analysis: KenPom’s tournament probabilities and FiveThirtyEight’s predictions. I am pleased to report that our SAP Analytics Cloud model stood up to both, with the experts achieving success rates of 68% and 70%, respectively. Again, the Data Genius team is not claiming to know more than experts like Ken Pomeroy or Nate Silver’s team. Not even close. But with good data and the right tools, like SAP Analytics Cloud, ordinary people can perform extraordinary analysis.

While predicting sport results can be a fun, if frustrating, exercise, there are lots of other applications for advanced analytics. A recent article featuring SAP’s Data Genius team, and quoting last year’s #ViztheMadness champ Nic Smith, explores the data science competition landscape and how crowdsourcing can be an excellent way to generate new insights into really hard problems. Check out the amazing ideas born from the recent Spatial Hackathon hosted by SAP and ESRI. These teams are tackling real challenges, such as water contamination and earthquake risk, that affect millions of people around the globe.

With the augmented functionality of SAP Analytics Cloud, we’ve shown how our Data Genius team can compete with the best in the business. You don’t need to be a data scientist to use SAP Analytics Cloud. So next time you see a data science competition on Kaggle or Topcoder that interests you, hop over to sapanalytics.cloud, download a free trial version of SAP Analytics Cloud, and start noodling! The Data Genius team highly recommends Smart Discovery as a starting point.

Better start learning now so you’re ready for the world-wide soccer tournament this summer.

Update: March 26, 2018 – The Race to the Quarterfinals Is Complete!

 

Another two rounds in the books and just like Rounds 1 and 2, this weekend’s games delivered! The Loyola-Chicago Ramblers are making an unbelievable run and have secured their place in the unexpected value report next year. The other three teams standing are less surprising, but that hasn’t made their path any less exciting. Congrats to all the players, staff, and fans of teams moving on to San Antonio.

As for the DataGenius scoreboard, things are tightly packed. Chandran has pulled even with the SAP Analytics Cloud model with a total of 37 correct picks. Nic and JM are hanging in there with 36 apiece while John has started to fall behind with 33. However, Chandran has the upper hand by picking Villanova to take it all.

Like Chandran, there is probably someone in your bracket pool who has pulled away from the field. What may be surprising is which one of your co-workers, family, or friends is winning. We’ve all heard stories of people choosing their bracket based on decidedly non-basketball criteria such as team colors or mascot cuteness and finishing ahead of the true fans in the pool. This is a result of the unpredictability of sport and a large portion of what makes the tournament both appealing and frustrating. As John mused in his blog, How do you predict the unpredictable?

The DataGenius team is not a group of die-hard college basketball fans. Nor are we top data scientists. We refer to ourselves as aspiring citizen data scientists, the kind of people we see using SAP Analytics Cloud to do data discovery and analysis in a business setting. The point of us building the ViztheMadness bracket wasn’t to predict the perfect bracket since we knew that was never going to happen. We wanted to show how SAP Analytics Cloud could help us, a group of non-experts, build a bracket based on data rather than luck.

To assess how the DataGenius bracket is doing, we analyzed some of the publicly available bracket data. Turns out, we’re doing pretty well! Aggregated brackets built from users’ submissions to large platforms have 34 correct picks with 22 of those coming in Round 1, right in line with our results of 37 and 23. More impressively, the SAP Analytics Cloud model is outperforming all but the top experts. These experts include writers, insiders, broadcasters, analysts, and others who live and breathe college basketball. Based on our track record, we believe our strong results are on trend rather than an outlier.

Please don’t misunderstand: we aren’t saying that the DataGenius team now knows more than the experts. Far from it. By building a quality data model and leveraging the augmented analysis toolset in SAP Analytics Cloud, we were able to pick a bracket that is performing on-par with industry experts. If our group of casual fans can achieve this level of success, imagine what you can accomplish with data and expertise for your own business.

Check back next week after the net is cut down. We’ll review how the SAP Analytics Cloud model performed and crown our team champion.

Update: March 19, 2018 – Round 1 of the Tournament Doesn’t Disappoint!

The players left everything on the court and knocked down some incredible buzzer-beaters to help the tournament live up to the “madness” moniker. Our team-built bracket, based on our SAP Analytics Cloud model, predicted 23, or about 72%, of the Round 1 matchups correctly. Both Nic and JM picked one better using their own criteria weightings but all team members were basically equal—not unexpected when using the same data model for analysis.

A major aspect of bracketology, and the tournament in general, is the upsets that are bound to occur—this year being no exception. Nine Round 1 matchups ended with lower seeds beating their higher-seeded opponents, in line with the historic average of 8.4 Round 1 upsets per year from our 2007-2017 data set. What was completely unexpected was 16th seed UMBC utterly dominating tournament favourite Virginia to cement the Retrievers in the history books. We aren’t ashamed to say our analysis did not predict anything close to that result. In fact, only Chandran resisted choosing the Cavaliers as champions and now stands the best chance of claiming bracket bragging rights.
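As a rough illustration of how an upset count like this can be tallied, the sketch below treats an upset as any game won by the team with the numerically higher seed. The game list is made up for the example (only the 16-over-1 result echoes the UMBC game above), and the function name is hypothetical.

```python
from typing import Iterable, Tuple

Game = Tuple[int, int]  # (winner_seed, loser_seed)


def count_upsets(games: Iterable[Game]) -> int:
    """Count games where the winner had the worse (numerically higher) seed."""
    return sum(1 for winner_seed, loser_seed in games if winner_seed > loser_seed)


# Made-up sample results, apart from the 16-over-1 game mentioned above.
sample_games = [
    (16, 1),  # a 16 seed beating a 1 seed, as UMBC did
    (2, 15),
    (11, 6),
    (4, 13),
    (9, 8),
    (5, 12),
]

upsets = count_upsets(sample_games)
print(upsets, "upsets in", len(sample_games), "games")
```

Under this definition a 9 seed beating an 8 seed counts as an upset, which is one reason yearly totals can look higher than casual fans expect.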

The team’s SAP Analytics Cloud model did predict 6 upsets, 3 of which came to fruition, but none involved teams seeded 1-5. Interestingly, SAP Analytics Cloud picked every 10 seed to upset its 7-seeded opponent even though only 38% of 7/10 matchups since 1985 have been upsets. We decided to take a closer look at why this might be.

We started with the Smart Discovery grouping for seeds to see whether we could use our human intuition to improve that aspect of the prediction. It turns out SAP Analytics Cloud is grouping seeds 1, 2, 3, 4, and 10 together and predicting 1.89 tournament wins for those seeds. One of these things is not like the others!

This could be a function of our limited data set, which only goes back 11 years, or of the fact that Syracuse made a deep run as the 10 seed in 2016. As is always the case, the data could be better. Fortunately, SAP Analytics Cloud has a Smart Grouping feature that can help us dive deeper into the specific criteria identified by Smart Discovery and make better decisions based on the data we do have. Definitely something to look at for next year!
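To see why a short history matters, here is a small sketch of how a single deep run can move the average wins for one seed line. The win totals are invented for illustration and are not our actual data; the only thing taken from our setup is the sample size of 44 (4 teams per seed per year over 11 years).

```python
from statistics import mean

# Hypothetical win totals for one seed line, not the team's real data.
# 43 ordinary results: mostly first-round exits, a few wins here and there.
typical_wins = [0] * 27 + [1] * 13 + [2] * 3

# The same sample plus one team that wins four games.
with_deep_run = typical_wins + [4]

print(round(mean(typical_wins), 2))   # average without the deep run
print(round(mean(with_deep_run), 2))  # average with the deep run
```

With these made-up numbers the average rises from 0.44 to 0.52 on the strength of one team. This does not reproduce the 1.89 figure from Smart Discovery, but it shows how sensitive a small sample is to an outlier.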

Come back next week for an update on how the team did predicting the next two rounds of the tournament.

Original Post

Don’t be alarmed if your friends, colleagues, or family in the United States are a little distracted this month. While preparing for tax season can certainly be exciting, that doesn’t explain why everyone is decked out in college regalia and basketball jerseys. In fact, it’s the annual basketball championship tournament that gets everyone whipped up in a frenzy, and the Data Genius team is no different. So hold on to your seats. Today we’re going to give you the insider’s look into our bracket.

We’ve shared the backstory on our collaborative Data Genius team bracket in John Schitka’s blog from last week. And Nic Smith, Chandran Saravana, and John gave more details and answered questions in the March 14 #askSAP Viz the Madness Live Chat, which you can watch anytime on replay (you won’t want to miss it!).

But like all competitive fans, our team didn’t stop there. We each built our own individual brackets, and you can see them (and choose your favorite) further down in the post.

You can also join in the fun and try it yourself. Just:

  1. Grab some data. There are lots of publicly available data sets for college basketball stats and results such as Kaggle, Sports Reference, and KenPom.
  2. Download a free trial version of SAP Analytics Cloud. Upload the data, build your model, explore the insights, and build your own story.
  3. Join the (friendly) trash talk by sharing your results on Twitter using #vizthemadness and @SAPAnalytics.
  4. Stay tuned to this blog each week as we keep a running update about how our brackets are faring!

Ready to Dive Deeper into the Bracket-Building Process?

Selecting the winner of any tournament can be both fun and challenging—just like trying to determine the best future path can be challenging in business. It really is a team effort since differing viewpoints and experiences crop up. Tools like SAP Analytics Cloud can help with that collaboration in addition to revealing influencers in the data. That’s why it was the perfect tool to use for this year’s college basketball tournament predictions.

Like any good analytics project, it all starts with the data. Our team made the decision to gather historical data going back to the 2007 season. In SAP Analytics Cloud, we used the Modeler to wrangle our chosen data from websites, spreadsheets, and proprietary databases. From there, we defined our measures and dimensions and published the model so everyone in the group could start analyzing on their own.

And analyze we did! We took two weeks to noodle on the data before getting back together to decide which criteria we were going to use to pick the team bracket. Everyone agreed that Smart Discovery, one of the machine learning-enabled analysis methods in SAP Analytics Cloud, was the best way to determine which criteria would influence tournament outcomes.

Smart Discovery also allowed us to use our knowledge of basketball, or lack thereof, to inform our analysis by excluding unrelated factors or filtering by different dimensions. The speed at which we got results from Smart Discovery meant we could tweak, re-run, and iterate to gain insights fast. A nice bonus was the unexpected value report, which was essentially a list of Cinderella teams that had conquered the odds and captured our hearts over the years.

Collaboration Made Easy

The collaboration features within SAP Analytics Cloud made it easy to keep in touch and make sure everyone was on track. Perhaps the most surprising consequence of analyzing basketball data was how the trash talk seamlessly transitioned from the court to the cloud. The image below gives a blog-friendly example, but I assure you they weren’t all so kind!

Selecting the Criteria

After everyone had a chance to explore the data on their own, we gathered back together as a team to decide which criteria we’d use to pick the team bracket. This session mirrored the business scenarios that we’ve observed from SAP Analytics Cloud customers. The conversation went smoothly since the team had marked up the analyses with comments and discussions, and once the 2018 seeds were selected, we appended the new data to our model and built a dashboard to evaluate the match-ups.

From here, we were able to eyeball the relevant criteria and build our team bracket.


Going It Alone

After all was said and done, some of the team members thought they had uncovered crucial insight that didn’t make it into the team bracket. So we created our own, and we’re sharing them here.

Who will earn the bragging rights? Check back here every week to find out—we’ll update you with the running total of correct picks.

Nic Smith’s Picks

“Fresh off a resounding victory in last year’s tournament, I went back to what works. I ran a few Smart Discoveries to confirm I was on the right track and then used my personal know-how to focus on strength of schedule and number of possessions. Thankfully, our data set included offensive and defensive efficiency stats that had been adjusted for opponent as well as an adjusted tempo metric. Using these stats and a little secret sauce, I picked what’s destined to be this year’s winning bracket.” Nic

Chandran Saravana’s Picks

“All the Smart Discoveries I ran told me the same thing—seed is the most important factor that influences tournament wins. This is why upsets are so surprising. But the fact that every year there is at least one upset is part of what makes the tournament so interesting—no team is safe.

Therefore, in the early rounds I chose primarily based on seed, except for some tight 8/9 and 7/10 matches. Deeper in the tournament, I believe the seeds matter less and I used some of the other criteria the team had agreed upon. And of course, my beloved hometown North Carolina Tar Heels will make the championship (but I didn’t want to jinx it by picking them as champs).” Chandran

John Schitka’s Picks

“After the teams were announced Sunday, I did two things: created a bracket my traditional way and created one using the dashboard in SAP Analytics Cloud. They differed considerably, and I will be interested to see which one fares better (only John’s SAP Analytics Cloud bracket is posted here). When the teams were ahead on different influencers or very close on others, it came down to a judgment call. When things were close I relied on the luck factor, and when things were pretty equal except for one influencer, I gave the nod to that influencer. And there were some cases where the stats slightly favoured one team but were contradicted by influencers such as luck or conference, which have a big “win cliff.” In these cases I made a call, sometimes going with the stats, sometimes with conference.” John

JM Lauzon’s Picks

“Instead of using the full 11-year data set, I restricted my initial analysis to the last three years. However, this greatly reduced the quality of the insights since there were fewer tournament wins to model. I expanded my Smart Discovery out to five years to beef up the quality, but the model still returned some insights I considered strange, namely shooting guard defensive rebounds and point guard height as key influencers. I ended up using Seed, Adjusted Efficiency Margin, Steal Rate, and Conference, in that order.” JM
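One possible reading of “in that order” is a strict priority list, where each later criterion only breaks ties left by the earlier ones. The sketch below shows that reading and is not necessarily how JM applied the criteria. The field names, the numeric conference score, and the teams are all hypothetical.

```python
from typing import NamedTuple


class Team(NamedTuple):
    name: str
    seed: int              # lower is better
    adj_eff_margin: float  # higher is better
    steal_rate: float      # higher is better
    conf_strength: float   # hypothetical numeric score standing in for Conference


def priority_key(team: Team):
    # Tuples compare left to right, so later fields only break ties.
    # Seed is negated because a lower seed number is the better one.
    return (-team.seed, team.adj_eff_margin, team.steal_rate, team.conf_strength)


def pick_winner(a: Team, b: Team) -> Team:
    """Pick the team that ranks higher under the ordered criteria."""
    return max(a, b, key=priority_key)


# Made-up teams and numbers, for illustration only.
team_a = Team("Team A", 2, 24.1, 0.19, 0.8)
team_b = Team("Team B", 2, 26.3, 0.17, 0.7)
team_c = Team("Team C", 1, 20.0, 0.15, 0.6)

print(pick_winner(team_a, team_b).name)  # same seed, so efficiency margin decides
print(pick_winner(team_b, team_c).name)  # the better seed decides outright
```

Under a strict ordering like this, seed settles almost every game, so the other criteria only come into play when two equal seeds meet. A weighted score would let them matter earlier in the bracket.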

Your Turn

You’ve seen what we came up with. Now, do you have what it takes to beat the SAP DataGenius bracket? What insights can you uncover? It’s as easy as 1, 2, 3…

  1. Grab some data. There are lots of publicly available data sets for college basketball stats and results such as Kaggle, Sports Reference, and KenPom.
  2. Download a free trial version of SAP Analytics Cloud. Upload the data, build your model, explore the insights, and build your own story.
  3. Join the (friendly) trash talk by sharing your results on Twitter using #vizthemadness and @SAPAnalytics.
