Interested in Machine Learning & Data Mining (in Python)? 
On November 8th, a postdoctoral researcher will be coming in to speak about his work and some projects he’s completed. Dr. Kuusisto completed his Computer Science PhD in Machine Learning in 2015 at UW-Madison, and works in the Regenerative Biology lab at the Morgridge Institute, building models from genetic expression data that can predict when compounds are toxic to developing neurological tissues.

Thank you to everyone who came last week to see the Genetic Algorithms and SAS presentations! I hope you all managed to get some pizza. :^)

If you missed the meeting or want to see the slides again, here is what we covered.

Genetic Algorithms:

If you have more questions about Genetic Algorithms, email Matthew at:


Some reminders from Rachel:
Registration for the Student Symposium is November 16th. Teams of 2-4 with a faculty advisor will compete in a data challenge using a data set and SAS software. The top 3 teams will be highlighted at the Global Forum in April 2019.
SAS E-Learning – Access code: G70007601
Free University Edition software
Video tutorials –
Free Book on Data Science
Apply to Internships on Handshake or here

If you have any questions for Rachel regarding SAS, contact her at:

Hello dotData!

Meeting Tomorrow (10/11/2018)

Reminder that we have our meeting tomorrow on Getting into Research + Preparing a Résumé for Internships (7pm in CS1221).

View the slides here if you can’t make it!

As always, Adithya and I will stick around afterwards to answer any questions you may have regarding classes, as well as discuss ongoing projects. We can give you some personal feedback on your résumé. There may be cookies.

• Bring your résumé if you have one!

Hi Data Science Club,

Thanks to everyone for coming to the meeting! For anyone who wasn’t able to make it, here is the presentation that was given. Sorry if I went a bit fast; I’d be happy to elaborate on any points I skimmed (email me or Adithya). For the next meeting, the topic will be:

Résumé Workshop for Tech Internships & Research
7pm on Thursday 10/11 in CS 1221

While this meeting topic is subject to change, I think it’d be good because it’s internship-hunting season.

Right now we’re looking at meeting every other week. That said, there are a lot of topics that people are interested in and a lot of things that I’d personally like to talk about. I’ll be sending out another email with a poll listing some workshop topics for a meeting. If you have a topic you feel competent in or a project that you did and would like to present, email me.

A recap of topics and resources for this past meeting:

I ended up discussing a lot more than I expected about prospective classes to support an interest in Data Science. Besides the slides in the presentation, here are some of the other resources I mentioned:

Someone spoke to me about internships for underclassmen, and it reminded me of a resource I DEFINITELY wish I’d known about; an advisor mentioned it to me this last spring. These internships largely open up in December, but they’re definitely worth eyeballing now.

  • Research Internships through the National Science Foundation

A resource that I didn’t know about until I became the resource: tutoring in the Computer Sciences building. The word tutoring can have a stigma to it and feels inaccurate in this case, because anyone can just show up and have a pseudo-TA help them debug their programs and explain concepts. They’re guaranteed* to cover every class up to 400. After that, it’s down to what the tutors have personally taken.

Every Sunday-Wednesday there’s tutoring in the lounge immediately above the east entrance of the Computer Sciences building. It’s from 3-9pm for Mon-Wed, and 2-8pm on Sundays. I’m personally there from 6-9 on Mondays, and often from 5-9ish on Wednesdays. If you’d like to get involved as a tutor, contact Andrew Kuemmel!

The Graduate Program in Biostatistics at Vanderbilt University is seeking quantitatively oriented undergraduates (e.g., majors in statistics, mathematics, computer science, or quantitative sciences) who have an interest in pursuing a graduate degree in Biostatistics or Data Science. Our program has an emphasis on biomedical applications, statistical theory, and computational methods as well as a strong emphasis on traditional data science topics such as machine learning and computational algorithms. I have attached a copy of our program brochure. Please consider posting the brochure for students to see and forwarding on an electronic copy of this email to your students.

If you would like hard copies of our brochure, please contact our program manager, Amanda Harding, with your mailing address, and she would be happy to send them to you.

Vanderbilt DS/Biostats Brochure

Joe Tenini, a Data Scientist at Epic’s Inpatient Predictive Analytics R&D, will give this month’s DS3 Seminar. He will discuss the types of questions that his team is interested in, the massive data that they have access to, and some of his work that puts it all together.

The talk is public and should be accessible to anyone interested in data science. Join us! Please invite others! 10/28 at 3:30 in 140 Bardeen. Full details below:

Joe Tenini PhD, Data Scientist at Epic, Inpatient Predictive Analytics R&D.
When/where: 10/28 at 3:30 in 140 Bardeen.

Abstract: What would you do if you knew every medication administered, procedure performed, lab resulted, and diagnosis made for 190 million patients? What sort of questions could you ask? What sort of problems could you take on? What if you could deliver your insights directly to the patients and providers who need them?

For data scientists at Epic, these are questions we ask ourselves daily. In this talk we’ll discuss opportunities to put data to work in healthcare, the tools and technologies involved, and some specific challenges and solutions that come up during day-to-day work. This will be an interdisciplinary talk. Students and practitioners from all fields and experience levels are encouraged to attend and bring questions.

Bio: Joe Tenini joined Epic after receiving his PhD in mathematics from the University of Georgia. His current work centers on the modeling of patient deterioration and the development of early warning systems in the acute care setting.

If you are interested in connecting with Joe or the team at Epic, let us know!

Follow the event here:

An opportunity with the Milwaukee Bucks for a senior thesis! While the talk has passed, feel free to follow up with Mike, Seth, or one of us to get more information:

Are you interested in writing a senior honors thesis using NBA data?
Mike and Seth (the analytics team at the Milwaukee Bucks) are open to sharing data and advice with students interested in NBA data projects. If you are interested in pursuing this, please come to the talk today (info below).
As part of his talk, Mike will describe the types of data that are available. You will then need to submit a proposal for your project.
What should the proposal contain?
A proposal will likely contain these four elements:
(1) A focused question and a hypothesis.
(2) A description of the data that you will use.
(3) A rough description of how you would like to process the data.
(4) Preliminary thoughts on the types of analysis that will be performed and an identification of key hurdles.
What makes a proposal great?
The proposal should clearly communicate the aims and methods.  The proposal should be focused and interesting.  If it is not obvious, it should explain why the proposed question is answerable with the available data.  The very best proposals use the publicly available data (there is a lot of it) to perform a preliminary analysis or a “feasibility study”.  Finally, the final product of the research should be useful for the team.
How will the proposal be judged?
Does the proposal clearly communicate the aims and methods?
Is it focused and interesting?
If it is not obvious, does it explain why the proposed question is answerable with the available data?
Is the final “product” of the research useful for the team?
The very best proposals use the publicly available data (there is a lot of it) to perform a preliminary analysis or a “feasibility study”.
How long should it be?
No more than 2 pages.  Shorter is better.
How do I submit the proposal?
Email a pdf to and by Nov 11.
About the Talk:
Abstract: In this presentation, we will discuss data science through the field of professional basketball. However, many of the topics covered will have wider applications. We will discuss our approach to basketball analysis using specific examples of data design, automation, and research. We will also discuss the importance of succinctly communicating the analysis and visualizing conclusions. Following the presentation, we will allow time for Q+A.