SPEAKERS & WORKSHOP PRESENTERS

Talk Titles | Workshop Info | Abstracts | Bios

Halim Abbas / Head of Data Science at Cognoa
“Learning from Cognitive Clinical Data – Challenges and Techniques”

Abstract: Predictive modeling as applied to the world of cognitive science comes with its own unique set of challenges. This talk showcases some of the problems, techniques, and algorithms that we have developed at Cognoa while focusing on that domain. Topics discussed include building predictors from noisy clinical datasets, feature selection and feature sequencing techniques adapted to clinical questionnaires, as well as ML performance metrics adapted from clinical science.

Bio: Halim is a high-tech innovator who spearheaded world-class data science projects at game-changing tech firms such as eBay and Quixey. Formally educated in Machine Learning, his professional expertise spans Information Retrieval, Natural Language Processing, and Big Data. Halim has a proven track record of applying state-of-the-art data science techniques across industry verticals such as eCommerce, web & mobile services, airline, BioPharma, and the medical technology industry. Halim currently leads the Data Science department at Cognoa, a data-driven behavioral health care startup in Palo Alto.

Trevor Bass / Founder of Bitten Labs
“What every executive needs to know about data science”

Abstract: With the vast amounts of data now available, companies in nearly every industry stand to capitalize on data to gain a competitive advantage. Studies have shown that data driven companies are more productive and profitable than their competitors and have higher stock market valuations. Not all organizations have the resources or desire to build an internal data science competency, but all can benefit from some degree of analytical orientation. Regardless of where an organization sits on the analytics maturity spectrum, its leaders need to know when and how to capitalize on data as a strategic asset. When armed with basic data science literacy, an organization’s leaders are best positioned to spearhead a data strategy. Executive sponsorship is also critical to the success of analytics related initiatives; lack of sponsorship is the top reason such initiatives fail.

This presentation covers, through practical case studies, the information that every business leader needs to know about data science. Attendees will leave better able to recognize data related opportunities, understand the potential benefits and risks, interpret business cases for analytics, and begin developing a data strategy.

Bio: Trevor Bass is a data scientist with a decade of experience building highly successful and innovative products and teams. He runs Bitten Labs, a data science management consultancy, education provider, and innovation lab, where he helps organizations figure out how data science can benefit them, implements pilot projects, and builds and scales analytics competencies. Prior to Bitten Labs, Trevor created and led through acquisition the data science function at payment processor Litle & Co. (acquired by Vantiv), which performed product R&D; drove customer acquisition, retention, upselling, and cross-selling; provided quantitative consulting throughout the company and for its customers; and established clear industry thought leadership. He holds a Master’s degree from Rutgers University and a Bachelor’s degree magna cum laude from Harvard University, both in mathematics.

Angela Bassa / Data Science Tech Advisor at Mirah
“Making behavioral health more data driven was never going to be painless”

Abstract: There are obvious upsides to making healthcare more objective and data-driven, but there are also significant obstacles in implementing many of the cutting-edge approaches and technologies that data scientists use to disrupt existing processes. Strategically, known confounding factors are commonplace in behavioral health data: not only is attributing direct causality from intervention a difficult proposition, but key stakeholders are often resistant (clinicians don’t think they need help from a data perspective, and patients make these interactions tough to model). Tactically, UI/UX changes can require clinical/IRB approval, user acquisition is comparatively onerous (requiring BAAs, informed consent forms, etc.), and many standard analytics tools and techniques must be modified for ethical and regulatory reasons (e.g. scraped data must not contain any PHI). These challenges are magnified by the fact that behavioral health usually has much smaller datasets than other industries. In this talk, I’ll describe how our team of compassionate data scientists, technologists, and clinicians at Mirah is taking these challenges head on to generate important insights into how therapy is working (or not working) and to get patients on the path to recovery faster.

Bio: Angela serves as a Technical Advisor for Mirah, a Boston-based MedTech startup focused on making behavioral healthcare more objective and data-driven. She also leads Data Science at EnerNOC; her team’s groundbreaking accomplishments in cloud infrastructure and energy analytics maximize data insights on a real-time global scale. Her projects have gone on to earn accolades such as INFORMS’ Edelman award for Achievement in Operations Research and the Management Sciences, and the Massachusetts Innovation & Technology Exchange award for Big Data and Analytics Innovations.

Angela discovered data science while studying Math at MIT, only it wasn’t called that yet. Over the past two decades she has learned to lead data teams in academic, commercial, and industrial applications. She also has one patented invention, and 24 patents pending in the US, the EU, and Australia.

Gil Benghiat / Founder at DataKitchen
“Big Data Warehouse & Agile Analytic Operations: Pharma Case Study with Amazon Redshift and S3”

Abstract: The list of failed data warehouse projects is long. Operational projects often leave end-users, data analysts and data scientists frustrated with long lead times for changes and bug fixes. This case study will illustrate how to make changes to the data warehouse and dashboards quickly *and* with high quality.

For background, we first look at a pharmaceutical launch, present the Seven Shocking Steps to Agile Analytic Operations, explain Redshift and other Amazon Web Services (AWS) technologies, and show how to implement a Data Lake in AWS. The presenters then examine how to partition the work to teams and walk through examples on how to quickly implement features in the analytic system. Finally, we show how assets migrate between teams and review lessons learned.

The speakers are the founders of DataKitchen. They have decades of hands-on and executive management experience in data, analytics, and software development, and are current practitioners of Agile Analytic Operations.

Bio: Gil Benghiat is one of three founders of DataKitchen, a company on a mission to enable analytic teams to deliver value quickly and with high quality. Gil’s career has always been data oriented, starting with collecting and displaying network data at AT&T Bell Laboratories (now Alcatel-Lucent), managing data at Sybase (purchased by SAP), collecting and cleaning clinical trial data at PhaseForward (IPO and then purchased by Oracle), integrating pharmaceutical sales data at LeapFrogRx (purchased by Model N), and liberating data at Solid Oak Consulting. Gil has a Master of Science in Computer Science from Stanford University and a Bachelor of Science in Applied Mathematics/Biology from Brown University. He completed hiking all 48 of New Hampshire’s 4,000-foot peaks and is now working on the New England 67.

Christopher Bergh / Founder and Head Chef at DataKitchen
“Big Data Warehouse & Agile Analytic Operations: Pharma Case Study with Amazon Redshift and S3”

Abstract: The list of failed data warehouse projects is long. Operational projects often leave end-users, data analysts and data scientists frustrated with long lead times for changes and bug fixes. This case study will illustrate how to make changes to the data warehouse and dashboards quickly *and* with high quality.

For background, we first look at a pharmaceutical launch, present the Seven Shocking Steps to Agile Analytic Operations, explain Redshift and other Amazon Web Services (AWS) technologies, and show how to implement a Data Lake in AWS. The presenters then examine how to partition the work to teams and walk through examples on how to quickly implement features in the analytic system. Finally, we show how assets migrate between teams and review lessons learned.

The speakers are the founders of DataKitchen. They have decades of hands-on and executive management experience in data, analytics, and software development, and are current practitioners of Agile Analytic Operations.

Bio: Christopher Bergh is a Founder and Head Chef at DataKitchen. Chris has more than 25 years of research, engineering, analytics, and executive management experience. Previously, Chris was Regional Vice President at Model N. Before Model N, Chris was COO of LeapFrogRx, a descriptive and predictive analytics software and service provider. Prior to LeapFrogRx, Chris was CTO and VP of Product Management of MarketSoft (now part of IBM). Prior to that, Chris developed Microsoft Passport, the predecessor to Windows Live ID, a distributed authentication system used by hundreds of millions of users today. Chris began his career at the MIT Lincoln Laboratory and NASA Ames Research Center. Chris served as a Peace Corps Volunteer Math Teacher in Botswana, Africa. Chris has an M.S. from Columbia University and a B.S. from the University of Wisconsin-Madison. He is an avid cyclist, hiker, reader, and father of two teenagers.

Tyler Bird / Community Development Consultant at CARTO
“TBD”

Abstract: TBD

Bio: TBD

Richard Boire / Senior Vice President at Environics Analytics
“Do’s and Don’ts of Predictive Analytics”

Abstract: Predictive analytics is now a mainstream discipline, as demonstrated by the development of data science departments within many organizations. Yet, unlike other disciplines such as marketing and accounting, which have strong academic foundations, predictive analytics still relies on its practitioners to formulate the knowledge that serves as a foundation for learning, both for practitioners and for marketing end users. In this session, we discuss key do’s and don’ts within the predictive analytics practitioner’s discipline and their ultimate practical impact on marketers. Case studies and examples both reinforce and provide much richer detail in such key areas as the ability to define the right business problem. Alongside the business problem, we also highlight how a lack of due diligence and attention to detail with the data leads to ineffective solutions even when the business problem has been correctly identified. Use of the right tools and techniques is also discussed, but with emphasis on what is simple and understandable, as it is these solutions that are more easily embraced by the marketing area. Simple is indeed better for most marketing solutions, and the rationale for this is discussed in this session. Yet, besides the do’s in predictive analytics, experience in the discipline has resulted in knowledge of what not to do. We highlight key don’ts, such as overstatement of model results, and how marketers and data scientists can identify this issue. A specific case study is discussed to outline the process of identifying model overstatement. In the context of measurement and evaluation of predictive analytics solutions, setting up the right foundation and process is critical to success. Once again, an example is used to demonstrate how results became misleading due to the lack of a process for creating the right measurement framework.

Bio: Richard Boire’s experience in predictive analytics and data science dates back to 1983, when he received an MBA from Concordia University in Finance and Statistics. His initial experience at organizations such as Reader’s Digest and American Express allowed him to become a pioneer in the application of predictive modelling technology for all direct marketing programs. This extended to the introduction of models which targeted the acquisition of new customers based on return on investment. With this experience, Richard formed his own consulting company in 1994, now called the Boire Filler Group, a Canadian leader in offering analytical and database services to companies seeking solutions to their existing predictive analytics or database marketing challenges. Richard is a recognized authority on predictive analytics and is among a select top five experts in this field in Canada, with expertise and knowledge that is difficult, if not impossible, to replicate in Canada. This expertise has evolved into international speaking assignments and workshop seminars in the U.S., England, Eastern Europe, and Southeast Asia. Within Canada, he gives seminars on segmentation and predictive analytics for such organizations as the Canadian Marketing Association (CMA), Direct Marketing News, Direct Marketing Association Toronto, Association for Advanced Relationship Marketing (AARM), and Predictive Analytics World (PAW). His written articles have appeared in numerous Canadian publications such as Direct Marketing News, Strategy Magazine, and Marketing Magazine. He has taught applied statistics, data mining, and database marketing at a variety of institutions across Canada, including the University of Toronto, George Brown College, Seneca College, and currently Centennial College. Richard was Chair of the CMA’s Customer Insight and Analytics Committee and sat on the CMA’s Board of Directors from 2009-2012.

He has chaired numerous full-day conferences on behalf of the CMA (the 2000 and 2002 Database and Technology Seminars, as well as the first-ever Customer Profitability Conference in 2005). He has most recently chaired the Predictive Analytics World conferences in both 2013 and 2014, which were held in Toronto. He has co-authored white papers on the following topics: ‘Best Practices in Data Mining’ as well as ‘Customer Profitability: The State of Evolution among Canadian Companies’. In October 2014, his new book, “Data Mining for Managers: How to Use Data (Big and Small) to Solve Business Problems”, was published by Palgrave Macmillan.

Yves Boussemart / Director at QuantumBlack
“From data science to impact”

Abstract: This talk will discuss how to get maximum real-world impact from advanced analytics.

Bio: Yves leads QuantumBlack in America. A computer engineer by training, Yves obtained a PhD from MIT in 2011 before working in mining and the finance industry.

Peter Bull / Co-founder at DrivenData.org
“Machine Learning for Social Good: Building your first model”

Abstract: Come ready to get your data science on. The goal of this workshop is to build your first machine learning model. You’ll need a laptop and some enthusiasm to get started. We will start from raw data and end by making a set of predictions. This will be an applied workshop looking at a real problem. We’ll be modeling whether water pumps in Tanzania are functional or non-functional based on information about when each pump was installed, what kind of pump it is, and how it is managed. This will be based on a competition running on DrivenData.

The primary goal is for participants to understand that machine learning isn’t magic. In the time it takes to run a tutorial, we can load a dataset, train a model, and make predictions. The secondary goal is that participants will know where to look if they want to keep learning about data science or apply it to problems they are working on. One of the focuses of the example will be demonstrating the resources where participants can learn more about each of the steps in the process.

Bio: Peter is a co-founder at DrivenData, whose mission is to bring the power of data science to the social sector. Recently he has worked on projects in smart school budgeting, predicting trends in women’s healthcare, and improving public services by using novel data sources. He earned his master’s in Computational Science and Engineering from Harvard in 2013. Previously he worked as a software engineer at Microsoft and earned a BA in philosophy from Yale University.

Joseph Cauteruccio, Jr. / Machine Learning Engineer at Spotify
“Embarrassingly Parallel: The theory and practice behind distributed Machine Learning”

Abstract: Gathering, processing, and munging massive datasets is becoming easier with the advancements in tools like Spark and Hive and cloud services like EMR and DataFlow. While processing data on distributed systems no longer requires writing MapReduce jobs from scratch, effectively implementing machine learning workflows that can leverage the same distributed systems is still a complex matter.

In this talk we will go over the parallel implementations of some popular machine learning algorithms for classification, regression, and clustering. In particular we will examine what, if any, algorithmic considerations have to be made to enable parallelization. Then we will discuss how versions of these algorithms are implemented in distributed systems. Finally, we will review some off-the-shelf implementations with MLlib and VW AllReduce.
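As a rough illustration of the kind of parallelism this abstract refers to (not material from the talk itself), the sketch below computes a least-squares gradient shard by shard and then adds the partial results together. The toy data, the function names `partial_gradient` and `distributed_gradient`, and the use of threads in place of a real cluster are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_gradient(shard, w):
    """Sum of squared-error gradients over one shard for the model y ~ w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)


def distributed_gradient(shards, w):
    """Map: one partial gradient per shard. Reduce: add them up and average."""
    with ThreadPoolExecutor() as pool:
        # Each shard is processed independently; no shard needs another's data.
        partials = list(pool.map(lambda s: partial_gradient(s, w), shards))
    n = sum(len(s) for s in shards)
    return sum(partials) / n


# Hypothetical toy data generated from y = 3x, split into three shards.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0:3], data[3:6], data[6:8]]

# Plain gradient descent; only the gradient computation is parallelized.
w = 0.0
for _ in range(200):
    w -= 0.01 * distributed_gradient(shards, w)
```

Because the gradient of a sum is the sum of the gradients, the sharded result matches the single-machine gradient, which is what makes this step easy to parallelize. Threads stand in here for workers on separate machines.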

Bio: Joe Cauteruccio is a Machine Learning Engineer at Spotify. His research interests include deep learning, manifold learning and distributed computing. Outside of work, Joe spends his spare time cooking, rock climbing and playing banjo, not necessarily in that order.

Prasad Chalasani / SVP, Data Science at MediaMath
“Estimating Causal Effect of Ads in a Real-Time Bidding Platform”

Abstract: A real-time bidding platform responds to incoming ad-opportunities (“bid requests”) by deciding whether or not to submit a bid and how much to bid. If the submitted bid wins, the user is shown an ad. Advertisers hope that ad-exposure leads to an increased likelihood of a desired action, such as a click or conversion (purchase, etc.). So an important quantity that advertisers want to measure is the causal effect of advertising, namely, what is the response probability of an exposed user, compared with the counterfactual (un-observable) response-rate of the user if they were not exposed to the ad. In an ideal randomized test, the user is randomly assigned to test or control AFTER the submitted bid is won, and test users are served the ad in the normal way, while control users are not. While this is ideal from a statistical perspective, in practice this approach has the drawback that money spent by advertisers is wasted when a user is assigned to control. At MediaMath we have developed a methodology for causal effect measurement where users are assigned to test or control BEFORE bid submission. One challenge here is that not all test-group users are exposed to an ad; only a winning bid results in ad exposure, and the winning population can have a significant bias. This talk will describe our approach to handling this and other challenges to ad impact measurement in this setting, and how we use MCMC Gibbs sampling to arrive at confidence intervals for ad-impact.
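To make the quantity in this abstract concrete, here is a minimal sketch of a lift estimate from a randomized test/control split. It is not the MediaMath methodology: the counts are made up, the function names are hypothetical, and the optional scaling by exposure rate is a textbook adjustment that assumes unexposed test users respond like control users, which ignores the winning-population bias the abstract describes.

```python
def response_rate(conversions, users):
    """Fraction of users in a group who took the desired action."""
    return conversions / users


def causal_lift(test_conv, test_users, ctrl_conv, ctrl_users, exposure_rate=1.0):
    """Difference in response rates between test and control groups.

    With exposure_rate=1.0 this is the plain test-minus-control difference.
    With exposure_rate<1.0 (only some test users actually saw the ad), the
    difference is scaled up to an approximate per-exposed-user effect.
    """
    if not 0.0 < exposure_rate <= 1.0:
        raise ValueError("exposure_rate must be in (0, 1]")
    diff = response_rate(test_conv, test_users) - response_rate(ctrl_conv, ctrl_users)
    return diff / exposure_rate


# Hypothetical counts: 10,000 users per group, 40% of test users exposed.
group_level_effect = causal_lift(260, 10_000, 200, 10_000)
per_exposed_effect = causal_lift(260, 10_000, 200, 10_000, exposure_rate=0.4)
```

The point estimate alone says nothing about uncertainty; the abstract mentions MCMC Gibbs sampling for the confidence intervals, which this sketch does not attempt.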

Bio: Prasad Chalasani is the SVP of Data Science at MediaMath, leading the development of innovative, proprietary, scalable algorithms and analytics that leverage massive amounts of data to power smarter digital marketing for the world’s leading advertisers. Prior to joining MediaMath, Prasad led Data Science at Yahoo Research, and before that worked for 10 years as a quantitative researcher and portfolio manager of statistical trading strategies at hedge funds and at Goldman Sachs. Prasad holds a PhD in Computer Science from CMU and a BTech in Computer Science from IIT.

Brad Cordova / Co-Founder/CTO at TrueMotion
“Moving Beyond Kaggle, Life Lessons Learned The Hard Way On How To Solve The World’s Most Challenging Problems”

Abstract: Kaggle is a great place to improve and learn a specific part of data science. In real-life applications, however, you oftentimes don’t even have enough data to solve the problem at hand; the data you do have is extremely noisy, poorly labeled, or badly formatted; even worse, the problem you are trying to solve may be ambiguous, or may not even be possible to solve. The Kaggle part of modern data science pipelines usually turns out not to be the most important or challenging. So then, what are the differences between the challenges faced in Kaggle and problems in the real world? What do I do if I don’t have enough data? What happens if my classifier performs terribly on the first few tries because my data is so noisy? What are the most important skills to develop to become a prodigious data scientist in the often untamed real world? We will discuss how to specifically tackle these problems and find out that, even though the problems we face are daunting, even the hardest of problems can be solved, and that we as data scientists are a fundamental key in coming up with solutions to the world’s greatest problems.

Bio: Brad is a founder and CTO of TrueMotion. He graduated from MIT with a degree in Electrical Engineering and Computer Science. He previously worked at CERN in the Caltech group doing High-Energy Theoretical Physics. He was involved in a near-fatal accident with a distracted driver using a cell phone, which had a profound impact on his life. He was blown away by the magnitude, difficulty, and lack of solutions to this problem. He left his PhD in May 2013 to pursue this with his full attention.

Leonard D’Avolio / CEO and Co-founder, Cyft
“The Future of Healthcare Analytics”

Abstract: The transition to value-based care has made it more important than ever for healthcare organizations to use their data to determine how best to achieve high quality care. In response, an onslaught of business intelligence vendors has descended on healthcare with the data warehouses, reporting tools, and dashboard analytics that led to tremendous efficiency in other industries. Unfortunately, healthcare is not like other industries. Decades of “fee for service” care have left us with up to 50% of clinically relevant information stored as unstructured free text. In addition, structured data as fundamental as what diseases people suffer from (ICD codes) can be up to 80% inaccurate. As value-based care organizations are discovering, these multi-million dollar investments are useful for understanding what happened – how many beds were filled, drugs prescribed, surgeries performed – but are incapable of answering the fundamental questions of value-based care: what should happen, to whom, and when. Fortunately, there are technologies that can tease signal from the noise of healthcare data: machine learning and natural language processing. Unfortunately, these technologies have, until now, only been accessible to PhDs and data scientists and take on average 1,000 man-hours to develop into production-ready products. Ultimately, to be useful, these technologies must become ubiquitous. My talk will focus on the current state of analytics, their strengths and weaknesses, and describe efforts to make machine learning useful and even accessible to healthcare. These efforts span 8 years of research, 15 published papers across 25 different use cases at 250+ hospitals, 17 health plans, and with several pharmaceutical partners with support from governments, non-profits, and now finally industry.

Bio: I’ve spent the last dozen years working to make data healthcare’s most valuable resource for improving patient lives. I’ve done so as an Assistant Prof at Harvard Med School & Brigham and Women’s Hospital, a Director at the Department of Veterans Affairs, a founder of two healthcare companies, an advisor to several healthcare-related non-profits, a researcher, and a writer. I led the development of one of the world’s largest genomic science programs (the Million Veteran Program, 575k enrolled and counting), implemented the world’s first randomized controlled trial embedded completely within routine care via an electronic medical record system, and proved that data science (machine learning + natural language processing) can be made accessible to non-data scientists via software. I also implemented a mobile phone-based and coaching-based maternal and neonatal health program in Uttar Pradesh, India, where, despite average incomes of just over a dollar a day, we were able to implement a real-time data feedback system improving over 60k births and counting. Today, I am the CEO and co-founder of Cyft, a company dedicated to making prediction available for all of healthcare. My efforts to share lessons learned have spanned academic journals, popular press, lectures at universities, and talks at venues such as the Institute for Healthcare Improvement, TEDMed, and several US and Indian government agencies.

Jonathan Dahlberg / Customer Facing Data Scientist at DataRobot
“Leverage Modeling Technology to Optimize Healthcare Marketing”

Abstract: As healthcare systems balance both volume-based and value-based service delivery and marketing approaches, there is an even greater need to access and leverage all of an organization’s data, analytics, and digital communication assets. A significant piece of this requires predictive analytics. For example, consider the ability to predict which high-risk individuals would benefit from health screenings, determine the propensity for certain cancers, aid physicians with optimal diagnoses, and predict hospital readmission risk. This presentation will review how Evariant, a leading CRM provider, leverages DataRobot’s predictive modeling technology to help optimize healthcare marketing.

Bio: TBD

Bob Darin / Chief Analytics Officer at CVS Health
“Reinventing Pharmacy Care Using Analytics”

Abstract: CVS Health is using analytics as a key strategic enabler across all parts of our business. Learn how CVS Health is investing in analytics across people, process, and technology to improve the health of the patients we serve and unlock value in operational processes across a Fortune 7 company.

Bio: Bob Darin is the Chief Analytics Officer for CVS Health and is responsible for the overall business strategy to invest in and enable analytics across CVS Health. His areas of responsibility include leading the investment strategy in analytic infrastructure, tools, talent acquisition, and data management/governance processes around analytics, and ensuring that these investments translate into measurable business value. He also has management responsibility for numerous analytic teams across CVS Health.

He holds an honors MBA in analytic finance from the University of Chicago Graduate School of Business, and he received a magna cum laude degree in economics from Harvard College.

Subrata Das / Founder and President at Machine Analytics
“Social Media Text and Predictive Analytics”

Abstract: The proliferation of social media has led to an explosion of data and opinion, fueling interest in sentiment and social network analyses, especially as individuals, brands and corporations look to manage their reputations and keep up with business intelligence. In order to effectively distill valuable kernels of information from big data and deduce broader conclusions (sentiment), multiple techniques must be employed. You will be taken beyond simple term analysis into the complex world of natural language processing and machine and deep learning for contextual search, document classification, text summarization, topic extraction and information structuring (triples extraction). We will analyze Twitter feeds and Facebook posts, derive sentiment from Amazon and TripAdvisor reviews, and relate market sentiment to stock price movement, using Machine Analytics’ cutting-edge text analytics tool aText and predictive analytics tool iDAS.

Bio: Dr. Subrata Das is the founder and president at Machine Analytics (www.machineanalytics.com), a company in the Boston area customizing big data analytics and fusion solutions for clients in government and industry. Subrata is also a technology consultant at MIT Lincoln Lab, an adjunct faculty member at Villanova School of Business, and a part-time lecturer at Northeastern. He is the author of five books including Computational Business Analytics, published by CRC Press/Chapman and Hall, and High-Level Data Fusion, published by Artech House. Subrata recently spent two years in Grenoble, France, as the manager of over forty researchers in the document content laboratory at the Xerox European Research Centre. In the past, Subrata led many projects funded by DARPA, NASA, US Air Force, Army and Navy, ONR and AFRL. He has also held research positions at Imperial College, London, received a PhD in Computer Science from Heriot-Watt University in Scotland, and earned master’s degrees from the University of Kolkata and the Indian Statistical Institute. Subrata’s technical expertise includes a broad range of computational artificial intelligence, machine learning, and deep linguistics processing techniques. Subrata has published many conference and journal articles, edited a journal special issue, and regularly gives seminars and training courses based on his books. Subrata is proficient in multiple programming and scripting languages including Java, C++ and R, and in various database systems. He has conceived and developed in-house tools aText, iDAS and RiskAid that provide underlying engines of the Machine Analytics products.

Jordi Diaz / Senior Data Scientist at Pixability
“Introduction to Bayesian Inference with Python and PyMC3”

Abstract: Bayesian Inference is becoming an essential tool for data scientists as a complement to machine learning and classical statistics. Bayesian methods excel at creating richly informed predictions from interpretable models for big and small datasets. Recent advances in sampling algorithms and computational libraries have made Bayesian inference more accessible and applicable to a broader spectrum of real-world problems in industry and academia. In this talk we introduce Bayesian methods for data scientists through practical examples. We learn how to build Bayesian models from scratch with Python and PyMC3, a powerful Probabilistic Programming library that allows for automatic Bayesian inference on user-defined probabilistic models.

Bio: Jordi Diaz is a Senior Data Scientist at Pixability, where he builds statistical models and machine learning algorithms to understand and predict the behavior of audiences on YouTube, Facebook, Instagram and Twitter. He is the organizer of Boston Bayesians, a meet-up group for those interested in Bayesian methods for statistics and machine learning. Jordi has ten years of experience in data analysis, machine learning and software engineering. Previously, he worked at Qualcomm, developing wireless technology that powers millions of smartphones worldwide. Jordi received his PhD in Electrical Engineering from the New Jersey Institute of Technology, where he researched statistical signal processing and information theory for wireless communications. He holds a Telecommunication Engineering Degree from UPC, Barcelona, Spain and was a fellow of the Advanced Study Program at M.I.T.

Bill Disch / Director of Analytics at Evariant
“Leverage Modeling Technology to Optimize Healthcare Marketing”

Abstract: As healthcare systems balance both volume-based and value-based service delivery and marketing approaches, there is an even greater need to access and leverage all of an organization’s data, analytics, and digital communication assets. A significant piece of this requires predictive analytics. For example, consider the ability to predict which high-risk individuals would benefit from health screenings, determine the propensity for certain cancers, aid physicians with optimal diagnoses, and predict hospital readmission risk. This presentation will review how Evariant, a leading CRM provider, leverages DataRobot’s predictive modeling technology to help optimize healthcare marketing.

Bio: As Director of Analytics at Evariant, Bill’s focus is on design, execution, and implementation of optimal analytics. Maximizing ROI as a result of multivariate predictive modeling is a primary goal. Bill has been a presenter at major marketing, analytics, and academic conferences including the DMA, AMA, APA, APHA, and GSA. His background as an experimental and health psychologist has included teaching as well as several years of clinical work. His specialty areas include older adults, community-based research, HIV/AIDS, sexual behavior and risk, environmental stress, influenza and pneumonia vaccination, depression and mental health, medication and health literacy. Dr. Disch earned his Ph.D. in experimental psychology from the University of Rhode Island, with a specialty in quality of life and well-being, and mixed-methods (quant, qual, mixed).

Gerard Dwan / Informationist at Knowledgent
“Knowledge > Information”

Abstract: Information is everywhere. Now more than ever, people and organizations are collecting, sending, sharing, and analyzing data in the hope of providing context (and advice) for action. There is, however, still a need for human intervention. When companies need to drive decisions to change corporate strategy, pivot to a new market, or hire new staff, they need to do so with data AND an understanding of their business. This talk will reveal why people are an essential part of the data process and why Knowledge is greater than Information alone.

Bio: Gerard Dwan is a Solution Partner at Knowledgent, a data and analytics firm. There he specializes in bringing Big Data to Big Business. He acts as a technical lead in several large-scale enterprise deployments of cutting edge data platforms. Prior to this, he was the Manager of Solution Architecture at Attivio, leading a team of technical problem solvers responsible for the design and validation of client solutions. Prior to his management role he was a Solution Architect at Attivio helping large companies obtain analysis and understanding of their big data and content.

Maren Eckhoff / Senior Data Scientist at QuantumBlack
“Predicting machine failures”

Abstract: With the advent of the Internet of Things, companies collect a wealth of data that can be used to monitor degradation of their assets. In this talk, we will discuss how to get from the raw data to a production system that predicts failures and drives an optimal maintenance strategy. The first part of the talk will explore suitable machine learning techniques and some of the feature engineering challenges. The importance of combining different data sources will be explained. In the second part, we will discuss smart variable creation and secure data processing using lambda architectures.

Bio: Maren is a Senior Data Scientist at QuantumBlack leading the Data Science work on client and internal projects. Before joining QuantumBlack, Maren developed forecasting methods for Tesco to predict the customer demand in stores. Maren holds a Ph.D. in Probability Theory and two Master’s degrees in Mathematics.

Eric Estabrooks / Founder and VP of Cloud and Data Services at DataKitchen
“Big Data Warehouse & Agile Analytic Operations: Pharma Case Study with Amazon Redshift and S3”

Abstract: The list of failed data warehouse projects is long. Operational projects often leave end-users, data analysts and data scientists frustrated with long lead times for changes and bug fixes. This case study will illustrate how to make changes to the data warehouse and dashboards quickly *and* with high quality.

For background, we first look at a pharmaceutical launch, present the Seven Shocking Steps to Agile Analytic Operations, explain Redshift and other Amazon Web Services (AWS) technologies, and show how to implement a Data Lake in AWS. The presenters then examine how to partition the work to teams and walk through examples on how to quickly implement features in the analytic system. Finally, we show how assets migrate between teams and review lessons learned.

The speakers are the founders of DataKitchen. They have decades of hands-on and executive management experience in data, analytics, and software development, and are current practitioners of Agile Analytic Operations.

Bio: Eric Estabrooks is a Founder and VP of Cloud and Data Services at DataKitchen where he is focusing on client delivery and AWS cloud operations. Prior to DataKitchen, Eric was the VP of Cloud and Data Services at LeapFrogRx, acquired in 2013 by Model N (MODN), where he built a high performing data services team that had quality and process improvement baked into its DNA. LeapFrogRx provided a SaaS for analyzing sales & marketing data for its Pharma customers. Before coming to Boston, Eric was a lead developer and software architect at Premisys Corporation, acquired by J.D. Edwards, where he helped develop a Configure, Price, Quote (CPQ) platform for manufacturers of highly customizable products in a variety of industries. Eric holds a B.S. in Mechanical Engineering from Penn State University. He was bitten by the software bug early in his career, while working on Finite Element Analysis and implementing algorithms to model material deformation at high temperature. He is looking forward to bagging those 4k peaks with his young son.

Daniel Ferrante / Chief Data Officer at SFL Scientific
“Overview of Data Science with Applications in Healthcare”

Abstract: The amount of data generated globally will continue to grow exponentially. Areas that have traditionally been developed heuristically, such as education, agricultural sciences, and healthcare, are now moving towards more rigorous data-driven techniques. In healthcare, spreads of diseases are being modelled using historic data, patients are getting immediate diagnoses using text analytics on doctors’ notes, and MRIs/X-rays/CAT scans can all be classified automatically using machine vision. In this overview, I will discuss applications and algorithms across all major sub-disciplines of machine learning, including clustering, time-series analysis, natural language processing and machine vision.

Bio: Dr. Ferrante completed his Ph.D. in theoretical physics at Brown University, winning the Physics Department’s awards for Scholarship and for Excellence in Teaching. He has since worked at Cold Spring Harbor Laboratory on neuroscience related projects, having been a Fellow at the Swartz Foundation for theoretical and computational neuroscience since 2012. He is an expert in applied and computational methods, and analytical modeling of complex systems. His most recent work in neuroscience concerned the use of Topological Data Analysis to understand the relationship between autism spectrum disorder and olfaction. He used Persistent Homology to cluster and classify a series of genes and used this information to correlate against a comprehensive dataset in autism from which he gathered novel insight into autism. In the field of complex systems, he introduced the notion of mollifiers and polynomial chaos to generate controlled approximations to certain nonlinear dynamical systems.

Chuck Freedman / Chief Developer Advocate, Cloud Platforms Group, Big Data Solutions at Intel
“Getting Started with the Trusted Analytics Platform”

Abstract: This is hands-on and you will need to bring your laptop to participate. Alternatively, you can follow along or pair up with someone else.

In this lab you will learn how to use the Trusted Analytics Platform, an open source collaborative project, to create a data science solution workflow that ingests data, use cloud instances of Spark and Jupyter to model data, and deploy a simple visualization solution around the post-processed data.

After completing this lab, you will have a better understanding of data science and analytics fundamentals towards deploying a cloud-based solution in TAP.

Bio: Chuck Freedman is Chief Developer Advocate for Big Data Solutions at Intel. For over a decade, Chuck has led engagement and support of developer communities working with platforms in all industries. He has spoken at over 50 events bringing solutions, best practices, and new technology to individuals and organizations. With a constant goal of enabling innovation and adding value to applications and business, Chuck is currently focused on bringing developers and data scientists into a new era of collaboration with Intel’s latest cloud platforms.

Andrea Gallego / COO of Cloud platform at QuantumBlack
“Predicting machine failures”

Abstract: With the advent of the Internet of Things, companies collect a wealth of data that can be used to monitor degradation of their assets. In this talk, we will discuss how to get from the raw data to a production system that predicts failures and drives an optimal maintenance strategy. The first part of the talk will explore suitable machine learning techniques and some of the feature engineering challenges. The importance of combining different data sources will be explained. In the second part, we will discuss smart variable creation and secure data processing using lambda architectures.

Bio: Andrea is COO of QuantumBlack’s Cloud platform. She also manages the cloud platform team and helps drive the vision and future of McKinsey Analytics’ digital capabilities. Andrea has broad expertise in computer science, cloud computing, digital transformation strategy and analytics solutions architecture. Prior to joining the Firm, Andrea was a technologist at Booz Allen Hamilton. She holds a BS in Economics and MS in Analytics (with a concentration in computing methods for analytics).

Laurent Gautier / Senior Investigator at Novartis Institutes for BioMedical Research
“Polyglot data analysis with Python, R, SQL, and Spark”

Abstract: There exist several languages for data analysis, each with its strengths and libraries. We will demonstrate with practical examples that mixing them can be efficient, and not as inelegant as one might fear.

Bio: Laurent is a long-time R user and original core member of the Bioconductor project, and about equally long-time user of Python. The intersection of both made him write a popular bridge between Python and R (rpy2). He also earned an M.Sc.Eng. and a PhD. During the day he can be found somewhere between Data Science and Life Sciences.

Robby Grodin / Data Science Instructor and Subject Matter Expert at General Assembly
“Learning While Doing: Perpetual Growth In Data Science”

Abstract: Getting into the Data Tech industry can be difficult enough, but once you’re here, how do you stay relevant? Keeping up with innovations across an expansive industry can be daunting without guidance. A demanding work week leaves very little time to keep up on the new technologies and paradigms that enter the data engineer’s lexicon on a daily basis. In this talk, we will discuss how to learn data science on your own, both on the job and during your spare time. We will also discuss mentorship – how to find mentors, how to learn from one, and how to be a mentor as well.

Bio: Robby Grodin is a Data Engineer at Wayfair and Co-Founder at Toy Pig Co, a digital experience consultancy. As an educator, Robby has dedicated himself to mentoring and coaching individuals who are new to the technology industry. For the last three years, he has been a part of the General Assembly community through instructing, mentoring, and curriculum writing. His course offerings at GA include Intro to SQL, Data Science 101, Programming With Python, and more. In 2016, Robby was enlisted as a Subject Matter Expert for the Data Science Immersive course, contributing scoping and materials for the program which is currently being taught around the world.

Eric Gunther / Co-founder of Sosolimited
“Expressive Data”

Abstract: Data is a powerful material for creative expression. Looking beyond the visualization of data, we can use it to design physical forms. We can paint with it and sculpt with it. We can use it to tell human stories. Within its patterns and structures lies as much beauty as insight. We will speak about using data as an expressive material to connect with people, institutions, and ideas. We will look at alternative approaches to working with data, and talk about striking a balance between legibility and expression. We will show some of our projects that sit at the intersection of design, technology, and data, including a chandelier that turns global data into light, a platform for transforming images into clothing, a public light sculpture that visualizes 311 requests, and a building-scale installation that reveals social media conversations about innovation.

Bio: Eric Gunther is a cofounder of Sosolimited, where he works on creative applications of new technologies in entertainment, architecture, and data. Sosolimited has created data driven sculptures, social media powered light shows, and interactive experiences for clients including Twitter, Google, HBO, IBM, Intel, Porsche, and Vice. Eric is a producer and has composed music for radio, dance, and multimedia installations. In his personal art practice, he builds vibrating sound sculptures, works of art designed for the sense of touch. He loves to dance and experiment with the intersection of dance and technology. With collaborator Jeff Lieberman, he directed a time-bending dance music video for the band OK Go.

Juliet Hougland / Head of Data Science, Engineering at Cloudera
“Measuring Software Quality with Lessons from Epidemiologists, Actuaries and Charlatans”

Abstract: Is our software any good? Is our work on it making it better or worse? Can we quantify how much it has changed?

Engineering organizations face these questions constantly, and know there are not any easy answers. Luckily, we can draw on well known risk assessment techniques from epidemiologists and actuaries. We will explore the historic development of these ideas from studying the effects of smoking to setting maritime cargo insurance rates in Babylon, ancient Greece, and Victorian England. This talk will focus on how Cloudera measures and compares quality of our software.

As useful as observational methods of risk assessment are, they are also easy to misuse and misinterpret. We will discuss some choice examples of misuse and abuse of analytic methods, with examples from Newton’s Principia to particle physicists, and hopefully avoid our own charlatanry in the future.

Bio: Juliet Hougland answers complex business problems using statistics to tame multi-terabyte datasets. She succeeds in applying and explaining the results of mathematical models across a variety of industries including Software, Industrial Energy, Retail and Consumer Packaged Goods. Juliet is currently the Head of Data Science, Engineering at Cloudera where she focuses on using data to help engineering build high quality products. Juliet’s been sought after by Cloudera’s customers as a field-facing data scientist advising on which tools to use, teaching how to use them, recommending the best approach to bring together the right data to answer the business problem at hand and building production machine learning models. For many years Juliet has been a contributor in the open source community working on projects such as Apache Spark, Scalding, and Kiji. Her evangelism of technology and data science also extends to delivering highly regarded technical talks and serving as the technical editor for Learning Spark by Karau et al. and Advanced Analytics with Spark by Ryza et al. Juliet holds an MS in Applied Mathematics from University of Colorado, Boulder and graduated Phi Beta Kappa from Reed College with a BA in Math-Physics.

Amara Keller / Analytics Platform Community Manager at Intel
“Getting Started with the Trusted Analytics Platform”

Abstract: This is hands-on and you will need to bring your laptop to participate. Alternatively, you can follow along or pair up with someone else.

In this lab you will learn how to use the Trusted Analytics Platform, an open source collaborative project, to create a data science solution workflow that ingests data, use cloud instances of Spark and Jupyter to model data, and deploy a simple visualization solution around the post-processed data.

After completing this lab, you will have a better understanding of data science and analytics fundamentals towards deploying a cloud-based solution in TAP.

Bio: Amara graduated from Trinity University in Computer Science. She has a background in enterprise application development prior to stepping into the community manager role for the Trusted Analytics Platform. She works with TAP to help grow the community of developers, data scientists, and system operators. She is an active member of Women Who Code Portland, currently serving as the Community Lead. She constantly strives to inspire those in and around tech.

Boris Kerzhner / Senior Data Scientist at BookXchange
“Case Study: Top 3 Pharma Company in the World Divide and Conquer: A fascinating Resource Deployment Optimization Strategy with Predictive Analytics”

Abstract: An amazing journey into one of the world-class problems facing a top 3 pharma company in the world. At the core of the problem is a strategy of how to optimally deploy a limited amount of resources to optimize profit. Framing the problem as an almost experimental design, we will combine several key KPIs that drive profit with consumer behavior and predictive analytics, including CHAID and random forests, to derive the guiding strategy for this top 3 pharma company. The strategy represents a significant improvement over its predecessor and offers a quantitative approach that can be replicated for many similar problems; it has been approved and will be implemented.

Bio: Boris Kerzhner is a Senior Data Scientist at BookXChange. A mathematician with a graduate degree in Bioinformatics, he now helps a top 3 pharma company leverage advanced and predictive analytics to optimize its resource allocations. Combining his experience with Big Data and many of the tools and techniques he learned in bioinformatics, he finds answers to classic questions of customer segmentation, customer lifetime value, fraud detection, profile matching, and awards and rankings, to name a few. Passionate about data science, he is always eager to engage in a discussion around different modeling techniques and algorithms. Aside from deriving insights from BIG DATA, Boris enjoys chess and recreational mathematics from which he draws new ideas to implement in the real world.
Boris holds a graduate degree in Bioinformatics from the Georgia Institute of Technology (GIT). He holds undergraduate degrees (Summa Cum Laude) in Mathematics and Psychology with minors in Cognitive Science and Philosophy from the Georgia Institute of Technology, where he graduated valedictorian.

Clay Kim / Senior Manager of Data Science at Localytics
“Deep Convolutional Neural Networks for Mobile App Churn Prediction”

Abstract: Mobile App user data can be represented as images in order to perform churn, purchase, and engagement prediction using deep convolutional and recurrent neural network architectures. Each of our customers’ apps is unique, and new models with varying architectures must be created in order to maximize predictive power. By using dozens of temporal and non-temporal features, we have created a powerful, scalable system that determines the factors related to app behavior, unique to each application.

Bio: Clay Kim is the Senior Manager of Data Science at Localytics, responsible for managing applications of machine learning on mobile analytics and marketing data.

Laura Kinkead / Analytics associate at athenahealth
“Using Data to Unbreak Healthcare”

Abstract: Matt Ritter and Laura Kinkead are on the Data Science team at athenahealth, a cloud-based software company that provides medical billing, electronic medical records, and patient portals for medical groups and health systems. Athena’s rich, structured data covers 80,000 providers and 80M patients, and allows us to investigate interesting Data Science questions with implications for public health (for instance, we’ve developed a regional flu tracker that’s more accurate and timely than the CDC). We will give a window into Data Science at athenahealth: what we do day-to-day, what our typical projects look like, what tools we use, plus an in-depth look at a current machine learning project: predicting which denied claims our clients should fix and rebill, and which they should ignore.

Bio: Laura Kinkead is on Athena’s data science team and is an expert on athenaCollector, Athena’s medical billing and revenue cycle management product. Her work focuses on harnessing Athena’s network data with machine learning to make medical billers’ work easier. She earned a master’s in Computer Science from Rensselaer Polytechnic Institute. Outside of work, she leads a local Girls Who Code group, helping middle and high school girls discover the world of coding.

Sri Krishnamurthy / Founder of QuantUniversity.com
“What am I missing? Best practices in handling missing data in large data sets”

Abstract: Doing data science is fun, but it is estimated that more than 80% of the time is spent cleansing data. Handling missing data is essential to ensure the integrity of the modeling process. Brute force methods of removing records with missing data may come in handy when we have large amounts of data but may not be effective when working with temporal data. In this workshop, we will discuss the key techniques in missing data analysis and share best practices in handling missing data. We will also share a case study using Apache Spark to create ETL pipelines primarily with the goal of addressing missing data in large data sets.
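
To illustrate why dropping records can hurt temporal data, here is a minimal sketch in plain Python. It is not taken from the workshop materials; the function names, the sample readings, and the choice of forward fill as the alternative are all assumptions made for illustration.

```python
def drop_missing(series):
    """Brute-force approach: discard every record whose value is missing."""
    return [(t, v) for t, v in series if v is not None]


def forward_fill(series):
    """Carry the last observed value forward so every time step is kept.

    Leading gaps (before any observation) remain None.
    """
    filled, last = [], None
    for t, v in series:
        if v is not None:
            last = v
        filled.append((t, last))
    return filled


# Hypothetical sensor readings as (time step, value); None marks a gap.
readings = [(0, 10.0), (1, None), (2, None), (3, 13.0), (4, None)]

dropped = drop_missing(readings)   # loses time steps 1, 2 and 4
filled = forward_fill(readings)    # keeps all five time steps
```

Dropping leaves an irregularly spaced series, which breaks methods that assume evenly spaced observations. Forward fill keeps the spacing at the cost of assuming the value did not change during the gap.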

Bio: Sri is the founder of www.QuantUniversity.com, a data and quantitative analysis company, and the creator of the Analytics Certificate program (www.analyticscertificate.com). Sri has more than 15 years of experience in analytics, quantitative analysis, statistical modeling and designing large-scale applications. Prior to starting QuantUniversity, Sri worked at Citigroup, Endeca, Mathworks and with more than 25 customers in the financial services and energy industries. He has trained more than 1000 students in quantitative methods, analytics and big data in the industry and at Babson College, Northeastern University and Hult International Business School. In 2016, QuantUniversity will be offering the Analytics Certificate Program in Boston to train the next generation of analysts enabling them to leverage data science and big data technologies to scale up analytics in the enterprise.

Ramesh Kumar / CEO and Co-founder at zakipoint health
“Transparency in healthcare: can we get it please”

Abstract: TBD

Bio: Ramesh Kumar is the CEO of zakipoint Health – a healthcare analytics and cost management platform company. Ramesh is an expert in the field of technology, data science and value based healthcare. Ramesh has led the efforts at zakipoint Health to form partnerships with some of the leading healthcare services companies.

Previously, Ramesh co-founded ActiveMedia Technology, a mobile marketing and CRM platform that grew to a multi-million dollar business and was sold to a publicly listed company. Ramesh holds B.A. and MEng from Oxford, UK and MSc. from UPenn in Operations Research and has completed unit I of OPM program at Harvard Business School.

Ian Lassonde / Senior Associate of Risk Management Technical Resiliency
“Making Data Talk: Actionable Analytics to Drive Change”

Abstract: Albert Einstein once said, “If you can’t explain it simply, you don’t understand it well enough.” Manipulating data, creating new visualizations, or developing metrics is useless unless you are able to tell a story and tie specific actions to the data you are presenting. Developing actionable analytics requires the ability to not only manipulate and interpret the data, but to understand what story needs to be told. In this presentation I will explore how to develop effective visualizations and metrics through a variety of examples that are currently used by senior business leaders at a Fortune 100 Financial Services company. I have used large internal data sets from a variety of sources to develop actionable metrics that are used in strategic risk based decisions regarding the company’s information technology environment. This presentation will provide the audience the tools necessary to effectively develop and present actionable analytics in order to drive change in their organization.

Bio: Ian Lassonde is a Senior Associate in Technical Resiliency at TIAA. While at TIAA, in addition to his other responsibilities, he has led a data-driven solution that visualizes the current IT environment in order to make effective risk based decisions. Prior to his role at TIAA, he was the founder of Fifth Law LLC, a cyber-security company focused on protecting organizations from realistic threats by simulating potential cyber-attacks. He was an officer in the United States Marine Corps and a graduate of the United States Naval Academy, where he graduated with a degree in computer science.

Victor S.Y. Lo / Vice President, Data Science, Workplace Investing at Fidelity Investments
“From Uplift Predictive Analytics to Uplift Prescriptive Analytics: Examples and Applications”

Abstract: Traditional randomized experiments allow us to determine the overall causal impact of a treatment or intervention program (e.g. marketing, medical, political, social, education). Uplift modeling takes a further step to identify individuals who are truly positively influenced by a treatment through statistical modeling or machine learning. This technique allows us to identify the “persuadables” and thus optimize target selection in order to maximize treatment benefits. This important subfield of data mining, data science, or business analytics has gained significant attention in areas such as personalized marketing, personalized medicine, and political elections with increasing publications and presentations in recent years from both industry practitioners and academics.

In this workshop, I will introduce the concept of Uplift, review multiple Uplift Predictive Analytics methods, and contrast with the traditional approach. Our discussion will include approaches to handling a general situation where only observational data are available, i.e. without randomized experiments, applying methodologies from causal inference. We will then introduce Uplift Prescriptive Analytics, which extends the single treatment to multiple treatments with optimization solutions at the individual level. Alternative optimization methods for handling uncertainty will also be discussed. Although the talk is geared towards marketing applications (“personalized marketing”), the same methodologies can be readily applied in other fields such as insurance, medicine, education, political, and social programs. Examples from the retail and non-profit industries will be used to illustrate the methodologies.
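
As a rough illustration of the uplift idea (not the presenter's method), the sketch below estimates uplift per segment from a randomized experiment as the difference between the treated and control response rates. The segment names and counts are invented for the example, and real uplift models work at the individual level with far richer features.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment from randomized-experiment records.

    records: iterable of (segment, treated, responded) tuples.
    Uplift = response rate when treated - response rate in control.
    """
    counts = defaultdict(lambda: {"t_n": 0, "t_r": 0, "c_n": 0, "c_r": 0})
    for segment, treated, responded in records:
        group = "t" if treated else "c"
        counts[segment][group + "_n"] += 1
        counts[segment][group + "_r"] += int(responded)

    uplift = {}
    for segment, c in counts.items():
        if c["t_n"] == 0 or c["c_n"] == 0:
            continue  # cannot estimate uplift without both groups
        uplift[segment] = c["t_r"] / c["t_n"] - c["c_r"] / c["c_n"]
    return uplift


# Hypothetical data: 100 treated and 100 control customers per segment.
records = (
    [("new", True, True)] * 30 + [("new", True, False)] * 70
    + [("new", False, True)] * 10 + [("new", False, False)] * 90
    + [("loyal", True, True)] * 50 + [("loyal", True, False)] * 50
    + [("loyal", False, True)] * 50 + [("loyal", False, False)] * 50
)

segment_uplift = uplift_by_segment(records)
```

In this made-up data the "loyal" segment responds most often but has no uplift, since it responds equally without treatment. The "new" segment responds less often yet is where the treatment changes behavior, which is the distinction between response modeling and uplift modeling.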

Bio: Victor S.Y. Lo is a seasoned Big Data, Marketing, Risk, and Finance leader and innovator with over two decades of extensive consulting and corporate experience employing data-driven solutions in a wide variety of business areas, including Customer Relationship Management, Market Research, Advertising Strategy, Risk Management, Financial Econometrics, Insurance, Product Development, Transportation, and Human Resources. He is actively engaged with Big Data Analytics, causal inference, and is a pioneer of Uplift/True-lift modeling, a key subfield of data mining. Victor has managed teams of quantitative analysts in multiple organizations. He currently is a Vice President leading the Data Science team in Workplace Investing at Fidelity Investments. Previously he managed the advanced analytics teams in Personal Investing and Managerial Finance at Fidelity Investments. Prior to Fidelity, he was VP and Manager of Modeling and Analysis at FleetBoston Financial (now Bank of America), and Senior Associate at Mercer Management Consulting (now Oliver Wyman). For academic services, Victor has been a visiting research fellow and corporate executive-in-residence at Bentley University. He has also been serving on the steering committee of the Boston Chapter of the Institute for Operations Research and the Management Sciences (INFORMS) and on the editorial board for two academic journals. Victor earned a master’s degree in Operational Research and a PhD in Statistics, and was a Postdoctoral Fellow in Management Science. He has co-authored a graduate level econometrics book and published numerous articles in Data Mining, Marketing, Statistics, and Management Science literature.

Beth Logan / Vice President, Optimization at DataXu
“Big ML @ DataXu”

Abstract: DataXu’s mission is to make marketing smarter through the use of data science. Our core technology processes over 1.8 million ad requests per second and uses machine learning and other optimization techniques to buy the ad slots that are most valuable for our customers. Our system processes 2PB of data per day and runs unattended 24×7 in over 30 countries. This presentation will discuss steps we took to run ML at scale, including recent improvements.

Bio: Beth is the VP of Optimization at DataXu, a leader in programmatic marketing. She has made contributions to a wide variety of fields, including speech recognition, music indexing and in-home activity monitoring. Beth holds a PhD in speech recognition from the University of Cambridge.

Sean Lorenz / Founder & CEO at Senter
“Bringing Intelligence to Health at Home”

Abstract: Wearables and the IoT are hot topics these days, but all this data is meaningless until we begin applying machine intelligence to it. This talk will explore practical use cases for how machine intelligence can be used to keep us healthier at home using off-the-shelf smart home, wearable, and connected health products.

Bio: CEO & Founder of @SenterIoT – a smart home hub for better health; computational neuroscience PhD survivor

Marty Lurie / Senior Systems Engineer at Cloudera
“Tracking Drug Reactions on Hadoop”

Abstract: Monitoring events to detect medical outbreaks, security attacks, marketing trends, etc. requires leading technologies for ingestion, computation, storage, and display. In this article we’ll demonstrate how to use Hadoop and several components to ingest and report adverse drug reactions in near real time. This real time capability shows how much Hadoop has evolved from the batch-processing offering of ten years ago, circa 2006. The ingestion will combine Kafka and Flume. A Cloudera Hadoop Cluster will store the data with analysis including SQL via the Apache Impala project and visualization using HUE and Solr. Auditing, required for any system with sensitive data, will be performed by Cloudera Navigator.

Bio: Marty Lurie started his computer career generating chads while attempting to write Fortran on an IBM 1130. His day job is Hadoop Systems Engineering at Cloudera, but if pressed he will admit he mostly plays with computers. His favorite program is the one he wrote to connect his Nordic Track to his laptop (the laptop lost two pounds, and lowered its cholesterol by 20%). Marty is a Cloudera Certified Hadoop Administrator, Cloudera Certified Hadoop Developer, an IBM-certified Advanced WebSphere Administrator, Informix-certified Professional, Certified DB2 DBA, Certified Business Intelligence Solutions Professional, Linux+ Certified, and has trained his dog to play basketball. You can contact Marty at marty@cloudera.com.

William Lyon / Developer Relations Engineer at Neo4j
“Building a Real-Time Recommendation Engine with Neo4j and Python Data Tools”

Abstract: In this session we will show how to build a meetup.com recommendation engine using Neo4j and Python. Our solution will be a hybrid that makes use of both content-based and collaborative filtering, using Neo4j to glue all the data together, Cypher to query the dataset, and Python to do analysis and pre/post-processing of data.

The hybrid approach produces multi-layered recommendations that take different datasets into account; e.g., we’ll combine data from the meetup.com and Twitter APIs.

We’ll evolve the solution from scratch and look at the decisions we make along the way in terms of modeling and coming up with factors that might lead to better recommendations for the end user.

Bio: William Lyon is a software developer at Neo4j, the open source graph database. As an engineer on the Developer Relations team, he works primarily on integrating Neo4j with other technologies, building demo apps, helping other developers build applications with Neo4j, and writing documentation. Prior to joining Neo, William worked as a software developer for several startups in the real estate software, quantitative finance, and predictive API fields. William holds a master’s degree in Computer Science from the University of Montana. You can find him online at lyonwj.com.

Scott MacGregor / Engineering Director at Akamai Technologies
“What Big Data will Require in an IoT World”

Abstract: The data management infrastructure needed to support IoT implementations will be those big data infrastructure implementations we are currently inventing, designing, and building – but with even greater capacity. This infrastructure will power the evolving analytics infrastructure to be more intelligent and more automated. This session will provide insights into what steps organizations are taking now to ensure their infrastructure is in place as data generation continues to explode.

Bio: TBD

Christopher Mack / Director of Customer Engineering at Basis Technology
“‘gRbg in’: Accurate NLP for your Application”

Abstract: Natural language processing is an integral part of any modern AI application. How do you measure the quality of the text analysis in your architecture? Which natural language technologies are going to perform best on your data? Understanding how to reliably evaluate text analysis output is the first step to making any improvements to a system using natural language. I will present several text analysis platforms side by side and discuss how to conduct your own evaluations for the most common natural language processing tasks.

Bio: Chris Mack is the Director of Customer Engineering at Basis Technology. His team designs and implements solutions using text analytics software. Chris has spent the last 20 years in software development, data analytics, business strategy, and business operations. He received his BS in Management from Bentley University where he also studied Computer Information Systems.

Mark Marinelli / CTO at Lavastorm Analytics
“Design principles of impactful analytics CoEs and the agile imperative”

Abstract: While enterprises are comprised of various business and technical departments, each still needs to be speaking a common language when it comes to data. Implementing an impactful analytics CoE (Center of Excellence) not only ensures top organizations can find and retain the right talent—data scientists who don’t want to spend all of their time wrangling data—but it also lays the groundwork for an effective advanced and predictive analytics environment. In this session, Lavastorm CTO Mark Marinelli will discuss best practices for constructing successful analytics CoEs, built upon agile methodologies, to deliver ongoing and transformative value.

Bio: Mark Marinelli, Chief Technology Officer at Lavastorm, is responsible for driving product innovation and development strategy. A 20-year veteran of the analytics software industry, his extensive experience spans software development, product management, and product strategy from early multidimensional databases to modern data analytics technologies. He previously held positions at Kenan Systems, Lucent Technologies, and Macrovision.

Terran Melconian / Senior Director of Analytics at Jobcase Inc
“Hiring your First Data Scientist: When and How”

Abstract: In this talk, I’ll take you through a process for assessing your company’s available data and opportunities/objectives to figure out whether you should be looking for a data scientist, data analyst, or machine learning engineer, and to identify the specific skills that you need for your role. I’ll then suggest a framework for constructing an interview exercise and evaluating candidates even if you are not deeply acquainted with the details of the techniques the candidate uses.

Bio: Terran is the Senior Director of Analytics at Jobcase Inc, where his portfolio of projects spans machine learning, predictive modelling, and information retrieval. Jobcase is the only social media site dedicated to empowering America’s workforce, providing one place to manage all things job-related, especially access to the knowledge, connections and wisdom of 56 million other Jobcasers. Previously, Terran managed the Hadoop data warehouse team at TripAdvisor and was a technical lead on Image Search projects at Google.

Shirin Mojarad / Data Scientist at McGraw-Hill Education
“Finding Causality in Large Data Using Propensity Score Matching”

Abstract: Everyone knows that correlation is not causation. But is there something in between? In many cases, establishing simple correlation is not good enough, while establishing causation through Randomized Control Trials (RCTs) is either impractical or too expensive. Is there a middle way that gets us towards causation but avoids the complexity of RCTs? There is a set of emerging statistical techniques associated with causal inference that allows us to study cause and effect properties of systems, including the relationship between treatment and outcome. These techniques are being applied in a variety of domains, from understanding customer behavior and purchase patterns in ad modeling to evaluating treatment effects and disease outcomes in medical sciences. This talk will provide an overview of causal inference and suggest strategies for making it part of the data scientist’s toolkit.
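
As a rough illustration of the matching step named in the talk title, the sketch below estimates the average treatment effect on the treated (ATT) by pairing each treated unit with the control unit whose propensity score is closest. It assumes the propensity scores have already been estimated elsewhere (for example, with a logistic regression); the function name `att_by_matching` and the dictionary keys are hypothetical and not taken from the talk.

```python
def att_by_matching(units):
    """Estimate the ATT by 1-nearest-neighbor matching on propensity score.

    units: list of dicts with hypothetical keys
        'treated' (bool), 'score' (propensity, float), 'outcome' (float).
    Matching is done with replacement: a control may be reused.
    """
    treated = [u for u in units if u["treated"]]
    controls = [u for u in units if not u["treated"]]
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")

    diffs = []
    for t in treated:
        # Closest control in propensity-score space.
        match = min(controls, key=lambda c: abs(c["score"] - t["score"]))
        diffs.append(t["outcome"] - match["outcome"])
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    toy = [
        {"treated": True, "score": 0.80, "outcome": 10.0},
        {"treated": True, "score": 0.30, "outcome": 6.0},
        {"treated": False, "score": 0.75, "outcome": 7.0},
        {"treated": False, "score": 0.35, "outcome": 5.0},
        {"treated": False, "score": 0.10, "outcome": 1.0},
    ]
    print(att_by_matching(toy))
```

A real analysis would also check covariate balance after matching and typically apply a caliper to discard poor matches; both are omitted here for brevity.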

Bio: Shirin Mojarad is currently a Data Scientist at McGraw-Hill Education. She was formerly a senior analytics specialist in the Advanced Analytics team at Canadian Imperial Bank of Commerce (CIBC) and a data mining consultant with a leading software company in predictive analytics. She is an expert in navigating and deriving insight from large datasets using data mining techniques. She has wide experience in framing and conducting complex analyses and experiments using large datasets to find trends in diverse data sources and analyze behavioral patterns using advanced statistical modeling and data mining techniques. Shirin received her Ph.D. in Electrical Engineering and her M.Sc. in Communications and Signal Processing from Newcastle University, U.K., where she specialized in predictive modeling and artificial neural networks.

Marco A. Montes de Oca / Principal Data Scientist at clypd, inc.
“Swarm Intelligence: A Distributed Artificial Intelligence Paradigm”

Abstract: TBD

Bio: Marco A. Montes de Oca is a Principal Data Scientist at clypd, Inc., where he applies and develops mathematical and computational techniques for automating and personalizing TV ad scheduling (also known as “programmatic TV advertising”). He was a postdoctoral researcher at the Department of Mathematical Sciences at the University of Delaware until 2014. He earned his Ph.D. in Engineering Sciences at the Universite Libre de Bruxelles, Brussels, Belgium. He also holds a B.S. in Computer Systems Engineering from the Instituto Politecnico Nacional, Mexico, and an M.S. in Intelligent Systems from the Tecnologico de Monterrey, Monterrey, Mexico. He is interested in the theory and practice of swarm intelligence, complex systems, and optimization. He has published in journals and conferences that deal with the three main areas of application of swarm intelligence, namely, data mining, optimization, and robotics. He is a member of the editorial board of the journal ‘Swarm Intelligence’.

Rani Nelken / Director of Research at Outbrain
“Dynamic online profiling of users’ content consumption preferences”

Abstract: Serving personalized content recommendations requires automatically creating user profiles reflecting users’ content consumption preferences and continually evolving them. Our profiles use abstract content-specific features of the consumed content, such as categories, topic models, and entities, which we automatically extract using NLP methods. Aggregating these features per user at scale raises interesting architectural and algorithmic challenges. In particular, it is impractical to store the entire dynamic history of a user’s interaction features, requiring us to use algorithms that selectively decay information in favor of a more compact representation. The talk describes some of our methodology for creating, updating, and evaluating these user profiles at large scale.
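
One common way to decay information without storing a full interaction history is an exponentially decayed counter per feature. The sketch below is a minimal illustration under that assumption; it is not Outbrain's actual method, and the class name `DecayedProfile` and the `half_life` parameter are hypothetical.

```python
class DecayedProfile:
    """Compact user profile: one (weight, timestamp) pair per feature.

    Each weight halves every `half_life` time units, so old interactions
    fade out without keeping the individual events.
    """

    def __init__(self, half_life):
        self.half_life = half_life
        self.weights = {}  # feature -> (weight, time of last update)

    def _decayed(self, weight, last_time, now):
        return weight * 0.5 ** ((now - last_time) / self.half_life)

    def observe(self, feature, now, amount=1.0):
        """Record an interaction with `feature` at time `now`."""
        weight, last_time = self.weights.get(feature, (0.0, now))
        self.weights[feature] = (self._decayed(weight, last_time, now) + amount, now)

    def score(self, feature, now):
        """Current decayed weight of `feature` at time `now`."""
        weight, last_time = self.weights.get(feature, (0.0, now))
        return self._decayed(weight, last_time, now)


if __name__ == "__main__":
    profile = DecayedProfile(half_life=10.0)
    profile.observe("sports", now=0.0)
    profile.observe("sports", now=10.0)
    print(profile.score("sports", now=20.0))
```

The storage cost is constant per feature regardless of how many interactions a user has, which is the property the abstract asks for; the half-life controls how quickly the profile forgets.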

Bio: Rani Nelken is Director of Research at Outbrain, where he leads a team focused on profiling users’ content consumption preferences by applying NLP and ML methods to users’ content interaction history. Before joining Outbrain, he was a Research Fellow in CS at Harvard University, and worked at several other companies including IBM Research and Mercury Interactive. He received his PhD in CS in 2001.

Dave Nielsen / Developer Advocate, Trusted Analytics Platform (TAP) at Intel
“Getting Started with the Trusted Analytics Platform”

Abstract: This is hands-on and you will need to bring your laptop to participate. Alternatively, you can follow along or pair up with someone else.

In this lab you will learn how to use the Trusted Analytics Platform, an open source collaborative project, to create a data science solution workflow that ingests data, uses cloud instances of Spark and Jupyter to model data, and deploys a simple visualization solution around the post-processed data.

After completing this lab, you will have a better understanding of the data science and analytics fundamentals needed to deploy a cloud-based solution in TAP.

Bio: As Sr. Developer Advocate for TAP, Dave helps data scientists, developers and administrators create, consume and deploy data models for applications. Prior to TAP, Dave worked at Redis Labs helping developers take advantage of the unique performance capabilities of open source Redis. Dave is known for his role in creating CloudCamp and BigDataCamp, which inspired many developers to use new technologies that empower today’s fastest-growing companies. Dave achieved modest notoriety when he proposed to his girlfriend in his book “PayPal Hacks”. Dave and Erika are married and have a family living in Mountain View, CA.

Paul Paczuski / Data Scientist at The Data Incubator
“Intermediate R workshop”

Abstract: Improve your R workflow with some essential packages: dplyr, ggplot2, tidyr, and broom.

Bio: Paul teaches and develops course content for the highly-competitive fellowship at the Data Incubator in New York. He also helps people get started with data science through his startup methodofmoments.io. Previously, he was a statistician at a clinical trials research group at the Harvard School of Public Health. He holds an M.S. in Biostatistics (University of Michigan).

Matthew Ritter / Senior Analytics Associate at athenahealth
“Using Data to Unbreak Healthcare”

Abstract: Matt Ritter and Laura Kinkead are on the Data Science team at athenahealth, a cloud-based software company that provides medical billing, electronic medical records, and patient portals for medical groups and health systems. Athena’s rich, structured data covers 80,000 providers and 80M patients, and allows us to investigate interesting Data Science questions with implications for public health (for instance, we’ve developed a regional flu tracker that’s more accurate and timely than the CDC). We will give a window into Data Science at athenahealth: what we do day-to-day, what our typical projects look like, what tools we use, plus an in-depth look at a current machine learning project: predicting which denied claims our clients should fix and rebill, and which they should ignore.

Bio: Matt Ritter is on Athena’s Data Science team, focused on the More Disruption Please (MDP) program. His work centers on evaluating the impact of MDP’s startups on our clients’ operational metrics. He also coordinates the Data Methods Lab, where Athenistas from across the company come together to share best practices and incorporate knowledge from industry and academia.

Kurt Rosenfeld / Managing Director & Founder of Data and Analytics Practice at CTI
“Your healthcare scientists are being held back with the wrong tools”

Abstract: In this session we will explore a paradigm change in Big Data architectures that significantly improves the “time to execute” that is needed to make it all work for your data scientists. And we’ll show you why the Cloud is not the way to go.

Bio: Kurt Rosenfeld is Vice President of Business Intelligence and Analytics, one of Corporate Technologies’ two technology consulting practices. In this role, Kurt oversees the company’s strategic drive in the BI market. Having founded the practice, he has driven its rapid growth, building a consulting team able to deliver a wide variety of solutions across numerous industries.

Jody Schechter / Lead Scientist at Booz Allen Hamilton
“Statistics for Engineers: Using R to Forecast Climate Change Impact for the U.S. Army Corps of Engineers”

Abstract: This talk will present two applications that Booz Allen Hamilton built for the Army Corps of Engineers Reaction to Climate Change Program: 1) Nonstationarity Detection Tool for analysis of changes in distribution of streamflow gage time series, and 2) SeaTracker, which compares USACE predictions for sea level change to the historical record. Both of these applications allow for consistent, repeatable use of statistical methods, and easy interpretation of results, by a non-statistician audience; but whereas one was built with Tableau/R Integration, the other showcases R Shiny. This presentation will offer context around the decisions to employ these technologies, will compare these two approaches from a developer perspective, and will cover guiding principles for communicating statistical test results to a non-statistician audience through meaningful visualizations.

Bio: Jody Schechter is a Lead Scientist at Booz Allen Hamilton. She has worked with several federal agencies to address their data science needs, most recently building out applications for the Reaction to Climate Change Program at the Army Corps of Engineers. Jody’s work ranges from data visualization projects to predictive analytics. Prior to her work with Booz Allen, she was an analyst with Compete, Inc. (now Millward Brown Digital), an online consumer behavior analytics firm. Jody has a master’s degree in Computational Science and Engineering from Harvard University and a bachelor’s degree in Economics from the University of Michigan.

Dan Scudder / Co-founder and VP of Growth Markets at LiveRamp
“Overcoming Big Data Fragmentation”

Abstract: Valuable customer data is collected all the time: when a person researches a new car, purchases a movie ticket, or even shares a photo on Instagram at Coachella. All these snapshots offer companies perceptive data on their target audiences. However, many still struggle to understand how to best engage with their customers. The key to more relevant marketing is a company’s stockpile of first-party customer data.

One of the biggest misconceptions in marketing is that big data exists in a single, accessible place. In truth, data exists in silos across every organization. More often than not, different members of a marketing team are using different tools and applications, pulling from puddles of data that only represent fragments of the total customer view. How can this data be unified? How can organizations draw the most value out of their datasets?

Then, there’s the biggest piece of the data puzzle: offline data. It’s almost impossible to know whether your marketing tactics are working if you don’t know where most of your conversions are happening. In our multi-device world, it’s necessary to know that the person who watched the new Star Wars trailer on their phone is the same person buying a movie ticket at the theater months down the line. But when the customer data you’re relying on is fragmented, how can you build the accurate user segments needed to successfully execute your overall marketing strategy?

Join this session to discuss the shortcomings of siloed data today; how companies can use first-, second-, and third-party data together; and how to bring all this valuable data together to create a complete customer view.

Bio: Dan is the co-founder and Vice President of Growth Markets for LiveRamp, acquired by Acxiom in May 2014. Current customers include Live Nation, Citi, AMEX, Macy’s, TransUnion, and Gap. Prior to LiveRamp, Dan was Vice President of Rapleaf. He served on the Direct Marketing Club of New York and holds a B.S. from Babson College.

Michael Segala / CEO & Co-founder at SFL Scientific
“Time Series Classification: A Modern Approach with Applications to Healthcare”

Abstract: Time series classification has been used to solve a wide range of issues across industries from mathematical finance to weather forecasting. Historically, typical classification algorithms (be it SVM, Logistic Regression, etc.) have been applied after transforming away the temporal dynamics of the data through feature engineering. However, such techniques often remove important underlying information, resulting in a loss of predictive power. Recent studies have shown that the K Nearest Neighbors (KNN) algorithm combined with Dynamic Time Warping (DTW) leads to significant improvements over canonical methods. DTW is used as a distance metric to determine the similarity between two sequences that have non-linear temporal variations. This technique has led to dramatic gains in both precision and accuracy in time series classification problems. In this talk I will present the KNN + DTW algorithm and its applications to the healthcare sector.
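
To make the KNN + DTW idea concrete, here is a minimal pure-Python sketch. It uses the textbook dynamic-programming form of DTW with absolute difference as the local cost and a plain majority vote over the k nearest training sequences; the function names and toy data are illustrative assumptions, not the implementation presented in the talk.

```python
from collections import Counter
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    Runs in O(len(a) * len(b)) time with no warping-window constraint.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def knn_dtw_predict(train, query, k=1):
    """Label `query` by majority vote among its k DTW-nearest neighbors.

    train: list of (sequence, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dtw_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [([0, 1, 2, 1, 0], "peak"), ([2, 1, 0, 1, 2], "dip")]
    # Same shape as "peak", but stretched in time.
    print(knn_dtw_predict(train, [0, 0, 1, 2, 1, 0]))
```

Because DTW lets one sequence stretch or compress relative to the other, the time-shifted query still matches the "peak" template exactly, which a point-by-point Euclidean distance would not allow for sequences of different length.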

Bio: Michael Segala is the CEO and co-founder of SFL Scientific, a data science consulting firm that specializes in big data solutions. His firm leverages advanced machine learning and analytics techniques to provide insight into numerous industry-spanning problems, from healthcare to stock market prediction. Before founding SFL Scientific, Michael worked as a data scientist in several well-known tech companies, such as Compete Inc. and Akamai Technologies. He holds a PhD in Particle Physics from Brown University.

Heather Shapiro / Technical Evangelist at Microsoft
“An End to Boring Data with Visualizations”

Abstract: Put the days of trying to decipher meaning from boring spreadsheets behind you. Visualize data to give greater and immediate meaning to all those numbers with Python. We will explore the variety of options available for data visualization in Python using different libraries and understand which ones excel at which type of task. Create maps, statistical graphs and more detailed or interactive visualizations that can also be used on the web, ideal to take that blog post to a whole new level. We will tackle boring data by looking at Python libraries available for mapping such as basemap and folium, statistical graphs from libraries such as matplotlib and seaborn, as well as libraries such as Bokeh and Plotly that can be used for making interactive graphs.

Bio: Heather Shapiro is a Technical Evangelist in Microsoft’s Developer Experience group, where she educates developers on Microsoft’s new technologies. In this role, Heather works closely with students and developer communities across the Northeast to understand the newest technologies and architectures. Prior to becoming a Technical Evangelist, Heather completed her undergraduate degree at Duke University, graduating in the Class of 2015. She received a Bachelor of Science in Computer Science and Statistical Science, and completed an Honors Thesis about employing Bayesian Approaches to Understanding Music Popularity. Heather blogs at http://microheather.com and tweets at http://twitter.com/microheather.

Raj Singh / Developer Advocate, IBM Cloud Data Services
“Safety-enabled apps: using Boston crime data to make Pokemon Go a safer space”

Abstract: Governments are increasingly making their data available freely in machine-readable format, which is great. A drawback, however, is that government data is organized and produced in a way that’s most useful to government operations, and it’s not offered in an app-developer or data scientist friendly way. I talk about a project to harvest crime data daily and make it available for web apps and analytics as part of a larger effort to deliver ready-to-use open data sets on the cloud. The importance of adding contextual data to apps — whether it be crime or weather or something else — is underscored by recent news that thieves have been “luring” Pokemon Go players into secluded areas.

Bio: Raj is a Developer Advocate and Open Data Lead at IBM Cloud Data Services. He specializes in all things geospatial, hacking on analytics in R/dashDB and Spark/iPython notebooks and building software in NodeJS, Python, Java and PHP. He’s currently driven to make CDS the best place to obtain and exploit comprehensive, curated open data sets for business. Raj pioneered Web mapping-as-a-service in the late 1990s with Syncline, a startup he co-founded. After that he finished his PhD in Urban Planning at MIT, exploring the potential of web services to power urban information systems. Prior to joining IBM in 2014, Raj worked on geospatial data interoperability challenges for the Open Geospatial Consortium.

Ted Slater / Global Head of Healthcare & Life Sciences at Cray Inc.
“Your healthcare scientists are being held back with the wrong tools”

Abstract: In this session we will explore a paradigm change in Big Data architectures that significantly improves the “time to execute” that is needed to make it all work for your data scientists. And we’ll show you why the Cloud is not the way to go.

Bio: Ted Slater is Global Head of Healthcare & Life Sciences at Cray, Inc. He has held senior roles in large pharmaceutical companies, content providers, and biotechs. He holds master’s degrees in Molecular Biology from the University of California at Riverside, and in Computer Science from New Mexico State University.

James Sturtevant / Technical Evangelist at Microsoft
“Flight Data Machine Learning Challenge”

Abstract: Wouldn’t you like to be able to predict when your flight will be delayed? Come explore how easy it is to use Azure Machine Learning to generate your own flight predictions. First, this workshop will walk you through each step to create a base level prediction model using data we provide. Then, we’ll provide insights on tweaking the model to improve the prediction. Finally, you can explore by making your own changes to produce an even better model. You will painlessly gain exposure to the basic Machine Learning workflow and related terminology while having fun as you gain hands-on experience. Bring your laptop because by the end of the workshop, you will have built your own flight prediction model and explored how to make it perform better. Best of all, you will never be left wondering if your flight will be delayed again!

Bio: James is a Tech Evangelist for Microsoft. He has been working in the web development space for 10+ years working with startups and enterprises to improve the way they do business through technology. James has extensive experience working with startups during the growth phase helping them deliver the product that their customers have asked for. He is focused on automating development workflows that allow companies to achieve more. When he isn’t practicing his software craft James can be found running through the woods, growing hops, or hiking with his daughter.

Patrick Surry / Chief Data Scientist at Hopper
“Buy or Wait? Consumer-friendly Airfare Prediction”

Abstract: Buying a plane ticket is a time-consuming and frustrating process that often leaves the buyer feeling unhappy. Flight prices are less transparent and fluctuate more than almost anything else a consumer buys, even though airfare is one of the most expensive purchases for a typical family.

Our goal at Hopper is to increase transparency by giving consumers advice about where and when to fly, and when to buy, to save money on their air travel. We believe this helps consumers buy earlier, with less effort, and ultimately feel better about their purchase. One of our key features is our “when to buy” advice: we’ll watch prices for your trip and notify you when the price is right.

Recommending when to buy is tough for two main reasons: first, the airfare marketplace and its idiosyncrasies present unique analytical challenges; second, the prediction must be highly consumer-friendly, both comprehensible and immediately actionable. This session will outline how we’re helping consumers save 10% on average, and up to 40% in some cases.

Bio: As Chief Data Scientist at Hopper, Patrick Surry analyzes flight data to help consumers make smart travel choices. Patrick is recognized as a travel expert and he frequently provides data-driven insight on the travel industry and airfare trends.

Patrick’s studies and commentary are frequently featured in outlets such as The New York Times, USA Today, The Wall Street Journal, and TIME, among many others. Patrick also regularly appears on various broadcast stations to offer travel insight and tips.

Patrick holds a PhD in mathematics and statistics from the University of Edinburgh, where he studied optimization based on evolutionary algorithms, following an HBSc in continuum mechanics from the University of Western Ontario.

Joe Sutherland / Senior Developer/Data Scientist at Prattle
“Get Your Script to Production Faster: The new Micro-ETL Framework”

Abstract: Moving insights gained from analytics to production code that automatically produces results is a development nightmare. Regularly and programmatically producing analysis, particularly when event driven, poses unique challenges that can stress development teams. At Prattle, we found a way to bridge the gap between notebooks and scripts and production code. Our micro-ETL framework allows users to test code in a full production environment, and then quickly deploy working ideas. Users break down the process of extracting data, processing, and evaluation into a series of easily tracked micro tasks which are run using a unique application developed in our labs. The tasks can be written in Python, R, or JavaScript, with the goal being to have the application adapt to the development process of the analyst or developer, and not the reverse. This framework fills the gap between exploration and production.

Bio: Joe leads Prattle’s development of Portend, a scalable, intelligent, cloud-based platform that turns insight-generating code into actionable data with fewer steps than ever before. As a software developer and data scientist, he frequently contributes to open-source coding projects. Recently, his program doc2text was recognized as a substantial contribution to document processing and computer vision by the Hacker News community.

As a researcher, Joe’s published and presented academic work comprises several topics, including the political economy of central banks, voting behavior, ideology and polarization, public opinion, and campaign effects. His current focus is on the insights generated by the extraction of features and latent dimensions from text data in both observational and experimental contexts. In 2014, he was awarded the prestigious Antoinette Dames Prize for his work on elections.

Joe is pursuing his Ph.D. at Columbia University. He holds a Master’s from Columbia and a Bachelor’s from Washington University in St. Louis. Before joining the Prattle team, he worked at the White House.

Andrew Therriault / Chief Data Officer, City of Boston
“Data Science and the City of Boston”

Abstract: Boston’s Citywide Analytics Team was founded in 2015 to improve all aspects of city government through the use of data and analytics. Over the course of its first year, the team racked up an impressive list of accomplishments through the use of data visualization, automated reporting tools, and statistical analysis. In 2016, we are expanding the team’s scope to cover an even broader set of data science methods and tools. This talk will give recent examples of how the team has used predictive modeling, experimental testing, text analysis, and other techniques to improve the lives of Boston residents. We’ll also share some of the team’s plans for the coming years, as we continue to move Boston forward as a nationally-recognized leader in data-driven government.

Bio: Andrew Therriault joined the City of Boston as its first Chief Data Officer in June 2016. An expert on predictive modeling, quantitative research, and data integration, he previously served as Director of Data Science for the Democratic National Committee and as Senior Data Scientist for Greenberg Quinlan Rosner Research. Therriault received his PhD in political science from New York University in 2011 and completed a postdoctoral research fellowship at Vanderbilt.

Benjamin Thurston / Director, Platform Strategy and Consulting at CTI
“Big data projects keep failing – make sure you are not next”

Abstract: Many Big Data architectures are flawed – they are too fragile, too complex and way too expensive to deliver on their high expectations. Whether they have failed to deliver the performance or ROI, we’ll dissect the root causes in the technology stack, data approaches, and the engineering process. Regardless of where you are in your Big Data journey, this session will provide valuable insights and alternative strategies to ensure a winning outcome.

Bio: Benjamin Thurston is Director of Platform Strategy and Consulting, responsible for leading the pre-sales and delivery components of our Platform Services team. Serving as a member of the leadership team, Ben contributes to our marketing and business strategy and planning.

Richard Tibbetts / CEO at Empirical Systems
“TBD”

Abstract: TBD

Bio: TBD

Carmen van Nieuwkerk / Trusted Analytics Platform Training Specialist at Intel
“Let’s build a bridge to the Data Science jobs of tomorrow”

Abstract: The Harvard Business Review referred to Data Scientist as “The Sexiest Job of the 21st Century.” Yet many data science jobs went unfilled in 2015 and this year there will be even more. Why? Because the skills required are complex and evolving. Many jobs require not only programming capabilities but also machine learning and even deep industry expertise. Each of us is in a position to enable the next generation to learn data science. But in such a quickly changing industry, it’s not obvious which skills will be required in the future. So let’s discuss what skills are hot today and how we can best prepare students for a career in data science tomorrow.

As a part of this discussion, Dave Nielsen and Carmen van Nieuwkerk from Intel’s Analytics Solutions Group will seek feedback about an outline for a proposed hands-on workshop to introduce data science to developers.

Bio: As an Analytics Platform Training Specialist, Carmen conducts regular workshops and training sessions, educating data scientists and developers on best practices and methods for getting the most out of working with Intel’s Machine Learning, AI and Analytics technologies. She provides help and guidance for users, enabling them to harness the power of their data by demonstrating the potential of tools like TAP (Trusted Analytics Platform), deploying models and building applications.

Prior to joining Intel, Carmen worked in the Web Analytics space implementing platforms and providing custom reporting, behavior insights and analysis for websites like National Geographic, Pottermore, eBay, Casino.com, Cancer.org, Bestseller.com and Fox.com.

Carmen lives in Portland, Oregon, loves playing board games, and creates glass art in her free time.

Vijetha Vemulapalli / Senior Data Scientist at Berg, LLC
“TBD”

Abstract: TBD

Bio: TBD

Beth Zeranski / Azure Analytic Architect at Microsoft
“Flight Data Machine Learning Challenge”

Abstract: Wouldn’t you like to be able to predict when your flight will be delayed? Come explore how easy it is to use Azure Machine Learning to generate your own flight predictions. First, this workshop will walk you through each step to create a base level prediction model using data we provide. Then, we’ll provide insights on tweaking the model to improve the prediction. Finally, you can explore by making your own changes to produce an even better model. You will painlessly gain exposure to the basic Machine Learning workflow and related terminology while having fun as you gain hands-on experience. Bring your laptop because by the end of the workshop, you will have built your own flight prediction model and explored how to make it perform better. Best of all, you will never be left wondering if your flight will be delayed again!

Bio: Beth is a Cloud Architect who became interested in Machine Learning to harvest insight from the abundance of available data. She recently joined Microsoft as part of the DX TED team in Boston. She has a primary focus on generating Data Insight from Business Intelligence (BI), Analytics (A), Machine Learning (ML) and Deep Learning (DL). She earned her MS in CSE from the University of Washington and a BSEE from Northeastern University. Beth has a wide breadth of experience in hardware design, BIOS development and software. Beth is known for effective product execution which has enabled reliable and repeatable software product delivery at a number of companies. Most recently, Beth was at VMware and prior to that she worked in open source for a number of years. At Microsoft, she is interested in advancing Microsoft’s Data Insight capabilities on Azure by leveraging BI, A, ML and DL.