Analytics Frontiers Agenda 2019
March 27, 2019 at The Ritz-Carlton, Uptown Charlotte
Time: 7:15-8:30 am
Registration / Coffee & breakfast sponsored by DataRobot
Great Room I: Registration
Urban Garden: coffee and breakfast
Time: 8:30-8:45 am / Location: Ballroom
Introduction and Welcome
Fatma Mili, Dean of the College of Computing and Informatics, UNC Charlotte
Time: 8:45-9:45 am / Location: Ballroom
Keynote: Rumman Chowdhury, Responsible AI Lead and Analytics Executive, Accenture
Ethics in AI: Demystifying Deep Learning and Artificial Intelligence
We are entering the next big technological revolution with Deep Learning and Artificial Intelligence. Why does AI need ethics and how is this different from other technologies? What do we mean by ethics and how do we go about creating frameworks for ethical AI?
Time: 9:45-10:00 am / Location: Ballroom Foyer
Coffee Break
Breakout Sessions I: 6 Sessions
Time: 10:00-10:45 am / Location: Salon I, Salon II, Salon III, Great Room I, Great Room II, The Den
Session I 10:00-10:45 am / Track: Financial Services / Location: Salon I
Machine Learning Techniques for Large-Scale Entity Resolution
One of the challenges facing financial institutions is mapping merchant names with many variations (e.g., ACME-00242, ACME-CHARLOTTE, ACME RALEIGH, TST*ACME) to a standardized entity name (ACME). In this presentation, we will examine machine learning algorithms such as eXtreme Gradient Boosting (XGBoost) and deep learning approaches such as Recurrent Neural Networks for this kind of large-scale entity resolution.
Session I 10:00-10:45 am / Track: Profit through Analytics / Location: Salon II
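The talk itself covers learned models (XGBoost and RNNs). Purely as a point of reference, the sketch below shows a hand-written, rule-based normalizer for the merchant strings in this example. The point-of-sale prefix pattern and the `KNOWN_LOCATIONS` set are hypothetical assumptions, not the presenters' method; rules like these break as soon as an unanticipated variant appears, which is one motivation for learning the mapping instead.

```python
import re

# Hypothetical location tokens; a production system would need a much larger gazetteer.
KNOWN_LOCATIONS = {"CHARLOTTE", "RALEIGH"}


def normalize_merchant(raw: str) -> str:
    """Map a raw merchant string to a candidate canonical entity name (rule-based baseline)."""
    name = raw.upper().strip()
    # Assumed pattern: a point-of-sale processor prefix such as "TST*" before the merchant name.
    name = re.sub(r"^[A-Z]+\*", "", name)
    # Split on any run of non-alphanumeric characters (hyphens, spaces, punctuation).
    tokens = [t for t in re.split(r"[^A-Z0-9]+", name) if t]
    # Drop store numbers (all-digit tokens) and known location names.
    kept = [t for t in tokens if not t.isdigit() and t not in KNOWN_LOCATIONS]
    return " ".join(kept)


if __name__ == "__main__":
    variants = ["ACME-00242", "ACME-CHARLOTTE", "ACME RALEIGH", "TST*ACME"]
    for v in variants:
        print(f"{v!r} -> {normalize_merchant(v)!r}")
```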
How To Align Your Business Users With Advanced Analytic Capabilities To Drive Your Organization Forward
Speaker: Joshua Sutton, CEO, Pandera
For many skilled practitioners, it is an uphill battle fighting for budget, resources, and approval on the analytic initiatives that matter. Join Pandera Systems CEO Josh Sutton to discuss how to artfully build an effective narrative that positions you and your team to solve the most crucial and emerging business issues with high tech and advanced analytics. He will review the communication building blocks that are essential for articulating how scientific approaches benefit business outcomes such as EBITDA and other mission-critical measures. Above all, come learn the overarching strategy of creating a top-down data science narrative for your entire organization that ensures alignment across business units through clear communication plans, continuous innovation, and constant success measurement.
Session I 10:00-10:45 am / Track: Healthcare / Location: Salon III
The Value of Intentional Communication and Structure Design in Analytics
Speaker: Sean Gannon, Health Analyst, Novant Health
Learn about the theoretical principles underlying Novant Health’s Population Health Analytics Team’s approach to working with partners, as well as real-world experiences implementing those principles. Sean will discuss how ideas like theory of mind, cognitive biases, and transparency have direct implications for analysts. Learn how integrating these theories throughout workflows improves output, implementation, and organizational support.
Session I 10:00-10:45 am / Track: Compliance and Policy / Location: Great Room I
Navigating Regulatory and Privacy Challenges in Analytics, Data Science and AI
Speaker: Christopher Johannessen, Director of Digital Services and Data Science, Sia Partners
An expanding web of regulations, driven recently by privacy concerns, adds yet another layer of complexity to analytics, AI and data science, both here in the United States and across the world.
In this session, we’ll explore:
We’re looking forward to sharing our thoughts and participating in a dialogue on this topic of growing importance.
Session I 10:00-10:45 am / Track: Social and Ethical Impacts / Location: Great Room II
Examining Untempered Social Media: Online Extremism and Information Mutation on Gab.com
Speakers: Siddharth Krishnan, Assistant Professor of Computer Science, UNC Charlotte, and Arunkumar Bagavathi, PhD Student, UNC Charlotte
Online social media often mirrors the social phenomenon of “Chinese Whispers,” where information mutates or changes during dissemination. Twitter and Facebook, in conjunction with smaller “fringe” communities like Gab.com, echo different perspectives on facts, leading to alternative news, and the social network does so under the guise of “free speech.” Sometimes, this echo chamber effect transforms into online extremism. In this work, we propose a novel framework to examine information mutation and its relationship with user engagement and popularity, measured through conversations on Gab. Using approximately 3.7 million cascading conversation patterns involving close to 300k users, along with 3 million related blogs and news articles, we study the manifestation of information mutation as well as online extremism.
To demonstrate our framework, we present two case studies of information mutation and online extremism: the Charlottesville Unite the Right protest in August 2017 and the Pittsburgh synagogue shooting in October 2018. For the Pittsburgh shooting in particular, we present a thorough analysis of content similar to that of the shooter and of evolving narratives of distorted information.
Session I 10:00-10:45 am / Track: Data and AI / Location: The Den
Automated Document Classification: Using Machine Learning to Guide Data Loss Prevention
In 2017, Cybersecurity at TIAA rolled out a plug-in for Microsoft Office products that encouraged employees to classify the confidentiality of documents. The plug-in creates a hidden watermark that is tracked by TIAA’s Data Loss Prevention (DLP) tool to ensure sensitive information does not leave the company.
To enhance our proactive capabilities within Cybersecurity at TIAA, we leveraged machine learning algorithms and symbolic reasoning to recommend a confidentiality level for each document based on its content. We’ll speak to how we gathered documents to train the algorithms, incorporated Optical Character Recognition (OCR) of images, and partnered with our Cybersecurity Operations team to take action on the insights. We’ll also discuss the impact this effort has had on the organization and the opportunities for continued analysis.
Time: 10:45-11:00 am / Location: Ballroom Foyer
Time: 11:00 am-12:00 pm / Location: Ballroom
Keynote: Viktor Mayer-Schönberger, Professor of Internet Governance and Regulation, Oxford University, and Award-Winning Author
Data’s Unexpected Achilles’ Heel, and What to Do About It
The data age offers huge opportunities, but also some significant challenges. So far, much of the debate about these challenges has focused on known problems and familiar solutions, such as more privacy and less data analytics. This is much like the drunk looking for his lost car keys under the lamp post (because that is where the light is) rather than where he lost them. In this talk, I suggest that this strategy is wrong and may expose us to severe consequences. The real challenges are linked to data itself: its concentration and representation. Fortunately, there are policy mechanisms available to address these challenges, centered on a perhaps counterintuitive idea: more comprehensive utilization of data.
Time: 12:00-1:30 pm / Location: Ballroom and Urban Garden
Lunch and Networking: Two locations: Ballroom and Urban Garden
Ballroom: If your company is a sponsor with a reserved luncheon table included, your reserved seating is located in the Ballroom.
Urban Garden: Individual ticket holders and others are invited to enjoy lunch in the Urban Garden.
The extended lunch break provides an opportunity for attendees to network and visit the Exhibitor Hall located in the Urban Garden.
Breakout Sessions II: 6 Sessions
Time: 1:45-2:30 pm / Location: Salon I, Salon II, Salon III, Great Room I, Great Room II, The Den
Session II 1:45-2:30 pm / Track: Financial Services / Location: Salon I
Wells Fargo Panel Discussion: Safe and Sound AI for Financial Institutions
Moderator: Harsh Singhal, Head of Decision Science and Artificial Intelligence Validation, Wells Fargo
Panelists: Vishant Sharma, Director, Federal Reserve Bank of Atlanta, and Jie Chen, Managing Director, Advanced Technologies for Modeling (AToM) Group of Corporate Model Risk, Wells Fargo
Session II 1:45-2:30 pm / Track: Profit through Analytics / Location: Salon II
Building a Business on Data
Companies grapple with the problem of extracting insight and value from the data generated by their interactions with their customers as well as by routine business processes. While having the raw data is a necessary ingredient, having the right tools can ease and accelerate the process tremendously. Customer Data Platforms (CDPs) are one such tool. We review the raw data that was available within ADP, the largest payroll processor in the world, and the potential use cases for products created from that data.
The challenges of converting the data into usable, commercially viable data products will then be discussed, including data security, privacy and usability issues. The session will wrap up by outlining how Quaero’s CDP was leveraged to build a highly flexible, user-friendly environment to process and transform this data into useful, usable data products that deliver value to a variety of customers in multiple industries.
Session II 1:45-2:30 pm / Track: Healthcare / Location: Salon III
Adventures in Feature Engineering
In this crash course on feature engineering, the speakers will review commonly employed methods for improving model performance through the careful construction of model features. Attendees will learn various transformation (e.g., log transforms), encoding (e.g., frequency encoding), and smoothing (e.g., prior-weight smoothing) methods to optimize a feature’s explanatory value. Common pitfalls of feature engineering will also be discussed. For example, attendees will be exposed to methods to normalize data points within a skewed distribution, smooth volatile probabilities in a low-volume distribution, and represent categorical data numerically.
The session will close with a discussion of methods to identify the most promising features and how the feature selection process might vary depending on the model choice. Throughout the presentation, the speakers will draw from a variety of real-world examples in the healthcare setting to frame these methods in the context of a relatable business problem.
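The three families of methods named in this abstract can be sketched in plain Python. This is a minimal illustration, not the speakers' code: `log1p` stands in for the log transform, frequency encoding maps each category to its share of rows, and prior-weight smoothing is assumed here to mean blending a category's observed target rate with the global rate using a pseudo-count weight `m` (a hypothetical parameter name). The example data are invented.

```python
import math
from collections import Counter, defaultdict


def log_transform(values):
    """Compress a right-skewed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]


def frequency_encode(categories):
    """Replace each category with the fraction of rows in which it appears."""
    counts = Counter(categories)
    n = len(categories)
    return [counts[c] / n for c in categories]


def prior_weight_smooth(categories, targets, m=10.0):
    """Target-encode categories, shrinking rare ones toward the global rate.

    For a category c with n_c rows and target sum s_c, the encoding is
        (s_c + m * prior) / (n_c + m)
    i.e. the observed rate blended with `m` pseudo-rows at the prior rate.
    """
    prior = sum(targets) / len(targets)
    counts = Counter(categories)
    sums = defaultdict(float)
    for c, y in zip(categories, targets):
        sums[c] += y
    encoding = {c: (sums[c] + m * prior) / (counts[c] + m) for c in counts}
    return [encoding[c] for c in categories]


if __name__ == "__main__":
    # Hypothetical data: unit "A" has 50 stays (10 readmissions), unit "B" has 2 (both readmitted).
    units = ["A"] * 50 + ["B"] * 2
    readmitted = [1] * 10 + [0] * 40 + [1, 1]
    print(frequency_encode(units)[-1])                 # share of rows that are unit B
    print(prior_weight_smooth(units, readmitted)[-1])  # smoothed readmission rate for unit B
    print(log_transform([0, 9, 99]))
```

With `m = 10`, unit B's raw rate of 100% (from only two rows) is pulled most of the way back toward the global rate of about 23%, which is the "smooth volatile probabilities in a low-volume distribution" case the abstract describes.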
Session II 1:45-2:30 pm / Track: Compliance and Policy / Location: Great Room I
Using an NLP Model to Classify Complaints
Speaker: Kyra Koch, Data Scientist, TIAA
This presentation will cover how NLP can be used to classify complaints. Topics to be covered: the problem, the process and feature generation, model development, model deployment, and model monitoring.
Session II 1:45-2:30 pm / Track: Social and Ethical Impacts / Location: Great Room II
The Light and Dark Side of People Analytics
Speaker: Tracey Smith, President, Numerical Insights
This presentation will focus on the growing field of Human Resource analytics, its applications and the concerns that come with performing studies on employees. Learn the practical applications of analytics inside Human Resources and its link to addressing social concerns. Learn about the ethical considerations, how good intentions can go awry and the changing trends in employee data privacy and transparency. Finally, learn about the challenges of building customized, self-service analytics in Human Resources and the missing skill sets that hold back the success of HR Analytics.
Session II 1:45-2:30 pm / Track: Data and AI / Location: The Den
Journey to AI
Speaker: Janine Sneed, Chief Digital Officer and VP of Customer Success, IBM Hybrid Cloud
Studies show that the average human adult makes 35,000 decisions every single day. Some of the decisions are challenging (like product strategies, investment decisions, sales deployment models) while others are quite frankly boring (answering pricing questions for the 27th time today). We see organizations wanting to apply AI to take the “boring” out of decisions and engage in higher value activity, but it’s not easy. Research tells us that 85% of executives and managers believe AI will allow their companies to gain competitive advantage, but less than 20% have some AI offerings or products today.
Why is it hard to apply AI? Organizations lack a trusted Information Architecture (IA) to collect data, organize data, and analyze data. Models are being built with bias and limited auditability as to how the model was derived. It’s hard to do AI without IA. In this session, we’ll walk the AI ladder with an IA and share customer stories based on our experience with hundreds of clients around the world.
Time: 2:30-2:45 pm / Location: Ballroom Foyer
Breakout Sessions III: 6 Sessions
Time: 2:45-3:30 pm / Location: Salon I, Salon II, Salon III, Great Room I, Great Room II, The Den
Session III 2:45-3:30 pm / Track: Financial Services / Location: Salon I
Pitfalls of AutoML Platforms
Speaker: Cliff Weaver, CEO and Founder, R is my hammer!
An AutoML platform ingests data and runs many algorithms in parallel, resulting in a leaderboard. The user then selects a top-performing model and deploys it in production, typically as a RESTful API. While this sounds like a complete solution, AutoML assists the data scientist with just one small part of the overall data science process. Businesses that adopt AutoML platforms as a solution, rather than as one piece of the data science process, expose themselves to avoidable risk. At one extreme, the model AutoML selects may underperform alternatives that could easily be tuned. At the other extreme, a false negative on a cancer screen could have dire consequences. AutoML is not a replacement for formal data science projects.
Session III 2:45-3:30 pm / Track: Profit through Analytics / Location: Salon II
Reinventing Analytics to Drive Value in Retail
Speaker: Doug Jennings, Vice President of Data Analytics and Customer Insights, Lowe’s
What happens when a business does not see value from a data and analytics program? At Lowe’s, you start over. Doug will share Lowe’s recent journey to reinvent and rebuild its global data and analytics function. Highlighting personal perspectives, lessons learned and use cases, he will show how data and analytics now drive step-wise value creation across many business functions at Lowe’s. He will also look ahead to emerging opportunities for analytics in the ever-evolving omnichannel retail industry.
Session III 2:45-3:30 pm / Track: Healthcare / Location: Salon III
Data Analytics and Methodological Review of a Research Study Among COPD Patients at Atrium Health
In this session, we will review an Institutional Review Board-approved research study aimed at decreasing post-hospitalization acute care encounters among patients admitted with an acute exacerbation of Chronic Obstructive Pulmonary Disease (COPD). We will discuss leveraging niche and overlapping expertise in analytics (e.g., project management, business intelligence, data analytics). We will also explore challenges and solutions in utilizing existing reports, as well as in using clinical and billing data sources to identify specific populations. Additional discussion will include reviews of software and of accessing electronic medical record data, the iterative process of interfacing with clients and changing requirements, scaling reporting, and the process of disseminating results. Attendees will get a glimpse into the shifting approach to evaluating quality improvement initiatives through randomized trials.
Session III 2:45-3:30 pm / Track: Compliance and Policy / Location: Great Room I
Reading China: Predicting Policy Change with Machine Learning
Speaker: Weifeng Zhong, Research Fellow, American Enterprise Institute
For the first time in the literature, we develop a quantitative indicator of the Chinese government’s policy priorities over a long period of time, which we call the Policy Change Index (PCI) for China. The PCI is a leading indicator of policy changes that covers the period from 1951 to the third quarter of 2018, and it can be updated in the future. It is designed with two building blocks: the full text of the People’s Daily (the official newspaper of the Communist Party of China) as input data, and a set of machine learning techniques to detect changes in how this newspaper prioritizes policy issues. Because of the unique role of the People’s Daily in China’s propaganda system, detecting changes in this newspaper allows us to predict changes in China’s policies.
The construction of the PCI does not require an understanding of the Chinese text, which suggests a wide range of applications in other settings, such as predicting changes in other (ex-)Communist regimes’ policies, measuring decentralization in central-local government relations, quantifying media bias in democratic countries, and predicting changes in lawmakers’ voting behavior and in judges’ ideological leanings. (Note: for more details about this project, see policychangeindex.com.)
Session III 2:45-3:30 pm / Track: Social and Ethical Impacts / Location: Great Room II
When Big Data Discriminates: How to Avoid Legal Pitfalls with Regard to Discrimination in Online Marketing
Speaker: Erin Illman, Partner and Chair of Cybersecurity and Privacy Practice Team, Bradley
The proliferation of data collection involving practically every aspect of our lives gives advertisers an unprecedented ability to custom-tailor their messaging and market to ever smaller targeted segments. This is not always a good thing, and if not done properly it can run afoul of the law. Facebook was recently sued for allowing housing ads to be served in ways that excluded some protected classes from even seeing them.
This session will cover the Facebook case and its implications, including a related HUD investigation, and how these developments could implicate companies that leverage online marketing. Beyond that, we’ll delve into the future of big data analytics and machine learning, which raises even more complexities because the logic behind how decisions are made may be an unknown black box. We’ll discuss how to navigate this landscape to best protect your company.
Session III 2:45-3:30 pm / Track: Data and AI / Location: The Den
Pragmatic Modeling: The Middle Ground between Prediction and Explanation
Speaker: Sriram Natarajan, Principal – Data Scientist, Infosys
Successful scientific theories are characterized by general explanations and specific predictions. Their success makes it natural to conflate explanatory value with predictive power and vice versa. However, from a data science perspective, these concepts are separable properties of effective theories. For example, validated predictions from complex, black-box ML models can be difficult to explain (high prediction, low explanation). Similarly, sophisticated theoretical narratives pushed by domain experts can flourish in the absence of empirical validation (high explanation, low prediction). Pragmatic modeling encompasses theoretical approaches and numerical techniques that target a middle ground, with sufficient predictive power and adequate explanatory value, to enhance the usability and acceptance of ML models. We introduce pragmatic modeling in the context of supervised learning with structured data. Topics covered include problem framing, feature representation, model-specific and model-agnostic interpretation, and example-based explanation.
Time: 3:30-4:00 pm / Location: Ballroom Foyer
Breakout Sessions IV: 6 Sessions
Time: 4:00-4:45 pm / Location: Salon I, Salon II, Salon III, Great Room I, Great Room II, The Den
Session IV 4:00-4:45 pm / Track: Financial Services / Location: Salon I
Deep Insights into Interpretability of Machine Learning Algorithms and Applications to Risk Management
Speaker: Jie Chen, Managing Director, Advanced Technologies for Modeling (AToM) Group of Corporate Model Risk, Wells Fargo
The “black box” nature of machine learning (ML) models has limited their widespread adoption in banking and finance. The input-output relationships in ML models are difficult to understand and interpret because they involve thousands of “under-the-hood” calculations. In banking, one must be able to explain the basis of a credit decision to a customer, or the relationship between macroeconomic variables and a loss forecast to a regulator. Further, one has to ensure the relationships are consistent with historical and business understanding.
In this presentation, we will provide a framework and a suite of algorithms and associated visualization tools that help to resolve the opaqueness of ML algorithms. These are based on our research at Wells Fargo as well as recent results in the literature. The techniques include global diagnostics and local models for interpretability, and structured neural networks. Participants will also be given access to white papers and articles that have been submitted for external publication.
Session IV 4:00-4:45 pm / Track: Profit through Analytics / Location: Salon II
How ATD, the Largest Tire Distributor Worldwide, Is “Re-inventing the Wheel” Through Advanced Analytics
Speaker: Tim Eisenmann, Chief Analytics Officer and Senior Vice President of Advanced Analytics, American Tire Distributors (ATD)
“Never pass on a good crisis” could be the mantra for Advanced Analytics at ATD. In the face of major competitive threats, ATD is re-inventing itself through the development of analytics assets and the set-up of an internal analytics center of excellence. This presentation will introduce the audience to the “Profitable Choice Tool,” which became the cornerstone of ATD’s sales organization when 20% of revenues disappeared virtually overnight in mid-2018. Pushed out after just nine days of development, the current tool is built on a highly complex mixed-integer programming algorithm and will revolutionize B2B tire sales. Learn which elements (aside from smart math) made it so successful and how you can incorporate some of ATD’s lessons learned into your company’s analytics development lifecycle.
Session IV 4:00-4:45 pm / Track: Healthcare / Location: Salon III
Real-World Patients: Risk Assessment and Risk Prediction Models for Hospitalization and Readmission
Speaker: David Olaleye, Senior Software Development Manager and Principal Research Statistician, SAS Institute
Real-world evidence (RWE) studies provide pragmatic evidence to investigate the effectiveness of care management and health care utilization in real-world clinical settings. Efforts to leverage RWE to predict patient outcomes and forecast health care utilization have been met with the challenges of big data from disparate data sources. We show how to quickly create patient cohorts for identifying patients at risk for hospitalization and readmission using these data sources. Machine learning algorithms informed by industry-standard clinical diagnoses and episode-of-care (EoC) definitions are used to capture the multi-dimensional health care utilization profiles of patients followed over time. Using publicly available claims data from the 2008–2010 CMS Medicare population, the constructed EoC profiles are used to build a risk assessment model for propensity of hospitalization and a risk prediction model for readmission. Derived scores are used to risk-stratify patients to examine the impact of health care resource consumption and the likelihood of 30-day hospital readmission.
Session IV 4:00-4:45 pm / Track: Compliance and Policy / Location: Great Room I
Talent Survey Results: Building the New Data-Driven Workforce
Speakers: Richard Cline, Grant Thornton’s Managing Director of Innovation Go-to-Market,
Session IV 4:00-4:45 pm / Track: Data and AI / Location: Great Room II
Using AI to Harness the Knowledge of the Crowd in Online Health Forums
Speaker: Reza Mousavi, Assistant Professor, Data Science & Business Analytics, Belk College, UNC Charlotte
The quality of answers in health-related community-based question answering (HCQA) forums has been a concern for both users and forum administrators. This concern hinges on the fact that any registered user, even one with no formal training in health sciences, can provide health recommendations to other users within these forums. We conducted a two-phase study to develop a better understanding of the quality of health recommendations in HCQA forums. First, we employed machine learning and text mining to construct six measures of quality to automatically examine the quality of health content. We then verified the reliability of our constructs by comparing our algorithmic quality ratings with those of two board-certified physicians. Second, we used the measures to assess the quality of 247,114 answers posted in the Yahoo! Answers Health section. In the answer corpus, we examined the effect of the quality of the first answer on the quality of subsequent answers. Our results suggest that the quality of subsequent answers is affected by the quality of the first answer. We further show that this effect is larger when the answerers are more familiar with the forum but smaller when the forum provides tips and suggestions for answering questions. To test the robustness of our results, we tested our hypothesis on a set of duplicate questions that we identified using a deep learning model. Our two-phase study can help HCQA forums not only to automatically measure the quality of answers, but also to redesign their forums to encourage high-quality answers early. This in turn would make the forum more efficient for users who seek medical advice on HCQA forums.
Session IV 4:00-4:45 pm / Track: Data and AI / Location: The Den
Talent Analytics: Using Data Science to Drive People Decisions and Business Impact
Speaker: Scott Tonidandel, Professor of Management, Belk College, UNC Charlotte
This talk will focus on how companies can leverage data science tools to gain a competitive advantage through people. I will present numerous examples of how I have applied these tools in business settings to improve the talent management function in organizations. For example, in one study we used email trace data to examine the impact of diversity on team coordination. In another study, we used natural language processing to automatically categorize unstructured text responses from over 8,000 leaders to better understand the kinds of challenges leaders are facing.
Time: 5:00-7:00 pm / Location: The Urban Garden
Networking Reception “Tech and Beer (and wine)” Sponsored by Pandera