This year’s Summit will focus on Data Science and AI in Research, Medicine, and Education! But as with all our Summits, we will be covering high performance computing, big data analytics, modeling and simulation, and more. Data Science and AI are exploding as new technologies, tools, methodologies, and techniques continue to emerge. Data science, data mining, statistical analysis, machine learning, deep learning and neural networks, expert systems and decision support, natural language processing, image processing, and large language models are being used across all major disciplines and industries: Business, Agriculture, the Sciences, Engineering, Medicine and Healthcare, Communications, Social Sciences, and the Humanities. This Summit will explore all this and more!
SPECIAL NOTE: We have decided to reduce our Summit plans this year and go virtual for several reasons. First, there was a reduced commitment by sponsors and a rising cost of providing a live event. Our Summit has always been free to attend, and we have always provided good facilities, food, and refreshments. This would have been a challenge this year, and we did NOT want to require payment to attend. Second, a virtual event will allow for more participation by our researchers from across the region, as well as selective participation in the talks of interest to those attending. We will have a shorter Summit in duration, but we will have a GREAT lineup of speakers and topics. We hope that you will consider having some of the virtual talks presented LIVE to your classes on 10/16/24. Third, UKy’s CS and CCS will be providing a series of additional seminars and workshops throughout the fall, live and via Zoom, so stay tuned for more information.
If you have questions, or you would like to be a sponsor for this event, please contact:
Tony Elam OR
Teresa Moody
Dr. Jules White, Vanderbilt University, Director of Vanderbilt’s Initiative on the Future of Learning & Generative AI, Associate Dean of Strategic Learning Programs in the School of Engineering, and Professor of Computer Science
Dr. Prasanna Balaprakash, ORNL, Director of AI Programs and Distinguished R&D Scientist, Division Chief
Break
Michael Shepherd, Dell Technologies, Senior Distinguished Engineer and Services CTO
Time | Speaker | Title of Talk |
---|---|---|
1:00 - 1:10 | Trey Conatser | Opening Remarks |
1:10 - 1:20 | Angela D. Nagel | Using Artificial Intelligence To Give Specific and Responsive Feedback to Students |
1:20 - 1:30 | Iuliana Popescu | Using ChatGPT for the Assessment of Peer-Reviewing Reports on Research Papers |
1:30 - 1:40 | Moderator | Discussion and Q&A for “Feedback on Writing and Reviews” |
1:40 - 1:45 | Break | Break |
1:45 - 1:55 | Yeonjung Kang and Muzhen Li | Exploring Traditional, AI-Based, and Human-AI Collaborative Learning |
1:55 - 2:05 | Cheryl Vanderford and Yuyan Xia | Utilizing Artificial Intelligence to Practice Patient-Provider Encounters |
2:05 - 2:15 | Thom Cochell and Shawna Felkins | Preparing Materials Science Students for AI Integration in Industry |
2:15 - 2:25 | Moderator | Discussion and Q&A for “Research and Training in Development” |
2:25 - 2:30 | Break | Break |
2:30 - 2:40 | Matthew Winslow | Writing Letters of Recommendation with AI |
2:40 - 2:50 | Ryan Flores, Anita Lee-Post, Kun Huang, Holly Hapke, John M. Stacy | Early Online Course Performance Prediction using AI Modeling |
2:50 - 3:00 | Shelley Irving and Isaac Joyner | AI Utilization in the Application of Churn Analysis to Clinical Site Sufficiency for Clinical Programs |
3:00 - 3:10 | Moderator | Discussion and Q&A for “Analytics and Administrative Efficiencies” |
3:10 - 3:15 | Break | Break |
3:15 - 3:25 | Mi Sun An | Three Use Cases of Gen AI in Construction Management Courses |
3:25 - 3:35 | Shannon Eastep | Game On with AI: Ice Breakers That Spark Connection |
3:35 - 3:45 | Shane Hadden | AI as a Class Participant |
3:45 - 3:50 | Moderator | Discussion and Q&A for “Assignments and Learning Activities” |
4:00 - 4:25 | Denice Robertson | AI as an Educational Ally: Optimizing Assignments and Student Engagement |
4:25 - 4:30 | Moderator | Discussion and Q&A with Denice Robertson |
4:30 - 4:55 | Thad Crews | Key Insights for AI Adoption in Education and Business |
4:55 - 5:00 | Moderator | Discussion and Q&A with Thad Crews |
Time | Speaker | Title of Talk |
---|---|---|
1:00 - 1:30 | Hunter Moseley | Predicting the Pathway Involvement for All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes |
1:30 - 2:00 | Amit Parulekar | Responsible AI Innovation Framework |
2:00 - 2:30 | Samuel Zaruba Smith | Too Little, Too Risky: Evaluating Human Supervision for AI Anomaly Detection |
2:30 - 2:40 | Carlos Rodríguez López | Development of an Epigenetic Clock to Predict Gestational Age in Pregnant Mares |
2:40 - 3:00 | Overrun/Q&A/Break | Break |
3:00 - 3:30 | Cody Bumgardner | The Current State of Applied Medical AI |
3:30 - 4:00 | Adel Elmaghraby | How Technology is Changing Medicine |
4:00 - 4:30 | Varun Dwaraka | Using AI/ML to Develop the Most Predictive and Clinically Informative Epigenetic Measures |
4:30 - 4:40 | Muhao Chen | Model Reduction under Finite Word-Length Constraints for Large-Scale Computing |
4:40 - 5:00 | Moderator | Overrun/Q&A/Closing |
Important Dates:
Deadline for speaker abstract submission: FRIDAY, October 4, 2024
Speaker selection/announcement: WEDNESDAY, October 2, 2024
Summit Keynotes & Speaker Sessions: WEDNESDAY, October 16, 2024
In addition to the registration information, speakers will be asked for a Title of Talk and an Abstract (with a PDF upload option). The Abstract (1-2 pages max) should include a brief description of the research or AI-oriented educational activity to be presented and a mini-bio. Please also indicate which topic area you feel is most appropriate: Medicine/Healthcare, Research (non-medical), or Education; and the duration of your talk: Lightning Talk (10 minutes) or Featured Talk (30 minutes). For further info/details, select here.
Abstract: The Ohio Supercomputer Center (OSC) supports Ohio universities and industrial clients with high performance computing (HPC) services. Each year, OSC serves hundreds of faculty and thousands of students from dozens of universities across the state. Meeting this mission is complicated by numerous challenges such as the rapid evolution of hardware, expansion of student access to HPC and support for new areas such as artificial intelligence. Meeting these challenges requires advances in hardware, software and training. Notable hardware innovations include cooling solutions for OSC’s flagship clusters and Ascend, a new, innovative GPU cluster designed for AI applications. In software, the OSC-led Open OnDemand project provides web-based access to compute resources and is deployed at hundreds of HPC centers worldwide. Lastly, OSC is leading a series of workshops to help cyberinfrastructure professionals gain expertise in burgeoning artificial intelligence (AI) technologies. In this talk, I will review OSC’s impact and highlight these innovations.
Bio: David E. Hudak, Ph.D., is the executive director of the Ohio Supercomputer Center (OSC). He previously was OSC's director of supercomputer services and program director for cyberinfrastructure and software development. His research interests include high-level languages for parallel computing, HPC software engineering and network protocol design.
Prior to his appointment at OSC, Hudak served as associate professor of computer science and engineering at Ohio Northern University from 1992-2000 and as chairman of the department of computer science from 1998-2000. He was named to the Reichelderfer endowed chair in computer science at ONU. Hudak has consulted with the Computational Fluid-Dynamics Laboratory at Wright-Patterson Air Force Base and has served as chairman of the Parallel Computing Curricula for Ohio College Educators group.
Hudak earned a bachelor's degree in mathematics from Bowling Green State University and holds master's and doctoral degrees in computer engineering from the University of Michigan.
Abstract:
NSF’s support to the nation’s science and engineering researchers is built upon advanced computing and data analytics capabilities vital to innovation and economic prosperity. A new era in advanced computing support began in September 2023 with the launch of the Advanced Cyberinfrastructure Coordination Ecosystem of Services and Support program (ACCESS). Closely collaborative relationships between Principal Investigators and cyberinfrastructure professionals (CIP) at the core of this ecosystem often enable breakthroughs in research.
This talk will reflect on CIP successes in order to invite discussion of OAC’s vision to further expand the workforce and democratize opportunities to grow CIP careers. How did achievements from the XSEDE program inform the ACCESS ecosystem strategy? Which obstacles are we focused on overcoming to promote more CIP communities, recruit and retain new CIP talent, and institutionalize career pathways? How can individuals help ACCESS engage with your community? How should we demonstrate progress?
Bio:
Thomas Gulbransen has degrees from University of Rhode Island, Stony Brook University, and Harvard University and is a Certified Project Manager (PMP 2020). He manages projects and initiatives by balancing a triad of experience supporting transformative science, responsive cyberinfrastructure (CI), and financial accountability of projects. Fieldwork in environmental science embedded scientific design principles guided by data quality objectives. Cyberinfrastructure systems and workforce development projects rely on user-driven functional requirements and organizations’ value propositions. Fiscal accountability of data science investments employs project management methods and communications.
Currently, he is a Program Director in the Office of Advanced Cyberinfrastructure, Computer and Information Science and Engineering (NSF), responsible for proposal solicitation, review, and funding recommendations for cyberinfrastructure research and education; for identification and development of integrative programs across NSF directorates and through interagency collaborations; and for enhancement of project management to fulfill NSF’s mission. In addition, he is Project Manager for the Hybrid Approach to Repurpose Plastics Using Novel Engineered Processes (HARNESS) project (DOE Office of Energy Efficiency and Renewable Energy). Mr. Gulbransen is responsible for coordinating the plans and accountability of the multi-organization team’s 3-year effort to research and develop an innovative platform for upcycling polyether-polyurethane (PE-PU) foam waste to polyols (POs) and diamines.
Abstract: With the Age of AI and dramatic technology shifts reshaping the landscape of digital assistant possibilities, new experiences are moving from the imagination of movie makers to the world of enterprise-grade solutions. Join Michael Shepherd as he provides a brief history of the impact gaming has had on the AI industry and walks you through a 10-year progression from predicting hardware failures on 80M+ systems to Digital Human interfaces that can show emotion. With ensembles of Generative AI models, Michael will demo four years of work on Dell’s Digital Assistants, which have bypassed the Uncanny Valley problem and are starting to be used in cities and universities. Don’t miss seeing what the future of HyperAutomation looks like as you learn about the shift to Web 3.0, where Spatial Computing will leverage Personal Assistants to do your tasks. You’ll see live Digital Human interactions answering questions in real time from RAG systems.
Bio: Michael is a Sr. Distinguished Engineer, a 24-year Dell veteran, and a recognized technical evangelist who speaks globally on the evolution and impact of AI and emerging technologies. He helped form Dell’s first AI CORE and was foundational in the original services AI Research team. With a solid track record as a visionary thought leader, he currently leads the Services CTO team, focusing on next-gen solutions for the Age of AI.
Abstract: Generative AI is poised to revolutionize our interaction with technology, potentially surpassing the smartphone in its impact on our lives and businesses. These tools are not just static software applications; they are dynamic, adaptive systems capable of tasks ranging from tutoring and meal planning to software development and cybersecurity enhancement. This talk will delve into the key capabilities of generative AI, including its ability to understand context, perform advanced human-like reasoning, and incorporate knowledge from diverse domains. Moreover, we'll explore how generative AI is an interdisciplinary technology that will transform all of computing, bridging gaps between fields and reshaping how we approach problem-solving in every discipline.
Bio: Dr. Jules White is Senior Advisor to the Chancellor, Director of Vanderbilt’s Initiative on the Future of Learning & Generative AI, and Professor of Computer Science at Vanderbilt University. He created one of the first online classes on Prompt Engineering (https://coursera.org/learn/prompt-engineering), the study of how to converse with AI effectively. His courses have over 450,000 people enrolled. He is a National Science Foundation CAREER Award recipient, his research has won multiple Best Paper Awards, and he has published over 170 papers.
Abstract: We will present an overview of the Oak Ridge National Laboratory's Artificial Intelligence Initiative, which aims to advance the domains of science, energy, and national security. At the core of this initiative are two fundamental thrusts: transformative science applications and cross-cutting assurance. The application thrust focuses on developing AI methods to accelerate scientific discoveries, while the cross-cutting assurance thrust ensures that AI systems are secure, trustworthy, and energy-efficient. Secure approaches include alignment, privacy preservation, and robustness testing for AI models. Trustworthiness is achieved through validation and verification processes, coupled with advanced techniques in uncertainty quantification and causal reasoning. Meanwhile, energy efficiency is prioritized by developing scalable solutions, integrating edge computing technologies, and adopting a holistic co-design approach that optimizes the synergy between software and hardware resources.
Bio: Prasanna Balaprakash is the Director of AI Programs and Distinguished R&D Scientist at Oak Ridge National Laboratory, where he directs laboratory research, development and application of artificial intelligence and machine learning (AI/ML) to solve problems of national importance. His research interests span artificial intelligence, machine learning, optimization, and high-performance computing. He is a recipient of the U.S. Department of Energy's 2018 Early Career Award. Prior to joining Oak Ridge, he was an R&D lead and computer scientist at Argonne National Laboratory. He earned his Ph.D. from CoDE-IRIDIA at the Université Libre de Bruxelles in Brussels, Belgium, where he was a recipient of the European Commission's Marie Curie and Belgian F.R.S-FNRS Aspirant fellowships.
Abstract: Due to its ease of use and the vast number of libraries available, Python is seeing increasing use in numerical computing. However, Python has very little support for CPU parallelism and its native performance is limited, leading to hybrid codebases combining Python with another language such as C++. Furthermore, with the introduction of GPU computing, it is becoming even harder to fully use the hardware available.
JAX is a Python library developed by Google as a building block for deep-learning frameworks. It lets us write Python code using a NumPy-like interface, then uses a just-in-time compiler to run it efficiently on both CPU and GPU. By giving us a clear separation between the semantics of the code, written by the user, and its optimization, handled entirely by the compiler, it offers a very productive way to write high-performance Python code that runs on both CPU and GPU.
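For a flavor of this workflow, here is a minimal sketch (not taken from the workshop materials; the function and array are illustrative assumptions) of NumPy-style code compiled with `jax.jit`:

```python
# A minimal JAX sketch: NumPy-like code, JIT-compiled for CPU or GPU.
import jax
import jax.numpy as jnp

@jax.jit  # traced once, then runs as a compiled kernel on the available device
def normalize(x):
    return (x - jnp.mean(x)) / jnp.std(x)

x = jnp.arange(1_000_000, dtype=jnp.float32)
print(normalize(x)[:3])  # first call compiles; later calls reuse the kernel
```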
This workshop will consist of a short presentation followed by a hands-on session. It is targeted at developers doing numerical computing or machine learning in Python. No prior experience with JAX is needed, but a basic grasp of Python and NumPy will be expected. The workshop runs two hours: a 30-minute introductory talk followed by a 90-minute hands-on session.
Bio: Nestor Demeure is a NESAP Postdoctoral Researcher at NERSC with a focus on high performance computing, numerical accuracy, and artificial intelligence. He specializes in helping teams of researchers make use of high-performance computing environments. He is currently working to help port the TOAST software framework to the new Perlmutter supercomputer and to graphics processors (GPUs).
Abstract: What if I told you that you can automatically deploy and manage Kubernetes Container Platform across multiple failure domains/availability zones with AZ aware cloud storage, multi-tenant isolation, built-in firewalls, security groups and auto scaling capability?
You would say, "No problem! I can achieve that with any public cloud."
But what if you were tasked to do it on-prem? What if you needed more than a Virtual Machine can provide? What if I told you that you can have all of that in your own datacenter, plus the ability to run and manage your Kubernetes over a mix of VMs AND Bare Metal?
Please join us and learn how we have helped our customers break their monolithic on-premises infrastructure and virtualization platform into modern, distributed container architectures. This presentation will include a technical overview of an architected solution using technologies such as OpenStack, Kubernetes, Ansible, and Ceph, plus container management tooling.
Bio:
Chris is a Red Hat Cloud Infrastructure Solutions Architect. He is proud to help his clients validate their business and technical use cases on OpenStack, Kubernetes and supporting components like storage, networking or cloud automation and management. He is the father of two little kids and enjoys the majority of his free time playing with them. When the kids are asleep he gets to put the “geek hat” on and build datacenter labs to hack crazy use cases!
Darin is a Red Hat Principal Solutions Architect and has over 25 years of IT experience, wearing various hats over the years ranging from application development to database administrator to system administrator to enterprise architect. It is his experience in such broad categories that allows him to help others in understanding and utilizing cloud platforms such as OpenStack to effectively deliver solutions that make sense. Darin is currently working as a Principal Solutions Architect on the Red Hat Cloud Infrastructure Tiger Team, assisting in Proof of Concept deployments, technical deep dives, and problem resolution. Prior to working at Red Hat, Darin's OpenStack experience was helping a value-added reseller (VAR) devise a cloud strategy to deliver for government agencies and, prior to that, working for Mirantis.
Abstract:
The 20 to 30-minute presentation will outline the current environment, problems encountered, remedies and mitigations, plans for the short and near term, and a "hopes and dreams" discussion.
Mr. Bendl is still hoping someone will solve instantaneous cloning. He needs a spare to cover all the meetings, maybe a second to hide in the data center so he can squeeze in more work each day. Hopefully the process will be as simple as deploying a server instance in OpenStack.
Bio:
Kurt Bendl works at the National Renewable Energy Laboratory (NREL) in Golden, Colorado, in the Computational Science Center's (CSC) Advanced Computing Operations (ACO) group, supporting research-focused HPC, virtualization, and cloud operations. Wearing many hats, Kurt fills the roles of OpenStack Lead, Security Operations Lead, HPC Systems Administrator, automation engineer (Ansible, etc.), budget oversight, and mentor to newly hired employees. He's also worked as the ACO's network administrator and was originally hired as a software developer.
Kurt is responsible for building and maintaining the CSC's OpenStack/OpenInfra-based infrastructure. The current environment is the 3rd "stack" deployed by ACO, and the first to provide support for traditional HPC workloads (RDMA/RoCE/OpenHPC/Slurm).
Abstract: NVIDIA GPUs offer huge acceleration capabilities and are used today in a wide range of scientific disciplines. More than 2,000 applications are accelerated on NVIDIA GPUs, and more are being added each day. This lecture is aimed at researchers and developers interested in harnessing the power of GPUs, for both traditional simulation and physics-informed neural networks. For a developer focus, we'll go through GPU programming languages, optimized GPU-accelerated libraries, and performance portable programming models, including the new AI-driven multi-physics framework, NVIDIA Modulus. And for the application end-user, we'll show how to access GPU-optimized, pre-compiled containers for some of the most popular scientific applications and frameworks.
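As a generic illustration of the GPU-accelerated library category (not material from the lecture itself), here is a minimal sketch using CuPy, whose API mirrors NumPy on NVIDIA GPUs; the array sizes are arbitrary:

```python
# Minimal CuPy sketch: NumPy-style calls that execute on an NVIDIA GPU.
import cupy as cp

x = cp.random.rand(4096, 4096, dtype=cp.float32)  # allocated on the GPU
y = x @ x.T                      # matrix multiply dispatched to cuBLAS
print(float(y.trace()))          # copies a single scalar back to the host
```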
Bio: Kristopher Keipert is a Solutions Architect at NVIDIA, focused on engagements in higher education and research. With a research computing background, Kristopher enjoys assisting scientific developers with porting and optimizing their code for GPU. He also helps develop and deliver GPU workshops to enable the next generation of GPU programmers. Prior to NVIDIA, Kristopher was a computational scientist at Argonne National Laboratory where he performed research in analytical performance modeling and contributed as a developer on the NWChemEx project. Kristopher earned his PhD in physical chemistry from Iowa State University.
Abstract: With so many new tools available to interpret and synthesize data sets from different sources in new ways, data continues to grow in both size and payload. With the evolution of Machine Learning, AI and Big Data, new platforms will emerge to answer the needs of progressive infrastructure and data practitioners. In this session, Kartik will cover different educational and commercial trends and use cases in financial and life sciences and how those demands are shaping new requirements for storage infrastructure from petabytes to exabytes. VAST Data is an all-flash data platform built to serve these emerging requirements.
Bio:
Dr. Kartik joined VAST Data in January of 2020, running the global presales organization. He is part of the incredible success of VAST Data which increased almost 10-fold in valuation and revenue in this period. An accomplished technologist and executive in the industry, he has a wide array of experience in Cloud Architectures, AI/Machine Learning/Deep Learning, as well as in the Life Sciences, covering high-performance computing and storage. He has had a lifelong deep passion for studying complex problems in all spheres spanning both workloads and infrastructure at the vanguard of current day technology.
Prior to his work at VAST Data, he was with EMC (later Dell) for two decades, as both a Distinguished Engineer and global executive running the Converged and Hyperconverged Division go-to-market. He has a Ph.D. in Particle Physics with over 75 publications and 3 patents to his credit over the years. He enjoys mathematics, jazz, cooking and travelling with his family in his non-existent spare time.
Abstract: Join the Microsoft Education team as we demonstrate how to deploy secure, standards-compliant research computing environments on the Azure cloud.
Bio: Eric grew up in Eastern Kentucky and discovered his first passion, through the inspiration of an amazing teacher, when he started programming for the first time in the 4th grade. He studied computer science at Morehead State University, and after working professionally in IT for 15 years, decided to pursue his MBA. Currently Eric is researching learning and performance technologies at Florida State University and hopes to complete his doctoral program in April of 2023. Eric has worked at Microsoft since 2011 and in the Education vertical since 2016, where he has focused on Azure technologies as a Principal Cloud Solution Architect. Eric also leads a national community of practice for education customers which meets weekly via Microsoft Teams.
Abstract: The cloud is often seen as a binary alternative to local resources for research. This talk explores how a hybrid approach enables many of the best elements of both local and cloud resources by addressing some key researcher challenges and concerns that surround both. Generally, researchers want to get on with their research. Fiddling with compute and storage infrastructure is not high on the list of things researchers prefer to spend time on. Local resources can get the job done; however, they are often either over- or, more often, under-provisioned, not to mention based on point-in-time technology. Costs, deployment and operational complexity, and inherent limits are well understood and bounded. There is comfort, but it comes with tradeoffs. The cloud has appeal for its ability to track technology developments; it offers a wide range of resources, flexible configuration and deployment options, and scale, all on-demand. The cloud can also appear daunting from the perspective of complexity and cost. Researchers are quick to relate stories of spending surprises or cringe at the prospect of the time investment to (re)learn skills they would rather not have to know in the first place. A hybrid approach, enabled by modern cloud tooling, allows researchers to break free from the constraints of local resources when it makes sense. Cloud can be leveraged, as needed, when the capabilities or availability of local resources are limited, specific resource types are unavailable, or research workflows do not map efficiently to a mostly fixed local infrastructure. In these situations, specific cloud resources can be created intelligently, on-demand, and even ephemerally while integrating into a local user experience. We will look at some examples of how a hybrid approach can be applied to HPC/HTC, script execution, and desktop research applications to enable and accelerate research in a complementary manner rather than as a binary choice.
Bio: Scott Friedman, Ph.D. is a principal member of the Higher Education Research Group at Amazon Web Services (AWS). Scott focuses on articulating how university researchers and their institutions can most effectively leverage AWS technologies and capabilities that enable state-of-the-art computational, data science, and AI/ML based research. Prior to AWS, Scott was the CTO for Advanced Research Computing at UCLA reporting to its Vice Provost for Academic Computing and Vice Chancellor for Research. His role at UCLA centered on strategic initiative alignment, representation of UCLA to the UC system and nationally, research enablement, and resource delivery through advanced computing resources to the UCLA community. He has received grant funding as a PI, Co-PI, and Senior Staff from NSF, NIH, DOE, and NEH. Scott holds an M.S. and Ph.D. in Computer Science from UCLA with research interests in distributed and parallel systems performance, optimization, and analysis.
Abstract: In this session, we will discuss strategies for designing scalable parallel algorithms for high-performance computing. We begin with an overview of parallelization concepts and parallel computer architectures, followed by a discussion of the elements of parallel algorithms and strategies for designing parallel algorithms. In addition to discussion of concepts, we present real-life problems and solutions, and the implications of future architectures on parallel algorithm design.
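As a toy illustration of the data-decomposition strategy such sessions typically cover (an illustrative assumption, not the speaker's material), the sketch below splits a domain across worker processes:

```python
# A minimal sketch of data-parallel decomposition with Python multiprocessing.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker handles one block of the domain independently.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]   # decompose across 4 workers
    with Pool(processes=4) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```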
Bio: Rebecca Hartman-Baker leads the User Engagement Group at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, where she's responsible for NERSC’s engagement with the user community to increase user productivity via advocacy, support, training, and the provisioning of usable computing environments. She's a graduate of the University of Kentucky (BS, Physics) and the University of Illinois at Urbana-Champaign (PhD, Computer Science).
Abstract:
Maureen Doyle, Professor and Chair of Computer Science, Northern Kentucky University
David Fardo, Professor of Biostatistics, Stephen W. Wyatt Endowed Professor of Public Health, University of Kentucky
Jens Hannemann, Associate Professor of Computer Science, Kentucky State University
Harry Zhang, Associate Professor of Computer Science and Engineering, University of Louisville
Abstract: The data tsunami is upon us: transformative research is increasingly data-intensive, with collections in Terabytes to Petabytes, and Exabytes coming soon. Projections of research storage growth show that, by 2025, research data will outstrip even YouTube. Yet many institutions are underprepared not only for the "volume, velocity and variety" of data, but especially for stewardship of exponentially growing collections. The University of Oklahoma (OU) has been awarded a National Science Foundation grant that has acquired and deployed, and will maintain for 8+ years, a large-scale storage resource -- the OU & Regional Research Store (OURRstore) -- to enable faculty, staff, postdocs, graduate students and undergraduates to pursue data-intensive research, by building large and growing data collections, to share these datasets with collaborators and even the public, and to provide this capability to all institutions in (a) the Great Plains Network and (b) Established Program to Stimulate Competitive Research (EPSCoR) jurisdictions. Via an innovative, low-cost business model, researchers buy their own tape cartridges, good for 8+ years, and pay zero usage charges (just cartridge and shipping costs). OURRstore is expected to have hundreds of users.
Bio: Dr. Henry Neeman is the Director of the OU Supercomputing Center for Education & Research, Associate Professor in the Gallogly College of Engineering and Adjunct Associate Professor in the School of Computer Science at the University of Oklahoma, as well as the co-manager of the XSEDE Campus Engagement program with Dr. Dana Brunson of Internet2.
He received his BS in computer science and his BA in statistics with a minor in mathematics in 1987 from the University at Buffalo, State University of New York, his MS in CS from the University of Illinois at Urbana-Champaign (UIUC) in 1990 and his PhD in CS from UIUC in 1996.
Prior to coming to OU, Dr. Neeman was a postdoctoral research associate at the National Center for Supercomputing Applications (NCSA) at UIUC, and before that served as a graduate research assistant both at NCSA and at UIUC's Center for Supercomputing Research & Development.
In addition to his own teaching and research, Dr. Neeman has collaborated with dozens of research groups, applying High Performance Computing techniques in fields such as numerical weather prediction, bioinformatics and genomics, data mining, high energy physics, astronomy, nanotechnology, petroleum reservoir management, river basin modeling and engineering optimization.
Abstract:
Deep Learning (DL) techniques have been actively investigated in manufacturing, such as machine condition monitoring, process modeling, and quality evaluation, because of their promising capabilities in automatic feature extraction and nonlinear system modeling. However, significant roadblocks hinder practical DL deployments on the shop floor. Taking machine condition monitoring as an example, DL models require consistency between training scenarios and application scenarios in terms of machine conditions and data distributions. But data from normal operating conditions dominate training sets, since they can be collected easily; data from faulty conditions can be cost- and time-intensive to capture. Thus, real-world fault modes could span more categories than those anticipated or available during training, and deployed models may be confronted with previously unseen, novel faults. In addition, while the high-velocity data streams from the factory floor could stock large repositories of historical data, those data will be unlabeled and missing ground truth information about the machine’s actual condition, notwithstanding the cost and complexity of managing large, unwieldy data sets.
To address these constraints, this presentation introduces a novel approach that automatically learns from unlabeled streaming data and continuously updates DL models to adapt to new conditions without prior knowledge of condition change points, through the integration of Self-Supervised Learning (SSL) with Continuous Learning (CL). Specifically, SSL is realized through Barlow Twins with necessary time-series augmentation transformations to homogenize similar samples and cluster unlabeled sensing signals. A Mixed-UP Experience Replay (MixER) strategy is developed to improve on state-of-the-art CL techniques for adapting to new conditions while preventing catastrophic forgetting of previous conditions’ characteristics, by diversifying the limited examples randomly collected throughout CL. Experimental evaluation and comparison to existing unsupervised CL techniques have been conducted. The results validate the effectiveness of the developed unsupervised CL framework in clustering unlabeled data from successive motor health conditions (gradually increasing from 2 conditions to 8 conditions) without prior knowledge of change points or access to data from previously seen conditions.
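To make the SSL component concrete, here is a minimal sketch of the published Barlow Twins objective (a generic PyTorch illustration, not the authors' implementation; batch and embedding sizes are arbitrary):

```python
# Barlow Twins loss: drive the cross-correlation of two views toward identity.
import torch

def barlow_twins_loss(z1, z2, lambd=5e-3):
    # z1, z2: embeddings of two augmented views, shape (batch, dim)
    z1 = (z1 - z1.mean(0)) / z1.std(0)        # normalize each dimension
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    n, d = z1.shape
    c = (z1.T @ z2) / n                       # d x d cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # diagonal -> 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # rest -> 0
    return on_diag + lambd * off_diag

loss = barlow_twins_loss(torch.randn(256, 128), torch.randn(256, 128))
print(loss.item())
```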
Bio: Dr. Peng (Edward) Wang joined the Department of Electrical and Computer Engineering at the University of Kentucky in August 2019. He received his Ph.D. degree in Mechanical and Aerospace Engineering from Case Western Reserve University in 2017. His research interests are in the areas of stochastic modeling and machine learning for machine condition monitoring and performance prediction, manufacturing process modeling and optimization, and human-robot collaboration. Dr. Wang has published over 30 peer-reviewed papers in journals such as CIRP Annals-Manufacturing Technology, IEEE Transactions on Automation Science and Engineering, and the SME Journal of Manufacturing Systems, and has over 3,000 citations according to Google Scholar. He is the recipient of the Outstanding Young Manufacturing Engineer Award from the Society of Manufacturing Engineers in 2023, the Best Student Paper Award from the IEEE Conference on Automation Science and Engineering (CASE) in 2015, the Outstanding Technical Paper Award from the SME North American Manufacturing Research Conference in 2017, 2020, and 2021, and the Best Paper Award from the CIRP Conference on Manufacturing Systems in 2020. He also received First Prize in the Digital Manufacturing Commons (DMC) Hackathon, organized by DMDII in 2016.
Leslie Woltenberg, PhD
Isaac Joyner, MPH
Abstract:
Purpose: It is widely accepted in Health Professions education that preceptor feedback is valuable as part of the formative learning experience for students. However, there is a paucity of literature around effective utilization of such feedback. Prior studies describe the challenges around using feedback to effect behavior change [1] and achieve mastery learning [3]. Also problematic is the lack of a clear educational strategy and theoretical framework by which to operationalize this feedback [2]. The purpose of this study was to explore and describe qualitative themes in preceptor feedback, investigate signals in the data regarding board exam task areas, and identify opportunities to support student achievement of learning outcomes.
Methods: This exploratory study used text analysis software (the freeware package AntConc) to evaluate qualitative data from the Preceptor Evaluation of Student Performance tool collected over three academic terms (n=1242). The preceptor evaluation/feedback tool is completed by the preceptor at the end of each clinical rotation for each student, providing an overall narrative evaluation with suggestions for improvement. Preceptor evaluations were analyzed using board exam blueprint task areas and commonly identified terms and phrases, e.g., “differential diagnosis.” The software also captured the context in which the key words and phrases were used. Mind mapping diagrams were used to depict relationships among the themes.
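For illustration, a keyword-in-context extraction similar to AntConc's concordance view can be approximated in a few lines of Python (a hypothetical sketch, not the study's actual code):

```python
# Keyword-in-context (KWIC): return each hit for a phrase with local context.
import re

def kwic(text, phrase, window=40):
    """Return each occurrence of `phrase` with `window` characters of context."""
    hits = []
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        hits.append(text[max(0, m.start() - window):m.end() + window])
    return hits

sample = "Student should broaden the differential diagnosis before ordering labs."
print(kwic(sample, "differential diagnosis"))
```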
Results: The text analysis identified 11 themes, including “Confidence,” “Presentation,” and “History,” with context. These themes were identified from qualitative data from preceptors across 12 four-month supervised clinical experiences representing academic terms 2020-2023. The mind mapping analysis revealed two broad areas: 1) maturity/professionalism and 2) medical task area competence. This broad view highlighted themes observed across a variety of clinical settings and indicated opportunities to further evaluate our curriculum and provide focused student support around board exam task areas.
Discussion: Results of this exploratory study suggest that themes from qualitative preceptor feedback may provide meaningful information around curricular modifications that support student attainment of programmatic learning outcomes. Additionally, results from this study can be used to inform a variety of constructive learning strategies to help students further develop non-cognitive skill sets.
Bio:
Shelley Irving is a PA educator, has served rural communities as a practicing PA-C for over a decade, and is engaged in scholarship regarding contemporary health professions education and practice.
Leslie Woltenberg has over ten years of experience as an educator, earned a doctoral degree in educational evaluation, and is actively engaged in scholarship regarding innovation in curriculum, assessment, and health professions education. Both presenters are deeply committed to and actively involved in rural education and research.
Isaac Joyner earned an MPH focused on health program planning and evaluation and has 30 years' experience in data management and reporting across a diverse portfolio of systems.
Abstract:
Because people used stone tools in nearly all times and places, these artifacts are widely considered one of the most important material classes for understanding ancient societies. In my current research, I apply dynamic image analysis and machine learning to the analysis of soil samples taken from archaeological sites in order to classify the tiniest pieces of stone (measuring < 6.0 mm) that are knocked off when a stone tool is being made. Because these tiny artifacts (known as microdebitage) tend to become embedded in the soils where stone tools were made, archaeologists can map the areas where microdebitage is located in order to identify where ancient people were manufacturing stone tools.
When combined with other artifactual data, this has the potential to tell not only how stone tool production was spatially organized but also the class, gender, and occupation of the primary producers of stone tools. In spite of this, archaeologists have shied away from analyzing microdebitage due to the tedious and time-consuming nature of manually sorting, counting, and measuring microdebitage within soil samples. Dynamic image analysis and machine learning, however, have proven to be a powerful combination of tools for this problem, decreasing the time needed to analyze a single sample by 98 percent while maintaining a minimum classification accuracy rate of 96 percent.
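As a hypothetical sketch of the classification step (the feature names and data below are illustrative assumptions, not the author's pipeline), shape and size descriptors from dynamic image analysis could feed a supervised classifier:

```python
# Toy particle classifier: microdebitage vs. natural sediment.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in features: particle size (mm), aspect ratio, angularity, transparency.
X = rng.random((500, 4))
y = rng.integers(0, 2, 500)        # 1 = microdebitage, 0 = natural sediment

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```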
Bio: Phyllis Johnson is currently a University Research Postdoctoral Scholar in the Department of Anthropology at the University of Kentucky. Dr. Johnson's research involves the development and application of computational methods to address difficult archaeological questions surrounding ancient economies, site formation processes, and social structure. Dr. Johnson has over 15 years of archaeological experience in the Eastern US and Mesoamerica, and her current research combines experimental archaeology with novel machine and deep learning techniques in the examination of stone tool production to illuminate ancient actors, such as commoners and women, who are often rendered invisible in the archaeological record.
Abstract: The interior of the Earth is inaccessible by direct measurements and observation. We rely on numerical modeling of dynamics of the interior to understand heat flow, convection and plate tectonic motion. My research group uses different geodynamic codes to model planetary interiors and the surface expression of the dynamics inside. Recent projects include exploring the onset of plate tectonics, modeling tears in slabs and exploring plume transport. The onset of plate tectonics is explored in an early Earth project. The early Earth may have possessed a large hot magma ocean trapped near the core-mantle boundary after formation during differentiation, and likely containing different elements from the surrounding mantle. We examine how composition-dependent properties in the deep mantle affect convection dynamics and surface mobility in high Rayleigh number models featuring plastic yielding. This work uses a new convection code with an improved tracer ratio method. Using another geodynamic code, Underworld, we model subduction zones with various tears in the top and bottom of the slab and examine the surface expression of these features. We run models with and without an over-riding continental lithosphere which produce various subsurface (and surface) topographical and geological features of interest. Other geodynamical codes are used to incorporate seismic data into our models and explore changes in physical properties of the interior. The different geodynamic codes allow for improved resolution in the areas of interest for the given research project. All these models are run in parallel on supercomputers including the LCC at UK.
Abstract: University of Kentucky Libraries has been curating digital archival collections for over 25 years. These resources are one-of-a-kind collections, not to be found at any other research library. They consist of digitized, born-digital, and web-based material and are freely available via several locally hosted digital libraries. While these primary source materials have traditionally been utilized by the humanities and social sciences, new technologies and software have expanded their potential and opened the door to being applied in computationally driven research. This presentation will provide an overview of key University of Kentucky digital libraries and the collections they hold as well as review how these resources may be used for analytical research. It will also highlight the benefits of using cultural heritage resources and how new tools and techniques can transform these collections into rich datasets. It will close by outlining UK Libraries’ future direction in growing and improving access as well as supporting research and engagement with these materials.
Abstract:
Background: The past two decades have witnessed the emergence of digital pathology as one more tool in the pathologist's toolbox. Yet adoption among pathology practices has been variable at best and often mired in implementation barriers ranging from cost, regulatory and validation considerations, and institutional and pathologist buy-in, among others. In addition, the motives behind adopting digital pathology have evolved significantly over time. Initially, most adopters were driven by a desire to improve logistics or meet a well-defined need such as education or research. With the arrival of clinical-grade artificial intelligence (AI)-based systems and machine learning, digital pathology has become more of a means to an end as opposed to an end in and of itself. The two, digital pathology and artificial intelligence, have now become inseparable terms, and the digital pathology adoption question is shifting from “why go digital” to “when and how to implement an optimal, scalable, and future-proof solution.” Implementation, particularly in the academic medical center setting, lacks standardization, presenting additional challenges to an already complex process. Adopters must contend with the fragmented nature of available solutions, the lack of optimal interoperability, and the risk of technologies going obsolete prematurely. As such, some advocate for a piecemeal approach to implementation, which entails deploying sequential solutions to address small use cases, such as digitizing consultation cases, tumor board, or specific services, hoping that small wins will set a foundation for a full transition with prospective scanning. We aim to share an accelerated implementation strategy for a comprehensive AI-enabled digital workflow at our academic health system, as it may serve as a potential implementation roadmap for other adopters.
Methods: By evaluating the status of digital pathology technologies, we concluded that the piecemeal approach provided limited value and no actual guarantee of streamlined implementation. Therefore, we endeavored to take a more holistic approach to creating a solution for 100% prospective scanning. We optimized physical integration of scanning into the histology laboratory and created interfaces between the laboratory information system and digital pathology applications.
Results: It took 9 months to get to 100% prospective scanning. Our pathologists have been increasingly using digital pathology organically. AI is integrated into the viewer with no need to open third-party applications. We have validated Paige Prostate AI as a second-read system for prostate biopsies, and we are in the process of deploying two additional AI modules, one to read breast cancer biopsies and another to read lymph node slides of breast cancer cases. We also adopted 8K monitors to allow a maximum “field of view.” Most importantly, this approach enables primary diagnosis, efficient AI, and consultation workflows, since all new cases are scanned by default.
Conclusion: Our experience demonstrates that focusing on 100% prospective scanning is a viable and potentially very effective approach. Presently, technology is mature enough to allow accelerated implementation, but integration must be prioritized to create a seamless experience for the laboratory and pathologists.
Abstract: Clinical implementation of digital pathology and artificial intelligence (AI) has the potential to revolutionize how pathology is perceived in health care systems. We conducted a validation study to evaluate the most recent version of Paige Prostate Detect (Paige.AI Inc., New York, USA). The novelty is that this new algorithm version (2.0.0) is not only capable of detecting carcinoma, but also of quantifying the percentage of Gleason patterns, which is fundamental for clinical care. 50 cases were randomly selected from a list of 80 consecutive cases. Whole slide images were acquired from 1,220 H&E slides scanned on a Leica Aperio GT450 at 400x. Paige Prostate Detect was applied to each slide and results were compared with the original diagnoses. A data analysis pipeline was built with the Python 3.9 programming language and the Pandas library. Paige Prostate Detect results were recorded for each whole slide image, while pathologists' evaluations were recorded by part, which consisted of the evaluation of two or more slides. 50/1220 (~4%) diagnostic discrepancies were found based on individual diagnostic categories assigned to each slide. The slide-based and case-based discrepancies of multiple variables (extent of involvement, Gleason pattern breakdown, Gleason score, grade group) were also calculated. Clinical validation of AI-based systems is paramount. As the Paige Prostate Detect product has evolved, validation of each output is required. Accurate quantification has the potential to assist in documenting all variables needed for clinical care. Prospective multi-institutional validation is required to further develop best practices related to the application of AI in pathology.
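A minimal sketch of such a slide-level comparison in Pandas might look like the following (column names and values are assumptions, not the study's actual schema):

```python
# Toy discrepancy comparison: AI output vs. pathologist diagnosis per slide.
import pandas as pd

ai = pd.DataFrame({"slide_id": ["s1", "s2", "s3"],
                   "ai_dx": ["benign", "carcinoma", "carcinoma"]})
path = pd.DataFrame({"slide_id": ["s1", "s2", "s3"],
                     "pathologist_dx": ["benign", "carcinoma", "benign"]})

merged = ai.merge(path, on="slide_id")
merged["discrepant"] = merged["ai_dx"] != merged["pathologist_dx"]
print(merged["discrepant"].mean())   # fraction of slide-level discrepancies
```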
Abstract: Members of the Medical Library Association’s Data Caucus and RDAP are working on guidance documents that will be available for researchers, librarians, and others working with NIH-funded research data. These guidance documents will help researchers write data management plans and find repositories for their data sets. Though not released yet, the presenters will discuss some of the overarching concepts that will be provided in these documents. In addition, the presenters will describe what the Office of Science and Technology Policy's new guidance to make federally funded research freely available without delay means for researchers. As medical librarians, we will discuss tools available to help researchers with these new standards and what NIH and other granting agencies mean by the sharing of research data.
Abstract: Drug discovery is critical for improving life and many companies with significant financial resources at their disposal do it in the quest for large profits. However, for academics wanting to make a difference by potentially discovering helpful drugs but with extremely limited financial resources, there are still ways to do drug discovery. We present an overview of the computational process and list free tools and resources for each step that occurs before wet lab testing.
Abstract: One-dimensional parallel tempering simulations are an important tool used to understand the thermodynamics of physical systems. The efficiency of such simulations can vary greatly depending on the free energy landscape of the system. In this talk, I will present a two-dimensional parallel tempering approach that offers greatly improved efficiency over the traditional one-dimensional approach. The efficiency increase is achieved by linking difficult-to-simulate systems with easier-to-simulate counterparts. Data from an example polymer simulation will be given.
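For context, the sketch below shows the standard replica-exchange swap criterion that both one- and two-dimensional schemes build on (an illustrative assumption, not the speaker's code):

```python
# Metropolis swap criterion for parallel tempering (replica exchange).
import math, random

def attempt_swap(beta_i, beta_j, energy_i, energy_j):
    """Accept a configuration swap between two replicas with the standard
    probability min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return delta >= 0 or random.random() < math.exp(delta)

# Neighboring replicas with overlapping energy distributions swap often;
# a 2D scheme adds a second axis of neighbors (e.g., a Hamiltonian parameter).
print(attempt_swap(1.0, 0.9, -120.0, -100.0))
```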
Abstract: The Louie B. Nunn Center for Oral History at the University of Kentucky Libraries has unlocked its oral history archive and invited the world inside. The Nunn Center provides online access through its catalogue and content management system, SpokeDB, to 17,000+ audio and video oral history interviews. Research and discoverability of the recordings are enhanced by indexing and transcription sync using the Oral History Metadata Synchronizer (OHMS). In the 14 years since its development at the Nunn Center, OHMS remains a free and open-source, web-based tool used by over 700 institutions worldwide. The combination of SpokeDB and OHMS has transformed the oral history user experience for UK Libraries serving over 200,000 interviews each year. This presentation will provide an overview of SpokeDB and OHMS, the mechanics behind each, and how they work together to strengthen knowledge exploration.
Abstract: Horse genetics and genomics: We're on the right track. The horse genetics and genomics community has just completed its work in the USDA-funded Functional Annotation of ANimal Genomes project. We now have enough data and have not only built a very contiguous, complete, and accurate reference genome, but are now able to provide comprehensive annotation derived from high-throughput sequencing, including genomic variants, IsoSeq, RNA-Seq, and a variety of ChIP-Seq markers, that will allow us to assess the genome of virtually any equid in a functional genomic context. In the next phase of our work, we are embarking on the creation of a Horse Pan-Genome that will include not only reference-quality genomes for many breeds, but a Telomere-2-Telomere genome for the horse and the donkey. Additionally, we are developing resources such as containerized mapping and variant calling pipelines, and we are working to revolutionize the way genomic data is stored and shared in animal research. Much of this work is being done on, or is based at, the University of Kentucky Morgan Compute Cluster.
Abstract: A couple of data science nerds will discuss practical steps to improve the chances of getting a project across the finish line while delivering more value along the way. This experience has been gained through 20 years of applying data science solutions across various industries of all shapes and sizes.
Bio:
Doug has been working in data science and innovation for almost 20 years, previously as VP of Data Science at 84.51 and Sr. Director of Research and Development at Kroger. His teams have delivered real-time technology that transformed organizations. At AMEND, Doug collaborates with many local companies to leverage data science to improve top- and bottom-line performance, and takes pride in delivering value on over 90% of his data science projects.
Joe is a senior data scientist at AMEND Consulting focusing on technical solutions. His skills & interests include prescriptive analytics, process automation, and data wrangling. Using these skills he is able to build custom, sustainable tools to drive growth & efficiency gains for clients across various industries.
Abstract: Increasing age is the biggest risk factor for chronic disease and death, and therefore identifying ways to accurately predict biological age is imperative in understanding how interventions and therapeutics affect the aging process. In recent years, DNA methylation (DNAm) has shown remarkable associations with aging and longevity, as recent research efforts have provided compelling evidence establishing DNAm as a powerful biomarker for various disease states. One application of this biomarker is the predictive ability of DNAm in estimating biological age. In this presentation, I will describe recent developments in epigenetic age predictors based on DNAm datasets using multivariate and machine learning methods. I will share TruDiagnostic's development of an epigenetic age predictor, an epigenetic clock of high accuracy and precision built using Deep Learning and Elastic Net regression modeling on DNAm blood data (N = 10,251, R2 > 0.94, MAE < 2.59 years, RMSE < 3.29 years). I will then describe how this epigenetic clock compares to previously developed clocks using DNAm data collected for clinical trials conducted at TruDiagnostic. We observe that the developed clock is sensitive in identifying biological age changes due to interventional treatment, such as application of the senolytic treatment Dasatinib and Quercetin, and exhibits strong associations with frailty and surgery outcome measures. Lastly, I will use these findings to make the case for developing a multi-omic clock and highlight some of our efforts. Ultimately, the conclusions from this presentation will provide insights into advanced statistical and computational methods that are imperative in developing diagnostic tools of aging from biomedical data.
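As a generic illustration of the elastic-net portion of such a clock (synthetic stand-in data and parameters assumed for illustration, not TruDiagnostic's model, which also uses deep learning):

```python
# Toy elastic-net age prediction from methylation beta values.
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((400, 2000))            # beta values for 2,000 CpG sites
age = X[:, :50].sum(axis=1) * 2 + rng.normal(0, 2, 400)  # 50 CpGs drive "age"

X_tr, X_te, y_tr, y_te = train_test_split(X, age, random_state=0)
model = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X_tr, y_tr)   # sparse linear clock
pred = model.predict(X_te)
print(f"MAE: {np.abs(pred - y_te).mean():.2f} years")
```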
Bio: Dr. Varun Dwaraka received his BS in Molecular, Cellular, and Developmental Biology from the University of California, Santa Cruz, and a PhD in Biology from the University of Kentucky. Dr. Dwaraka has used NGS and array technologies to identify key genes implicated in initiating salamander limb regeneration, and multi-omic approaches to identify their upstream epigenetic targets. These studies have been presented and published widely. He now leads the development of epigenetic and multi-omic based clocks at TruDiagnostic Inc., a company focused on methylation array-based diagnostics to empower clinicians and researchers in better understanding the fluid epigenome.
Abstract: Rare events are characterized by many observations of negative events and few observations of positive events. If the loss function weighs each observation equally, the aggregate weight on negative events is far greater than that on positive events. As a result, the trained model performs well in predicting negative events and poorly in predicting positive events. This paper uses dividend initiations to show that both oversampling and class-weighting can mitigate this problem. We find that (1) the machine learning (ML) method of random forest performs better than the traditional econometric method of logit regression in predicting dividend initiations: the area under the receiver operating characteristic curve (AUC) scores for the two methods are 0.68 and 0.62, respectively; (2) oversampling and class-weighting improve the AUC scores of random forest and logit regression by 2-4%; (3) predictions based on random forest with SMOTE oversampling generate monthly five-factor alphas (Fama and French, 2015) of 0.28% (equal-weighted), 0.36% (value-weighted), and 0.37% (probability-weighted).
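To make the two remedies concrete, here is a minimal sketch comparing class weighting and SMOTE oversampling on synthetic data (an illustrative assumption, not the paper's code or data; assumes `pip install imbalanced-learn`):

```python
# Rare-event classification: class weighting vs. SMOTE oversampling.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
y = (X[:, 0] + rng.normal(scale=2.0, size=5000) > 3).astype(int)  # rare positives

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Remedy 1: reweight the loss so positives count as much as negatives.
weighted = RandomForestClassifier(class_weight="balanced", random_state=0)
weighted.fit(X_tr, y_tr)

# Remedy 2: synthesize extra minority examples before fitting.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
oversampled = RandomForestClassifier(random_state=0).fit(X_res, y_res)

for name, model in [("class-weighted", weighted), ("SMOTE-oversampled", oversampled)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```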
Bio: Dr. Mark H. Liu is an associate professor of finance with tenure and the founding director of the master of science in finance program at the University of Kentucky. He is the author of the book Make Python Talk: Build Apps with Voice Controls and Speech Recognition (2021, No Starch Press). His research interest is in machine learning and corporate finance (IPOs, Mergers & Acquisitions, Corporate Governance, Financial Analysts, Dividend Policy, and Corporate Restructuring). He obtained his Ph.D. in finance from Boston College. Dr. Liu has published his research in top finance journals such as Journal of Financial Economics, Journal of Financial and Quantitative Analysis, Journal of Corporate Finance, and Review of Corporate Finance Studies.
Abstract: The WKU Bioinformatics and Information Science Center is being rebranded as The WKU Applied Center for Data Science. Our center will focus on establishing industrial connections, aiding research collaborations across campus, and developing educational outreach opportunities. In this talk, we will introduce ourselves and the values that will help WKU grow and further develop data science across the WKU campuses and the surrounding region.
Bio: Richard Schugart is an associate professor of mathematics at Western Kentucky University. In Fall 2021, he became the director of the WKU Bioinformatics and Information Science Center. Richard will continue as director of the Center, re-branded as The WKU Applied Center for Data Science.
Abstract: Existing literature shows that cyber-psychological issues among online users are on the rise, with mental health being the new cybersecurity attack surface, and COVID-related misinformation, disinformation, and “fake news” being the corresponding attack vector amid the ongoing pandemic. The threat of an online user falling victim to this is so significant that the World Health Organization calls it a COVID 'infodemic'. Psychological experts have termed this a form of COVID psyber-security attack (COVID-PSA). Research into this area is new and emerging, with substantial scope for future work. In this talk, we present an R&D project in which we address the COVID-PSA threat by implementing a machine-learning-driven knowledge recommender that advises users on the credibility of online COVID information.
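As a rough illustration of the basic machine-learning building block that could sit behind such a recommender, the sketch below trains a tiny text classifier. The actual project is far richer, and the labeled corpus here is hypothetical.

```python
# Minimal sketch of a supervised credibility classifier; the talk's
# knowledge recommender is more elaborate than this building block.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical labeled corpus: 1 = credible, 0 = misinformation.
texts = [
    "Vaccines were evaluated in randomized controlled trials.",
    "Masks reduce droplet transmission in close-contact settings.",
    "Drinking bleach cures the virus overnight.",
    "The pandemic is a hoax invented by cell towers.",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

claim = "5G towers spread the virus."
# Estimated probability that the claim is credible.
print(model.predict_proba([claim])[0, 1])
```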
Bio: Seth Adjei, Ph.D. is an assistant professor in the Computer Science department and a co-director of the Data Science Program at the College of Informatics, NKU. His research is in the application of data science to education and, recently, cybersecurity. Seth’s work involves applying machine learning techniques to develop and study tools that enhance student learning in mathematics and elementary computer programming. In recent times, Seth has collaborated with cybersecurity experts to explore the application of machine learning in detecting cybersecurity threats around the dissemination of information about the COVID pandemic.
Abstract: Genetic data are notoriously difficult to analyze because of the breadth of data available for each individual, high natural variation throughout the genome, multicollinearity of genetic variants, interactions between genes and environmental factors, and various other genetic biases. We developed the first and only algorithm to identify one of these genetic biases called ramp sequences. Ramp sequences are small stretches of slowly-translated codons at the beginning of genes that counterintuitively slow ribosomes in order to increase overall protein levels by limiting downstream ribosomal collisions. Ramp sequences are found in all domains of life, but can significantly differ between human populations and tissues based on the relative codon adaptiveness within each cell. We developed The Ramp Atlas (https://ramps.byu.edu) to allow users to interactively explore how ramp sequences relate to cellular gene expression. Additionally, we demonstrate that ramp sequences may partially explain SARS-CoV-2 proliferation patterns and reveal a biological mechanism underpinning some genetic associations with Alzheimer's disease. Ramp sequences are an exciting avenue of research that can facilitate large-scale data analyses that may lead to a better understanding of the basic biology underlying the relationship between human genetics and disease.
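To make the idea concrete, here is a toy sketch of ramp detection under stated assumptions: hypothetical codon-adaptiveness weights and a simple comparison of a gene's leading codons against the rest. The published algorithm and The Ramp Atlas are substantially more sophisticated.

```python
# Toy illustration of the core idea behind ramp-sequence detection: scan the
# relative adaptiveness of codons at the start of a gene and flag an
# unusually slow leading window. The weights below are hypothetical,
# not a real codon-adaptiveness table.
codon_weight = {"ATG": 1.0, "AAA": 0.9, "AAG": 0.6, "CGT": 0.2, "CGA": 0.15,
                "GGT": 0.8, "GGA": 0.5, "CTG": 1.0, "TTA": 0.25}

def mean_adaptiveness(codons):
    return sum(codon_weight.get(c, 0.5) for c in codons) / len(codons)

def has_ramp(seq, window=5, ratio=0.7):
    """Flag a ramp if the first `window` codons are translated markedly
    slower (lower adaptiveness) than the rest of the gene."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    head, tail = codons[:window], codons[window:]
    return mean_adaptiveness(head) < ratio * mean_adaptiveness(tail)

gene = "ATG" + "CGA" * 4 + "AAA" * 12  # slow start, fast body
print(has_ramp(gene))  # True: the leading codons form a putative ramp
```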
Bio: Dr. Justin Miller is an Assistant Professor in the Division of Biomedical Informatics and affiliated with the Sanders-Brown Center on Aging at the University of Kentucky. Dr. Miller actively collaborates with various researchers at the University of Kentucky, Brigham Young University, and Washington University in St. Louis. He uses machine learning to identify genetic and phenotypic subtypes of Alzheimer's disease that he hopes will lead to subtype-specific treatments of the disease. Dr. Miller has published over 25 peer-reviewed articles, including over 15 first or last author papers, and he is currently supported through the BrightFocus Foundation grant #A2020118F and the University of Kentucky Alzheimer's Disease Research Center funded through the National Institutes of Health grant #1P30AG072946-01.
Abstract: This talk will describe some of the exciting progress being made in Heritage Science at the University of Kentucky, such as the “virtual unwrapping” of ancient documents, with a focus on the advanced computational infrastructure and machine learning techniques that are helping to create new approaches to longstanding problems. The talk will also describe the emerging EduceLab midscale infrastructure, to be commissioned over the next five years, which will encourage the development of new techniques and user communities in Heritage Science.
W. Brent Seales is the Alumni Professor of Computer Science at the University of Kentucky and a Getty Conservation Institute Scholar (2019-20). Seales’ research applies data science and computer vision to challenges in the digital restoration and visualization of antiquities. In the 2012-13 academic year he was a Google Visiting Scientist in Paris, where he continued work on the “virtual unwrapping” of the Herculaneum scrolls. In 2015, Seales and his research team identified the oldest known Hebrew copy of the book of Leviticus (other than the Dead Sea Scrolls), carbon dated to the third century C.E. The reading of the text from within the damaged scroll has been hailed as one of the most significant discoveries in biblical archaeology of the past decade.
Abstract: High-Performance Computing and Artificial Intelligence (HPC/AI) solutions are addressing today’s grand challenges in Health, Sustainability, Security, and Education. Frontier became the first exascale computer on the Top 500 list this spring. Today, there’s an edge server powerful enough to have led the 2001 Top 500 list. Both of these developments are game changers for universities and research organizations.
This presentation discusses the architectures delivering HPC/AI from the edge to the cloud and space.
Bio: Steve Heibein is the Public Sector Artificial Intelligence Lead for Hewlett Packard Enterprise. Before HPE, Steve served as CIO, CTO, or Technology VP for 20 years at several tech and media companies. In these roles, he oversaw AI, machine learning, and data analytics projects in life science, energy forecasting, fraud prevention, natural language processing, identity theft, cybersecurity, and satellite imagery.
Steve advises organizations on using and deploying AI solutions and regularly presents at artificial intelligence and high-performance computing events.
Abstract: As research creates increasing amounts of data, storing, managing, and accessing that data becomes increasingly challenging. Organizations are trying to determine what mix of NAS, object storage, automated tape, and cloud is right for them.
Join this session for an in-depth look at how organizations are combining a variety of storage technologies in hybrid approaches, including automating the movement of data to various tiers of storage based upon usage patterns and user policies, to find the right balance of manageability, flexibility, cost, and performance.
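As a flavor of what policy-driven tiering looks like in practice, here is a minimal sketch, assuming a simple last-access-time policy and two filesystem-visible tiers; production data movers add cataloging, checksums, and transparent recall.

```python
# Hedged sketch of policy-driven tiering: sweep a fast tier and relocate
# files untouched for `age_days` to a slower, cheaper tier. The paths and
# thresholds are illustrative placeholders.
import shutil, time
from pathlib import Path

def tier_by_age(fast_tier: str, archive_tier: str, age_days: int = 90):
    cutoff = time.time() - age_days * 86400
    for f in Path(fast_tier).rglob("*"):
        if f.is_file() and f.stat().st_atime < cutoff:    # last-access policy
            dest = Path(archive_tier) / f.relative_to(fast_tier)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), dest)                      # demote to cold tier
            print(f"archived {f}")

# Example invocation (hypothetical mount points):
# tier_by_age("/scratch/project", "/archive/project", age_days=180)
```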
Bio: Holds a Bachelor’s in electrical engineering and an MS in wireless communications, with over 20 years of professional experience in the IT, software, and broadcast industries (media production, media asset management, media archive, automation), and is an AWS Certified Cloud Practitioner.
Abstract: HPC adoption, usage, customer and workload diversity, and scale have all grown, and we are now poised at multiple major transitions to support this growth in importance, usage, and impact. Cloud computing services have now fully embraced HPC due to its growing importance and market, and are increasingly offering performance and features that have long been limited to on-prem, bare-metal dedicated systems; we are also starting to see 'power source and grid aware' data centers supporting HPC resources. Analytics and AI, enabled by the steep exponential increase in digital data from computers, devices, and sensors, are driving changes in the HPC software ecosystem, from the use of Python and R to web interfaces and Jupyter notebooks to containers and Kubernetes, as well as new, scalable analytics and AI software. Real-time applications are driving the instrumentation of the world’s infrastructure and places, and the desire for more autonomous processes and equipment, pushing the need for performance to the edges: the places where decisions must be applied, in real time, often without human guidance. Even the technologies for providing high performance (and performance per watt and per dollar) are seeing a renaissance, from emerging HPC CPUs, new features in GPUs, AI-focused accelerators, smart NICs, and more, and the buzz in quantum computing is growing as infant technologies evolve rapidly under heavy investment. HPC was once (mostly) on-prem, bare-metal, dedicated systems for massively parallelized simulation applications, but its applications, usage, locations, technologies, and time constraints are all diversifying. HPC will be everywhere, all the time, and this talk will present the rapid changes happening now and in the near future.
Jay Boisseau is an experienced, recognized leader and strategist in advanced computing technologies, with over 25 years of experience in building and leading computing programs, departments, and organizations. As the AI & HPC Technology Strategist at Dell Technologies, Jay evaluates new technologies and approaches for HPC and machine/deep learning, assesses customers’ workloads needs and plans, and helps design new solutions to increase customers’ innovation, productivity, and efficiency. His current primary focus areas are cloud-enabled HPC for simulation, data analytics (HPC on demand), contributing to the Omnia cluster software project, working with strategic HPC/AI customers, and developing strategies for new HPC/AI solutions in key emerging verticals. Additional projects include evaluations of several new AI & HPC technologies, assessing quantum computing prospects for HPC/AI, leading the Dell Technologies HPC Community, and some confidential (NDA required) projects.
Before coming to Dell Technologies in 2014, Jay created the Texas Advanced Computing Center (TACC) at The University of Texas at Austin and led it (2001-14) to world prominence. Under Jay’s leadership, TACC deployed numerous world-class computing systems for U.S. open scientific research and developed new programs to enable thousands of researchers to accomplish more with these powerful technologies. Prior to TACC, Jay worked as an associate director at the San Diego Supercomputer Center (1996-2001) and at the Arctic Region Supercomputing Center (1994-96). Jay received his doctorate in astronomy from UT Austin for his advanced computational research in modeling the structure of supernova explosions.
Abstract: This presentation will discuss the key elements of Fujifilm’s Object Archive software and how it extends long-term data management capabilities related to computational workflows, scientific research, and digital preservation efforts. Object Archive addresses long-term data archiving challenges with modern tape technology and simple integration via an S3-compatible API. Learn how Fujifilm tape technology and software work together with third-party products such as iRODS, Starfish, and Spectra Logic to increase data management efficiency and resiliency across research and computational workflows.
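Because the integration surface is a standard S3-compatible API, client code can be as simple as the hedged boto3 sketch below; the endpoint URL, credentials, and bucket are hypothetical placeholders rather than product specifics.

```python
# Sketch of archiving through a generic S3-compatible endpoint with boto3.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://tape-archive.example.edu",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

bucket = "research-archive"  # hypothetical bucket name
s3.upload_file("results/run042.h5", bucket, "project-x/run042.h5")

# Objects written through the S3 API can later be listed and retrieved the
# same way, regardless of the tape media behind the interface.
for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"], obj["Size"])
```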
Chris Kehoe is head of Infrastructure Engineering and Data Management Solutions, Fujifilm Recording Media U.S.A., Inc. Chris has been working as a data management practitioner for over 26 years, driving solutions for companies like STK/SUN Microsystems, Spectra Logic, NOAA, and Xerox. As an expert in data management solutions, he leads the charge for FUJIFILM’s next-gen storage solutions. Chris Kehoe, an avid adventurer, resides in Colorado with his family.
Abstract: This session overviews the options for storing all forms of research data, from short-term storage with quick access to long-term archival storage (viable for many years). It will cover multiple storage technologies, from network-attached storage (NAS) and object storage to automated robotic tape archives, as well as hybrid approaches that automate the movement of data to various tiers of storage based upon usage patterns and user policies.
Mr. Fitch has spent 42 years in technology, including 16 years at Spectra Logic helping some of the largest research universities implement more efficient and cost-effective ways to manage research data. His clients manage from tens of TBs to hundreds of PBs of data.
Abstract: Since their beginning in a NASA project around 1994, Beowulf clusters have become the essential class of high performance computing (HPC) architectures. Although the supercluster platform has served the community widely, a new innovation paradigm has come to HPC and its system infrastructure: composable disaggregated infrastructure, or CDI. Fully disaggregated infrastructure has several benefits over monolithic or stranded resources, eliminating the rigid, hardware-bound silos that have held companies back from their digital transformation strategies. This discussion will outline why CDI is a future-proofing strategy for the enterprise's HPC, AI, and advanced computing needs.
Earl J. Dodd is World Wide Technology's Global HPC Business Practice Leader. He provides HPC/HPDA/Supercomputing strategy, technology enablement, business development, and marketing and sales support to WWT's global enterprises and governments. Earl helps achieve a customer's desired ROI by leveraging HPC technology, extreme data in motion, and the Cloud on secure ultra-scale architectures and collaboration environments. This effort drives the next generation of computationally steered workflows in decision support environments for real-time situational awareness and institutional learning.
Abstract: As Deep Learning continues to become more widely adopted, it is imperative to have the appropriate software infrastructure to enable scaled model training and experimentation, whether in the Cloud or at the Core. This presentation will discuss the challenges of training Deep Learning models and how the HPE Cray AI Development Environment provides a unified platform for large scale model development from Cloud to Core.
Hoang leads Solution Engineering for Determined AI as part of the Artificial Intelligence Strategy and Solutions group, where he helps customers who are building cutting-edge AI products across a wide range of industries dramatically accelerate their time to market by leveraging the HPE Cray AI Development Environment. Prior to Determined, Hoang was a Solution Architect developing machine learning solutions at Box.
Abstract: This presentation will discuss the key value elements of the Fujifilm Object Archive product and how an archive strategy extends data management capabilities related to computational workflows, scientific research, and digital preservation efforts. Fujifilm’s Object Archive software is designed to provide simple, secure data access for data-intensive tasks that require highly reliable, highly scalable, and cost-effective solutions for maintaining long-term data. With its capabilities for on-premise chain-of-custody and air-gap data protection, the Object Archive software increases the security and availability of long-term data. With industry-standard S3 API compatibility, Object Archive supports simple integration with a wide array of third-party vendors. This presentation will demonstrate use cases and integration points with data management and storage products from iRODS, Starfish, Cloudian, NetApp, and DataCore.
Bio: Chris Kehoe is head of Infrastructure Engineering and Data Management Solutions, Fujifilm Recording Media U.S.A., Inc. Chris has been working as a data management practitioner for over 26 years, driving solutions for Oracle/SUN Microsystems, Spectra Logic, NOAA, and Xerox. As an expert in data management solutions, he leads the charge for FUJIFILM’s next-gen storage solutions. Chris Kehoe is an avid outdoor adventurer and resides in Colorado with his family.
Abstract:
In today's hybrid cloud reality, distributed data holds the key to unlocking value through new business and operational insights. However, data complexity and data silos are top barriers to gathering advanced insight into data using AI and analytics. Overcoming this is critical for improving data access, gaining new insight into data using AI and transforming business processes.
Addressing these challenges, IBM software-defined storage and the all-new IBM ESS 3500 provide high-performance data access anywhere and are designed to accelerate data delivery for AI and analytics workloads, helping speed time to market with cloud-scalable performance and capacity. IBM ESS 3500 is engineered to help clients accelerate data science, modernize and optimize application development, simplify and accelerate DevOps, and optimize content repositories. IBM ESS 3500, enabled by Spectrum Scale, is designed to provide enterprise-class security and data availability with a global namespace supporting the unification of data from multiple sources across core, edge, and cloud without the need to make additional copies of data.
Bio:
Sanjay is a Technical Sales Leader with business and technical experience in software-defined storage and Hybrid Cloud.
He leads a highly skilled team of technical experts in the Americas region, working pre-sale and post-sale to earn customer satisfaction and trust, and builds and executes technical strategy for Hybrid Cloud, Data Resilience, and Data & AI.
Sanjay is also an IBM Inventor with multiple US and International patents.
Abstract: Every new interconnect brings into question the benefits of aggregation and disaggregation. With PCIe Gen4 we are at a unique phase in the data center: with so many different devices accessible on the same high-speed interconnect, we can now fit workloads at the granularity of a PCIe device. Many see this as the next revolution, going from hyperconverged bricks to composable, disaggregated grains of sand.
Bio:
Mr. Stevenson is Chief Architect, Americas East, for Liqid. He leads future roadmap discussions with key strategic enterprise customers and works with Liqid Engineering to ensure customer feedback is incorporated into future releases.
Prior to joining Liqid, Mr. Stevenson was a CTO at Isilon (EMC/Dell Division), Chief Strategy Officer at Hitachi (HDS Division), Managing Director at 451 Research and a Chief Technologist at Nielsen.
Mr. Stevenson has also held strategic engineering and management roles at Sun Microsystems, Ernst & Young LLP, and Motorola, and is unique for his work as a strategic end user, strategic industry analyst, and technology executive. Mr. Stevenson is also an honorable veteran of the USMC, having served with the Marine Wing Communication Squadron 48 during Desert Storm.
Mr. Stevenson holds a B.S. in Electrical Engineering and M.S. in Computer Science from Illinois Tech.
Abstract:
Today’s research computing institutions are faced with “a dazzling array of new acceleration hardware,” according to Dr. Wuerthwein of SDSC, from different types of GPUs to FPGAs, ASICs, IPUs, DPUs, etc., and users may clamor for some or all of these expensive resources for specialized workloads. And as Dr. Stanzione of TACC noted at last week’s Hot Interconnect conference, “picking system configurations is among our hardest and most important tasks.” Find out in this talk why both these institutions have chosen GigaIO’s memory fabric to disaggregate and compose resources for some of the most complex workloads in science.
Yet another layer of complexity for those universities in the Commonwealth conducting research at the edge, in settings as diverse as archeology, agriculture, and manufacturing, is the challenge of gathering and processing data at the edge. This presentation will cover an innovative, modular, AI-ready composable edge system and architecture, with rack-based and portable systems that seamlessly interoperate with a common software stack.
Today’s presentation will outline how a turnkey solution available from GigaIO, in partnership with AMD and NVIDIA Bright Cluster Manager, can address these challenges. We will demonstrate how a low-latency memory fabric can be deployed to disaggregate resources, solving the issue of incorporating that “dazzling array” of new hardware as it becomes available, so your researchers can have access to and share the latest in hardware acceleration without provisioning each individually. The same infrastructure can be used both to teach students and be reconfigured on the fly to serve the larger computational needs of researchers, saving budget, space, power, and cooling, and enabling scientists to get to results faster.
Bio:
A seasoned IT executive, Matt brings two decades of experience in sales and solutions architecture. He has built federal, healthcare, and education vertical solutions at companies like Dell, where he was a Senior Solutions Architect, and Pivot3, where he led regional sales. Immediately prior to joining GigaIO, he served as Field CTO at Liqid.
Matt spent seven years in IT in the US Air Force and has deep expertise in Federal IT procurement through his subsequent work as a Senior Consultant with Booz Allen Hamilton and the partnerships he has built helping systems integrators win Federal contracts.
Matt holds a Bachelor’s degree in Information Technology from American InterContinental University, and an MBA from Concordia University Austin.
Abstract: Memory has always been one of the most prized and expensive resources in an HPC cluster, just as in the datacenter, yet memory hasn’t changed much over the last 10 years. With speeds improving and now the invention of CXL, the way memory is addressed and used within the datacenter will shift dramatically. This discussion will address the details of that shift and the new capabilities, as well as the changes needed to take advantage of them.
Matt Demas is the CTO at GigaIO. He has been delivering composable disaggregated infrastructure to HPC customers for over 4 years. From serving as an enlisted Air Force technician at the Pentagon to his current role, Matt has been at the edge of building solutions that revolutionize industries. His mission is to deliver cloud-like capabilities from edge-to-core-to-cloud for HPC, AI, and traditional datacenter workloads.
Session D & G: (1PM – 3PM)
NOTE: If you have interest and plan to attend this workshop you must contact Tony Elam (tony.elam@uky.edu) for further information.
Abstract: During this workshop, participants will be walked through how CloudyCluster automatically creates an HPC environment with Open OnDemand in Google Cloud. An overview will also be given of what the automation produces behind the scenes. Participants will be shown how to use Open OnDemand in a cloud environment and how to launch HPC jobs in the cloud, and we will review how CloudyCluster provides similar capabilities in AWS. Attendees can follow along or actively participate hands-on with an internet-accessible laptop with the Chrome browser installed.
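For attendees who have not submitted a cloud HPC job before, the sketch below shows the general shape of a batch submission, assuming a Slurm scheduler; the script contents and paths are illustrative, not CloudyCluster-specific.

```python
# Minimal sketch of submitting a batch job in an environment like the one
# the workshop builds (assumes a Slurm scheduler is on the PATH).
import subprocess, tempfile

job_script = """#!/bin/bash
#SBATCH --job-name=hello-hpc
#SBATCH --nodes=1
#SBATCH --time=00:05:00
srun hostname
"""

with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(job_script)
    path = f.name

# Open OnDemand exposes the same kind of submission through its web forms.
print(subprocess.run(["sbatch", path], capture_output=True, text=True).stdout)
```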
Boyd Wilson is CEO of Omnibond. Formerly, Boyd served as Software CTO and Executive Director over computing facilities and operations at Clemson University, where research computing was part of his purview. Prior to that, he was Director of Computing for the Engineering School at Miami University. He has participated in various academic grants, papers, and workshops in the areas of parallel storage, cloud computing, and computational infrastructure.
Boyd has extensive experience and leadership in the areas of software development, research infrastructure, security & identity, and transportation analytics. He has worked in development and product management for products sold by Novell, StorageTek, Sun, Oracle, and Omnibond.
Boyd’s current focus lies in four areas: first, computer vision, AI, and real-time traffic analytics with TrafficVision; second, high-performance computing and storage with CloudyCluster and OrangeFS; third, identity and security management; and fourth, wake surfing when weather permits.
Abstract: Julia is an open-source programming language targeting scientific computing and data science with many abstractions built into the language and ecosystem. It has been gaining popularity since its first stable release in 2018 with the promise of being an approachable language that compiles to efficient native code via LLVM, while providing rich capabilities for interactive data analysis and visualization. This combination of easily accessible abstractions while not sacrificing performance makes Julia an excellent language to explore data analysis, statistics, and machine learning algorithms.
This tutorial will give a gentle introduction to the Julia language. We then cover important data science tools and techniques, using hands-on exercises in Jupyter notebooks covering: i) accessing and manipulating data (e.g., using data frames and SQL); ii) exploring data using interactive data visualization tools (e.g., using Jupyter and Pluto.jl); iii) exploring statistics, automatic differentiation, and machine learning algorithms (e.g., using Zygote.jl and Flux.jl).
The objective of this tutorial is to provide the audience with enough hands-on experience in Julia to start applying it to their own projects, or use Julia as a teaching tool in the classroom.
Johannes Blaschke is a computer systems engineer at the National Energy Research Scientific Computing Center, where he engages with scientists to help them optimize their software for next-generation supercomputers. His work focuses on enabling real-time data analysis using extreme-scale computing environments.
Session D & G: (1PM – 3PM)
NOTE: If you have interest and plan to attend this workshop you must contact Tony Elam (tony.elam@uky.edu) for further information.
Are you doing exciting research or building new educational programs in Data Science or Artificial Intelligence, including but not limited to data mining, big data analytics, modeling and simulation, machine learning, deep learning, neural nets, large language models, generative AI, natural language processing, and expert systems/decision support? Are you one of the growing number of faculty expanding the use of data science and AI in computationally oriented research or education in new areas such as business, agriculture, the social sciences, communications, and the humanities? If so, we hope you will consider speaking at our Commonwealth Computational Summit on October 16 & 17 (optional). We are looking for speakers who are using data science and AI in their educational programs or in research, whether general or specific to medicine and healthcare! Please consider answering our call for participation. We are looking for lightning talks (10 minutes) and featured talks (30 minutes)! Should you have any questions, please contact Tony Elam (tony.elam@uky.edu).
Are you a graduate student or PostDoc in the Commonwealth of Kentucky (or in our region) doing research involving HPC, Data Science, or AI, such as big data analytics, data mining, complex modeling and simulation, machine learning or deep learning, neural nets, natural language processing, generative AI, expert systems/decision support, LLMs, etc.? Would you like to present a poster at our Summit, expand your resume/vita, and potentially win a trip to the National Supercomputing Conference in Atlanta, Georgia in November, or some other cool prize? If so, please register and submit a title and abstract for your poster.
Select Registration and follow the instructions for the Poster Competition. Following your submission, you will be contacted regarding any further needed details, and you will later receive the poster selection results. If selected, you will be given further instructions and the judging criteria. NOTE: Master's students and postdocs are welcome to compete as well. (Undergraduates doing appropriate research can potentially compete, but their faculty sponsor should contact Tony Elam (tony.elam@uky.edu) to request and discuss their participation.)
Important Dates:
Deadline for speaker abstract submission: Friday, September 27, 2024
Speaker selection notification: Friday, October 4, 2024
Deadline for poster competition submission: Wednesday, October 9, 2024
Poster acceptance notification: Friday, October 11, 2024
Summit Industry Day & Education Session - Tuesday, October 15, 2024
Summit Keynotes & Research Session (non-medical) – Wednesday, October 16, 2024
(Setup of Posters will be that morning in the Gatton Student Center Ballroom area)
Address all questions to:
Tony Elam
Associate Director, CCS
University of Kentucky
Office phone: (859) 257-2326
Cell phone: (713) 859-9860
Email: tony.elam@uky.edu