jon-chun/README.md

AI Digital Humanities

Jon A Chun
Co-Founder, Kenyon DH Colab



 

Contents (UPDATED July 2024, see www.jonachun.com for archived 2023 information)



Overview


Hello, my name is Jon Chun and I'm an interdisciplinary ML/AI researcher and educator. I focus on bridging traditional academic divisions, AI research, industry best practices, and related social topics like government regulation, ethics and entrepreneurship. My research centers on ML/AI approaches to language, narrative, emotion, cognition, and persuasion/deception using data science, statistical machine learning and deep learning including NLP, LLM, and LMM. I also work on eXplainable AI (XAI), fairness-accuracy-transparency-explainability (FATE), AI ethical auditing and AI regulation.

I'm a lifetime entrepreneur, intrapreneur, and innovator in diverse fields from network security and education to finance, insurance, and healthcare across environments ranging from hyper-efficient Silicon Valley startups to hidebound traditions in higher education. I've presented and published the first interdisciplinary AI research on storytelling and emotion at major conferences and journals like Narrative and the Modern Language Association (MLA). In 2016 I co-founded the world’s first human-centered AI curriculum to engage domain experts from every discipline, from literature and music to political science and economics. To the best of our knowledge, we coined the term ‘AI Digital Humanities’ and have mentored over 300 ML/AI DH projects with approximately 60,000 downloads from top institutions worldwide as of October 2024. I'm a co-principal investigator for the US NIST AI Safety Institute representing the Modern Language Association and with the IBM-Notre Dame Tech Ethics Lab researching LLM prediction capabilities.

Previously, I co-founded the world’s largest privacy and anonymity website with investors including In-Q-Tel. I took over as CEO to pivot the company in the face of collapsing ad revenue, co-authored several patents on the first browser-based VPN appliance and successfully sold the Silicon Valley startup to Symantec. There, as a Director of Development, I oversaw the successful launch of our rebranded product. Before that, I served as CIO for the premier boutique return-to-work firm in Silicon Valley and co-founded and was CTO for two international startups in Latin America and Japan. In medical school, I was an American Heart Association research fellow and published on gene therapy and the first web-based electronic medical record system in the American Medical Informatics Journal. In grad school, I was the first US-based Japanese localization engineer for DELL and first engineer analyst of Japanese patents for the US semiconductor consortium SEMATECH. I also worked in financial reporting in Tokyo, at the Lawrence Berkeley Lab’s synchrotron facility (ALS), and at Computer Associates serving the aerospace IT industry.

There isn’t a topic I’m not curious about, although keeping up with AI is my primary obsession. It’s exciting to work at the epicenter of technology that mirrors so many core human traits in a field that progresses weekly and is poised to terraform every sphere of humanity. I have enjoyed working on exceptionally driven, focused, curious and creative teams on high-impact projects. I like collaboration, creative engineering, making functional design beautiful, presentations, and sales. I speak English (native), Spanish (US Foreign Service Exam), Japanese (日本語能力試験), French (college) and would like a chance to relearn my forgotten Chinese (college). Former baseball, soccer, wrestling, robotics, and improv Destination Imagination coach.

Research


I created the open-source library SentimentArcs in 2019, at the time the largest ensemble for diachronic sentiment analysis and the basis for Katherine Elkins's “The Shapes of Stories” (Cambridge UP 2022). I presented some of the earliest GPT-2 story generation work at Narrative2020 and have since published in Cultural Analytics and Narrative on AI and narrative. I've mentored approximately three hundred computational Digital Humanities projects since 2017 across virtually every department of Kenyon College as part of the Integrated Program for Humane Studies and the Scientific Computing programs. I co-founded the AI Digital Humanities Colab, the world's first human-centered AI Digital Humanities curriculum at Kenyon, and our AI KDH research colab. I currently have research papers pending on using LLMs to compare multiple translations of Proust, multimodal (dialog+image) diachronic sentiment arcs in film, emotional hacking of LLM high-stakes decision-making, a novel benchmark on semantic similarity, IP infringement and creativity using Narrative theory, and an updated and expanding ethical audit of the leading LLMs. I'm also a co-author on an ICML position paper that was invited as an oral presentation this year in Vienna. My current research projects focus on AI persuasion, manipulation and deception as well as using LLMs for predictive analytics and decision-making on structured tabular data. (July 2024)

Recent Highlights

This research project targets a pivotal issue at the intersection of technology and ethics: surfacing how Large Language Models (LLMs) reason in high-stakes decision-making that affects humans. Our central challenge is enhancing the explainability and transparency of opaque black-box LLMs, and our specific use case is predicting recidivism, a real-world application that influences sentencing, bail, and early release decisions. To the best of our knowledge, this is the first study to integrate and contrast three different sources of ethical decision-making: humans, statistical machine learning (ML), and LLMs. Methodologically, we propose a novel framework that combines state-of-the-art (SOTA) qualitative analyses of LLMs with SOTA quantitative performance of traditional statistical ML models. Additionally, we compare these two approaches with documented predictions by human experts. This multi-model human-AI approach aims to surface faulty predictions across all three as well as correlate patterns of both valid and faulty reasoning by LLMs. This configuration offers a more comprehensive evaluation of their performance, fairness, and reliability that is essential for building trust in LLMs. The anticipated outcomes of our project include a test pipeline to analyze and identify discrepancies and edge cases in both predictions and the reasoning behind them. This pipeline includes automated API scripts, an array of simple to complex prompt engineering strategies, as well as various statistical analyses and visualizations. The pipeline architecture will be designed to generalize to other use cases and accommodate future models and prompt strategies to provide maximal reuse for the AI safety community and future studies. This project not only seeks to advance the field of XAI but also to foster a deeper understanding of how AI can be aligned with ethical principles. By highlighting the intricacies of AI decision-making in a context fraught with moral implications, we underscore the urgent need for models that are not only technologically advanced but also ethically sound and transparent.

  • DESCRIPTION: Emotional hacking high-stakes AI decision-making models (accepted, under review)

ANONYMIZED ABSTRACT: As artificial intelligence becomes increasingly integrated into various technologies and decision-making processes, concerns about trust, safety, and potential manipulation of humans by AI systems are growing. This study, however, explores the reverse scenario: how humans might influence AI decision-making. The research examines the impact of prompt reframing and empathetic backstories on the ethical decision-making processes of advanced language models. A novel benchmark is introduced, enabling human-in-the-loop evaluations of how both confidence and compassion in AI ethical decision-making are affected by framing and empathy. This research represents a pioneering effort in understanding the bidirectional nature of human-AI influence in ethical contexts.

As a powerful and rapidly advancing dual-use technology, AI offers both immense benefits and worrisome risks. In response, governing bodies around the world are developing a range of regulatory AI laws and policies. This paper compares three distinct approaches taken by the EU, China and the US. Within the US, we explore AI regulation at both the federal and state level, with a focus on California's pending Senate Bill 1047. Each regulatory system reflects distinct cultural, political and economic perspectives. Each also highlights differing regional perspectives on regulatory risk-benefit tradeoffs, with divergent judgments on the balance between safety versus innovation and cooperation versus competition. Finally, differences between regulatory frameworks reflect contrastive stances in regards to trust in centralized authority versus trust in a more decentralized free market of self-interested stakeholders. Taken together, these varied approaches to AI innovation and regulation influence each other, the broader international community, and the future of AI regulation.

Stories are central for interpreting experiences, communicating and influencing each other via films, medical, media, and other narratives. Quantifying the similarity between stories has numerous applications including detecting IP infringement, detecting hallucinations, search/recommendation engines, and guiding human-AI collaborations. Despite this, traditional NLP text similarity metrics are limited to short text distance metrics like n-gram overlaps and embeddings. Larger texts require preprocessing with significant information loss through paraphrasing or multi-step decomposition. This paper introduces AIStorySimilarity, a novel benchmark to measure the semantic distance between long-text stories based on core structural elements drawn from narrative theory and script writing. Based on four narrative elements (characters, plot, setting, and themes) as well as 31 sub-features within these, we use a SOTA LLM (gpt-3.5-turbo) to extract and evaluate the semantic similarity of a diverse set of major Hollywood movies. In addition, we compare human evaluation with story similarity scores computed three ways: extracting elements from film scripts before evaluation (Elements), directly evaluating entire scripts (Scripts), and extracting narrative elements from the parametric memory of SOTA LLMs without any provided scripts (GenAI). To the best of our knowledge, AIStorySimilarity is the first benchmark to measure long-text story similarity using a comprehensive approach to narrative theory. Code and data are available at https://github.com/anon.

Affective artificial intelligence and multimodal sentiment analysis play critical roles in designing safe and effective human-computer interactions and are used in diverse applications ranging from social chatbots to eldercare robots. However, emotionally intelligent artificial intelligence can also manipulate, persuade, and otherwise compromise human autonomy. We face a constant stream of ever more capable models that can better understand nuanced, complex, and interrelated sentiments across different modalities including text, vision, and speech. This paper introduces MultiSentimentArcs, a combination of an open and extensible multimodal sentiment analysis framework, a challenging movie dataset, and a novel benchmark. This enables the quantitative and qualitative identification, comparison, and prioritization of conflicting sentiments commonly arising from different models and modalities. Diachronic multimodal sentiment analysis is especially challenging in film narratives where actors, directors, cinematographers and editors use dialog, characters, and other elements in contradiction with each other to accentuate dramatic tension. MultiSentimentArcs uses local open-source software models to democratize artificial intelligence. We demonstrate how a simple 2-step pipeline of specialized open-source software with a large multimodal model followed by a large language model can approximate the video sentiment analysis of the commercial state-of-the-art Claude 3 Opus. To the best of our knowledge, MultiSentimentArcs is the first fully open-source diachronic multimodal sentiment analysis framework, dataset, and benchmark to enable automatic or human-in-the-loop exploration, analysis, and critique of multimodal sentiment analysis on long-form narratives. We demonstrate two novel coherence metrics and a methodology to identify, quantify, and explain real-world sentiment models and modalities.
MultiSentimentArcs integrates artificial intelligence with traditional narrative studies and related fields like film, linguistic and cultural studies. It also contributes to eXplainable artificial intelligence and artificial intelligence safety by enhancing artificial intelligence transparency in surfacing emotional persuasion, manipulation, and deception techniques. Finally, it can filter noisy emotional input and prioritize information rich channels to build more performant real-world human computer interface applications in fields like e-learning and medicine. This research contributes to the field of Digital Humanities by giving non-artificial intelligence experts access to directly engage in analysis and critique of research around affective artificial intelligence and human-AI alignment. Code and non-copyrighted data will be available at https://github.com/jon-chun/multisentimentarcs.

Machine translation metrics often fall short in capturing the challenges of literary translation in which translators play a creative role. Large Language Models (LLMs) like GPT4o and Mistral offer new approaches to assessing how well a translation mirrors the reading experience from one language to another. Our case study focuses on the first volume of Marcel Proust's “A la recherche du temps perdu,” a work known for its lively translation debates. We use stylometry and emotional arc leveraging the newest multilingual generative AI models to evaluate loss in translation according to different translation theories. AI analysis reveals previously undertheorized aspects of translation. Notably, we uncover changes in authorial style and the evolution of sentiment language over time. Our study demonstrates that AI-driven approaches leveraging advanced LLMs yield new perspectives on literary translation assessment. These methods offer insight into the creative choices made by translators and open up new avenues for understanding the complexities of translating literary works.

In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.

  • "Informed AI Regulation: Comparing the Ethical Frameworks of Leading LLM Chatbots Using an Ethics-Based Audit to Assess Moral Reasoning and Normative Values"
  • ArXiv.org (Jan 9, 2024)

With the rise of individual and collaborative networks of autonomous agents, AI is deployed in more key reasoning and decision-making roles. For this reason, ethics-based audits play a pivotal role in the rapidly growing fields of AI safety and regulation. This paper undertakes an ethics-based audit to probe the 8 leading commercial and open-source Large Language Models including GPT-4. We assess explicability and trustworthiness by a) establishing how well different models engage in moral reasoning and b) comparing normative values underlying models as ethical frameworks. We employ an experimental, evidence-based approach that challenges the models with ethical dilemmas in order to probe human-AI alignment. The ethical scenarios are designed to require a decision in which the particulars of the situation may or may not necessitate deviating from normative ethical principles. A sophisticated ethical framework was consistently elicited in one model, GPT-4. Nonetheless, troubling findings include underlying normative frameworks with clear bias towards particular cultural norms. Many models also exhibit disturbing authoritarian tendencies. Code is available at https://github.com/jonchun/llm-sota-chatbots-ethics-based-audit.

  • "eXplainable AI with GPT4 for story analysis and generation: A novel framework for diachronic sentiment analysis"
  • Springer International Journal of Digital Humanities 5, 507–532 (2023). https://doi.org/10.1007/s42803-023-00069-8 (Oct 11, 2023)

The recent development of Transformers and large language models (LLMs) offers unique opportunities to work with natural language. They bring a degree of understanding and fluidity far surpassing previous language models, and they are rapidly progressing. They excel at representing and interpreting ideas and experiences that involve complex and subtle language and are therefore ideal for Computational Digital Humanities research. This paper briefly surveys how XAI can be used to augment two Computational Digital Humanities research areas relying on LLMs: (a) diachronic text sentiment analysis and (b) narrative generation. We also introduce a novel XAI greybox ensemble for diachronic sentiment analysis generalizable to any AI classification data points within a structured time series. Under human-in-the-loop supervision (HITL), this greybox ensemble combines the high performance of SOTA blackbox models like gpt-4-0613 with the interpretability, efficiency, and privacy-preserving nature of whitebox models. Two new local (EPC) and global (ECC) metrics enable multi-scale XAI at both the local and global levels. This greybox ensemble framework extends the SentimentArcs framework with OpenAI’s latest GPT models, new metrics and a modified supervisory HITL workflow released as open source software at https://github.com/jon-chun/SentimentArcs-Greybox.

This article outlines what a successful artificial intelligence digital humanities (AI DH) curriculum entails and why it is so critical now. Artificial intelligence is rapidly reshaping our world and is poised to exacerbate long-standing crises including (1) the crisis of higher education and the humanities, (2) the lack of diversity, equity and inclusion (DEI) in computer science and technology fields and (3) the wider social and economic crises facilitated by new technologies. We outline a number of ways in which an AI DH curriculum offers concrete and impactful responses to these many crises. AI DH yields meaningful new avenues of research for the humanities and the humanistic social sciences, and offers new ways that higher education can better prepare students for the world into which they graduate. DEI metrics show how an AI DH curriculum can engage students traditionally underserved by conventional STEM courses. Finally, AI DH educates all students for civic engagement in order to address both the social and economic impacts of emerging AI technologies. This article provides an overview of an AI DH curriculum, the motivating theory behind design decisions, and a detailed look into two sample courses.

SOTA Transformer and DNN short text sentiment classifiers report over 97% accuracy on narrow domains like IMDB movie reviews. Real-world performance is significantly lower because traditional models overfit benchmarks and generalize poorly to different or more open domain texts. This paper introduces SentimentArcs, a new self-supervised time series sentiment analysis methodology that addresses the two main limitations of traditional supervised sentiment analysis: limited labeled training datasets and poor generalization. A large ensemble of diverse models provides a synthetic ground truth for self-supervised learning. Novel metrics jointly optimize an exhaustive search across every possible corpus:model combination. The joint optimization over both the corpus and model solves the generalization problem. Simple visualizations exploit the temporal structure in narratives so domain experts can quickly spot trends, identify key features, and note anomalies over hundreds of arcs and millions of data points. To our knowledge, this is the first self-supervised method for time series sentiment analysis and the largest survey directly comparing real-world model performance on long-form narratives.
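The idea of an ensemble acting as a synthetic ground truth can be illustrated with a toy sketch. This is not the SentimentArcs implementation: the model names, the z-score standardization, and the per-segment median are all assumptions chosen only to show how several models scoring the same text on different scales might be combined into one reference series.

```python
from statistics import mean, median, pstdev


def zscore(series):
    """Standardize one model's scores so models on different scales are comparable."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no signal; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def ensemble_arc(model_scores):
    """Combine per-model sentiment series into one reference series.

    model_scores maps a (hypothetical) model name to a list of per-segment
    sentiment scores. All lists must have the same length.
    Returns the per-segment median of the standardized series.
    """
    lengths = {len(s) for s in model_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same number of text segments")
    standardized = [zscore(s) for s in model_scores.values()]
    # zip(*...) walks the series segment by segment across all models.
    return [median(column) for column in zip(*standardized)]


# Toy scores for five text segments from three hypothetical models.
scores = {
    "lexicon_model": [-1.0, -0.5, 0.0, 0.5, 1.0],
    "ml_model": [0.1, 0.2, 0.5, 0.8, 0.9],
    "transformer_model": [1, 2, 3, 4, 5],
}
reference = ensemble_arc(scores)
```

The median is used here rather than the mean so that one outlier model does not dominate a segment; other aggregation choices are equally plausible.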

  • Chun, Jon. AI Improv DivaBot in collaboration with Katherine Elkins, James Dennen (Denison University and Wexner Arts), Lauren Katz (Thymele Arts, LA), 100th anniversary of the premiere of “R.U.R.,” by Czechoslovakian playwright Karel Capek. “R.U.R.” (for “Rossum’s Universal Robots”) opened on January 25th, 1921, at the National Theater of Prague and marks the first use of the word “robot,” coined by Capek and derived from the Czech word for “forced labor.”, 25 Jan 2021

  • Elkins, Katherine, and Jon Chun. "Can GPT-3 pass a Writer’s Turing Test?." Journal of Cultural Analytics 5, no. 2 (2020): 17212.

Until recently the field of natural language generation relied upon formalized grammar systems, small-scale statistical models, and lengthy sets of heuristic rules. This older technology was fairly limited and brittle: it could remix language into word salad poems or chat with humans within narrowly defined topics. Recently, very large-scale statistical language models have dramatically advanced the field, and GPT-3 is just one example. It can internalize the rules of language without explicit programming or rules. Instead, much like a human child, GPT-3 learns language through repeated exposure, albeit on a much larger scale. Without explicit rules, it can sometimes fail at the simplest of linguistic tasks, but it can also excel at more difficult ones like imitating an author or waxing philosophical.




SentimentArcs is the open-source code for
The Shapes of Stories
by Katherine Elkins
(Cambridge University Press, Aug 2022)

Sentiment analysis has gained widespread adoption in many fields, but not―until now―in literary studies. Scholars have lacked a robust methodology that adapts the tool to the skills and questions central to literary scholars. Also lacking has been quantitative data to help the scholar choose between the many models. Which model is best for which narrative, and why? By comparing over three dozen models, including the latest Deep Learning AI, the author details how to choose the correct model―or set of models―depending on the unique affective fingerprint of a narrative. The author also demonstrates how to combine a clustered close reading of textual cruxes in order to interpret a narrative. By analyzing a diverse and cross-cultural range of texts in a series of case studies, the Element highlights new insights into the many shapes of stories.




Innovation in Higher Ed


I creatively apply the best of industry practices and state-of-the-art AI/ML techniques on interesting and high-impact interdisciplinary research. The combination of AI/ML, math/statistics and a diversity of domain expertise provides fresh insights and countless new paths of discovery.

I've also long been interested in bringing diverse voices to urgent debates surrounding technology’s growing impact on society. As of 2022, our AI Digital Humanities computing curriculum has attracted students who are majority female (61%) and overwhelmingly non-STEM (91%), with participation from Under-Represented Minorities (11% Hispanic, 13% Black). Enrollment has grown steadily, making our foundational course one of the most popular on campus. Both our research and that of our students have seen exponential growth in citations and thousands of visits from top academic institutions around the world.

Over most of the last decade, I have been developing a new human-first approach to teaching computation grounded in ML, AI and Data Science with real-world applications inseparable from ethics. One challenge was to bridge the STEM and non-STEM divide. Another challenge was harmonizing the rigorous specialization of academia with practical, interdisciplinary and generalizable real-world solutions. The final challenge was to bootstrap an entirely new AI Digital Humanities computing curriculum without a budget, support staff, or academic credit toward any major/minor.

Over the first 6 years, our foundational course has become one of the most popular on campus. Both our professors' and students' research have been published in top journals, presented at leading conferences and have been read by thousands from top universities and research centers around the world. Both founders of our program have been involved in several organizations beyond Kenyon dedicated to AI, Ethics and innovating CS Education.

Philosophically, my goal is to cultivate in students a technologically informed worldview grounded in universal humanistic values. This integrated worldview is designed to intimately align the core strengths of traditional education with more ethical, practical and beneficial uses of technology for all.






Diversity from A Human-Centered AI Curriculum





UPDATE: Progress on URM Diversity

Fall 2022 IPHS 200 Programming Humanity (estimate)
Category   Count   Percent
Male       41      53%
Female     36      47%
TOTAL      78      100%
  • 13% African-American (10)




Progress on Gender Diversity in AI Digital Humanities curriculum since
the 2017-2018 academic year
(61% female as of Spring 2022)



At Kenyon College, I co-founded the world’s first human-centric AI curriculum. I am the sole technical advisor and the primary collaborative content creator. Over the last six years of teaching this curriculum, we have achieved the following milestones:

Research: Published research in top publications and conferences (Cambridge UP, Narrative, Journal of Cultural Analytics, etc.) with clear growth in citations.

AI Digital Humanities/DH Colab Research: Organically grew (no marketing/PR) to ~15k hits from top universities worldwide (#4 CMU, #5 Berkeley, #6 Stanford, #7 Columbia, #9 NYU, #16 Princeton, #22 Oxford, #23 MIT, #25 Cambridge, etc.)

Diversity:

  • Female: Grew from 18% to 61% between 2017 and 2021
  • Hispanic: Participation rates are often at or above college averages
  • Black: 13% (Fall 2022 estimate above)
  • Non-STEM: Our classes are ~90% non-STEM from across nearly all departments, enfranchising many students who may otherwise feel alienated by traditional CS programs
  • Pass rate: 100% (quality of student work independently confirmed by the success of their research archive at digital.kenyon.edu/dh)
  • Drop rate: 0%

Enrollment: Grew from 20 to 120 between 2017 and 2022, becoming one of the largest classes at Kenyon despite being an elective with no credit toward the traditional STEM computing major/minor

Budget: With no budget or antecedent, built from scratch a globally recognized computational DH Colab research center and AI Digital Humanities curriculum. This includes no funds for hardware, software, cloud computing, support staff or other common expenses. This is achieved through continual strategic planning and the careful curation and testing of fully open-source, robust, best-of-breed and/or freely available resources, informed by decades of experience in industry.

Our interdisciplinary AI DH research has been published in top presses, journals, and conferences. We have also mentored hundreds of ML/AI DH projects that synthesize Artificial Intelligence with literature, history, political science, art, dance, music, law, medicine, economics and more. Various sample AI DH projects are given at the bottom of this page.

Timeline

  • 1992-99: The Integrated Program for Humane Studies (IPHS, the oldest interdisciplinary program at Kenyon) established a computer lab in Timberlake House for DH scholarship under Director Michael Brint
  • 2002 Jul: Katherine Elkins joined Kenyon and began mentoring traditional Digital Humanities projects (e.g. critiques of technology, websites, media, etc.) in the IPHS program
  • 2003 May: Launched product Symantec Clientless VPN appliance as Director of Development and relocated from Silicon Valley
  • 2005 Mar: Proposed new humanity-centered AI Digital Humanities curriculum in conjunction with a multi-million Ewing Marion Kauffman Foundation grant
  • 2015 Aug: Formulated detailed interdisciplinary AI Digital Humanities curriculum after years of research and training
  • 2017 Mar: Led DH Kenyon Team at the HackOH5 Hackathon to explore challenges and opportunities in implementing computational Digital Humanities and effecting collaboration across disciplines
  • 2017 Aug: Kenyon supports the first 'Programming Humanity' course co-taught with a Humanities and Comparative Literature professor.
  • 2018 Aug: Kenyon adds first 'AI for the Humanities' course with a differentiated approach to GOFAI/ML through DNN, RL, and GA
  • 2018 Aug: Katherine Elkins awarded a multi-year National Endowment for the Humanities Distinguished Professorship to continue developing a campus-wide Digital Humanities program to include every interested department
  • 2022 Jan: Collaboration with Scientific Computing program at Kenyon mentoring several majors on interdisciplinary research
  • 2022 Aug: Kenyon offers first computational 'Cultural Analytics' DH methodology course for Social Sciences and Humanities
  • 2022 Aug: First collaboration with local industry via 'Industrial IoT Independent Study' targeting technical reference implementation and strategic whitepaper



Kenyon College's
The National Endowment for the Humanities Professorship



Our AI research and DHColab were collaboratively developed, and the curriculum is currently co-taught by a technology expert (Jon Chun) and an accomplished academic (Katherine Elkins). Both have broad experiences, publications, and interests transcending traditional domain boundaries. Support was provided with a 3-year National Endowment for the Humanities (NEH) appointment described here.



Collaborator Katherine Elkins's work as
Kenyon College's National Endowment for the Humanities Professorship



A Humanity-First approach to AI Digital Humanities
consistently attracts over 90% non-STEM majors
(Kenyon College Institutional Research)






Code, Products and Patents




Block Diagram for
SentimentArcs Notebooks

Stories are everywhere. Here are a few examples of original research projects using SentimentArcs to extract and analyze narrative emotional arcs in:



Multimodal SentimentArcs: Royal Wedding (1951) Video 10% SMA Plot (2024)



Multimodal SentimentArcs: Royal Wedding (1951) Transcript 10% SMA Plot (2024)



Royal Wedding (1951) KDE Plot
Multimodal SentimentArcs: Royal Wedding (1951) KDE Plot (2024)
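The "10% SMA" plots above refer to a simple moving average whose window is 10% of the series length. The sketch below illustrates that smoothing idea only; it is not taken from the SentimentArcs notebooks, and the function name `sma_smooth`, the rounding rule for the window, and the sample scores are all assumptions made for illustration.

```python
from typing import List, Sequence


def sma_smooth(scores: Sequence[float], window_fraction: float = 0.10) -> List[float]:
    """Return the simple moving average of `scores`.

    The window length is `window_fraction` of the series length (at least 1).
    Only full windows are averaged, so the result has
    len(scores) - window + 1 points.
    """
    if not scores:
        return []
    if not 0 < window_fraction <= 1:
        raise ValueError("window_fraction must be in (0, 1]")
    window = max(1, round(len(scores) * window_fraction))
    running = sum(scores[:window])          # sum of the first full window
    smoothed = [running / window]
    for i in range(window, len(scores)):
        running += scores[i] - scores[i - window]  # slide the window by one
        smoothed.append(running / window)
    return smoothed


# Hypothetical per-sentence sentiment scores in [-1, 1] (illustrative only).
sentence_scores = [0.2, 0.4, -0.1, -0.6, -0.3, 0.1, 0.5, 0.8, 0.6, 0.9,
                   0.1, -0.2, -0.5, -0.7, -0.4, 0.0, 0.3, 0.6, 0.7, 0.9]
arc = sma_smooth(sentence_scores, window_fraction=0.10)
```

A wider window fraction gives a smoother arc at the cost of fine-grained detail, which is why the window is expressed relative to the length of the text rather than as a fixed number of sentences.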




Kenyon AI Digital Humanities


Top 10 Institutions reading our AI DH Research in 2022
digital.kenyon.edu/dh





Leading Institutions reading our AI DH Research in 2022
digital.kenyon.edu/dh



Eurasian Institutions
digital.kenyon.edu/dh



Institutions from The Americas
digital.kenyon.edu/dh



Countries Worldwide
digital.kenyon.edu/dh



Institutions Worldwide (2023 May)
digital.kenyon.edu/dh







Social Media




@jonchun2000
Main Social Media Account








Mentored Research


Brainstorming to translate new theories into testable models for (a) Literary Analysis, (b) Financial Forensics and (c) the Latent Space of Generative Art Prompts.



Integrated Program for Humane Studies (2017-)






Course Descriptions


The virtuous cycle, feedback and tension between
the 3 models that guide our interdisciplinary innovation



Integrated Program for Humane Studies (2017-)

OVERVIEW:

This upper-division course explores advanced AI concepts in depth, focusing on interdisciplinary applications of large language models, AI information systems, and autonomous agents. Over 15 weeks, students work through a progressive curriculum that starts with a review of Python and continues with four hands-on projects: (a) programming a GPT-based chatbot with the OpenAI API, (b) mechanistic interpretation of transformer internals using Hugging Face Transformers, (c) Retrieval-Augmented Generation (RAG) using LangChain, and (d) simulations of autonomous multi-agent systems using AutoGen. These four substantive subprojects and one final project enable students to apply theoretical knowledge to practical, real-world AI challenges. The course is designed to equip students with the skills and knowledge needed to innovate in the rapidly evolving field of artificial intelligence, emphasizing both technical proficiency and ethical considerations. Introductory Python programming experience is required.

NOTE: These 4 broad frontiers of AI research are rapidly evolving, and the course draws on my AI research and industry consulting with Meta, IBM, the White House/NIST AI Safety Institute, etc. Major new AI research, libraries, frameworks and startups appear nearly every week. Since this course will begin 9 months after this syllabus was written, expect updates to reflect the most recent AI research breakthroughs, tooling, and industry best practices as of August 2024. Nonetheless, the class will be structured around these 4 broad and relatively stable areas of AI.



Scientific Computing Mentored Projects (2020-)

  • SciComp Senior Seminar/Research
    • Noisy Time Series Filtering, Smoothing and Feature Detection
    • Narrative Metrics for NLG using LLM Transformers
    • Diachronic Sentiment Analysis Central Bank Speeches using SentimentArcs
  • SciComp Independent Study






Organizations


  • US NIST AI Safety Institute Consortium
    • *Principal Investigator (2024-) for the Modern Language Association
    • Announcement: For over 100 years, the MLA has been the principal organization of scholars in language and literature, with over 25,000 members in over 100 countries. The MLA is joining more than 200 of the nation’s leading artificial intelligence (AI) stakeholders to participate in a US Department of Commerce initiative to support the development and deployment of trustworthy and safe AI. Established by the Department of Commerce’s National Institute of Standards and Technology (NIST) on 8 February 2024, the US AI Safety Institute Consortium (AISIC) brings together AI creators and users, academics, government and industry researchers, and civil society organizations to meet this mission. The MLA-sponsored team will be led by Katherine Elkins and Jon Chun at Kenyon College. The team will evaluate model capabilities with a special focus on linguistic edge cases and ethical frameworks.

The AISIC includes companies and organizations on the front lines of developing and using AI systems as well as the civil society and academic teams building the foundational understanding of AI’s potential to transform our society. Consortium members represent the nation’s largest companies and innovative startups; creators of the world’s most advanced AI systems and hardware; representatives of professions with deep engagement in AI’s use today; state and local governments; and nonprofits. The consortium will also work with organizations from other nations in order to establish interoperable and effective safety around the world.

  • Human Centered AI Lab.Org

    • *Cofounder (2023-)
    • About: Our mission is to facilitate efficient collaboration on interdisciplinary AI between individual researchers and domain experts separated by geographic, organizational, doctrinal, and legal divisions. We focus on human-centered AI topics like safety, bias, explainability, ethics, and policy grounded in careful experimentation and expert interpretation. Our goal is to enable fast, focused and flexible research funding and collaboration overlooked by traditional institutional research structures. We are a purely volunteer and wholly virtual non-profit corporation conducting human-centered AI research and education in the public interest.
  • The Helix Center, NY, NY

    • Executive Committee (2022-)
    • Round-Table: Living in Difficult Times, Nov 19, 2022
    • About: The original inspiration for interdisciplinary forums arose from the observations by our director, Dr. Edward Nersessian, of the constraints in both communication and creativity among scientists at professional meetings, fueled both by narrow specialization and the grant process, that with its demand for sharply defined investigation seemed, in fact, to be limiting curiosity and inquiry. This motivated him to form discussion groups drawing on multiple disciplines, the creative productivity of which inspired the formation of the Philoctetes Center for the Multidisciplinary Study of the Imagination.
    • Mission: The primary mission of The Helix Center is to draw together leaders from distinct spheres of knowledge in the arts, humanities, sciences, and technology for interdisciplinary roundtables, the unique format of which potentiates new ideas, new questions, and facilitates emergent creative qualities of mind less possible in conventional collaborations. Such a drawing together of leaders of various disciplines irrespective of their academic affiliation allows the Helix Center to function as a kind of university without walls. In addition, through audience attendance and its Q&A engagement with the roundtable participants, and live streamed and archived events, we aim to expand public understanding and appreciation of the sciences and technology, the arts and humanities.




Kenyon DHColab
(Kenyon AI Digital Humanities Colab)

Pinned

  1. sentimentarcs_notebooks (Public)

    SentimentArcs: a large ensemble of dozens of sentiment analysis models to analyze emotion in text over time

    Jupyter Notebook 35 8

  2. iiot-time-series-prediction-system (Public)

    An End-to-End Industrial IoT Time Series Prediction System

    Python

  3. cultural-analytics.github.io (Public)

    Computational Cultural Analytics Course at Kenyon College

    1