publications
2022
- Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity. Simmons, Gabriel. 2022
Large Language Models (LLMs) have demonstrated impressive capability in generating fluent text. LLMs have also shown a tendency to reproduce social biases such as stereotypical associations between gender and occupation. Like race and gender, morality is an important social variable. This work investigates whether LLMs reproduce the moral biases associated with political groups in the United States, an instance of a broader capability I refer to as "moral mimicry". I explore this hypothesis in the GPT-3/3.5 and OPT families of Transformer-based LLMs. Using tools from Moral Foundations Theory, I show that these LLMs are indeed "moral mimics". When prompted with a "liberal" or "conservative" political identity, the models generate text reflecting the moral biases associated with these groups. I investigate how moral mimicry relates to model scale. I hope that this work encourages further investigation of the moral mimicry capability, including how to leverage it for social good and minimize its risks.
2020
- Identification of Differential, Health-Related Compounds in Chardonnay Marc through Network-Based Meta-Analysis. Simmons, Gabriel, Lee, Fanny, Kim, Minseung, Holt, Roberta, and Tagkopoulos, Ilias. Current Developments in Nutrition, 2020
Food/residue waste streams may be a significant source of bioactive compounds that benefit human health. Dietary intervention trials demonstrate the health benefits of such residues, but they are resource and time intensive. Bioinformatics meta-analyses can elucidate putative pathways, genes and chemicals that are relevant to human health, hence guiding further experimentation and intervention trials. To this end, we integrated publicly available phytochemical datasets related to general grape marc from different varieties (GM) and Chardonnay grape marc (CM) to investigate their differences and potential implications to human health through a network-based meta-analysis. To characterize the phytochemical profile of grape marc, compositional data was aggregated from publicly available literature. To identify potential health effects based on this chemical information, associations between disease states and the chemical profiles of GM/CM were extracted from the Comparative Toxicogenomics Database (CTD). Disease associative networks were constructed for a) marc products, b) all marc-related phenolics, c) compounds that are differentially abundant in CM. The union of available marc composition datasets from 14 articles contained 66 phenolic compounds; 29 of these were associated with at least 1 disease state in the CTD. There were 5 differentially over-abundant compounds in CM versus other grape marcs (red varietals n = 75, white varietals n = 57). These were flavan-3-ols catechin, epicatechin, epigallocatechin, gallocatechin, and proanthocyanidin C1 (P < 0.001); with gallocatechin unique to CM. Studies investigating marc products indicated associations to 15 diseases. CTD evidence from 934 studies associated the phenolic profile of GM to 358 diseases of 34 disease classes. Network-based meta-analysis suggested associations between GM and CM phenolics and several disease targets. This includes confirmatory associations between flavan-3-ols and cardiovascular disease outcomes. Chardonnay marc is not widely studied; however, the developed framework of network-based meta-analysis utilizing composition information provides a holistic view of the knowledge space for grape marc, and highlights suggested health effects that can guide future research programs. Sonomaceuticals, LLC.
2019
- Nutrient Estimation from 24-Hour Food Recalls Using Machine Learning and Database Mapping: A Case Study with Lactose. Chin, Elizabeth L., Simmons, Gabriel, Bouzid, Yasmine Y., Kan, Annie, Burnett, Dustin J., Tagkopoulos, Ilias, and Lemay, Danielle G. Nutrients, 2019
The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database, both of which require a license. Manual lookup of ASA24 foods into NDSR is time-consuming but currently the only way to acquire NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up into NDSR to obtain lactose estimates and split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common between ASA24 and the NCC database. Database matching algorithms were developed to match NCC foods to an ASA24 food using only nutrients (“Nutrient-Only”) or the nutrient and food descriptions (“Nutrient + Text”). For both methods, the lactose values were compared to the manual curation. Among machine learning models, the XGB-Regressor model performed best on held-out test data (R² = 0.33). For the database matching method, Nutrient + Text matching yielded the best lactose estimates (R² = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in ASA24.