I am a Computer Science major at Syracuse University. I am interested in building data-driven applications and am currently delving into the world of front-end development. I am also an Army ROTC Cadet and a soldier in the New York Army National Guard, honing my leadership and teamwork abilities with the ultimate goal of one day serving as a Cyber Warfare Officer in the Army National Guard.
I love soccer (Liverpool is the best), chasing new running PRs, spending time outside, and exploring the world.
Research
YouTube Comment Bot Detector DARPA
In March 2022, in the early weeks of Russia's full-scale invasion of Ukraine, a video surfaced in which Ukrainian President Volodymyr Zelenskyy appeared to call on Ukrainian soldiers to immediately lay down their weapons. The video was a hoax: a deepfake created with artificial intelligence (AI). AI-enabled disinformation is a growing concern that threatens not only individuals but also organizations, countries, and arguably civilization itself. The Defense Advanced Research Projects Agency (DARPA) is working on ways to detect and counter such threats. DARPA's Semantic Forensics (SemaFor) program aims to develop semantic technologies that can detect and assess falsified media, including images, text, audio, and video.
The SemaFor team at Syracuse University, mentored by research professor Dr. Jason Davis, is a specialized research group dedicated to collecting, cleaning, and labeling data for training AI models as part of DARPA's broader SemaFor initiative. This work is funded by the Syracuse Office of Undergraduate Research and Creative Engagement (SOURCE).
Bots and fake comments have become a major problem on many social media platforms, and it is increasingly difficult for the average person to tell real comments from fake ones. While some bots merely manipulate a platform's algorithm to extend their reach, others are used to scam people out of their hard-earned money. In the image below, bots imitate a real conversation in which one eventually recommends a fake financial advisor. An unsuspecting victim would then search for the advisor online and find a webpage through which they can contact the scammer.
Using a Python script and the YouTube Data API, I collected over 10,000 comments from YouTube finance videos, where scam comments are most prevalent. I then labeled each comment conversation as malicious or not, using the following criteria:
- The comment must be part of a conversation thread of at least 3 comments.
- The conversation must include a comment that mentions a "financial advisor" and gives that advisor's name.
- The entire conversation must take place within one week of the first comment.
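The collection and labeling steps above can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes a `youtube` client object such as the one returned by `googleapiclient.discovery.build("youtube", "v3", developerKey=...)` from google-api-python-client, and it pages through the real `commentThreads.list` endpoint. The record fields, the helper names, and especially the name heuristic (two or more consecutive capitalized words) are hypothetical stand-ins for however names were actually identified.

```python
import re
from datetime import datetime, timedelta

def to_record(comment):
    """Flatten a YouTube Data API comment resource into the fields used for labeling."""
    snippet = comment["snippet"]
    # publishedAt is an ISO 8601 string such as "2024-01-01T00:00:00Z"
    published = datetime.fromisoformat(snippet["publishedAt"].replace("Z", "+00:00"))
    return {"text": snippet["textDisplay"], "likes": snippet["likeCount"], "published": published}

def fetch_threads(youtube, video_id, max_pages=10):
    """Page through commentThreads.list and return each thread as a list of records."""
    threads, token = [], None
    for _ in range(max_pages):
        params = {"part": "snippet,replies", "videoId": video_id,
                  "maxResults": 100, "textFormat": "plainText"}
        if token:
            params["pageToken"] = token
        response = youtube.commentThreads().list(**params).execute()
        for item in response.get("items", []):
            top = to_record(item["snippet"]["topLevelComment"])
            # The embedded replies may be only a subset of the full reply chain.
            replies = [to_record(r) for r in item.get("replies", {}).get("comments", [])]
            threads.append([top] + replies)
        token = response.get("nextPageToken")
        if not token:
            break
    return threads

# Hypothetical, crude name heuristic: two or more consecutive capitalized words.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

def mentions_named_advisor(text):
    """Criterion 2: mentions a financial advisor and contains a name-like phrase."""
    if "financial advisor" not in text.lower():
        return False
    # Drop the phrase itself so "Financial Advisor" is not mistaken for a name.
    remainder = re.sub(r"(?i)financial advisor", " ", text)
    return NAME_PATTERN.search(remainder) is not None

def is_malicious_thread(thread, window=timedelta(days=7)):
    """Apply all three labeling criteria to one conversation thread."""
    if len(thread) < 3:  # criterion 1: at least 3 comments
        return False
    if not any(mentions_named_advisor(c["text"]) for c in thread):  # criterion 2
        return False
    first = min(c["published"] for c in thread)  # criterion 3: within one week
    return all(c["published"] - first <= window for c in thread)
```

Keeping the client as a parameter means the paging and labeling logic can be exercised with a stub object instead of live API calls; the capitalized-word heuristic will produce false positives and would need manual review in practice.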
Of the more than 10,000 comments collected, roughly 300 were classified as bot comments. Although that is only 3.41% of the entire dataset, bots typically like each other's comments to manipulate the YouTube algorithm and boost their visibility. As a result, bot comments almost always have hundreds or even thousands of likes, which pushes them to the top of YouTube comment sections, where viewers are far more likely to see them.
Comparing average comment length also shows that bot comments are, on average, roughly 50 words longer than typical comments. This is because bots try to imitate a real conversation rather than simply leave a brief remark in a YouTube comment section. Comment length is therefore a key feature the model can use to distinguish bot comments from regular ones.
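The bot share, like counts, and length gap described above come from simple per-class aggregates. The sketch below shows one way to compute them with the standard library; the `LabeledComment` record and its fields are hypothetical, and the rows in any example would be toy values rather than the real dataset.

```python
from statistics import mean
from typing import NamedTuple

class LabeledComment(NamedTuple):
    """Hypothetical labeled record: comment text, like count, and bot label."""
    text: str
    likes: int
    is_bot: bool

def class_stats(group):
    """Mean word count and mean like count for one class."""
    return {"mean_words": mean(len(c.text.split()) for c in group),
            "mean_likes": mean(c.likes for c in group)}

def summarize(comments):
    """Bot share of the dataset plus per-class length and engagement statistics."""
    bots = [c for c in comments if c.is_bot]
    humans = [c for c in comments if not c.is_bot]
    return {"bot_share": len(bots) / len(comments),
            "bot": class_stats(bots),
            "human": class_stats(humans)}
```

On the real data, `summary["bot"]["mean_words"] - summary["human"]["mean_words"]` is the length gap, and comparing `mean_likes` across classes quantifies the engagement manipulation.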
Another key feature, and arguably the most important one, is the use of names. The core of the scam is getting a real person to search for the fictional financial advisor, with the goal of luring the victim into sending money. Text classifiers first apply tokenization, splitting a comment into individual tokens (typically words), and then learn which tokens signal each class. Here, the model would learn that a name such as “Grace Adams Cook” is a strong clue that a comment is likely from a bot.
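To make tokenization concrete, here is a minimal preprocessing sketch in the spirit of the NLTK pipeline this page describes (tokenize, remove stop words, stem). Two substitutions are assumptions for the sake of a self-contained example: a simple regex tokenizer replaces NLTK's `word_tokenize`, and scikit-learn's built-in `ENGLISH_STOP_WORDS` replaces NLTK's stopword corpus, which requires a separate `nltk.download("stopwords")`. NLTK's `PorterStemmer` is used as-is.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

stemmer = PorterStemmer()

def preprocess(comment):
    """Lowercase, tokenize on letters, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", comment.lower())  # simple regex tokenizer
    return [stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS]
```

Running `preprocess("The advisor Grace Adams Cook helped me!")` drops filler words like "the" and "me" and keeps stemmed tokens for each part of the name, which is what lets a bag-of-words model associate those tokens with the bot class.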
My current model is a feedforward neural network (a multilayer perceptron), still in its early stages, trained on TF-IDF vectorized text data to classify bot-generated comments. The model consists of an input layer with 186 features, corresponding to the most relevant words in the dataset, followed by four hidden layers of 500 neurons each, using ReLU activation to introduce non-linearity. The output layer applies LogSoftmax activation for binary classification (bot or not bot). Training uses the negative log-likelihood loss (NLLLoss) and the Adam optimizer (learning rate = 0.01), a gradient-based method for updating the weights. Before training, the text is preprocessed with NLTK, including stopword removal and stemming, to reduce noise in the data. While the current model uses only textual features, future iterations will incorporate like count, reply count, and comment timestamps as additional features, since bots often manipulate engagement metrics to gain visibility.
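The architecture described above can be sketched in a few lines. The page does not name the framework; this assumes PyTorch, since `NLLLoss` and `LogSoftmax` are PyTorch's names for these components. The TF-IDF step uses scikit-learn's `TfidfVectorizer(max_features=186)` as a plausible stand-in, and `BotClassifier`, `vectorize`, and `train` are hypothetical names. Training here is full-batch for brevity.

```python
import torch
from torch import nn
from sklearn.feature_extraction.text import TfidfVectorizer

NUM_FEATURES = 186  # input size stated in the text
HIDDEN = 500        # neurons per hidden layer

class BotClassifier(nn.Module):
    """Feedforward classifier: inputs -> 4 x 500 (ReLU) -> 2 classes with LogSoftmax."""
    def __init__(self, in_features=NUM_FEATURES, hidden=HIDDEN):
        super().__init__()
        layers, width = [], in_features
        for _ in range(4):
            layers += [nn.Linear(width, hidden), nn.ReLU()]
            width = hidden
        layers += [nn.Linear(width, 2), nn.LogSoftmax(dim=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)  # log-probabilities, as NLLLoss expects

def vectorize(texts, max_features=NUM_FEATURES):
    """Fit TF-IDF on preprocessed texts; return a float tensor and the vectorizer."""
    vectorizer = TfidfVectorizer(max_features=max_features)
    X = vectorizer.fit_transform(texts).toarray()
    return torch.tensor(X, dtype=torch.float32), vectorizer

def train(model, X, y, epochs=20, lr=0.01):
    """Full-batch training with NLLLoss and Adam; returns the final loss."""
    loss_fn = nn.NLLLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    return loss.item()
```

On a small corpus the vocabulary can be smaller than 186 words, so the model is built with `BotClassifier(in_features=X.shape[1])` to match the actual TF-IDF width. Pairing `LogSoftmax` with `NLLLoss` is equivalent to applying `CrossEntropyLoss` to raw logits.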
Eventually, I plan to explore alternative machine learning techniques, such as random forest classifiers, decision tree classifiers, and Transformer-based models, and compare their bot-detection performance against our neural network. By integrating these techniques and refining our training pipeline, I aim to improve classification accuracy and develop a more robust model capable of detecting evolving bot behavior.
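A comparison of the tree-based alternatives could start from scikit-learn's cross-validation utilities. This is a sketch under assumptions, not the planned evaluation: it reuses a 186-feature TF-IDF input, scores with F1 because bots are a small minority of the data (so accuracy alone would be misleading), and does not cover the Transformer option. `compare_models` is a hypothetical helper name.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

def compare_models(texts, labels, folds=3):
    """Mean cross-validated F1 (bot = positive class) for each candidate model."""
    candidates = {
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    results = {}
    for name, clf in candidates.items():
        # Fitting TF-IDF inside the pipeline keeps test folds out of the vocabulary.
        pipeline = make_pipeline(TfidfVectorizer(max_features=186), clf)
        scores = cross_val_score(pipeline, texts, labels, cv=folds, scoring="f1")
        results[name] = scores.mean()
    return results
```

Wrapping the vectorizer and classifier in one pipeline avoids data leakage during cross-validation, which matters when comparing models on a dataset this imbalanced.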
Ultimately, the goal is to create a model that can reliably identify phishing comments on YouTube. By implementing technologies such as this, we can reduce the chances of people falling victim to scams. Though our project is still in development, we hope that one day social media platforms such as YouTube can use our technology to create a safer internet for all.
Personal Projects

Analyzing Syracuse Municipal Complaint Resolution CuseHacks Datathon
This project analyzed municipal complaint resolution efficiency in Syracuse, leveraging data visualization and predictive modeling.
- Recognized for Best Presentation at CuseHacks 2025 for delivering compelling visual storytelling and actionable municipal insights.
- Used Python, pandas DataFrames, and Seaborn to clean, categorize, and visualize large-scale city complaint reports, revealing key trends in departmental performance and resolution efficiency.
- Applied linear regression with scikit-learn (sklearn) to predict complaint resolution times and evaluate how different city departments improve over time.
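One way to measure whether a department improves over time is to regress each complaint's resolution time on when it was opened and read the slope per department. The sketch below assumes a hypothetical table with `department`, `opened`, and `closed` columns; the actual Syracuse dataset's schema and the project's exact model may differ.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def resolution_trend_by_department(df):
    """Per-department slope of resolution time (days) against days since the first complaint.

    Assumes hypothetical columns: department, opened, closed (date strings or datetimes).
    A negative slope means complaints are being resolved faster over time.
    """
    df = df.copy()
    df["opened"] = pd.to_datetime(df["opened"])
    df["closed"] = pd.to_datetime(df["closed"])
    df["resolution_days"] = (df["closed"] - df["opened"]).dt.total_seconds() / 86400
    df["t"] = (df["opened"] - df["opened"].min()).dt.days  # time axis in days
    slopes = {}
    for dept, group in df.groupby("department"):
        model = LinearRegression().fit(group[["t"]], group["resolution_days"])
        slopes[dept] = float(model.coef_[0])
    return slopes
```

Multiplying a slope by 30 gives the approximate change in resolution time per month, which is easier to compare across departments.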

Data2LinearRegression CuseHacks Hackathon
The project's goal is to provide a web interface where a user can upload a CSV file, store its data in a SQLite database, and create a linear regression model based on two selected variables.
- Developed a web-based application utilizing Python, Flask, and SQLite that allows users to upload CSV files, organize the data into SQL tables, and generate linear regression models for data analysis.
- Implemented a front-end interface with HTML, CSS, and JavaScript, featuring drag-and-drop functionality, input validation for assigning graph variables, and regression plot visualization, ensuring a user-friendly experience.
- Integrated Matplotlib and Seaborn to generate real-time regression plots for visual analysis.

Chess Personal
The goal of this project is to create a modified version of chess that can be played in the terminal.
- Designed and implemented a functional modified version of chess, using algorithms to track game state and keep the board consistent.
- Utilized object-oriented design principles to develop chess piece classes (Pawn, Rook, etc.) with specific movement rules and validations.
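The class-per-piece design can be illustrated with an abstract base class whose subclasses encode their own movement rules. This sketch follows standard chess rules for a rook and a pawn, since the project's modifications are not described here; the board representation (a dict from `(row, col)` to `Piece`) and all names are hypothetical.

```python
from abc import ABC, abstractmethod

def on_board(square):
    """True if a (row, col) square lies on the 8x8 board."""
    row, col = square
    return 0 <= row < 8 and 0 <= col < 8

class Piece(ABC):
    """Base class; `board` maps (row, col) -> Piece for occupied squares only."""
    def __init__(self, color):
        self.color = color

    def can_enter(self, end, board):
        """A destination is legal if it is empty or holds an opposing piece."""
        target = board.get(end)
        return target is None or target.color != self.color

    @abstractmethod
    def is_valid_move(self, start, end, board):
        """Return True if moving from start to end follows this piece's rules."""

class Rook(Piece):
    def is_valid_move(self, start, end, board):
        (r1, c1), (r2, c2) = start, end
        if not on_board(end) or start == end or (r1 != r2 and c1 != c2):
            return False
        # Unit step toward the destination along the row or column.
        dr, dc = (r2 > r1) - (r2 < r1), (c2 > c1) - (c2 < c1)
        r, c = r1 + dr, c1 + dc
        while (r, c) != end:  # every square strictly between must be empty
            if (r, c) in board:
                return False
            r, c = r + dr, c + dc
        return self.can_enter(end, board)

class Pawn(Piece):
    def is_valid_move(self, start, end, board):
        (r1, c1), (r2, c2) = start, end
        if not on_board(end):
            return False
        step = 1 if self.color == "white" else -1  # white moves toward higher rows
        home_row = 1 if self.color == "white" else 6
        target = board.get(end)
        if c1 == c2 and target is None:  # forward moves need an empty destination
            if r2 - r1 == step:
                return True
            return r1 == home_row and r2 - r1 == 2 * step and (r1 + step, c1) not in board
        if abs(c2 - c1) == 1 and r2 - r1 == step:  # diagonal moves only to capture
            return target is not None and target.color != self.color
        return False
```

Because each subclass owns its validation, the game loop can call `piece.is_valid_move(start, end, board)` without knowing the piece type, and a modified rule only changes one class.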
Coursework
CIS 321: Introduction to Probability & Statistics
Programming-oriented introduction to fundamentals in statistics and probability; elementary statistics, graphical and numerical representation; probability distributions; tests and confidence intervals; regression and correlation.
CIS 341: Computer Organization & Programming Systems
Digital logic, data types and their representations, instruction set architecture, assembly language, program construction, processors, memory hierarchy, traps and interrupts, privilege and security, input-output subsystems.
CIS 352: Programming Language: Theory & Practice
Environments, stores, scoping, functional and imperative languages, modules, classes, data encapsulation, types, and polymorphism. Implementation of these constructs in a definitional interpreter.
CSE 384: Systems and Network Programming
Unix programming and shell scripting for systems and network software. Makefiles, compilers, linkers, debuggers, software with multiple source files. Dynamic memory allocation, system calls, C programming, pointers, concurrent/parallel programming, defensive programming techniques, network programming.
CIS 375: Introduction to Discrete Mathematics
Basic set theory and symbolic logic. Methods of proof, including mathematical induction. Relations, partitions, partial orders, functions, and graphs.
CIS 351: Data Structures
Abstract data structures including arrays, lists, trees, binary search trees, priority queues, graphs. Algorithm analysis. Examples include data structures used for security-related applications.
CIS 252: Elements of Computer Science
Introduction to key computer-science concepts through functional programming. Recursion, data representation, data abstraction, and computational patterns. Algebraic data types and higher-order functions. Models of computation.
PHI 251: Logic
Logic as a formal language, as a component of natural language, and as a basis of a programming language. Varieties of logical systems and techniques. Syntax, semantics and pragmatics.