Project Group Formation

Group formation due <2024-10-04 Fri>

Groups of 7
Signup assignment and Google sheet will be announced early next week
You don’t need to have an idea yet to form a group
But it’s good to start thinking
Hopefully this lecture will help

How do I find my team members?

You choose your groups
Piazza
Class Discord?*
Talk to your neighbors

* The class Discord is not an officially supported communication channel. I’m not on it and I’m not monitoring it; use it at your own discretion.

Project Proposal

Due <2024-10-11 Fri>

Assignment Description on the course website soon
Examples on course website soon

Length:

Submit a 1-page (single spaced) document in paragraph form
Include at least 1 system diagram

Content:

Describe what your group plans to do for the course project. Answer the following questions:

What type of project is this?

A research re-implementation?
Novel research?
A web app?
A profiling/performance evaluation report?
An ethics evaluation?
Something else?

Ways to choose:
- Choose a kind of problem first, then choose an algorithm (recommended)
- Choose an algorithm, then find a problem to solve (also ok)
- Choose an algorithm, choose a problem, then find out whether they are compatible (not recommended!)

How will this project make use of AI?

What kind of AI will be used in this project?
- Supervised Machine Learning
- Unsupervised Machine Learning
- Pretrained generative models
- Reinforcement Learning
- Search
- Retrieval
- Knowledge Representation
- …

What value could someone get by viewing or interacting with your project?

This is an introductory class, of course
But it helps to have a use case in mind
Some past student projects were quite creative
- Creative projects tended to have a very specific goal, motivated by a problem or topic area where some group members had prior exposure

Types of Projects

Research Re-implementation

Re-implement one of the algorithms from an AI research paper.
This involves
- reading the paper to understand what you’re implementing,
- writing code, and
- running (a subset of) the evaluations from your selected paper.

System Implementation

E.g. train a supervised machine learning model and evaluate its performance

System Evaluation

Evaluate an existing AI system for its functional performance.
This often includes an overall assessment of correctness/accuracy using a variety of metrics, as well as analysis of system failures.
How well does X algorithm perform on Y task? When does it fail? Why?
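
As a minimal sketch of this kind of evaluation, the Python below computes overall accuracy and collects the failing cases so they can be inspected. The `evaluate` helper, the toy `predict_parity` system, and the labeled examples are all hypothetical placeholders; a real project would substitute the AI system and task under study.

```python
def evaluate(predict, examples):
    """Return (accuracy, failures) for a predict function over (input, expected) pairs."""
    failures = []
    for x, expected in examples:
        got = predict(x)
        if got != expected:
            failures.append((x, expected, got))
    accuracy = 1 - len(failures) / len(examples)
    return accuracy, failures


# Hypothetical system under test: labels a number "even" or "odd",
# but has a deliberate bug for negative inputs.
def predict_parity(n):
    if n < 0:
        return "even"
    return "even" if n % 2 == 0 else "odd"


examples = [(2, "even"), (3, "odd"), (-1, "odd"), (-4, "even")]
accuracy, failures = evaluate(predict_parity, examples)
```

Here the failure list points at negative inputs, which is the "When does it fail? Why?" part of the evaluation.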

Performance Profiling/Evaluation/Optimization

Evaluate an AI system for its non-functional performance.
This often includes an assessment of how resource consumption (wall clock time, CPU time, memory, …) varies with input size (number of data points, number of features, …).
Profiling involves tracing system operation to identify the most time-consuming steps.
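
One rough way to measure how wall clock time varies with input size is sketched below. The helper names `time_call` and `scaling_table`, and the sorting workload, are assumptions made for illustration; swap in whatever system you are evaluating.

```python
import time


def time_call(fn, arg, repeats=3):
    """Return the best wall-clock time (seconds) of fn(arg) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


def scaling_table(fn, make_input, sizes):
    """Time fn on inputs of increasing size; returns a list of (size, seconds)."""
    return [(n, time_call(fn, make_input(n))) for n in sizes]


# Hypothetical workload: sorting a reversed list of n integers.
table = scaling_table(sorted, lambda n: list(range(n, 0, -1)), [1_000, 10_000, 100_000])
for n, seconds in table:
    print(f"n={n:>7}  {seconds:.6f} s")
```

For the tracing side of profiling, Python's standard `cProfile` module reports the time spent in each function.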

Novel Algorithm?

Develop a new AI technique or algorithm
Have seen only a few examples of this, from relatively advanced students
Doesn’t mean it’s impossible!
Highly recommend seeking advice from TAs or myself to check feasibility
Not recommended if this is your first “AI” class

Literature Survey

Synthesize trends in the way that AI is developed and used
Aggregate many primary sources into secondary commentary
Only project that does not require implementing or running an AI system
Most of the work is reading and synthesis
Cite at least 20 references
Include a review table
Search Semantic Scholar for “systematic review AI” for examples
Contact me if you’re interested in this track and need more help

Ethics or Socio-technical Systems Analysis

Analyze an existing system or technique from an ethical or STS perspective
Not just freeform commentary or personal opinion
Choose a framework/lens and justify its use
Must demonstrate concrete engagement with the system or technique under study
- E.g. collect inputs and outputs that demonstrate a particular failure mode

Types of AI

One view on intelligence is intelligence = problem-solving
Different kinds of AI ↔ Different problems
The class is linear, so some topics will come later than others
Hopefully this preview helps

Symbolic AI, Logic-based systems

Represent problems, facts, beliefs, as a system of symbols that are related to each other

Image credit: Dabbeeru, M. (August 18, 2021)

Example: Path-finding
- Nodes in a graph symbolize locations on a map
- Finding the shortest path in the graph means finding an efficient route from point A to point B
- Dijkstra’s, A*
  - Next week
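
To make the path-finding example concrete, here is a small sketch of Dijkstra's algorithm using Python's standard `heapq` module. The `campus` graph, its node names, and its edge weights are made up for illustration.

```python
import heapq


def dijkstra(graph, start, goal):
    """Return (cost, path) of the cheapest route from start to goal, or (inf, []) if none."""
    frontier = [(0, start, [start])]  # (cost so far, node, path taken)
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(frontier, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical map: nodes are locations, edge weights are travel times.
campus = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 7},
    "B": {"D": 3},
    "D": {},
}
```

Calling `dijkstra(campus, "A", "D")` finds the route A, C, B, D with total cost 6, which beats the more direct A, B, D route (cost 7).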

Example: Knowledge Graph Completion
- From a set of facts, what other facts can be inferred?
- I will attempt to squeeze this in around Week 4… TBD
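
One very simple way to infer new facts from old ones is to apply a hand-written rule until nothing new appears. The sketch below does this for a single transitivity rule; the `infer_transitive` helper, the `located_in` relation, and the facts themselves are illustrative assumptions, and this is only one of several approaches to knowledge graph completion.

```python
def infer_transitive(facts, relation):
    """Repeatedly apply: (a, r, b) and (b, r, c) imply (a, r, c), until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for s, r, o in known if r == relation]
        for a, b in pairs:
            for b2, c in pairs:
                if b == b2 and (a, relation, c) not in known:
                    known.add((a, relation, c))
                    changed = True
    return known - set(facts)


# Facts are (subject, relation, object) triples.
facts = {
    ("Davis", "located_in", "California"),
    ("California", "located_in", "USA"),
    ("USA", "located_in", "North America"),
}
new_facts = infer_transitive(facts, "located_in")
```

The three inferred triples place Davis in the USA, California in North America, and Davis in North America.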

Game-Playing

Multiple agents
Board games like Go, Chess, Connect-4
Any setting with defined game rules and multiple players

Lee Sedol vs. AlphaGo, 2016

Example: Minimax
- We’ll cover this in a couple of weeks
Example: AlphaGo/AlphaGoZero/AlphaZero/MuZero
- A series of game-playing AIs from Google DeepMind
- We’ll talk about this briefly in a few weeks, but you’ll have to look into implementation details yourself
- Third-party packages available
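
As a preview of the Minimax example above, here is a minimal sketch on a tiny hand-built game tree. Representing the tree as nested lists with numeric leaves is an assumption made to keep the example short; a real game would generate child positions from its rules.

```python
def minimax(node, maximizing):
    """Return the value of a game tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):  # leaf: payoff for the maximizing player
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical two-ply game: the maximizer picks a branch, then the minimizer picks a leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

With the maximizer moving first, `minimax(tree, True)` evaluates to 3: the first branch guarantees at least 3, while the other two can be pushed down to 2 by the opponent.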

Supervised Machine Learning

Prediction
- Given values for a set of input variables, predict the values for a set of output variables

Example: Trash Classification

Image Credit: New AI Proves to Be a Trash Sorter Extraordinaire, IEEE Spectrum

- Given a picture of a piece of trash, classify whether it's landfill, compost, or recyclable
- Involves:
  - finding data with labels
  - "training" a model
  - evaluating the model's performance
- Former students got "Best Statistical Model" for implementing this at HackDavis '24
- We'll talk about these techniques in the middle of the quarter
- LOTS of online examples thanks to Kaggle and Medium

Algorithms:
- Decision Tree (tabular)
- Random Forest (tabular)
- SVM (tabular)
- Neural Networks
  - Feed-forward (tabular data)
  - Convolutional (image data)
  - Recurrent (sequential data)
  - …
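
The three steps from the trash classification example (labeled data, training, evaluation) can be sketched in a few lines of plain Python. To keep it short, this uses a nearest-centroid classifier, which is not one of the algorithms listed above, and the two features and all data values are made up.

```python
from collections import defaultdict


def train_centroids(examples):
    """'Training': average the feature vectors of each label into one centroid per label."""
    sums, counts = {}, defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((c - f) ** 2 for c, f in zip(centroids[label], features))
    return min(centroids, key=dist)


def accuracy(centroids, examples):
    """Fraction of held-out examples whose predicted label matches the true label."""
    return sum(predict(centroids, x) == y for x, y in examples) / len(examples)


# Made-up data: (weight in grams, shininess 0-1) -> bin label.
train = [([12, 0.9], "recycle"), ([15, 0.8], "recycle"),
         ([80, 0.1], "compost"), ([95, 0.2], "compost")]
held_out = [([14, 0.7], "recycle"), ([90, 0.3], "compost")]

model = train_centroids(train)
```

Evaluating on `held_out` rather than `train` matters: the point is to measure how the model does on examples it has not seen.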

Unsupervised Machine Learning

Discovering patterns in data without pre-existing labels
Focuses on finding structure or relationships in data

Example: Clustering
- Grouping similar data points together
- K-means, hierarchical clustering
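
A bare-bones sketch of K-means on one-dimensional data is shown below. The starting centers are passed in by hand so the result is deterministic; real implementations usually pick them randomly, and the data points here are made up.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data: assign each point to its nearest center, then re-average."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

No labels are involved anywhere: the two groups emerge only from the distances between points.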

Example: Dimensionality Reduction
- Reducing the number of features while preserving important information
- Principal Component Analysis (PCA), t-SNE

Image Credit: Turing Finance

Example: Anomaly Detection
- Identifying unusual patterns that do not conform to expected behavior
- Useful in fraud detection, system health monitoring
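
One of the simplest anomaly detectors flags values that sit far from the mean, measured in standard deviations. The sketch below assumes that approach; the threshold of 2.0 and the transaction amounts are arbitrary choices for illustration.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up transaction amounts with one obvious outlier.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
```

Here `zscore_anomalies(amounts)` flags only the 500 transaction.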

Image credit: IBM Developer

“Generative AI”

AI systems that can create new content
Based on patterns learned from existing data
These days, often a deep neural network trained “self-supervised” (more on that later)
Data and compute-intensive to train from scratch
Can use pre-trained models
We’ll talk about this in the middle of the quarter

Example: Large Language Models (LLMs)
- Generate human-like text based on input prompts
- GPT-4, Claude, LLaMA
- Operate based on conditional probability
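
To illustrate what "conditional probability" means here, the toy sketch below estimates the probability of the next word given only the previous word, by counting word pairs. This is a bigram model, not how GPT-4 or similar systems are built: real LLMs use neural networks and condition on much longer contexts. The corpus is made up.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def next_word_probs(following, prev):
    """Estimate P(next word | previous word) from the counts."""
    counts = following[prev]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


corpus = "the cat sat on the mat the cat ate"
model = train_bigrams(corpus)
```

In this corpus, "the" is followed by "cat" twice and "mat" once, so `next_word_probs(model, "the")` assigns them probabilities 2/3 and 1/3. Generating text amounts to repeatedly sampling from such a distribution.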

Image Credit: The Gradient

Example: Image Generation
- Create new images from text descriptions or other images
- DALL-E, Midjourney, Stable Diffusion

Image Credit: Our World in Data

Example: Music Generation
- Compose new music in various styles
- MuseNet, Jukebox

Recommender Systems

Predict user preferences and suggest relevant items
Used in e-commerce, streaming services, and social media
Not covered in this class

Example: Collaborative Filtering
- Recommend items based on user behavior and preferences of similar users
- Netflix movie recommendations, Amazon product suggestions
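
A minimal sketch of user-based collaborative filtering follows: score each unseen item by other users' ratings, weighted by how similar those users are to you. The users, titles, and ratings are made up, and computing cosine similarity over co-rated items only is a simplification chosen for brevity.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two users' ratings over the items both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(ratings, user):
    """Rank items the user has not rated by similarity-weighted ratings from other users."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)


# Made-up ratings on a 1-5 scale.
ratings = {
    "ana": {"Alien": 5, "Heat": 1},
    "ben": {"Alien": 5, "Heat": 1, "Arrival": 5, "Cars": 1},
    "cy": {"Alien": 1, "Heat": 5, "Arrival": 1, "Cars": 5},
}
```

Because ana's ratings match ben's far more closely than cy's, `recommend(ratings, "ana")` ranks "Arrival" ahead of "Cars".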

Image Credit: Ashmi Banerjee

Example: Content-Based Filtering
- Recommend items similar to those a user has liked in the past
- Spotify playlist recommendations, news article suggestions
Example: Hybrid Systems
- Combine collaborative and content-based approaches for more accurate recommendations

Reinforcement Learning

Learning through interaction with an environment
Agent learns to make decisions by receiving rewards or penalties
We’ll talk about this later in the quarter (Probably Week 7 or later)

Example: Game AI
- Learning to play video games or board games
- DeepMind’s AlphaGo, OpenAI’s Dota 2 AI

Example: Robotics
- Training robots to navigate and manipulate objects in real-world environments

Image credit: Crowd-Comfort Robot Navigation Among Dynamic Environment Based on Social-Stressed Deep Reinforcement Learning

Example: Resource Management
- Optimizing systems like traffic light control or data center cooling
Algorithms:
- Q-Learning
- Deep Q-Network (DQN)
- Policy Gradient Methods
- Actor-Critic Methods
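
To give a feel for the first algorithm in the list, here is a sketch of tabular Q-learning on a made-up environment: a short corridor where the agent starts at the left end and earns a reward of 1 for reaching the right end. The environment, the hyperparameter values, and the fixed random seed are all assumptions for illustration.

```python
import random


def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor: start at state 0, reward 1 for reaching the last state."""
    rng = random.Random(seed)
    actions = (-1, +1)  # step left or right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Epsilon-greedy: explore sometimes (or on ties), otherwise pick the best-known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt = min(max(state + actions[a], 0), n_states - 1)
            reward = 1.0 if nxt == n_states - 1 else 0.0
            # Q-learning update: move Q(s, a) toward reward + discounted best next value.
            target = reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


q_table = q_learning()
greedy_policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
```

After training, the greedy policy moves right in every non-terminal state. Nobody told the agent that "right" was correct; it learned this from the reward alone.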

Trends

Supervised Machine Learning and RL system implementations are most popular
- Probably due to the many examples that are available, and student interest in developing hands-on skills
As mainstream ML moves towards larger and more general models, system evaluation becomes more complicated, and potentially more in demand
- Famously, GPT-4 attempted to recruit a crowd worker to solve CAPTCHAs during METR’s early-access evaluation

Back to the project

What to look out for

Compute

What computing resources will be required for your project?
Keep in mind that much of today’s “fancy AI” is compute-intensive.
Several cloud compute providers offer free trials, but there is overhead involved to set this up.
If you need more compute than what’s available to you on your own machine, CSIF, etc., start looking early and think ahead.
Ask me or the TAs for advice if you’re not sure

Scaffolding

Applies to all projects.

What scaffolding (external resources) will be used? Examples of scaffolding:
- Kaggle submissions
- GitHub code for research paper
- Medium/Towards Data Science blog post
- GitHub search for “Intro to AI projects”
Your project proposal should include a plan for what scaffolding you will use, and in what way

Representation

For most projects, be sure to consider representation.

Representation: You need to consider how you will describe the world to your algorithm.

If you’re doing home price prediction, what will your algorithm know about any given home? What dimensions characterize a home?
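
For the home price example, one possible representation is sketched below: each home becomes a short list of numbers. The particular fields chosen (`square_feet`, `bedrooms`, `year_built`, `has_garage`) are assumptions for illustration; deciding which dimensions matter is exactly the design question your proposal should address.

```python
from dataclasses import dataclass


@dataclass
class Home:
    square_feet: float
    bedrooms: int
    year_built: int
    has_garage: bool


def to_features(home):
    """Turn a Home into the list of numbers that a learning algorithm would actually see."""
    return [
        home.square_feet,
        float(home.bedrooms),
        float(2024 - home.year_built),  # age in years; 2024 is assumed as the current year
        1.0 if home.has_garage else 0.0,
    ]


example = Home(square_feet=1500, bedrooms=3, year_built=1990, has_garage=True)
```

Anything left out of this vector (neighborhood, school district, condition) is invisible to the algorithm, no matter how sophisticated the model is.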

Overall Scope

You’ll have ~8 weeks to implement this, so be mindful: don’t take on too much scope.

Supervision

Applies to many projects, especially system implementations, and especially ML.

If you’re doing supervised ML, you must comment on this in your proposal.

Supervision:

How will your algorithm learn what kinds of predictions it should make? (supervised ML)
How will you know if your algorithm is performing well or not? (any system implementation)

Definition:

A source of supervision is any artifact that captures what humans think the “right answer” is for a given problem.
- Most often, this is provided in the form of a dataset of paired input-output examples for the problem that you’re working on.
- For example, if you want to predict home listing price, you need a dataset with many examples of houses and their prices.
- For reinforcement learning projects, supervision comes in the form of a reward function rather than input-output examples.
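
The two forms of supervision described above look quite different in code. The sketch below shows a small paired dataset used to score a predictor, and a reward function of the kind an RL project would define. All listings, prices, the flat $300 per square foot baseline, and the reward values are made up.

```python
# Supervised learning: supervision is a dataset of (input, correct output) pairs.
# Made-up listings: (square feet, bedrooms) -> price in dollars.
labeled_homes = [
    ((1500, 3), 450_000),
    ((900, 2), 300_000),
    ((2200, 4), 640_000),
]


def mean_absolute_error(predict, dataset):
    """Average gap between predictions and the human-provided 'right answers'."""
    return sum(abs(predict(x) - y) for x, y in dataset) / len(dataset)


def price_per_square_foot(home):
    """A hypothetical baseline predictor: a flat $300 per square foot."""
    square_feet, _bedrooms = home
    return 300 * square_feet


# Reinforcement learning: supervision is a reward function scoring each outcome.
def reward(reached_goal, steps_taken):
    """Hypothetical reward: +1 for reaching the goal, minus a small cost per step."""
    return (1.0 if reached_goal else 0.0) - 0.01 * steps_taken
```

The same dataset answers both questions above: it tells a learning algorithm what to predict, and it lets you measure how far off any predictor is.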
