Write an example project that matches this assignment and rubric

Project Proposal

Submit a 500-700 word markdown document in paragraph form describing what your group plans to do for the course project. Your document should be at least 1 page not including figures, and no longer than 2 pages including figures. Answer the following questions:

What type of project is this?

Research Re-implementation
System Implementation
System Evaluation
Performance Profiling/Optimization
Novel Algorithm
Literature Survey
Ethics or Socio-Technical Systems Analysis

Explain why your group is interested in working on a project of this kind.

What kind of AI System(s) do you plan to study?

Symbolic AI/Logic-based systems
Game-playing
Supervised Machine Learning
Unsupervised Machine Learning
Generative AI
Recommender Systems
Reinforcement Learning
Other/Multiple

If you know which specific algorithms and/or pretrained models you will use, state them here. If not, name at least one example algorithm that your group is thinking about using.

What resources will you use?

Datasets
Compute (see below)
Pretrained Models
Code Examples (see below)
APIs
Algorithm descriptions (if you plan to reimplement an algorithm)
Ethical or Sociotechnical frameworks (for corresponding project type)
Other

What value could someone get by viewing or interacting with your project?

A reader should be able to understand what you plan to do. They should have a sense that it is feasible, based on your description of how you plan to do it.

The reader should also understand why you think it is useful or interesting to do this project (though they don’t have to agree). The project is limited in scope, so its value may also be limited. That’s ok. If it’s been done before, you can write about why solutions to this problem are valuable in general, but write from your own belief.

For research re-implementations, this will naturally reiterate some content in the paper you’re re-implementing. However, this part of the proposal should reflect your own understanding of why the work is valuable.

What computing resources will be required for your project?

Keep in mind that much of today’s “fancy AI” is compute-intensive.
Several cloud compute providers offer free trials or a free usage tier, but setting up cloud resources involves overhead. Are you comfortable with environment setup and working on remote machines? It’s certainly something you can learn for this project, but be sure to budget time for that.

What scaffolding (external resources) will be used?

I expect most projects to start from some kind of scaffolding — resources that help you structure your project to get something “off the ground”.

Examples of scaffolding:

Kaggle submissions
GitHub code for research paper
Medium/Towards Data Science blog post
GitHub search for “Intro to AI projects”

“Reimplementation” means writing the code that implements the algorithm in a research paper, consulting any existing code as a reference. It doesn’t mean simply git-cloning the existing repo and running a single script.

The more scaffolding your project uses, the higher the bar for your final deliverable. If you apply AI to a completely new problem, it would be unfair to expect stellar results; the problem might be hard! In this case you will get credit for trying methods that seem appropriate, and evaluating them thoroughly. If you apply AI to a very well-known problem like MNIST digit classification, you would be well-advised to submit strong results, or to branch out from the scaffolding you used and try something innovative.

I do recommend you start with some scaffolding. I do not recommend that you submit the scaffolding without having done any development of your own.

Your proposal should clearly state the types of resources you plan to use, with examples. You should state how you plan to use each resource.

Addressing Project Risks

State any project risks you foresee at this point. Note that “zero risks” is not necessarily the desired answer — appropriately ambitious projects will have some risks :).

Compute

If you need more compute than what’s available to you on your own machine, CSIF, etc., start looking early and think ahead. Outline your compute plan in the proposal.

In the past, students have often been surprised how much of their project was simply waiting for long compute runs to finish.
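
A rough back-of-the-envelope budget can make that waiting time visible before you commit to a plan. The sketch below simply multiplies out estimates you supply; the function name, the example numbers, and the hourly price are all hypothetical placeholders, not figures from any provider.

```python
def compute_budget(minutes_per_epoch, epochs, runs, usd_per_hour=0.0):
    """Return (total_hours, total_cost_usd) for a set of training runs.

    All inputs are estimates you supply; usd_per_hour is a placeholder,
    not a quote from any cloud provider.
    """
    total_hours = minutes_per_epoch * epochs * runs / 60
    return total_hours, total_hours * usd_per_hour


# Hypothetical plan: 6-minute epochs, 20 epochs, 5 hyperparameter settings.
hours, cost = compute_budget(minutes_per_epoch=6, epochs=20, runs=5, usd_per_hour=1.0)
print(f"{hours:.1f} GPU-hours, about ${cost:.2f}")
```

Timing a single epoch on a small slice of the data is usually enough to fill in the first argument.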

For a detailed guide on calculating memory requirements for Transformer training, see also: https://blog.eleuther.ai/transformer-math/.
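
As a first approximation before reading the full guide, a common rule of thumb for mixed-precision training with Adam is about 16 bytes per parameter: 2 for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state including a master copy of the weights. The sketch below encodes that rule. The byte counts are assumptions that change with precision and optimizer, and activation memory is deliberately left out, so treat the result as a lower bound.

```python
def training_memory_gib(n_params, bytes_weights=2, bytes_grads=2, bytes_optimizer=12):
    """Estimate weight + gradient + optimizer memory in GiB (activations excluded)."""
    bytes_per_param = bytes_weights + bytes_grads + bytes_optimizer
    return n_params * bytes_per_param / 2**30


# Example: a hypothetical 125-million-parameter model.
print(f"{training_memory_gib(125_000_000):.2f} GiB before activations")
```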

Scaffolding

Some students propose projects that already have very complete code examples online. In this case, your report should state clearly what new work you plan to do that builds from the scaffolding.

Some students propose projects that are very innovative. In this case, your report should state clearly how you can break down the uncertainty into something manageable. The more uncertainty you have about whether your idea can succeed, the more value there is in having a Plan B, C, etc.

While you’re working on your proposal, search for projects similar to yours on sites like Kaggle, Huggingface, GitHub, etc.

Representation

You need to consider how you will describe the world to your algorithm. If you’re doing home price prediction, what will your algorithm know about any given home? What dimensions characterize a home?
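
One way to make the representation question concrete is to write down the record your algorithm will see. The sketch below is illustrative only: the `Home` class and its fields are hypothetical, and a real dataset will dictate its own columns.

```python
from dataclasses import dataclass


@dataclass
class Home:
    # Hypothetical attributes; a real dataset defines its own columns.
    square_feet: float
    bedrooms: int
    bathrooms: float
    year_built: int
    has_garage: bool

    def to_features(self):
        """Flatten the home into the list of numbers a model would see."""
        return [
            self.square_feet,
            float(self.bedrooms),
            self.bathrooms,
            float(self.year_built),
            1.0 if self.has_garage else 0.0,
        ]


home = Home(square_feet=1450.0, bedrooms=3, bathrooms=2.0, year_built=1987, has_garage=True)
print(home.to_features())
```

Anything not captured in this vector (neighborhood, condition, photos) is invisible to the model, which is why the choice of dimensions matters.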

Overall Scope

You’ll have ~8 weeks to implement this, so be mindful - don’t take on too much scope.

Presentability

The project is meant to be an opportunity for you to create something tangible that can be presented to others. It’s an AI implementation project first and foremost, but budget for presentation effort so that others can understand what you did.

Think ahead to how you will present your results, then work backwards. What kinds of charts, diagrams, tables, etc. will you want to use to convey your project results? What kind of data would you need to make such a chart? What kind of experiments would you need to collect such data?
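
Working backwards from a chart often means deciding up front what one row of results looks like. As a sketch, suppose the target chart plots accuracy against training-set size for each method; the column names and the numbers below are placeholders for illustration, not real results.

```python
import csv
import io

# Hypothetical experiment log: one row per (method, training-set size) run.
FIELDS = ["method", "train_size", "accuracy"]


def results_to_csv(rows):
    """Serialize experiment rows to CSV text that a plotting tool can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder numbers for illustration only, not real results.
rows = [
    {"method": "baseline", "train_size": 1000, "accuracy": 0.50},
    {"method": "baseline", "train_size": 5000, "accuracy": 0.60},
]
print(results_to_csv(rows))
```

Each experiment you plan should produce at least one such row; if it cannot, the chart it was meant to support probably needs rethinking.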

Supervision

You need some learning signal that will tell your algorithm what kinds of predictions it should make for a given example. Most often, this is provided in the form of a dataset of paired input-output examples for the problem that you’re working on. For example, if you want to predict home listing price, you need a dataset with many examples of houses and their prices.
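
To illustrate how paired examples act as a learning signal, the sketch below scores a toy one-parameter price model against a handful of made-up (features, price) pairs. The data, the model, and the choice of mean absolute error are all assumptions for illustration.

```python
# Hypothetical (features, price) pairs; features are [square_feet, bedrooms].
examples = [
    ([1000.0, 2.0], 200_000.0),
    ([1500.0, 3.0], 300_000.0),
    ([2000.0, 4.0], 400_000.0),
]


def predict(features, price_per_sqft):
    """Toy one-parameter model: price scales with square footage only."""
    return features[0] * price_per_sqft


def mean_absolute_error(price_per_sqft):
    """The learning signal: average gap between predictions and known prices."""
    errors = [abs(predict(x, price_per_sqft) - y) for x, y in examples]
    return sum(errors) / len(errors)


# A lower error means the parameter explains the labeled examples better.
print(mean_absolute_error(150.0), mean_absolute_error(200.0))
```

Without the known prices there would be nothing to compare predictions against, which is the sense in which the dataset supervises the model.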

What does supervision look like?

For supervised machine learning projects, supervision comes in the form of input-output examples.
For reinforcement learning projects, supervision comes in the form of a reward function.
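
For the reinforcement learning case, a reward function can be as small as the sketch below. The grid world, the goal cell, and the specific reward values are hypothetical choices made only to show the shape of such a function.

```python
GOAL = (3, 3)  # hypothetical target cell in a small grid world


def reward(state, action, next_state):
    """Score one transition: +1 for reaching the goal, a small cost otherwise."""
    if next_state == GOAL:
        return 1.0
    return -0.01  # step penalty nudges the agent toward shorter paths


print(reward((3, 2), "up", (3, 3)), reward((0, 0), "right", (1, 0)))
```

Unlike a labeled dataset, this signal never says which action was correct; it only scores the outcome of the action taken.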

Project Proposal Rubric

Criteria	Excellent (4)	Good (3)	Fair (2)	Needs Improvement (1)
Project Type Identification	Clearly identifies and justifies the project type (e.g., "This is a research re-implementation of the XYZ algorithm, chosen because...")	Identifies the project type with some justification (e.g., "We plan to do a system implementation of a recommendation engine")	Mentions the project type without clear justification (e.g., "Our project is about machine learning")	Project type is unclear or not specified
AI System(s) Description	Thoroughly describes the AI system(s) to be studied, including specific algorithms or models (e.g., "We will implement a BERT-based NLP model for sentiment analysis")	Describes the AI system(s) with some detail (e.g., "Our project uses deep learning for image classification")	Mentions the AI system(s) without much detail (e.g., "We plan to use machine learning")	AI system(s) are unclear or not specified
Resource Identification	Comprehensively lists and explains all resources to be used, including specific datasets, APIs, and libraries (e.g., "We will use the MNIST dataset, TensorFlow library, and Google Cloud Platform for compute")	Lists most resources with some explanation (e.g., "We plan to use a public dataset and Python libraries for machine learning")	Mentions some resources without clear explanation (e.g., "We'll need some data and machine learning tools")	Resources are vague or not specified
Project Value	Clearly articulates the project's value and interest, with specific examples or use cases (e.g., "A project like this could improve medical diagnosis accuracy, resulting in benefit to X population")	Explains the project's value with some clarity (e.g., "Our project could be useful in healthcare applications")	Attempts to explain the project's value (e.g., "Machine learning is important")	Project's value is unclear or not addressed
Feasibility	Presents a highly feasible plan with clear implementation steps, timeline, and risk mitigation strategies	Presents a feasible plan with some implementation details and consideration of potential challenges	Presents a plan with limited feasibility or details, or with unrealistic expectations	Plan lacks feasibility or clear implementation steps
Computing Resources	Thoroughly outlines computing requirements and plan (e.g., "We require a GPU instance on AWS for 100 hours, estimated cost $X")	Describes computing requirements with some detail (e.g., "We'll need access to GPUs for training")	Mentions computing needs without clear plan (e.g., "We might need powerful computers")	Computing resources are not addressed or unclear
Scaffolding/External Resources	Clearly identifies all scaffolding and explains its use (e.g., "We'll start with the XYZ GitHub repo as a base, specifically using their data preprocessing module")	Identifies most scaffolding with some explanation of use (e.g., "We'll use some existing code from GitHub for parts of our project")	Mentions some scaffolding without clear explanation (e.g., "We might use some online resources")	Scaffolding is not addressed or unclear
Risk Assessment	Thoroughly addresses potential risks and mitigation strategies (e.g., "If the initial approach fails, we have a backup plan to use algorithm B instead")	Addresses some risks with mitigation strategies (e.g., "We might face challenges with data quality, so we'll implement data cleaning steps")	Mentions risks without clear mitigation strategies (e.g., "The project might be difficult")	Risks are not addressed or poorly considered
Scope	Project scope is well-defined, challenging yet achievable within the timeframe (e.g., "We will implement and compare 3 specific algorithms on 2 datasets")	Scope is defined but may be slightly ambitious or not quite challenging enough (e.g., "We will implement 1 algorithm on 1 dataset, with stretch goals if time permits")	Scope is somewhat unclear, overly ambitious, or not ambitious enough (e.g., "We want to solve all of computer vision" or "We will classify images using a pre-built model")	Scope is undefined, unrealistic for the timeframe, or far too simple for a course project
Presentation Consideration	Clearly outlines plans for making the project presentable (e.g., "We will create an interactive demo and prepare visualizations of our results")	Mentions presentability with some consideration (e.g., "We plan to create graphs of our results")	Briefly addresses presentability (e.g., "We'll make some kind of presentation")	Presentability is not addressed
Writing Quality	Well-written, clear, and concise. Uses technical language appropriately and explains complex concepts effectively	Mostly clear with minor issues in writing. Some technical concepts could be explained more clearly	Some clarity issues or verbosity. Technical language is sometimes used incorrectly or concepts are poorly explained	Significant issues with clarity or conciseness. Technical writing is poor or confusing
Format Adherence	Fully adheres to the 1-2 page format requirement, uses appropriate sections and formatting	Mostly adheres to format with minor deviations (e.g., slightly over/under page limit)	Partially adheres to format requirements (e.g., missing some key sections)	Does not adhere to format requirements (e.g., significantly over/under page limit, lacks structure)
Figure Inclusion	Includes at least one clear, relevant figure (e.g., system architecture diagram, workflow chart) that effectively represents the project plan or system; figure is well-explained in the text	Includes one figure that represents the project plan or system; figure is somewhat explained in the text	Includes a figure, but it's not clearly relevant or well-explained (e.g., a generic or poorly labeled diagram)	Does not include a figure, or the figure is irrelevant or unexplained

Note: Each criterion is scored on a scale of 1-4. The total score can be calculated by summing the scores for each criterion, with a maximum possible score of 52 points.
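
The 52-point maximum follows from 13 criteria at 4 points each. A small sketch of that arithmetic, with a hypothetical `total_score` helper that also checks each score is on the 1-4 scale:

```python
CRITERIA = [
    "Project Type Identification", "AI System(s) Description", "Resource Identification",
    "Project Value", "Feasibility", "Computing Resources", "Scaffolding/External Resources",
    "Risk Assessment", "Scope", "Presentation Consideration", "Writing Quality",
    "Format Adherence", "Figure Inclusion",
]
MAX_PER_CRITERION = 4


def total_score(scores):
    """Sum per-criterion scores after checking each is on the 1-4 scale."""
    for name in CRITERIA:
        if not 1 <= scores[name] <= MAX_PER_CRITERION:
            raise ValueError(f"{name}: score must be between 1 and {MAX_PER_CRITERION}")
    return sum(scores[name] for name in CRITERIA)


print(len(CRITERIA) * MAX_PER_CRITERION)  # maximum possible total
```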

Banned Projects

These are projects that I have seen too many times now… Nothing wrong with them, but I am retiring their jerseys. Nobody is allowed to do the following projects. Get creative!

Rainforest Bird Song Classification
Dog Breed Classification
Anything using Spotify data
