Too many students struggle with math. In fourth grade, only 36% of students are proficient in math. By eighth grade, that number drops to 26%. Kids are stumped by fractions. Unsure of integers. Confused by calculus. Math derails their dreams.

That’s why we unveiled Khanmigo, our pilot AI tutor and teaching assistant, last year. When AI is carefully adapted for the classroom, it has enormous potential. Khanmigo can guide students as they learn and ask them questions like a tutor would.

As we come to the end of our first full pilot school year, we’re enthusiastic about Khanmigo’s ability to tutor in math (and many other subjects!). Khanmigo occasionally makes mistakes, which we expected. (In fact, you can read about math mistakes in last year’s very first blog post about Khanmigo.) Even human tutors make mistakes sometimes. Regardless, we’re committed to making Khanmigo better.

But getting the math right is just one part of the challenge. The other is making sure Khanmigo evaluates student work correctly: can it follow the student’s steps? Sometimes Khanmigo misjudges whether a student is right or wrong even when it calculates the math correctly.

This is a complex problem facing our field. To address it, here are some of the recent improvements made by our team of engineers, researchers, and former teachers:

- Khanmigo now uses a calculator to solve numerical problems instead of using AI’s predictive capabilities. If you’ve been using Khanmigo recently, you may have seen that it will sometimes say it is “doing math.” This is when the math problem is running through the calculator behind the scenes.
- We’ve upgraded parts of Khanmigo to a more capable large language model—the software that generates human language—called GPT-4 Turbo. Our internal testing shows an improvement in math after the switch.
- We are beginning to test the capabilities of a new large language model, GPT-4o, and we’re evaluating other models as well to see whether they are stronger at math.
- We’ve improved the way the AI “thinks” during a tutoring session before responding to a student. We instruct the AI to write out all the ways the student may have arrived at their answer, mimicking how a real-life tutor works with a student. We’ve found this significantly improves the quality of math interactions.
- We’ve built new tools to track our progress on math.
- We’re sharing math examples and learnings with others in our field so that we can learn from each other.
- We’re studying the latest research papers on math performance.
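
To illustrate the first idea above—routing arithmetic to a real calculator rather than letting a language model predict the answer—here is a minimal sketch in Python. This is not Khanmigo’s actual code; the `calculator` function and its behavior are illustrative assumptions about how such a tool might safely evaluate the expressions a tutor needs.

```python
import ast
import operator

# Map a small set of arithmetic operations to their implementations.
# Evaluating via the AST (instead of eval) keeps the tool restricted
# to plain arithmetic, so arbitrary code can never run.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculator(expression: str) -> float:
    """Deterministically evaluate an arithmetic expression like '3/4 + 1/8'."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval").body)
```

In a tool-use setup, the model would hand an expression like `"3/4 + 1/8"` to a function of this kind and fold the exact result back into its reply—the moment a student might see Khanmigo say it is “doing math.”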

Also, we’ve assembled a set of math tutoring examples for evaluating new AI models and new fixes. Every new fix runs through this set so we can measure its performance and keep old problems from resurfacing when we fix new ones (a common occurrence in software engineering).
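The evaluation-set idea can be sketched in a few lines of Python. Everything here is hypothetical—the example data, the placeholder grader, and the function names are illustrations of the technique, not Khan Academy’s code.

```python
# A fixed set of tutoring examples with known-correct verdicts,
# re-run after every fix so an old bug cannot quietly return.
EXAMPLES = [
    {"problem": "1/2 + 1/4", "student_answer": "3/4", "expected_verdict": "correct"},
    {"problem": "2^3",       "student_answer": "6",   "expected_verdict": "incorrect"},
]

def evaluate_student_step(problem: str, student_answer: str) -> str:
    """Placeholder grader: compare the student's answer to an exact computation.

    eval() is tolerable here only because the inputs are our own fixed
    test cases, never untrusted user text.
    """
    truth = eval(problem.replace("^", "**"))
    answer = eval(student_answer.replace("^", "**"))
    return "correct" if abs(truth - answer) < 1e-9 else "incorrect"

def run_regression(grader, examples):
    """Return the examples the grader judged wrongly; empty means no regressions."""
    return [ex for ex in examples
            if grader(ex["problem"], ex["student_answer"]) != ex["expected_verdict"]]
```

A candidate fix would be accepted only if `run_regression` comes back empty on the full example set.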

Is there still work to be done? Absolutely.

It won’t be easy, but we’re motivated to tackle this problem for a very important reason. Think about all the kids whose dreams could be achieved if they could overcome exponents or conquer calculus.

Onward!

P.S. Khanmigo tutors in humanities too. Check out our AI essay tool, which helps students write better essays—without doing the writing for them.