Hey AnkiHub team! I study while walking on a treadmill and use a Nintendo Switch controller to flip cards. This setup works great… until I want to ask the AI a question. Typing mid-walk is clunky at best and dangerous at worst. I’d love to just talk to the AI like a study buddy through my headphones.
I actually built a janky add-on for this last year using Gemini’s API before the AnkiHub AI was released. My goals were:
- Let users ask questions verbally without spoiling the answer (e.g., “Why is this term controversial?” → AI gives hints without the card’s answer text until you flip it).
- Force active recall by making the AI avoid priming. For example, if I ask “What’s the mechanism of action?” on an unflipped card whose answer *is* the mechanism of action, it should respond with “Think about the enzyme pathways that got us here” instead of reciting the answer.
- Keep costs low (Gemini was affordable, but now I’m eyeing DeepSeek R1 even though Sonnet 3.6 has the best vibe).
But while my prompting worked well, my code is amateurish: I couldn’t solve key issues like image occlusion card support or audio/vision processing. Your AI integration is way smoother – could voice interaction be added natively?
Proposal Details
Use Cases:
- Treadmill/controller users: Speak questions via mic, hear answers through headphones without touching a keyboard.
- Anti-spoiler mode: the AI references the card’s broader context (tags, linked notes) but never leaks the answer until you flip/reveal it (see the prompt sketch after this list).
- Active recall reinforcement: If you ask about the answer pre-flip, the AI guides you to discover it yourself (“What do you remember?” or “Try guessing. It’s better to get it wrong than to not try at all.”).
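Here’s a minimal sketch of how that gating could work. Every name in it is hypothetical (this isn’t AnkiHub’s actual API), but the core trick is that the answer text simply isn’t placed in the prompt pre-flip, so even a careless model can’t recite it:

```python
def build_system_prompt(question: str, answer: str,
                        tags: list[str], flipped: bool) -> str:
    """Swap system prompts on flip state; withhold the answer text pre-flip."""
    context = f"Question: {question}\nTags: {', '.join(tags)}"
    if flipped:
        # Post-flip: the answer may be discussed freely.
        return (
            "You are a study companion. The student has revealed the answer.\n"
            f"{context}\nAnswer: {answer}\n"
            "Deepen understanding: compare, contrast, extend."
        )
    # Pre-flip: the answer is deliberately absent from the prompt.
    return (
        "You are a study companion enforcing active recall.\n"
        f"{context}\n"
        "The student has NOT revealed the answer. Do not state it. "
        "Give hints, ask guiding questions, and encourage a guess."
    )
```

Keeping the answer out of the context entirely is sturdier than merely instructing the model not to leak it.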
Technical Approach:
- Speech-to-text: Whisper (fast) or browser-native Web Speech API (no setup); see the pipeline sketch after this list.
- Text-to-speech: Options like Kokoro or Piper to reduce latency/cost.
- Trigger: Physical button mapping (for Switch controllers) or “always listen” with a toggle.
- Backend: Use DeepSeek R1/Claude Haiku for cheap text, save Sonnet/Gemini for future image/video queries.
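To make that pipeline concrete, here’s a rough push-to-talk loop under loud assumptions: `pygame` for the controller button, `openai-whisper` for STT (needs ffmpeg on PATH), the `piper` CLI for TTS, and `sounddevice`/`soundfile` for audio I/O. `ask_model()` is a stub for whatever the real backend routing looks like, and the fixed five-second recording window is a simplification of record-while-held:

```python
import subprocess

import pygame
import sounddevice as sd
import soundfile as sf
import whisper  # pip install openai-whisper

PTT_BUTTON = 0       # whichever controller button you map to push-to-talk
SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio
RECORD_SECONDS = 5   # fixed window; real code would record while held

stt = whisper.load_model("base")

def pick_model(has_image: bool) -> str:
    # Placeholder names: cheap reasoning model for plain text,
    # multimodal model only when the card actually includes an image.
    return "sonnet-or-gemini" if has_image else "deepseek-r1-or-haiku"

def ask_model(question: str, has_image: bool) -> str:
    # Stub: wire up the real API client here.
    return f"[{pick_model(has_image)}] would answer: {question}"

def record_question() -> str:
    audio = sd.rec(RECORD_SECONDS * SAMPLE_RATE,
                   samplerate=SAMPLE_RATE, channels=1)
    sd.wait()
    sf.write("question.wav", audio, SAMPLE_RATE)
    return stt.transcribe("question.wav")["text"]

def speak(text: str) -> None:
    # Piper reads text from stdin and writes a wav file.
    subprocess.run(
        ["piper", "--model", "en_US-lessac-medium.onnx",
         "--output_file", "reply.wav"],
        input=text.encode(), check=True)
    data, rate = sf.read("reply.wav")
    sd.play(data, rate)
    sd.wait()

def main() -> None:
    pygame.init()
    pad = pygame.joystick.Joystick(0)
    pad.init()
    while True:
        for event in pygame.event.get():
            if event.type == pygame.JOYBUTTONDOWN and event.button == PTT_BUTTON:
                speak(ask_model(record_question(), has_image=False))

if __name__ == "__main__":
    main()
```

The placeholder strings in `pick_model()` just encode the cost split above: cheap model by default, multimodal only when needed.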
Future Possibilities:
- “AnkiHub, create a cloze from this” → auto-generate cards via voice (Anki-side sketch after this list).
- Image occlusion support via screenshot + voice (“Make a new card to occlude the third step in this diagram”).
- Access to Notability backups so the AI can tell which course material has already been covered.
- Persistent memory of card progress via the memory Model Context Protocol (MCP) server, so the AI can tailor its responses based on your previous conversations about specific cards or topics (rough client sketch below).
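The Anki side of the voice-to-cloze idea is already reachable from an add-on. A minimal sketch, assuming a recent Anki build’s collection API and that the model has already turned the transcript into cloze markup:

```python
from aqt import mw

def add_cloze_note(cloze_text: str) -> None:
    """Add a Cloze note to the current deck, e.g.
    'Glycolysis yields {{c1::2}} net ATP per glucose.'"""
    notetype = mw.col.models.by_name("Cloze")
    note = mw.col.new_note(notetype)
    note["Text"] = cloze_text
    mw.col.add_note(note, mw.col.decks.current()["id"])
```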
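And for the memory piece, a rough client sketch assuming the official `mcp` Python SDK and the reference memory server; the tool name and schema are taken from that server’s docs, so treat them as assumptions:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def remember(card_id: str, note: str) -> None:
    """Record a per-card observation in the reference memory MCP server."""
    params = StdioServerParameters(
        command="npx", args=["-y", "@modelcontextprotocol/server-memory"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            await session.call_tool("create_entities", {"entities": [{
                "name": f"card:{card_id}",
                "entityType": "anki_card",
                "observations": [note],
            }]})

asyncio.run(remember("abc123", "Asked about the rate-limiting enzyme twice pre-flip."))
```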
Voice would let kinesthetic learners like me interact with AnkiHub’s AI without breaking focus. The anti-spoiler piece is crucial – I want to struggle with active recall first, then use the AI to deepen understanding post-flip. Given y’all’s knack for polish (and my failed attempts), I’d love to see this happen!