Hey AnkiHub team! I study while walking on a treadmill and use a Nintendo Switch controller to flip cards. This setup works great… until I want to ask the AI a question. Typing mid-walk is clunky at best and dangerous at worst. I’d love to just talk to the AI like a study buddy through my headphones.
I actually built a janky add-on for this last year using Gemini’s API before the AnkiHub AI was released. My goals were:
- Let users ask questions verbally without spoiling the answer (e.g., “Why is this term controversial?” → AI gives hints without the card’s answer text until you flip it).
- Force active recall by making the AI avoid priming. For example, if I ask “What’s the mechanism of action?” on an unflipped card whose answer *is* the mechanism of action, it should respond with “Think about the enzyme pathways that got us here” instead of reciting the answer.
- Keep costs low (Gemini was affordable, but now I’m eyeing DeepSeek R1 even though Sonnet 3.6 has the best vibe).
But while my prompting worked well, my code is amateurish: I couldn’t solve key issues like image occlusion card support or audio/vision processing. Your AI integration is way smoother – could voice interaction be added natively?
Proposal Details
Use Cases:
- Treadmill/controller users: Speak questions via mic, hear answers through headphones without touching a keyboard.
- Anti-spoiler mode: the AI references the card’s broader context (tags, linked notes) but never leaks the answer until you flip/reveal it (see the prompt sketch after this list).
- Active recall reinforcement: If you ask about the answer pre-flip, the AI guides you to discover it yourself (“What do you remember?” or “Try guessing. It’s better to get it wrong than to not try at all.”).
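Here’s a minimal sketch of how that gating could work. Every name in it is hypothetical (this isn’t AnkiHub’s actual API), but the core trick is that the answer text simply isn’t placed in the prompt pre-flip, so even a careless model can’t recite it:

```python
def build_system_prompt(question: str, answer: str,
                        tags: list[str], flipped: bool) -> str:
    """Swap system prompts on flip state; withhold the answer text pre-flip."""
    context = f"Question: {question}\nTags: {', '.join(tags)}"
    if flipped:
        # Post-flip: the answer may be discussed freely.
        return (
            "You are a study companion. The student has revealed the answer.\n"
            f"{context}\nAnswer: {answer}\n"
            "Deepen understanding: compare, contrast, extend."
        )
    # Pre-flip: the answer is deliberately absent from the prompt.
    return (
        "You are a study companion enforcing active recall.\n"
        f"{context}\n"
        "The student has NOT revealed the answer. Do not state it. "
        "Give hints, ask guiding questions, and encourage a guess."
    )
```

Keeping the answer out of the context entirely is sturdier than merely instructing the model not to leak it.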
Technical Approach:
- Speech-to-text: Whisper (fast) or browser-native Web Speech API (no setup); see the pipeline sketch after this list.
- Text-to-speech: Options like Kokoro or Piper to reduce latency/cost.
- Trigger: Physical button mapping (for Switch controllers) or “always listen” with a toggle.
- Backend: Use DeepSeek R1/Claude Haiku for cheap text, save Sonnet/Gemini for future image/video queries.
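To make that pipeline concrete, here’s a rough push-to-talk loop under loud assumptions: `pygame` for the controller button, `openai-whisper` for STT (needs ffmpeg on PATH), the `piper` CLI for TTS, and `sounddevice`/`soundfile` for audio I/O. `ask_model()` is a stub for whatever the real backend routing looks like, and the fixed five-second recording window is a simplification of record-while-held:

```python
import subprocess

import pygame
import sounddevice as sd
import soundfile as sf
import whisper  # pip install openai-whisper

PTT_BUTTON = 0       # whichever controller button you map to push-to-talk
SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio
RECORD_SECONDS = 5   # fixed window; real code would record while held

stt = whisper.load_model("base")

def pick_model(has_image: bool) -> str:
    # Placeholder names: cheap reasoning model for plain text,
    # multimodal model only when the card actually includes an image.
    return "sonnet-or-gemini" if has_image else "deepseek-r1-or-haiku"

def ask_model(question: str, has_image: bool) -> str:
    # Stub: wire up the real API client here.
    return f"[{pick_model(has_image)}] would answer: {question}"

def record_question() -> str:
    audio = sd.rec(RECORD_SECONDS * SAMPLE_RATE,
                   samplerate=SAMPLE_RATE, channels=1)
    sd.wait()
    sf.write("question.wav", audio, SAMPLE_RATE)
    return stt.transcribe("question.wav")["text"]

def speak(text: str) -> None:
    # Piper reads text from stdin and writes a wav file.
    subprocess.run(
        ["piper", "--model", "en_US-lessac-medium.onnx",
         "--output_file", "reply.wav"],
        input=text.encode(), check=True)
    data, rate = sf.read("reply.wav")
    sd.play(data, rate)
    sd.wait()

def main() -> None:
    pygame.init()
    pad = pygame.joystick.Joystick(0)
    pad.init()
    while True:
        for event in pygame.event.get():
            if event.type == pygame.JOYBUTTONDOWN and event.button == PTT_BUTTON:
                speak(ask_model(record_question(), has_image=False))

if __name__ == "__main__":
    main()
```

The placeholder strings in `pick_model()` just encode the cost split above: cheap model by default, multimodal only when needed.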
Future Possibilities:
- “AnkiHub, create a cloze from this” → auto-generate cards via voice (Anki-side sketch after this list).
- Image occlusion support via screenshot + voice (“Make a new card to occlude the third step in this diagram”).
- Access to Notability backups so the AI can tell which course material has already been covered.
- Persistent memory of card progress via the memory Model Context Protocol (MCP) server, so the AI can tailor its responses based on your previous conversations about specific cards or topics (rough client sketch below).
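The Anki side of the voice-to-cloze idea is already reachable from an add-on. A minimal sketch, assuming a recent Anki build’s collection API and that the model has already turned the transcript into cloze markup:

```python
from aqt import mw

def add_cloze_note(cloze_text: str) -> None:
    """Add a Cloze note to the current deck, e.g.
    'Glycolysis yields {{c1::2}} net ATP per glucose.'"""
    notetype = mw.col.models.by_name("Cloze")
    note = mw.col.new_note(notetype)
    note["Text"] = cloze_text
    mw.col.add_note(note, mw.col.decks.current()["id"])
```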
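And for the memory piece, a rough client sketch assuming the official `mcp` Python SDK and the reference memory server; the tool name and schema are taken from that server’s docs, so treat them as assumptions:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def remember(card_id: str, note: str) -> None:
    """Record a per-card observation in the reference memory MCP server."""
    params = StdioServerParameters(
        command="npx", args=["-y", "@modelcontextprotocol/server-memory"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            await session.call_tool("create_entities", {"entities": [{
                "name": f"card:{card_id}",
                "entityType": "anki_card",
                "observations": [note],
            }]})

asyncio.run(remember("abc123", "Asked about the rate-limiting enzyme twice pre-flip."))
```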
Voice would let kinesthetic learners like me interact with AnkiHub’s AI without breaking focus. The anti-spoiler piece is crucial – I want to struggle with active recall first, then use the AI to deepen understanding post-flip. Given y’all’s knack for polish (and my failed attempts), I’d love to see this happen!