A 60-minute lecture, turned into study materials in under four minutes.
intelliQ
University students spend 3–5 hours per lecture on work that adds no understanding: transcribing recordings by hand, extracting key concepts, building flashcard decks, writing practice questions. intelliQ's founders wanted to automate the entire pipeline — but doing it right meant solving hard infrastructure problems, not just calling a transcription API.
Cloud speech-to-text services are slow, expensive at scale, and send student audio to third-party servers. The platform needed a transcription engine fast enough to feel instant, accurate enough to handle accented lecture speech, and private enough that student data never left intelliQ's own infrastructure. That ruled out every major cloud ASR vendor.
Beyond transcription, the system had to be intelligent — not just produce a raw transcript, but structure it into hierarchical notes, generate flashcards, produce graded quizzes, and power a tutor that could answer follow-up questions grounded in the actual lecture content. And it all had to work reliably under concurrent load across a web app, a mobile app, and an admin control plane.
We deployed NVIDIA Parakeet — one of the fastest open ASR models available — on a dedicated GPU EC2 instance, wrapping it in a FastAPI service that accepts any audio or video file and returns a clean transcript via a single HTTP call. Parakeet processes a 60-minute lecture in approximately 2.5 minutes, running entirely within intelliQ's own AWS infrastructure. Student audio never touches a third-party transcription API.
Once a transcript exists, a Claude Sonnet pipeline takes over. A dedicated notes engine uses Claude's tool-use API to structure the raw transcript into hierarchical sections, extract definitions and key concepts, seed the spaced-repetition flashcard deck (scheduled with the SM-2 algorithm), and generate a graded quiz. A separate recall-grading prompt evaluates each student answer and returns a verdict of correct, partial, or wrong. Embeddings are computed locally via Ollama (mxbai-embed-large) and indexed in Pinecone for the AI tutor's evidence retrieval, so every tutor response cites the student's own lecture, not general knowledge.
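To make the scheduling step concrete, here is a minimal sketch of one common formulation of SM-2, driven by the three grading verdicts. The mapping from verdict to SM-2 quality score (correct = 5, partial = 3, wrong = 1), the CardState fields, and the function name are assumptions for illustration, not intelliQ's actual code.

```python
from dataclasses import dataclass

# Assumed mapping from the grader's verdict to SM-2's 0-5 quality scale.
VERDICT_TO_QUALITY = {"correct": 5, "partial": 3, "wrong": 1}


@dataclass(frozen=True)
class CardState:
    """Scheduling state for one flashcard (hypothetical field names)."""
    repetitions: int = 0      # consecutive successful reviews
    interval_days: int = 0    # days until the next review
    ease_factor: float = 2.5  # SM-2 starting ease factor


def sm2_review(card: CardState, verdict: str) -> CardState:
    """Return the card's next state after one graded review."""
    q = VERDICT_TO_QUALITY[verdict]

    if q >= 3:
        # Successful recall: 1 day, then 6 days, then grow by the ease factor.
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval_days * card.ease_factor)
        repetitions = card.repetitions + 1
    else:
        # Failed recall: restart the repetition sequence.
        repetitions = 0
        interval = 1

    # Standard SM-2 ease update, floored at 1.3. This variant updates the
    # ease factor on every review, including failed ones.
    ease = card.ease_factor + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))
    ease = max(1.3, ease)

    return CardState(repetitions, interval, ease)
```

Under these assumptions, three correct answers in a row schedule a new card at 1, 6, and then 16 days, while a wrong answer resets the interval to 1 day and lowers the ease factor so the card comes back more often.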
The processing pipeline runs on BullMQ across six Redis-backed queues: audio merge, transcription, post-processing, notes indexing, preprocessing test, and a dead-letter queue for failed jobs. A worker process on EC2 picks up transcription jobs, calls the Parakeet service, and posts results back to the web API, authenticating with a shared secret. Students see live progress via Server-Sent Events (uploading → processing → transcribing → structuring → saving → complete), with partial transcripts appearing in the UI as chunks finish. We measured the end-to-end pipeline from production Redis logs: a real university lecture completed in 3 minutes 58 seconds from job start to completion.
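As an illustration of the progress stream, the sketch below formats one update in the Server-Sent Events wire format (an "event:" line, a "data:" line, and a blank line). The stage names come from the pipeline described above. The event name "progress", the JSON field names, and both function names are assumptions for the example.

```python
import json
from typing import Optional

# Pipeline stages, in the order students see them.
STAGES = ["uploading", "processing", "transcribing",
          "structuring", "saving", "complete"]


def format_sse(data: dict, event: str = "progress") -> str:
    """Serialize one message in the text/event-stream wire format."""
    # json.dumps escapes newlines, so the payload fits on one data: line.
    payload = json.dumps(data)
    return f"event: {event}\ndata: {payload}\n\n"


def progress_event(stage: str, partial_transcript: Optional[str] = None) -> str:
    """Build the SSE message for a pipeline stage (hypothetical payload shape)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    data = {
        "stage": stage,
        "step": STAGES.index(stage) + 1,
        "total_steps": len(STAGES),
    }
    if partial_transcript is not None:
        # Partial transcripts let the UI show text as chunks finish.
        data["partial_transcript"] = partial_transcript
    return format_sse(data)
```

A server would write each returned string to a response with the text/event-stream content type, and the browser's EventSource would deliver it to a listener registered for the "progress" event.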
