Hybrid Automated Grading & Learning Analytics Platform
A neuro-symbolic AI platform combining deterministic code evaluation with generative AI feedback and longitudinal learning analytics — my final year thesis project.
The Problem
In modern computer science education, manually grading programming assignments at scale is slow, inconsistent, and impossible to personalise. Traditional automated graders return only binary pass/fail results — they tell students what failed, not why, and offer no path forward. At the same time, relying on LLMs alone introduces hallucination risk: AI feedback without grounding in actual program output can actively mislead learners.
Neuro-Symbolic Architecture
The grading pipeline is built around two complementary layers. First, a symbolic evaluation layer (short Python scripts that run static analysers, compilers, linters, and unit testing frameworks) verifies syntactic correctness and logical validity, producing structured execution logs. Only after this deterministic pass does the system invoke the Google Gemini API, anchoring its explanations to real program output. Because the model explains recorded results rather than guessing at program behaviour, this verify-then-generate protocol reduces hallucination risk.
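The verify-then-generate flow described above can be sketched roughly as follows. This is a minimal illustration and not the platform's actual code: the function names (symbolic_evaluate, build_grounded_prompt, grade) and the log layout are hypothetical, the symbolic layer is reduced to Python's built-in compile step plus a few test expressions, and the language model is passed in as a plain callable so that no particular Gemini client call is assumed.

```python
import json
from typing import Callable


def symbolic_evaluate(source: str, tests: list[tuple[str, object]]) -> dict:
    """Deterministic pass: compile the submission, then evaluate each test expression."""
    log: dict = {"compiled": False, "error": None, "tests": []}
    try:
        code = compile(source, "<submission>", "exec")
    except SyntaxError as exc:
        log["error"] = f"SyntaxError (line {exc.lineno}): {exc.msg}"
        return log
    log["compiled"] = True

    namespace: dict = {}
    try:
        # Assumption: a real grader would run this in an isolated sandbox.
        exec(code, namespace)
    except Exception as exc:
        log["error"] = f"{type(exc).__name__}: {exc}"
        return log

    for expression, expected in tests:
        try:
            actual = eval(expression, namespace)
            passed, shown = actual == expected, repr(actual)
        except Exception as exc:
            passed, shown = False, f"{type(exc).__name__}: {exc}"
        log["tests"].append(
            {"test": expression, "expected": repr(expected),
             "actual": shown, "passed": passed}
        )
    return log


def build_grounded_prompt(source: str, log: dict) -> str:
    """Embed the structured execution log so feedback is tied to real output."""
    return (
        "Explain the failures below to a student. "
        "Refer only to facts in the execution log.\n\n"
        f"Submission:\n{source}\n\n"
        f"Execution log:\n{json.dumps(log, indent=2)}"
    )


def grade(source: str, tests: list[tuple[str, object]],
          generate: Callable[[str], str]) -> dict:
    """Run the symbolic layer first; call the generative layer only afterwards."""
    log = symbolic_evaluate(source, tests)
    feedback = generate(build_grounded_prompt(source, log))
    return {"log": log, "feedback": feedback}
```

Calling grade with a submission, a list of (expression, expected value) pairs, and any function that maps a prompt string to a reply returns both the execution log and the generated feedback. Keeping the model behind a callable also makes the deterministic layer testable without network access.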
Learning Analytics
Beyond grading, the platform captures high-resolution learning telemetry: syntax error frequency, compilation latency, test-case failure patterns, time between error introduction and resolution, and re-submission counts. These feed into Longitudinal Skill Profiles — continuous representations of how each student's abilities evolve across the course.
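As a rough illustration of how such telemetry might feed a profile, the sketch below records one submission's signals and folds the test pass rate into a per-concept score with an exponential moving average. The field names, the SkillProfile class, and the choice of a moving average are all assumptions made for this example; the write-up does not specify how Longitudinal Skill Profiles are actually computed.

```python
from dataclasses import dataclass, field


@dataclass
class SubmissionTelemetry:
    """One submission's signals (hypothetical field names)."""
    concept: str
    syntax_errors: int
    compile_seconds: float
    failed_tests: int
    total_tests: int
    seconds_to_fix: float
    resubmissions: int


@dataclass
class SkillProfile:
    """Per-concept score in [0, 1], smoothed over successive submissions."""
    alpha: float = 0.3  # assumed smoothing factor: weight of the newest submission
    mastery: dict[str, float] = field(default_factory=dict)
    history: dict[str, list[float]] = field(default_factory=dict)

    def update(self, event: SubmissionTelemetry) -> float:
        if event.total_tests:
            pass_rate = 1.0 - event.failed_tests / event.total_tests
        else:
            pass_rate = 0.0
        # The first observation for a concept seeds the score directly.
        previous = self.mastery.get(event.concept, pass_rate)
        current = (1 - self.alpha) * previous + self.alpha * pass_rate
        self.mastery[event.concept] = current
        # Keeping every intermediate score preserves the trajectory over time.
        self.history.setdefault(event.concept, []).append(current)
        return current
```

The history list is what makes the profile longitudinal in this sketch: it records how the score for a concept moves across the course, not only its latest value.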
Instructors gain an analytics dashboard that spots students struggling with specific concepts, taking unusually long to debug, or making repeated errors — enabling early intervention rather than post-hoc remediation.
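One simple way a dashboard could surface students who take unusually long to debug is to compare each student's average time-to-fix against the cohort median. The sketch below is only an assumed heuristic: the function name flag_slow_debuggers and the default factor of 2.0 are hypothetical and are not taken from the project.

```python
from statistics import mean, median


def flag_slow_debuggers(fix_times: dict[str, list[float]],
                        factor: float = 2.0) -> list[str]:
    """Return students whose mean time-to-fix exceeds factor x the cohort median."""
    # Students with no recorded fixes are skipped rather than treated as zero.
    averages = {student: mean(times) for student, times in fix_times.items() if times}
    if not averages:
        return []
    cutoff = factor * median(averages.values())
    return sorted(student for student, avg in averages.items() if avg > cutoff)
```

Using the median as the baseline keeps a few very slow students from raising the cutoff that is meant to detect them.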
Time-Weighted Error Quotient
The research introduced a novel metric — the Time-Weighted Error Quotient (EQ) — which quantifies the intensity and persistence of coding errors during development. By weighting error events by the time spent in an erroneous state, EQ provides a more nuanced picture of cognitive difficulty than simple error counts alone.
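To make the idea of time weighting concrete, the sketch below computes one possible form of such a quotient: the share of a session's elapsed time that was spent in an erroneous state. This formula is an assumption chosen for illustration and is not the thesis's actual definition of EQ. The event format (timestamp in seconds, error flag) and the function name time_weighted_eq are likewise hypothetical.

```python
def time_weighted_eq(events: list[tuple[float, bool]]) -> float:
    """Fraction of session time spent in an erroneous state, in [0, 1].

    events: (timestamp_seconds, has_error) pairs sorted by timestamp, one per
    compile or test run. Each state is assumed to last until the next event.
    """
    if len(events) < 2:
        return 0.0
    total = events[-1][0] - events[0][0]
    if total <= 0:
        return 0.0
    erroneous = 0.0
    for (start, has_error), (end, _) in zip(events, events[1:]):
        if has_error:
            erroneous += end - start  # weight the error by how long it persisted
    return erroneous / total


# Errors at t=0 and t=60, fixed at t=180, session ends at t=240.
session = [(0.0, True), (60.0, True), (180.0, False), (240.0, False)]
eq = time_weighted_eq(session)  # (60 + 120) / 240 = 0.75
```

In this example a plain error count would report that two of four runs failed (0.5), while the time-weighted value of 0.75 reflects that the student spent three quarters of the session in a broken state.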
Tech Stack
Python · Google Gemini API · React · PostgreSQL · Vercel