Rather than relying on an embedding space, my approach is to have the cards themselves be grammars that can define the relationships between concepts explicitly. Then the problem becomes what specific sampling of all the possible outputs is optimal for a learner to see at any given time, given their knowledge state.
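To make the "cards as grammars" idea concrete, here's a toy sketch (the rules and token names are invented, not hsrs's actual format): a card is a small grammar, and each review samples one of its many possible outputs rather than a fixed sentence.

```python
import random

# Hypothetical toy grammar: nonterminals map to weighted-equal rule choices,
# anything not in the table is a terminal token.
GRAMMAR = {
    "S":        [["NP", "PARTICLE", "VERB"]],
    "NP":       [["neko"], ["inu"], ["ADJ", "NP"]],
    "ADJ":      [["chiisai"], ["ookii"]],
    "PARTICLE": [["ga"]],
    "VERB":     [["hashiru"], ["neru"]],
}

def sample(symbol="S"):
    """Expand a symbol into a flat token list by random rule choice."""
    if symbol not in GRAMMAR:  # terminal token
        return [symbol]
    rule = random.choice(GRAMMAR[symbol])
    out = []
    for sym in rule:
        out.extend(sample(sym))
    return out

print(" ".join(sample()))  # e.g. "neko ga neru" -- a fresh output each time
```

The open question the first paragraph raises is then which of these outputs to show, given the learner's knowledge state.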
This is awesome. I've been using Bunpro for a while, which has great content, but I find myself memorizing the sentences rather than the grammar. Randomly generating cards based on the grammar points and vocab makes a ton of sense.
Some questions / comments / suggestions:
1. Is there a way to import vocab / kanji from Wanikani? WK is quite popular and has a good API. Bunpro integrates nicely with it, where it will or won't show furigana for kanji in the example sentences based on whether you've already learned the word in Wanikani. I'm guessing in your case you'd just want to import all the vocab. Even though I did the placement test, Grsly is still trying to teach me basic vocab like uta and obaasan. This is slowing down my progress through the grammar points.
2. Similar to question 1, is there a way to import grammar progress from Bunpro? Or even just click a button and have it assume I know everything from N5. The placement test only seemed to test a handful of basic grammar points.
3. Some of the sentences it has generated are quite awkward, like "ironna musume" ("all kinds of my daughter"). I guess that's grammatically correct, but it seems pretty unlikely to show up anywhere in real life. Have you considered using a local/small LLM to score or bias the example sentence generation? It's possible to constrain an LLM to only generate output that matches a grammar. You could construct such a grammar for each nontrivial element in your deck, with the vocab currently available for use. I guess you'd have to change the answer in your FAQ if you started using AI.
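The scoring idea in point 3 could be sketched like this (a toy unigram scorer standing in for a real LLM; the corpus and all values here are made up): rank candidate outputs by how plausible their tokens are against reference text, and bias generation toward the higher-scoring ones.

```python
import math
from collections import Counter

# Tiny invented "corpus" standing in for real reference text or an LLM.
corpus = "kawaii musume ga iru kawaii neko ga iru ironna hon ga aru".split()
counts = Counter(corpus)
total = sum(counts.values())

def score(sentence):
    """Mean smoothed unigram log-probability per token (higher = more natural)."""
    toks = sentence.split()
    logp = sum(math.log((counts[t] + 1) / (total + len(counts))) for t in toks)
    return logp / len(toks)

candidates = ["kawaii musume", "ironna musume"]
best = max(candidates, key=score)  # the more corpus-like collocation wins
```

A real implementation would swap `score` for LLM perplexity, or constrain decoding to the card's grammar directly, but the shape of the bias is the same.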
1. yes, that's quite doable. the placement test only shows you a mix of basic and advanced vocab. without importing learning history from another platform you do have to see everything at least a few times eventually, easy or not.
2. this is more challenging as there's very often not a 1-to-1 relationship between grammar points.
3. I have a branch on the hsrs github that changes the sampling to be prefix-order so an llm can guide it, with mixed results. There's a tension between picking common outputs and picking the output that will maximize your increase in retention across multiple cards. That being said, 色んな娘 is definitely me forgetting to tag 娘 as non-attributive (like pronouns), will fix. you can read about the mechanisms I have to keep the content as natural as possible here: https://github.com/satchelspencer/hsrs/blob/main/docs/deck-c...
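That tension could be written down as a blended objective, something like this sketch (the candidates, scores, and `alpha` weighting are all invented for illustration):

```python
# Each candidate output: (text, naturalness, expected retention gain summed
# over the cards it exercises). alpha trades naturalness against scheduling value.
def pick_output(candidates, alpha=0.5):
    def utility(c):
        _, naturalness, retention_gain = c
        return alpha * naturalness + (1 - alpha) * retention_gain
    return max(candidates, key=utility)[0]

candidates = [
    ("neko ga neru", 0.9, 0.2),  # very common, but little scheduling benefit
    ("ironna hon",   0.3, 0.9),  # rarer, but reviews two cards that are due
]
```

With `alpha` near 1 you get natural but schedule-inefficient output; near 0 you get efficient but awkward output, which is roughly the mixed result described above.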
Amazing work! In https://rember.com the main unit is a note representing a concept or idea, plus some flashcards associated with it; hsrs would fit perfectly! I'll look more deeply into it.
yeah! hsrs elements are the notes, and their learnable properties would be the flashcards.
however, individual grammar outputs aren't their own cards; you get a fresh example every time you see a card. this requires a very different scheduling approach, since you have to estimate how all the cards in the 'call tree' contribute to the overall result and reschedule them as well: https://github.com/satchelspencer/hsrs/blob/main/docs/overvi...
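A minimal sketch of that rescheduling idea (the tree shape, stability numbers, and update factors are all invented, not hsrs's actual model): one review outcome is propagated down the call tree, with discounted credit for deeper contributors.

```python
# Invented per-element memory stabilities for a generated example that used
# one grammar point and two vocab items.
stability = {"attributive-adj": 3.0, "ironna": 2.0, "musume": 5.0}

def review(call_tree, passed, weight=1.0):
    """Give each element in the call tree (discounted) credit for the outcome."""
    factor = 1.2 if passed else 0.7
    for node, children in call_tree.items():
        stability[node] += weight * stability[node] * (factor - 1)
        review(children, passed, weight * 0.5)  # children get half the credit

# the example sentence "called" the grammar point, which called two vocab items
tree = {"attributive-adj": {"ironna": {}, "musume": {}}}
review(tree, passed=True)
```

The hard part the comment points at is estimating those contribution weights well, since every review exercises several cards at once.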
Pretty much all spaced rep systems except for Anki structure their data this way - an editable data atom with flashcards auto-derived from it, via templates or otherwise.
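The note→card pattern can be sketched in a few lines (field and template names here are made up): one editable atom, with the flashcards derived from it rather than authored directly.

```python
# One editable note atom; edit it and every derived card updates.
note = {"word": "musume", "reading": "むすめ", "meaning": "daughter; girl"}

# (card type, front template, back template)
templates = [
    ("recognition", "{word} → ?", "{meaning}"),
    ("reading",     "{word} → ?", "{reading}"),
]

def derive_cards(note, templates):
    return [
        {"type": t, "front": f.format(**note), "back": b.format(**note)}
        for t, f, b in templates
    ]

cards = derive_cards(note, templates)
```

hsrs's twist, per the comment above, is that the derived "card" is itself generative, so even the front side is fresh on every review.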
See how it's applied to Japanese learning here: https://elldev.com/feed/grsly