Problem of overlapping note ids between decks

Just wanted to put this here so it is documented somewhere and there can be discussion about it. Not sure what the priority on this should be.

The add-on currently doesn’t properly handle overlapping note ids between multiple decks. Overlapping here means, for example, that the same Anki note id appears in two different decks and you subscribe to both of them from the add-on.

Currently this is handled very poorly: when the note is updated in either of the AnkiHub decks, the update will be applied to the note in the Anki collection once you sync. The note will potentially flip-flop between the two versions or even end up as some combination of the two.

This issue also causes other problems in the add-on. (For example #2 in this user report: Optional Tag List Bugs, etc)
GitHub issue: https://github.com/ankipalace/ankihub_addon/issues/158
Related user support request: Subscribed Deck Not Showing Up in Decks

Potential solution 1

When you try to import overlapping notes, there is a dialog which tells the user that there are # many overlapping notes between deck A and deck B and asks them if they want to sync the notes with deck A or deck B.

What about existing overlapping notes (from before this change)?
The add-on shows a one-time dialog that asks the user which deck they want to sync the overlapping notes with.

The caveat of this solution is that you can’t have both versions of the note in your collection.

Potential solution 2

When you try to import overlapping notes, there is a dialog which tells the user that there are # many overlapping notes between deck A and deck B and asks them if they want to sync one version or both. One set of notes would need to have their note ids and guids changed.

I’d vote for them choosing which deck they want it synced with

I have now implemented a different, simpler solution to this problem. From now on, when importing and syncing, the add-on will just skip notes that have the same Anki note id as notes in other installed AnkiHub decks.
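
To illustrate the idea, here is a minimal sketch of the skipping rule. It is not the add-on's actual code: `IncomingNote`, `split_importable` and the `nid_to_deck` mapping are hypothetical names, and it assumes the add-on can look up which installed AnkiHub deck already owns a given Anki note id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomingNote:
    # Hypothetical container, not a type from the add-on.
    anki_nid: int      # Anki note id of the incoming note
    ankihub_deck: str  # id of the AnkiHub deck the note comes from


def split_importable(incoming, nid_to_deck):
    """Return (to_import, skipped) for a batch of incoming notes.

    nid_to_deck (assumed to exist) maps Anki note ids that are already
    managed by the add-on to the AnkiHub deck that owns them.
    """
    to_import, skipped = [], []
    for note in incoming:
        owner = nid_to_deck.get(note.anki_nid)
        if owner is not None and owner != note.ankihub_deck:
            # Same Anki note id, but owned by another installed deck: skip.
            skipped.append(note)
        else:
            to_import.append(note)
    return to_import, skipped
```

The length of `skipped` is what a summary dialog could report as the number of skipped notes.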

This is certainly better than the current situation which causes bugs and changes notes from one version to the other whenever it has one version and is updated in the other deck.

To make this apparent to the user a dialog like this can be shown:

When no notes are skipped it looks like this:

What do you think?

Edit: You can’t see this on the screenshots but I now changed it to say “Added # note(s)” instead of “Added # notes”.

@andrew

It would probably be better to only consider notes with the same guids as conflicting (if they are in different ankihub decks) instead of notes with the same anki note id. I’m not saying we need to do this now, but we might have to change this sometime.

If we do this, we should also, for example, stop displaying Anki note ids on the AnkiHub website and show AnkiHub note ids instead (they look like AnkiHub deck ids). (You can search by AnkiHub note ids in the Anki browser btw.)

Explanation:
Anki note ids can be the same by chance, and that’s why Anki itself uses guids to determine whether notes should be merged during importing: guids are “globally unique”, unlike Anki nids. If you import a note into Anki (e.g. from a deck package file), Anki checks whether there is a note with the same guid in your collection. If this is not the case, but there is a note with the same Anki note id, the note id of the new note will be changed. Basically, Anki note ids are what Anki uses internally, but for identifying notes from external sources guids are used.
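
A toy model of the import behaviour described above, just to make the two cases concrete. This is not Anki's importer: `import_note` is a made-up function, the collection is reduced to a `guid -> note id` dict, and picking the next free id on a collision is purely an assumption for the sketch.

```python
def import_note(collection, guid, nid):
    """Model importing one note into `collection` (dict: guid -> note id).

    Returns (action, note_id_used).
    """
    if guid in collection:
        # Same guid: treated as the same note and merged with the existing one.
        return "merged", collection[guid]

    used_nids = set(collection.values())
    new_nid = nid
    while new_nid in used_nids:
        # Different guid but colliding note id: the new note gets another id.
        # (Incrementing by one is an assumption made for this sketch.)
        new_nid += 1

    collection[guid] = new_nid
    action = "added" if new_nid == nid else "added_with_new_nid"
    return action, new_nid
```

The point is that the guid decides whether two notes are "the same", while the note id is only adjusted to stay unique locally.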

Good solution for now

That’s actually kind of cool :laughing:

Won’t this potentially result in duplicate notes that are only unique on the guid/anki note id level? I’m not sure this is a complete solution because users could unknowingly have a bunch of duplicates across decks, resulting in more reviews.

I’m wondering if we should try to come up with a solution such that normal users won’t need to make a decision related to this detail. I.e., I wonder if we can make this problem totally opaque to normal users.

That said, from a UX perspective, I think the primary thing we should solve for here is to avoid

  • average users needing to care about this
  • inadvertent proliferation of duplicate notes, resulting in more reviews

@jakub.f , I think the solution of just skipping duplicates is a fine solution for now. I think the average user doesn’t need to care and this will avoid errors.

In the future, we may need an “advanced” option which would essentially allow users to choose which version of the notes to prefer, as @jakub.f suggested. I could imagine a simple way of ordering subscribed decks by priority/preference, where the ordering decides which deck/note version to prefer when installing or updating. However, we may be able to avoid this (which would probably be preferable) with an alternative approach (see below).
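
A rough sketch of what such a priority ordering could look like. `preferred_deck` and the deck ids are hypothetical; the only assumption is that the user's ordering is available as a list of deck ids, most preferred first.

```python
def preferred_deck(candidate_decks, priority_order):
    """Pick the deck whose version of a conflicting note should win.

    candidate_decks: decks that contain the conflicting note.
    priority_order: the user's subscribed decks, most preferred first.
    Decks missing from priority_order rank after all listed decks.
    """
    rank = {deck: position for position, deck in enumerate(priority_order)}
    return min(candidate_decks, key=lambda deck: rank.get(deck, len(rank)))
```

For example, `preferred_deck(["deck-b", "deck-a"], ["deck-a", "deck-b"])` picks `"deck-a"`.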

My original solution for this was to use a single ID for notes, such that the primary key in AnkiHub’s database was the same as the Anki Note ID and we could therefore treat the Anki Note ID as a GUID. This would have required us to modify the Anki Note ID at deck creation time.

Other questions that come to mind:

Should we consider solving this on the backend? For example:

  • a data migration that ensures all Anki Note IDs are unique
    • modify Anki Note IDs locally on the next sync
  • modify the deck creation process to ensure Anki Note IDs are unique
    • upload the deck
    • return a payload with new Anki Note IDs

This approach would have the downside I mentioned earlier: users could subscribe to multiple decks and unwittingly get a bunch of duplicate notes.

Perhaps this could be mitigated on the AnkiHub side by helping deck maintainers identify duplicate notes, not just in their own decks but across all AnkiHub notes.

I think that skipping notes with conflicting Anki note IDs when installing/syncing a deck will solve the most urgent problems related to this topic. That said, if we want a more solid solution that prevents weird behaviour in some cases, we might have to make further changes.

We can’t simply change the Anki note ids of notes to solve this. We need a way to merge the notes of the installed deck with existing notes in the user’s collection to get the scheduling data for the cards of the note, so that people keep their review progress from before using AnkiHub. We have to use either the Anki note id or the guid for that, because these are the only ids that notes have without AnkiHub.

The advantage of using guids over Anki note ids is that guids are reliable for telling if two notes stem from the same source and note ids are not. That’s why Anki uses guids to identify notes during importing / exporting. Anki note ids are timestamps of the note creation and are not guaranteed to be globally unique. They can be changed when importing from an apkg file, whereas guids stay the same. Anki note ids could theoretically also be the same just by chance.

We also have the requirement that guids of notes should be the same as in the original uploaded deck for all users that install it. The reason is that this allows you to export / import apkg files between AnkiHub and non-AnkiHub versions of the same deck without problems.

We should also allow multiple versions of the same note (as identified by its guid) on AnkiHub as long as they are in different AnkiHub decks.

Proposed solution:
I think the solution would be to use guids and AnkiHub note ids in a 1:N relation and skip notes that have the same guid while importing/syncing. Then we could ignore Anki note ids completely on the webapp side, not modify them in any way in the add-on, and let Anki manage them. People would have different Anki note ids for the same AnkiHub note, and the add-on would identify notes by guid and/or AnkiHub note id depending on the situation.
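
Roughly, the bookkeeping for this could look like the sketch below. Everything here is hypothetical (`GuidIndex`, its methods, the id formats); it only shows the 1:N relation from guid to AnkiHub note ids and the rule that a second note with an already-installed guid is skipped. Anki note ids do not appear at all, which is the point.

```python
from collections import defaultdict


class GuidIndex:
    """Hypothetical local bookkeeping keyed by guid instead of Anki note id."""

    def __init__(self):
        # guid -> set of AnkiHub note ids (1:N, e.g. one per AnkiHub deck)
        self.ankihub_ids_by_guid = defaultdict(set)
        # guid -> AnkiHub note id that is currently installed locally
        self.installed = {}

    def register(self, guid, ankihub_nid):
        """Record that an AnkiHub note with this guid exists."""
        self.ankihub_ids_by_guid[guid].add(ankihub_nid)

    def try_install(self, guid, ankihub_nid):
        """Install the note unless a different AnkiHub note with the
        same guid is already installed; return False when skipped."""
        current = self.installed.get(guid)
        if current is not None and current != ankihub_nid:
            return False
        self.installed[guid] = ankihub_nid
        return True
```

Re-syncing the same AnkiHub note keeps working (`try_install` returns `True` again), while the version from the second deck is skipped.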

Guids are reliable for identifying notes that stem from the same source. The same source means one person created the note and then people shared it using apkg / colpkg files.

The second point would be satisfied by the solution described above, because notes with the same guids would be skipped when importing.

When someone installs two decks with conflicting guids, the deck import summary dialog would tell them that notes were skipped. If they don’t want the notes to get skipped for the second deck they installed, they currently could remove the first deck or install the decks the other way around. If needed we could add something like the priority order you described:

One other related issue is that people might not expect that importing a deck from AnkiHub overwrites existing non-AnkiHub notes in their collection (which we currently do, and which would also happen with the solution I described above). Currently they might not notice this at all, or at least not right after installing the deck. They could then notice some time later that some of their notes are missing (and wonder if AnkiHub has something to do with this).

The deck import summary dialog should help to combat this issue, because it shows the number of updated notes. It’s not ideal though, because this dialog is shown after the fact and if they want to undo this, they have to restore from a backup.

It might be good to show a dialog after the deck is downloaded, but before installing the deck, that shows how many existing notes will be changed and asks the user to confirm.
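
The numbers for such a confirmation dialog could be computed before anything is written, along these lines. This is only a sketch: `preview_import` is a hypothetical helper, and it assumes both the local collection and the downloaded deck can be represented as `guid -> list of field values`.

```python
def preview_import(local_notes, incoming_notes):
    """Count what installing a downloaded deck would do, without doing it.

    Both arguments map guid -> list of field values (an assumption of
    this sketch, not the add-on's data model).
    """
    summary = {"new": 0, "changed": 0, "unchanged": 0}
    for guid, fields in incoming_notes.items():
        if guid not in local_notes:
            summary["new"] += 1
        elif local_notes[guid] != fields:
            # An existing note would be overwritten with different content.
            summary["changed"] += 1
        else:
            summary["unchanged"] += 1
    return summary
```

The dialog would then show `summary["changed"]` and only proceed with the install if the user confirms.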

Got it. Thanks. So you’re basically saying that we are going to honor Anki’s guid and use that, where we have historically been ignoring it?

Yes, at least this would be a more correct solution than the current one where we use anki note ids in terms of false positives / false negatives when detecting duplicate notes. (Guids are reliable, Anki note ids aren’t).

We could do some research on how much it matters in practice, by e.g. checking how often it happens that one guid is related to multiple anki note ids in the notes we have in our database.
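
Such a check could be as simple as the following sketch, assuming we can export `(guid, anki_note_id)` pairs for the notes in our database. The function name and the input format are made up for illustration.

```python
from collections import defaultdict


def guids_with_multiple_nids(rows):
    """Return {guid: set of Anki note ids} for every guid that is
    associated with more than one Anki note id.

    rows: iterable of (guid, anki_note_id) pairs.
    """
    nids_by_guid = defaultdict(set)
    for guid, nid in rows:
        nids_by_guid[guid].add(nid)
    return {guid: nids for guid, nids in nids_by_guid.items() if len(nids) > 1}
```

Swapping the two columns gives the opposite check (one Anki note id related to multiple guids), which would show how often ids collide by chance.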