CSV data of UWorld IDs with subject, system, and topic

Hello
I have CSV data for UWorld Step 1, 2, and 3.
I would like it to be added to the AnKing deck as tags.
I think it will be helpful for many people.
uworld_csv.zip (122.9 KB)

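I made this script. Put it next to this CSV file:
AIO.zip (120.9 KB)
Then run it while Anki is open. The script connects through AnkiConnect at http://localhost:8765, so that add-on must be installed, and it expects the CSV to be named uworld.csv.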
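
Before running the script below, it may help to confirm that AnkiConnect is actually reachable. The following is a minimal sketch, not part of the original script: the helper names `build_payload` and `anki_connect_version` are made up for illustration, and it assumes AnkiConnect listens on its default port and answers the `version` action.

```python
import json
import urllib.error
import urllib.request

ANKI_CONNECT_URL = "http://localhost:8765"  # same URL the script uses

def build_payload(action, params=None):
    """Encode an AnkiConnect request body (API version 6)."""
    return json.dumps({"action": action, "version": 6, "params": params or {}}).encode("utf-8")

def anki_connect_version(url=ANKI_CONNECT_URL, timeout=3):
    """Return the AnkiConnect API version, or None if it cannot be reached."""
    req = urllib.request.Request(url, build_payload("version"),
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8")).get("result")
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON
        return None

if __name__ == "__main__":
    version = anki_connect_version()
    if version:
        print(f"AnkiConnect reachable, API version {version}")
    else:
        print("AnkiConnect not reachable. Is Anki open and the add-on installed?")
```

If this prints the "not reachable" message, the tagging script will fail on its first request as well.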

import csv
import json
import re
import urllib.request
from collections import defaultdict

ANKI_CONNECT_URL = "http://localhost:8765"
CSV_FILE = "uworld.csv"
BATCH_SIZE = 500
ROOT_TAG = "!MyAnking"

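def anki_request(action, params=None):
    """Send one request to AnkiConnect and return its result; raise if it reports an error."""
    payload = json.dumps({"action": action, "version": 6, "params": params or {}}).encode("utf-8")
    req = urllib.request.Request(ANKI_CONNECT_URL, payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as response:
        reply = json.loads(response.read().decode("utf-8"))
    if reply.get("error"):
        raise RuntimeError(f"AnkiConnect '{action}' failed: {reply['error']}")
    return reply["result"]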

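def normalize_tag(text):
    """Turn a CSV value into a tag-safe token (Anki tags cannot contain spaces)."""
    if not text:
        return "Unknown"
    text = re.sub(r"[\[\]\(\)\{\}]", "", text)   # drop brackets entirely
    text = re.sub(r"[^\w\s-]", "_", text)        # other punctuation -> underscore
    text = re.sub(r"\s+", "_", text).strip("_")  # whitespace -> underscore
    return text or "Unknown"                     # e.g. "()" would otherwise become ""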

# 1️⃣ Load CSV data
id_to_meta = {}
with open(CSV_FILE, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        uid = row["id"].strip()
        id_to_meta[uid] = {
            "sys": normalize_tag(row.get('system_name', 'Unknown')),
            "sub": normalize_tag(row.get('subject_name', 'Unknown')),
            "top": normalize_tag(row.get('topic', 'Unknown'))
        }

# 2️⃣ Find notes
note_ids = anki_request("findNotes", {"query": 'tag:"#AK_Step*::#UWorld::*"' })
print(f"Found {len(note_ids)} notes with UWorld tags.")

# 3️⃣ Process
matches_found = 0

for i in range(0, len(note_ids), BATCH_SIZE):
    chunk = note_ids[i:i + BATCH_SIZE]
    notes_info = anki_request("notesInfo", {"notes": chunk})
    batch_ops = defaultdict(list)

    for note in notes_info:
        for tag in note["tags"]:
            # Logic for Explicit Tag Structures
            parts = tag.split("::")
            
            # --- Check Step 1 & 2 ---
            # Format: #AK_Step1_v12 (parts[0]) :: #UWorld (parts[1]) :: Step (parts[2]) :: ID (parts[3])
            if len(parts) == 4 and parts[2] == "Step":
                uid = parts[3].strip()
                if uid in id_to_meta:
                    step_match = re.search(r"Step(\d)", parts[0])
                    step_val = f"Step{step_match.group(1)}" if step_match else "StepUnknown"
                    
                    m = id_to_meta[uid]
                    new_tags = (f"{ROOT_TAG}::{step_val}", f"{ROOT_TAG}::{step_val}::System::{m['sys']}", f"{ROOT_TAG}::{step_val}::Subject::{m['sub']}", f"{ROOT_TAG}::{step_val}::Topic::{m['top']}")
                    batch_ops[new_tags].append(note["noteId"])
                    matches_found += 1

            # --- Check Step 3 ---
            # Format: #AK_Step3_v12 (parts[0]) :: #UWorld (parts[1]) :: ID (parts[2])
            elif len(parts) == 3 and "#AK_Step3" in parts[0] and parts[1] == "#UWorld":
                uid = parts[2].strip()
                if uid in id_to_meta:
                    m = id_to_meta[uid]
                    new_tags = (f"{ROOT_TAG}::Step3", f"{ROOT_TAG}::Step3::System::{m['sys']}", f"{ROOT_TAG}::Step3::Subject::{m['sub']}", f"{ROOT_TAG}::Step3::Topic::{m['top']}")
                    batch_ops[new_tags].append(note["noteId"])
                    matches_found += 1

    # 4️⃣ Apply tags to Anki
    for tags, nids in batch_ops.items():
        anki_request("addTags", {"notes": nids, "tags": " ".join(tags)})

    print(f"Processed {min(i + BATCH_SIZE, len(note_ids))}/{len(note_ids)} (Matches: {matches_found})")

print(f"\n✅ Done! Successfully matched {matches_found} associations.")
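
To make the tag logic easier to follow, here is a small standalone sketch of how one AnKing UWorld tag is turned into the new tags. It mirrors the two branches of the script, but the function names `parse_uworld_tag` and `build_tags` are mine, and the ID and metadata in the example are made up for illustration.

```python
import re

ROOT_TAG = "!MyAnking"  # same root tag as the script

def parse_uworld_tag(tag):
    """Return (step, uworld_id) for an AnKing UWorld ID tag, or None."""
    parts = tag.split("::")
    # Step 1 and 2: #AK_Step1_v12::#UWorld::Step::<ID>
    if len(parts) == 4 and parts[2] == "Step":
        step = re.search(r"Step(\d)", parts[0])
        return (f"Step{step.group(1)}" if step else "StepUnknown", parts[3].strip())
    # Step 3: #AK_Step3_v12::#UWorld::<ID>
    if len(parts) == 3 and "#AK_Step3" in parts[0] and parts[1] == "#UWorld":
        return ("Step3", parts[2].strip())
    return None

def build_tags(step, meta):
    """Build the four hierarchical tags added for one matched ID."""
    base = f"{ROOT_TAG}::{step}"
    return [
        base,
        f"{base}::System::{meta['sys']}",
        f"{base}::Subject::{meta['sub']}",
        f"{base}::Topic::{meta['top']}",
    ]

# Hypothetical ID and metadata, for illustration only
meta = {"sys": "Cardiovascular_System", "sub": "Pathology", "top": "Heart_failure"}
step, uid = parse_uworld_tag("#AK_Step1_v12::#UWorld::Step::1234")
print(uid, build_tags(step, meta))
```

Tags that do not match either format return None and are skipped, which is why notes keep all their existing tags untouched.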

I also ran an analysis and found that 796 IDs from my CSV are missing in AnKing (across Steps 1, 2, and 3).
Many IDs in AnKing are also missing from my CSV data (which is from September 2025).

The analysis

[PART 1] GAPS IN ANKI (IDs in CSV missing these specific tags)
CSV Subject/System                  | Count
---------------------------------------------
Advanced Clinical Medicine          | 197
Anatomy                             | 5
Behavioral science                  | 74
Biostatistics                       | 12
Foundations of Independent Practice | 300
Genetics                            | 3
Immunology                          | 3
Medicine                            | 112
Microbiology                        | 1
Obstetrics & Gynecology             | 11
Pathology                           | 8
Pathophysiology                     | 5
Pediatrics                          | 19
Pharmacology                        | 13
Physiology                          | 6
Psychiatry                          | 17
Surgery                             | 10
TOTAL MISSING                       | 796

[PART 2] EXTRA IN ANKI (ID tags not in CSV)
AnKing Step                    | Count
---------------------------------------------
Step 1                         | 514
Step 2                         | 726
Step 3                         | 221

So 796 IDs from my CSV data are missing in AnKing,
and 1461 IDs in AnKing are missing from my CSV data (these questions are not in the September 2025 UWorld itself; maybe they were archived).
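
For anyone who wants to repeat this kind of comparison, it comes down to two set differences. The sketch below is one way to do it and not necessarily the exact analysis I ran: the function name `id_gaps` and the toy data are made up, and it does not group the extra IDs by AnKing step.

```python
from collections import Counter

def id_gaps(csv_subjects, anki_ids):
    """Compare IDs in the CSV against IDs found in AnKing tags.

    csv_subjects maps UWorld ID -> subject (from the CSV);
    anki_ids is any iterable of IDs parsed from AnKing tags.
    """
    anki_ids = set(anki_ids)
    # IDs present in the CSV but absent from Anki, counted per subject
    missing_in_anki = Counter(
        subject for uid, subject in csv_subjects.items() if uid not in anki_ids
    )
    # IDs tagged in Anki that the CSV does not know about
    extra_in_anki = sorted(anki_ids - set(csv_subjects))
    return missing_in_anki, extra_in_anki

# Toy data, for illustration only
csv_subjects = {"101": "Anatomy", "102": "Anatomy", "103": "Surgery"}
anki_ids = ["101", "999"]

missing, extra = id_gaps(csv_subjects, anki_ids)
for subject, count in sorted(missing.items()):
    print(f"{subject:<30} | {count}")
print(f"{'TOTAL MISSING':<30} | {sum(missing.values())}")
print("Extra in Anki:", extra)
```

With real data, `csv_subjects` would come from the CSV and `anki_ids` from the ID part of each matching tag.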


Step1 → 9151 cards
Step2 → 7353 cards
Step3 → 3399 cards