Updated Readme.md

HarrySu123 · HarrySu123 · commit bc0ecd453545 · 2025-07-17T10:43:24.000+01:00
diff --git a/.gitignore b/.gitignore
@@ -278,8 +278,9 @@ ENV/
 env.bak/
 venv.bak/
 
-# markdown
-**/conversion_content/
+# markdown - ignore everything inside conversion_content but keep the folder
+**/conversion_content/*
+!**/conversion_content/.gitkeep
 
 # Spyder project settings
 .spyderproject
diff --git a/conversion2025/README.md b/conversion2025/README.md
@@ -23,8 +23,11 @@ Ensure you have the following installed:
 - The notebook is designed for scientific documents, but can be extended to other text formats.
 
 ## How to use
-Place a pdf of your choice into the folder, `/conversion_content`. Name is example.pdf
+Place a pdf of your choice into the folder, `/conversion_content`. Name the pdf file as `example.pdf`.
 Run the converter in Jupyter. A folder with all the conversion content will be produced.
-Right now, a markdown made by Mathpix called `example.md` will be made. To save tokens, Mathpix will not run if `example.md` exists.
-
+For `mathpix_to_llm_to_in2lambda_to_JSON.ipynb`, the converter produces a folder called `/mathpix_to_llm_to_in2lambda_to_JSON_out`.
+This will contain all the output of the converter.
 
+There is a markdown file called `example.md` inside `/mathpix_to_llm_to_in2lambda_to_JSON_out`; this is the markdown version of the pdf.
+Because Mathpix generates a fairly consistent markdown version of the pdf, the converter starts from `example.md` if it already exists instead of running Mathpix again.
+This means that if you wish to convert a different pdf, you must delete `example.md` first.
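
The caching behaviour described in the README hunk above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `load_or_convert` and its `convert` callback are hypothetical names, with `convert` standing in for the Mathpix call.

```python
from pathlib import Path
from typing import Callable


def load_or_convert(pdf_path: Path, out_dir: Path, convert: Callable[[Path], str]) -> str:
    """Return markdown for pdf_path, reusing a cached copy when one exists.

    `convert` is a hypothetical stand-in for the Mathpix request.
    """
    md_path = out_dir / f"{pdf_path.stem}.md"
    if md_path.exists():
        # Cached markdown found: skip the conversion call entirely.
        return md_path.read_text(encoding="utf-8")
    markdown = convert(pdf_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path.write_text(markdown, encoding="utf-8")
    return markdown
```

Because the cache key in this sketch is only the file name, swapping in a different `example.pdf` still returns the old markdown until `example.md` is deleted, which matches the caveat in the README.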
diff --git a/conversion2025/conversion_content/.gitkeep b/conversion2025/conversion_content/.gitkeep
@@ -0,0 +1,2 @@
+# This file ensures the conversion_content folder is tracked by git
+# while ignoring all other contents
diff --git a/conversion2025/mathpix_to_llm_to_in2lambda_to_JSON.ipynb b/conversion2025/mathpix_to_llm_to_in2lambda_to_JSON.ipynb
@@ -397,8 +397,9 @@
     "    1.  **Content Extraction:**\n",
     "        -   Identify a suitable `name` for the set of questions.\n",
     "        -   Identify the `year` if mentioned; otherwise, use \"0\".\n",
-    "        -   For each question, extract the full question text into `question_content` and the revelant full solution text into `solution_content`.\n",
+    "        -   For each question, carefully extract the full question text into `question_content` and the corresponding full solution/answer text into `solution_content`. They may not be in the same section.\n",
     "        -   If no solution is found, leave `solution_content` as an empty string `\"\"`.\n",
+    "        -   Preserve all image tags like `![pictureTag](filename.jpg)`, making sure they are placed with their respective \"question_content\" and \"solution_content\".\n",
     "        -   For Each Question extract all image references (e.g., `filename.jpg`) found within the `question_content` and `solution_content` and place them in the `images` list.\n",
     "\n",
     "    2.  **Output Format (Crucial):**\n",
@@ -545,7 +546,10 @@
     "    1.  **Content Splitting:**\n",
     "        -   From the input `question_content`, identify the main introductory text (the stem) and place it in the `content` field.\n",
     "        -   Identify all sub-questions (e.g., \"(a)\", \"(b)\", \"i.\", \"ii.\") and place their text into the `parts` list.\n",
-    "        -   Parts may also be implied, you may also use the solution to infer the parts.\n",
+    "        -   Parts may also be implied.\n",
+    "        -   Every question must have at least one part.\n",
+    "        -   Ensure that image references are correctly placed with their respective parts.\n",
+    "        -   Preserve all content perfectly, including text, LaTeX, and image tags like `![pictureTag](filename.jpg)`.\n",
     "        -   Ensure no solution content is included in the `content` or `parts` fields.\n",
     "        -   The `title` should be a concise summary of the question.\n",
     "        -   The `images` list should be copied exactly from the input.\n",
@@ -566,7 +570,9 @@
     "\n",
     "    1.  **Content Extraction:**\n",
     "        -   From the `full solution`, find the worked solution that corresponds to the given `question part`.\n",
+    "        -   Make sure the solutions for all parts together include the entire full solution text, with no missing content.\n",
     "        -   Place this exact text into the `part_solution` field.\n",
+    "        -   Ensure that image references are correctly placed with their respective parts.\n",
     "        -   Preserve all content perfectly, including text, LaTeX, and image tags like `![pictureTag](filename.jpg)`.\n",
     "        -   If no specific solution is found, use an empty string `\"\"`.\n",
     "\n",
@@ -711,8 +717,8 @@
     "    You MUST return ONLY a single, raw, valid JSON string that strictly follows the original schema. Do NOT add any explanations, comments, or markdown code blocks.\n",
     "\n",
     "    Apply these correction rules to the content inside the JSON fields:\n",
-    "    1.  **JSON Escaping:** All LaTeX backslashes (`\\`) MUST be escaped as double backslashes (`\\\\`). For example, `\\cup` must be written as `\\\\cup`.\n",
-    "    2.  **Math Delimiters:** All mathematical content must be enclosed in `$...$` for inline math or `$$...$$` for display math. Ensure all delimiters are correctly balanced and closed. '$' and '$$' should not be used for any other purpose.\n",
+    "    1.  **JSON Escaping:** All LaTeX backslashes (`\\`) MUST be escaped as double backslashes (`\\\\`). For example, `\\cup` must be written as `\\\\cup`. Never escape backslashes for newlines (`\\n`), as they should remain as is.\n",
+    "    2.  **Math Delimiters:** All mathematical content must be enclosed in `$...$` for inline math or `$$...$$` for display math. Ensure all delimiters are correctly balanced and closed. '$' and '$$' should not be used for any other purpose. Move all `\\n` outside the math delimiters.\n",
     "    3.  **Display Math:** `$$` delimiters must be on their own separate lines.\n",
     "    4.  **Image Tags:** Preserve image tags like `![pictureTag](filename.jpg)` exactly as they are.\n",
     "    5.  **Content Integrity:** Do not change, paraphrase, or summarize any text, formulas, or image links. Only fix formatting errors according to these rules.\n",
@@ -931,11 +937,12 @@
     "    print(json.dumps(extracted_dict, indent=2))\n",
     "    print(\"Now validating the content...\")\n",
     "\n",
-    "    content_validated_dict = content_texdown_check(extracted_dict)\n",
-    "    print(\"successfully validated the content.\")\n",
-    "    print(\"successfully converted markdown to JSON.\")\n",
+    "    # content_validated_dict = content_texdown_check(extracted_dict)\n",
+    "    # print(\"successfully validated the content.\")\n",
+    "    # print(json.dumps(content_validated_dict, indent=2))\n",
+    "    # print(\"successfully converted markdown to JSON.\")\n",
     "    \n",
-    "    return content_validated_dict"
+    "    return extracted_dict"
    ]
   },
   {
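
The JSON escaping rule in the correction prompt (LaTeX backslashes doubled, newlines left as `\n`) is the same thing a standard JSON encoder does. The snippet below is a small illustration using Python's `json` module. The sample string is made up for the example and is not taken from the notebook.

```python
import json

# One LaTeX command per line, joined by a real newline character.
content = r"(a) $A \cup B$" + "\n" + r"(b) $A \cap B$"

encoded = json.dumps({"question_content": content})
print(encoded)
# {"question_content": "(a) $A \\cup B$\n(b) $A \\cap B$"}

# Decoding restores the single backslashes and the real newline.
decoded = json.loads(encoded)["question_content"]
print(decoded == content)  # True
```

In the encoded text the LaTeX backslash appears doubled, while the newline appears as a single backslash followed by `n`. Doubling that one as well would decode to a literal backslash and the letter `n` rather than a line break, which is presumably what the added sentence in rule 1 is guarding against.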
diff --git a/conversion2025/testing.json b/conversion2025/testing.json
@@ -1,41 +0,0 @@
-{
-  "name": "Mathematical Analysis (1st Year) - Problem Sheet 1",
-  "year": "0",
-  "questions": [
-    {
-      "question_content": "1. To gain some familiarity with sets and operations on them, prove the following relations algebraically, and where possible draw the corresponding Venn diagrams.\n(a) $A \\cup \\emptyset=A$\n(b) $A \\cap \\emptyset=\\emptyset$\n(c) $A \\cup B=B \\cup A$\n(d) $A \\cap B=B \\cap A$\n(e) $A \\subseteq A \\cup B$\n(f) $A \\cup(B \\cap A)=A$\n(g) $A \\cap(B \\cup A)=A$\n(h) $(A \\backslash C) \\cap(B \\backslash C)=(A \\cap B) \\backslash C$\n(i) $(A \\cap B)^{\\prime}=A^{\\prime} \\cup B^{\\prime}$",
-      "solution_content": "1. (a) $A \\cup \\phi=A$  \n(i) $x \\in A \\cup \\phi \\Rightarrow x \\in A$ or $x \\in \\varnothing$ so $x \\in A$ $\\Rightarrow A \\cup \\Phi \\subseteq A$.  \n(ii) $x \\in A \\Rightarrow x \\in A$ or $x \\in \\phi$ (although $x \\notin \\phi$ ) $\\Rightarrow x \\in A \\cup \\varnothing$ so $A \\subseteq A \\cup \\phi$ hence $A=A \\cup \\phi$.  \n(b) $A \\cap \\phi=\\phi$.  \nLHS means $x \\in A$ and $x \\in \\phi$, but that are no $x \\in \\varnothing$ so the statement is vacuous and therefore $A \\cap \\varnothing=\\varnothing$.  \n(c) $A \\cup B=\\{x \\mid x \\in A$ or $x \\in B\\}$  \n$$\n=\\{x \\mid x \\in B \\text { or } x \\in A\\}=B \\cup A\n$$\n(d) $A \\cap B=\\{x \\mid x \\in A$ and $x \\in B\\}=B \\cap A$.  \n(e) $A \\subseteq A \\cup B$ $x \\in A \\Rightarrow x \\in A$ or $B$, ie $x \\in A \\cup B$ $\\Rightarrow A \\subseteq A \\cup B$.  \n(f) $A \\cup(B \\cap A)=A$ $x \\in A \\cup(B \\cap A)$ means $x \\in A$ or $x \\in(B \\cap A)$ $x \\in B$ and $x \\in A$.  \n(g) $A \\cap(B \\cup A)=A$ $x \\in A \\cap(B \\cup A)$ means $x \\in A$ and $(x \\in B$ or $x \\in A)$.  \n(h) $(A \\backslash C) \\cap(B \\backslash C)=(A \\cap B) \\backslash C$ $x \\in(A \\backslash C) \\cap(B \\backslash C)$ $\\Rightarrow x \\in(A \\backslash C)$ and $x \\in(B \\backslash C)\\Rightarrow x \\in A, x \\notin C$ and $x \\in B, x \\notin C\\Rightarrow x \\in A$ and $x \\in B$ and $x \\notin C\\Rightarrow x \\in A \\cap B, x \\notin C\\Rightarrow x \\in A \\cap B \\backslash C$.  \n(i) $(A \\cap B)^{\\prime}=A^{\\prime} \\cup B^{\\prime}$ $x^{\\notin(A \\text { and } B)}$.  \n$$\n\\begin{aligned}\n(A \\cap B)^{\\prime}= & \\{x \\mid x \\notin(A \\cap B)\\} \\\\\n& \\Rightarrow x \\notin A \\text { or } x \\notin B \\\\\n& \\Rightarrow x \\in A^{\\prime} \\text { or } x \\in B^{\\prime} \\\\\n& =A^{\\prime} \\cup B^{\\prime}\n\\end{aligned}\n$$\n2. (a) $A \\cap B=\\{1,2,3\\} \\cap\\{1,2\\}=\\{1,2\\}=B$.  
\n(b) $A \\cup B=\\{1,2,3\\} \\cup\\{1,2\\}=\\{1,2,3\\}=A$.  \n(c) $A \\cap(B \\cap C)=A \\cap\\{1\\}=\\{1\\}=E$.  \n(d) $(C \\cup A) \\cap B=A \\cap B=B$.  \n(e) $A \\backslash B=\\{3\\}=G$.  \n(f) $C \\backslash A=\\phi=H$.  \n(g) $(D \\backslash F) \\cup(F \\backslash D)=\\{3\\} \\cup \\phi=\\{3\\}=G$.  \n(h) $G \\backslash A=\\phi=H$.  \n(i) $A \\cup((B \\backslash C) \\backslash F)=A \\cup(\\{2\\} \\backslash F)\\Rightarrow A \\cup \\phi=A=\\text{(k) } H \\cup H=H$.  \n3. $x_{1}, x_{2}, \\ldots x_{n+1}$  \nLet $\\frac{x_{1}}{n}=p_{1}+F_{1} \\cdot \\frac{x_{i}}{n}=p_{i}+F_{i}$ etc. $i=1, n+1$.  \n$p_{i} \\in \\mathbb{N}, F_{i}$ is the fractional remainders and clearly $F_{i}$ takes one of the values $F_{c}=0 / n, 1 / n, \\cdots \\text { or } n-1 / n$  \nHence there are $n$ usable distinct values for $(n+1) F_{i} \\Rightarrow 2$ of the $F_{i}$ must be equal, $F^{\\prime}$ and $F^{\\prime \\prime}$ say.  \n4. Let $f: A \\rightarrow B, g: B \\rightarrow C$.  \n(a) If $f, g$ are surjective. If $c \\in C$, then $\\exists \\quad b \\in B$ such that $g(b)=c$. Also $f$ is surjective so $\\exists a \\in A$ such that $f(a)=b$.  \nSo $(g \\circ f)(a)=g(f(a))=g(b)=c$ hence $g \\circ f$ is SURJECTIVE.  \n(b) If $f, g$ are injective. $a, a^{\\prime} \\in A$ and $y \\quad a \\neq a^{\\prime}$, then $f(a) \\neq f(a^{\\prime})$, since $f$ injective. Since $g$ is also injective, $g(f(a)) \\neq g(f(a^{\\prime}))$, so $g \\circ f$ is injective.  \n5. (i)  \n$$\n\\begin{aligned}\n& y=(f \\circ g)^{-1}(x) \\\\\n\\Rightarrow \\quad & x =(f \\circ g)(y) \\\\\n& =f(g(y)) \\\\\n\\Rightarrow \\quad & f^{-1}(x)=g(y) \\\\\n\\Rightarrow \\quad & g^{-1}(f^{-1}(x))=y\n\\end{aligned}\n$$\nComparing (1) and (2) $\\Rightarrow (f \\circ g)^{-1}=g^{-1} \\circ f^{-1}$.  
\n(ii) $(f \\circ(g \\circ h))(x)$  \n$$\n\\begin{aligned}\n& =f((g \\circ h)(x)) \\\\\n& =f(g(h(x)) \\\\\n& =(f \\circ g)(h(x)) \\\\\n& =((f \\circ g) \\circ h)(x)\n\\end{aligned}\n$$\nso $f \\circ(g \\circ h)=(f \\circ g) \\circ h$.",
-      "images": [
-        "0_2025_07_14_0dc1d2fe9dc3cfc99f10g-03.jpg",
-        "1_2025_07_14_0dc1d2fe9dc3cfc99f10g-04.jpg",
-        "2_2025_07_14_0dc1d2fe9dc3cfc99f10g-04.jpg",
-        "3_2025_07_14_0dc1d2fe9dc3cfc99f10g-04.jpg",
-        "4_2025_07_14_0dc1d2fe9dc3cfc99f10g-05.jpg",
-        "5_2025_07_14_0dc1d2fe9dc3cfc99f10g-05.jpg",
-        "6_2025_07_14_0dc1d2fe9dc3cfc99f10g-06.jpg",
-        "7_2025_07_14_0dc1d2fe9dc3cfc99f10g-09.jpg",
-        "8_2025_07_14_0dc1d2fe9dc3cfc99f10g-09.jpg"
-      ]
-    },
-    {
-      "question_content": "2. Let $A=\\{1,2,3\\}, B=\\{1,2\\}, C=\\{1,3\\}, D=\\{2,3\\}, E=\\{1\\}, F=\\{2\\}, G=\\{3\\}$, $H=\\emptyset$. Simplify the following expressions. In each case the answer should be one of the sets $A, B \\cdots H$.\n(a) $A \\cap B$\n(b) $A \\cup B$\n(c) $A \\cap(B \\cap C)$\n(d) $(C \\cup A) \\cap B$\n(e) $A \\backslash B$\n(f) $C \\backslash A$\n(g) $(D \\backslash F) \\cup(F \\backslash D)$\n(h) $G \\backslash A$\n(i) $A \\cup((B \\backslash C) \\backslash F)$\n(j) $H \\cup H$\n(k) $A \\cap A$\n(l) $((B \\cup C) \\cap C) \\cup H$",
-      "solution_content": "2. (a) $A \\cap B=\\{1,2,3\\} \\cap\\{1,2\\}=\\{1,2\\}=B$.  \n(b) $A \\cup B=\\{1,2,3\\} \\cup\\{1,2\\}=\\{1,2,3\\}=A$.  \n(c) $A \\cap(B \\cap C)=A \\cap\\{1\\}=\\{1\\}=E$.  \n(d) $(C \\cup A) \\cap B=A \\cap B=B$.  \n(e) $A \\backslash B=\\{3\\}=G$.  \n(f) $C \\backslash A=\\phi=H$.  \n(g) $(D \\backslash F) \\cup(F \\backslash D)=\\{3\\} \\cup \\phi=\\{3\\}=G$.  \n(h) $G \\backslash A=\\phi=H$.  \n(i) $A \\cup((B \\backslash C) \\backslash F)=A \\cup(\\{2\\} \\backslash F)=A \\cup\\phi=A=\\text{(k) } H \\cup H=H$.",
-      "images": []
-    },
-    {
-      "question_content": "3. Use the pigeonhole principle to prove that in any set of $n+1$ integers, there must be two integers whose difference is divisible by $n$.",
-      "solution_content": "3. $x_{1}, x_{2}, \\ldots x_{n+1}$  \nLet $\\frac{x_{1}}{n}=p_{1}+F_{1} \\cdot \\frac{x_{i}}{n}=p_{i}+F_{i}$ etc. $i=1, n+1$.  \n$p_{i} \\in \\mathbb{N}, F_{i}$ is the fractional remainders and clearly $F_{i}$ takes one of the values $F_{c}=0 / n, 1 / n, \\cdots \\text { or } n-1 / n$  \nHence there are $n$ usable distinct values for $(n+1) F_{i} \\Rightarrow 2$ of the $F_{i}$ must be equal, $F^{\\prime}$ and $F^{\\prime \\prime}$ say.",
-      "images": []
-    },
-    {
-      "question_content": "4. Show that the composition of two injective maps is injective, and the composition of two surjective maps is surjective. Deduce that the composition of two bijective maps is bijective.",
-      "solution_content": "4. Let $f: A \\rightarrow B, g: B \\rightarrow C$.  \n(a) If $f, g$ are surjective. If $c \\in C$, then $\\exists \\quad b \\in B$ such that $g(b)=c$. Also $f$ is surjective so $\\exists a \\in A$ such that $f(a)=b$.  \nSo $(g \\circ f)(a)=g(f(a))=g(b)=c$ hence $g \\circ f$ is SURJECTIVE.  \n(b) If $f, g$ are injective. $a, a^{\\prime} \\in A$ and $y \\quad a \\neq a^{\\prime}$, then $f(a) \\neq f(a^{\\prime})$, since $f$ injective. Since $g$ is also injective, $g(f(a)) \\neq g(f(a^{\\prime}))$, so $g \\circ f$ is injective.",
-      "images": []
-    },
-    {
-      "question_content": "5. In the following question you may assume that the domains and codomains of the functions $f, g, h$ are suitably defined and that inverses exist. Show that\n(i) $(f \\circ g)^{-1}=g^{-1} \\circ f^{-1}$\n(ii) $f \\circ(g \\circ h)=(f \\circ g) \\circ h$",
-      "solution_content": "5. (i)  \n$$\n\\begin{aligned}\n& y=(f \\circ g)^{-1}(x) \\\\\n\\Rightarrow \\quad & x =(f \\circ g)(y) \\\\\n& =f(g(y)) \\\\\n\\Rightarrow \\quad & f^{-1}(x)=g(y) \\\\\n\\Rightarrow \\quad & g^{-1}(f^{-1}(x))=y\n\\end{aligned}\n$$\nComparing (1) and (2) $\\Rightarrow (f \\circ g)^{-1}=g^{-1} \\circ f^{-1}$.  \n(ii) $(f \\circ(g \\circ h))(x)$  \n$$\n\\begin{aligned}\n& =f((g \\circ h)(x)) \\\\\n& =f(g(h(x)) \\\\\n& =(f \\circ g)(h(x)) \\\\\n& =((f \\circ g) \\circ h)(x)\n\\end{aligned}\n$$\nso $f \\circ(g \\circ h)=(f \\circ g) \\circ h$.",
-      "images": []
-    }
-  ]
-}
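
In the deleted fixture above, the first question lists nine files under `images` although no `![...](...)` tag is visible in its `question_content` or `solution_content`. A check along the following lines could flag that kind of mismatch. It is only a sketch: the helper names and the regular expression are assumptions, not part of the notebook.

```python
import re

# Matches markdown image tags such as ![pictureTag](filename.jpg)
# and captures the file name inside the parentheses.
IMAGE_TAG = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def referenced_images(question: dict) -> list[str]:
    """File names used in image tags, in order of first appearance."""
    text = question.get("question_content", "") + "\n" + question.get("solution_content", "")
    seen: list[str] = []
    for name in IMAGE_TAG.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


def orphaned_images(question: dict) -> list[str]:
    """Entries in `images` that no image tag in the content refers to."""
    used = set(referenced_images(question))
    return [name for name in question.get("images", []) if name not in used]


sample = {
    "question_content": "Sketch the set. ![pictureTag](a.jpg)",
    "solution_content": "See ![pictureTag](b.jpg) and ![pictureTag](a.jpg).",
    "images": ["a.jpg", "b.jpg", "c.jpg"],
}
print(referenced_images(sample))  # ['a.jpg', 'b.jpg']
print(orphaned_images(sample))    # ['c.jpg']
```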
