|
397 | 397 | " 1. **Content Extraction:**\n", |
398 | 398 | " - Identify a suitable `name` for the set of questions.\n", |
399 | 399 | " - Identify the `year` if mentioned; otherwise, use \"0\".\n", |
400 | | - " - For each question, extract the full question text into `question_content` and the revelant full solution text into `solution_content`.\n", |
| 400 | + " - For each question, carefully extract the full question text into `question_content` and the corresponding full solution/answer text into `solution_content`. They may not be in the same section.\n", |
401 | 401 | " - If no solution is found, leave `solution_content` as an empty string `\"\"`.\n", |
| 402 | + " - Preserve all image tags like ``, making sure they are placed with their respective \"question_content\" and \"solution_content\".\n", |
402 | 403 | " - For Each Question extract all image references (e.g., `filename.jpg`) found within the `question_content` and `solution_content` and place them in the `images` list.\n", |
403 | 404 | "\n", |
404 | 405 | " 2. **Output Format (Crucial):**\n", |
|
545 | 546 | " 1. **Content Splitting:**\n", |
546 | 547 | " - From the input `question_content`, identify the main introductory text (the stem) and place it in the `content` field.\n", |
547 | 548 | " - Identify all sub-questions (e.g., \"(a)\", \"(b)\", \"i.\", \"ii.\") and place their text into the `parts` list.\n", |
548 | | - " - Parts may also be implied, you may also use the solution to infer the parts.\n", |
| 549 | + " - Parts may also be implied.\n", |
| 550 | + " - Every question must have at least one part.\n", |
| 551 | + " - Ensure that image references are correctly placed with their respective parts.\n", |
| 552 | + " - Preserve all content perfectly, including text, LaTeX, and image tags like ``.\n", |
549 | 553 | " - Ensure no solution content is included in the `content` or `parts` fields.\n", |
550 | 554 | " - The `title` should be a concise summary of the question.\n", |
551 | 555 | " - The `images` list should be copied exactly from the input.\n", |
|
566 | 570 | "\n", |
567 | 571 | " 1. **Content Extraction:**\n", |
568 | 572 | " - From the `full solution`, find the worked solution that corresponds to the given `question part`.\n", |
| 573 | + " - Make sure the solutions for all parts together include the entire full solution text, with no missing content.\n", |
569 | 574 | " - Place this exact text into the `part_solution` field.\n", |
| 575 | + " - Ensure that image references are correctly placed with their respective parts.\n", |
570 | 576 | " - Preserve all content perfectly, including text, LaTeX, and image tags like ``.\n", |
571 | 577 | " - If no specific solution is found, use an empty string `\"\"`.\n", |
572 | 578 | "\n", |
|
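Review note on the hunk above: the new rule that the part solutions together must include the entire full solution is only stated in the prompt, so a cheap post-hoc check might help catch dropped text. The following is a minimal sketch, not code from this commit. The helper name `uncovered_text` and the sample strings are hypothetical, and it assumes each `part_solution` is copied verbatim from the full solution, as the prompt requests.

```python
def uncovered_text(full_solution: str, part_solutions: list[str]) -> str:
    """Return whatever remains of full_solution after removing each part solution once.

    Hypothetical helper: an empty result suggests the parts cover the full text.
    """
    remaining = full_solution
    for part in part_solutions:
        if part:  # skip empty strings, which the prompt allows for missing solutions
            remaining = remaining.replace(part, "", 1)
    return remaining.strip()


# Made-up sample data for illustration only.
full = "(a) x = 2\n(b) y = 3"
print(repr(uncovered_text(full, ["(a) x = 2", "(b) y = 3"])))
print(repr(uncovered_text(full, ["(a) x = 2"])))
```

A non-empty return value points at solution text that no part claimed. Because the match is exact, any reformatting by the model would show up as uncovered text too.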
711 | 717 | " You MUST return ONLY a single, raw, valid JSON string that strictly follows the original schema. Do NOT add any explanations, comments, or markdown code blocks.\n", |
712 | 718 | "\n", |
713 | 719 | " Apply these correction rules to the content inside the JSON fields:\n", |
714 | | - " 1. **JSON Escaping:** All LaTeX backslashes (`\\`) MUST be escaped as double backslashes (`\\\\`). For example, `\\cup` must be written as `\\\\cup`.\n", |
715 | | - " 2. **Math Delimiters:** All mathematical content must be enclosed in `$...$` for inline math or `$$...$$` for display math. Ensure all delimiters are correctly balanced and closed. '$' and '$$' should not be used for any other purpose.\n", |
| 720 | + " 1. **JSON Escaping:** All LaTeX backslashes (`\\`) MUST be escaped as double backslashes (`\\\\`). For example, `\\cup` must be written as `\\\\cup`. Never escape backslashes for newlines (`\\n`), as they should remain as is.\n", |
| 721 | + " 2. **Math Delimiters:** All mathematical content must be enclosed in `$...$` for inline math or `$$...$$` for display math. Ensure all delimiters are correctly balanced and closed. '$' and '$$' should not be used for any other purpose. Move all `\\n` outside the math delimiters.\n", |
716 | 722 | " 3. **Display Math:** `$$` delimiters must be on their own separate lines.\n", |
717 | 723 | " 4. **Image Tags:** Preserve image tags like `` exactly as they are.\n", |
718 | 724 | " 5. **Content Integrity:** Do not change, paraphrase, or summarize any text, formulas, or image links. Only fix formatting errors according to these rules.\n", |
|
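Review note on the hunk above: the revised JSON Escaping rule distinguishes LaTeX backslashes (doubled in the JSON text) from newlines (kept as `\n`). As a rough illustration, and not part of this commit, Python's standard `json` module produces exactly that distinction when it serializes a string. The sample value below is made up.

```python
import json

# Hypothetical sample: one real backslash before "cup" and one real newline character.
raw = "Find $A \\cup B$.\nShow your working."

encoded = json.dumps({"question_content": raw})
print(encoded)  # the LaTeX backslash is doubled; the newline appears as a single \n escape

# Round trip: decoding restores the original single backslash and the newline.
decoded = json.loads(encoded)
assert decoded["question_content"] == raw
```

If a model instead doubles the backslash of a newline, decoding yields a literal backslash followed by `n` in the text, which is the failure the added sentence in rule 1 guards against.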
931 | 937 | " print(json.dumps(extracted_dict, indent=2))\n", |
932 | 938 | " print(\"Now validating the content...\")\n", |
933 | 939 | "\n", |
934 | | - " content_validated_dict = content_texdown_check(extracted_dict)\n", |
935 | | - " print(\"successfully validated the content.\")\n", |
936 | | - " print(\"successfully converted markdown to JSON.\")\n", |
| 940 | + " # content_validated_dict = content_texdown_check(extracted_dict)\n", |
| 941 | + " # print(\"successfully validated the content.\")\n", |
| 942 | + " # print(json.dumps(content_validated_dict, indent=2))\n", |
| 943 | + " # print(\"successfully converted markdown to JSON.\")\n", |
937 | 944 | " \n", |
938 | | - " return content_validated_dict" |
| 945 | + " return extracted_dict" |
939 | 946 | ] |
940 | 947 | }, |
941 | 948 | { |
|
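Review note on the last hunk: the validation step is disabled by commenting it out, and the function now returns `extracted_dict` directly. One alternative is to make the validator an optional argument so it can be re-enabled without editing the function body. This is a minimal sketch under assumptions: `finalize` is a hypothetical name, and the validator is assumed to take and return a dict, as the removed `content_texdown_check(extracted_dict)` call appears to.

```python
from typing import Callable, Optional


def finalize(extracted_dict: dict,
             validator: Optional[Callable[[dict], dict]] = None) -> dict:
    """Return the extracted dict, passing it through a validator only when one is given."""
    if validator is None:
        # Matches the behaviour after this commit: no content validation.
        return extracted_dict
    # Matches the behaviour before this commit, assuming the validator returns a dict.
    return validator(extracted_dict)


# Stand-in validator for illustration; the real one would be content_texdown_check.
print(finalize({"name": "demo"}))
print(finalize({"name": "demo"}, validator=lambda d: {**d, "validated": True}))
```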