Lab: Token Detective
Objective
Build a small Markdown report that shows how token counts differ from word counts, character counts, and human intuition. You will use the LMS Token Explorer, copy tokenizer reports into a file, and explain what you notice about ordinary text, long words, punctuation, code-like text, and URLs.
Success Criteria
By the end of this lab, you should have:
- Created a project folder named token-detective.
- Created a Markdown file named token-detective.md.
- Used the LMS Token Explorer on at least five required phrases.
- Pasted tokenizer reports into your Markdown file.
- Written predictions before checking the tokenizer results.
- Explained at least three tokenization surprises in your own words.
- Completed a short token-budget reflection.
- Submitted a public GitHub repository URL for the project.
Instructions
Start in Codex and create a lab project folder.
Ask Codex:
Create a folder for this lab at ~/labs/token-detective.
Inside it, create a file named token-detective.md.
Open that folder as my current project.
In Codex, click the terminal button in the upper right. The terminal should open in the lower part of the window.
From that terminal, open the project in Zed:
zed .
In Zed, edit token-detective.md as your lab report.
Use this structure:
# Token Detective
## Prediction Table
## Tokenizer Reports
## What Surprised Me
## Token Budget Reflection
## Final Takeaway
Open the Token Explorer from the Tools section in the LMS.
For each phrase below, write a prediction before you check the tokenizer:
- How many words do you think it has?
- How many characters do you think it has?
- How many tokens do you expect?
- What part might tokenize in a surprising way?
Use these required phrases:
Hello world
unbelievable
ChatGPT is useful.
function getUserName()
https://lms.turingguild.com/courses/8
After each phrase, enter it in the Token Explorer. Use the Report copy button to copy the tokenizer report, then paste that report into the ## Tokenizer Reports section of your Markdown file.
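Once your predictions are written down, you can check the word and character columns with a few lines of Python. This is a minimal sketch, not part of the LMS tooling: the helper name `count_words_and_chars` is hypothetical, and it assumes a "word" is any run of characters separated by whitespace. It deliberately does not count tokens, because that number depends on the tokenizer the Token Explorer uses.

```python
# Required phrases from this lab.
PHRASES = [
    "Hello world",
    "unbelievable",
    "ChatGPT is useful.",
    "function getUserName()",
    "https://lms.turingguild.com/courses/8",
]


def count_words_and_chars(phrase: str) -> tuple[int, int]:
    """Return (word count, character count) for a phrase.

    Assumption: a "word" is any run of non-whitespace characters,
    so a URL or a function call counts as a single word.
    """
    return len(phrase.split()), len(phrase)


if __name__ == "__main__":
    for phrase in PHRASES:
        words, chars = count_words_and_chars(phrase)
        print(f"{phrase!r}: {words} words, {chars} characters")
```

Compare these numbers with the token counts in your pasted reports. The gap between "one word" and "many tokens" for the URL is the kind of detail worth noting in your observations.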
After all five required phrases, add one phrase of your own. Choose something that includes at least one of these:
- punctuation
- a long word
- code-like text
- a URL
- unusual spacing
- a name or invented term
In ## What Surprised Me, write at least three short observations. Each observation should connect a tokenizer result to something visible in the text.
Good observation:
The URL used more tokens than I expected because slashes, dots, and mixed words do not behave like ordinary English words.
Weak observation:
Tokens are weird.
For ## Token Budget Reflection, choose one short prompt and one longer prompt that you might give to an AI assistant. Test both prompts in the Token Explorer.
Then answer these questions:
- Which prompt used more tokens?
- If the context limit were 500 tokens and you wanted to reserve 150 tokens for the answer, would your longer prompt fit comfortably?
- What would you shorten or remove if the prompt were too large?
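Use this formula:
remaining output space = context limit - input tokens
Your prompt fits if the remaining output space is at least the 150 tokens you reserved for the answer.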
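The budget check can also be written as a short Python sketch. The function names and the example input count of 320 tokens are assumptions for illustration only; replace 320 with the token count the Token Explorer reports for your longer prompt.

```python
def remaining_output_space(context_limit: int, input_tokens: int) -> int:
    """Apply the lab formula: context limit minus input tokens."""
    return context_limit - input_tokens


def fits_with_reserve(context_limit: int, input_tokens: int, reserved: int) -> bool:
    """True if the space left after the prompt covers the reserved answer tokens."""
    return remaining_output_space(context_limit, input_tokens) >= reserved


if __name__ == "__main__":
    context_limit = 500   # limit given in the lab question
    reserved = 150        # tokens reserved for the answer
    input_tokens = 320    # hypothetical; use your own Token Explorer count

    left = remaining_output_space(context_limit, input_tokens)
    print(f"Remaining output space: {left} tokens")
    print(f"Fits with {reserved} reserved: {fits_with_reserve(context_limit, input_tokens, reserved)}")
```

With a 500-token limit and 150 tokens reserved, the largest prompt that still fits is 350 tokens. A prompt close to that size fits, but not comfortably.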
In ## Final Takeaway, write two or three sentences explaining why token count is useful even though it is not the same as word count.
Required Deliverables
Your GitHub repository should contain:
- token-detective.md
- tokenizer reports for the five required phrases
- tokenizer report for one phrase you chose
- your prediction notes
- your surprise observations
- your token-budget reflection
- your final takeaway
Before submitting, check that your Markdown renders cleanly in Zed.
Submit Your Lab With GitHub
Ask Codex to help you publish the lab:
Stage and commit all changes with the message "Complete Token Detective".
Push the project to a public GitHub repository.
Then show me the public GitHub URL to submit.
Submit the public GitHub repository URL for this lab in the LMS.