4.1 Understanding Tokenization
LLMs don't process text character by character or word by word -- they work with "tokens." A token is a chunk of text that the model treats as a single unit. Understanding tokenization helps you write more efficient prompts and stay within context limits.
Tokenization Rules of Thumb
- 1 token ≈ 4 characters in English (including spaces)
- 1 token ≈ 0.75 words on average
- 100 tokens ≈ 75 words (useful for estimation)
- 1,000 tokens ≈ 750 words or about 1.5 pages of text
- Common words are usually single tokens; rare words may split
- Non-English text often uses more tokens per word
Both input (your prompt) AND output (AI's response) count toward token limits and costs. A verbose prompt that could be written more concisely wastes tokens you could use for longer, more detailed responses.
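The rules of thumb above can be sketched in a few lines of Python. This is only an estimate: the function names and the characters-per-token constant are illustrative, and exact counts require the model's actual tokenizer (e.g. OpenAI's tiktoken library).

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.

    Works tolerably for English prose; non-English text and rare words
    usually consume more tokens than this predicts.
    """
    return max(1, round(len(text) / 4))


def estimate_words(tokens: int) -> float:
    """Approximate word count from a token count (~0.75 words per token)."""
    return tokens * 0.75


prompt = "Summarize this contract in three bullet points."
print(estimate_tokens(prompt))   # 47 characters -> prints 12
print(estimate_words(1_000))     # prints 750.0, i.e. ~1.5 pages
```

Use these numbers for budgeting and sanity checks, not billing: providers charge on the tokenizer's exact count, which can differ noticeably for legal text full of defined terms and citations.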
4.2 Context Windows Explained
The context window is the total amount of text (measured in tokens) that an LLM can process at once -- including both your input and its output. Think of it as the AI's "working memory" for the current conversation.
Current Model Context Windows
| Model | Context Window | Approximate Text |
|---|---|---|
| GPT-4 Turbo | 128,000 tokens | ~300 pages |
| Claude 3 Opus | 200,000 tokens | ~500 pages |
| Gemini 1.5 Pro | 1,000,000 tokens | ~1,500 pages |
| GPT-3.5 Turbo | 16,000 tokens | ~40 pages |
Larger context windows don't mean unlimited memory. Performance often degrades with very long contexts, especially for information in the "middle" of the input. For critical analysis, keep relevant information near the beginning or end of your prompt.
Working Within Context Limits
- Prioritize essential information: Put the most important content first
- Summarize when possible: Replace lengthy documents with concise summaries
- Chunk large documents: Process in sections rather than all at once
- Remove redundancy: Don't repeat information unnecessarily
- Use references: "Refer to the contract above" instead of restating
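The chunking step above can be sketched as a paragraph-aware splitter. This is a minimal sketch assuming the ~4 characters per token estimate from section 4.1; `chunk_text` is an illustrative name, not a library function.

```python
def chunk_text(text: str, max_tokens: int = 4_000,
               chars_per_token: int = 4) -> list[str]:
    """Split a long document into chunks that fit a per-request token budget.

    Splits on blank-line paragraph boundaries so each chunk stays readable;
    the budget is approximated with the ~4 characters per token rule.
    """
    max_chars = max_tokens * chars_per_token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph (or clause) boundaries matters for legal documents: a chunk that cuts a clause in half invites the model to misread it.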
4.3 API Pricing and Cost Management
When using AI through APIs (application programming interfaces), you pay per token. Understanding pricing helps you make cost-effective choices and budget for AI integration in legal practice.
Typical Pricing Structure (as of 2024)
| Model Tier | Input Cost | Output Cost | Example Models |
|---|---|---|---|
| Economy | $0.50/1M tokens | $1.50/1M tokens | GPT-3.5, Claude Haiku |
| Standard | $3-5/1M tokens | $10-15/1M tokens | GPT-4o, Claude Sonnet |
| Premium | $10-15/1M tokens | $30-75/1M tokens | GPT-4 Turbo, Claude Opus |
Example: analyzing a 10-page contract with GPT-4 Turbo. Ten pages is roughly 6,700 tokens (by the 1.5-pages-per-1,000-tokens rule), so input costs about $0.07-0.10 at premium rates; even with a detailed 1,000-token response billed at the higher output rate, the total is typically under $0.25.
- Use cheaper models for simple tasks: GPT-3.5 or Claude Haiku for summarization, formatting, simple Q&A
- Reserve premium models for complex analysis: Use GPT-4 or Opus only when you need superior reasoning
- Minimize output tokens when possible: Request concise responses; output tokens cost more than input
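These tips can be combined into a rough per-call cost estimator. The rates below are the illustrative 2024 tier figures from the table above (midpoints of each range), not live prices; check your provider's pricing page before budgeting.

```python
# Illustrative per-million-token rates, taken from the 2024 tier table above.
PRICING = {
    "economy":  {"input": 0.50,  "output": 1.50},
    "standard": {"input": 4.00,  "output": 12.50},
    "premium":  {"input": 12.50, "output": 50.00},
}


def estimate_cost(input_tokens: int, output_tokens: int,
                  tier: str = "premium") -> float:
    """Estimated USD cost of one API call at the illustrative tier rates."""
    rates = PRICING[tier]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000


# A 10-page contract is roughly 6,700 tokens (1,000 tokens ~= 1.5 pages);
# assume a ~1,000-token response.
print(f"${estimate_cost(6_700, 1_000):.2f}")             # prints $0.13
print(f"${estimate_cost(6_700, 1_000, 'economy'):.4f}")  # prints $0.0049
```

The ~27x gap between the economy and premium figures is the practical argument for routing simple tasks to cheaper models.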
4.4 Prompt Optimization Strategies
Efficient prompts achieve better results with fewer tokens. This section covers practical techniques for writing economical prompts without sacrificing clarity or effectiveness.
Token-Efficient Writing
- Be direct: "Summarize this contract" not "I would like you to please provide a summary of the following contract document"
- Use abbreviations contextually: After first defining "Information Technology Act" you can use "IT Act"
- Eliminate filler phrases: Remove "I think," "perhaps," "maybe," "in my opinion"
- Use structured formats: Bullet points and numbered lists are often more token-efficient than prose
- Reference, don't repeat: "Analyze the clause above" instead of copying it again
Before and After Examples
| Inefficient (More Tokens) | Efficient (Fewer Tokens) |
|---|---|
| "I would like you to please help me understand what the implications of this particular clause might be for my client who is a small business owner" | "Explain this clause's implications for a small business owner" |
| "Can you take a look at the following contract and let me know if there are any issues that I should be concerned about?" | "Identify potential issues in this contract" |
| "In your response, please make sure to include information about the relevant legal provisions and also provide some examples if possible" | "Include relevant provisions and examples" |
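Using the ~4 characters per token estimate from section 4.1, you can quantify the savings in the first table row above (a sketch; real counts need the model's tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text."""
    return max(1, round(len(text) / 4))


before = ("I would like you to please help me understand what the implications "
          "of this particular clause might be for my client who is a small "
          "business owner")
after = "Explain this clause's implications for a small business owner"

saved = 1 - estimate_tokens(after) / estimate_tokens(before)
print(f"{saved:.0%} fewer prompt tokens")  # well over half the tokens saved
```

The savings compound: a template you reuse hundreds of times a month multiplies whatever you trim from it.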
Handling Long Documents
- Summarize first: Ask the AI to summarize the document, then ask follow-up questions
- Extract relevant sections: Only include the clauses actually relevant to your query
- Process in chunks: Analyze sections separately, then synthesize findings
- Use hierarchical analysis: Start with high-level overview, drill down as needed
When reviewing a 50-page contract, don't paste the entire document and ask for "issues." Instead: (1) Extract the table of contents first, (2) Identify high-risk sections based on headings, (3) Analyze those specific sections in detail. This approach is faster, cheaper, and produces better results.
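The three-step approach can be sketched as follows. `ask_model` is a hypothetical placeholder standing in for whatever API client you actually use (OpenAI, Anthropic, etc.); the point is the structure of the workflow, not the call itself.

```python
def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM API call -- substitute your client here."""
    return f"[model response to: {prompt[:40]}...]"


def review_contract(sections: dict[str, str],
                    high_risk_headings: list[str]) -> dict[str, str]:
    """Step 3: detailed analysis of only the sections flagged as high-risk."""
    findings = {}
    for heading in high_risk_headings:
        findings[heading] = ask_model(
            f"Identify potential issues in this clause:\n\n{sections[heading]}"
        )
    return findings


# Steps 1-2 (extract the table of contents, flag high-risk headings) happen
# before this call -- here the sections and flags are illustrative.
sections = {"Indemnity": "...", "Termination": "...", "Notices": "..."}
report = review_contract(sections, ["Indemnity", "Termination"])
```

Only two of the three sections ever reach the model, which is exactly where the time and token savings come from.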
4.5 Practical Budgeting for AI Usage
Integrating AI into legal practice requires thoughtful budgeting. Here's how to estimate costs and make economically sound decisions about AI tool usage.
Cost-Benefit Framework
Before using AI for a task, consider:
- Time saved: How long would this take manually?
- AI cost: Estimated token cost for the task
- Quality needs: Does this need premium model accuracy?
- Verification effort: How much review will the output require?
Typical Monthly Costs by Usage Level
| Usage Level | Description | Estimated Cost |
|---|---|---|
| Light | Occasional research queries, simple drafting help | $20-50/month |
| Moderate | Daily use for research, document review, drafting | $100-300/month |
| Heavy | Extensive document analysis, complex legal research | $500-1,500/month |
| Enterprise | Firm-wide deployment, high-volume processing | $2,000+/month |
If AI saves a junior associate 2 hours per day (billed at Rs. 3,000/hour), that's about Rs. 1.8 lakh/month in recovered billable time, assuming 30 working days. Even with heavy AI usage costs, the ROI is typically positive if the tool is used effectively.
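The break-even arithmetic is easy to check. The ~Rs. 80/USD conversion and the 30-working-day month below are assumptions for illustration, not part of any pricing table.

```python
def monthly_roi(hours_saved_per_day: float, billing_rate: float,
                working_days: int, ai_cost: float) -> float:
    """Net monthly benefit: billable time recovered minus AI tooling cost."""
    return hours_saved_per_day * billing_rate * working_days - ai_cost


# 2 hours/day at Rs. 3,000/hour over 30 days = Rs. 1.8 lakh recovered,
# against heavy usage of ~$1,500/month ~= Rs. 1.2 lakh (at ~Rs. 80/USD).
print(monthly_roi(2, 3_000, 30, 120_000))  # Rs. 60,000 net benefit
```

The framework cuts both ways: if the hours saved are optimistic or the verification effort is heavy, the same formula shows where the ROI turns negative.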
Key Takeaways
- Tokens are the currency of LLMs -- both input and output count toward limits and costs
- Rule of thumb: 1 token ≈ 4 characters ≈ 0.75 words; 1,000 tokens ≈ 750 words
- Context windows range from 4K to 1M+ tokens; larger isn't always better for accuracy
- Output tokens typically cost 2-5x more than input tokens -- request concise responses
- Use economy models for simple tasks, reserve premium models for complex analysis
- Optimize prompts by being direct, eliminating filler, and processing long documents in chunks
