# Quickstart Guide

> Get started with Tokenizador in minutes - analyze AI tokens in real-time for 48 models

## Welcome to Tokenizador

Tokenizador is a professional AI tokenization analyzer that helps you visualize how different AI models process and tokenize your text. Whether you're optimizing prompts, calculating costs, or comparing models, Tokenizador gives you instant insights into token usage.

**Live Demo Available**: Try Tokenizador instantly at [tokenizador.alblandino.com](https://tokenizador.alblandino.com/) - no installation or signup required.

## Quick Start

Visit [tokenizador.alblandino.com](https://tokenizador.alblandino.com/) in your web browser. The application works on all modern browsers and devices.

Tokenizador is a fully client-side application - all tokenization happens in your browser, ensuring your text stays private.

Choose from 48 supported AI models using the dropdown selector. Models are organized by provider:

* **OpenAI**: GPT-4o, GPT-4o Mini, GPT-4 Turbo, GPT-4, GPT-3.5 Turbo
* **Anthropic**: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku
* **Google**: Gemini 1.5 Pro, Gemini 1.5 Flash
* **Meta**: Llama 3.1 (405B/70B/8B), Llama 3 (70B/8B)
* Plus models from Mistral AI, Cohere, Alibaba, DeepSeek, Microsoft, xAI, Amazon, NVIDIA, and many more!

Type or paste your text into the input area. Tokenization happens **instantly as you type** - no need to click any buttons.

```text Example Input
Texto a analizar...
```

Try different languages! The tokenizer works with any text including Spanish, English, Chinese, emojis, and code.
Watch as Tokenizador displays comprehensive analytics in real-time:

* **Token Count**: Total tokens the model will process
* **Character Count**: Total characters including spaces
* **Word Count**: Number of words detected
* **Cost Estimate**: Cost calculation based on the model's pricing

The results section (`results-section`) automatically updates as you type, showing:

* Live statistics in an interactive grid
* Colorful token visualization
* Detailed token list with individual tokens
* Model-specific information

## Understanding the Interface

### Statistics Dashboard

The stats grid displays four key metrics in real-time:

* **Token Count**: Token count computed with the tiktoken library (exact for tiktoken encodings, an approximation for models such as Claude)
* **Character Count**: Total character count including spaces and special characters
* **Word Count**: Intelligent word count with proper text segmentation
* **Cost Estimate**: Real-time cost calculation in USD based on official pricing

### Token Visualization

Tokenizador provides two complementary views of your tokenized text:

**Colored Token Display** - Each token is shown with its own color, making it easy to see how the model segments your text. The `tokens-container` displays tokens inline, helping you understand:

* How words are split into sub-word tokens
* How spaces and punctuation are handled
* The efficiency of different text patterns

**Detailed Token Array** - A comprehensive list view in the `tokens-array` section showing each individual token extracted from your text.
Perfect for:

* Counting specific tokens
* Analyzing token boundaries
* Debugging prompt issues

## Model Information

For each selected model, Tokenizador displays:

* **Model Name**: Full name and provider
* **Context Limit**: Maximum tokens the model can process
* **Tokenization Type**: The encoding algorithm used (e.g., `cl100k_base`, `o200k_base`)
* **Active Algorithm**: Real-time display of the tokenization method
* **Cost Details**: Input and output costs per 1M tokens
* **External Link**: Direct link to [Artificial Analysis](https://artificialanalysis.ai) for detailed benchmarks

The tokenization service automatically selects the appropriate encoding based on your chosen model. OpenAI's newer models (GPT-4o, GPT-4o Mini) use `o200k_base`, while most others use `cl100k_base`, which serves as an approximation for non-OpenAI models such as Claude.

## Practical Examples

### Example 1: Simple Text Analysis

```text Input
Hola mundo
```

```text GPT-4o Result
Token Count: 2
Encoding: o200k_base
Tokens: ["Hola", " mundo"]
```

```text Claude 3.5 Sonnet Result
Token Count: 3
Encoding: cl100k_base (approximation)
Token Ratio: 1.1x
```

### Example 2: Cost Comparison

When analyzing the same text across different models:

| Model             | Tokens | Input Cost (per 1M) | Output Cost (per 1M) |
| ----------------- | ------ | ------------------- | -------------------- |
| GPT-4o            | \~100  | \$2.50              | \$10.00              |
| GPT-4o Mini       | \~100  | \$0.15              | \$0.60               |
| Claude 3.5 Sonnet | \~110  | \$3.00              | \$15.00              |
| Llama 3.1 8B      | \~95   | \$0.055             | \$0.055              |

Notice how different models may produce slightly different token counts for the same text due to different tokenization algorithms.
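To make the arithmetic behind the table concrete, here is a minimal sketch of how a per-request cost could be derived from a token count and a per-1M-token price. The `estimateCostUSD` helper and the `rows` array are hypothetical names used only for illustration; they are not taken from Tokenizador's source, and the token counts are the approximate values from the table above.

```javascript
// Rows copied from the comparison table; token counts are approximate.
const rows = [
  { model: "GPT-4o", tokens: 100, inputPerMillion: 2.5, outputPerMillion: 10.0 },
  { model: "GPT-4o Mini", tokens: 100, inputPerMillion: 0.15, outputPerMillion: 0.6 },
  { model: "Claude 3.5 Sonnet", tokens: 110, inputPerMillion: 3.0, outputPerMillion: 15.0 },
  { model: "Llama 3.1 8B", tokens: 95, inputPerMillion: 0.055, outputPerMillion: 0.055 },
];

// Hypothetical helper (an assumption, not Tokenizador's API):
// cost = (tokens / 1,000,000) * price per 1M tokens.
function estimateCostUSD(tokenCount, pricePerMillion) {
  return (tokenCount / 1_000_000) * pricePerMillion;
}

// Print what the same text would cost as input and as output for each model.
for (const row of rows) {
  const asInput = estimateCostUSD(row.tokens, row.inputPerMillion);
  const asOutput = estimateCostUSD(row.tokens, row.outputPerMillion);
  console.log(
    `${row.model}: input ~$${asInput.toFixed(6)}, output ~$${asOutput.toFixed(6)}`
  );
}
```

At this scale the per-request amounts are fractions of a cent, so the differences between models matter mostly when multiplied across many requests.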
### Example 3: Context Limit Awareness

```text Understanding Context Limits
Model: GPT-4 → Context Limit: 8,192 tokens
Model: GPT-4 Turbo → Context Limit: 128,000 tokens
Model: Claude 3.5 Sonnet → Context Limit: 200,000 tokens
Model: Gemini 1.5 Pro → Context Limit: 2,097,152 tokens
```

Tokenizador will alert you when your text approaches or exceeds a model's context limit, helping you avoid API errors.

## Advanced Features

### Clear Function

Use the **Clear button** to quickly reset the input and start fresh:

```javascript Clear Implementation
// Clears text input and resets all displays
document.getElementById('clear-btn').addEventListener('click', () => {
  analyzer.handleClear();
});
```

### Real-Time Analysis

The application uses a sophisticated debouncing system to analyze text as you type without overwhelming your browser:

```javascript Real-Time Analysis
// Methods of the TokenAnalyzer class (excerpt)
async handleTextChange() {
  await this.performRealTimeAnalysis();
}

async performRealTimeAnalysis() {
  const text = this.uiController.getTextInput().trim();
  const selectedModel = this.uiController.getSelectedModel();

  // Nothing to analyze: reset the displays and stop
  if (!text) {
    this.resetDisplays();
    return;
  }

  const tokenResult = await this.tokenizationService.tokenizeText(text, selectedModel);
  const statistics = this.statisticsCalculator.calculateStatistics(
    text,
    tokenResult,
    selectedModel
  );

  this.updateDisplays(tokenResult, statistics);
}
```

### Model Comparison

Switch between models instantly to compare how different tokenizers handle your text:

1. Type your prompt or content in the text area
2. Select different models to see how they tokenize the same text
3. Notice differences in token count, cost, and tokenization patterns

## Tips for Effective Use

* **Shorter is better**: More concise prompts use fewer tokens and cost less
* **Watch for patterns**: Some phrases tokenize more efficiently than others
* **Test variations**: Try different wordings to find the most token-efficient version
* Different models have different
token ratios (visible in `MODELS_DATA`)
* GPT-4o with `o200k_base` encoding is often more efficient than `cl100k_base` models
* Llama models are estimated to use slightly fewer tokens (0.95x ratio)
* Mini/Small variants cost significantly less: GPT-4o Mini is \~17x cheaper than GPT-4o
* Consider input vs output costs: many models charge more for generated (output) tokens than for input tokens
* Open source models (Llama, Mistral) via APIs are often the most economical
* A leading space is usually merged into the token that follows it (for example, ` mundo`) rather than forming its own token
* Common words may be single tokens, while rare words split into multiple
* Special characters and emojis may use multiple tokens
* Code typically uses more tokens than natural language

## Browser Compatibility

Tokenizador works on all modern browsers:

* ✅ Chrome/Edge 90+
* ✅ Firefox 88+
* ✅ Safari 14+
* ✅ Opera 76+

JavaScript must be enabled for the application to function. The app uses the tiktoken library loaded via CDN with automatic fallback mechanisms.

## Architecture Overview

Tokenizador uses a modular architecture for maximum maintainability:

```javascript Modular Structure
// Main application class
class TokenAnalyzer {
  constructor() {
    this.tokenizationService = new TokenizationService();
    this.uiController = new UIController();
    this.statisticsCalculator = new StatisticsCalculator();
    this.init();
  }
}

// Initialized on page load
document.addEventListener('DOMContentLoaded', () => {
  const analyzer = new TokenAnalyzer();
});
```

The application consists of:

* **Configuration** (`models-config.js`): All 48 models with pricing and specs
* **Services** (`tokenization-service.js`): Core tokenization logic using tiktoken
* **Controllers** (`ui-controller.js`): DOM manipulation and event handling
* **Utils** (`statistics-calculator.js`): Token counting and cost calculations
* **Main App** (`token-analyzer.js`): Orchestrates all components

## Next Steps

* Visit the live demo and start analyzing your AI prompts
* Explore detailed information about all 48 supported models
* Learn more about how
tokenization works across different models
* Deep dive into cost estimation and optimization strategies

## Troubleshooting

* If no tokens appear, make sure you've entered text in the input area. The application requires at least one character to begin tokenization. Check your browser console for any errors with the tiktoken library.
* If token counts differ between models, this is expected: different models use different tokenization algorithms, and each count is computed for the selected model. Claude models use an approximation of `cl100k_base` which may differ slightly from their actual tokenizer.
* For very large texts (>10,000 tokens), tokenization may take a moment. The application includes a loading indicator during processing. Consider analyzing smaller chunks for better performance.
* If model information is missing, ensure the model data is loaded correctly. All model configurations are defined in `models-config.js` with details like context limits, costs, and URLs. Refresh the page if information doesn't appear.

***

Tokenizador is built by [Alex Blandino](https://github.com/alblandino). For questions, issues, or feature requests, visit the project repository or contact the developer.