# Quickstart Guide

> Get started with Tokenizador in minutes: analyze AI tokens in real time across 48 models

## Welcome to Tokenizador

Tokenizador is a professional AI tokenization analyzer that helps you visualize how different AI models process and tokenize your text. Whether you're optimizing prompts, calculating costs, or comparing models, Tokenizador gives you instant insights into token usage.

<Note>
  **Live Demo Available**: Try Tokenizador instantly at [tokenizador.alblandino.com](https://tokenizador.alblandino.com/) - no installation or signup required.
</Note>

## Quick Start

<Steps>
  <Step title="Access the Application">
    Visit [tokenizador.alblandino.com](https://tokenizador.alblandino.com/) in your web browser. The application works on all modern browsers and devices.

    <Tip>
      Tokenizador is a fully client-side application - all tokenization happens in your browser, ensuring your text stays private.
    </Tip>
  </Step>

  <Step title="Select an AI Model">
    Choose from 48 supported AI models using the dropdown selector. Models are organized by provider:

    <CardGroup cols={2}>
      <Card title="OpenAI" icon="robot">
        GPT-4o, GPT-4o Mini, GPT-4 Turbo, GPT-4, GPT-3.5 Turbo
      </Card>

      <Card title="Anthropic" icon="brain">
        Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku
      </Card>

      <Card title="Google" icon="magnifying-glass">
        Gemini 1.5 Pro, Gemini 1.5 Flash
      </Card>

      <Card title="Meta" icon="book">
        Llama 3.1 (405B/70B/8B), Llama 3 (70B/8B)
      </Card>
    </CardGroup>

    <Note>
      Plus models from Mistral AI, Cohere, Alibaba, DeepSeek, Microsoft, xAI, Amazon, NVIDIA, and many more!
    </Note>
  </Step>

  <Step title="Enter Your Text">
    Type or paste your text into the input area. Tokenization happens **instantly as you type** - no need to click any buttons.

    ```text Example Input theme={null}
    Texto a analizar...
    ```

    <Tip>
      Try different languages! The tokenizer works with any text including Spanish, English, Chinese, emojis, and code.
    </Tip>
  </Step>

  <Step title="View Results">
    Tokenizador displays the following analytics in real time:

    * **Token Count**: Total tokens the model will process
    * **Character Count**: Total characters including spaces
    * **Word Count**: Number of words detected
    * **Cost Estimate**: Estimated cost based on the model's published per-token pricing

    The results section (`results-section`) automatically updates as you type, showing:

    * Live statistics in an interactive grid
    * Colorful token visualization
    * Detailed token list with individual tokens
    * Model-specific information
  </Step>
</Steps>

## Understanding the Interface

### Statistics Dashboard

The stats grid displays four key metrics in real time:

<CardGroup cols={2}>
  <Card title="Total Tokens" icon="hashtag">
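    Token count computed with the tiktoken library: exact for OpenAI encodings, approximated for providers such as Anthropic whose tokenizers tiktoken does not include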
  </Card>

  <Card title="Characters" icon="font">
    Total character count including spaces and special characters
  </Card>

  <Card title="Words" icon="layer-group">
    Intelligent word count with proper text segmentation
  </Card>

  <Card title="Cost Estimate" icon="dollar-sign">
    Real-time cost calculation in USD based on official pricing
  </Card>
</CardGroup>
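
As a rough illustration of how the character and word metrics could be derived, here is a minimal sketch. The function name `basicTextStats` and the whitespace-based word split are assumptions made for this example; the app's `StatisticsCalculator` may segment text differently.

```javascript
// Hypothetical helper: not the app's actual StatisticsCalculator.
function basicTextStats(text) {
    // Spreading a string iterates by Unicode code point, so an emoji such
    // as 🙂 counts once (text.length would count UTF-16 code units instead).
    const characters = [...text].length;

    // Split on runs of whitespace; blank input yields zero words.
    const trimmed = text.trim();
    const words = trimmed === '' ? 0 : trimmed.split(/\s+/).length;

    return { characters, words };
}

console.log(basicTextStats('Hola mundo 🙂')); // { characters: 12, words: 3 }
```

Note that this simple split treats the emoji as a word; a stricter definition of "word" would need extra filtering.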

### Token Visualization

Tokenizador provides two complementary views of your tokenized text:

<Tabs>
  <Tab title="Visual Display">
    **Colored Token Display** - Each token is shown with a unique color, making it easy to see how the model segments your text.

    The `tokens-container` displays tokens inline, helping you understand:

    * How words are split into sub-word tokens
    * How spaces and punctuation are handled
    * The efficiency of different text patterns
  </Tab>

  <Tab title="Token List">
    **Detailed Token Array** - A comprehensive list view in the `tokens-array` section showing each individual token extracted from your text.

    Perfect for:

    * Counting specific tokens
    * Analyzing token boundaries
    * Debugging prompt issues
  </Tab>
</Tabs>
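
To make the colored display concrete, the sketch below turns a token array into inline `<span>` elements. The palette, the `token` class name, and the choice to cycle through a fixed set of colors are all assumptions for illustration, not the app's actual rendering code.

```javascript
// Hypothetical palette, chosen only for this sketch.
const TOKEN_COLORS = ['#fde68a', '#a7f3d0', '#bfdbfe', '#fbcfe8', '#ddd6fe'];

// Escape characters that are special in HTML so token text renders literally.
function escapeHtml(value) {
    return value
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Build one <span> per token, cycling through the palette so that
// adjacent tokens get different background colors.
function renderTokens(tokens) {
    return tokens
        .map((token, index) => {
            const color = TOKEN_COLORS[index % TOKEN_COLORS.length];
            return `<span class="token" style="background:${color}">${escapeHtml(token)}</span>`;
        })
        .join('');
}

console.log(renderTokens(['Hola', ' mundo']));
```

Escaping matters here because pasted input may contain markup or code that should be shown as text rather than interpreted by the browser.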

## Model Information

For each selected model, Tokenizador displays:

* **Model Name**: Full name and provider
* **Context Limit**: Maximum tokens the model can process
* **Tokenization Type**: The encoding algorithm used (e.g., `cl100k_base`, `o200k_base`)
* **Active Algorithm**: Real-time display of the tokenization method
* **Cost Details**: Input and output costs per 1M tokens
* **External Link**: Direct link to [Artificial Analysis](https://artificialanalysis.ai) for detailed benchmarks

<Note>
  The tokenization service automatically selects the appropriate encoding based on your chosen model. OpenAI's newer models (GPT-4o, GPT-4o Mini) use `o200k_base`, while most others use `cl100k_base`.
</Note>
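
The selection rule in the note could be sketched as a simple lookup with a default. The model ids, field names, and the `approximate` flag below are hypothetical; the real entries live in `models-config.js` and may be shaped differently.

```javascript
// Hypothetical model entries; ids and field names are illustrative only.
const MODELS = new Map([
    ['gpt-4o', { provider: 'OpenAI', encoding: 'o200k_base' }],
    ['gpt-4o-mini', { provider: 'OpenAI', encoding: 'o200k_base' }],
    ['gpt-4', { provider: 'OpenAI', encoding: 'cl100k_base' }],
    // No native encoding declared, so the default is used as an approximation.
    ['claude-3.5-sonnet', { provider: 'Anthropic' }],
]);

const DEFAULT_ENCODING = 'cl100k_base';

function selectEncoding(modelId) {
    const model = MODELS.get(modelId);
    const declared = model ? model.encoding : undefined;
    return {
        encoding: declared || DEFAULT_ENCODING,
        // True when we fell back to the default rather than a declared encoding.
        approximate: !declared,
    };
}

console.log(selectEncoding('gpt-4o'));
console.log(selectEncoding('claude-3.5-sonnet'));
```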

## Practical Examples

### Example 1: Simple Text Analysis

<CodeGroup>
  ```text Input theme={null}
  Hola mundo
  ```

  ```text GPT-4o Result theme={null}
  Token Count: 2
  Encoding: o200k_base
  Tokens: ["Hola", " mundo"]
  ```

  ```text Claude 3.5 Sonnet Result theme={null}
  Token Count: 3
  Encoding: cl100k_base (approximation)
  Token Ratio: 1.1x
  ```
</CodeGroup>
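
The `Token Ratio` line suggests that counts for models without a native tiktoken encoding are derived by scaling a base count. A minimal sketch under that assumption follows; the function name and the rounding rule are guesses for illustration, and the app may round differently.

```javascript
// Hypothetical: scale a tiktoken count by a per-model ratio.
// Rounding to the nearest integer is an assumption of this sketch.
function approximateTokenCount(baseCount, tokenRatio) {
    return Math.round(baseCount * tokenRatio);
}

console.log(approximateTokenCount(100, 1.1));  // 110
console.log(approximateTokenCount(100, 0.95)); // 95
```

`Math.round` is used rather than `Math.ceil` because floating-point products such as `100 * 1.1` land slightly above the intended value, which would push a ceiling up by one.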

### Example 2: Cost Comparison

When analyzing the same text across different models:

| Model             | Tokens | Input Cost (per 1M) | Output Cost (per 1M) |
| ----------------- | ------ | ------------------- | -------------------- |
| GPT-4o            | \~100  | \$2.50              | \$10.00              |
| GPT-4o Mini       | \~100  | \$0.15              | \$0.60               |
| Claude 3.5 Sonnet | \~110  | \$3.00              | \$15.00              |
| Llama 3.1 8B      | \~95   | \$0.055             | \$0.055              |

<Tip>
  Notice how different models may produce slightly different token counts for the same text due to different tokenization algorithms.
</Tip>
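
The per-request cost follows from the table as `tokens / 1,000,000 × price per 1M`. The sketch below applies that formula to the table's figures; the `PRICING` object and `estimateCost` name are illustrative, not the app's own identifiers.

```javascript
// Prices are USD per 1M tokens, copied from the table above.
const PRICING = {
    'GPT-4o': { input: 2.5, output: 10.0 },
    'GPT-4o Mini': { input: 0.15, output: 0.6 },
    'Claude 3.5 Sonnet': { input: 3.0, output: 15.0 },
    'Llama 3.1 8B': { input: 0.055, output: 0.055 },
};

function estimateCost(tokens, pricePerMillion) {
    return (tokens / 1e6) * pricePerMillion;
}

// Input cost of the same prompt on each model, using the table's token counts.
const tokenCounts = {
    'GPT-4o': 100,
    'GPT-4o Mini': 100,
    'Claude 3.5 Sonnet': 110,
    'Llama 3.1 8B': 95,
};

for (const [model, tokens] of Object.entries(tokenCounts)) {
    const cost = estimateCost(tokens, PRICING[model].input);
    console.log(`${model}: $${cost.toFixed(6)}`);
}
```

For example, 100 input tokens on GPT-4o come to about \$0.00025, so cost differences between models only become noticeable at high volume.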

### Example 3: Context Limit Awareness

```text Understanding Context Limits theme={null}
Model: GPT-4 → Context Limit: 8,192 tokens
Model: GPT-4 Turbo → Context Limit: 128,000 tokens
Model: Claude 3.5 Sonnet → Context Limit: 200,000 tokens
Model: Gemini 1.5 Pro → Context Limit: 2,097,152 tokens
```

<Warning>
  Tokenizador will alert you when your text approaches or exceeds a model's context limit, helping you avoid API errors.
</Warning>
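
A check of this kind can be sketched as a comparison against the model's limit. The 90% "approaching" threshold and the status names below are assumptions for illustration; the app's actual thresholds are not documented here.

```javascript
// Hypothetical threshold: treat 90% of the limit as "approaching".
const WARNING_RATIO = 0.9;

function checkContextLimit(tokenCount, contextLimit) {
    if (tokenCount > contextLimit) return 'exceeded';
    if (tokenCount >= contextLimit * WARNING_RATIO) return 'approaching';
    return 'ok';
}

// GPT-4 has an 8,192-token context limit.
console.log(checkContextLimit(5000, 8192)); // 'ok'
console.log(checkContextLimit(7500, 8192)); // 'approaching'
console.log(checkContextLimit(9000, 8192)); // 'exceeded'
```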

## Advanced Features

### Clear Function

Use the **Clear button** to quickly reset the input and start fresh:

```javascript Clear Implementation theme={null}
// Clears text input and resets all displays
document.getElementById('clear-btn').addEventListener('click', () => {
    analyzer.handleClear();
});
```

### Real-Time Analysis

The application analyzes text as you type, debouncing input so that rapid keystrokes do not overwhelm your browser. The excerpt below shows the analysis step itself:

```javascript Real-Time Analysis theme={null}
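// Excerpt from the TokenAnalyzer class (constructor shown under Architecture Overview)
class TokenAnalyzer {
    // Handler for changes to the text input
    async handleTextChange() {
        await this.performRealTimeAnalysis();
    }

    async performRealTimeAnalysis() {
        const text = this.uiController.getTextInput().trim();
        const selectedModel = this.uiController.getSelectedModel();

        // Nothing to analyze: reset the statistics and token views
        if (!text) {
            this.resetDisplays();
            return;
        }

        // Tokenize first, then derive counts and cost from the result
        const tokenResult = await this.tokenizationService.tokenizeText(text, selectedModel);
        const statistics = this.statisticsCalculator.calculateStatistics(
            text, tokenResult, selectedModel
        );

        this.updateDisplays(tokenResult, statistics);
    }
}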
```
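
The debouncing itself is not part of the excerpt above. A generic version could look like the sketch below; the 150 ms delay and the `text-input` element id in the usage comment are assumptions, not values taken from the app.

```javascript
// Generic debounce: delays calls to `fn` until `delayMs` have passed
// without another call, then runs it once with the latest arguments.
function debounce(fn, delayMs) {
    let timerId = null;
    return function debounced(...args) {
        clearTimeout(timerId); // harmless when no timer is pending
        timerId = setTimeout(() => {
            timerId = null;
            fn.apply(this, args);
        }, delayMs);
    };
}

// Usage sketch (hypothetical delay and element id):
// const onInput = debounce(() => analyzer.handleTextChange(), 150);
// document.getElementById('text-input').addEventListener('input', onInput);
```

With this wrapper, a burst of keystrokes triggers a single analysis shortly after typing pauses instead of one analysis per keystroke.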

### Model Comparison

Switch between models instantly to compare how different tokenizers handle your text:

<Steps>
  <Step title="Enter your text once">
    Type your prompt or content in the text area
  </Step>

  <Step title="Switch models from the dropdown">
    Select different models to see how they tokenize the same text
  </Step>

  <Step title="Compare the results">
    Notice differences in token count, cost, and tokenization patterns
  </Step>
</Steps>

## Tips for Effective Use

<AccordionGroup>
  <Accordion title="Optimizing Token Usage">
    * **Shorter is better**: More concise prompts use fewer tokens and cost less
    * **Watch for patterns**: Some phrases tokenize more efficiently than others
    * **Test variations**: Try different wordings to find the most token-efficient version
  </Accordion>

  <Accordion title="Comparing Model Efficiency">
    * Different models have different token ratios (visible in `MODELS_DATA`)
    * GPT-4o with `o200k_base` encoding is often more efficient than `cl100k_base` models
    * Llama models tend to use slightly fewer tokens (0.95x ratio)
  </Accordion>

  <Accordion title="Cost Optimization">
    * Mini/Small variants cost significantly less: GPT-4o Mini is \~17x cheaper than GPT-4o
    * Consider input vs output costs: Some models charge differently for generation
    * Open source models (Llama, Mistral) via APIs are often the most economical
  </Accordion>

  <Accordion title="Understanding Tokenization">
    * Spaces are usually merged into the token for the following word (for example, ` mundo`) rather than emitted as separate tokens
    * Common words may be single tokens, while rare words split into multiple
    * Special characters and emojis may use multiple tokens
    * Code typically uses more tokens than natural language
  </Accordion>
</AccordionGroup>

## Browser Compatibility

Tokenizador works on all modern browsers:

* ✅ Chrome/Edge 90+
* ✅ Firefox 88+
* ✅ Safari 14+
* ✅ Opera 76+

<Note>
  JavaScript must be enabled for the application to function. The app uses the tiktoken library loaded via CDN with automatic fallback mechanisms.
</Note>
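
One way such a fallback could behave, sketched under the assumption that the app switches to a character-based estimate when the tiktoken encoder is unavailable. The four-characters-per-token figure is a common rule of thumb for English text, not the app's documented behavior, and `countTokens` is a hypothetical name.

```javascript
// Rule-of-thumb fallback: roughly 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// `encoder` is any object with an encode(text) method that returns an
// array-like of token ids (such as a loaded tiktoken encoding), or null
// if the library failed to load.
function countTokens(text, encoder) {
    if (encoder && typeof encoder.encode === 'function') {
        return { count: encoder.encode(text).length, method: 'tiktoken' };
    }
    return {
        count: Math.ceil(text.length / CHARS_PER_TOKEN),
        method: 'estimate',
    };
}

console.log(countTokens('Hola mundo', null));
```

Returning the method alongside the count lets the interface show whether a figure is exact or estimated.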

## Architecture Overview

Tokenizador uses a modular architecture for maximum maintainability:

```javascript Modular Structure theme={null}
// Main application class
class TokenAnalyzer {
    constructor() {
        this.tokenizationService = new TokenizationService();
        this.uiController = new UIController();
        this.statisticsCalculator = new StatisticsCalculator();
        this.init();
    }
}

// Initialized on page load
document.addEventListener('DOMContentLoaded', () => {
    const analyzer = new TokenAnalyzer();
});
```

The application consists of:

* **Configuration** (`models-config.js`): All 48 models with pricing and specs
* **Services** (`tokenization-service.js`): Core tokenization logic using tiktoken
* **Controllers** (`ui-controller.js`): DOM manipulation and event handling
* **Utils** (`statistics-calculator.js`): Token counting and cost calculations
* **Main App** (`token-analyzer.js`): Orchestrates all components

## Next Steps

<CardGroup cols={2}>
  <Card title="Start Analyzing" icon="play" href="https://tokenizador.alblandino.com">
    Visit the live demo and start analyzing your AI prompts
  </Card>

  <Card title="Model Reference" icon="book" href="/guides/supported-models">
    Explore detailed information about all 48 supported models
  </Card>

  <Card title="Understanding Tokens" icon="lightbulb" href="/guides/understanding-tokenization">
    Learn more about how tokenization works across different models
  </Card>

  <Card title="Cost Calculator" icon="calculator" href="/guides/cost-estimation">
    Deep dive into cost estimation and optimization strategies
  </Card>
</CardGroup>

## Troubleshooting

<AccordionGroup>
  <Accordion title="Tokens not appearing">
    Make sure you've entered text in the input area. The application requires at least one character to begin tokenization. Check your browser console for any errors with the tiktoken library.
  </Accordion>

  <Accordion title="Incorrect token counts">
    Different models use different tokenization algorithms, so the same text can produce different counts from one model to the next. Counts for Claude models are an approximation based on `cl100k_base` and may differ slightly from their actual tokenizer.
  </Accordion>

  <Accordion title="Slow performance">
    For very large texts (>10,000 tokens), tokenization may take a moment. The application includes a loading indicator during processing. Consider analyzing smaller chunks for better performance.
  </Accordion>

  <Accordion title="Model information not displaying">
    Ensure the model data is loaded correctly. All model configurations are defined in `models-config.js` with details like context limits, costs, and URLs. Refresh the page if information doesn't appear.
  </Accordion>
</AccordionGroup>
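
For the slow-performance case, splitting a large input before pasting it can be sketched as follows. `chunkText` is a hypothetical helper, and the chunk size is up to you; it breaks at spaces so that words stay intact.

```javascript
// Split text into pieces of at most `maxChars` characters, preferring to
// break just before a space so that words are not cut in half.
function chunkText(text, maxChars) {
    if (maxChars < 1) throw new RangeError('maxChars must be at least 1');
    const chunks = [];
    let start = 0;
    while (start < text.length) {
        let end = Math.min(start + maxChars, text.length);
        if (end < text.length) {
            // Look back for the last space inside the current window.
            const lastSpace = text.lastIndexOf(' ', end);
            if (lastSpace > start) end = lastSpace;
        }
        chunks.push(text.slice(start, end));
        start = end;
    }
    return chunks;
}

console.log(chunkText('uno dos tres cuatro', 8));
// [ 'uno dos', ' tres', ' cuatro' ]
```

Each space stays attached to the word that follows it, and joining the chunks reproduces the original text. Token counts summed across chunks may still differ slightly from the count for the whole text.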

***

<Card title="Questions or Feedback?" icon="message">
  Tokenizador is built by [Alex Blandino](https://github.com/alblandino). For questions, issues, or feature requests, visit the project repository or contact the developer.
</Card>
