1. Overview

Local AI Chat is a browser-based chat tool for running supported ONNX language models with Transformers.js and WebGPU. The chat interface is designed for private drafting, summarizing, rewriting, document review, and general assistant workflows.

Local processing: prompts, uploaded files, generated text, and chat history are processed and stored in your browser. ASD123.ai does not receive your chat contents for remote inference.

2. Browser requirements

Recommended browser

Use a current Chromium-based browser such as Chrome or Edge with WebGPU enabled. Safari and Firefox support can vary by version and operating system.
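Before loading a model, a page can check whether the browser exposes WebGPU at all. A minimal sketch of that check, written as a pure function over a navigator-like object so the logic is testable outside a browser:

```javascript
// Minimal WebGPU availability check (sketch).
// Supporting browsers expose WebGPU as navigator.gpu.
function supportsWebGPU(nav) {
  return Boolean(nav && 'gpu' in nav && nav.gpu);
}

// In a real page you would call: supportsWebGPU(navigator)
```

When this returns false, the model cannot run on the GPU path, so showing a clear "WebGPU not available" message is usually better than letting the load fail silently.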

Memory and GPU

Browser AI models can use several gigabytes of RAM and GPU memory. Close heavy tabs before loading larger models or larger context windows.

3. Models and loading

Choose a model from the Model dropdown, then click Load Model. The first load downloads model and runtime files through the browser. Later loads may be faster because the browser can reuse cached assets.

Gemma 4 E2B

Default balanced model for general chat. Start here when you want stronger answers and can spare browser memory.

Gemma 4 E4B

Larger Gemma option. It may produce better answers, but needs more time and memory to load and run.

Qwen3.5

Available in smaller sizes for lighter browser use. Use 0.8B first if you are testing compatibility.

4. Basic workflow

Step 1: Choose model, context, and temperature

Select a model, keep context at 4K for the first test, and leave temperature at 0.7; lower it when you need more deterministic output.

Step 2: Load the model

Click Load Model and wait until the status says the model is ready. The Stop button can cancel loading or generation.

Step 3: Send prompts

Press Enter to send. Press Shift+Enter for a new line. Responses stream into the message list as the browser generates them.
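The Enter versus Shift+Enter behavior above can be sketched as a small decision function over a keydown-like event object (the event shape mirrors the standard KeyboardEvent fields):

```javascript
// Decide what the composer should do for a keydown-like event (sketch).
// Returns 'send', 'newline', or 'ignore'.
function composerAction(event) {
  if (event.key !== 'Enter') return 'ignore';
  // Shift+Enter inserts a line break; plain Enter sends the message.
  return event.shiftKey ? 'newline' : 'send';
}
```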

5. Attachments

Use the attachment button to add images or documents. Supported file types include images, TXT, Markdown, PDF, and DOCX. Text extraction happens in the browser before the content is inserted into the chat context.

  • Images are passed to the model's multimodal path only when the selected model supports image input.
  • TXT and Markdown files are read as plain text.
  • PDF and DOCX files are parsed locally, then added as attachment context.
  • Very large documents can exceed the selected context window. The context meter warns before sending when possible.
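A context meter like the one described can be sketched with a rough token estimate. The roughly-four-characters-per-token ratio below is an illustrative assumption; the app's actual tokenizer will count differently:

```javascript
// Rough token estimate (sketch). The ~4 characters per token ratio is an
// assumption for illustration, not the app's real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// True when prompt plus attachment text would exceed the context window,
// so the UI can warn before sending.
function wouldOverflow(promptText, attachmentText, contextTokens) {
  const used = estimateTokens(promptText) + estimateTokens(attachmentText);
  return used > contextTokens;
}
```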

6. Context and memory

Context controls how much conversation and attachment text the model can consider. Larger context windows can be useful for longer documents, but they increase browser memory usage.

Recommendation: start with 4K. Move to 8K, 16K, or 32K only when the context meter shows that your prompt or attachment needs more room.

The app can compact older messages into a local summary when the context gets too full. This keeps recent turns available while reducing the amount of text sent into the next generation.
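The compaction step above can be sketched as: replace the oldest messages with a single summary entry while keeping the most recent turns verbatim. Here `summarize` is a hypothetical local helper standing in for whatever summarization the app runs in the browser:

```javascript
// Compact older messages into one summary entry (sketch).
// summarize is a hypothetical helper: (messages) => summary string.
function compactHistory(messages, keepRecent, summarize) {
  if (messages.length <= keepRecent) return messages;
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);
  // The summary replaces the older turns in the next generation's context.
  const summary = { role: 'system', content: summarize(older) };
  return [summary, ...recent];
}
```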

7. History, editing, and export

Local history

Chats are stored in browser IndexedDB. New Chat starts a fresh thread. Delete All Chats clears saved chat records from the browser.

Edit and regenerate

Edit a sent user message to regenerate the following assistant response from that point in the conversation.

Markdown export

Export MD downloads the current chat as Markdown with metadata, settings, attachment summaries, and messages.
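An export like this amounts to serializing the chat record into one Markdown string. A minimal sketch, where the field names (title, model, messages) are illustrative assumptions rather than the app's actual schema:

```javascript
// Build a Markdown export string from a chat record (sketch).
// Field names are assumptions for illustration.
function chatToMarkdown(chat) {
  const lines = ['# ' + chat.title, '', 'Model: ' + chat.model, ''];
  for (const msg of chat.messages) {
    lines.push('**' + msg.role + ':**', '', msg.content, '');
  }
  return lines.join('\n');
}
```

The resulting string would then be offered as a `.md` file download.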

Titles

Chat titles are generated locally from the first user message. If no message exists, the title remains New Chat.
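The local title rule above can be sketched as: take the first user message, collapse whitespace, truncate to a short length, and fall back to "New Chat" when nothing is available. The 40-character limit is an assumed value for illustration:

```javascript
// Derive a chat title from the first user message (sketch).
// The 40-character limit is an illustrative assumption.
function deriveTitle(messages, maxLen = 40) {
  const first = messages.find((m) => m.role === 'user');
  if (!first || !first.content.trim()) return 'New Chat';
  const text = first.content.trim().replace(/\s+/g, ' ');
  return text.length <= maxLen ? text : text.slice(0, maxLen - 1) + '…';
}
```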

8. Privacy model

Local AI Chat is designed so chat content stays in the browser. Prompts, uploaded file contents, generated text, and chat history are not sent to ASD123.ai for processing.

Online model loading can request model and runtime files from Hugging Face or a CDN. Those providers may receive standard technical request data, such as IP address, browser information, and requested file URLs, as part of serving those assets.

Delete local history from the chat sidebar when you no longer want saved conversations in this browser profile.

9. Troubleshooting

Model does not load

Use Chrome or Edge, close heavy tabs, refresh the page, then start with Qwen3.5 0.8B at 4K context.

Context error

Reduce attachment text, use the compacted conversation, or select a larger context window before loading the model again.

Generation is slow

Browser inference is slower than native desktop inference. Smaller models, shorter prompts, and smaller context windows improve responsiveness.

10. Limits and review

Local model output can be incorrect, incomplete, or inconsistent. Review generated text before using it in business, legal, compliance, medical, financial, or other high-risk workflows.

Browser memory behavior depends on browser version, GPU driver, operating system, loaded model, selected context, and open tabs. The memory labels in the interface are practical estimates, not guarantees.