What's new in MLCChat 1.0

Android

The demo APK is available to download. The demo has been tested on the Samsung S23 with the Snapdragon 8 Gen 2 chip, the Redmi Note 12 Pro with the Snapdragon 685, and Google Pixel phones. Besides the Getting Started page, documentation is available for building Android apps with MLC LLM.

About MLCChat 1.0

llm.mlc.ai/

github.com/mlc-ai/mlc-llm

MLC LLM

Documentation | Blog | Discord

Machine Learning Compilation for Large Language Models (MLC LLM) is a high-performance universal deployment solution that allows native deployment of any large language model, with native APIs and compiler acceleration. The mission of this project is to enable everyone to develop, optimize and deploy AI models natively on their own devices with ML compilation techniques.

Universal deployment. MLC LLM supports the following platforms and hardware:

	AMD GPU	NVIDIA GPU	Apple GPU	Intel GPU
Linux / Win	✅ Vulkan, ROCm	✅ Vulkan, CUDA	N/A	✅ Vulkan
macOS	✅ Metal (dGPU)	N/A	✅ Metal	✅ Metal (iGPU)
Web Browser	✅ WebGPU and WASM
iOS / iPadOS	✅ Metal on Apple A-series GPU
Android	✅ OpenCL on Adreno GPU		✅ OpenCL on Mali GPU
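
As a reading aid, the support table above can be expressed as a small lookup structure. This is only a sketch: the names `VENDOR_BACKENDS`, `PLATFORM_BACKENDS`, and `backends_for` are hypothetical and are not part of MLC LLM. The entries are transcribed from the table, on the assumption that the Web Browser and iOS / iPadOS rows apply to the whole platform rather than to one GPU vendor.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup table transcribed from the support matrix above.
# Keys are (platform, GPU family); values are the listed backends.
VENDOR_BACKENDS: Dict[Tuple[str, str], List[str]] = {
    ("Linux / Win", "AMD"): ["Vulkan", "ROCm"],
    ("Linux / Win", "NVIDIA"): ["Vulkan", "CUDA"],
    ("Linux / Win", "Intel"): ["Vulkan"],
    ("macOS", "AMD"): ["Metal"],      # listed as dGPU
    ("macOS", "Apple"): ["Metal"],
    ("macOS", "Intel"): ["Metal"],    # listed as iGPU
    ("Android", "Adreno"): ["OpenCL"],
    ("Android", "Mali"): ["OpenCL"],
}

# Rows that the table lists once for the whole platform (assumption:
# they are not tied to a single GPU vendor column).
PLATFORM_BACKENDS: Dict[str, List[str]] = {
    "Web Browser": ["WebGPU", "WASM"],
    "iOS / iPadOS": ["Metal"],
}


def backends_for(platform: str, gpu: Optional[str] = None) -> List[str]:
    """Return the backends listed for a platform/GPU pair, or [] for N/A."""
    if platform in PLATFORM_BACKENDS:
        return PLATFORM_BACKENDS[platform]
    return VENDOR_BACKENDS.get((platform, gpu or ""), [])


if __name__ == "__main__":
    print(backends_for("Linux / Win", "AMD"))
    print(backends_for("macOS", "NVIDIA"))
    print(backends_for("Web Browser"))
```

An empty list corresponds to the N/A cells in the table, such as NVIDIA GPUs on macOS.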

Scalable. MLC LLM scales across NVIDIA and AMD GPUs, including both cloud and gaming GPUs. The results below show our single-batch decoding performance with prefilling = 1 and decoding = 256.

Performance of 4-bit CodeLlama-34B and Llama2-70B on two NVIDIA RTX 4090 and two AMD Radeon 7900 XTX:

Scaling of fp16 and 4-bit CodeLlama-34B and Llama2-70B on A100-80G-PCIe and A10G-24G-PCIe, up to 8 GPUs:
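
For readers unfamiliar with how a single-batch decoding figure is usually derived, the sketch below shows one common way to turn a timed run into tokens per second and a multi-GPU scaling efficiency. It is an illustration under assumptions, not the project's benchmarking code: the names `DecodeRun`, `decode_throughput`, and `scaling_efficiency` are hypothetical, and the numbers in the usage section are made up for the example rather than measured results.

```python
from dataclasses import dataclass


@dataclass
class DecodeRun:
    """One timed run (hypothetical record, not an MLC LLM type)."""
    prefill_tokens: int    # e.g. 1, matching "prefilling = 1"
    decode_tokens: int     # e.g. 256, matching "decoding = 256"
    decode_seconds: float  # wall-clock time spent in the decode phase


def decode_throughput(run: DecodeRun) -> float:
    """Decode-phase throughput in tokens per second."""
    if run.decode_seconds <= 0:
        raise ValueError("decode_seconds must be positive")
    return run.decode_tokens / run.decode_seconds


def scaling_efficiency(single_gpu_tps: float, multi_gpu_tps: float,
                       num_gpus: int) -> float:
    """Fraction of ideal linear speedup achieved with num_gpus GPUs."""
    if single_gpu_tps <= 0 or num_gpus <= 0:
        raise ValueError("single_gpu_tps and num_gpus must be positive")
    return multi_gpu_tps / (single_gpu_tps * num_gpus)


if __name__ == "__main__":
    # Made-up timings for illustration only; not measured results.
    one_gpu = DecodeRun(prefill_tokens=1, decode_tokens=256, decode_seconds=8.0)
    two_gpu = DecodeRun(prefill_tokens=1, decode_tokens=256, decode_seconds=5.0)
    tps_1 = decode_throughput(one_gpu)
    tps_2 = decode_throughput(two_gpu)
    print(f"1 GPU: {tps_1:.1f} tok/s, 2 GPUs: {tps_2:.1f} tok/s")
    print(f"scaling efficiency: {scaling_efficiency(tps_1, tps_2, 2):.2f}")
```

Only the decode phase is timed here, which is why the prefill length is recorded but does not enter the throughput calculation.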

News

[10/18/2023] [Post] Scalable multi-GPU support for CUDA and ROCm is official.
[09/02/2023] Prebuilt ROCm 5.7 and CUDA 12.2 packages are available.
[08/25/2023] CodeLlama support is up.
[08/14/2023] [Post] Mali GPU support is up on Orange Pi.
[08/09/2023] [Post] The ROCm backend is mature enough to use.
[08/02/2023] Dockerfile is released for CUDA performance benchmarking.
[07/19/2023] Support for Llama2-7B/13B/70B is up.
[05/22/2023] [Post] RedPajama support is up.
[05/08/2023] [Post] MLC LLM is now available on Android.
[05/01/2023] [Post] MLC LLM is released with Metal, Vulkan and CUDA backends.
[04/14/2023] WebLLM is released prior to MLC LLM with WebGPU and WebAssembly backend.

Getting Started

Please visit our documentation for detailed instructions.

Model Support

MLC LLM supports a wide range of model architectures and variants. The following prebuilt models can be used off the shelf. Visit Prebuilt Models to see the full list, and Compile Models via MLC to see how to use models not on this list.

Architecture	Prebuilt Model Variants
Llama	Llama-2, Code Llama, Vicuna, WizardLM, WizardMath, OpenOrca Platypus2, FlagAlpha Llama-2 Chinese, georgesung Llama-2 Uncensored
GPT-NeoX	RedPajama
GPT-J
RWKV	RWKV-raven
MiniGPT
GPTBigCode	WizardCoder
ChatGLM
StableLM
Mistral
Phi

Universal Deployment APIs

MLC LLM provides multiple sets of APIs across platforms and environments. These include

Citation

Please consider citing our project if you find it useful:

@software{mlc-llm,
    author = {MLC team},
    title = {{MLC-LLM}},
    url = {https://github.com/mlc-ai/mlc-llm},
    year = {2023}
}

The underlying techniques of MLC LLM include:
