LLMs we’d recommend to anyone starting an AI project
Here’s a look at some of the most significant large language models currently shaping natural language processing and influencing future AI architectures.
BERT
Introduced by Google in 2018, BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based model that reads text in both directions to build contextual representations of language. It consists of a stack of transformer encoders, with roughly 340 million parameters in its large variant. After being pre-trained on a vast corpus, BERT was fine-tuned for specific tasks such as natural language inference and sentence similarity, significantly enhancing query understanding in Google Search.
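As a concrete illustration, here is a minimal sketch of loading a pre-trained BERT encoder to embed a sentence, using the Hugging Face transformers library (a third-party toolkit not mentioned above):

import torch
from transformers import AutoTokenizer, AutoModel

# Load a pre-trained BERT encoder and its tokenizer from the Hugging Face hub.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Encode a sentence; BERT attends to the full sequence in both directions.
inputs = tokenizer("BERT builds contextual representations.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one contextual vector per token: (batch, tokens, hidden).
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 7, 768])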
Claude
Developed by Anthropic, Claude focuses on constitutional AI, a technique that guides its outputs with a set of written principles intended to keep them helpful, safe, and accurate. The latest version, Claude 3.5 Sonnet, demonstrates improved understanding of nuance and humor, operating at double the speed of its predecessor, Claude 3 Opus. It is freely accessible through Claude.ai and the Claude iOS app.
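For developers, the same model family is also reachable programmatically. The following sketch uses Anthropic’s official Python SDK, an access route beyond the Claude.ai app mentioned above, and assumes an ANTHROPIC_API_KEY is set in the environment:

import anthropic

# The client reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # a Claude 3.5 Sonnet snapshot
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize constitutional AI in one sentence."}],
)
print(message.content[0].text)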
Cohere
Cohere is an enterprise AI platform offering various LLMs, including Command, Rerank, and Embed, which can be customized for specific business needs. Co-founded by Aidan Gomez, one of the authors of the influential paper “Attention Is All You Need,” Cohere is notable for its independence from a single cloud provider, unlike OpenAI, which is tied to Microsoft Azure.
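As a rough sketch of how those models are typically consumed, here is Cohere’s Python SDK generating a chat reply and embedding a document (the model names and the placeholder key are illustrative assumptions, not taken from the text above):

import cohere

co = cohere.Client("YOUR_API_KEY")  # hypothetical placeholder key

# Command: conversational generation.
reply = co.chat(model="command-r", message="Draft a one-line product tagline.")
print(reply.text)

# Embed: vector representations for search and clustering.
emb = co.embed(
    texts=["Cohere offers enterprise LLMs."],
    model="embed-english-v3.0",
    input_type="search_document",
)
print(len(emb.embeddings[0]))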
Ernie
Baidu’s Ernie powers its Ernie chatbot, which opened to the public in August 2023 and quickly attracted over 45 million users. The latest version, Ernie 4.0, unveiled in October 2023, is rumored to have 10 trillion parameters. Ernie excels in Mandarin and has capabilities in other languages.
Falcon 40B
Developed by the Technology Innovation Institute, Falcon 40B is a transformer-based, causal decoder-only model available in open source. It was trained primarily on English-language data and offers smaller variants: Falcon 1B and Falcon 7B. Falcon 40B is freely available through Amazon SageMaker and GitHub.
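Because the weights are openly published on the Hugging Face hub, a local sketch along these lines is possible with the transformers library (the smaller Falcon 7B is shown, since the 40B model needs substantial GPU memory):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # swap in "tiiuae/falcon-40b" given enough GPU memory
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" spreads the model across available devices (needs the accelerate package).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Falcon is a causal decoder-only model that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))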
Gemini
Google’s Gemini family of LLMs powers its chatbot, rebranded from Bard to Gemini. These multimodal models can process text, images, audio, and video. Available in three sizes (Ultra, Pro, and Nano), Gemini outperforms GPT-4 on various benchmarks according to Google’s published results, and it integrates tightly with Google products.
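Gemini models are also exposed to developers through Google’s generative AI Python SDK. A minimal text-generation sketch looks roughly like this (the google-generativeai package and an API key from Google AI Studio are assumptions beyond the text above):

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder key

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Explain multimodality in two sentences.")
print(response.text)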
Gemma
Gemma, another Google initiative, consists of open models built from the same research and technology as Gemini. It comes in two sizes, a 2 billion and a 7 billion parameter model, both of which can run locally and, per Google’s reported benchmarks, outperform similarly sized Llama 2 models.
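Running the smaller checkpoint locally is straightforward with the Hugging Face transformers library (a sketch under the assumption that you have accepted Google’s license for the gated weights on the hub):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # the larger variant is "google/gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open models are useful because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))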
GPT-3
Released in 2020 by OpenAI, GPT-3 is a transformer model with 175 billion parameters. It utilizes a decoder-only architecture and was trained on extensive datasets such as Common Crawl and Wikipedia. Microsoft secured exclusive use of its underlying model in September 2020.
GPT-3.5
An enhanced version of GPT-3, GPT-3.5 was fine-tuned using reinforcement learning from human feedback. This version powers ChatGPT, with GPT-3.5 Turbo as its most capable variant. Its training data extends to September 2021, and it was briefly integrated into Bing before being succeeded by GPT-4.
GPT-4
The largest model in OpenAI’s GPT series, GPT-4 was released in 2023 and features an undisclosed number of parameters, with outside estimates running into the trillions. This multimodal model can process both text and images and allows users to specify tone and task behavior. GPT-4 showcases human-level performance in various academic evaluations and powers Microsoft Bing and ChatGPT Plus.
GPT-4o
GPT-4 Omni (GPT-4o) is the successor to GPT-4, featuring improvements for more natural interactions and multimodal capabilities. It processes inputs such as audio and images, allowing real-time engagement and emotional responsiveness. Able to respond to audio in as little as 232 milliseconds, roughly the pace of human conversation, GPT-4o is faster than its predecessors and is available to developers and customers.
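Developer access runs through the OpenAI API. A minimal chat sketch with the official Python SDK might look like this (assuming an OPENAI_API_KEY set in the environment):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What does 'multimodal' mean for an LLM?"},
    ],
)
print(completion.choices[0].message.content)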
LaMDA
Announced by Google Brain in 2021, LaMDA (Language Model for Dialogue Applications) uses a decoder-only transformer architecture and grew out of Google’s earlier Seq2Seq-based dialogue research. It gained public attention when a Google engineer claimed it demonstrated sentience, a claim Google rejected. LaMDA focuses on dialogue applications.
Llama
Meta’s Llama was released in 2023 in sizes of up to 65 billion parameters. Initially restricted to approved researchers, its weights are now openly available, and smaller variants facilitate broader use and experimentation. Llama was trained on diverse public data sources and has inspired several offshoot models.
Mistral
With 7 billion parameters, Mistral outperforms similarly sized Llama models across benchmarks. It includes a fine-tuned instruct variant designed for better instruction adherence, and, being released under the Apache 2.0 license, it is well suited to self-hosting and business applications.
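Since the weights are permissively licensed, self-hosting can be sketched like so with the transformers library (the instruct variant and its chat template are shown; the exact hub model name is an assumption for illustration):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The instruct variant expects a chat-style prompt, built via its chat template.
messages = [{"role": "user", "content": "List two uses of a self-hosted LLM."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))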
Orca
Developed by Microsoft, Orca features 13 billion parameters and is compact enough to run on a laptop. It was trained to imitate the step-by-step reasoning of larger LLMs, with Microsoft reporting performance on par with GPT-3.5 on several reasoning benchmarks despite its far smaller size.
PaLM
The Pathways Language Model (PaLM) is a 540 billion parameter model from Google that previously powered its Bard chatbot. It specializes in reasoning tasks such as coding and question answering and was trained across multiple TPU Pods. PaLM is part of a broader initiative to create versatile models applicable in various domains, with specialized versions such as Med-PaLM for healthcare.
Phi-1
Phi-1 is a 1.3 billion parameter transformer model from Microsoft, trained on carefully filtered, textbook-quality data. It exemplifies the trend toward smaller models trained on higher-quality data and specializes in Python coding, though its small size limits its general capabilities.
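Given its Python focus, a code-completion prompt is the natural way to try it. Here is a minimal sketch via the transformers library (assuming a recent transformers version that includes the Phi architecture):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Phi-1 was tuned for Python, so code-completion prompts suit it best.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))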
StableLM
Developed by Stability AI, the StableLM series includes open-source models initially released at 3 billion and 7 billion parameters, with larger models of up to 175 billion parameters planned. StableLM prioritizes transparency and accessibility.
Vicuna 33B
Derived from Llama, Vicuna was developed by LMSYS and fine-tuned on user-shared conversations from sharegpt.com. Although it has only 33 billion parameters and is less capable than GPT-4, it performs well for its size.