Open-source 6B-parameter JAX Transformer rivaling GPT-3 Curie.
GPT-J-6B is an open-source, 6-billion-parameter autoregressive language model implemented in JAX with the Mesh Transformer JAX codebase. Aimed at researchers, developers, and enthusiasts, it performs comparably to GPT-3 Curie (6.7B) on a range of downstream tasks. The model also serves as a working example of scalable model parallelism using xmap in JAX, and it can be tried through a Colab notebook or a web demo.
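To make the "6-billion-parameter" figure concrete, here is a rough back-of-the-envelope count. The hyperparameters below (28 layers, model width 4096, feed-forward width 16384, padded vocabulary of 50400) come from the publicly released model configuration, not from this listing, and the bias placement is an assumption about the layout (no biases on the attention projections, biases on the feed-forward layers and output head, untied input and output embeddings). Treat it as a sketch rather than an authoritative breakdown.

```python
def gptj_param_count(n_layers=28, d_model=4096, d_ff=16384, n_vocab=50400):
    """Estimate the parameter count of a GPT-J-style decoder.

    Assumed layout (not stated in the listing): one LayerNorm per block,
    bias-free attention projections, biased feed-forward layers,
    and an untied, biased output head.
    """
    embedding = n_vocab * d_model                # token embedding matrix
    attention = 4 * d_model * d_model            # q, k, v, and output projections
    feed_forward = 2 * d_model * d_ff + d_ff + d_model  # two dense layers plus biases
    layer_norm = 2 * d_model                     # scale and shift
    per_layer = attention + feed_forward + layer_norm
    final_norm = 2 * d_model
    output_head = d_model * n_vocab + n_vocab    # projection back to the vocabulary
    return embedding + n_layers * per_layer + final_norm + output_head


if __name__ == "__main__":
    total = gptj_param_count()
    print(f"{total:,} parameters (about {total / 1e9:.2f} billion)")
```

Under these assumptions the estimate comes to roughly 6.05 billion, with the 28 transformer blocks accounting for most of it and the two vocabulary-sized matrices for the rest.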