Recently in AI | April 2024

This month's roundup covers new model releases across language, voice, music, code, and multimodal AI, plus Stanford HAI's annual AI Index Report.

AI21 announced Jamba, an SSM-Transformer model that combines Mamba's structured state space model (SSM) with the Transformer architecture. With a 256K context window and 3x the throughput of Mixtral 8x7B on long contexts, Jamba is aimed at efficiency and accessibility. According to AI21, it is the only model in its class to fit up to 140K context on a single GPU, and it is released with open weights under Apache 2.0. Jamba is available on Hugging Face and is coming soon to the NVIDIA API catalog.

Source: https://www.ai21.com/blog/announcing-jamba

Hume introduced EVI, the Empathic Voice Interface, a conversational AI equipped with emotional intelligence. EVI utilizes vocal tones to understand user cues, predict preferences, and refine responses over time for enhanced satisfaction. Unlike traditional AI voice products, EVI aims for immersive conversations by emulating natural speech patterns and human-like conversational flow. Integrating large language models with expression measures, EVI dynamically adjusts its vocabulary and tone based on context and user emotional expressions. With features like end-of-turn detection, interruptibility, and responsiveness to expression, EVI sets a new standard for engaging voice-first experiences.

Demo: https://demo.hume.ai/

Mind-blowing developments in AI music generation:

  • Stability AI has unveiled Stable Audio 2.0, which generates high-quality, full tracks with coherent musical structure, up to three minutes long at 44.1 kHz stereo, from a single natural language prompt. Features include full-length track generation, audio-to-audio transformation, variations and sound effects creation, and style transfer.
  • Suno released v3, which it describes as capable of producing radio-quality music. Now accessible to all users on Suno's platform, v3 creates full, two-minute songs in a matter of seconds. It brings better audio quality, a wider range of styles and genres, and improved prompt adherence.
  • Udio offers a range of features to enhance user experience. From remixing and extending songs to downloading audio/video creations, Udio provides flexibility and creative control. With options for custom, instrumental, and auto-generated tracks, users can experiment with different styles and sounds. Advanced features like detailed remixing options underscore Udio's commitment to offering a comprehensive music creation experience.

Google has unveiled CodeGemma, a suite of large language models tailored for code generation and understanding. The suite includes CodeGemma 2B for fast code completion and CodeGemma 7B for broader tasks, including instruction following, with the aim of improving logical reasoning in coding. Notably, CodeGemma 7B performs strongly on benchmarks such as HumanEval and MultiPL-E. The release is intended to make developers more productive with coding assistance tools.

Hugging Face page: https://huggingface.co/blog/codegemma

Single-line and multi-line code completion capability of CodeGemma compared to other FIM-aware code models. Source: https://storage.googleapis.com/deepmind-media/gemma/codegemma_report.pdf
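
"FIM-aware" in the caption above refers to fill-in-the-middle completion: the model is given the code before and after a gap and asked to generate what belongs in between. A minimal sketch of how such a prompt might be assembled is shown here. The sentinel strings are an assumption taken from the CodeGemma model card as I understand it, and `build_fim_prompt` is a hypothetical helper, not part of any library; check the model card before relying on the exact tokens.

```python
# Sentinel strings are an ASSUMPTION (believed to match the CodeGemma
# model card); verify them against the tokenizer you actually load.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: wrap the code before and after the gap in
    sentinels so the model generates the missing middle part."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# The gap sits right after "return "; the model would be expected to
# continue the prompt with the missing expression.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
print(prompt)
```

The resulting string would be passed to the model as an ordinary text prompt; whatever the model generates after the final sentinel is the proposed infill.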

Mistral AI released Mixtral 8x22B, a Sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, which makes it cost-efficient for its size. It is fluent in multiple languages, strong at mathematics and coding, and has a large context window for precise information recall. It is released under the Apache 2.0 license, in keeping with Mistral's commitment to openness. Mistral positions Mixtral 8x22B as setting a new standard for open models on performance-to-cost ratio.

Source: https://mistral.ai/news/mixtral-8x22b/
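
To make "39B active parameters out of 141B" concrete, here is a rough sketch of the idea behind sparse expert routing. A router scores every expert for each token, only the top-scoring few are run, and their outputs are mixed by normalized weights. The routing scheme (2 experts chosen out of 8) and the example logits are illustrative assumptions and are not stated in the announcement summarized above; only the 39B and 141B figures come from it.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(active_params, total_params):
    """Share of the model's parameters used for any single token."""
    return active_params / total_params


# Made-up router scores for one token over 8 hypothetical experts.
logits = [0.1, 2.0, -1.0, 1.0, 0.5, 0.0, -0.3, 0.7]
routes = top_k_route(logits, k=2)
print(routes)  # experts 1 and 3 are selected

# Figures from the announcement: 39B active out of 141B total.
print(f"{active_fraction(39e9, 141e9):.1%} of parameters active per token")
```

Because only the selected experts run, the per-token compute tracks the active parameter count rather than the total, which is the basis of the cost-efficiency claim.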

xAI introduced Grok-1.5V, its first-generation multimodal model. In addition to its text capabilities, Grok can now process a wide range of visual information, including documents, diagrams, charts, screenshots, and photographs. xAI positions Grok-1.5V among frontier multimodal models, citing its real-world spatial understanding as shown by its performance on the RealWorldQA benchmark.

Source: https://x.ai/blog/grok-1.5v

Stanford HAI's 2024 Artificial Intelligence Index Report is out, highlighting pivotal advancements in AI's societal influence. This edition explores critical trends, from technical innovations to public perceptions and geopolitical dynamics. Packed with original data, the report offers fresh insights into AI training costs, responsible AI practices, and AI's impact on science and medicine. Key findings include that AI now exceeds human performance on certain tasks, that industry dominates frontier research, and that training state-of-the-art models is very costly. The report emphasizes the US's leadership in AI innovation and the need for standardized evaluations. It also addresses growing public concern about AI's impact while highlighting its effects on labor and scientific progress. The AI Index Report is intended as a resource for policymakers and researchers who want a comprehensive view of the AI landscape and its implications for society.

Source: https://aiindex.stanford.edu/wp-content/uploads/2024/04/HAI_AI-Index-Report-2024.pdf