Show HN: Chonky – a neural text semantic chunking goes multilingual

Source: Hacker News

TL;DR (AI generated)

Chonky, a neural tool for semantic text chunking, now supports multiple languages, so the same technology can be used to split text into chunks across languages. The tool, developed by bookcorpus/bookcorpus, aims to improve text processing and comprehension across different linguistic contexts. The update was released on May 3, 2024, with enhancements to support 5.44k languages and 334 unique features.

Similar Articles

OpenAI releases GPT-5.2 after “code red” Google threat alert

OpenAI has launched GPT-5.2, a new series of AI models for ChatGPT, in response to competitive pressure from Google's Gemini 3 AI model. The release includes three versions: Instant, Thinking, and Pro, each tailored for different tasks. GPT-5.2 boasts a 400,000-token context window and a knowledge cutoff date of August 31, 2025, enhancing its processing capabilities. The new model is available to paid ChatGPT subscribers and developers via API, with pricing at $1.75 per million input tokens. OpenAI's strategic shift follows Google's recent advancements in AI technology, prompting a renewed focus on improving ChatGPT's performance to maintain its market position.

Ars Technica
Syntax hacking: Researchers discover sentence structure can bypass AI safety rules

Researchers from MIT, Northeastern University, and Meta found that large language models like ChatGPT may prioritize sentence structure over meaning when answering questions, potentially leading to AI safety issues. The team, led by Chantal Shaib and Vinith M. Suriyakumar, tested this by prompting models with nonsensical but grammatically correct questions, showing that models can rely on structural shortcuts over semantic understanding. This reliance on syntactic patterns can override actual meaning in certain cases. The researchers plan to present their findings at NeurIPS, highlighting the importance of understanding how AI models process instructions.

Ars Technica
Forget AGI—Sam Altman celebrates ChatGPT finally following em dash formatting rules

Sam Altman, the CEO of OpenAI, recently celebrated a minor victory with ChatGPT finally adhering to custom instructions to avoid using em dashes. This development, following the release of OpenAI's GPT-5.1 AI model, sparked mixed reactions from users who have struggled with formatting preferences. The delay in achieving this simple punctuation rule raises questions about the progress towards artificial general intelligence (AGI) and the level of control over AI systems. Despite Altman's discussions about AGI and superintelligence, the challenges with punctuation control highlight the ongoing limitations in current AI capabilities.

Ars Technica
OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities

OpenAI has launched GPT-5.1 Instant and GPT-5.1 Thinking, updated versions of its AI models in ChatGPT, featuring new personalities and improved performance on technical benchmarks. The company aims to address previous criticisms by offering preset communication styles like Professional, Friendly, and Cynical. GPT-5.1 Instant focuses on faster responses, while GPT-5.1 Thinking tackles complex problem-solving tasks with adaptive reasoning. OpenAI plans a gradual rollout to subscribers before expanding to free users and integrating the models into its API.

Ars Technica
