A new AI coding challenge has revealed its first winner, and it has set a new bar for AI software engineers.
On Wednesday at 17:00 PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win itself was his final score: he won with correct answers to just 7.5% of the questions on the test.
“We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller, open models. It levels the playing field.”
Konwinski has pledged $1 million to the first open source model that can score higher than 90% on the test.
Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a measure of how well models can handle real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For the first round, model submissions were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier “Verified” test and 34% on its harder “Full” test. Konwinski is still not sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.
“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing every few months.”
It might seem like a strange place to fall short, given the wide range of AI coding tools already publicly available. But with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.
“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the leaderboard with a human in the loop.”
For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10%, that’s the reality check for me.”