Oleg Ekhlakov

Intaro Senior Developer Oleg Ekhlakov on Why Developer Experience Tools Must Survive Their Own Medicine

The engineer who builds web platforms for one of Russia’s leading digital agencies spent 72 hours testing nine DX diagnostic tools against real codebases — and found that the gap between a polished demo and a trustworthy scanner is where most developer tools quietly die.

Every engineering team has a version of the same conversation. Someone proposes a new tool — a linter, a CI scanner, a documentation checker — and the first question from the senior engineer in the room is not “what does it do?” but “have you actually run it on our codebase?” The distinction matters more than it seems. A tool that produces impressive output on a curated demo repository and a tool that produces reliable output on a production codebase with eight years of accumulated decisions are fundamentally different products.

Oleg Ekhlakov has been that senior engineer in enough rooms to know the pattern well. As a Senior Developer at Intaro, a major Russian digital agency specializing in e-commerce and enterprise web platforms, Ekhlakov builds and maintains systems where developer experience is not an abstraction — it is the difference between a team that ships weekly and one that spends half its sprint fighting its own toolchain.

DX-Ray Hackathon 2026, organized by Hackathon Raptors, challenged teams to build diagnostic tools that expose hidden friction in developer workflows. Thirty-eight teams spent 72 hours building scanners, analyzers, and dashboards targeting everything from flaky CI pipelines to stale documentation to PR review bottlenecks. Ekhlakov evaluated nine of those submissions — and did something that not every judge does: he actually ran them.

“I cloned every project that claimed to work,” Ekhlakov explains. “I pointed the scanners at real repositories I maintain professionally. If a tool claims to diagnose developer experience problems, the minimum bar is that it actually works when a developer tries to use it.”

The Trust Gap in DX Tooling

The strongest submission in Ekhlakov’s batch was Ram — a repository health scanner built by a solo developer that earned the highest scores across nearly every criterion. What separated it from the competition was not feature count but a single design decision: the tool works without requiring authentication. A developer can paste a public GitHub URL and get a meaningful diagnostic report in seconds, with no OAuth flow, no token generation, no permissions dialog.

“This is a masterclass in reducing friction,” Ekhlakov notes. “Most of the other diagnostic tools in this batch required GitHub tokens, environment variables, manual configuration. Ram understood that the first thing a DX tool must diagnose is its own onboarding experience. If a developer experience scanner creates a bad developer experience during setup, you have already lost.”

Ram’s analysis covers bus factor calculations, knowledge silo detection, and contribution pattern mapping — signals that reveal organizational health rather than just code quality. Ekhlakov found the output genuinely useful on repositories he works with professionally. “The bus factor metric is particularly interesting because it exposes a problem that no linter can catch. You can have perfect code coverage, zero security vulnerabilities, and still be one resignation away from losing all institutional knowledge about a critical subsystem.”
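The article does not describe how Ram computes its bus factor, so the following is only a minimal sketch under one common definition: the smallest number of contributors who together account for more than half of the commits to a subsystem. The function name `bus_factor`, the 50% threshold, and the sample commit logs are all hypothetical.

```python
from collections import Counter


def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of authors whose commits exceed `threshold` of the total.

    Assumed definition for illustration only; real tools may weigh lines
    changed, recency, or file ownership instead of raw commit counts.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk authors from most to least active until the threshold is crossed.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total > threshold:
            return rank
    return len(counts)


# Hypothetical commit log for one subsystem: a single author dominates.
billing_commits = ["alice"] * 8 + ["bob"] * 1 + ["carol"] * 1
# Hypothetical commit log for a subsystem with evenly shared ownership.
search_commits = ["alice"] * 3 + ["bob"] * 3 + ["carol"] * 3 + ["dave"] * 3

print(bus_factor(billing_commits))  # 1
print(bus_factor(search_commits))   # 3
```

A result of 1 is the "one resignation away" case from the quote: test coverage and lint results say nothing about it, because the signal lives in the commit history rather than in the code.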

The project earned 4.55 out of 5.00 in Ekhlakov’s evaluation — not because it was the most technically ambitious, but because every feature it shipped actually worked as advertised.

When Heuristics Betray Their Creators

At the opposite end of the trust spectrum, Ekhlakov encountered a pattern he sees regularly in production tool evaluation: analysis engines that look impressive but do not survive contact with real data.

TeamHM built a PR review scanner that classifies changes by cognitive complexity and domain, producing detailed reports with severity scores and categorization. The concept is strong — understanding whether a 200-line diff contains routine refactoring or a subtle authentication change is genuinely valuable for review prioritization.

“I tested the scanner on a real project and saw a twelve-line change classified as ‘tests’ with an extremely high cognitive score,” Ekhlakov observes. “The implementation analyzes the full file contents rather than the changed diff, and some domain keywords are matched as simple substrings. The output looks impressive, but it is not reliable enough for real workflow adoption.”
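The two failure modes Ekhlakov describes can be reproduced in a few lines. The sketch below is not TeamHM's code; the keyword table, function names, and sample diff are hypothetical. It contrasts substring matching with whole-word matching, and it restricts the analysis to the lines a diff actually adds.

```python
import re

# Hypothetical keyword-to-domain table, for illustration only.
DOMAIN_KEYWORDS = {"auth": "authentication", "test": "tests"}


def classify_substring(text):
    """Naive approach: a keyword counts if it appears anywhere, even inside a word."""
    lowered = text.lower()
    return sorted({d for kw, d in DOMAIN_KEYWORDS.items() if kw in lowered})


def classify_tokens(text):
    """Stricter approach: a keyword counts only when it is a whole word."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted({d for kw, d in DOMAIN_KEYWORDS.items() if kw in tokens})


def added_lines(unified_diff):
    """Return only the lines a change adds, skipping the '+++' file header."""
    return [
        line[1:]
        for line in unified_diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


# Hypothetical one-line change to a blog template helper.
diff = """--- a/blog/post.py
+++ b/blog/post.py
@@ -1,3 +1,3 @@
 def render(post):
-    return post.title
+    return f"{post.title} by {post.author}, latest edit {post.updated}"
"""

changed = "\n".join(added_lines(diff))
print(classify_substring(changed))  # ['authentication', 'tests']
print(classify_tokens(changed))     # []
```

Substring matching labels the change as both authentication and tests because "author" contains "auth" and "latest" contains "test". Whole-word matching is deliberately crude here (it would also miss "tests"), but it shows how much of the classification depends on the matching rule rather than on the change itself.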

This is a pattern that experienced engineers recognize immediately: the gap between a tool that produces output and a tool that produces trustworthy output. In Ekhlakov’s professional work at Intaro, where platform decisions affect multiple client projects simultaneously, adopting a tool with unreliable heuristics is worse than having no tool at all. False confidence in automated analysis leads teams to skip manual review steps they would otherwise perform.

“The product framing and UX are strong,” he acknowledges. “The next step is making the scoring engine trustworthy enough to match that presentation quality. That is always the harder part.”

The Ecosystem Trap

Dev Labs submitted a multi-category diagnostic platform combining CI pipeline analysis, documentation health checking, and build performance monitoring. The scope was ambitious — addressing three of the hackathon’s eight tracks in a single tool — and the product thinking was solid. Integrating multiple DX signals into a unified dashboard is the direction the industry is moving, with platforms like LinearB and Jellyfish attempting similar consolidation at enterprise scale.

But Ekhlakov found a fundamental limitation that undermined the tool’s reliability across different technology stacks. “Several analyzers are strongly biased toward the JavaScript and TypeScript ecosystem, but the product is presented as a general project scanner. In practice, this produces misleading recommendations on non-JS repositories.”

This is what Ekhlakov calls the ecosystem trap — a tool built by developers who work primarily in one stack unconsciously encoding that stack’s assumptions into their diagnostic logic. npm-centric dependency checks fail silently on Python projects. Node-specific CI patterns produce false positives on Go repositories. The tool does not crash; it produces confident-looking output that happens to be wrong.

“I would encourage them to either clearly declare the supported stacks and current limitations, or deepen multi-language support so the diagnosis is trustworthy across different tech ecosystems,” Ekhlakov advises. “Honesty about scope is a feature, not a weakness.”
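One way to act on that advice is to detect the repository's stack first and report anything the analyzers do not cover, rather than running JavaScript-oriented checks everywhere. The sketch below is an assumption about how such a guard could look, not Dev Labs' implementation; the manifest mapping, the `SUPPORTED` set, and the function names are hypothetical.

```python
# Manifest files are a widely used convention for guessing a project's stack.
# This mapping is an illustrative assumption, not an exhaustive list.
MANIFESTS = {
    "package.json": "javascript",
    "pyproject.toml": "python",
    "requirements.txt": "python",
    "go.mod": "go",
    "Cargo.toml": "rust",
}

# Hypothetical: the stacks this scanner's analyzers genuinely handle.
SUPPORTED = {"javascript"}


def detect_stacks(top_level_files):
    """Map the file names at the repository root to a set of stacks."""
    names = set(top_level_files)
    return {stack for manifest, stack in MANIFESTS.items() if manifest in names}


def scan_plan(top_level_files):
    """Decide what to analyze and what to report as out of scope."""
    stacks = detect_stacks(top_level_files)
    return {
        "analyze": sorted(stacks & SUPPORTED),
        "unsupported": sorted(stacks - SUPPORTED),
        "recognized": bool(stacks),
    }


print(scan_plan(["go.mod", "main.go", "README.md"]))
print(scan_plan(["package.json", "pyproject.toml"]))
```

For the Go repository the plan lists nothing to analyze and names `go` as unsupported, so the report can say "not covered yet" instead of emitting npm-flavored recommendations that look confident and are wrong.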

Visualization as Understanding

Github Office took an entirely different approach to developer experience diagnostics. Instead of producing reports and scores, the project transforms a repository into a navigable pixel-art office — files become rooms, directories become floors, and code health metrics manifest as visual properties of the space. A module with high cyclomatic complexity might appear cluttered; a well-tested component might look clean and organized.

“This is the most creative project in the batch,” Ekhlakov says. “Turning a repository into a zoomable pixel-art office is an incredible way to visualize the neighborhoods of a codebase.”

The approach has genuine pedagogical value. New team members joining a large codebase typically spend days or weeks building a mental map of where things live and how they relate. A spatial metaphor accelerates that mapping process by leveraging the human brain’s natural facility for navigating physical spaces — a cognitive shortcut that flat file trees and dependency graphs do not provide.

Ekhlakov flags the limitation honestly: “While it might not be a daily debugging tool, it is an unforgettable way to help a new hire visualize where things live.” The distinction is important. Not every developer tool needs to be a daily-use utility. Some tools serve their purpose by reframing a problem in a way that changes how a team thinks about their codebase, even if the tool itself is used only occasionally.

The project’s diagnostic layer, however, showed the same pattern Ekhlakov observed elsewhere in the batch — heuristic analysis that produces interesting but not always defensible conclusions. “Some of the current insights appear to rely on rough heuristics, so the next step would be to make the recommendations more precise and more actionable for real engineering teams.”

The Axiom of Execution-Based Testing

Axiom — submitted under the name Phantom DX — distinguished itself through a methodology that Ekhlakov considers the gold standard for onboarding diagnostics: instead of analyzing configuration files and README instructions statically, the tool actually clones the repository and attempts to build it.

“Most tools just guess if your project works,” Ekhlakov explains. “Phantom DX actually clones it and tries to build it. That hard truth approach is exactly what a DX lead needs.”

The difference between static analysis and execution-based testing mirrors a broader tension in software quality. Static analysis can catch syntax errors, dependency declarations, and configuration mismatches. Execution-based testing catches the failures that only manifest when code actually runs — missing environment variables, incompatible transitive dependencies, build scripts that assume specific operating system features.
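The contrast can be sketched with a small runner that executes a build step and records what actually happened. This is not Phantom DX's implementation, which the article does not detail; `try_step`, the step names, and the environment variable `DX_DEMO_MISSING_VAR` are hypothetical. The demo commands use the current Python interpreter as a stand-in for a real build command.

```python
import subprocess
import sys


def try_step(name, command, cwd=None, timeout=300):
    """Run one build step and report the outcome instead of predicting it."""
    try:
        proc = subprocess.run(
            command, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        return {"step": name, "ok": False, "detail": f"command not found: {command[0]}"}
    except subprocess.TimeoutExpired:
        return {"step": name, "ok": False, "detail": f"timed out after {timeout}s"}
    # Keep the last line of output as a short diagnostic, preferring stderr.
    tail = (proc.stderr or proc.stdout).strip().splitlines()[-1:]
    return {"step": name, "ok": proc.returncode == 0, "detail": tail[0] if tail else ""}


# Stand-in for a build that succeeds.
ok_step = try_step("build", [sys.executable, "-c", "print('built')"])

# Stand-in for a build script that assumes an environment variable exists.
# A static read of the config would not reveal this; running it does.
bad_step = try_step(
    "build",
    [sys.executable, "-c", "import os; os.environ['DX_DEMO_MISSING_VAR']"],
)

print(ok_step)
print(bad_step)
```

A real tool would run this inside a fresh clone and an isolated environment, since executing an unknown repository's build scripts is a security concern that a static analyzer never has to face.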

Phantom DX also includes a score simulation feature — a “what if” tool that shows how specific improvements would affect the project’s overall DX score. Ekhlakov sees commercial potential in this feature specifically: “The Score Simulation feature is a great sales tool for a developer to show their manager exactly how much better the project could be with a few fixes.”

The project earned strong marks despite one notable gap: the shipped version had frontend dependency issues, lint errors, and no automated backend tests. Ekhlakov’s evaluation captures the irony directly — a developer experience diagnostic tool with developer experience problems in its own codebase. “The product vision is strong, but the implementation would benefit from one more pass focused on installability, validation, and polish so that the real experience matches the excellent demo.”

RepoXray and the Integration Challenge

RepoXray combined CI/CD analytics, bottleneck detection, flaky test investigation, and documentation health into a single platform. The project demonstrated strong product thinking — consolidating signals that developers typically check across multiple tools into a unified view.

“The strongest part is the product thinking and UX,” Ekhlakov observes. “The interface is polished, the main views are easy to understand, and the insights are framed in a way that feels actionable.”

What held the project back was the gap between ambition and operational polish. Local setup required manual fixes, several flows depended on external tokens without clear documentation, and some engineering quality issues remained visible in the codebase. This is a common pattern in hackathon projects that attempt broad scope — the conceptual architecture is sound, but the 72-hour constraint means that integration points between subsystems receive less attention than the subsystems themselves.

For Ekhlakov, who integrates third-party tools into client platforms at Intaro, integration reliability is not a nice-to-have — it is the primary criterion. “With stronger onboarding, clearer integration guidance, and more implementation hardening, this could become a genuinely useful developer tool beyond the hackathon.”

The Diagnostic Paradox

Across nine evaluations, Ekhlakov identified a paradox that runs through the entire DX tooling space: the tools built to improve developer experience often suffer from the same problems they diagnose. Scanners with poor documentation. Analyzers that fail on first install. Diagnostic dashboards that require twenty minutes of configuration before producing their first result.

“I started every evaluation by trying to use the tool the way a developer would,” he explains. “Clone, install, run. If I hit friction in those three steps, the tool has already failed its own thesis. You cannot credibly diagnose onboarding problems if your own onboarding is broken.”

The projects that scored highest in his batch — Ram, Phantom DX, RepoXray — shared a common trait: they reduced time-to-first-result to under a minute. Ram achieved this through zero-auth scanning. Phantom DX achieved it through execution-based testing that requires only a repository URL. RepoXray achieved it through a polished demo environment that let evaluators see the platform in action before committing to local installation.

The projects that scored lowest shared a different common trait: their shipped experience did not match their presentation claims. Demo videos showed features that were not fully implemented. README instructions led to build failures. Configuration worked on the developer’s machine but not on a fresh installation.

“The gap between demo and reality is the single most important metric in DX tooling,” Ekhlakov concludes. “Every engineering team has been burned by a tool that looked perfect in the sales demo and then required three weeks of configuration to match what was shown in a five-minute video. The hackathon projects that understood this — that treated their own installability and reliability as their first diagnostic test — are the ones that could actually become products.”

The pattern holds beyond hackathons. In his work at Intaro, where technology choices affect the velocity of entire client portfolios, Ekhlakov applies the same test to every tool evaluation: does it survive contact with a real codebase, maintained by a real team, with real accumulated complexity? The DX-Ray submissions that passed that test earned their scores. The ones that did not revealed exactly the kind of invisible friction the hackathon was designed to expose.

DX-Ray Hackathon 2026 was organized by Hackathon Raptors, a Community Interest Company (CIC #15557917) supporting innovation in software development. The event featured thirty-eight teams competing over 72 hours to build diagnostic tools that expose hidden friction in developer workflows. Oleg Ekhlakov served as a judge, evaluating projects across five weighted criteria: Problem Diagnosis (25%), Solution Impact (25%), Technical Execution (20%), User Experience (15%), and Presentation & Demo (15%).
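For readers curious how five weighted criteria combine into a single figure on a five-point scale, here is a minimal sketch. It assumes the final score is a plain weighted average, which the article does not state, and the per-criterion scores in `example` are hypothetical and not taken from any submission.

```python
# Weights as listed for the five judging criteria.
WEIGHTS = {
    "Problem Diagnosis": 0.25,
    "Solution Impact": 0.25,
    "Technical Execution": 0.20,
    "User Experience": 0.15,
    "Presentation & Demo": 0.15,
}


def weighted_score(scores):
    """Weighted average of per-criterion scores (assumed aggregation rule)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five criteria")
    return round(sum(WEIGHTS[c] * scores[c] for c in WEIGHTS), 2)


# Hypothetical per-criterion scores on a 0 to 5 scale.
example = {
    "Problem Diagnosis": 5,
    "Solution Impact": 4,
    "Technical Execution": 4,
    "User Experience": 5,
    "Presentation & Demo": 3,
}

print(weighted_score(example))  # 4.25
```

Under these weights the diagnosis and impact criteria together carry half of the result, so a weak demo costs less than a weak problem statement.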
