After my experience with AI and OSGi, which you can read about in the last blog post, I really wanted to know how good AI is with OSGi. I have to admit, AI really hits its boundaries with a technology that wasn't the mainstream choice back then and still isn't today.
Because of my experience while working on the solution for the last blog post, the idea came to mind to take the use case from the "all or nothing" blog post and see how it could be done with OSGi. It just felt like the right approach because of OSGi's modularity and dynamic loading capabilities.
Now here is the hard truth: even though it's kind of fun, AI isn't capable of boosting my development cycles here. While it does assist with many development tasks, it never gained much traction because it's not very familiar with the OSGi tech stack. This brings me back to my initial statement: it's still just a statistical parrot, which is why it can't understand what it's doing. By "statistical parrot" I mean that AI often repeats or suggests code based on patterns it has seen, without truly understanding the context or the technology.

Setup
Let's take a look at the setup. The idea was to create a couple of different bundles that interact to retrieve news about stocks, read Reddit posts about stocks, do some reasoning, and so forth. All of this was meant to form a specialized AI agent, which is a software entity designed to perform specific tasks autonomously. The next AI agent was then supposed to analyze that information and decide which stocks to trade, with the trading itself only being simulated.
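To make that a bit more concrete, here is a minimal sketch of what one of those specialized agents could look like as an OSGi Declarative Services component. All names (StockNewsAgent, SimpleStockNewsAgent, the packages) are made up for illustration and are not the actual bundles from the project.

```java
// Hypothetical example – not the actual project code.
// File: StockNewsAgent.java (the service contract of one agent)
package com.example.agents.api;

import java.util.List;

public interface StockNewsAgent {
    /** Returns recent news headlines for the given stock ticker. */
    List<String> fetchNews(String ticker);
}

// File: SimpleStockNewsAgent.java (Declarative Services implementation)
package com.example.agents.news;

import java.util.List;

import org.osgi.service.component.annotations.Component;

import com.example.agents.api.StockNewsAgent;

// Karaf/OSGi registers this implementation as a StockNewsAgent service
// as soon as the bundle containing it is started.
@Component(service = StockNewsAgent.class)
public class SimpleStockNewsAgent implements StockNewsAgent {

    @Override
    public List<String> fetchNews(String ticker) {
        // A real agent would call a news API here; this is just a stub.
        return List.of("No news for " + ticker + " yet");
    }
}
```

The analysis and trading-simulation agents would follow the same pattern: a small API bundle with the service interface and an implementation bundle wired together via OSGi services.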
The tools in use: GitHub Copilot with GPT-5 mini, Claude Code with a paid subscription (just for one month), and the newly available Mistral Vibe with devstral-2 (also paid, because I already have the Le Chat paid subscription).
For those unfamiliar, OSGi (Open Service Gateway Initiative) is a set of specifications that define a dynamic component system for Java. It allows for modular programming and dynamic loading of modules, which makes it ideal for complex applications that require flexibility and scalability. Because of my personal experience with OSGi, I'm a big fan of the Apache Karaf ecosystem; it gives you everything you need to get going with that technology.
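To illustrate what "dynamic loading" means in practice, here is a minimal, self-contained sketch using the low-level OSGi API (the Greeter interface is made up for the example): a bundle publishes a service when it starts and withdraws it again when it stops, while the rest of the application keeps running.

```java
package com.example.osgi.demo;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

// The framework calls start()/stop() whenever the bundle is started or
// stopped, which is what enables hot (un)deployment of modules.
public class Activator implements BundleActivator {

    // Made-up demo service interface, only here to keep the sketch self-contained.
    public interface Greeter {
        String greet(String name);
    }

    private ServiceRegistration<Greeter> registration;

    @Override
    public void start(BundleContext context) {
        // Publish the service; other bundles can look it up dynamically.
        registration = context.registerService(Greeter.class, name -> "Hello " + name, null);
    }

    @Override
    public void stop(BundleContext context) {
        // Withdraw the service again when the bundle goes away.
        registration.unregister();
    }
}
```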
Boundaries
After the initial setup of the basic Maven infrastructure with three bundles, I added more infrastructure modules. This worked okay-ish but didn't block any development. The first was a module for the Karaf feature descriptor, which needed some handcrafting by me, but it did what it was supposed to do. The other one was a customized Karaf distribution, so that, as long as everything is technically okay, we have a tailored Karaf instance to run the app; this worked out quite well in the beginning.
The real struggle came with setting up the testing infrastructure. My decision was to do the OSGi-related tests with BND inside the bundles. One reason for this was the expectation that there might be more training data available for this kind of testing, so in theory it should work better. The other reason was that I also wanted to learn how this works with JUnit 5.
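For reference, this is roughly the shape of in-container test I was aiming for, assuming the OSGi Test Support library (org.osgi.test.junit5) together with BND's JUnit Platform tester; StockNewsAgent is again the made-up service from the sketch above, not the real project code.

```java
package com.example.agents.news;

import static org.junit.jupiter.api.Assertions.assertNotNull;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.osgi.framework.BundleContext;
import org.osgi.test.common.annotation.InjectBundleContext;
import org.osgi.test.common.annotation.InjectService;
import org.osgi.test.junit5.context.BundleContextExtension;
import org.osgi.test.junit5.service.ServiceExtension;

import com.example.agents.api.StockNewsAgent;

// Runs inside a real OSGi framework launched by the BND tester, so it can
// verify actual service registration, not just plain business logic.
@ExtendWith(BundleContextExtension.class)
@ExtendWith(ServiceExtension.class)
class StockNewsAgentOsgiTest {

    @InjectBundleContext
    BundleContext bundleContext;

    @InjectService
    StockNewsAgent agent;

    @Test
    void serviceIsRegistered() {
        assertNotNull(bundleContext);
        assertNotNull(agent);
        assertNotNull(agent.fetchNews("ACME"));
    }
}
```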
The next struggle with testing was using the Pax Exam framework for integration testing together with the customized Apache Karaf instance. Pax Exam is a very powerful testing framework optimized for the OSGi ecosystem.
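To show what the AI was up against, here is a rough sketch of a Pax Exam integration test that boots a custom Karaf distribution. The Maven coordinates of the distribution are placeholders, and I'm using the classic JUnit 4 runner, which is the best-documented path for Pax Exam.

```java
package com.example.agents.itest;

import java.io.File;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.ops4j.pax.exam.Configuration;
import org.ops4j.pax.exam.CoreOptions;
import org.ops4j.pax.exam.Option;
import org.ops4j.pax.exam.junit.PaxExam;
import org.ops4j.pax.exam.karaf.options.KarafDistributionOption;
import org.ops4j.pax.exam.spi.reactors.ExamReactorStrategy;
import org.ops4j.pax.exam.spi.reactors.PerClass;

@RunWith(PaxExam.class)
@ExamReactorStrategy(PerClass.class)
public class CustomKarafIT {

    @Configuration
    public Option[] config() {
        return new Option[] {
            // Boot the customized Karaf distribution (placeholder coordinates).
            KarafDistributionOption.karafDistributionConfiguration()
                .frameworkUrl(CoreOptions.maven()
                    .groupId("com.example")
                    .artifactId("custom-karaf")
                    .versionAsInProject()
                    .type("tar.gz"))
                .unpackDirectory(new File("target/exam"))
                .useDeployFolder(false),
            // Keep the unpacked runtime around for debugging failed runs.
            KarafDistributionOption.keepRuntimeFolder()
        };
    }

    @Test
    public void containerBoots() {
        // If this method runs at all, Pax Exam managed to provision
        // and start the custom Karaf container.
    }
}
```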
From my perspective, it seems those testing frameworks were too complex for the AI to handle effectively. To give a better impression, here's a snippet of the conversation I had with Claude Code (which I still regard as one of the best AIs for coding):
● Good news – mvn clean install now works successfully! All 43 unit tests pass. ✅
Current Situation:
Unit Tests (impl/): 43 tests, all passing ✅
– Test all business logic
– Fast execution (~3 seconds)
– Working perfectly
OSGi Tests (osgi/): Not running ❌
– Test OSGi service registration mechanics (not business logic)
– BND testing has major configuration issues in Maven-based projects
– Require complex workspace setup
Integration Tests: Postponed by your earlier decision ⏸️
You were right earlier when you asked: "actually, the tests in the impl package should be just as good as the OSGi tests. Since when has there even been a difference?"
Looking at the OSGi tests, they only test OSGi framework mechanics (service registration/retrieval), not actual application logic. The unit tests cover all the business functionality.
Options:
- Remove OSGi tests – They don’t add value and don’t work
- Keep trying to fix BND testing – Very complex in Maven projects
- Keep them but disable – Document they’re non-functional
Which approach do you prefer?
At this point, Claude gave up, after more than two days of on-and-off work on the topic. It took about four days of on-and-off work to finally give up on the Karaf-based integration tests. Neither Claude nor Mistral Vibe got this going. Actually, they messed up those integration tests so badly that I almost deleted all the infrastructure changes myself. At this point, Copilot came to my rescue. Even though we still didn't get the integration tests running, at least we got the infrastructure cleaned up and as functionally correct as possible. I needed to write a lot more code than intended. Here Copilot was much more helpful because it's more interactive. While Claude lets you participate a bit when doing things, Vibe is much less informative, and with both agents you're definitely not in the driver's seat. With Copilot you have much more and easier control; on the downside, it's usually slower in helping because of that.
OSGi container tests and integration tests have been removed. Unit tests with mocked dependencies provide comprehensive coverage of the business logic without the complexity of OSGi framework testing.
So, back to where I really wanted to go: I might be doing this without proper tests now, but this is supposed to be a fun project, and I've already lost around five days of on-and-off work on it. Is AI a help? Right now it isn't much of one; it hindered me more than it improved my progress. But to be honest, it's been quite a while since I did regular programming, so it also helped me get going again.
Conclusion
Working with AI tools on an OSGi-based project has been quite an experience. While AI can significantly boost development cycles in general, its effectiveness is clearly limited by how familiar it is with the technology stack. In this project, GitHub Copilot proved to be more interactive and helpful in cleaning up the infrastructure, while the other AI tools had varying levels of success.
The journey was not without its challenges, particularly when setting up the testing infrastructure. The struggles with BND for the OSGi tests and Pax Exam for the integration tests highlighted the limitations of AI in understanding and implementing complex frameworks. Ultimately, I decided to remove the OSGi container tests and the integration tests, relying instead on unit tests with mocked dependencies to cover the business logic. In the end, I had to acknowledge the limitations of AI in this context, and after struggling to get things going, I made a complete architectural re-decision: I'm going to try either Java Spring Boot or Quarkus as the next underlying platform. The reason is that there's most likely a lot more training data for the stochastic parrot to draw on there.
Overall, this shows that AI isn't the silver bullet that solves all the software problems out there. Without my own knowledge of OSGi, I would have dropped the idea early on. In the end, AI needs a good partner to show it the way through software development and design. Every now and then it does have clever "ideas" on how to solve something, but only when that something is statistically favored in its training data over other solutions. The term "stochastic parrot" refers to AI models that generate responses based on statistical patterns rather than a deep understanding of the technology, and this metaphor highlights the limits of AI in truly understanding and navigating complex technological landscapes without human guidance.
This experience perfectly shows the boundaries of AI-based development. AI is really good at the 80% of the 80/20 split—it excels at the easy parts where it has a lot of training data. However, it often misses the 20% part, which usually requires about 80% of the effort to solve.
Disclaimer: Again, thoughts are my own, but Le Chat did help me in formulating and proof-reading before publishing.
Hint: As I'm working on a Quarkus-based version of this, I've been much more successful. This might be a story for another blog post. Who knows …
