Integrating AI

Welcome to the sixth part of this LSP series. This is perhaps the most exciting installment as we explore how to enhance our LSP server with AI capabilities. After covering the foundational aspects in previous parts, we’re now ready to integrate AI to provide even more intelligent suggestions in our code completion.

Previously in Our LSP Journey

For readers joining us in this part, here’s a quick recap of our journey so far:

Now in Part 6, we’ll focus on integrating AI to enhance our code completion features, while Part 7 will tackle optimizing performance to meet low-latency requirements.

📦 View the source code on GitHub – Explore the complete implementation. Leave a star if you like it!

Understanding the AI Integration Challenge

Before diving into the technicalities of implementation, it’s important to understand the pros and cons of integrating generative AI capabilities into our LSP server.

Advantages of AI Integration

Challenges to Address

ai integration challenges

Non-Deterministic Output: The inherent randomness makes testing difficult and can occasionally produce incorrect syntax
Deceptive Confidence: AI can provide convincing but incorrect suggestions, potentially introducing subtle bugs
Performance Costs: Even the fastest models introduce latency compared to rule-based systems
Unbounded Response Times: Remote API calls can have unpredictable response times
Prompt Engineering Complexity: Creating effective prompts requires significant refinement and domain knowledge

Mitigation Strategies

To address these challenges, we’ll implement several strategies:

1. Limiting AI’s Scope of Action

We’ll focus AI on a controlled use case: enhancing tabstops in code templates:

ai tabstops process,- Limits AI to enhancing predefined template structures rather than generating complete code - Preserves the overall syntax and structure of our suggestions - Forces user review of values through the tabstop mechanism

2. Performance and Timing Management

ai performance mitigations,- Implement timeouts to prevent excessive waiting - Fall back to rule-based suggestions when AI is too slow - (In Part 7) We'll explore precomputation to drastically reduce perceived latency

3. User-Driven Prompt Engineering

Rather than locking in our prompts, we’ll make them configurable, allowing users (who are often domain experts) to fine-tune how the AI understands SDLB configuration files.

Choosing the Right LLM

A critical decision is whether to use a local or remote LLM. While local models would reduce latency, they introduce hardware requirements we can’t guarantee. After testing various options, we selected Google’s Gemini 2.0 Flash Lite for its balance of performance and quality:

ai response times,- OpenAI's GPT-4o-mini: ~2.5s average response time - Google's Gemini 2.0 Flash Lite: ~1.5s average response time

Both models provided comparable quality for our use case, but Gemini’s faster response time made it the preferred choice.

Note: The AI landscape evolves rapidly. As of this writing (May 2025), these were the best options, but newer models may offer better performance.

Implementation: Setting Up the Infrastructure

Since the Java ecosystem lacked mature LLM integration libraries (like Python’s LangChain) as of March 2025, we need to build our own integration. Let’s start by adding the necessary dependencies:

<!-- https://mvnrepository.com/artifact/io.circe/circe-core -->
<dependency>
    <groupId>io.circe</groupId>
    <artifactId>circe-core_3</artifactId>
    <version>0.15.0-M1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/io.circe/circe-parser -->
<dependency>
    <groupId>io.circe</groupId>
    <artifactId>circe-parser_3</artifactId>
    <version>0.15.0-M1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/io.circe/circe-generic -->
<dependency>
    <groupId>io.circe</groupId>
    <artifactId>circe-generic_3</artifactId>
    <version>0.15.0-M1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/com.softwaremill.sttp.client3/core -->
<dependency>
    <groupId>com.softwaremill.sttp.client3</groupId>
    <artifactId>core_3</artifactId>
    <version>3.10.3</version>
</dependency>

<!-- https://mvnrepository.com/artifact/com.softwaremill.sttp.client3/circe -->
<dependency>
    <groupId>com.softwaremill.sttp.client3</groupId>
    <artifactId>circe_3</artifactId>
    <version>3.10.3</version>
</dependency>

We’re using:

Circe: For JSON parsing and serialization
STTP: For making asynchronous HTTP requests to the Gemini API

Building the Gemini Client

Let’s create our Gemini client to handle API interactions. The client consists of initialization logic and the core completion method:

GeminiClient.scala scala

class GeminiClient(apiKey: Option[String], defaultModel: String = "gemini-2.0-flash-lite")(using ExecutionContext) extends ModelClient with SDLBLogger:
  // Custom defined models
  import GeminiModels.*

  private val backend = HttpClientFutureBackend()
  private val baseUrl = "https://generativelanguage.googleapis.com/v1beta/models"
  trace(s"Gemini API Key: ${if apiKey.isDefined then "correctly set" else "not set"}")

  def isEnabled: Boolean = apiKey.isDefined

completeAsync.scala scala

def completeAsync(prompt: String, model: String = defaultModel): Future[String] =
    if !isEnabled then
      warn("Gemini API key is not set. Skipping request.")
      Future.successful("")
    else
      val request = GeminiRequest(
        contents = List(Content(parts = List(Part(text = prompt)))),
        generationConfig = Some(GenerationConfig(responseMimeType = Some("application/json")))
      )
      val startInferenceTime = System.currentTimeMillis()
      basicRequest
        .post(uri"$baseUrl/$model:generateContent?key=${apiKey.getOrElse("")}")
        .header("Content-Type", "application/json")
        .body(request)
        .response(asJson[GeminiResponse])
        .readTimeout(3.seconds)
        .send(backend)
        .map { response =>
          response.body match
            case Right(geminiResponse) =>
              debug(s"Gemini response time: ${System.currentTimeMillis() - startInferenceTime} ms")
              geminiResponse.candidates.headOption
                .flatMap(candidate => candidate.content.parts.headOption)
                .map(_.text)
                .getOrElse("")
            case Left(error) =>
              warn(s"API error: ${error.toString}")
              ""
        }.recover {
          case e: Exception =>
            warn(s"Request failed: ${e.getMessage}")
            ""
        }

There are two critical aspects in the client initialization:

We use an asynchronous HTTP backend to avoid freezing the editor while waiting for responses.
We include an isEnabled check to gracefully handle cases where no API key is provided

Note: this presented version will actually freezes the client because we’re breaking the asynchronous advantage by awaiting answers first. It will be however correctly handled when re-designing the system for respecting the low-latency constraints.

ai integration challenges,Key implementation details: - We skip the request entirely if no API key is provided - We monitor and log the response time for benchmarking - We apply a 3-second timeout to prevent excessive waiting - We use Scala's extension methods (e.g., 3.seconds) for cleaner code

Crafting the Prompt

The prompt is crucial for getting good results from the LLM. Here’s our initial prompt for enhancing tabstops:

You're helping a user with code completion in an IDE.
The user's current file is about a Smart Data Lake Builder configuration, where the "dataObjects" block provides all the data sources
and the "actions" block usually defines a transformation from one or more data sources to another.
Extract a list of suggested tab stops default values.
Tab stops have the following format in the default insert text: ${number:default_value}.
Use the context text to suggest better default values.
Concerning the title of the object, try to infer the intention of the user.
For example, copying from the web to a json could be interpreted as a download action.
Output should be valid JSON with this schema:
[
  {
    "tab_stop_number": number,
    "new_value": string
  }
]

Default insert text:
$insertText

Suggested item is to be inserted in the following HOCON path:
$parentPath

Context text, HOCON format, the user's current file:
$contextText

ai integration challenges,This prompt: - Explains the domain (Smart Data Lake Builder) - Specifies exactly what we want (better tabstop values) - Defines a strict output format (JSON) - Provides context (insert text, parent path, and full HOCON content)

Implementing the AI Completion Engine

Now, let’s implement the AICompletionEngine that will use our Gemini client and prompt:

class AICompletionEngineImpl(modelClient: ModelClient)(using ExecutionContext) extends AICompletionEngine with SDLBLogger:

  private val maxInsertTextLength = 80_000
  private val defaultTabStopsPrompt = (as defined above)
  
  var tabStopsPrompt: Option[String] = None

  export modelClient.isEnabled

  def generateInsertTextWithTabStops(insertText: String, parentpath: String, context: String): Future[String] =
    val promptText = tabStopsPrompt.getOrElse(defaultTabStopsPrompt).stripMargin
                         .replace("$insertText", insertText)
                         .replace("$parentPath", parentpath)
                         .replace("$contextText", context)
                         .take(maxInsertTextLength)
    
    trace("calling Gemini client asynchronously...")
    val jsonResponse: Future[String] = modelClient.completeAsync(promptText)
    jsonResponse.map { jsonResponse =>
      applyTabStopReplacements(insertText, jsonResponse)
    }.recover {
      case e: Exception =>
        trace(s"Error generating insert text with tab stops: ${e.getMessage}")
        insertText
    }

Notable design choices: - We use a var for tabStopsPrompt to allow runtime configuration by the user - We limit the prompt length to avoid API limitations - We fall back to the original insert text if any errors occur

While using a var might seem un-Scala-like, it’s a pragmatic choice here since the user’s prompt configuration is only known at runtime and set only once when the LSP server starts.

Regarding security, allowing user-defined prompts comes with considerations. Since we’re using the user’s own API token, any potential misuse primarily affects the user themselves.

Integrating with the Completion System

Finally, let’s modify our completion method in the TextDocumentService to use AI-enhanced suggestions:

override def completion(params: CompletionParams): CompletableFuture[messages.Either[util.List[CompletionItem], CompletionList]] = Future {
  val uri = params.getTextDocument.getUri
  val context = uriToContexts(uri)
  val caretContext = context.withCaretPosition(params.getPosition.getLine+1, params.getPosition.getCharacter)
  val completionItems: List[CompletionItem] = completionEngine.generateCompletionItems(caretContext)
  val formattedCompletionItems = completionItems.map(formattingStrategy.formatCompletionItem(_, caretContext, params))
  
  if aiCompletionEngine.isEnabled then
    val enhancedItems = Await.result(
      Future.sequence(
        formattedCompletionItems.map(enhanceWithAICompletions)
      ), 
      4.seconds // Wait a bit more than the HTTP backend timeout
    )
    
    Left(enhancedItems.toJava).toJava
  else
    Left(formattedCompletionItems.toJava).toJava
}.toJava

private def enhanceWithAICompletions(item: CompletionItem): Future[CompletionItem] = {
  Option(item.getData).map(_.toString)
    .flatMap(CompletionData.fromJson)
    .filter(_.withTabStops)
    .map { data =>
      aiCompletionEngine
        .generateInsertTextWithTabStops(item.getInsertText, data.parentPath, data.context)
        .recover {
          case ex: Exception =>
            debug(s"AI inference error: ${ex.getMessage}")
            item.getInsertText // Fallback to original text
        }
        .map { enhancedText =>
          item.setInsertText(enhancedText)
          item
        }
    }
    .getOrElse(Future.successful(item)) // Return the original item if it doesn't pass the filter
}

The key approach here: - We process multiple AI enhancements in parallel to reduce waiting time - We apply a timeout to limit the maximum waiting time - We fall back to the original suggestions if AI enhancement fails - We only apply AI enhancement to completion items with tabstops

One limitation is that we’re still potentially waiting around 1.5 seconds for the completions to return, which might feel slow to users. This will be the primary focus of Part 7.

Setting Up Dependency Injection

Finally, we need to set up our dependency injection in AppModule:

trait AppModule:
  // ... other components ...
  val modelClient: ModelClient = new GeminiClient(Option(System.getenv("GOOGLE_API_KEY")))
  val aiCompletionEngine: AICompletionEngineImpl = new AICompletionEngineImpl(modelClient)

Note that we fetch the API key at the highest level of the application and pass it down to the client. This approach keeps our core implementation pure and makes it easier to test.

Conclusion

In this article, we’ve successfully integrated AI capabilities into our LSP server to enhance code completion.

While our implementation works, there’s a notable performance challenge: the ~1.5 second waiting time for AI enhancements might make the editor feel sluggish. In Part 7, we’ll address this by implementing advanced optimization techniques to dramatically reduce the perceived latency, making the AI-powered suggestions feel almost instantaneous.

We’ll also explore how to make the system even more customizable by allowing users to configure their own prompts, leveraging their domain expertise to get the best results from the AI.

Stay tuned for “Building a Scala 3 LSP Server - Part 7: Optimizing AI Completions for Low Latency” where we’ll solve the performance challenges and make our AI integration truly seamless.