PLM: Windchill Troubleshooting Agent for Customizations and Runtime Issues

As a trained Windchill Support Engineer for the core product and customizations, a significant part of the job goes beyond writing Form Processors and customizing actions.xml. It requires a deep understanding of Windchill’s intricate architecture—from MethodServers to Tomcat configurations—and knowing how to troubleshoot them effectively.

Recently, I decided to bridge traditional PLM administration with modern Artificial Intelligence. I took a massive, complex Windchill troubleshooting flowchart and engineered it into an interactive, AI-powered support agent using Python, NetworkX, and Large Language Models (LLMs).

1. The Foundation: Deep Windchill Domain Knowledge

Before building tools for Windchill, you have to understand how to build within it.

Understanding this MVC architecture (Actions -> Wizard Builders -> Step Builders -> Form Processors) is crucial, but I wanted to take my system-level knowledge a step further.

2. The Challenge: Digitizing Windchill Support

Windchill administrators often rely on massive, static flowchart diagrams (often BMPs or PDFs) to diagnose system issues. These charts dictate complex decision trees: Is the MethodServer hanging? Check wt.properties. Are users seeing errors? Check TOMCAT_HOME/logs.

Static images cannot be queried. I wanted to turn this static knowledge into a dynamic, queryable system.

3. The Solution: Graph Modeling with NetworkX

Instead of writing a massive nested if/else script, I modeled the Windchill troubleshooting flowchart as a Directed Graph.

Using Python’s networkx library, I mapped out the states and transitions that I’m familiar with. To keep the graph clean and scalable, I separated the Node IDs from their payload data.

For example, a node requiring the administrator to check property files was modeled like this:

  
# Creating a semantic node with Windchill-specific payload data
G.add_node("Property_Files_List", 
           type="action", 
           content="WT_HOME/codebase/wt.properties\nWT_HOME/db/db.properties\nTOMCAT_HOME/conf/server.xml")

4. Building the AI Support Agent (In-Context RAG)

With the Windchill knowledge digitized into a structured Graph, I exported the network to JSON (networkx.readwrite.json_graph). This JSON became the factual grounding for a Retrieval-Augmented Generation (RAG) system.

I integrated the OpenAI API (gpt-4o-mini) and fed it the JSON representation of the flowchart. To ensure the AI provided actionable, accurate advice, I engineered a specific system prompt with a Senior Technical Support Engineer persona.

The prompt enforced strict rules:

Factuality: The AI must strictly follow the logical paths defined in the JSON graph. No hallucinations.

Elaboration: Instead of just repeating node names, the AI uses its general IT knowledge to expand on the steps. If the graph says “Check Property Files”, the AI specifically references the WT_HOME and TOMCAT_HOME paths stored in the node’s attributes.

5: Scaling up to reduce token bloat

The AI would occasionally “lose the thread” in the middle of a massive JSON payload. The json could be massive as well depending on the size of the flowchart. The solution would be to move to a graph db such as Neo4j or PostgreSQL with Apache AGE.

Example:

Ask a question: How would I troubleshoot an edown as an initial step?

Thinking...
 > Searching Neo4j for: 'edown'
db context is: [Edown Steps] --(First | Cond: None)--> [Understand Problem Specifics]
[Understand Problem Specifics] --(Understand | Cond: None)--> [What do the users see?]
[What do the users see?] --(& | Cond: None)--> [Is there a known workaround?]
[Understand Problem Specifics] --(Second | Cond: None)--> [Request Data]
[Request Data] --(& | Cond: None)--> [Property_Files_List]

--- Answer ---
As an initial step in troubleshooting an edown situation, you will begin at the very first point of the flowchart. Here’s a detailed breakdown of what you should do:

1. **Start at 'Understand Problem Specifics'**:
   - This is your starting point. Your primary goal here is to gather as much information as possible regarding the edown situation.
   - Engage with users who are experiencing the edown. Ask them specific questions to clarify the nature of the problem. For example:
     - What actions were they performing when the edown occurred?
     - Are there any error messages or specific symptoms they noticed?
     - How long has the issue been occurring, and how frequently does it happen?

2. **Transition to 'What do the users see?'**:
   - After gathering initial information, move to the 'What do the users see?' step.
   - This step is crucial for understanding the user experience and can provide insights into the severity and impact of the edown.
   - Document any specific error messages or behaviors that users report. This information is vital for diagnosing the issue.

3. **Check for Known Workarounds**:
   - Next, proceed to the 'Is there a known workaround?' step.
   - If users are aware of any existing workarounds, this could provide immediate relief while you investigate the underlying issue.
   - If a workaround exists, document it and communicate it to the affected users to minimize disruption.

4. **Prepare to Request Data**:
   - If you find that the problem is not resolved through understanding user experiences or if no workaround is available, prepare to move to the 'Request Data' step.
   - This will involve gathering additional information that may be necessary for further diagnosis.

By following these initial steps, you will establish a solid foundation for understanding the edown issue. This approach will not only help in diagnosing the problem but also in communicating effectively with users and stakeholders. If you need further assistance as you progress, feel free to reach out for support. Good luck!
--------------

Ask a question: