I worked with a client who operated three medical laboratories. Each location handled semen analysis workflows for DNA, vitality, motility, and morphology, but the process was manual and tied to local machines. There was no shared platform, no centralized visibility, and no automation.
The requirement sounded simple at first: a centralized server all labs could connect to, AI-based analysis, GPU acceleration, and a web interface so technicians could work from anywhere.
Then came the real constraint: everything had to work in real time.
That is where the project stopped being a simple web app and became an infrastructure and performance problem.
Getting the Microscope on the Web
The first major challenge was the microscope itself. The client wanted to stream the live microscope feed directly into the browser. Technicians needed to view it and control it remotely. Lens movement. Frame rate. Speed. Contrast. Saturation. Gamma. Hue. Alpha values. Everything.
After digging around, I found the Toupcam SDK. It supported streaming through Python, which worked well since the backend was already planned in Flask. I used Flask-SocketIO to push frames and control events in real time. When a user changed a slider on the web dashboard, the backend updated the microscope instantly.
That part worked surprisingly well. Until the client said, “There’s a lag when I move the lens.”
At first, I assumed dropped frames. Or network delay. Or maybe the browser struggling. It wasn’t. The lag was coming straight from the Toupcam SDK.
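For the slider-to-microscope path described above, here is a minimal sketch of how a control event from the browser might be validated before it reaches the camera. Everything in it is an assumption for illustration: the `CONTROL_RANGES` values, the payload keys `control` and `value`, and the `FakeCamera` class, which only records settings and is not the Toupcam API.

```python
from dataclasses import dataclass, field

# Hypothetical slider ranges; real limits would come from the camera SDK.
CONTROL_RANGES = {
    "contrast": (-100, 100),
    "saturation": (0, 255),
    "gamma": (20, 180),
    "hue": (-180, 180),
}


@dataclass
class FakeCamera:
    """Stand-in for the real camera handle; it only records settings."""
    settings: dict = field(default_factory=dict)

    def apply(self, name, value):
        self.settings[name] = value


def handle_control_event(camera, payload):
    """Validate one slider event from the browser and apply it.

    Returns the value actually applied, or None if the event was rejected.
    """
    name = payload.get("control")
    if name not in CONTROL_RANGES:
        return None  # unknown control: ignore rather than guess
    try:
        value = int(payload.get("value"))
    except (TypeError, ValueError):
        return None  # missing or non-numeric value
    low, high = CONTROL_RANGES[name]
    value = max(low, min(high, value))  # clamp to the allowed range
    camera.apply(name, value)
    return value
```

In a setup like the one described, a function of this shape would be called from inside a Flask-SocketIO event handler, so a bad payload from the browser never reaches the hardware.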
When the Docs Don’t Help
I searched everywhere. Forums. GitHub issues. SDK documentation. Nothing. So I did the thing most people don’t want to do. I opened the SDK source code and started reading it line by line. Buried in comments were function calls that weren’t documented at all. The functions existed. The parameters existed. They just weren’t mentioned anywhere officially. Using the inline comments and type hints, I figured out how to call them properly. Once I wired those into the control flow, the lens lag disappeared. That was one of those moments where you either give up or go deeper. If it hadn’t been in the code, I would have had to build a workaround. Luckily, it was there. Just hidden.
AI Models and Real-World Lab Needs
Each test had its own AI model behind it. Different inputs. Different processing logic. Different outputs. But once lab technicians started using the system, new requirements came up. Real, practical ones. For example, debris in samples. Sometimes non-sperm particles showed up and confused the analysis. The client wanted technicians to manually mark debris and exclude it from results. I told them it was doable. On the frontend, I used simple JavaScript and CSS. When a user marked debris, they were really placing an overlay div on top of the image. Behind the scenes, I captured the pixel X and Y coordinates. Those coordinates were sent to the backend with the sample. During processing, the AI pipeline ignored those pixel regions entirely.
Simple idea. Very effective.
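The debris exclusion can be sketched in a few lines. This is one possible way to do it, not necessarily how the real pipeline works: it assumes the model returns centre points for each detected cell and that each marked region arrives as a rectangle in image pixel coordinates. The names `Box` and `exclude_debris` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in image pixel coordinates."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def exclude_debris(detections, debris_boxes):
    """Drop detections whose centre falls inside any marked debris box.

    detections: list of (cx, cy) centre points produced by a model.
    debris_boxes: list of Box rectangles marked by the technician.
    """
    return [
        (cx, cy) for cx, cy in detections
        if not any(box.contains(cx, cy) for box in debris_boxes)
    ]


detections = [(10, 10), (55, 60), (200, 120)]
debris = [Box(x=50, y=50, width=20, height=20)]
print(exclude_debris(detections, debris))  # [(10, 10), (200, 120)]
```

If the browser shows the image at a different size than the original, the overlay coordinates would need to be scaled back to image pixels before a check like this.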
That same attention to detail showed up everywhere. Image manipulation. Masking. Preprocessing. Edge cases. Every test had its own quirks, and each one needed careful handling to feel reliable in a lab setting.
Deployment and Real-World Infrastructure
Once the app was ready, I helped deploy it on the client’s on-premises server. That included networking setup, NAT configuration with their ISP, and exposing the server securely through a reverse proxy. Everything was stable. The labs were using it daily. A few months later, the client came back. They’d upgraded the server with a GPU. And they wanted things faster.
The Motility Bottleneck
Most tests were image-based, so optimizing them was straightforward. Motility was different. Motility analysis used short video clips showing live sperm movement. You couldn’t just shrink the video or drop quality without hurting results. Initially, processing took around 17 seconds per video. I moved the model to CUDA and ran it on the NVIDIA GPU. That alone brought processing time down to about 10 seconds. Roughly a 40% improvement. Still not enough.
Going Deeper: Frames and Threads
I applied two optimizations. First, frame skipping. Some frames in the video were identical or nearly identical. Out of roughly 90 frames, around 5 to 10 were duplicates. Skipping those reduced unnecessary computation without affecting accuracy.
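A rough sketch of the frame-skipping idea, under stated assumptions: frames are represented here as flat lists of grayscale pixel values, similarity is a mean absolute difference, and the `threshold` of 1.0 is an arbitrary illustrative value. A real pipeline would more likely work on image arrays and tune the threshold against accuracy.

```python
def mean_abs_diff(a, b):
    """Average per-pixel absolute difference between two equal-size frames."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def drop_near_duplicates(frames, threshold=1.0):
    """Return indices of frames that differ enough from the last kept frame."""
    kept = []
    last = None
    for index, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) > threshold:
            kept.append(index)
            last = frame
    return kept


frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],      # identical to the previous frame
    [10, 10, 10, 10],
    [10, 10, 10, 11],  # nearly identical to the previous frame
    [50, 50, 50, 50],
]
print(drop_near_duplicates(frames))  # [0, 2, 4]
```

Comparing against the last kept frame, rather than the immediately preceding one, keeps slow drift from being skipped indefinitely.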
Second, multithreading.
I split the video into chunks and processed them in parallel. Each thread ran the AI model independently and returned partial results. At the end, everything was aggregated into a final score. Thread count mattered. Too many threads and the overhead killed performance. After testing, five threads turned out to be the sweet spot. The result? Processing time dropped from 17 seconds to around 5 seconds. In some runs, as low as 3.95 seconds. That’s roughly a 70% reduction. The client was very happy. So was I.
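The chunk-and-aggregate pattern looks roughly like this. It is a sketch, not the production code: `analyse_chunk` is a placeholder for the model inference, the "frames" are plain numbers, and the aggregation is a simple mean because the real scoring logic is not described here. The default of five workers mirrors the figure above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(frames, n_chunks):
    """Split frames into up to n_chunks contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(frames)))
    size, extra = divmod(len(frames), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(frames[start:end])
        start = end
    return chunks


def analyse_chunk(chunk):
    """Placeholder for model inference: returns (sum of scores, frame count)."""
    return sum(chunk), len(chunk)


def process_video(frames, workers=5):
    """Analyse chunks in parallel and aggregate into one mean score."""
    if not frames:
        return 0.0
    chunks = split_into_chunks(frames, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(analyse_chunk, chunks))
    total = sum(score for score, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(process_video(list(range(90))))  # 44.5
```

Returning a `(sum, count)` pair from each chunk, instead of a per-chunk average, keeps the final result correct when chunks have unequal lengths.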
Tech Choices and Trade-offs
The backend was built with Flask and Flask-SocketIO. Python made sense because of heavy image manipulation, AI models, and SDK integration. For the frontend, I used EJS. Not React. Intentionally. The client was comfortable with PHP-style templates and wanted everything in a single index-style structure. I pushed for EJS as a middle ground. More maintainable, still familiar. I offered a React refactor and a modern UI, but the client declined. They wanted a legacy look. It was a private internal app used only by lab employees. Modern design wasn’t a priority. My job was to guide them, then respect their decision.
Looking Back
This project combined deep research, undocumented SDK behavior, real-time systems, GPU optimization, and production deployment work. It reinforced something I strongly believe: if a problem looks impossible, it usually means the answer is not obvious yet.
You dig, read code, test carefully, and keep going. That is where the real engineering work happens.
Key Results
- Centralized semen analysis for three physical labs
- Real-time microscope streaming and control via web
- Multiple AI-powered tests running on a GPU server
- Motility processing reduced from 17s → ~5s (as low as 3.95s)
- Reliable, production-ready system used daily by lab staff
