Introduction
This project develops an intelligent system to solve a specific person-identification challenge in the sports domain.
The system works by ingesting an input image (e.g., a frame extracted from a video stream). It then employs a dual-approach methodology:
- Text-based Analysis: It uses Optical Character Recognition (OCR) to read any text that appears on-screen (e.g., a player’s nameplate).
- Image-based Analysis: It extracts visual features from the player in the image and compares them against a pre-existing database of known athletes to find the best match.
The ultimate purpose is to output the player’s correct name, automating a task that is tedious and time-consuming when performed manually.
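The image-based stream described above can be sketched as follows. This is a minimal illustration, not the project’s actual code: in the real system the feature vectors come from CLIP, whereas here the library holds hypothetical toy vectors so only the matching step is shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, library):
    """Return (name, score) of the closest athlete in the feature library."""
    return max(
        ((name, cosine_similarity(query_vec, vec)) for name, vec in library.items()),
        key=lambda item: item[1],
    )

# Hypothetical feature library; in the PoC these vectors are CLIP embeddings
# of known athletes, stored during the "Train" phase.
library = {
    "Player A": [0.9, 0.1, 0.0],
    "Player B": [0.1, 0.8, 0.3],
}

name, score = best_match([0.85, 0.15, 0.05], library)  # -> ("Player A", ~0.996)
```

Because cosine similarity compares vector directions rather than magnitudes, it tolerates the global brightness and scale variations that raw pixel comparison would not.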
Team size
4 members
Industry
AI
Technology
Python, EasyOCR, CLIP (OpenAI), Llama3.1, Ollama, MMCV
Highlights
Value Delivered
- Feasibility Demonstration: The PoC successfully proved that combining OCR+LLM with the CLIP model is a viable and effective approach for automated player identification.
- Potential for Increased Efficiency: This solution lays the groundwork for a fully automated system that could dramatically reduce manual labor, accelerate content production workflows, and improve the accuracy of player tagging.
- Innovative Aspect: The creative combination of a Large Language Model to “understand” OCR output and an advanced vision model to “see” and compare images is the solution’s most unique and powerful feature, allowing it to overcome challenges that traditional methods struggle with.
Challenges
- Visual Identification Difficulties:
  - Images may be captured from a distance, making the player appear small and difficult to identify in detail.
  - A player’s appearance can change significantly (e.g., hairstyle changes from long to short), which reduces the effectiveness of traditional recognition methods based on simple facial or appearance matching.
- Data Dependency and Uncertainty:
  - The system requires a predefined list of potential players (specifically Japanese golfers) to perform the matching.
  - The presence of the player’s name on-screen is a critical assumption that still needs to be confirmed. If no text is available, the system must rely solely on visual identification.
- Inefficiency of the Current Solution:
  - The current process is manual identification by a human operator, which is slow, labor-intensive, prone to error, and requires specialized knowledge of the players.
Solutions
We proposed and designed a hybrid AI solution to comprehensively address the challenges.
- Intelligent Text Extraction:
  - Uses EasyOCR to scan the image and recognize any available text.
  - Feeds the extracted text into a Large Language Model (Llama3.1 8B) to intelligently parse the output, accurately identify the player’s name, and discard irrelevant text.
- Visual Feature-Based Recognition:
  - Employs the CLIP model from OpenAI to convert the image of the player into a unique visual feature vector.
  - Builds a library during a “Train” phase that stores the feature vectors of known athletes.
- Cross-Validation and Matching:
  - Compares the feature vector of the test image against the stored library using Cosine Similarity to find the closest visual match.
  - The final output is determined by combining the results from both the text-based and visual-based streams, enhancing overall accuracy and reliability.
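The text-based stream and the final fusion step can be sketched as follows. This is an illustrative simplification: in the PoC the noisy OCR output is parsed by Llama3.1 via Ollama, whereas here a deterministic roster lookup stands in for the LLM call so the end-to-end logic can run; the roster names and OCR lines are hypothetical examples.

```python
# Hypothetical predefined player list; the PoC assumes such a roster exists.
ROSTER = ["Hinako Shibuno", "Ai Suzuki", "Mone Inami"]

def build_prompt(ocr_lines):
    """Prompt asking the LLM to pick the player's name out of noisy OCR text."""
    return (
        "The following text was read from a golf broadcast frame:\n"
        + "\n".join(ocr_lines)
        + "\nReturn only the player's name, or NONE if no name is present."
    )

def extract_name(ocr_lines, roster=ROSTER):
    """Stand-in for the LLM step: match OCR output against the known roster."""
    joined = " ".join(ocr_lines).lower()
    for name in roster:
        if name.lower() in joined:
            return name
    return None

def identify_player(ocr_lines, visual_match):
    """Fuse both streams: trust on-screen text when present, else the CLIP match."""
    return extract_name(ocr_lines) or visual_match

# Example: the nameplate was read by OCR, so the text stream decides.
result = identify_player(["HOLE 7", "Hinako Shibuno", "-3"], visual_match="Ai Suzuki")
```

When no name is visible on-screen (the uncertain case noted under Challenges), `extract_name` returns `None` and the visual match is used as the fallback, which mirrors the fusion rule described above.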