Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video

Abstract

As basketball’s popularity surges, fans often find themselves confused and overwhelmed by the rapid game pace and complexity. Basketball tactics, involving a complex series of actions, require substantial knowledge to be fully understood. This complexity leads to a need for additional information and explanation, which can distract fans from the game. To tackle these challenges, we present Sportify, a Visual Question Answering system that integrates narratives and embedded visualizations to demystify basketball tactical questions, aiding fans in understanding various game aspects. We propose three novel action visualizations (i.e., Pass, Cut, and Screen) to demonstrate critical action sequences. To explain the reasoning and logic behind players’ actions, we leverage a large language model (LLM) to generate narratives. We adopt a storytelling approach for complex scenarios from both first-person and third-person perspectives, integrating action visualizations. We evaluated Sportify with basketball fans to investigate its impact on their understanding of tactics, and how different narrative perspectives affect the understanding of complex tactics with action visualizations. Our evaluation demonstrates Sportify’s capability to deepen tactical insights and amplify the viewing experience. Furthermore, third-person narration helps people obtain in-depth game explanations, while first-person narration enhances fans’ game engagement.

Sportify Example Answers

Design Considerations

R1. Reliable – Explaining Tactics with Grounded Video Data. To ensure accuracy in QA systems analyzing video content, explanations must be closely aligned with the actual video data. This involves extracting tactical information and actions from the video, ensuring explanations are verifiable and precise to enhance user understanding and prevent confusion.

R2. Understandable – Explaining Tactics with Narratives. Narratives help users comprehend team tactics. Different narrative perspectives, such as first or third person, can enhance the viewer's understanding and immersion.

R3. Engaging – Explaining Tactics using Embedded Visualizations. Embedding visualizations within the video enhances the explanation of tactics by connecting actions to visual representations. The visualizations should be designed to align with the narrative perspective (first-person or third-person) to create a cohesive and engaging experience for the user.

The Pipeline of Sportify

Based on these considerations, we have developed Sportify, a visual QA system that answers questions about videos and comprises three major components: a Data Processor (A), a Narrative Agent (B), and a Visualizer (C). At the heart of Sportify lies the Narrative Agent, which leverages an LLM to interpret the user’s question and generate explanations in response. For a system designed for basketball videos, the capability to understand video content is indispensable. Although multi-modal LLMs are capable of processing image data, they often underperform in domain-specific tasks, such as detecting actions or tactics from a sports video, and incur substantial computation costs. To overcome this challenge, our methodology employs a text-only LLM, enriched through the integration of a Retrieval-Augmented Generation (RAG) framework and a Reasoning-and-Acting (ReAct) prompting strategy for different types of questions. Importantly, Sportify leverages the data extracted from the video as the context (R1) to generate the explanation in a narrative format (R2). The extracted data and explanations are then presented as visualizations embedded in the video (R3). In the subsequent sections, we delve into the specific design and implementation of each component.
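As an illustration only, the following minimal Python sketch shows how action data extracted from a video could be retrieved and placed into the context of a text-only LLM prompt. It is not Sportify's implementation: the `ActionEvent` schema, the keyword-overlap retriever (a toy stand-in for a real RAG retrieval step), the prompt wording, and the placeholder player names `P1` to `P4` are all assumptions made for this example, and the sketch stops before any LLM call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionEvent:
    """One action extracted from the video (hypothetical schema)."""
    time_s: float
    action: str          # "pass", "cut", or "screen"
    actor: str
    target: str = ""     # other player involved, if any

    def describe(self) -> str:
        suffix = f" (involving {self.target})" if self.target else ""
        return f"[{self.time_s:.1f}s] {self.actor}: {self.action}{suffix}"


def retrieve(events, question, k=3):
    """Toy retriever: rank events by keyword overlap with the question."""
    words = set(question.lower().replace("?", " ").replace(",", " ").split())

    def score(ev):
        tokens = {ev.action, ev.actor.lower(), ev.target.lower()} - {""}
        return len(tokens & words)

    matching = [ev for ev in events if score(ev) > 0]
    # Highest overlap first; earlier events win ties.
    return sorted(matching, key=lambda ev: (-score(ev), ev.time_s))[:k]


# Assumed instruction wording for the two narrative perspectives.
PERSPECTIVES = {
    "first": "Narrate in the first-person, as the player involved.",
    "third": "Narrate in the third-person, like a commentator.",
}


def build_prompt(events, question, perspective="third", k=3):
    """Assemble a text-only prompt grounded in the retrieved events."""
    if perspective not in PERSPECTIVES:
        raise ValueError(f"unknown perspective: {perspective!r}")
    # Present the retrieved events in chronological order.
    context = sorted(retrieve(events, question, k), key=lambda ev: ev.time_s)
    lines = "\n".join(ev.describe() for ev in context)
    return (
        f"{PERSPECTIVES[perspective]}\n"
        "Use only the extracted events below.\n"
        f"Events:\n{lines}\n"
        f"Question: {question}"
    )


# Made-up sample data for illustration; not taken from any real game.
EVENTS = [
    ActionEvent(1.0, "pass", "P2", "P1"),
    ActionEvent(2.5, "screen", "P3", "P1"),
    ActionEvent(3.2, "cut", "P1"),
    ActionEvent(4.0, "pass", "P2", "P4"),
]
QUESTION = "Why did P1 cut after the screen?"

print(build_prompt(EVENTS, QUESTION, perspective="first"))
```

Keeping the prompt restricted to retrieved events reflects the grounding goal in R1, and the `perspective` argument reflects the first-person and third-person narratives in R2; a production system would replace the keyword matcher with a proper retriever and pass the prompt to an LLM.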

Design Iterations

The iterative design process for the action visualizations (i.e., Pass, Cut, and Screen). For the Pass, from P1 to P4, we remove the occlusion and highlight the two players who send and receive the ball. For the Cut, from C1 to C2, we indicate the exact location a player will move to with a flash-forward animation. For the Screen, from S1 to S2, we render a wall so that the player setting the screen is easy to identify.

Third-person and First-person perspectives

Figure (A) shows the third-person perspective, in which the narrative is presented like commentary, whereas Figure (B) demonstrates the first-person perspective, which integrates the action visualizations and narratives around the players to make the experience more engaging and immersive.

Results

Task 1 user study results. Figure (a) reveals that the first-person perspective ranks highest in helpfulness and confidence across the three conditions. Figure (b) indicates positive participant ratings for each embedded visualization’s helpfulness. Figure (c) compares usability across the two narrative perspectives.

Sportify received better results than existing tactical explanation videos, with positive outcomes on all usability questionnaire items.

BibTeX

@article{lee2024sportify,
    title={Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video},
    author={Lee, Chunggi and Lin, Tica and Pfister, Hanspeter and Zhu-Tian, Chen},
    journal={IEEE Transactions on Visualization and Computer Graphics},
    year={2024},
    publisher={IEEE}
}