In your place, I'd try to use a text-editing widget for the script text and run it in read-only mode. That may make detecting which word a user clicked on easier, because it's likely built in. Also, you need a way to search in the text, again, just like in a text editor.
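If the widget only reports a character offset for a click rather than the word itself, mapping the offset to a word is a small binary search over each word's start offset. A rough sketch in Python (the transcript text and offsets here are made up for illustration):

```python
import bisect

# Hypothetical flat transcript; a click gives us a character offset into it.
text = "hello and welcome"
words = text.split()

# Start offset of each word in the flat text, in ascending order.
offsets = []
pos = 0
for w in words:
    offsets.append(pos)
    pos += len(w) + 1  # +1 for the separating space

def word_at_offset(char_index):
    """Map a clicked character offset to the word it falls in."""
    i = bisect.bisect_right(offsets, char_index) - 1
    return words[i]
```

Many text widgets already expose this directly (e.g. a cursor-at-position lookup), in which case none of this is needed.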
One thing I'd add is an LLM-powered search bar to search for nuggets in natural language. Something like "show me where I talk about X", and it jumps to that timestamp.
You should highlight the current position in the text window to help with debugging, so you know exactly which words are at which parts of the timeline.
So many fun statistics can be collected with this too: how many words were spoken during a stream, a word-occurrence histogram, average word complexity. Then group the words together to identify a set of tags for the video. This is so cool.
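The first few of those statistics are one-liners once the transcript is a flat word list. A minimal sketch, using average word length as a crude stand-in for "word complexity" (the sample transcript is made up):

```python
from collections import Counter

# Hypothetical transcript text from the speech-to-text output.
transcript = "the quick brown fox jumps over the lazy dog the end"
words = transcript.lower().split()

total = len(words)                            # words spoken during the stream
histogram = Counter(words)                    # word-occurrence histogram
avg_len = sum(len(w) for w in words) / total  # rough complexity proxy: average word length
```

The most common entries in `histogram` (after dropping stop words like "the") would also be a reasonable starting point for auto-tagging the video.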
Along with the text highlighting which others have mentioned, maybe change the audio based on text edits in the transcript, although that might be a very compute-intensive thing to do.
Hi, I really like your project and seeing you go through the dev process. I think it would be easier to track if the currently spoken word were in bold or a different color, a bit like karaoke. From what I understand, Whisper already gives you the time range of each word. While playing the video, iterating through the list would be fast, and on seek, since the word-and-time-range list is sorted, a binary search should be pretty fast.
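The seek case the comment describes can be sketched with `bisect`: find the last word starting at or before the seek time, then check that the time actually falls inside its range. The word entries below are hypothetical Whisper-style `(start, end, text)` tuples:

```python
import bisect

# Hypothetical word list from Whisper, sorted by start time (seconds).
words = [(0.0, 0.4, "hello"), (0.5, 0.9, "and"), (1.0, 1.6, "welcome")]
starts = [w[0] for w in words]

def word_at(t):
    """Return the word being spoken at time t, or None (e.g. during a pause)."""
    i = bisect.bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and words[i][0] <= t <= words[i][1]:
        return words[i][2]
    return None
```

During normal playback you would just advance an index through the list instead of searching every frame; the binary search only needs to run on seek.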
A QoL feature might be highlighting the words on the right panel if they are included in the current clips on the timeline.
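That highlighting boils down to an interval-overlap test between each word's time range and the clips on the timeline. A small sketch, assuming hypothetical clip and word ranges in seconds:

```python
# Hypothetical timeline clips and Whisper-style (start, end, text) word ranges.
clips = [(0.0, 2.0), (5.0, 6.5)]
words = [(0.5, 0.9, "hello"), (3.0, 3.4, "cut"), (5.2, 5.6, "welcome")]

def in_clips(word):
    start, end, _ = word
    # A word is highlighted if its range overlaps any clip on the timeline.
    return any(start < c_end and end > c_start for c_start, c_end in clips)

highlighted = [w[2] for w in words if in_clips(w)]
```

With only a handful of clips a linear scan like this is fine; for long timelines the clip list could be kept sorted and binary-searched the same way as the word list.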
"is it a bug or is it just a little bit of a surprise" 😂😂
Next step: Editing the text and the video gets updated.
whats up whats up whats up
I miss the time where we played trackmania on the school computers. It was fun
This is looking great