Posts tagged 'Audio'

visual and audio cross modal reasoning2 - speech separation, lip move generation

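1. Speech separation architecture. The task is to separate out which person in a video is speaking. If N faces appear in the video, a fixed number of frames (75 frames here?) is fed into a network to extract a face embedding vector for each face. This visual stream uses dilated convolutions and shared weights. The video's audio waveform (the noisy input) is converted into a spectrogram and fed into a network to extract speech features. STFT presumably stands for short-time Fourier transform, and this stream is also said to use dilated convolutions. From the two streams, the extracted face features and speech featu..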

Deep Learning/Computer Vision
· 2024. 9. 10.
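
The waveform-to-spectrogram step mentioned in the excerpt above can be sketched in plain Python. This is only an illustration of a short-time Fourier transform, not the network's actual preprocessing: the function name `stft_magnitude`, the Hann window, the frame length, the hop size, and the toy sine input are all assumptions made for the example.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=16, hop=8):
    """Return one magnitude spectrum per Hann-windowed frame (naive DFT)."""
    # Periodic Hann window; the window choice is an assumption for this sketch.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    # Slide over the signal in steps of `hop`, keeping only full frames.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Keep only the non-negative frequency bins of the real-valued signal.
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


# Toy input: a sine with 8 cycles over 64 samples (hypothetical data).
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
spec = stft_magnitude(tone, frame_len=16, hop=8)

# Each row is one time frame, each column one frequency bin.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

The resulting list of rows is the time-frequency grid that a speech stream would consume. In practice a library routine would replace the naive DFT loop, which is written out here only to keep the sketch dependency-free.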

Basic concepts of multimodal learning: why is it a hard problem?

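1. Motivation. Multimodal learning trains a model on data of different types, forms, and characteristics, all used together without any particular restriction. So far, models have been trained on images alone, but people naturally rely on several modalities at once: they see with their eyes (images) while hearing with their ears (sound), and they smell with their nose before tasting. Beyond that, people also learn through senses they are not consciously aware of using, such as social perception and depth perception of the 3D world. 2. Difficulties. The goal is to learn from many forms of data, but the problem is that each type of data is represented in a different way. Audio is a 1D signal wa..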

Deep Learning/Computer Vision
· 2023. 7. 3.

Serving files from FastAPI to the frontend - static file serving, FileResponse + playing audio files in vue.js

1. Static file serving. One way to provide static files created in FastAPI (HTML, CSS, JavaScript, images, audio files, and so on) to the frontend is to designate a static file path and have the frontend access that path directly to use the files. According to the official documentation: https://fastapi.tiangolo.com/tutorial/static-files/ Static Files - FastAPI: You can serve static files automatically from a directory using StaticFiles. Use StaticFiles: Import StaticFiles. "Mount" a StaticFi..

Programming/FastAPI
· 2023. 5. 10.
