diff --git a/README.md b/README.md
index 5ecd491..c3ddcee 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic concepts from natural language supervision and enables “Zero-Shot” inference. The model has been extensively evaluated in 26 audio downstream tasks achieving SoTA in several of them including classification, retrieval, and captioning.
 
-<img width="832" alt="clap_diagrams" src="https://github.com/bmartin1/CLAP/assets/26778834/c5340a09-cc0c-4e41-ad5a-61546eaa824c">
+<img width="832" alt="clap_diagrams" src="./docs/diagram.png">
 
 ## Setup
 
diff --git a/docs/diagram.png b/docs/diagram.png
new file mode 100644
index 0000000..040e719
Binary files /dev/null and b/docs/diagram.png differ