Week 3: How a 2 AM Samba Band Fixed My Music App
Week 3 of building an app that will replace me (as a sound designer). Follow along as I either succeed or fail - probably the latter.
I'm building an app that generates music for commercials and brand videos. After two weeks of struggling to get the drum generation working, what finally helped me fix it was a Samba band playing at 2 AM in a nightclub.
The Problem
Last week I focused on just generating drum beats. I split the whole process into three phases:
- Video analysis
- Drum beat generation
- Audio generation
I created a library of patterns and used video analysis to accent the drum beat to video cuts. It didn't really work. The beats didn't sound musical and didn't accent most of the key video moments. Also, the audio samples I used were bad and I didn't have a proper audio generation engine.
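To make the "accent the beat to video cuts" idea concrete, here is a minimal sketch. It is not the app's actual code: the function name `accent_cuts`, the 16th-note grid, and the velocity values are all assumptions for illustration. The idea is to loop a one-bar pattern over the clip, snap each cut time to the nearest grid step, and raise the velocity there.

```python
def accent_cuts(pattern, cut_times, bpm, total_steps, accent=1.0, steps_per_beat=4):
    """Loop `pattern` over `total_steps` and accent the step nearest each cut.

    pattern     -- velocities (0.0..1.0) for one repeating cycle (hypothetical format)
    cut_times   -- video cut times in seconds, e.g. from the video analysis phase
    bpm         -- tempo used to convert seconds into grid steps
    """
    step_len = 60.0 / bpm / steps_per_beat  # seconds per 16th note
    steps = [pattern[i % len(pattern)] for i in range(total_steps)]
    for t in cut_times:
        idx = round(t / step_len)  # snap the cut to the nearest grid step
        if 0 <= idx < total_steps:
            steps[idx] = max(steps[idx], accent)
    return steps


# A kick on every beat at 120 BPM, with cuts at 0.26 s and 1.0 s
beat = accent_cuts([0.8, 0.0, 0.0, 0.0], [0.26, 1.0], bpm=120, total_steps=16)
```

Snapping to a grid keeps the accents in time, but it also means a cut that lands between steps gets moved by up to half a step, which is one possible reason accents can feel slightly off from the picture.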
Three Versions Later
At the start of this week, I tried to rework the drum pattern generation. I liked the idea and the general direction of the results, but they weren't good enough.
What I learned about vibe coding is that you have no clue how your code actually works. So if you want to improve something, you basically have to recreate the whole thing.
If your app is small enough, this can work, since it's fast and easy to do that with AI tools. The more features you wanna add, the more you should understand your code. You don't need to know how to write it, just understand which part is doing what. Iterating with that knowledge becomes SO much easier. You can pinpoint the area you want to improve and let AI fix it.
I know what each file in my codebase does on a high level, as in why it's there and what it does. But do I know the exact code inside each function? Absolutely not.
I can tell that if I understood the code in detail, improving functionality would be much easier. But it takes time to learn that detail. In the time that takes, I could have Gemini and Claude create three new versions and test each one. The cost of creating and throwing away hundreds of lines of code is small.
Still trying to find the right balance of how deep I should go into the code before it starts wasting time.
Anyway, it took me about three new versions before I could audibly improve the results. For the first time I got close to something I would be okay releasing as an MVP.
The Audio Quality Problem
My drum samples weren't high quality, but that's an easy fix. The actual generation engine wasn't good either, though. I was considering whether I should find some open source drum machine or make my own.
I came across SoundFont, a file format that packages instrument samples so a compatible player can play back MIDI notes through them. This made me shift toward an overall MIDI approach: generating MIDI files and having a SoundFont player play them back.
The results were similar to the audio output, but it gave me the idea that it would be much easier to build a MIDI bass line and melody on top of existing MIDI drum beats.
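Here is a rough sketch of why MIDI makes this tempting: once the drum beat is just a list of note events, a bass line can be derived from it with a few lines of code. This is an illustration only, not something the app does. The event dictionaries, the `bass_from_drums` helper, and the choice of root note are assumptions; the only fixed fact is that key 36 is Bass Drum 1 in the General MIDI percussion map.

```python
KICK = 36  # General MIDI percussion key for Bass Drum 1


def bass_from_drums(drum_events, root=40, length_steps=2):
    """Return one bass note for every kick hit in `drum_events`.

    drum_events -- list of dicts with "step", "note" and "velocity" keys
                   (a hypothetical in-memory format, not a real MIDI file)
    root        -- MIDI note number for the bass (40 is a low E; an arbitrary pick)
    """
    return [
        {
            "step": event["step"],
            "note": root,
            "velocity": event["velocity"],
            "length": length_steps,
        }
        for event in drum_events
        if event["note"] == KICK
    ]


drums = [
    {"step": 0, "note": 36, "velocity": 100},  # kick
    {"step": 4, "note": 38, "velocity": 90},   # snare
    {"step": 8, "note": 36, "velocity": 110},  # kick
]
bass = bass_from_drums(drums)
```

Locking the bass to the kick is the simplest possible rule. Anything that actually sounds musical would need harmony and variation on top, which is where the scope starts to grow.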
Getting Distracted (And Why That's Dangerous)
The MIDI approach got me excited. Even though it didn't dramatically improve the results, I felt like since MIDI can be treated like data, it would be possible to get a decent bass line on top of the drum beat. From there, some harmony and melody.
I spent probably a day trying to figure this out. Then I reminded myself that I'm trying to do too many things.
Building products requires focus on the main goal. The reason is that when you build products - whether apps, games, or other software - the number of features you can add is unlimited. There are always hundreds of things you could potentially implement.
But you need to test ideas. And test them fast. And they still need to solve real problems.
The best way to do this is to focus on a super narrow, small problem and solve that in the best way possible. Find a single problem. Don't try to solve hundreds of things at once.
Once you validate that the problem is real, that you understand it well enough, and that you have built a solution people want, you can start thinking about expanding. But even then, plenty of good products fail because they deviate from their core mission and try to do too many things.
And that goes even for massive companies. You're probably just one person building this. Just like me.
Getting Back on Track
After wasting a day looking at AI models that could work with MIDI bass lines and melodies, I got back to creating drum beats. I actually went back to generating WAV audio files rather than MIDI.
I realised that the SoundFont format for sample packs makes it difficult to create custom, high quality drum kits. Control over the output is also much more limited.
I spent more time building a good audio engine with volume controls and audio effects.
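For a sense of what the core of such an engine involves, here is a minimal sketch using only the Python standard library. It is not the app's engine: `synth_hit`, `mix`, and `write_wav` are hypothetical names, the "sample" is a synthesized decaying sine rather than a real drum recording, and the only effect is a hard clip on the master output.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100


def synth_hit(freq, dur_s=0.15):
    """Stand-in for a drum sample: a sine wave with a linear fade-out."""
    n = int(SAMPLE_RATE * dur_s)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) * (1 - i / n) for i in range(n)]


def mix(hits, length_s, master_gain=0.8):
    """Sum hits into one mono buffer.

    hits -- list of (start_seconds, samples, gain) tuples; gain is the per-hit volume
    """
    buf = [0.0] * int(SAMPLE_RATE * length_s)
    for start_s, samples, gain in hits:
        offset = int(start_s * SAMPLE_RATE)
        for i, s in enumerate(samples):
            if offset + i >= len(buf):
                break  # drop whatever runs past the end of the clip
            buf[offset + i] += s * gain
    # Apply master volume, then hard-clip so overlapping hits cannot overflow
    return [max(-1.0, min(1.0, s * master_gain)) for s in buf]


def write_wav(path, samples):
    """Write float samples (-1.0..1.0) as a 16-bit mono WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(SAMPLE_RATE)
        f.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


kick = synth_hit(60)
audio = mix([(0.0, kick, 1.0), (0.5, kick, 0.5)], length_s=1.0)
# write_wav("beat.wav", audio)  # uncomment to write the file to disk
```

Working directly on the sample buffer is what gives the extra control: every hit has its own gain, and any effect is just another function applied to the list of samples.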
Almost Settling
I felt satisfied with the improved drum beat generation from the start of the week. I knew I didn't have much time left before week 4 (I wanted to have the MVP done by week 3). After using better samples and improving the audio generation, I felt okay with it. I wasn't super happy with it, but it was okay.
It had four different pattern styles - rock, EDM, jazz, funk. Only rock was reliable. All the other ones didn't really produce good results.
So I thought, to stay lean and focused on a small solution, I would just release it with the rock beat for now. It was limiting since it didn't fit all videos. The typical bum bum tss, bum bum tss rock drum beat just didn't feel suitable for commercials. But it was the best I had.
I was ready to leave it as it was and try to validate the idea by getting it online.
The result at this point:
The Nightclub Breakthrough
I knew the typical funk and rock beats just didn't feel sufficient by themselves. They badly needed melody or harmony to sound good. But I couldn't think of anything else.
On Saturday I went to a club. Before the DJ, there was a live band. Live Samba band. Stage full of like 15 drummers, playing something that was incredibly engaging and sounded amazing. Purely using rhythm and percussion, which is exactly what I was looking for.
My mind was really limited to just western style drum patterns. I was completely ignoring African and South American styles.
After hearing that and realizing it would fit what I'm doing perfectly, I woke up early on Sunday after a few hours of sleep. I scrapped the rock beat and built a solid engine that created Samba beats. It was too good of a coincidence to ignore.
I did deep research with ChatGPT on all the musical principles behind Samba. How the drum patterns are created, what instruments and timings are used, etc. Based on that, I completely revamped the drum pattern generation part of the app.
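As a rough illustration of what a layered Samba pattern can look like as data, here is a simplified sketch. It is not the app's engine and not an authoritative transcription: the layer names, the step values, and the `render_events` helper are assumptions. The one principle it leans on is that Samba is commonly felt in 2/4, with the low surdo marking the second beat.

```python
STEPS_PER_BAR = 8  # one 2/4 bar split into 16th notes

# Velocity per step (0.0 = silent). All values are illustrative, not researched.
LAYERS = {
    "surdo_low":  [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # heavy hit on beat 2
    "surdo_high": [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # answer on beat 1
    "shaker":     [0.6, 0.3, 0.3, 0.5, 0.6, 0.3, 0.3, 0.5],  # constant 16ths
    "tamborim":   [0.9, 0.0, 0.7, 0.7, 0.0, 0.7, 0.0, 0.7],  # syncopated figure
}


def render_events(layers, bars, bpm):
    """Expand the step patterns into a time-sorted list of (seconds, layer, velocity)."""
    step_len = 60.0 / bpm / 4  # seconds per 16th note
    events = []
    for bar in range(bars):
        for name, pattern in layers.items():
            for step, velocity in enumerate(pattern):
                if velocity > 0:
                    time_s = (bar * STEPS_PER_BAR + step) * step_len
                    events.append((time_s, name, velocity))
    return sorted(events)


events = render_events(LAYERS, bars=2, bpm=120)
```

Keeping each instrument as its own layer means a layer can be muted, thinned out, or accented on its own, which is handy when the beat has to follow what happens in the video.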
The results are much better with this pattern.
Of course, there's still so much work to be done. But if you compare that to what I was satisfied with on Friday evening, it's an upgrade.
With the samba beat:
The Best Ideas Come When You're Not Working
I believe hard work and putting in the hours is super important. Actually, I think it's the most important factor. You can out-perform 99% of people in whatever you want to do not by having "talent" - you can do it just by putting in the hours. By being comfortable sitting in the boredom, sitting through it even when you're not inspired and even when you have no clue what the fuck you're doing.
But to get ideas, inspiration, and clarity, you also need to step back. Go for a walk, go to the gym, paint something, meditate, whatever works for you. That's when ideas come to you the most.
Or you might just randomly come across a band in a nightclub that plays the exact drum beat you needed for your app.
Getting It Published
Next week I'm getting it online. I need to set up a simple UI for video upload and audio download. Mostly, I need to set up a backend that will run the Python script.
I'm not yet sure whether I'll put it behind a paywall or not. I'll see how much backend processing power it takes to create audio for a single video. I don't think it should be that much though.
I might release it for free to use (at least limited usage) at first to get as much feedback as possible.
The goal is that next week there'll be a link to the functioning app and you'll be able to try it yourself.
Follow along as I either succeed wildly or fail spectacularly - probably the latter.