This is better described the other way round: rather than asking how different accents develop, it’s a question of how they merge, along with the adjacent question of dialects and languages.
First there is the need to communicate. The most basic verbal communication is different types of sound, conveying urgency, calm etc. Some of those sounds are then refined into specific meanings, giving words.
These words are used by particular individuals as a group, and they are spoken in a particular way.
So you have an initial language and accent: an isolated group just has its own unique language and accent. Some of the same or similar words develop independently among different communities if they mimic what they describe.
Small groups or individuals go further out from their immediate surroundings and meet others. They merge their words and phrasing, which results in some variation: they may retain a distinct local accent and dialect whilst slightly adjusting their accent and adopting words from the others.
These collectively build up to an overarching language and, supposedly, a generic accent.
An outsider can fail to ‘hear’ regional accents, so to them ‘they all sound the same’.
Historically this happened on a local and regional basis, with crossovers only among those who travelled. You get social mobility as well, and those who gain access to court, so non-regional accents and dialects occur.
But once recordings, film etc became possible, the languages and accents would come to you rather than you going away to encounter them.
Those making recordings then want to be understood everywhere, so ‘BBC English’ arises, with slowly spoken pronunciation, as opposed to day-to-day conversation, which is often fast and in a dialect or accent.
Extend that to global communication and you get a drift towards generalised ‘international’ accents, and also emulation of somewhere else’s local accent.
The internet and US television get blamed for this today, but the same applied when Hollywood first developed talking movies.