Dialogue has been doing the rounds in mainstream and social media as audiences complain that they are struggling to understand what is being said in several TV dramas. The issue has left viewers frustrated and ruffled a few feathers within television production, and it has once again become a major talking point - so much so that on the 4th of April this year it was debated in the House of Lords. We've assembled a discussion panel to try to provide some further insight and offer some technical perspective on the matter. The panel consists of Lucy Johnstone, Ben Newth, Mary Walsh, Andrew Swallow and Graeme Hales.
Firstly, thank you very much for your time and contribution. So, before we begin, would you mind telling us a little about yourself and your work in post production?
I am a freelance Sound Editor with extensive experience in dialogue and sound effects editing, mixing, foley and voice-over recording. I work with many different companies across various genres including documentary, drama, factual entertainment, comedy drama, feature film and computer games.
In 2013 I was named a "Broadcast Hotshot" by Broadcast Magazine, and listed as one of the top 30 people under 30 in the television industry. My credits include Top Gear, EastEnders, David Attenborough's Rise of the Vertebrates, The Collection and the computer game Guitar Hero Live.
I am currently working for nine weeks as the sound effects editor on a new romantic comedy called "You, Me and Him" featuring David Tennant, Faye Marsay (Game of Thrones) and Simon Bird (The Inbetweeners). I am also mixing various episodes of EastEnders in between, so I am very busy!
I'm Head of Audio at Clear Cut Pictures, where I'm known for mixing documentary and drama-documentary. In recent years this has led to more work in drama and mockumentaries.
I graduated from Bournemouth Film School in 1998 and went straight to work at Gemini Audio where we specialised in factual programming. In 2001, I moved to Holland to work within the Dutch TV and Film industry. In 2004 I returned to London, and having done the rounds freelancing, I took a full time job at Clear Cut Pictures in 2009.
I have been working in the industry for 14 years now. Originally a musician, I went to university to read music as a keen violinist and singer, but I figured that instead of being in front of the microphone, maybe I could be behind it! I had a friend who worked in TV who put me onto the idea of mixing sound for television. Balancing out sounds, music and fx appealed to me; he described it as like conducting. I started at the bottom as a runner and worked my way up. I have now been mixing audio for television broadcasting around the world for many years and have been lucky enough to work on some top rated shows including The Voice and Top Gear.
I am a Freelance Dubbing Mixer with over 10 years of experience across a broad range of TV genres. Some of my recent broadcast credits include Car SOS (National Geographic), The Sheriffs are Coming S1-5 (BBC), Tricked S2-3 (ITV), Come Dine With Me (Channel 4).
I graduated from the University of Derby in 1997 with a degree in Electronics and Music Technology and have since worked as a mastering engineer, a dubbing mixer at Crow TV and a senior dubbing mixer at Silverglade before moving into freelancing.
I'm an educator in audio production, specialising in post production for television and film. I've worked on a whole bunch of stuff, from popular TV shows (sit-com, drama, factual, panel show and live performance), TV adverts and web campaigns for some high profile clients, to low budget movies as well as some radio dramas and documentaries. This could be anything from location recording, but is usually dialogue editing, sound design or mixing. I was also recently featured on The One Show (BBC One) in an interview discussing dialogue, mumbling and ADR.
There has been a lot of criticism of “poor sound quality” in TV dramas, more specifically “mumbling.” Can you give us an overview of why audiences are all of a sudden struggling to understand the dialogue?
I’m not sure if this is such a sudden thing; I think the problem has arisen over the years for a number of compounding reasons. It has simply become more of a talked-about problem.
I don't think it's suddenly an issue, necessarily. My mum has been moaning for years about mumbling actors, especially in American procedurals like Without a Trace. But social media has a huge part to play in it; sites like Twitter have become a platform for people to publicly complain about these things. If Twitter wasn't there, how many complaints do you think Ofcom would actually get? Probably about the same amount that any other programme gets about anything. It has also become a bit of a fad; people are now listening out for these things.
The same thing happened about four or five years ago with everyone commenting on music being too loud in factual television. It meant you couldn't hear the dialogue so well, but that wasn’t the way it was phrased - the music was too loud rather than the dialogue was too quiet (... interesting). Everyone became so paranoid that some of the clients were asking us to drastically drop our music in level just in case people complained which subsequently made our mixes sound a bit lame.
I have been in the industry less than 10 years, so I don't feel I am qualified to give a deeper answer, but I asked a production sound mixer from AMPS (the Association of Motion Picture Sound), who has been in the industry far longer, for some insight and he said the following:
“I have just finished a film where, although only one artist was featured much of the time, recording her performance proved to be quite a task. The male artist we had also gave a lacklustre performance much of the time and in both cases taking on board my advice to deliver intelligible dialogue was met with ambivalence.
I'm not sure this trend is a recent thing; the generation of actors that came with the '60s and '70s adopted a 'method' style of acting which was considered more true to life and for some reason gave a green light to understated dialogue. This style of performance is still with us today, mostly with cinema and TV actors... not so much with stage trained artists.
I think the publicity surrounding this comes from the press, media and the public. Once someone writes to complain about not being able to understand someone's dialogue it starts a chain reaction of written complaints. Years ago our industry would 'swallow' problems like these and hope the media and public would not protest. The recent postings don't seem to have had much effect... artists still give voice performances in the way they see fit.”
There have been a couple of programmes recently that have put mixers in the spotlight. Every now and again something pops up that gets mixers talking in the kitchens at work.
Personally I think it’s a very difficult issue that a lot of people have varying opinions of. Most often it will be tricky location audio that was nearly impossible to get in the first place (and with promises made that it could be “fixed in post”) compounded by the inability to recall the actors at a later stage to replace the odd line due to cost or geography!
The obvious answer to me is flat screen TVs with tiny rear facing speakers, but we also have to take into consideration R128 loudness measurement.
Realistically, as long as you hit the magic -23 LUFS ±1 LU averaged over the length of the programme or part, you can mix the dialogue as loud or soft as you like, so you can have soft or intimate levels and loud sections (PPM mixing was all about maximum compression, so less dynamic range in dialogue).
But your flatscreen TV and (potentially) noisy viewing environment can't cope with this. TVs are 'evolving' into flatter, thinner devices, so where does the sound wave energy come from? Remember, it was mixed in a ‘dead’ room with expensive monitoring systems; if you watched the same sections again on good headphones you wouldn’t complain.
I think it’s useful to think about what ‘mumbling’ actually is; by definition it is speaking in a quiet or indistinct way. Speech cognition depends quite significantly on consonants. If passages of dialogue are delivered without clearly enunciated consonants (think of somebody slurring when they are drunk), it's difficult to understand what is being said. I think it helps to be aware of this if you are delivering lines in a particular accent or style.
With regards to audiences struggling to understand the dialogue, I think it has been an issue for longer than it's been reported - I can recollect quite a few films and TV shows where the dialogue is hard to understand from before this becoming a talking point. I think increased awareness is probably more of a recent thing.
But the term "poor sound quality" is thrown around a lot as well and without contextualisation I think it's fairly ambiguous. The actual sound quality and the audio production of some of the shows in question is really very good. In fact, if there is mumbling, technically speaking, it is well recorded, well mixed, good sounding mumbling.
The media can be quite quick to 'point the finger' and it tends to be towards the actors or those working on the audio post production; are the actors to blame or is this a production issue? Another argument surrounding the issue is the end user technology; does it make a difference what devices you use to watch things on?
For one, viewers are listening on a huge array of devices. At one end we have typically poor sound from flat screen TVs, tablets and phones; at the other we have impressive audio from home cinema setups, sound bars and headphones. My question is: is everyone aware of how differently all their gadgets sound and how much they colour what they are listening to? People have asked me why they can't hear more bass when listening off their laptop speakers; it's because the speakers are too small to replicate it!
Secondly, as a result of better home cinema, TV itself has become more film-like. Dramas and box sets in particular have production values to rival any film. The soundtrack has evolved to match this, which means greater dynamic range, more subtlety, louder explosions etc. This has happened in tandem with the evolution of surround sound and giant non-linear leaps in post production. This way of working allows many more tracks than ever before for a tighter, fuller, more complex soundtrack, which might not translate as well across all home devices.
Thirdly, actors today mumble more. TV acting styles have changed a lot, and perhaps more dramatically so in TV drama. I don't know whether this is encouraged more by directors, acting schools or fellow actors; but I believe it is certainly encouraged by the use of smaller video cameras and intimate recording techniques.
For the former; a bit of both, definitely. Again, the belief that audio mixers can work magic (which obviously we can sometimes) and the fear of upsetting an actor on set can combine, amongst many other factors, to let things slide on set. When it comes to the mix, it is often tricky to work out what is unintelligible the first time you hear it, before you’ve had a chance to EQ and de-noise a section. So by the time you’ve played over it four times, you’ve probably got it! If ever I’m unsure that something is clear enough, I always grab someone who hasn’t heard it before for a fresh pair of ears. If they don’t get it then something else needs to be done.
There is such a variety of home cinema kit too, from people who still watch stuff on old black and white TVs to people with high end kit to rival a mixer’s studio! Differing audio settings on each telly, often ‘customised’ at home, make for a very difficult playing field; your mix can and will sound different in pretty much every household it’s viewed in!
You'll read arguments on either side. Sometimes lines are mumbled and sometimes the dialogue sits too quietly in the mix compared to the sound effects or music and subsequently gets drowned out. However, a recent update of the DPP specs for content delivery does acknowledge the recent issues surrounding dialogue intelligibility and states that the responsibility lies with the producer.
Also, the devices which people use are becoming increasingly compact. This does not leave any room for good quality speakers that are able to accurately reproduce sound; more specifically, these tiny speakers will not be able to reproduce any meaningful low frequencies, and subsequently higher frequencies might sound a bit louder or harsher, like "S"es and "F"s. There is some really important frequency content in the lower range of our voices; the fundamental frequency of a man's voice usually lies somewhere around 80-160Hz and between around 130-260Hz in a woman's voice. If our devices cannot reproduce these frequencies accurately, then of course the quality of dialogue will be affected, but so will the sound effects and the music.
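The point about tiny speakers and voice fundamentals can be illustrated numerically. The sketch below is a rough assumption, not a speaker model: it stands in for a small rear-facing TV speaker with a simple first-order high-pass filter (ignoring real enclosure acoustics), passes a 120 Hz tone - inside the typical male fundamental range quoted above - through it, and measures the level lost:

```python
import math

def high_pass(samples, cutoff_hz, rate=48000):
    """First-order high-pass filter - a crude stand-in for a tiny
    speaker that cannot reproduce low frequencies."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# A 120 Hz tone sits inside the typical male fundamental range (80-160 Hz);
# the 300 Hz cutoff is a hypothetical figure for a small driver.
rate = 48000
tone = [math.sin(2 * math.pi * 120 * n / rate) for n in range(rate)]
attenuated = high_pass(tone, cutoff_hz=300, rate=rate)
loss_db = 20 * math.log10(rms(attenuated) / rms(tone))
```

With these assumed numbers the fundamental loses roughly 8-9 dB, which is why a voice that sounded full in the mix room can turn thin and sibilant on a flat screen TV.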
A solution is to invest in a good sound system for your home and earphones for your gadgets. However, if the dialogue is mumbled or lost in the mix, good quality playback systems will not fix this, as the problem originates from somewhere earlier on in the chain.
It’s not the fault of actors, because they are directed. A trend now is to have fewer people on a TV shoot because of the budget, so sound is 'handled' by a cameraman (who can often be the director). When it comes to mixing, the director says 'too high' or 'too low' based upon what they can hear (in a quiet room with expensive monitoring equipment).
Again, flat screen TVs with small speakers vs. big old CRTs with huge drivers either side (I'm looking at adding a sound bar to my TV at home because of dialogue!).
Of course each case is different, and it would be unfair for me to categorically pin it on one person or department for every single instance. Actors do often mumble, usually whilst trying to give a ‘realistic' performance, especially those without a background in theatre. However, if they cannot be understood then this should be picked up on set at the time by those wearing headphones; so mainly the director or production sound mixer. Admittedly, it can be difficult for those who know the script and are aware of what the line should be. I often have to ask people in other departments who haven't read the script if they understand something.
I have spoken to some sound mixers who mention these instances when they crop up on set, even if they know that there's no time to do another take (budget constraints etc.) or the director doesn't want to do another, purely to cover their back against any future backlash that may come their way. It's a shame that they feel they need to do this. Of course, some sound mixers may miss these too!
Some directors don't want any ADR because they think it sounds fake, so when these hard to hear lines are pointed out, they don't want to fix it, as using the mumbling sync sound is "the lesser of two evils." Of course, again, it is not always the actor or director's fault; it could be a mix issue! Music and sound effects being too loud, for example, or over-processing dialogue and ruining its clarity.
Yes and no, technology-wise; I have heard people complain about the sound on flat screen TVs because (stupidly) the speakers are at the back and therefore the sound plays directly into the wall which can dull the sound and make it seem muffled. Yes, that isn't great and proper speakers would sound better. However, the sound is still far better than the sound from my old square television, plus the audio quality that is actually recorded and broadcast was worse than today, and I could hear what people were saying back then!
Similarly, listening through tinny headphones on my iPhone will sound different so mixers should account for this by either making a universal mix that is acceptable on all devices, or if they have the luxury of time, create different mixes for them. A lot of shows these days will have different mixes for different devices and post production facilities have always delivered different television and cinema mixes. Even for TV you have a 5.1 mix and stereo. Some tech specs require mixes for YouTube/online and also for tablets.
Whenever I mix, I always do a pass listening through crappy TV speakers rather than my lovely studio speakers, just in case something isn't clear. Often the low end effects get lost and, similarly, deep voiced men can be harder to hear. This should be noticed by the mixer, and fixed accordingly.
It is often quoted that “sound levels will be reviewed and adjusted.” Talk us through how they are set in the first place and whether an adjustment would be effective in increasing the clarity of dialogue?
Sound levels got a big change via DPP a few years ago. Since we're now digital our audio can peak 'louder' but all programmes must average to the same loudness. I personally think the switch to the spec has improved mixes a lot. Some mixers used to be a bit number obsessed when mixing to PPMs, and the new spec relies more on the ear.
The question is how do we measure 'the average?' Mixing a fairly quiet programme means you have to push the level of dialogues to meet the average loudness spec. However, mixing a noisy programme may mean that the dialogue levels are reduced to meet the desired "average."
Different mixers will do different things. They might just pull up all of the dialogue so that it is louder in relation to the music and effects. Similarly, they could do the opposite and just pull down all the music and effects by a couple of dB. This sometimes helps, depending on the initial reason for the lack of clarity. Maybe they'll re-EQ the dialogue.
In terms of being set in the first place, instead of having a suggested dialogue level and an overall peak limit of PPM 6 like we used to, we now have a system whereby the overall loudness is measured as an average; EBU R128. This gives us more dynamic range, where you can make the loud bits louder and the quiet bits quieter, moving more towards film sound and creating a potentially more interesting mix. This applies across television, commercials, promos and idents, so loudness normalisation should give all of these the same average loudness, and therefore audio levels should not appear to jump when adverts come on etc.
There is an argument to say that because of this, the quiet bits being quieter could make the dialogue harder to hear and therefore make you grab your remote to turn up the volume. However, re-recording mixers should be using their ears, and just because they can make it super quiet doesn't mean they should. The viewer won't want an overly cinematic dynamic range!
Levels are set by measuring loudness on dialogue and voiceover (I use iZotope Insight for 2.0 or 5.1 mixing, but NUGEN VisLM is very popular). This sets the mix and everything else falls in around it (well, that's how I mix anyway). Sections can be easily boosted because the measurement is averaged over the length of the programme or part. Or you can do the opposite and bring down everything else underneath (music and effects) to reveal the dialogue more.
There are mixing standards in place for all the main broadcasters so that every program supplied to them has to meet certain specifications. We have limits in place to make sure things aren’t too loud and even settings for recognising dialogue and whether it is clear enough! But ultimately it’s about balance. The balance of background noise against dialogue or music against dialogue… These need to be clear enough to allow space for the dialogue to breathe.
If something were to need "reviewing and adjusting” then it would need to be looked at again in the studio. You would need to create more of a gap between the separate elements to make that dialogue clear. Hopefully a program would never even leave the studio in that state having been viewed many times by the mixer and clients, and then also passed through a quality control check within the post house.
There are technical specifications set out by the Digital Production Partnership which are used by the UK's major broadcasters. The final mix must be R128 compliant, with an integrated loudness of -23 LUFS for the entire mix comprised of dialogue, music and effects. The crux here is that, hypothetically, you could achieve R128 compliance if you balanced your mix in such a way that, comparatively speaking, the dialogue is really quiet, the music is loud and the sound effects are really loud, just as long as you have an integrated loudness of -23 LUFS. Of course, common sense prevails and a dubbing mixer just wouldn't do this.
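To make the -23 LUFS target concrete, here is a minimal Python sketch of the averaging idea. It is deliberately simplified: genuine R128 measurement (via ITU-R BS.1770) applies K-weighting filters and a two-stage gate before averaging, which this unweighted version omits, so its numbers are illustrative only:

```python
import math

TARGET_LUFS = -23.0  # EBU R128 integrated loudness target

def integrated_loudness(samples):
    """Rough programme loudness in dB relative to full scale.

    Real R128 measurement applies K-weighting and gating (ITU-R
    BS.1770); this unweighted version only illustrates the
    'average over the whole programme' idea.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return -0.691 + 10.0 * math.log10(mean_square)

def gain_to_target(samples, target=TARGET_LUFS):
    """dB of gain needed for the programme to average at the target."""
    return target - integrated_loudness(samples)
```

Note the implication the panel describes: the gain is applied to the whole mix, so a compliant programme can still have quiet dialogue under loud music - the average says nothing about the internal balance.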
In short, workflow-wise, I start off by setting the dialogue level and then balancing the ambiences around this, then the sound effects and music; I use the dialogue as a point of reference for the other elements. I then use VCAs to make overall adjustments to the different elements if need be and to ensure R128 compliance (or whatever the deliverables need to be).
However, if lines of dialogue were mumbled on set, it will not matter how much the levels are adjusted. If you make an adjustment and turn the dialogue levels up, you are merely making mumbling louder. The intelligibility does not increase, just the level.
Why might dialogue be flagged as unusable?
Simply not being clear enough either due to a misread from the actor or too much background noise from the set. It would normally be flagged by the picture editors at an early stage if something was particularly bad, to check if it was “fit for broadcast”. Once the picture has been locked we may get given some notes to look at certain scenes if they are worried. Or you don’t get any notes - yet you get a niggling feeling about a section and bring it up with the producer.
My experience is generally in TV, so unusable means way too much background noise, distortion, dropout or off-mic dialogue, though sometimes it is 'allowed' with subtitles (in documentary, for example) but that’s obviously no good in drama etc.
Typically, if there is some undesirable or extraneous noise which cannot be removed, then that could deem it unusable (likewise, this would probably deem any recording unusable, not just dialogue), or sometimes an actor may just fluff a few lines which go unnoticed at the time. Quite often these can be repaired or disguised with various plugins or alternative takes, but failing that, or if the dialogue is simply just indistinct - replacement.
There are many reasons dialogue could be unusable, not just mumbling actors.
The dialogue could be 'off-mic’ - when the boom is too far away or swung round too late, resulting in a swell of clarity or closeness on the mic. If I came across this in my dialogue edit, I would then look at the radio mics (although not all productions have radio mics due to time or money constraints). The radio mics themselves could be unusable if the clothes movement (rustling) is too prominent and not removable by us dialogue editors. The same goes for mic bumps; most are removable but you never know!
There could also be digital dropout in the mics, but you'd hope these takes wouldn't make it into the final cut! I recently had dialogue that was completely distorted on my timeline. I managed to fix the majority with RX4 Advanced Declipper (lifesaver!) but had to get a few actors in for ADR. The source of the problem wasn't identified, but it could be that the production mixer's compression/limiter settings were not adapted for each scene, or that the actors were too close to the mics, for example. The background noise could be too loud and interfering; again, we dialogue editors may de-noise these and still not be able to make them sound good! The actors could be talking over each other, where one actor's line is fine but the other's is mumbled or has the above problems, so you'd have to replace both actors' lines.
In terms of identifying it, if it's not picked up on set (not everyone has a proper sound crew because of low budgets, and not all directors wear headphones) then you'd hope that the director or editor will notice the issues in the picture edit before it gets to us in audio post. We dialogue editors will mark up ADR in our sessions as we go through, noting when and where we hear issues that will need re-recording, and add them to the ADR list. It helps the production team if the director makes a note of possible ADR lines ahead of our edit, however, so they have a vague idea which actors will need to be booked and potentially for how much work. Directors of course have the power to say no to our replacement suggestions (if they like the performance, for example) but as long as we point out the issue, that's all we can do!
Dialogue is deemed unusable if it's hard to understand, or has a technical issue such as mic scratches, distortion or loud background noise.
And how would you go about fixing or replacing it?
In the dialogue editing stage we can try swapping in other microphones, sometimes other takes if the actors are consistent with their delivery - I use this technique in EastEnders regularly for example, as the actors on there say their lines almost identically from take to take (apart from Danny Dyer, he ad-libs a lot of cockney slang!)
We also do a lot of de-noising which is removing hums, whistles, buzzes, general loud background sounds, clicks, pops, low end bumps, nasal junk etc. The mixer can also EQ some of the problems out - bringing up the mid-high frequencies, and/or pulling down the mid-low end can make the dialogue 'brighter' and pierce through the mix more.
I have already mentioned that music and sound effects being too loud can make the dialogue less audible, but actually sometimes the imperfections can be hidden or masked by these (that's less about the dialogue being clear I guess, but about giving the impression that it's clean).
If none of this works, then we will hold an ADR session. ADR can be expensive; you pay for both the studio and the actors, so that is another reason directors or producers sometimes don't like doing it. Of course, sometimes the ADR will not match the visuals exactly and it could end up being noticeable in the final mix, but if done right it can be a godsend! Over 90% of the Lord of the Rings trilogy was ADR'd for various reasons, including the buzzing of all the green screen equipment and set!
Some of the above can be 'cleaned' with careful editing and the use of plugins such as RX, Cedar and the Waves X bundle. However, all of these (if overused) can create further problems such as aliasing and a loss in clarity. The solution then is ADR; this is a tricky skill for both actor and engineer to get right.
There are lots of noise reducing plugins that we use to clear up background noise, which can often make the dialogue a lot clearer, but if something is really unusable for whatever reason, you might be able to use another take if there is one and if the lips aren’t in vision. Failing that, you would need to get the actors back in to do some ADR.
It depends on what the problem with the dialogue is. If it is something like extraneous background noise, pops, clicks etc. then that's (usually) fairly easily dealt with using RX or some kind of restoration software. De-reverberation software can also work pretty effectively if the production sound has picked up a bit too much reverb. Exercise caution with these things though as you can start to introduce some undesirable sonic artefacts.
You can help increase the clarity sometimes with some EQing; using a notch filter or some attenuation where there is some resonance, typically in the low mid range, and you could also boost higher up in the spectrum to create some 'presence.' Again, use caution as you can easily ruin a good recording by over processing.
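As an illustration of the EQ moves described above, here is a sketch of a standard biquad peaking filter, with coefficients per the widely used RBJ Audio EQ Cookbook; the specific frequencies and gains are hypothetical examples, not a recipe:

```python
import math

def peaking_eq_coeffs(f0, gain_db, q, rate=48000):
    """Peaking EQ biquad coefficients (RBJ Audio EQ Cookbook).

    Positive gain_db boosts around f0; negative gain_db cuts
    (acting as the 'notch'-style attenuation mentioned above).
    """
    a = 10 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / rate
    alpha = math.sin(w0) / (2.0 * q)
    b0, b1, b2 = 1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a
    a0, a1, a2 = 1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a
    return [c / a0 for c in (b0, b1, b2, a1, a2)]

def biquad(samples, coeffs):
    """Apply a biquad filter (direct form I) to a list of samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

# Hypothetical settings in the spirit of the advice above:
presence = peaking_eq_coeffs(3000, 4.0, 1.0)  # boost consonant 'presence'
de_mud = peaking_eq_coeffs(300, -3.0, 1.0)    # attenuate low-mid resonance
```

Applying `biquad` twice, once with each set of coefficients, gives a crude version of the brighter, clearer dialogue shape being described - and, as both panellists warn, pushing these gains further quickly ruins a good recording.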
If something is to be replaced, then ADR. It's expensive though in terms of time and money. Lip sync is difficult, especially if you are working with actors who are fairly new to ADR, in which case I try to have the talent focus on matching the performance more than lip sync (obviously focus on the lip sync too!) or use a 'hear and repeat' technique; different approaches work for different actors. If the sync is slightly out, some editing is necessary which would involve syncing or nudging the (obvious) consonants, some precise cutting and fading, or using (sparingly!) elastic audio to fix any perceivable errors. This kind of editing takes time... a lot of time... And still might not work.
ADR also sounds different to the production sound because you are in a nicely treated vocal booth which sounds very different to the location, you probably have a different microphone, and you also need to think about the distance of the microphone from the talent so some extra processing and backfill is needed to achieve aesthetic consistency.
We fix dialogue using noise reduction plugins such as iZotope RX5 (a lot) and masking edits using room tone etc. If it's still no good it can be ADR'd but this is better done 'in the field' to capture a similar tonal or recording feel.
And finally, what could be done to improve sound quality and, more specifically, the clarity of dialogue?
Allowing more time and money for things to be set up properly in the first place on set. Often production sound budgets are cut or passed on to someone who might have some kit but not the know-how. If the correct attention is paid to sound in the early stages then so many problems can be avoided later on.
Sound seems to always be an afterthought for productions; the sound crew are often the first to be lost with budget cuts, and the boom ops have to move out of the way to fit lights in or to get a wider shot. We would all love it to be the other way round, like on House of Cards, where they would rather capture good sound and remove the booms in post production (a process called painting).
Hundreds of thousands of pounds will be spent on visual effects and then there is no money left to add a day or two of ADR. I think directors and producers should be more aware of sound at the early stages of productions, and always have experienced boom ops and production mixers! A runner holding a microphone is not the same!
I think actors should split the difference between making their performance seem real and projecting their lines so they are intelligible. Mixers also need to be aware of their audience (age could make a difference to audibility), what they expect for that genre, what technology it will be watched on, and in what environment.
I would like to see some sort of "dial norm" introduced, whereby the dialogue loudness is taken more into account than the overall loudness.
Good sound starts at the source so deliver the dialogue with clarity and allow for the sound recordist to capture a good recording; chances are the dialogue will sound really good upon transmission.
Object based broadcasting may show some potential whereby the dialogue, music and effects are broadcast as separate objects to the visuals (rather than the audio and visuals embedded together as a single piece of media) which then allows the end user to create their own balance to suit their environment. I like the idea, or at least the sentiment, but I'm not entirely sure people will want more buttons to press or UIs to navigate at home.
Get a good sound system and headphones, too. As I mentioned earlier, they might not fix all of the issues we’ve discussed but good playback systems are just so much more enjoyable to listen to and really do improve the experience of watching stuff at home!
Better recording technique - employ a sound recordist. I’ve worked on excellent sound shot in the middle of the Moroccan desert and rubbish sound shot in a hotel room! Better end user technology (TVs, speakers etc.). The person in the middle (me!) can only do so much...
Once again, thank you very much for your contribution!