Apple’s HEVC choice: Codec battle 2018?

Wednesday, 21 June 2017

What does Apple’s choice of HEVC (H.265) mean for developers, users, viewers and streamers? Jan Ozer writes that it will take a year or so to find out. His predictions include:

No major publishers implement HEVC/HLS support before 3-6 months after iOS 11/macOS High Sierra ship. This leaves the door open for a full codec analysis between AV1 and HEVC, including encode and decode requirements, hardware support, cost, IP risk, HDR support, software support, the whole nine yards. At least in the US and Europe, one of these codecs will be codec next.

Marketing hype is global, codecs are local. Premium content distributors around the world will choose the best codec for their markets. In second and third world markets, iPhones play a very small role, and there will be plenty of low-cost Android phones, and perhaps even tablets and computers, without HEVC hardware support. In these environments, VP9/AV1 or another codec (PERSEUS?) might be best.

Frame.io Enterprise - online team edit reviews for enterprises

Tuesday, 20 June 2017

Today Frame.io announced that their online video production team collaboration system now has features that are useful for larger organisations:

Enterprise offers everything large companies need to manage their creative process at scale. Admins can organize teams by department, brand, production or whatever best suits your company structure.

With this organization teams can work in their own workspaces much like they do with Frame.io today. Admins can control team access and visibility and manage thresholds for team size and resource allocations all from a single platform.

Interesting news for Final Cut Pro X users who need to share edits and notes with other team members online.

Frame.io is an edit review system. Editors can share edits and rushes with others online.

Non-editors review edits in a web browser and can access media used in the edit and selected unused media. They can review edits and make notes at specific times in the edit. They can also make drawings that other team members can see. Useful when planning new shots or briefing changes that need to be made using VFX. Team members can even compare edits with side-by-side version control.

Editors can then import these notes as markers with comments so they can see the exact point in the edit the note is associated with.

Media companies are the beginning

Interesting that Frame.io chose the 'Enterprise' suffix for this new service. The announcement may say that Vice, Turner Broadcasting Systems and BuzzFeed are already using Frame.io Enterprise, but media companies should be just the tip of the video collaboration iceberg. The very features described in the press release seem more suited to non-media companies and organisations.

Although desktop video has been around for over 20 years, it hasn't yet properly broken into the world of work as a peer of the report (word processing), the financial document (spreadsheet) and the presentation (slides). Microsoft and Adobe never got video production - or at least editing - into most offices. Now that everyone has a video camera in their pocket, it is time for someone to make this happen. Online or network collaboration will help.

Trojan Horse for Final Cut Pro X

At this point the Final Cut Pro X angle becomes relevant. Although frame.io integrates very well into the Adobe Premiere and Adobe After Effects user interfaces, those applications aren't big-business friendly. Due to their history, their metaphors are for editors and motion graphics designers. The multiplicity of windows, panels and preferences is the kind of feature that experienced editors and animators like. It looks pretty threatening to people with other jobs. Final Cut Pro X is the application that can be used by people who need to get an edit done, or make last-minute changes based on some notes entered into frame.io by the CEO on her iPhone.

The question for the Final Cut ecosystem is whether a future version of X will allow the kind of third-party integration that makes the notes review process for frame.io in Adobe Premiere so much better than it is in Final Cut Pro X.

HDR production: Five concepts, 10 principles

Tuesday, 20 June 2017

It is likely that the next major versions of common NLEs will support HDR. As editors we will be asked about the right HDR workflow. For now it is a matter of picking a standard, following some guidelines and maintaining metadata.

Jan Ozer writes:

HDR sounds complex, and at a technical level it is. Abstractly, however, it involves just five simple concepts.

First, to acquire the expanded brightness and color palette needed for HDR display, you have to capture and maintain your video in 10-bit or higher formats. Second, you’ll need to color grade your video to fully use the expanded palette. Third, you’ll have to choose and support one or more HDR technologies to reach the broadest number of viewers. Fourth, for several of these technologies, you’ll need to manage color and other metadata through the production workflow to optimize display on your endpoints. Finally, although you’ll be using the same codecs and adaptive bitrate (ABR) formats as before, you’ll have to change a few encoding settings to ensure compatibility with your selected HDR TVs and other devices.
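
To make the first and fourth of those concepts concrete, here is a minimal sketch of what requesting HEVC output with wide-gamut colour metadata might look like using AVFoundation on macOS High Sierra / iOS 11. The keys and constants are real AVFoundation names, but treat the values as placeholders: the right primaries, transfer function, bit depth and bitrate depend on the HDR technology you are targeting.

```swift
import AVFoundation

// Sketch only: output settings for an AVAssetWriter video input that asks for
// HEVC with Rec. 2020 primaries and the PQ (SMPTE ST 2084) transfer function.
// Bitrate, bit depth and mastering metadata depend on your HDR target.
let hdrOutputSettings: [String: Any] = [
    AVVideoCodecKey: AVVideoCodecType.hevc,
    AVVideoWidthKey: 3840,
    AVVideoHeightKey: 2160,
    AVVideoColorPropertiesKey: [
        AVVideoColorPrimariesKey: AVVideoColorPrimaries_ITU_R_2020,
        AVVideoTransferFunctionKey: AVVideoTransferFunction_SMPTE_ST_2084_PQ,
        AVVideoYCbCrMatrixKey: AVVideoYCbCrMatrix_ITU_R_2020
    ]
]

let writerInput = AVAssetWriterInput(mediaType: .video,
                                     outputSettings: hdrOutputSettings)
```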

Jan is a great commentator on streaming technologies - read his HDR production workflow guide at StreamingMedia.com

What happens to cross-platform post applications when OSes are less equal?

Monday, 19 June 2017

Steven Sinofsky runs down his take on Apple's 2017 Worldwide Developer Conference announcements in a Medium post. He writes that Apple’s announcements on Machine Learning…

further the gap between device OS platforms (not just features, but how apps are structured) while significantly advancing the state of the art.

On how the iPad will soon be the best solution for day-to-day productivity:

Developers take note, as iPad-specific apps will become increasingly important in productivity categories.

In case you think Steven Sinofsky is an Apple-only commentator who believes they can do no wrong, note that he spent years competing with Apple at Microsoft. He started there in 1989, went on to run the team developing cross-platform technologies for Microsoft Office in the 90s, and ended up as President of the Windows Division in 2009.

For the past 20 years it has been assumed that the Mac and Windows operating systems will have roughly the same features. Features for users to use every day and features for post production application developers to take advantage of.

Where there are gaps or differences in implementation, developers create solutions that work on both sides. Macromedia (in the 1990s) and Adobe created their own cross-platform media control layer to even out the abilities of the Windows and Mac operating systems. Companies that developed their code on Linux workstations had to implement many features not available in the operating system.

Allow me some Apple fanboy-ism: What if one OS pulls ahead? Can Adobe take advantage of new macOS abilities and add the required code to their applications so those features are also available on Windows? Will Blackmagic Design limit the features of DaVinci Resolve to those it can implement on Linux, Windows and macOS?

As Steven says, it is about application structure as well as operating system features. Can Adobe efficiently make Premiere work with macOS in very different ways than it works with Windows? 

Will there be a point where the fact that an application works on multiple operating systems - each benefitting from different hardware ecosystems - is less important than adding features to support evolving post production needs?

That point will come sooner if the ProApps team are able to update Final Cut Pro X, Motion 5 and Logic Pro X to make the most of the new features in Apple operating systems in coming months.

BBC R&D’s IP Studio: Live production of big TV events in a web browser

Wednesday, 07 June 2017

Interested in cloud-based high end post production? Live events and TV shows need live production. A post on the BBC R&D blog explains the challenges of making a system that can do live TV production in a web browser.

They are basing their research on a system called ‘IP Studio’:

…a platform for discovering, connecting and transforming video streams in a generic way, using IP networking – the standard on which pretty much all Internet, office and home networks are based.

No buffering allowed:

It’s unacceptable for everyone watching TV to see a buffering message because the production systems aren’t quick enough.

Production systems (and their IP networks) must be able to handle 4K streams - even if final broadcast is viewed at lower resolution:

We’re not just transmitting a finished, pre-prepared video, but all the components from which to make one: multiple cameras, multiple audio feeds, still images, pre-recorded video. Everything you need to create the finished live product. This means that to deliver a final product you might need ten times as much source material – which is well beyond the capabilities of any existing systems.

The trick is dealing with time. All the varying delays from hardware and software have to be synchronised.

IP Studio is therefore based on “flows” comprising “grains”. Each grain has a quantum of payload (for example a video frame) and timing information. The timing information allows multiple flows to be combined into a final output where everything happens appropriately in synchronisation. This might sound easy but is fiendishly difficult – some flows will arrive later than others, so systems need to hold back some of them until everything is running to time.
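
To get my head around the "flows of grains" idea, here is a hypothetical sketch in Swift. The type names and the hold-back logic are mine, not IP Studio's; the point is simply that every payload carries a timestamp and nothing is emitted until every flow has caught up to that time.

```swift
import Foundation

// Hypothetical model of flows and grains (my naming, not BBC R&D's).
struct Grain {
    let payload: Data           // e.g. one encoded video frame or audio chunk
    let timestamp: TimeInterval // when this grain should be presented
}

struct Flow {
    let identifier: String      // e.g. "camera-1-video"
    var grains: [Grain]
}

// Emit only the grains every flow has already reached: hold back early
// arrivals until the slowest flow catches up.
func grainsReadyForOutput(in flows: [Flow]) -> [Grain] {
    let latestTimes = flows.compactMap { $0.grains.last?.timestamp }
    guard let slowest = latestTimes.min() else { return [] }
    return flows.flatMap { flow in
        flow.grains.filter { $0.timestamp <= slowest }
    }
}
```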

The production setup has to be able to deal with all this data, so browser-based switching and mixing software has to be tuned to fit the PC/tablet/phone it is running on and the servers it interacts with:

…we are showing lower resolution 480p streams in the browser, while sending the edit decisions up to the output rendering systems which will process the 4k streams, before finally reducing them to 1080p for broadcast.

Find out more at the BBC R&D blog.

Notes on Apple HEVC and HEIF from WWDC17

Wednesday, 07 June 2017

Apple are standardising on the next generation HEVC codec for video and image encoding, decoding and playback. HEVC (H.265) is a much better codec for dealing with video resolutions greater than HD. Here are my notes from Apple’s 2017 Worldwide Developers Conference sessions so far this week.

Here’s what Apple said about HEIF in the Platforms State of the Union address (from 1:08:07):

We've also selected a new image container called HEIF… HEIF supports the concept of compound assets. In a single file you can have one or more photos or one or more images, you can have videos, you can have auxiliary data such as alpha and depth. It is also highly extensible: It supports rich metadata, animations and sequences, and other media types such as audio. HEIF is an ISO standard, which is critical for ecosystem adoption.

The pictures shown on screen during this section show how flexible a HEIF container can be.

A moment in time can be made up of multiple shots taken by cameras at the same time - such as the two in an iPhone 7 Plus. It can also have computed content, such as the depth map derived from the two images:

HEIF documents can also include multiple timelines of stills, video, metadata and data that structures all these things together:
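
As a first taste of the new container, here is a minimal hedged sketch of writing a single still image into a HEIF file using ImageIO and the new AVFileType.heic identifier. The CGImage and output path are placeholders, and encoding will only succeed on systems that can encode HEVC.

```swift
import AVFoundation
import ImageIO

// Sketch: write one CGImage into a .heic container on iOS 11 / macOS High Sierra.
// Returns false on systems that cannot encode HEVC.
func writeHEIF(_ image: CGImage, to url: URL) -> Bool {
    guard let destination = CGImageDestinationCreateWithURL(url as CFURL,
                                                            AVFileType.heic.rawValue as CFString,
                                                            1,    // one image in this container
                                                            nil) else {
        return false
    }
    CGImageDestinationAddImage(destination, image, nil)
    return CGImageDestinationFinalize(destination)
}
```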

I watched the first WWDC17 session on HEVC and HEIF. Here are my live tweets:

Here are some frames from the presentation.

The nature of HEVC .mov files: each frame is an HEVC-encoded image, with both 8-bit and 10-bit encoding supported.

These devices will be able to decode HEVC movies. They may not be fast enough to play them back in real time. That might require a transcode to H.264.

Only some iOS and macOS devices will have HEVC hardware encode support, but all Macs that run macOS Sierra today will be able to encode in software.
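
In practice, the simplest way to try the new codec from code looks to be the new HEVC presets on AVAssetExportSession. A hedged sketch (file paths are placeholders; on Macs without hardware support this will fall back to software encoding and take longer):

```swift
import AVFoundation

// Sketch: transcode an existing movie to HEVC with the new export preset.
let source = AVAsset(url: URL(fileURLWithPath: "/path/to/source.mov"))

if let export = AVAssetExportSession(asset: source,
                                     presetName: AVAssetExportPresetHEVCHighestQuality) {
    export.outputURL = URL(fileURLWithPath: "/path/to/output-hevc.mov")
    export.outputFileType = .mov
    export.exportAsynchronously {
        switch export.status {
        case .completed:
            print("HEVC export finished")
        default:
            print("HEVC export did not complete: \(String(describing: export.error))")
        }
    }
}
```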

More on the advantages of HEIF:

 

AV Foundation for HEIC capture

The AV Foundation Camera and Media Capture subsystem provides a common high-level architecture for video, photo, and audio capture services in iOS and macOS.

New:

Class AVCaptureDepthDataOutput ‘A capture output that records scene depth information on compatible camera devices.’

Class AVDepthData ‘A container for per-pixel distance or disparity information captured by compatible camera devices.’

It has been extended to deal with ‘Synchronised Capture’ - for metadata as well as depth maps.

Superclass: AVCaptureSynchronizedData ‘The abstract superclass for media samples collected using synchronized capture.’

Class AVCaptureDataOutputSynchronizer ‘An object that coordinates time-matched delivery of data from multiple capture outputs.’

Class AVCaptureSynchronizedDataCollection ’A set of data samples from multiple capture outputs collected at the same time.’

Class AVCaptureSynchronizedDepthData ’A container for scene depth information collected using synchronized capture.’

Class AVCaptureSynchronizedMetadataObjectData ’A container for metadata objects collected using synchronized capture.’

Class AVCaptureSynchronizedSampleBufferData ’A container for video or audio samples collected using synchronized capture.’
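
Here is a hedged sketch of how these classes appear to fit together: one delegate callback delivers a video frame and the depth map captured at the same moment. Capture session and device configuration are omitted, and the names on my receiver class are my own.

```swift
import AVFoundation

// Sketch: receive time-matched video frames and depth maps on iOS 11.
class DepthSyncReceiver: NSObject, AVCaptureDataOutputSynchronizerDelegate {
    let videoOutput = AVCaptureVideoDataOutput()
    let depthOutput = AVCaptureDepthDataOutput()
    var synchronizer: AVCaptureDataOutputSynchronizer?

    func startSynchronizing(on queue: DispatchQueue) {
        // Both outputs are assumed to already be added to a configured session.
        let sync = AVCaptureDataOutputSynchronizer(dataOutputs: [videoOutput, depthOutput])
        sync.setDelegate(self, queue: queue)
        synchronizer = sync
    }

    func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer,
                                didOutput collection: AVCaptureSynchronizedDataCollection) {
        // The time-matched video frame...
        if let video = collection.synchronizedData(for: videoOutput)
            as? AVCaptureSynchronizedSampleBufferData {
            _ = video.sampleBuffer
        }
        // ...and the depth map captured at the same moment.
        if let depth = collection.synchronizedData(for: depthOutput)
            as? AVCaptureSynchronizedDepthData {
            _ = depth.depthData // AVDepthData: per-pixel depth or disparity
        }
    }
}
```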

A last thought from me:

iPhone 7 Plus cameras capture depth maps. iOS 11 can store them in HEVC .mov files. Camera manufacturers had better step up!

Using Adjustment Layers as coloured scene markers in Final Cut Pro X

Tuesday, 06 June 2017

Here’s an interesting use of my Alex4D Adjustment Layer to label scenes using different colours:

Will uses the new coloured Roles features introduced in Final Cut Pro X 10.3.

Download Alex4D Adjustment Layer from my old free Final Cut Pro X plugins site.

More on assigning roles to clips and changing the names and colours of roles

Apple WWDC17 post-production and VR sessions

Monday, 05 June 2017

Here are the sessions worth tuning into this week for those interested in what Apple plans for post-production and VR. You can watch these streams live or review the video, slides and transcripts in the weeks and months to come.

Interesting sessions include ones on

  • Vision API to detect faces, compute facial landmarks, track objects, and more. It can recognise and track faces, elements of faces, rectangles, barcodes, QR codes and other common shapes. The Vision API can be combined with machine learning models to recognise new objects. For example, if I buy a machine learning model that recognises car number plates (license plates) or even whole cars, that can be fed into the Vision API so that those things can be recognised in stills and footage, and also be tracked (see the sketch after this list).
  • Depth: In iOS 11, iPhone 7 Plus camera depth data is now available to iOS apps - both for stills and as a continuous low-resolution stream to go with video. This means iOS video filters will be able to replace the backgrounds of stills and videos, or apply filters to objects in the middle distance without affecting the background or foreground.
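
As promised above, a minimal sketch of the Vision face detection call. The image URL is a placeholder; video frames would go through the same request handler frame by frame (or via Vision's tracking requests).

```swift
import Vision

// Sketch: detect face rectangles in a still image with the new Vision API.
let faceRequest = VNDetectFaceRectanglesRequest { request, error in
    guard let faces = request.results as? [VNFaceObservation] else { return }
    for face in faces {
        // boundingBox is normalised (0-1) relative to the image.
        print("Face at \(face.boundingBox)")
    }
}

let handler = VNImageRequestHandler(url: URL(fileURLWithPath: "/path/to/still.jpg"),
                                    options: [:])
try? handler.perform([faceRequest])
```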

Monday

Keynote

Apple covered all the announcements of interest to the media and general public, including a high-end focus on post production, VR, and AR on iOS.

Platforms State of the Union

Apple went into more detail on the updates to macOS, iOS, tvOS and watchOS. Video and PDF of the presentation are now available.

Tuesday

What's New in Audio

1:50 PM (PDT)

Apple platforms provide a comprehensive set of audio frameworks that are essential to creating powerful audio solutions and rich app experiences. Come learn about enhancements to AVAudioEngine, support for high-order ambisonics, and new capabilities for background audio recording on watchOS. See how to take advantage of these new audio technologies and APIs in this session.

Introducing Metal 2

1:50 PM (PDT)

Metal 2 provides near-direct access to the graphics processor (GPU), enabling your apps and games to realize their full graphics and compute potential. Dive into the breakthrough features of Metal 2 that empower the GPU to take control over key aspects of the rendering pipeline. Check out how Metal 2 enables essential tasks to be specified on-the-fly by the GPU, opening up new efficiencies for advanced rendering.

Introducing HEIF and HEVC

4:10 PM (PDT)

High Efficiency Image File Format (HEIF) and High Efficiency Video Coding (HEVC) are powerful new standards-based technologies for storing and delivering images and audiovisual media. Get introduced to these next generation space-saving codecs and their associated container formats. Learn how to work with them across Apple platforms and how you can take advantage of them in your own apps.

Advances in HTTP Live Streaming

5:10 PM (PDT)

HTTP Live Streaming allows you to stream live and on-demand content to global audiences. Learn about great new features and enhancements to HTTP Live Streaming. Highlights include support for HEVC, playlist metavariables, IMSC1 subtitles, and synchronized playback of multiple streams. Discover how to simplify your FairPlay key handling with the new AVContentKeySession API, and take advantage of enhancements to offline HLS playback.

Introducing ARKit: Augmented Reality for iOS

5:10 PM (PDT)

ARKit provides a cutting-edge platform for developing augmented reality (AR) apps for iPhone and iPad. Get introduced to the ARKit framework and learn about harnessing its powerful capabilities for positional tracking and scene understanding. Tap into its seamless integration with SceneKit and SpriteKit, and understand how to take direct control over rendering with Metal 2.

Wednesday

VR with Metal 2

10:00 AM (PDT)

Metal 2 provides powerful and specialized support for Virtual Reality (VR) rendering and external GPUs. Get details about adopting these emerging technologies within your Metal 2-based apps and games on macOS High Sierra. Walk through integrating Metal 2 with the SteamVR SDK and learn about efficiently rendering to a VR headset. Understand how external GPUs take macOS graphics to a whole new level and see how to prepare your apps to take advantage of their full potential.

SceneKit: What's New

11:00 AM (PDT)

SceneKit is a fast and fully featured high-level 3D graphics framework that enables your apps and games to create immersive scenes and effects. See the latest advances in camera control and effects for simulating real camera optics including bokeh and motion blur. Learn about surface subdivision and tessellation to create smooth-looking surfaces right on the GPU starting from a coarser mesh. Check out new integration with ARKit and workflow improvements enabled by the Xcode Scene Editor.

What's New in Photos APIs

1:50 PM (PDT)

Learn all about newest APIs in Photos on iOS and macOS, providing better integration and new possibilities for your app. We'll discuss simplifications to accessing the Photos library through UIImagePickerController, explore additions to PhotoKit to support new media types, and share all the details of the new Photos Project Extensions which enable you to bring photo services to Photos for Mac.

Vision Framework: Building on Core ML

3:10 PM (PDT)

Vision is a new, powerful, and easy-to-use framework that provides solutions to computer vision challenges through a consistent interface. Understand how to use the Vision API to detect faces, compute facial landmarks, track objects, and more. Learn how to take things even further by providing custom machine learning models for Vision tasks using CoreML.

Capturing Depth in iPhone Photography

5:10 PM (PDT)

Portrait mode on iPhone 7 Plus showcases the power of depth in photography. In iOS 11, the depth data that drives this feature is now available to your apps. Learn how to use depth to open up new possibilities for creative imaging. Gain a broader understanding of high-level depth concepts and learn how to capture both streaming and still image depth data from the camera.

Thursday

SceneKit in Swift Playgrounds

9:00 AM (PDT)

Discover tips and tricks gleaned by the Swift Playgrounds Content team for working more effectively with SceneKit on a visually rich app. Learn how to integrate animation, optimize rendering performance, design for accessibility, add visual polish, and understand strategies for creating an effective workflow with 3D assets.

Image Editing with Depth

11:00 AM (PDT)

When using Portrait mode, depth data is now embedded in photos captured on iPhone 7 Plus. In this second session on depth, see which key APIs allow you to leverage this data in your app. Learn how to process images that include depth and preserve the data when manipulating the image. Get inspired to add creative new effects to your app and enable your users to do amazing things with their photos.

Advances in Core Image: Filters, Metal, Vision, and More

1:50 PM (PDT)

Get all the details on how to access the latest capabilities of Core Image. Learn about new ways to efficiently render images and create custom CIKernels in the Metal Shading Language. Find out about all of the new CIFilters that include support for applying image processing to depth data and handling barcodes. See how the Vision framework can be leveraged within Core Image to do amazing things.

Friday

Working with HEIF and HEVC

11:00 AM (PDT)

High Efficiency Image File Format (HEIF) and High Efficiency Video Coding (HEVC) are powerful new standards-based technologies for storing and delivering images and video. Gain insights about how to take advantage of these next generation formats and dive deeper into the APIs that allow you to fully harness them in your apps.

Apple courts high-end post production with macOS High Sierra and new Macs

Monday, 05 June 2017

Today’s Apple Mac announcements were prefaced by saying that the next version of macOS will not be about adding new features, but about improving current technologies and adding new underlying ones for future versions. Despite that, their software and hardware announcements seemed to be going after high-end media producers.

The new version of macOS (High Sierra) will include support for H.265 (High Efficiency Video Coding). This delivers the same HD quality at up to 40% lower data rates, and better quality at much higher resolutions: 4K, 6K and 8K. Although there isn't yet much call for normal 4K video playback (3840x2160), 4K (3840x1920) is the minimum resolution for good quality VR video playback.

Support will be available for encoding as well as playback on all Macs that can run macOS High Sierra (all the Macs that can currently run macOS Sierra). On more powerful Macs, hardware H.265 encoding will also be supported. Final Cut Pro X and other pro applications were specifically mentioned as gaining H.265 support in future versions.

Apple chooses between Oculus Rift and HTC Vive

OS support for high-end graphics will improve with Metal 2, which now explicitly supports GPU cards installed in external GPU boxes.

The new Apple External Graphics Development Kit seems to bless the HTC Vive as the VR headset of choice for now. The kit includes a ‘Promo code for $100 towards the purchase of HTC Vive VR headset’ - a hint as to the kind of VR device Apple might be interested in making soon: room-scale VR. It also includes Sonnet’s new eGFX Breakaway Box and an AMD Radeon RX 580 8GB GPU.

Apple also announced that Metal 2 is designed to support high-end development for all kinds of VR on the Mac:

  • Unity engine
  • Unreal engine

Unreal is used by many VR experience vendors. The keynote included a demo from ILMxLAB featuring VR experience authoring running on a new iMac.

Final Cut Pro X got a second mention in the context of being able to edit 360º VR video without plugins.

High-end video features of iOS 11

Apple also announced sessions covering features in iOS 11 that will be of interest to those doing post production.

In iOS 11, apps can get access to depth map information from the iPhone 7 Plus camera system. That means applications will be able to build 3D models of what is captured. The API gives video capture apps the ability to capture a stream of depth data alongside video information. Very useful for compositing CG graphics into scenes, so that imaginary objects can be drawn further away, behind real-life objects and people closer to the camera.
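
A hedged sketch of the photo side of that API: asking AVCapturePhotoOutput for depth data alongside a capture on a dual-camera device. Session setup and the delegate implementation are omitted; the resulting AVCapturePhoto carries an AVDepthData in its depthData property.

```swift
import AVFoundation

// Sketch: request depth delivery with a photo capture on iOS 11.
// Assumes photoOutput is already attached to a running session on an
// iPhone 7 Plus (or other dual-camera device).
func captureWithDepth(using photoOutput: AVCapturePhotoOutput,
                      delegate: AVCapturePhotoCaptureDelegate) {
    guard photoOutput.isDepthDataDeliverySupported else { return }
    photoOutput.isDepthDataDeliveryEnabled = true

    let settings = AVCapturePhotoSettings()
    settings.isDepthDataDeliveryEnabled = true
    photoOutput.capturePhoto(with: settings, delegate: delegate)
    // In photoOutput(_:didFinishProcessingPhoto:error:), photo.depthData
    // contains the per-pixel AVDepthData captured with the image.
}
```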

iOS 11, macOS High Sierra and tvOS will also have a new Vision framework:

a new, powerful, and easy-to-use framework that provides solutions to computer vision challenges through a consistent interface. Understand how to use the Vision API to detect faces, compute facial landmarks, track objects, and more. Learn how to take things even further by providing custom machine learning models for Vision tasks using CoreML.

CoreML is the new Machine Learning part of Apple’s OSes.

New MacBooks and iMacs available today

High-end pros are also being courted through new hardware today and the promise of new hardware tomorrow. Today sees improvements in speed and configurations for MacBooks, MacBook Pros and iMacs.

iMac

  • Faster Kaby Lake processors 2.3/4.5 GHz
  • SSD storage twice as fast
  • Thunderbolt 3
  • 50% faster Radeon Pro 500-series graphics 
  • 43% brighter 500-nit display

MacBook Pro

  • Faster Kaby Lake processors 3.1/4.1 GHz
  • More RAM in discrete graphics

Press information on Mac updates

iMac Pro in December

Earlier this year Apple said that they are working on a new Mac Pro with more power and more modularity. Today Apple made their task that much harder by previewing a new iMac Pro that is much more powerful than the current Mac Pro.

  • Up to 18-core processors
  • 22 Teraflops of GPU performance using Radeon Vega GPUs with 16GB of RAM
  • Up to 4TB of SSD
  • Up to 128GB of ECC RAM
  • 2 Thunderbolt 3 controllers (so two RAID arrays and 5K displays can be connected)
  • 4 Thunderbolt 3 ports
  • 10Gb Ethernet port
  • Available in Space Grey
  • Prices starting at $4,999

Press information on the iMac Pro.

Hardware for Pros

As the new iMac Pro is much more powerful than the current Mac Pro and at lower prices, there is no doubt that Apple is still interested in pros.

Who wouldn’t want to do post-production on the new iMac Pro? It looks like Apple are going directly after companies hoping to get high-end post-production folk to switch from the Mac. Could this be bad news for Dell and HP? Perhaps. Now that Apple have revealed the specs of a future iMac, competitors might be able to make sure that their hardware matches Apple's by December. Apple must be confident though, otherwise they would not have revealed the price. That they did means they probably think their competitors don't have the ability to compete even with 5 months' notice.

VR, Final Cut Pro X and Apple WWDC17

Saturday, 03 June 2017

This week at their Worldwide Developer Conference in San Jose, Apple are making announcements about future products and services. They are also giving presentations for developers about macOS, iOS, tvOS and watchOS.

Over the week I’m hoping to see updates relevant to VR and Final Cut Pro X at WWDC17.

VR

To mix reality (make it seem that graphics are interacting with the real world) it is useful to be able to record the distance of objects from the camera. This means graphics can be seen to be hidden behind objects caught on camera. So I'm hoping for depth map recording by Apple’s device cameras (Apple’s patent on Depth mapping based on pattern matching and stereoscopic information) - as well as the iPhone, this would be useful for iPads, Macs and Apple TVs.

If Apple want to help applications deal with mixed reality, it would be very useful if there was at least one new flavour of ProRes, one that records depth maps too. ProRes 44444 anyone? While we are on the subject, context sensitive video overlays are likely to become more popular in coming years, so it would be very useful if more ProRes flavours included alpha channel information: ProRes 4224 and 4224 Proxy would be a start!

Siri Speaker room sensor. If Apple were to release a high-quality speaker that customises the sound it produces based on its position relative to a screen and the shape of the space it is in, those sensors could also be used for room-scale VR and MR. The speaker could help detect where VR/MR headsets in use are in 3D space.

The problem with phone screens and VR headsets is that the resolution needed for high-quality VR is overkill for a phone, while current iPhone resolution is a little low for high-quality VR. Also, when Mixed Reality becomes popular, you won't be able to see what is in the real world if an iPhone is in the way. However, the GPU in iPhones is very powerful. If the iPhone could be kept in a pocket (to provide relative position in 3D space) while providing GPU power to an external headset via Lightning (or Thunderbolt 3), that headset could be much lighter and need less battery. So look for iPhone GPU power for external devices via Lightning or Thunderbolt.

Final Cut Pro X

As regards Final Cut Pro X, look out for updates to Core Media and AV Foundation in macOS (and iOS if Final Cut Pro X/iMovie is to move to the iPad). Although some say that the ProApps team don’t use developer frameworks, you can see how the OS team and the ProApps team are thinking about media in modern OSes with the Core Media and AV Foundation updates. 2015 AV Foundation presentation, 2016 AV Foundation presentation.

For those interested in cloud-based Apple services for pros, also look out for updates on HLS - HTTP Live Streaming and iCloud. 2016 HLS presentation.

The good news is that Apple will publish videos and transcripts of all the WWDC 2017 presentations. I'll update this post with links to relevant videos as they become available.


The H.265/HEVC state of play

Thursday, 01 June 2017

Apple seem pretty quiet when it comes to blessing the H.265 codec. It is a codec dedicated to better quality for large raster video at lower bandwidths. These kinds of codecs are needed for 4K broadcast and streaming, and are useful for 360º/VR video distribution.

Like DV, HDV and H.264, new codecs are designed to be efficient using hardware that is expected to be commonly available a few years after launch.

That means that H.265 (aka HEVC or ‘High Efficiency Video Coding’) algorithms expect to access the kind of power that isn't mainstream yet. Although today’s commonly available hardware can decode H.265 quickly, encoding is more of a problem. This is especially true of Apple’s currently anaemic Mac hardware.

Another reason is that patents and algorithms mean that the best way of encoding 4K, 6K and 8K video streams hasn't yet been settled on by the industry.

The state of High Efficiency Video Coding codecs has been summarised by Jan Ozer of Streaming Learning Center. A PDF of the presentation he gave at Streaming Media East in May describes the state of play comparing different HEVC implementations, VP9 and Bitmovin AV1.

His conclusions include: 

  • Particularly at lower bitrates, x265, Main Concept (H.265) and VP9 deliver substantially better performance than H.264
  • Both HEVC codecs and VP9 produce very similar performance
  • Choice between x265 and Main Concept (H.265) should be based on factors other than quality
  • AV1 Encoding times are still very inefficient
  • AV1 is at least as good as HEVC now, and will likely be quite a lot better when the specification has been fully decided on - it is still in development

Find out more on Jan’s blog.

The bottom line? Here is Jan commenting on some feedback to a post of his at Streaming Media:

HEVC will do well in broadcast, no doubt. Still not available in any browser, iOS, and Netflix prefers VP9/AV1 over HEVC for Android. VP9 gets you most browsers and many smart TVs and OTT boxes (like Roku 4), so it's the smart money UHD codec if you don't need HDR.

Automated video editing will very soon be ‘good enough’

Tuesday, 30 May 2017

A team of Stanford researchers have published a paper on automatic editing of dialogue scenes. Their system may not edit brilliantly, but it can now create edits that the majority of people will see as ‘good enough.’ This means editors have new competition. As well as needing to be technically proficient and able to handle all sorts of political and psychological situations, they will now have baseline edits to improve on.

Computational Video Editing for Dialogue-Driven Scenes describes a system where different combinations of editing priorities (which kind of shots to favour, which kind of performances to prioritise) are defined as editing idioms. These idioms can then be applied to footage of dialogue scenes when accompanied by a script.

Identify clips

Their system takes a script formatted in an industry-standard way, analyses multiple takes of multiple camera setups and divides ranges of each take into candidate clips. These clips are assigned automatically generated labels (sketched as code after the list below) defining

  • The name of the character speaking the line of script
  • The emotional sentiment of the line of script (ranging from negative to positive via neutral)
  • The number of people in the clip
  • The zoom level of the clip, i.e. the framing of the clip (long, wide, medium, closeup, extreme closeup)
  • Who is in the clip
  • The volume of the audio in the clip
  • The length of the clip (as part of a much longer take - the speed at which a given line is said)
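
As mentioned above, here is a hypothetical sketch of those per-clip labels as a data structure. The names are mine, not the paper's.

```swift
import Foundation

// Hypothetical model of the labels generated for each candidate clip.
enum ShotSize { case long, wide, medium, closeup, extremeCloseup }

struct CandidateClip {
    let speaker: String             // character delivering the scripted line
    let sentiment: Double           // negative ... neutral ... positive
    let peopleInFrame: Int
    let shotSize: ShotSize          // framing / zoom level
    let visibleCharacters: [String] // who is in the clip
    let audioVolume: Double
    let duration: TimeInterval      // how quickly the line is delivered
}
```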

Editing idioms

The researchers then analysed multiple editing idioms (pieces of editing advice) and worked out what combination of clip styles would result in edits that match a given style:

Change zoom gradually: Avoid large changes in zoom level

Emphasize character: Avoid cutting away from an important character during short lines from the other characters. Favor two kinds of transitions: (1) transitions in which the length of both clips is long, and (2) transitions in which one of the clips is short and the important character is in the set of visible speakers for the other clip and both clips are from the same take.

Mirror position: Transition between 1-shots of performers that mirror one another’s horizontal positions on screen.

Peaks and valleys: Encourage close ups when the emotional intensity of lines is high, wide shots when the emotional intensity is low, and medium shots when it is in the middle.

Performance fast: Select the shortest clip for each line

Performance slow: Select the longest clip for each line

Performance loud: Select the loudest clip for each line

Performance quiet: Select the quietest clip for each line

Short lines: Avoid cutting away to a new take on short lines

Zoom consistent: Use a consistent zoom level throughout the scene

Zoom in/out: Either zoom in or zoom out over the scene

Combine idioms to make a custom style

Using an application, the researchers showed how individual idioms (or pieces of editing advice) and specific instructions (‘start on a wide shot’ or ‘keep speaker visible’) can be combined to make an editing style. Each element can be given a weight ranging from ‘always follow instruction’ to ‘always do the opposite of the instruction.’
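
Here is a hypothetical sketch of how weighted idioms might combine into a single editing style: each idiom scores a candidate transition, and the weighted sum decides which candidate the system prefers. This is my illustration of the idea, not the paper's actual formulation.

```swift
// Hypothetical: combining weighted idioms into one editing style.
struct Transition {
    let fromShotSize: Int   // e.g. 0 = extreme closeup ... 4 = long shot
    let toShotSize: Int
    let lineIsShort: Bool
}

struct Idiom {
    let name: String
    let cost: (Transition) -> Double // penalty this idiom assigns to a transition
}

struct EditingStyle {
    // Weights range from +1 ("always follow") to -1 ("always do the opposite").
    var weightedIdioms: [(idiom: Idiom, weight: Double)]

    func cost(of transition: Transition) -> Double {
        return weightedIdioms.reduce(0) { $0 + $1.weight * $1.idiom.cost(transition) }
    }
}

// Example: penalise big jumps in zoom level ("Change zoom gradually").
let changeZoomGradually = Idiom(name: "Change zoom gradually") { transition in
    Double(abs(transition.fromShotSize - transition.toShotSize))
}
```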

This UI mockup shows how an editing style can be built where the elements are ‘start with a wide shot, avoid jump cuts, show speaker’:

 

The paper comes with a demo video that explains the process and gives examples of a scene professionally edited and the same scene automatically edited using different editing styles.

To see more example videos and source footage visit the paper’s site at Stanford.

Time savings

The impetus behind developing the system was to save time, and to save the cost of hiring a professional editor.

For multiple dialogue scenes the researchers timed how long it took for a professional editor to review all the footage and come up with an edited scene. As this method is at the research stage, the kind of analysis that the tools need to do on the video takes a long time. In the case of the scene shown in the demo and in the screenshot, a 27-line scene with 15 takes (of varying shot size and angle) amounting to 18 minutes of rushes took 3 hours and 20 minutes to analyse. The professional editor took 3 hours to come up with an edit.

The advantage came when changes needed to be made in editing style. The automated system could re-edit the scene in 3 seconds. It would take many times longer for an editor to re-edit a scene following new instructions. The analysis stage was done on a 3.1 GHz MacBook Pro with 16GB of RAM. With software and hardware improvements the time it takes to turn multiple takes into labelled clips will reduce significantly.

What does ‘good enough’ mean for editors and post production?

To me this method marks a tipping point. For productions with many hours of rushes, these kinds of automated pre-edits are good enough. Good enough to release (with a few minutes of tidying up) in some cases. Good enough to base production decisions on (such as ‘We can now strike this set’). Good enough that a skilled editor can spend a short time tidying some of the automated edits and preparing it to be shared with the world.

Although the researchers haven't encoded the kind of editing idioms many good editors actually follow, the ones they have chosen will do for many situations. There are two possible reasons for this: the researchers don’t know these practices, or they don't yet have a way to detect the elements of scripts and source footage that editors currently base their personal editing idioms on.

One of the great things about the job of being an editor is that it is hard for others to compare your editing abilities with other editors. Up until now, a person would have to look at all the footage and all the versions of the script for a given production to judge whether the editor got the best possible result. Even then, that judgement would only be one more person’s opinion.

Now an editor’s take can be compared with automated edits like the ones described in this paper. Their style will soon be able to be detected and encoded as an editing style for automated edits. Could I sell a plugin based on my editing idiom? 0.1% of receipts would be a big enough royalty for me!

The good news for editors who are worried about being replaced is that once your skills get to the level of ‘not obviously bad’ - which is the ability to do edits that aren't jarring, that flow from moment to moment and scene to scene - other factors take over: being the kind of person who fits into the wider organisation, the person others can share a small space with for hours on end, a person who can judge the politics and psychology of situations with collaborators at all levels.

Who knows when this kind of technology will be available outside academia? For now it is worth bearing in mind that alongside the three researchers from Stanford University (Mackenzie Leake, Abe Davis and Maneesh Agrawala), the authorship of the paper is shared with Anh Truong of Adobe Research.

Today at Apple: Hours of free classes on Final Cut Pro X

Monday, 29 May 2017

‘Today at Apple’ is a new programme of events and training at Apple locations worldwide. ‘Pro Series Sessions’ is a category of free training for Final Cut Pro X and Logic Pro X. Here is a rundown of free education for those looking to learn about Final Cut Pro X. The sessions are designed so that you run your copy of Final Cut Pro X on a MacBook you bring in. If you don’t yet have Final Cut Pro X, a MacBook Pro with it installed can be provided for each session.

Go to the 'Today at Apple - Pro Series Sessions’ page for US · UK · France to find out when these sessions are available at an Apple Store near you and book your free places. There are other training sessions at Apple stores, visit the ‘Today at Apple’ page, choose your country to find out more.

Intro To Final Cut Pro X

No matter how you plan to use Final Cut Pro, this 90-minute session takes a deep dive into its features. Let us show you how to arrange clips to tell your story, perfect the look of your video and improve audio quality. A MacBook with Final Cut Pro can be provided, or bring your own. Attendees should have a good understanding of movie editing and Mac basics, or be stepping up from iMovie.

Pro Series: Import, Sort and Organise

Film editors know that to be efficient, you need to be organised. Join us for a Final Cut Pro session on workflow. We’ll show you smart settings for video importing, ways to sort your media and how to organise like a pro. Attendees should have a good understanding of movie editing and Mac basics, or be stepping up from iMovie.

Pro Series: Techniques for Storytelling with Final Cut Pro X

Join us as we explore creative storytelling in Final Cut Pro X. You’ll discover how techniques in colour, music and editing can push your narrative forward and captivate your audience. Attendees should have a good understanding of Final Cut Pro X or be stepping up from iMovie.

Pro Series: Refine Your Audio with Final Cut Pro X

Join us as we explore controls and techniques in Final Cut Pro X that allow you to sculpt sound to match your scenes. We’ll explore noise reduction, add effects and music, and mix down to create stunning audio to go with your visuals. Attendees should have a good understanding of Final Cut Pro X or be stepping up from iMovie.

Pro Series: Colour Correction and Grading with Final Cut Pro X

Join us as we explore how colour can make your project visually and emotionally stunning. We’ll use colour correction and grading to balance colours in your movie. Then we’ll explore techniques that emphasise colour in stylistic ways. Attendees should have a good understanding of Final Cut Pro X or be stepping up from iMovie.

Pro Series: Create Studio-quality Titles

Join us and we’ll focus on when and how to use titles and text to set the tone of your movie. You’ll create, alter and add exciting effects to your movie’s text. Then we’ll practise our skills by completing a mini challenge. Attendees should have a good understanding of Final Cut Pro X or be stepping up from iMovie.

If you also want to learn Logic Pro X, there is a 90-minute intro and 60-minute sessions on Looping and Layering, Editing for Emotion plus Mixing and Mastering.

Studio Hours

The Pro Series sessions aren't yet available in many countries, but Apple Stores all over the world offer ‘Studio Hours’ - these are sessions where people who have started or who are about to start a project can work in a store with an Apple Creative nearby. Creatives are there to offer advice and tips on how to design, setup and progress your project. These hours are grouped by topic. As well as video projects, studio hours are also available for music, for photos, for documents, presentations and spreadsheets and for art & design projects.

Apple’s new free year-long course in app development - How about film making next?

Wednesday, 24 May 2017

Today Apple announced a free course that is available for school and university students to learn coding:

Apple today launched a new app development curriculum designed for students who want to pursue careers in the fast-growing app economy. The curriculum is available as a free download today from Apple’s iBooks Store.

App Development with Swift is a full-year course designed by Apple engineers and educators to teach students elements of app design using Swift, one of the world’s most popular programming languages. Students will learn to code and design fully functional apps, gaining critical job skills in software development and information technology.

There is currently an iOS app development gold rush. Stories of individuals making thousands by selling on the iOS app store have captured the mainstream imagination.

In reality - much like the 19th century US gold rushes - only a small proportion of app developers will be able to support themselves on iOS app royalties.

Many who make videos and film believe that storytelling with video - video literacy - is a skill that almost everyone would benefit from having. I think Apple could offer a very similar course based on the tools they make:

Apple today launched a new media development curriculum designed for students who want to accelerate their chances of success through video literacy. The curriculum is available as a free download today from Apple’s iBooks Store.

Telling stories with Apple Applications is a full-year course designed by Apple engineers and educators to teach students the fundamentals of storytelling using Apple’s iOS and macOS applications. iMovie for iOS and macOS is the most widely distributed video editing software. Final Cut Pro X has been bought over 2 million times from the Mac App Store. Students will learn to develop stories using these tools and more - including Clips for iOS, FileMaker Pro and Motion 5 for macOS, gaining critical job skills in all fields.

For now the iBooks store offers a free enhanced book: iMovie for Mac macOS Sierra. It is a full introduction to editing with iMovie - including source video and audio footage. That's a good start. Once this is combined with similar lessons for other Apple apps and applications alongside the theory of how to communicate with video, Apple could change the lives of thousands of students and adults all over the world.

 

Documentary on Apple’s Final Cut Pro X - an echo chamber inside a bubble?

Tuesday, 23 May 2017

Off the Tracks is a forthcoming documentary about the launch and adoption of Final Cut Pro X. The first trailer dropped yesterday.

One of the bubbles that some are in is ‘editing software.’ An echo chamber inside that bubble is ‘Final Cut Pro X fans’ - who are a subset of Final Cut Pro X users. I wonder how many non-X editors will be interested in this film. Have the makers made it appealing enough for non-Final Cut fans to watch? Or non-editors? Maybe trailer 2 will hint at what their take is.

There is a chance that they have included lessons that apply outside the #fcpx echo chamber, outside the editing software bubble. That might attract a wider audience.

PS: Fellow bubble-folk: Here is the transcript of my interview with Randy Ubillos, creator of Adobe Premiere, Final Cut Pro 1.0 and Final Cut Pro X.

New video-related Apple patents - 23rd May 2017

Tuesday, 23 May 2017

Yesterday Apple were awarded patents covering frame rate conversion detection and shot stabilization.  

9,661,261: Video pictures pattern detection

For a video that has been converted from one frame rate and format to another frame rate and format, the application detects the conversion method that has been used in the conversion of the video.

9,661,228: Robust image feature based video stabilization and smoothing

The method matches a group of feature points between each pair of consecutive video frames in the video sequence. The method calculates the motion of each matched feature point between the corresponding pair of consecutive video frames. The method calculates a set of historical metrics for each feature point. The method, for each pair of consecutive video frames, identifies a homography that defines a dominant motion between the pair of consecutive frames.
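
This isn't Apple's patented method, but the new Vision framework exposes a related building block: estimating the homography that aligns one frame with the next, which is exactly the kind of dominant-motion estimate a stabiliser could smooth over time. A hedged sketch, assuming frameA and frameB are CGImages of consecutive frames:

```swift
import Vision
import CoreGraphics
import simd

// Sketch: estimate the dominant motion (homography) between two frames.
func dominantMotion(from frameA: CGImage, to frameB: CGImage) -> matrix_float3x3? {
    let request = VNHomographicImageRegistrationRequest(targetedCGImage: frameB,
                                                        options: [:])
    let handler = VNImageRequestHandler(cgImage: frameA, options: [:])
    try? handler.perform([request])

    let observation = request.results?.first as? VNImageHomographicAlignmentObservation
    return observation?.warpTransform // 3x3 matrix mapping one frame onto the other
}
```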


THX innovator named in new Apple audio patent

Monday, 22 May 2017

It is curious that iMovie for the Mac offers an auto-audio ducking feature, but Final Cut Pro X doesn't. Curious because the free iMovie and the $299 Final Cut Pro X share a great deal of code and resources.

Audio ducking reduces the volume or dynamic range of other channels to make one channel’s sound easier to hear.
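
For now, ducking in an app of your own means doing it by hand. As a rough illustration of what ducking amounts to (not the patented metadata-driven approach, just a fixed ramp), here is a hedged AVFoundation sketch that fades a music track down while a narration channel needs to be heard. The track and time values are placeholders.

```swift
import AVFoundation

// Sketch: manually duck a music track with a volume ramp. The metadata
// described in the patent would drive this kind of adjustment automatically
// at playback time.
func duckingMix(for musicTrack: AVAssetTrack) -> AVAudioMix {
    let params = AVMutableAudioMixInputParameters(track: musicTrack)

    // Fade the music from full volume to 20% over half a second,
    // starting where the narration begins (placeholder times).
    let narrationStart = CMTime(seconds: 10.0, preferredTimescale: 600)
    let rampDuration = CMTime(seconds: 0.5, preferredTimescale: 600)
    params.setVolumeRamp(fromStartVolume: 1.0,
                         toEndVolume: 0.2,
                         timeRange: CMTimeRange(start: narrationStart, duration: rampDuration))

    let mix = AVMutableAudioMix()
    mix.inputParameters = [params]
    return mix
}
```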

Tom Holman is a veteran movie sound expert. Lucasfilm's THX cinema sound certification system is named after him. Since 2011 he has worked for Apple. Last week Apple was awarded a patent with his name on it: ‘Metadata for ducking control.’ It describes a process where audio is analysed to generate metadata on how to adjust other audio channels at the point of playback:

Application of these ducking values may cause (1) the reduction in dynamic range of ducked channels/channel groups and/or (2) movement of channels/channel groups in the sound field. This ducking may improve intelligibility of audio in the non-ducked channel/channel group. For instance, a narration channel/channel group may be more clearly heard by listeners through the use of selective ducking of other channels/channel groups during playback.

I hope this kind of metadata will be generated, read and written by Final Cut Pro X and encoded in QuickTime codecs soon!

 

VR News news - The state of the art in 2017

Thursday, 11 May 2017

Zillah Watson, a news producer with more VR experience than most, has written a report on VR and news broadcasting for The Reuters Institute for the Study of Journalism.

One point in the executive summary calls for news people to join together to lobby the tech world to reduce the walled gardens and create better hardware and standards for wider VR adoption.

Some excerpts:

The proliferation of content created through experimentation is solving some of the challenges involved in VR/360 storytelling. Journalists and news organisations are devoting more time to thinking about what works in VR, and as a result news VR is expanding beyond its early documentary focus. However, most news organisations admit that there is still not enough ‘good content’ to drive an audience.

 

360 may be a good short-term solution to increasing the availability of content. Alongside developments in storytelling, we see some impressive attempts to integrate VR across production, which across the board means that hundreds of journalists have now been trained to shoot 360.

 

The news industry needs to work harder at managing public expectations of VR. Playing with 360 may be fun for journalists, but the audience needs to be put at the heart of any serious future plans for VR. Audience adoption requires consumer literacy in how to engage with the new technology. Even if part of that education happens through audiences’ consumption of VR content in other areas – sport, gaming – news still has to show them why it is worth engaging with via this new medium.

Too many standards - don't count VR video out

There are too many platforms: the ‘walled gardens’ around different VR platforms makes it expensive to produce content for a range of devices. There are parallels with the early days of mobile apps, which required different builds for each. Bandwidth is also an issue for viewers consuming this content.

Platforms and device manufacturers need to up their game if they are going to get mainstream audience adoption. This includes improved hardware and common platforms to provide a frictionless user experience, and lower costs for headsets and bandwidth. 

The news industry needs to work together on this to present a united front when lobbying the tech platforms.

[Emphasis mine]

Although many see 360/VR video only as a gateway to 'full' VR, I wonder if the multiple VR platforms will coalesce faster than video - including 360 video - gets richer. Flat rich video will eventually be broadcast as objects: a cloud of video, audio, text and 3D objects that can be played back or even interacted with using standards-based players on the internet and on set-top boxes. Once that works with flat video, it could make 360 more interesting, which could lead to a full VR standard.

Broadcasters and technologists’ report on VR

Wednesday, 03 May 2017

DVB (Digital Video Broadcasting) is an industry-led consortium of the world’s leading digital TV and technology companies, such as manufacturers, software developers, network operators, broadcasters and regulators, committed to designing open technical standards for the delivery of digital TV and other broadcast services.

Late last year DVB commissioned a report (PDF) to see whether they should set up a group to define a standard for VR to be used with digital broadcasting. Here are some quotes:

We first look at the market segmentation between the tethered devices (Oculus Rift, HTC Vive), game platforms (Sony PS VR) and untethered devices (Gear VR, Consumer HMD, Cardboard). We predict that untethered devices will be 10x the volume of tethered ones, which will appeal more to the gamers’ community.

 

We assess the size of the market on the device side considering different market researches available on a 2020 horizon. A medium scenario shows $20B revenue in 2020. This is followed by a market sizing of the VR Video services on a 2020 horizon.

We estimate that by 2020, VR will generate between $1.0B and $1.4B revenue, the largest application being Live sports. VR Theme Parks & VR arcade games will be a lucrative business for both games and video and will, just as GPS was democratized with car rental, help evangelize VR.

 

  • Principal bodies involved in VR standardisation include ISO, IEC JTG MPEG, JPEG, and DASH IF, and possibly ITU-T and ITU-R in future. It is not clear how their activities overlap or which may become the dominant standards for VR.
  • MPEG are developing an Omni-directional Media Applications Format (OMAF) standard, as well as a Media Orchestration (MORE) interface for video stitching and encoding, and are considering Tiling mechanisms for region of interest encoding (using a dual layer SHVC approach).
  • JPEG are developing various file formats including: JPEG XT (omni- directional photographs), JPEG XS (low-latency compression format for VR), and JPEG PLENO (lightfield video format).
  • 3GPP are looking at VR standardisation for wireless mobile services, considering delivery of VR video content through current as well as 5G systems.
  • DASH-IF are planning test and trials of VR delivery using DASH technology
  • A VR Industry Forum is currently being established to promote VR; it may develop guidelines, encourage use of common formats, and share experiences with VR.

 

  • It is likely the main commercial driver for tethered VR will come from gaming, whereas the main driver for untethered VR will come from immersive video for sports and music events. The demand for content will depend on its availability and quality of experience.
  • DVB should cooperate with standards bodies working in VR, as members will need to adopt common specifications for stream delivery of VR content. Requirements are needed for the minimum technical quality of VR video and audio, particularly to reduce cybersickness. Requirements should be completed within two years (mid-2018)
  • In terms of quality of service, consideration must be given to the desired frame rate, field of view, visual acuity, degree of visual and audio immersion, head tracking latency, and visual overlays
  • VR audio will need additional support, both for broadcast and broadband transmission.
  • In the short term support is needed to avoid a multiplicity of groups and proprietary panoramic 3-degrees-of-freedom VR video systems, and to consider requirements for key parameters such as frame rate, resolution, use with tablets etc. For example, Sky provisionally specifies the following VR formats: video: 2-4K resolution, H.264, 25-50 FPS, 20-60 Mbps bitrate; audio: stereo or ambisonic.
  • For the longer term it is recommended to continue the study mission to follow developments such as panoramic 6 degrees of freedom VR, augmented reality, and mixed reality.
  • A commercial requirements group would begin their work with a questionnaire to DVB members. In addition, the group may consider developing a DVB VR garage, where VR technologies could be neutrally badged under DVB.

 

Sennheiser pushing 360º audio recording with forthcoming prosumer headphones - UPDATED

Monday, 01 May 2017

A step towards ambisonic audio going mainstream: the forthcoming Sennheiser Ambeo Smart Headset headphones have microphones in each ear and (I assume) a sensor to record head position. The device encodes ambisonic audio which is sent to your iOS device to be recorded as an audio file or as the soundtrack to video you are recording. [Not correct, see update below]

Ambisonic audio records a sphere of audio - so that when you play it back, if you turn your head, the sound seems to stay in the same place. This is more like the real world where if you hear a door open to your left and turn to see who is coming in, the audio source will come from in front of you, not from your left.

 

Richard Devine reports in iMore:

VR and AR is the latest hotness, and a big part of the experience there is the audio. After all, having a fully immersive, 360-degree visual experience is going to lose a lot without the necessary audio to go with it. As Sennheiser said during its brief presentation at the event, your eyes see information, your ears hear emotion.

Ambeo is the branding applied to the company's 3D audio products, and it already has one of the world's first portable VR microphones. The idea is straightforward; just as you're capturing 360-degrees of video, Ambeo captures audio in the same sphere, rather than a flat plane as you'd get with regular video content.

Upon plugging the Smart Headset into an iPhone, you're prompted to install a companion app which doesn't yet exist. It's early days still, so that's something we can overlook, but you don't need it. The iPhone detects it just fine as an external microphone and you can use it with the stock camera app or a third-party one such as Filmic Pro.

Sennheiser has not yet announced a release date. The first version will have a Lightning connector for iPhones and iPads. A following version will have a USB-C connector.

2nd May UPDATE:

 

Peter Hajba has pointed out on Facebook that this product will record binaural audio, not ambisonic. See Sennheiser’s more specific press release from earlier this year.

Thinking it through, it would be very impressive for this to work as I hoped without significant separation between microphones - and without a third mic!