By Ken Kerschbaumer
One of the biggest issues facing broadcast and teleproduction professionals is how to archive digital media. In the past, archiving was done one way and one way only--by creating shelf upon shelf of tape storage space to hold videotapes containing source material and completed programs.
But the digital age provides a number of options for archiving. They include:
1. The traditional method--original videotapes stored on shelves;
2. Data storage tape, backed by nearline disk caching; and
3. Video servers and disk arrays.
The traditional method of archiving will remain relatively the same--buy some shelves and stack some tapes. Its advantage is cost: you are limited to the cost of the original media, the shelving, some database software, barcoding equipment, and an archivist. The disadvantage? You won't be taking advantage of other digital and computer technologies to improve the speed and accuracy with which users can retrieve video and audio content.
The second method, data storage tape, requires the digital video and audio material on the original tapes to be transferred to data tape drives. Once on data storage tape, information can be retrieved quickly, and if a disk cache or similar product from a company like StorageTek is used, a "nearline" environment is created. Nearline storage takes the information called up from tape and holds it in a disk cache, which allows easy access to the desired information.
Because the information on data storage tapes is kept as compressed data, it requires far less bandwidth and fewer tapes than the traditional tape storage method. For example, data storage systems like the one available from Ampex can currently store approximately 30 hours of material per tape.
In addition, because the tape libraries have sustained transfer rates of 15 megabytes per second, compressed video files can be moved from tape to the cache disk via SCSI data ports at multiples of realtime, reducing wear on the machines and freeing staff for other productive activities. Fibre Channel networks can be used for even faster transfer times.
Another advantage of data storage tape is that its similarities to videotape storage make the transition easier. For example, on high-performance helical scan data drives, longitudinal tracks are used to identify the address of a file on tape. This is similar to the way longitudinal time code is used to locate video elements.
Data storage tape is not without its disadvantages. For instance, simple audio or video insert edits are no longer available, because there is no audio or video to insert edit into. The files are only a compressed digital data representation of the original analog or digital audio or video. When a file is overwritten, or replaced, it is referred to as an "append." Once a file is appended, all information already on the tape following the append is lost unless the tape drive is sophisticated enough to support partitioning. Partitioning permits administrators to divide a tape into many small segments where only the information contained within the partition will be affected by appending.
As a result, partitioning is a key feature for broadcasters who plan to store constantly changing video such as commercial spots. Standard partition sizes can be formatted to accommodate the size of the average commercial. This allows replacement of only a single spot within a given partition, and does not affect the rest of the items stored on the tape.
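As an illustration, here is a minimal sketch in Python of how partitioned spot storage might be modeled. The class, partition size and spot names are assumptions for the example; real drives expose partitioning through vendor-specific commands.

```python
# A minimal sketch of partitioned tape management, assuming fixed-size
# partitions sized for a typical commercial spot. All names here are
# hypothetical, not any vendor's actual interface.

class PartitionedTape:
    def __init__(self, partition_count, partition_mb):
        self.partition_mb = partition_mb
        # Each slot holds at most one spot (or None if empty).
        self.slots = [None] * partition_count

    def write_spot(self, index, spot_id, size_mb):
        if size_mb > self.partition_mb:
            raise ValueError(f"{spot_id} ({size_mb} MB) exceeds partition size")
        # Appending inside a partition affects only that partition;
        # every other slot on the tape is untouched.
        self.slots[index] = (spot_id, size_mb)

    def replace_spot(self, old_id, new_id, size_mb):
        for i, slot in enumerate(self.slots):
            if slot and slot[0] == old_id:
                self.write_spot(i, new_id, size_mb)
                return i
        raise KeyError(f"{old_id} not found on tape")

tape = PartitionedTape(partition_count=200, partition_mb=250)
tape.write_spot(0, "spot-ACME-30s", 220)
tape.replace_spot("spot-ACME-30s", "spot-ACME-30s-v2", 225)
```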
The other drawback to data storage tape is that it is still tape. Unless a disk system is used, you'll still have linear-access tapes on a shelf. Granted, far fewer tapes, but tapes nonetheless.
Video Servers
The introduction of video server and disk array technology into the video marketplace signaled a major turning point in how video content is handled within a broadcast station or post production facility. And because servers use technology similar to that found in computer hard drives, the video market was introduced to the concept of rapid hardware generations, each offering vast improvements over the previous generation's capabilities.
Today video server technology is still too new to provide a cost-effective, 100 percent archiving solution. Disk capacity is still too small to allow for cost-effective archive systems to be created and maintained. But video servers, for all their current limitations, do hold the key to a future filled with easy access to hour upon hour of video and audio.
Until the cost of server storage reaches a point where it's affordable to have 100 percent online storage, hybrid solutions will be needed. One example of a hybrid solution consists of RAID-based online storage used in conjunction with a nearline robotic tape library and an offline computer-controlled analog or digital videotape player array. With the use of a storage management software system that is based on a pre-programmed priority or history of usage, it's possible to contain the costs and access times involved in retrieving content.
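The storage-management logic might look something like the following sketch, which is only illustrative: the tier names, the promotion threshold and the bookkeeping are assumptions, standing in for the pre-programmed priority and usage-history rules such systems apply.

```python
# A minimal sketch of a hybrid retrieval policy: online RAID first,
# then the nearline robotic library, then offline shelf tape, with
# frequently used clips promoted toward faster tiers. Tier names and
# the promotion threshold are illustrative assumptions.

TIERS = ["online-raid", "nearline-robot", "offline-shelf"]

class HybridStore:
    def __init__(self):
        self.location = {}   # clip_id -> tier name
        self.hits = {}       # clip_id -> retrieval count

    def retrieve(self, clip_id):
        tier = self.location.get(clip_id, "offline-shelf")
        self.hits[clip_id] = self.hits.get(clip_id, 0) + 1
        # Usage history drives promotion: a clip fetched repeatedly
        # migrates one tier closer to online storage.
        if self.hits[clip_id] >= 3 and tier != "online-raid":
            tier = TIERS[TIERS.index(tier) - 1]
            self.location[clip_id] = tier
        return tier
```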
Fast Access
The biggest advantage video servers offer is online or nearline access to desired video content (also called video on demand--VOD--and near video on demand--NVOD, respectively). Over the years, video indexing has been the top challenge in maintaining a simple-to-use archive. Archivists would watch a videotape and then type in the keywords that would allow a future user to easily gain access to the desired footage.
The difficulty, however, is that the archive process is reliant on the judgment of the archivist, and therefore is subject to differences of interpretation as to which videos would be listed under which keywords.
But in order to fully take advantage of the speed and flexibility offered by video server archiving, a more complex system for properly indexing the video content is required.
One manufacturer of a video indexing system, Islip Media, uses an indexing process that consists of five steps. The first step is the digitizing of the audio and video material, whether analog or digital, into a standards-based MPEG format. Then the material is sent to a fast processor that generates a time-aligned, full-content topical index using speech and language understanding.
Next, the video is segmented into meaningful "video paragraphs" using language, image, and audio cues. Image analysis on the video portion of the data then creates filmstrips and icons that capture key scenes. Finally, a comprehensive full-content index of the video collection is built.
Building An Index
If the video information you're looking for is stored on a video server, calling up clips on a given subject will be as simple as calling up documents in a word processor, or doing a term search on an Internet search engine with a browser. But unlike a text document, where a keyword search is the most accurate approach, an accurate video search system should draw on text, edits, and images. The creation of complex video indexing systems is where much energy, time and money is being expended by a number of manufacturers. The future of archive retrieval will be based on keywords, images, and even voices--quite an advantage over current retrieval systems.
The approaches taken by the different video indexing systems may vary slightly, but all will use three broad categories of technologies to create and search a digital video library of broadcast and unedited video and audio materials.
The first way to index material is through text processing. This technique will look at the textual representation of the words that were spoken, as well as other text annotations. These may be derived from the generated transcript, accompanying notes or from the closed captioning that might be available on broadcast material.
Text analysis of scripts can work on an existing transcript and segment the text into video paragraphs. In addition, an analysis of keyword prominence allows users to identify important sections in the transcript and to more easily search for relevant video information. For instance, suppose you're looking for video that appears while the script mentions "the evolution of species." Simply type in the phrase and the system will pull up all clips that mention the whole phrase or the words "evolution" or "species."
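A toy version of such a search, written in Python, might look like this; the stop-word list and the transcript index are assumptions for the example.

```python
# A minimal sketch of transcript search: a query phrase matches clips
# containing the full phrase or any of its significant words.

STOP_WORDS = {"the", "of", "a", "an", "and"}

def search(transcripts, query):
    """transcripts: dict of clip_id -> transcript text."""
    phrase = query.lower()
    words = [w for w in phrase.split() if w not in STOP_WORDS]
    hits = []
    for clip_id, text in transcripts.items():
        text = text.lower()
        if phrase in text or any(w in text.split() for w in words):
            hits.append(clip_id)
    return hits

clips = {"c1": "Darwin traced the evolution of species over millennia",
         "c2": "a new species of beetle was catalogued",
         "c3": "election night coverage"}
print(search(clips, "the evolution of species"))  # ['c1', 'c2']
```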
Next is image analysis. This method looks at the images in the video portion of the MPEG-encoded stream. The analysis is used two ways: first, for the identification of scene breaks and to select static frame icons that are representative of a scene; second, to identify a specific image, such as that of a celebrity whose image and name are stored in a reference database and then compared to the archived material for a match.
For scene break identification, image statistics are computed for primitive image features such as color histograms, and these are used for indexing, matching, and segmenting images.
For example, color histograms will measure differences from scene to scene to automatically tell the server where to begin and end a given video paragraph. Those scenes with little change and disparity will be listed as one paragraph, while those with greater changes from frame to frame will be listed as individual scenes.
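In code, the idea might be sketched as follows; the bin count and break threshold are illustrative values that a real system would tune.

```python
# A minimal sketch of histogram-based scene-break detection, assuming
# frames arrive as 8-bit numpy arrays.
import numpy as np

def histogram(frame, bins=64):
    h, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return h / h.sum()  # normalize so frame size doesn't matter

def scene_breaks(frames, threshold=0.4):
    """Return indices where a new 'video paragraph' begins."""
    breaks = [0]
    prev = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = histogram(frame)
        # L1 distance between successive color histograms; a large
        # jump marks a cut, a small one stays in the same paragraph.
        if np.abs(cur - prev).sum() > threshold:
            breaks.append(i)
        prev = cur
    return breaks
```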
Optical flow analysis is another important method of visual segmentation and description based on interpreting camera motion. Pans, zooms and cuts can be interpreted by examining the geometric properties of the optical flow vectors. According to a report in the August 1997 SMPTE Journal, using the Lucas-Kanade gradient descent method for optical flow, individual regions can be tracked from one frame to the next. By measuring the velocity that individual regions show over time, a motion representation of the scene is created. Drastic changes in this flow indicate random motion, and therefore, new scenes. These changes will also occur during gradual transitions between images, such as fades or special effects.
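A compact sketch of flow-based segmentation follows. Note one substitution: the cited report uses the Lucas-Kanade method, while this example uses OpenCV's dense Farneback flow for brevity; the threshold is an assumption.

```python
# An illustrative sketch of flow-based cut detection on grayscale
# frames supplied as 8-bit numpy arrays.
import cv2
import numpy as np

def flow_breaks(gray_frames, threshold=15.0):
    breaks = []
    for i in range(1, len(gray_frames)):
        flow = cv2.calcOpticalFlowFarneback(
            gray_frames[i - 1], gray_frames[i], None,
            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)
        # Smooth, coherent flow means a pan or zoom; a drastic jump in
        # mean magnitude suggests random motion, i.e. a new scene.
        if mag.mean() > threshold:
            breaks.append(i)
    return breaks
```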
The final way to search through an archive is speech analysis. One method used to automatically transcribe American English content is the Sphinx-II system, a large-vocabulary, speaker-independent continuous speech recognizer created at Carnegie Mellon University. The process draws on three knowledge sources: acoustic modeling, pronunciation modeling and language modeling.
The Sphinx-II system uses semi-continuous Hidden Markov Models, a statistical representation of speech events (e.g., words), to characterize context-dependent phones (triphones), including between-word context. The recognizer processes an utterance in three steps. First, it makes a forward time-synchronous pass using full between-word models, Viterbi scoring and a trigram language model. This produces a word lattice where words may have only one begin time but several end times. The recognizer then makes a backward pass that uses the end times from the words in the first pass and produces a second lattice that contains multiple begin times for words. Finally, an A* algorithm is used to generate the best hypothesis from these two lattices.
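Sphinx-II itself is far more elaborate, but a toy example can show what the trigram language model contributes: each word is scored given its two predecessors, and competing lattice hypotheses are ranked by total log probability. The probabilities below are invented for the illustration.

```python
# A toy trigram language model ranking two competing hypotheses.
import math

TRIGRAMS = {("<s>", "<s>", "election"): 0.01,
            ("<s>", "election", "night"): 0.4,
            ("election", "night", "coverage"): 0.5,
            ("<s>", "<s>", "erection"): 0.0001,
            ("<s>", "erection", "night"): 0.001,
            ("erection", "night", "coverage"): 0.01}

def score(words, floor=1e-8):
    context = ["<s>", "<s>"]   # sentence-start padding
    logp = 0.0
    for w in words:
        p = TRIGRAMS.get((context[0], context[1], w), floor)
        logp += math.log(p)
        context = [context[1], w]
    return logp

hyps = [["election", "night", "coverage"],
        ["erection", "night", "coverage"]]
print(max(hyps, key=score))  # ['election', 'night', 'coverage']
```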
Conclusion
The future of archiving promises to be one filled with greater accuracy, flexibility and ease of use compared to yesterday's systems. Shuttling through tape after tape could very well become a thing of the past, as users instead shuttle through thumbnail images of the first scene of a given clip on the video servers and RAID arrays, using tools similar to today's World Wide Web search engines. Archiving will become even more important, as consistency and thoroughness will be needed for optimum efficiency.
In addition, constant improvements in computer processing and bandwidth will change the capabilities of video servers and indexing systems. What tomorrow's archiving systems will eventually offer is anyone's guess, but constant improvements in server technology make future capabilities seem limitless.
References
[1] Wactlar, H.D., Hauptmann, A.G., Smith, M.A., Pendyala, K.V., and Garlington, D., "Automated Video Indexing of Very Large Video Libraries," SMPTE Journal, August 1997.
[2] Pendyala, K.V., Wactlar, H.D., and Juliano, M.J., "Automated Indexing and Retrieval From Large Digital Video Libraries," Islip Media, 1997 NAB Broadcast Engineering Conference Proceedings.
[3] Hennessy, J. (Director of Business Development, Ampex Corporation), "The Role of Data Storage in Broadcasting's Future," Television Broadcast, November 1997.
By Craig Birkmaier
"The business of digital television broadcasting is the management of the data multiplex that feeds the 19 megabit per second channel, so as to maximize the revenue that can be produced at any given moment in time."
Welcome to the brave new world of digital television.
The statement above was the cornerstone of a paper presented by the author at the 31st SMPTE Advanced Motion Imaging Conference in February 1997. Titled "A Visual Compositing Syntax for Ancillary Data Broadcasting," the paper explored the new medium of digital television and the emerging requirements to re-engineer the way digital video and other forms of digital media content are conceived, produced, distributed and consumed.
The underlying premise of the business model suggested in the statement above is that we can measure the tangible value of bits, in much the same way that program content and commercial inventories represent television industry assets today. The fundamental difference in this business model is that the broadcaster is no longer trying to maximize revenue by delivering the largest number of eyeballs; with digital broadcasting it will be possible to deliver bits to many different groups of eyeballs simultaneously, or multiple streams of information to a single viewer. Equally important, the broadcaster may find that it is better to deliver simultaneous "competing" services through the digital channel than to risk the possibility that viewers will tune to another multiplex to find the desired information.
The implications of managing a program multiplex will ripple back through every aspect of the content creation process. Major content producers will be expected to deliver not only the program content, but ancillary data about the program that enhances the viewing experience, and thus the value, of the asset.
The concept of digital asset management extends into every aspect of digital television, from concept to final delivery of content. This section will provide a quick overview of asset management and the role that it may play in the future of digital television.
Managing the Content Creation Process
Virtually every aspect of the processes we now use to create and distribute video content can be considered an asset management challenge. An asset is the actual piece of content to be tracked and catalogued. Thus, a digital asset may be a computer file that is or contains the content to be managed. When we think about the entertainment industry, we tend to think about assets as a movie, video program or commercial (in other words, a completed project) that has been digitized. In reality, any piece of a project--a frame, a cel, a scene, etc.--can become a digital media asset.
It is easy to compare the digital asset to the physical assets we work with every day.
In some cases, however, it may be difficult to actually digitize the asset, as with the costumes and props that fill up warehouses around movie studio lots. Even in this case, however, the physical asset can be managed by maintaining information about it in a database--text descriptions, photographs or the original computer generated drawings used to create the physical asset.
In most cases, there is a wealth of information about the digital assets we work with that is less tangible. We call this metadata--the data that describes the data. Metadata describes attributes of an asset that may be needed for further processing, or that may be of interest to someone who is consuming the final product.
For example, an attribute could be related to the digitized audio or video file itself, such as scene and take descriptions or shooting dates. Or, it could be related to the technical characteristics of the data, such as the video standard, sampling frequency or compression format. Or, it could relate to the presentation of the content, such as the aspect ratio or display format.
While all of this information is available at some point in the production process, it has been difficult, if not impossible, to keep track of it. Most of this data is discarded along the way, or stored in proprietary file formats that are meaningless, except to the computer that created the file.
The notion of appending metadata to the digitized asset has been enabled, in part, by the migration of virtually every aspect of the content creation process to computer-based tools. The next step is to build the links between these tools that allow the metadata to be appended to media files in a way that is virtually transparent to the artists and technicians who produce the content.
In addition to the need for computer-based tools, there are two important prerequisites that enable such an approach to the re-engineering of the content creation process:
1. The ability to network all of the systems that create and process the data together so that it can be shared; and
2. An asset management system that provides the framework for entering information about, and managing the digital assets.
An asset management system is a group of software applications and subsystems that work together to form a complete system for the customer. This could contain multiple client/server applications, database systems, as well as many different types of hardware: PC, Mac, SGI, media servers, and networked video processing devices.
It is not the intent of this section to fully describe the functionality of an asset management system. Suffice it to say that this is an area of intense product development activity among vendors of computer software, hardware and networking products, and film/video production systems.
Metadata: The Digital Vertical Blanking Interval
The analog television systems we are replacing dedicated a significant portion of the available bandwidth to synchronization signals contained within the vertical and horizontal blanking intervals, when a CRT display is resetting the scanning beam to start a new line or new video field. There is no vertical interval (or horizontal blanking) in a digital video transmission; these are issues to be dealt with in the local decoding and display hardware.
The display system is now decoupled from the transmission system and from the format that will be used locally to display the content. The video decoder/display processor must adapt all forms of content to the local display environment, and it is highly likely that the display processor will generate visual objects locally and compose them with streaming audio and video content.
Even as the MPEG-2 video coding standards, upon which virtually all digital television systems are based, begin to be deployed, work is nearing completion within the MPEG-4 committee on a standard to allow elements of programs to be composed in the television appliance (including a video or still image background and multiple visual objects or sprites). This work is converging with the rapid growth of open Internet standards to redefine the expected role of digital television appliances.
We can now begin to think of a digital television receiver as a computer with an Internet address. It will be able to receive content customized for viewer preferences or to which a viewer subscribes. In-home digital media servers will be able to filter information from a broadcast data multiplex and store it for later consumption. In other words, it may soon be possible for viewers to subscribe to content of interest and consume it on-demand, rather than synchronous with the transmission.
Broadcasters will provide a variety of "on-demand" services to their communities, both through their DTV channel and Internet Web servers. News, election returns, sports scores, the local weather forecast, program guides, city directories, restaurant and movie guides...The server that feeds this information to the Web will also feed the DTV data multiplex, periodically updating information stored in the digital media servers of the "viewers."
The traditional assumption that television broadcasting is a one-way medium incapable of delivering interactive services has been rendered meaningless by the shift to digital technology. Consumers may have several back channel options for interacting with television broadcasts--telco, cable and wireless. More important, however, they may not need or want any back channel to consume interactive services delivered as data through the DTV channel. The ability to store broadcast data locally, in an information appliance, makes it possible to deliver interactive applications in much the same way that the Internet currently broadcasts data to servers all over the world.
For example, using the full bandwidth of a 6 MHz DTV channel, a broadcaster can deliver 72 megabytes of data in 30 seconds. This data may include audio, video and graphic objects that are combined in the receiver to create a traditional linear television commercial--it may also include the elements of a Web page to provide an interactive experience for the viewer, such as an electronic brochure. In other words, the digital broadcaster can provide virtually any service that can be delivered by any other data network; and a wired back channel can be used to support transactions, including on-demand data broadcast services.
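The arithmetic behind that figure is simple enough to verify directly, assuming the full ATSC payload rate of roughly 19.39 Mbit/s:

```python
# Back-of-envelope check of the 72 MB figure: the full DTV channel
# payload sustained for a 30-second spot.
payload_mbps = 19.39                    # ATSC payload, Mbit/s (assumed)
seconds = 30
megabytes = payload_mbps * seconds / 8  # 8 bits per byte
print(f"{megabytes:.1f} MB")            # ~72.7 MB, i.e. about 72 MB
```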
Perhaps the most important implication, however, is that broadcasters have sufficient bandwidth to deliver high-quality audio and video along with these new interactive services. Equally important, they can do this in a totally non-invasive manner. Viewers can choose whether they want to send any information back, essentially building a privacy firewall between themselves and the service provider. This level of privacy does not exist when a consumer connects to an Internet Web site, where a record of the visit can be kept by the Web server.
The bottom line is that DTV is likely to evolve into an entirely new medium. As with the transition from radio to television, this medium will require a new business model--the approach described at the beginning of this section, one that leverages the ability to broadcast data.
Managing the Broadcast Data Multiplex
The best way to look at this model is through the management of the data multiplex. There are three general categories of data that may be included in the multiplex: programmed, periodic and opportunistic (see figure 1: the broadcast data multiplex).
Programmed data looks the most familiar. These are bits that a broadcaster contracts to deliver at a given point in time--for example, a free and clear program, perhaps simulcast on an NTSC channel, provided to meet the broadcaster's public service commitment. Programmed data is isochronous or realtime. These packets must arrive on-time or they are useless.
There are two variations on MPEG-2 encoding of realtime video/audio programs: fixed bit rate and variable bit rate. The choice can have a significant impact on the quality of the video that is delivered and on the management of the data multiplex.
Entropy coding techniques, such as those used in MPEG-2, produce a variable amount of data, based on the information content of the pictures that are being encoded. With fixed bit rate encoding, as the name implies, the encoder attempts to maintain a constant bit rate. It achieves this by varying the level of quantization, sometimes producing visible compression artifacts when the information content of the pictures is high.
Variable bit rate coding attempts to maintain a constant level of picture quality by keeping the level of quantization fixed and letting the bit rate increase with pictures of increased coding complexity. Typically, the encoder is set to operate with an average and a peak bit rate in mind.
Encoding video for release using the DVD formats provides a good example. The average bit rate is typically determined by the length of the program or movie being encoded--total capacity divided by total duration determines the average bit rate target. Peak bit rate is established by the peak transfer rate for the DVD disc--rates vary with single and dual layer discs. In digital broadcasting, an example of peak bit rate would be 19 Mbps peaks in HDTV programs encoded for a DTV channel.
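As a worked example of the rule just stated--average bit rate target equals total capacity divided by total duration--take a single-layer disc holding a two-hour feature (the capacity and running time here are illustrative):

```python
# Average bit rate target = total capacity / total duration.
capacity_gb = 4.7                      # single-layer DVD capacity
hours = 2.0                            # running time of the feature
bits = capacity_gb * 1e9 * 8
seconds = hours * 3600
avg_mbps = bits / seconds / 1e6
print(f"average target: {avg_mbps:.1f} Mbit/s")  # ~5.2 Mbit/s
```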
In DTV applications, variable bit rate coding offers a potential quality-of-service advantage by delivering consistent picture quality. It also offers a potential business advantage by maximizing the revenue produced by a DTV data multiplex. For example, a broadcaster that carries two programmed services can set the peak bit rates for each so that they do not exceed the 19 Mbps available. When they operate below these peaks--which is most of the time--any data packets left over can be used for periodic and opportunistic services.
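The headroom calculation itself is trivial, as the sketch below shows; the service rates are illustrative, and a real multiplexer would make this decision packet by packet:

```python
# A minimal sketch of multiplex headroom: two VBR services under a
# shared 19 Mbit/s cap, with the remainder in each interval released
# to periodic and opportunistic data.
CHANNEL_MBPS = 19.0

def leftover(service_rates_mbps):
    used = sum(service_rates_mbps)
    if used > CHANNEL_MBPS:
        raise ValueError("programmed services exceed channel capacity")
    return CHANNEL_MBPS - used

# Most of the time the two programs run well below their peaks:
print(leftover([8.2, 6.5]))   # 4.3 Mbit/s available for data services
```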
Unlike the isochronous nature of video programs, periodic data can be delivered in an asynchronous manner--when it can be fit in. A good example of periodic data is the Teletext service delivered in the vertical interval of PAL broadcasts in Europe.
Assume a broadcaster chooses to provide advertiser supported news headlines, sports scores, weather maps and forecasts to viewers through a Web site and their DTV channel. When a DTV receiver tunes to the channel, it will receive a program map that indicates all of the services feeding the data multiplex. The receiver can set up a memory buffer to accept periodic data identified in this program map. This data is inserted in the multiplex periodically to update information and serve new customers who are acquiring the channel. Once in memory, this information will be available to viewers on demand (full screen), or it can be displayed continuously on an unused portion of the screen (a window), or as a program overlay.
The rate of update for periodic data becomes a variable, which is factored into the software managing the data multiplex. For example, weather maps may only change every hour and be refreshed every five to ten minutes for new viewers. Sports scores may be updated as they are received for games in progress.
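A data-carousel scheduler along these lines might be sketched as follows; the item names and refresh intervals are assumptions echoing the examples above:

```python
# A minimal sketch of a periodic-data carousel: each item carries its
# own refresh interval, and the multiplex re-inserts it when the
# interval elapses.
import time

CAROUSEL = [
    {"item": "weather-map", "refresh_s": 300, "last_sent": 0.0},
    {"item": "sports-scores", "refresh_s": 30, "last_sent": 0.0},
    {"item": "program-guide", "refresh_s": 600, "last_sent": 0.0},
]

def due_items(now=None):
    """Return the items whose refresh interval has elapsed."""
    now = time.time() if now is None else now
    due = []
    for entry in CAROUSEL:
        if now - entry["last_sent"] >= entry["refresh_s"]:
            due.append(entry["item"])
            entry["last_sent"] = now
    return due
```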
Periodic data can also be used to provide other new revenue streams. For example, a broadcaster could deliver movie guides for local theaters, restaurant guides and printable electronic coupons.
Like programmed data, periodic data can be sold and scheduled; however, due to its asynchronous nature, there is some flexibility in delivery time.
Opportunistic data has similar characteristics to periodic data. The major difference is that it may not be something that can be scheduled, or it may be data with a lower priority and thus may be sold at a lower rate. In either case, it will be delivered on a space-available basis.
A good example of opportunistic data would be a paging service. The message size is small and thus easy to squeeze into the limited residual packets that are left over; and there is some latitude in delivery time. Another good example is the delivery of routed data packets to wireless information appliances, for Internet type services--an appliance of this type may use a back channel to request data packets, or it may simply filter the data carried in a DTV channel, looking for information to which it subscribes.
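A minimal sketch of space-available insertion follows; the priorities, message sizes and queue discipline are assumptions for the example:

```python
# Opportunistic insertion: small, low-priority messages (pages, routed
# packets) fitted into whatever residual bytes a transport frame has left.
import heapq

def fill_residual(queue, residual_bytes):
    """queue: heap of (priority, size_bytes, payload_id); lower = sooner."""
    sent = []
    deferred = []
    while queue and residual_bytes > 0:
        prio, size, pid = heapq.heappop(queue)
        if size <= residual_bytes:
            sent.append(pid)
            residual_bytes -= size
        else:
            deferred.append((prio, size, pid))  # wait for a later frame
    for item in deferred:
        heapq.heappush(queue, item)
    return sent

q = [(2, 120, "page-4411"), (3, 900, "route-ip-77"), (1, 60, "page-1002")]
heapq.heapify(q)
print(fill_residual(q, 200))  # ['page-1002', 'page-4411']
```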
By Lou CasaBianca
Copyright 1998 by Lou CasaBianca/Media Asset Management Association
This primer explains how to produce more creative content and generate higher profits through strategic management of existing video libraries ("media assets") and digital media-related projects. It reviews a suite of business-process re-engineering models designed to help leverage the digital production and workflow data by-products of work-for-hire projects into internal metadata and commercial media assets. Although this section may at first appear jargon-heavy, it's important for video professionals to become familiar with these terms.
Although media assets (video and film clips, stock footage, still images, audio content, etc.) can generate increased revenues in their own right, the metadata (data about data; see Metadata And Content later in this section) may in fact prove to be the most valuable element in this digital continuum. The insight from management of this knowledge can be used to increase ROI (return on investment) and margins for your company while also generating intellectual property rights that independent or entrepreneurially minded creative professionals can re-use, sell or license into the commercial, entertainment and/or educational media markets.
For the owners and managers of film, video, DVD and Web content-creation and production companies and post production facilities, this primer can help organize a strategy for increasing the valuation of the business. It can be used to show potential investors and business partners how cataloged media assets and intelligent studio workflow methods accelerate content-production cycle time, increase revenue per employee and raise business valuation above the annual revenue levels of traditional ROI business models.
For creative and production professionals, this primer lays out an approach to maximize time spent on creative activities and to leverage the content and data sources they've already produced. They can improve the quality of their work through better content-creation and production management practices and tools, generating improved efficiencies and content assets that create value in the form of additional revenue and higher profits.
Introduction
For content-creation companies and video production service providers already involved in--or considering integrating--media asset management systems, this primer articulates possible strategies for maximizing ROI. We advocate systematically cataloging video and digital media files (produced in-house or in work-for-hire client projects) along with capturing and dynamically sharing metadata with the purpose of re-expressing content and leveraging the intrinsic knowledge in related business data in future projects. This means that video production companies, now integrating digital media and Web production capabilities, will begin to recognize the inherent value represented in their production projects and media vaults. Within every project, there are any number of media assets, ranging from company logos and color palettes, to audio and video clips that can be repurposed as creative content.
That same project may have other media assets, ranging from generic video and audio to computer graphics and photography, that can be re-expressed as cross-media content for use on the Web or in DVD projects. Additionally, metadata such as EDLs and compression tables, as well as unstructured data such as business proposals, budgets and competitive analyses, support the automation of the knowledge-driven applications used in managing enterprise-wide marketing, brand and sales-automation systems.
In the development of original media for content-creation companies, we emphasize organizing an environment that allows designers and editors to participate as stakeholders in the upside potential from the re-expression of assets and optimization of workflow strategies in other departments and divisions.
We also underscore strategies for retaining appropriate intellectual property rights to select audio, video and digital media files--rights negotiated up front with the client or content provider--and for later licensing the use of these assets to new clients.
Competitive Strategies
At the center of media asset-driven business strategy--in other words, how to make more money from content management--rests the systematic re-use and re-expression of new and pre-existing media. To execute this strategy, the creative firm will need to invest in technology and set procedures and policies for media asset management.
Together with technology and set practices, a media asset management system helps the creative firm catalog new and existing media, capture and share metadata across projects, re-use and re-express content, and track the intellectual property rights associated with its assets.
Activity-based research from Gistics, Inc. shows that a media-producing firm or heavy media-using enterprise (e.g., a broadcast production company) can realize an eight- to 14-times ROI over its first three years of using a media asset management system. For workgroups, a media database can save hundreds of hours per person over a three-year period, resulting in significant net savings and faster production cycles. These kinds of returns now make investment in a media asset management system a fiduciary responsibility; in effect, to ignore or delay deploying some form of media asset supervision could constitute a breach of management responsibility.
Bottom Line
Re-use and re-expression of content means that creative firms can deliver client work significantly faster at marginal cost. Getting more work done faster produces several key benefits. It enables a creative firm to complete more projects, which adds up when calculating a year's worth of projects. Increases in productivity can translate into additional completed and billed projects--a source of revenue without adding more people.
This means that at the end of the year a creative firm has turned more projects per person than the competition, generally earning more profit for those projects. Getting more work done per person earns higher profit margins, maximizing the output for capital equipment and labor--traditionally the two highest costs of doing business.
Conclusion
The strategic application of media asset management systems builds shareholder value by showing investors and potential corporate partners how re-expressible media assets and leveraged metadata can create competitive advantage--faster cycles, lower costs and license revenues--and boost the valuation of their companies.
Media asset management systems can help content businesses leverage assets across the enterprise and service providers to transform work-for-hire projects into asset-building processes. Although some clients will insist on retaining and controlling any rights associated with their content, many clients will readily license pre-built media "elements" rather than pay significantly higher prices to commission development of their own content.
For many creative firms and production service bureaus, these systems will quickly become legacy systems--huge "digital" warehouses containing terabytes of data, video and other digital media content. Failure to develop an informed strategy for this technology and content migration will eventually force many of these firms into costly re-engineering efforts. Business owners and managers should therefore take the time to develop a comprehensive, long-range media asset strategy and invest in a complete solution that will integrate with current systems and scale up in capability into the future.
Most media asset management vendors have moved to systems that use Web browsers to allow users to access "proxies" of high-resolution video, serving as a preview and "offline" creation mechanism. Eventually, editors and producers will edit program content using these proxies on internal intranets, enabling them to view, analyze and develop programs online. Media asset and metadata management will play a pivotal role in how advertising, brand management, broadcast and video service bureau companies find and serve customers.
Next Steps
The rapid emergence of the World Wide Web as an electronic commerce system will magnify the problems associated with inefficient media asset management approaches. Failure to master management of vital brand-related media and knowledge assets will place the enterprise in jeopardy. Media asset management, and the related aspects of digital brand building, will determine success in tomorrow's increasingly content-driven "wired" markets.
Acknowledgements
We acknowledge in particular the insight of the founding members of the Media Asset Management Association and the contributions of the co-founders of Gistics, Inc., James L. Byram and Michael Moon. For more information about MAMA, visit www.mamgroup.org or contact Lou CasaBianca at casabianca@mamgroup.org.
By Lou CasaBianca and Christine M. Okon
It's not news that this digital age can be a very confusing one when it comes to understanding and utilizing the proliferating varieties of digital audio, video and other forms of "content." It's a situation that's making it harder and more expensive for companies--from TV and movie studios to advertising agencies and broadcasters--to manage and benefit from all this "rich media" in terms of cataloging it, organizing it and making it quickly available for future access.
It is, however, essential that companies do exactly that. By doing so, they can not only extend the value of their media and information assets, but also gain a better understanding of their internal production processes so they can better manage their business relationships with customers and suppliers. The question is how to do this. The answer lies in metadata.
Metadata is "data about the data." It gives structure to the value of knowledge assets, which is vital for any profit-making organization--especially in this age of "repurposing," in which, for example, a movie's footage can be used to produce a "making of" documentary, a TV series, a home video or DVD, a print ad, a T-shirt or a toy; the possibilities are many.
From content creation right on through to distribution, managed metadata provides the continuity that streamlines workflow and creates new benefits. Developed by the same people who brought you the Internet, today's metadata research has its origins in "data mining"--the name the U.S. government gave its early work in this area--extracting valuable patterns and trends from huge quantities of dynamically updated, complex digital media types.
Information is, however, only meaningful to the extent that the receiver or subsequent users can relate it to other information--in this case the related media, communications and data sources integrated in the "networked studio." The methodical process of distilling the value out of media and information labors under the constant threat of "information entropy," the loss of its contextual setting, or, in other words, the loss of its metadata.
For our purposes, information entropy occurs in two forms. The first occurs when metadata is either lost or not captured. The second subtler form occurs when metadata becomes separated from its source or loses its own internal consistency and descends into noise. Over time, organizational memory fades or becomes confused ("Was that clip from the original movie or the television series?"). The "half-life" of metadata can be measured in years (e.g., key political events) or sometimes in weeks or days ("Who is that in the photo with Mark McGwire?") and even in minutes or seconds ("Was that last broadcast bulletin about the storm in Florida or the Bahamas?").
Metadata can be implicit, depending on the user's own knowledge and semantic interpretation; it can be explicitly recorded, and even automatically captured. By mid-1999, dynamic capture and analysis of incoming "streaming" or off-air content--filtered by image, sound or keyword--will become the bleeding-edge for next-generation cable and broadcast media-cataloging systems. Why would a cable or broadcast network want to do such a thing? Well, think of multiple feeds being monitored by multiplexed intelligent catalogers that capture clips and browse collections to assemble programs. This technique can support associate producers in assembling programming elements from "live" and archival sources, freeing them up to work on qualitative and logistical issues.
Metadata Formats
Media-asset data derives from a combination of a company's experience and ongoing consultation with a wide range of users. First and foremost in importance is identifying the essential elements and access paths required for the cataloging of any basic media type--from a still photograph to a television program. Once such elements are identified, asset managers can create sets of data fields for each media type.
The metadata format is the first domain to be defined for an integrated cataloging and management system. This forms the foundation and framework for all other development. Metadata generated or captured at the content-creation phase will almost always be more accurate, and can be more easily enhanced and modified at later stages of the workflow process.
Continuing through the production process, metadata can be expanded or added during editing and manipulation. For example, semantic descriptions can be added to automatically captured video time codes or image numbers. The harnessing of metadata to reflect continuity through the workflow process severely challenges most content-logging tool vendors and end users, who must integrate multi-vendor solutions.
It may also be necessary to import and export information from other databases. To facilitate this, all data fields can be coded so that they can be mapped against both internal standards in production databases and other related "legacy" data, as well as external standards. If this sounds confusing, don't despair; media-asset management may seem jargon-heavy, but it's based on common-sense ways of organizing information. The ability to share data provides increased efficiencies in the cataloging process, while facilitating wider applicability of the content and metadata across the enterprise.
The media-asset management process continues through transmission and distribution, where metadata--such as network protocols, delivery rates or aspect ratios--can be incorporated into applications. For example, a viewer selecting a digital cable excerpt of a television news story can generate new metadata in the form of user preferences to establish a customized broadcast experience. These preferences shape that viewer's customized content, and the demand creation-through-distribution cycle evolves. Customer satisfaction and user feedback also represent a type of metadata that takes on new and potentially lucrative value when managed by brand managers and adapted to e-commerce.
Challenges
One of the ongoing challenges with managing metadata through the workflow process is the lack of interoperability protocols and inconsistencies with devices (recording, storage and playback systems) used. An overlapping set of metadata elements must be shared by most--if not all--tools. This can include names, descriptions and data types. Maintenance of this shared metadata becomes even more complicated because at various points during the production process, different tools may modify the same metadata element.
In part, this challenge has spawned the formation of several standards efforts. Standardization is not a new story. The arrival of the digital age, however, has created a sense of urgency for organizations and companies to work around the threat of information entropy. The ability to organize data and media as objects now makes it possible to dynamically bind or point to metadata. As a result, a whole new approach to data warehousing and re-expression has opened up. Among the ongoing and emerging standards efforts are the Moving Picture Experts Group's MPEG-7 and the joint SMPTE/EBU task force described below.
MPEG-7 (www.mpeg.org) specifies a standardized description of various types of multimedia information. The description is associated with the content to allow fast and efficient searching for material of interest to users. As ever-increasing amounts of audio-visual information go digital, the need to access the related data requires that it be located first. At the same time, the increasing availability of, and requirements for, timely metadata make this search even more difficult. Currently, solutions exist that allow searching for textual information.
Since no generally recognized description for identifying still and moving images exists, it's not yet possible to search broadly across audio-visual content, although solutions do exist in select cases. Some catalogers and multimedia databases allow searching for pictures using such characteristics as color, texture and information about the shape of objects in the picture. Similar capabilities exist for audio search, including voice recognition, voice-to-text keyword search and sound-pattern recognition.
Announced by MPEG in October 1996, the "Multimedia Content Description Interface," or MPEG-7, will extend the limited capabilities of the proprietary content-identification solutions that exist today, notably by including more data types. MPEG-7 will specify a standard set of descriptors that can be used to describe various types of multimedia information. It will also standardize ways to define other descriptors, as well as structures (description schemes) for the descriptors and their relationships.
To allow fast and efficient searching for material of interest to a user, the combination of descriptors and description schemes will be associated with the content itself. MPEG-7 will also standardize a DDL (Description Definition Language), a language for specifying description schemes. Audio or video material that has MPEG-7 data associated with it can be indexed and searched according to graphics, 3D models, audio, speech, video and data about how these elements are combined in a multimedia presentation.
The MPEG-7 standard builds on other (standard) representations, such as MPEG-1, -2 and -4. For example, a shape descriptor used in MPEG-4 could be useful in an MPEG-7 context as well. MPEG-7 descriptors do not depend on the ways the described content is coded or stored. MPEG-7 builds on MPEG-4, which provides the means to encode audio-visual material as objects having certain relationships in time (synchronization) and space (on the screen for video). MPEG-7 will allow different granularity in its descriptions, offering the possibility to have multiple levels of differentiation.
The SMPTE
The Society of Motion Picture and Television Engineers (SMPTE--www.smpte.org) and the European Broadcasting Union (EBU--www.ebu.ch) released their latest proposal, "Harmonized Standards for the Exchange of Television Program Material as Bit Streams," in July 1998. To attain its objectives, a joint task force composed of members from both organizations divided its work among six separate sub-groups, each responsible for a segment of the investigation. These sub-groups work in the following categories: Systems; Compression; Wrappers and File Formats; Metadata; File-Transfer Protocols; and Physical Link and Transport Layers for Networks.
Wrappers and Metadata Summary
Leveraging work already underway within the computer industry and within its own organization, SMPTE decided that the metadata requirements could be addressed through the creation of a Metadata Dictionary and a number of formatting standards, all maintained through a registry mechanism. The sub-group's Request for Technology (RFT), issued to develop the Wrapper requirements, generated responses ranging from discussions of specific items--such as Unique Material Identifiers and Frame-Index Tables for use inside Wrappers--to complete solutions for specific applications, such as multimedia-presentation delivery. The responses also included specifications for data models and container formats in use in the industry today, within multiple products.
This effort addresses the formatting of collections of audio-visual program material and related information for exchange within and between studios and other centers that process or store that information. The goal is to improve interoperability independent of the encoding format for the audio-visual signal through all stages of the production process, from pre-production through distribution and storage. A wrapper is defined as containing both the "essence" of the content (i.e., audio, video or data--if the "program" is a "data feed") and the definition and description of its structure (i.e., its metadata). The task force is advocating the establishment of a single registry of metadata identifiers and definitions.
OMF and OMM
The Open Media Framework Interchange (OMF--www.avid.com/3rdparty/omm), promoted by Avid Technology and a group of industry partners, serves as a standard format for the interchange of digital media data among heterogeneous platforms. A full OMF interchange file defines compositions (all the information required to play or re-edit a media presentation), media data (what is played) and a source reference (e.g., videotape, time code). Like other standards efforts, the goal is to link metadata with the media to sustain continuity throughout the entire (and mostly heterogeneous) production process.
Avid and its partners announced the Open Media Management (OMM) initiative at the 1998 National Association of Broadcasters convention. The alliance seeks to make media management more efficient for content creators by linking content-creation tools with digital media-management systems. This metadata integration will help users access and manage shared media assets during the creation process while attempting to provide the flexibility and compatibility that comes with an open environment.
OMM defines an Application Programming Interface (API) that connects Avid systems with leading asset-management software. This will allow users greater availability and faster access to valuable digital assets, improved collaboration and lower management costs, while providing a more open interoperable system. The consortium plans to release the first version of the OMM specification very soon. OMM-compliant applications and asset management systems will follow.
Essential Elements Of Metadata
The specific elements of metadata vary widely among various asset types, but they fall into four general categories:
History: Information about how the asset was acquired, processed and used. Dates are essential, and include shooting dates, recording dates, editing dates, transmission dates, repeat dates, review dates and archiving dates.
Ownership: Information about copyrights, licenses and other constraints on the asset's use. This can include, for example, publisher details and the terms under which, say, a stock house is providing a given clip. There may also be restrictions on use, such as copyright, territorial and other use-rights-related data.
Technical: Information about the format, size and location of the asset. Included here is what's known as engineering metadata--the physical make-up of the item, e.g., its video standard, recording medium, sampling frequency or compression format. Then there's duration data, including the overall length of segments and details of timings within items. Housekeeping data can include labels, shelf locations, supplier's codes and availability notes. Item numbers, meanwhile, can include both in-house and standard reference numbers, e.g., accession numbers. There is also a trend toward capture devices (digital cameras, scanners, etc.) automatically generating metadata (e.g., time code, digital watermark, exposure, etc.).
Content: Information about the content of the asset--titles, with notations about subcategories for series, programs, commercials, bumpers and so on. This can include: the type of content, such as its genre (e.g., news, documentary, situation comedy); names, including all contributors, artists, performers, interviewees and production staff; descriptive text, typically a large field for free-text description of the content of the item; and/or subjects, which includes classification numbers, keywords, names and titles as subjects.
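These four categories map naturally onto record structures. The sketch below, using Python dataclasses, condenses the examples above into illustrative fields; an actual dictionary (such as the SMPTE registry effort described earlier) is far larger.

```python
# A minimal sketch of the four metadata categories as record types.
# Field names condense the examples in the text and are illustrative.
from dataclasses import dataclass, field

@dataclass
class History:
    shooting_date: str = ""
    editing_dates: list = field(default_factory=list)
    transmission_dates: list = field(default_factory=list)

@dataclass
class Ownership:
    copyright_holder: str = ""
    license_terms: str = ""
    territorial_restrictions: list = field(default_factory=list)

@dataclass
class Technical:
    video_standard: str = ""      # engineering metadata
    compression_format: str = ""
    duration_s: float = 0.0
    shelf_location: str = ""      # housekeeping data

@dataclass
class Content:
    title: str = ""
    genre: str = ""
    contributors: list = field(default_factory=list)
    keywords: list = field(default_factory=list)

@dataclass
class AssetRecord:
    asset_id: str
    history: History = field(default_factory=History)
    ownership: Ownership = field(default_factory=Ownership)
    technical: Technical = field(default_factory=Technical)
    content: Content = field(default_factory=Content)
```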
Summary
The capture and analysis of metadata within the studio will drive the capabilities of the studio's systems and operation policies. This will require that analog systems be supplemented or converted to digital infrastructure to store, transfer and track metadata. As the initiatives outlined above reach maturity, they will deliver the ability to extend existing studio infrastructure to intelligent environments, facilitating the management of realtime metadata at the required qualitative and quantitative levels.
Capturing explicit metadata helps win the "beat-the-clock" race against information entropy. Over time, metadata comes to represent an organization's most valuable information. Future systems will become increasingly sophisticated, to the degree that they'll be convenient for the non-technical people involved with the content-creation and management process. This will lead to greater capability through the closer involvement of those skilled in the production of the content itself. Changes in these skill sets and systems will permit operations personnel to manage assets more effectively, and will eventually allow many asset-management decisions to be made automatically. Metadata will prove to be one of the most valuable applications of computers in the studio, with open-ended implications for the content creation and broadcast systems of the future.
Acknowledgements
Many thanks to co-author Christine M. Okon, VP of Business Development for Arriba Software, Inc. Particular acknowledgment is made to the work of S. Merrill Weiss, Co-Chairman for the SMPTE and to Horst Schachlbauer, Co-Chairman for the European Broadcasting Union (EBU), in developing the SMPTE Task Force for "Harmonized Standards for the Exchange of Program Material as Bitstreams, Final Report: Analyses and Results" (July 1998).
The Mission of MAMA
The Media Asset Management Association (MAMA) serves as an advanced user group and independent international industry consortium, created by and for media producers, content publishers, technology providers and value-chain partners to develop open content and metadata exchange protocols for digital media creation and asset management. The Association's mission is to produce user-driven, voluntary, best-practice-based protocols and business-to-business communications conventions in several strategic areas.
MAMA's immediate goal is to document and publish open, voluntary, interoperable protocols for media-asset metadata file exchange. The MAM Framework serves as a user base for actual ROI (return on investment) best-practice models.
The membership has defined the development of a shared MAM Lexicon technical/creative language as the focus of the group's initial end-user standardization activities.
Founding user forum members are ABC Network, Capp Studios/Leo Burnett, Digital Roadmaps, Discovery Communications, Inc., The Walt Disney Company, DreamWorks SKG, Gistics, i5Group, Lexichron and Sony Pictures Entertainment.
Founding technology forum members include Apple Computer, Archetype, Bitstream, Bulldog, Cinebase Software, Content Group, Excalibur, IBM, Informix, Islip Media, Magnifi, Oracle, Silicon Graphics, Sun, Virage, WAM!NET and WebWare Corp.
For more information visit the MAMA Web site at www.mamgroup.org.