DAM

1.5.6. Indexing

Here's a short overview of indexing (needs to be extended).

Detection of file changes

Using file checksums allows the detection of externally renamed files and other unexpected file changes.
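The idea can be sketched as follows. This is only an illustration in Python, not DAM's own code: the hash algorithm (SHA-256), the function names and the shape of the index (a dictionary mapping paths to checksums) are all assumptions.

```python
import hashlib
import os


def file_checksum(path):
    """Return a hex digest of the file contents (hash choice is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(path, index):
    """Compare a file on disk with a hypothetical index {path: checksum}.

    Returns 'unchanged', 'changed', 'renamed from <old path>' or 'new'.
    """
    checksum = file_checksum(path)
    if path in index:
        return "unchanged" if index[path] == checksum else "changed"
    for old_path, old_checksum in index.items():
        # Same content under a new name while the old path is gone: a rename.
        if old_checksum == checksum and not os.path.exists(old_path):
            return "renamed from " + old_path
    return "new"
```

Because the checksum depends only on the content, a file that was renamed outside of DAM can still be matched to its existing index record instead of being indexed as a new file.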

Silent indexing

When a new file appears in the File module, it is indexed silently and is available in DAM instantly.

Auto indexing

Index setups can be created with the Media>Tools>Indexing Setup module and stored in folders to define the auto/silent indexing.

metaExtract services

The meta extract services are used to get meta data from different file types.

Selection trees for indexing time and file status

Two new selection trees allow the selection of files by their current status and their last indexing time.

Update status module

With the Tools>Update status module the whole index can easily be checked.

Cronjob script and module for configuration

A module for setting up an indexing configuration, and a script that can be run as a cronjob.

Mime Types and Media Types

While indexing, DAM tries to detect the file type, mime type and media type of each file.

file type:
This is typically the same as the file suffix.
Examples: pdf, jpg, txt, ...

mime type:
The mime type is used to categorize file types. This system is widely used but is very technical.
Examples: image/jpeg, audio/x-mpeg, application/pdf

media type:
This DAM categorization overlaps with mime types but tries to be more human-oriented.

The following media type categories are used in DAM:

undefined
text
image
audio
video
interactive (interactive media like flash)
service (reference to a service: URL)
font
model (3D model)
dataset (data like spreadsheets or database files)
collection (archives like zip)
software (executable binaries)
application (files that can't be handled without specific software in any way)
 

An excerpt of the mime type list (http://www.iana.org/assignments/media-types/) is used in DAM. Unfortunately, that list is not exactly what is intended for DAM. For example, 'pdf' is defined there as

application/pdf

and not as

text/pdf

Technically this may be correct, but in a categorization like the one DAM uses, one would expect it to be of type 'text'. That's why DAM defines an additional internal list. Incidentally, Open Document formats are also defined as application/* in the mime type list, not as text.

Detecting the types

Here's how media/mime type detection works (example.pdf):

- Get the file extension. This is 99.999% correct in my eyes, so I trust this information: 'pdf'

- If we have a file extension, use an internal array to get a mime type for it: application/pdf

- If the internal list does not know the type, mime_content_type() is used to get a mime type. Now the detection depends on the server's setup. The mime type list is used by different applications, for example the web server (Apache). Often there's a shared mime type list, but not always. So mime_content_type() is not reliable, which is why DAM uses its own list.

- If we still don't have a mime type, or there's no file extension, use the 'file' command to detect it.

Now the following types are known:

file_type = pdf
file_mime_type = application
file_mime_subtype = pdf
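
The lookup order above can be sketched roughly like this. It is a Python illustration only, not DAM's PHP implementation: INTERNAL_MIME_TYPES is a hypothetical excerpt of the internal list, mimetypes.guess_type() merely stands in for mime_content_type(), and the final 'file' command step is left out.

```python
import mimetypes
import os

# Hypothetical excerpt of an internal extension-to-mime-type list.
INTERNAL_MIME_TYPES = {
    "pdf": "application/pdf",
    "jpg": "image/jpeg",
    "txt": "text/plain",
}


def detect_types(filename):
    """Return file_type, file_mime_type and file_mime_subtype for a file name."""
    # Step 1: take the file extension as the file type.
    file_type = os.path.splitext(filename)[1].lstrip(".").lower()

    # Step 2: ask the internal list first.
    mime = INTERNAL_MIME_TYPES.get(file_type)

    # Step 3: fall back to the system-dependent lookup
    # (stand-in for PHP's mime_content_type()).
    if mime is None:
        mime = mimetypes.guess_type(filename)[0]

    # Step 4 (the 'file' command) is omitted in this sketch.
    mime_type, _, mime_subtype = (mime or "").partition("/")
    return {
        "file_type": file_type,
        "file_mime_type": mime_type,
        "file_mime_subtype": mime_subtype,
    }
```

For 'example.pdf' the internal list already answers, so the result is file_type 'pdf', file_mime_type 'application' and file_mime_subtype 'pdf', independent of the server setup.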

An internal array is used to get the media type (text, image, ...) for a file type. This overrides the mime type, so a pdf with file_mime_type 'application' gets the media_type 'text'. If that lookup fails, the file_mime_type is used.
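
A minimal sketch of that override, again in Python and not taken from DAM itself. MEDIA_TYPE_BY_FILE_TYPE is a hypothetical excerpt of the internal array, and returning 'undefined' when nothing at all is known is an assumption.

```python
# Hypothetical excerpt of an internal file-type-to-media-type list.
MEDIA_TYPE_BY_FILE_TYPE = {
    "pdf": "text",
    "jpg": "image",
    "zip": "collection",
}


def media_type(file_type, file_mime_type):
    """Return the media type, preferring the internal override list."""
    override = MEDIA_TYPE_BY_FILE_TYPE.get(file_type)
    if override is not None:
        return override
    # No override known: fall back to the mime type.
    # Using 'undefined' for an empty mime type is an assumption.
    return file_mime_type or "undefined"
```

With this, media_type('pdf', 'application') yields 'text', while a file type without an entry, such as media_type('mp3', 'audio'), keeps its mime type 'audio'.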

Customization

The following function can be used to define new types.

tx_dam::register_fileType()

That way the internal lists can easily be customized by your own extensions.

The internal arrays themselves are located in dam/lib/tx_dam_types.php and can be modified too, but they may change in the future.
