DAM
1.5.6. Indexing
Here's a short overview of indexing (needs to be extended).
Detection of file changes
Using file checksums allows the detection of externally renamed files and other unexpected file changes.
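To illustrate the idea, here is a minimal sketch in Python. DAM itself is written in PHP, and the function names, the {path: checksum} index layout and the choice of SHA-1 are assumptions made for this example, not DAM's actual implementation. It shows how a stored checksum lets an indexer recognise a file that was renamed outside the system.

```python
import hashlib
import os


def file_checksum(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file's content."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large media files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(path, index):
    """Compare a file on disk with a {path: checksum} index (hypothetical layout).

    Returns 'unchanged', 'changed', 'renamed' or 'new'.
    """
    checksum = file_checksum(path)
    if path in index:
        return "unchanged" if index[path] == checksum else "changed"
    # Unknown path: if an indexed file with the same content is gone
    # from disk, the file was most likely renamed or moved externally.
    for known_path, known_checksum in index.items():
        if known_checksum == checksum and not os.path.exists(known_path):
            return "renamed"
    return "new"
```

Matching on content rather than on path is what makes the rename visible: the path has changed, but the checksum has not.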
Silent indexing
When a new file appears in the File module, it is indexed silently and is available in DAM instantly.
Auto indexing
Index setups can be created with the Media>Tools>Indexing Setup module and stored in folders to define auto/silent indexing.
metaExtract services
The metaExtract services are used to extract metadata from different file types.
Selection trees for indexing time and file status
Two new selection trees allow files to be selected by their current status and by their last indexing time.
Update status module
With the Tools>Update status module the whole index can easily be checked.
Cronjob script and module for configuration
A module for setting up an indexing configuration and a script that can be run as a cronjob.
Mime Types and Media Types
While indexing, DAM tries to detect the file type, mime type and media type of each file.
file type:
This is typically the same as the file suffix.
Examples: pdf, jpg, txt, ...
mime type:
The mime type is used to categorize file types. This system is widely used but is very technical.
Examples: image/jpeg, audio/x-mpeg, application/pdf
media type:
This DAM categorization overlaps with mime types but tries to be more human-oriented.
The following media type categories are used in DAM:
undefined
text
image
audio
video
interactive (interactive media like flash)
service (reference to a service: URL)
font
model (3D model)
dataset (data like spreadsheets or database files)
collection (archives like zip)
software (executable binaries)
application (files that can't be handled without specific software in any way)
An excerpt of the mime type list (http://www.iana.org/assignments/media-types/) is used in DAM. Unfortunately, that list is not exactly what is intended for DAM. For example, 'pdf' is defined there as
application/pdf
and not as
text/pdf
Technically this may be correct, but in a categorization like the one DAM uses one would expect 'pdf' to be of type 'text'. That's why DAM defines an additional internal list. By the way, Open Document formats are also defined as application/* in the mime type list, not as text.
Detecting the types
Here's how media/mime type detection works (example.pdf):
- Get the file extension. This is 99.999% correct in my eyes, so I trust this information: 'pdf'
- If we have a file extension, use an internal array to get a mime type for it: application/pdf
- If the internal list does not know the type, mime_content_type() is used to get a mime type. Now the detection depends on the server's setup. The mime type list is used by different applications, for example the webserver (Apache). Often there's a shared mime type list, but not always. So mime_content_type() is not reliable. That's why DAM uses its own list.
- If we still don't have a mime type, or there's no file extension, use the 'file' command to detect the type.
Now the following types are known:
file_type = pdf
file_mime_type = application
file_mime_subtype = pdf
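The fallback chain described above can be sketched roughly as follows. This is an illustration in Python, not DAM's PHP code: the table INTERNAL_MIME_TYPES and the function detect_types() are hypothetical names, the table holds only a few sample entries, and Python's mimetypes module merely stands in for PHP's mime_content_type().

```python
import mimetypes
import os
import shutil
import subprocess

# Hypothetical excerpt of an internal extension-to-mime-type table.
INTERNAL_MIME_TYPES = {
    "pdf": "application/pdf",
    "jpg": "image/jpeg",
    "txt": "text/plain",
}


def detect_types(path):
    """Return (file_type, file_mime_type, file_mime_subtype) for a path."""
    # Step 1: take the file extension as the file type.
    file_type = os.path.splitext(path)[1].lstrip(".").lower()

    # Step 2: look the extension up in the internal table.
    mime = INTERNAL_MIME_TYPES.get(file_type)

    # Step 3: fall back to the platform's mime type list
    # (stands in for PHP's mime_content_type()).
    if mime is None and file_type:
        mime = mimetypes.guess_type(path)[0]

    # Step 4: last resort, ask the 'file' command if it is installed.
    if mime is None and shutil.which("file") and os.path.exists(path):
        result = subprocess.run(
            ["file", "--brief", "--mime-type", path],
            capture_output=True, text=True, check=False,
        )
        mime = result.stdout.strip() or None

    # Split "application/pdf" into type and subtype; both are empty
    # strings when no mime type could be determined.
    mime_type, _, mime_subtype = (mime or "").partition("/")
    return file_type, mime_type, mime_subtype
```

Consulting the internal table first keeps the result independent of the server's setup; the later steps are only reached for extensions the table does not know.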
An internal array is used to get the media type (text, image, ...) for a file type. This overrides the mime type, so a pdf with file_mime_type 'application' gets the media_type 'text'. If that lookup fails, the file_mime_type is used.
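A minimal sketch of this override, again in Python and purely illustrative: MEDIA_TYPE_BY_FILE_TYPE and media_type_for() are hypothetical names, the table holds sample entries only, and returning 'undefined' for a mime type that is not one of the DAM categories is an assumption of this example.

```python
# Hypothetical excerpt of an internal file-type-to-media-type table.
MEDIA_TYPE_BY_FILE_TYPE = {
    "pdf": "text",
    "jpg": "image",
    "zip": "collection",
}

# The media type categories used in DAM.
MEDIA_TYPES = {
    "undefined", "text", "image", "audio", "video", "interactive",
    "service", "font", "model", "dataset", "collection", "software",
    "application",
}


def media_type_for(file_type, file_mime_type):
    """Return the media type for a file type, falling back to the mime type."""
    media_type = MEDIA_TYPE_BY_FILE_TYPE.get(file_type)
    if media_type is None:
        # No entry for this file type: use the mime type if it is a known
        # category, otherwise 'undefined' (assumption of this sketch).
        media_type = file_mime_type if file_mime_type in MEDIA_TYPES else "undefined"
    return media_type
```

With these sample tables, media_type_for("pdf", "application") yields 'text', while an unlisted file type with file_mime_type 'audio' keeps 'audio'.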
Customization
The following function can be used to define new types.
tx_dam::register_fileType()
That way the internal lists can easily be customized by your own extensions.
The internal arrays themselves are located in dam/lib/tx_dam_types.php and can be modified too, but they may change in the future.