
Please explain Bridge's file/metadata management (incl. Collections, Smart Collections)

October 28, 2017

I searched Google and the Bridge help file in vain (I even looked into a Bridge-specific book, without getting answers); if there are extensive explanations to my questions, please refer me to the relevant deep links, I'll then ask here again for additional info. (I'm interested in Bridge-Windows here; if there are differing details for Bridge-Mac, please explain them, too, in order to make this thread fully informative for everyone.)

From what I understand so far, from third-party info which may be wrong:

- Bridge does NOT have a database (like LR has), so this would mean any metadata is within (ie read from and written to) the pic/media files (in which formats? IPTC? EXIF? Others?)

- Bridge reads and processes the file-system folders, so it's an alternative file manager, just with lots of additional functionality for photo/media-files management

That's why I'd like to know (additionally to the questions above):

- Does Bridge maintain file/folder/path lists in text form (since there is no database?) for handling the info WHERE "things are stored"/"things are to be searched"? (I'm addressing the problem of additional drives being connected only sometimes, and the question of whether files stored on these currently-non-connected drives are "virtually available" in any basic form: filenames/paths (in textfiles/XML?), or even thumbs? In the latter case there would be a db after all, or does Bridge maintain special folders on the main drive, which then would have to be connected at all times for full functionality?)

- As before, but for their metadata; in other words, can currently-physically-unavailable files be searched-for by (all/some of?) their metadata, since that metadata would be listed (in textfiles? (since there is no db, "they" say)) somewhere within Bridge?

- Collections vs Smart Collections: What are Collections? Just all the entries of some specific folder/sub-folder?

- Collections ditto: Since there is also "manual sort": How is this achieved? By a text file within that folder? Or by text files in some central Bridge (sub-)folder, ie a (partial) replication of the real data tree as a metadata tree, folder-by-folder, within the file system?

- Smart Collections: From the info available to me, these are realized by Bridge by metadata, ie by tagging within the file metadata within the original files, and then by stored searches (stored within textfiles within the file system, or within a db after all?) for these tags, so any Smart Collection will be rebuilt "live" whenever the user decides to display the Smart Collection again. So for the above question of currently-not-connected drives: Will files not currently available just be left out, or will they be "displayed" with their filenames/paths (or even with some metadata?) in some text-only "thumbnail", or will they be "fully available" with all their metadata, incl. some real thumbnail (and with some visual indicator that the full-resolution, original file is currently not available)?

- Is there some replication of the file metadata within internal Bridge datasets? Effective management of Smart Collection would imply the necessity for such a thing (as above: textual metadata incl. "tags", or even thumbs), but the available info that Bridge comes without a db would indicate there is no info replication, ie whenever the original files, within their embedded metadata, are currently not available, they cannot even be searched for, by tags (Smart Collections) or other (more technical, classic) metadata.

- The same problem, as above for displaying Smart Collections (which are just stored searches then) and on-the-spot searches for any specific (single or combinations of) metadata, arises for any bulk-changes of metadata, and even for bulk-changes of filenames: Without data replication, anything not-physically-available at the very moment of the bulk-change will be excluded from those changes? And so there will arise naming-inconsistencies? All because there is NO db or other data replication-within-text-or-XML-lists? If there was data replication, functionality could be implemented which would store the changes-to-be-made within db records or text lists, and then any such necessary change waiting to be made physically, could be made whenever the drive in question is connected to the system wherein Bridge and the replica-metadata is installed.

- Finally, if there is really no data replication, and which would mean that any Bridge functionality can only be applied to the files which are physically available currently: Isn't it quite time-consuming if, let's say, 1 million physical files, spread over perhaps half a dozen external physical drives or more, for any metadata (on-the-spot or stored: "Smart Collection") search, have to be searched, one-by-one? (Considering that it's understood that (indexed) MFT searches are incredibly fast indeed, which would be of great help with tagging-by-coding within the filenames, but that on the other hand, pic metadata (IPTC, EXIF, other?) is not listed within the MFT, so that any tool which relies upon this info, but without replicating anywhere for immediate access, will have to access all files one-by-one, in order to first extract, and then only being able to begin to process the relevant metadata info.)

I fully understand that data replication comes with its own set of multiple problems, but since available info is that Bridge comes "without a db" (which also implies that it doesn't maintain data replication by other means (textfiles, XML, other?) either), I would like to know how Bridge does organize my data at the end of the day, in order to not develop any wishful thinking about functionality which simply cannot be there by its conceptual design.

Last but not least, I suppose it would be a good idea to have, within a second step, the answers to my questions above (and perhaps to additional ones, within the same line of discussion) in prominent position within the Intro part of Bridge's help file, since in order to efficiently use a data-management program, ie in order to optimally choose the "architecture" of their data set as a whole, the user should and would need to know how all the data, and metadata, is organized internally and physically.


Stephen Marsh

Community Expert

Note: I believe that this is mostly a user-to-user forum; although Adobe employees sometimes answer questions, this is not always the case, and the nature of your questions is more in line with a software engineer than support.

- Bridge does NOT have a database (like LR has), so this would mean any metadata is within (ie read from and written to) the pic/media files (in which formats? IPTC? EXIF? Others?)

A: Bridge is a file browser, that uses a cache to improve performance and also offers features as you mention (Collections & Smart Collections). I personally would not call these features a "database" in the same context as Lightroom's Catalogue.

- Bridge reads and processes the file-system folders, so it's an alternative file manager, just with lots of additional functionality for photo/media-files management

A: Agreed

- Collections vs Smart Collections: What are Collections? Just all the entries of some specific folder/sub-folder?

A: as per my previous reply #3 from 29th October.

brs27390596Author

Participating Frequently

Hi, Stephen,

In fact, it would be preferable to answer my questions / comment on my remarks/allegations (which may be mistaken in part) in contra-order of my posts above (3-2-1, not 1-2-3) since indeed, some of my initial questions I could answer myself later on.

I hope that some Adobe official (or some user with deep technical knowledge of Bridge) will enter the discussion since I'm not a software engineer but a technical journalist (and a photo amateur), and my own tries with Bridge have not been really convincing while at the same time I have to admit that I left the try very early on: The installation files were very near 1 GB (compare that with your, implied, assertion that Bridge was just a file manager from which we shouldn't ask what a file manager isn't able to give), the installation was a monstrosity, and then I had a real psychological problem with the license and other "Adobe account" questions with regards to my data, so I tried to uninstall Bridge before really trying it thoroughly, and that uninstall went wrong (perhaps my fault?) to such a degree that I had to re-install Windows and all my other programs in order to make Windows and my system stable again.

That's why now, when I'm going to compare Bridge with other DAM tools, I'd like to get the necessary info the "easy" way, ie without having to install and then to quasi-reengineer Bridge in order to know how it works. Also, it is known that while Bridge is free and its (serious) competitors are not, Bridge seems very slow in parts, from what people in the web say, and I try to dissect why this may be the case and, hopefully, how that can be overcome, by smarter decisions re how to use Bridge:

As said above, it would probably have been a good initial idea to only use Smart Collections, avoiding Collections, but from your XML examples above, for which I'm very grateful, it seems that Smart Collections are internally organized as Collections, too, ie in "endless" textfile lists which then will have to be searched/scanned one by one, the partial results then being combined (by addition or subtraction), which, if true (no indexes?), will of course take some time for building up the results list from within really big file sets (which I don't have for real on a system where I could re-install Bridge again just for trying out seriously). Also, XML use could indicate not multiple textfiles but one monster XML file, so I should learn more about XML before making final assertions upon the data gathering, but it's known that XML is much slower than a well-designed db indeed, and it's obvious even now that the ubiquitous (third-party) web info "Bridge isn't/doesn't use a db but is a file manager for pics, etc." or similar is totally misleading since from the above, it's obvious that Bridge FUNCTIONS as if it had a db as its back-end; it just uses (technically inferior and much slower) XML files instead of the db you'd expect from a DAM.

If this sounds like a rant: it isn't: I'm just trying to understand and to (finally, correctly) describe, but as an aside, LR is said to be very slow, too, and I myself once had bought, and then quickly shelved, LR 3 which was unbearably slow indeed (on my then quite slow-processored computer I admit) - the irony being that LR uses a db (SQLite, as said). But then - and I hadn't been aware of this fact then -, LR speed problems also depend, and probably for the better part (but not entirely, it seems from what users say on the web), on the size you want the previews (which it displays instead of the original files) to be. (Unfortunately, technical info from Adobe itself, in the official descriptions / help files, for the user of both of their programs, is too sparse, so that the user has to rely upon third-party info or, at best, Adobe info buried deep in these fora, which gives way to multiple bad workflow design decisions on the part of the user who often discovers by chance only how to organize their things better - I'm speaking of amateur LR users here of course, photo professionals using either program will try to be better informed, but this will cost them lots of effort then, because of the poor availability of technical info.)

From what I know up to now, it seems to me that Bridge was inferior to its competitors so that it would be a good idea to spend the $100-200 (often much more if server / multi-user) they cost, but then, I may be mistaken here again, so a list of Bridge's strong points (USPs) would be more than welcome. And again, what about thumbs of files on drives currently not being connected?

brs27390596Author

Participating Frequently

So I did some research today, and it's not really pretty.

To begin with, when I said "db" above, I of course meant an RDBMS, a "relational" db; for reasons of simplicity, I'll continue to use "db"; also, I called myself an "amateur" re photography, the common English term would be "enthusiast", it seems. So:

Now that I've read some 3 dozen articles about XML or XML vs db, it's perfectly obvious for me that XML for Bridge has been its original design flaw, causing quite incredible amounts of overhead of all sorts, its speed problems, and was a choice which was never justified by anything; it was just a fatal mistake, probably made in some brainstorm in the lines of "Bridge is a file manager [or so it saw itself at its beginnings], the file system is hierarchical, XML is fine for hierarchical data: bingo!"

Whilst in fact, it's fine for little amounts of hierarchical, more or less persistent data, and in which elements have to be retrieved by quite simple criteria, so it's at the very opposite of what Bridge's daily duties are and where all sorts of Boolean queries have to be made, for which db's are optimized, and btw, the "bingo" above was utterly illogical to begin with since Bridge, as a file manager, retrieves the original data in the file system "live", on run-time anyway, so there's simply no need to replicate the hierarchical structure of the original data within a parallel one.

So we have one (only one?) monster file in Bridge, for, let's say, 1 million original pic files, and which then contains 1 million "XML documents", all of them with the (current) path, and with (probably?) all available EXIF/IPTC/other metadata, as far as the pics/etc. in question contain such data: Where in db fields, there would be NULL values, in XML you just leave out the respective data entries; since for any available value, you need those openings and closures (as in html) though, XML files are regularly around 5 times bigger than the same info would be in (also-text) .csv format or in a db.
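The size overhead of the tag markup can be checked rather than guessed. What follows is only a minimal sketch in Python: it serializes the same made-up records once as XML and once as CSV and compares the byte counts. The field names (`path`, `keyword`, `rating`) and the element names are assumptions for illustration, not Bridge's actual format, and the resulting ratio depends heavily on tag names and value lengths.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records; field names are illustrative, not Bridge's schema.
def make_records(n):
    return [
        {"path": f"/photos/img_{i:06d}.jpg", "keyword": "holiday", "rating": str(i % 6)}
        for i in range(n)
    ]

def to_xml_bytes(records):
    """Serialize each record as a <file> element with one child per field."""
    root = ET.Element("files")
    for rec in records:
        item = ET.SubElement(root, "file")
        for key, value in rec.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="utf-8")

def to_csv_bytes(records):
    """Serialize the same records as CSV with a single header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["path", "keyword", "rating"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")

records = make_records(1000)
xml_size = len(to_xml_bytes(records))
csv_size = len(to_csv_bytes(records))
print(f"XML: {xml_size} bytes, CSV: {csv_size} bytes, ratio: {xml_size / csv_size:.2f}")
```

Short values under long tag names push the ratio up; long values (such as full paths) pull it down, so any single factor should be treated with caution.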

It seems you can index xml files (which Bridge possibly does?), but normally, the whole text must be parsed, again and again, for every search, and XML indexes have to be completely rebuilt after any single, even the slightest text change (even 1 single character more or less). There is an alternative with XML IN a db (in CLOBs or, much better, by shredding, ie by pre-parsing the values into db fields which then are indexed normally as any other db field, ie without the incredible XML-index overhead), or XML db's, special db's optimized and amended for such XML-in-db storage, but that's obviously not Bridge's case, and even then the question would remain why ever you would try to put pics' meta data into XML to begin with, XML being of some use for "document-centred" data which pic datasets are not, since "document-centred" use means you regularly need to process the data of ONE document, which is very different from needing the data of every document all the time, in order to create groups of documents, depending on those values, and which could be called "dataset-centred use".
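To make "shredding" concrete, here is a minimal sketch using Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The XML layout, the table name `keywords` and the helper names are hypothetical and chosen for illustration only; nothing here is taken from Bridge. The XML is parsed once, its values land in ordinary table rows, and a normal index then serves the lookups.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML metadata; element and attribute names are illustrative only.
XML_DATA = """
<files>
  <file path="/photos/a.jpg"><keyword>person x</keyword><keyword>locality z</keyword></file>
  <file path="/photos/b.jpg"><keyword>person y</keyword></file>
  <file path="/photos/c.jpg"><keyword>person x</keyword></file>
</files>
"""

def shred(xml_text, conn):
    """Parse the XML once and store each (path, keyword) pair as a table row."""
    conn.execute("CREATE TABLE keywords (path TEXT, keyword TEXT)")
    root = ET.fromstring(xml_text)
    rows = [
        (f.get("path"), kw.text)
        for f in root.iter("file")
        for kw in f.iter("keyword")
    ]
    conn.executemany("INSERT INTO keywords VALUES (?, ?)", rows)
    # An ordinary index; the engine keeps it up to date on later inserts.
    conn.execute("CREATE INDEX idx_keyword ON keywords (keyword)")
    conn.commit()

def paths_with_keyword(conn, keyword):
    """Return all paths tagged with the given keyword, sorted by path."""
    cur = conn.execute(
        "SELECT path FROM keywords WHERE keyword = ? ORDER BY path", (keyword,)
    )
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
shred(XML_DATA, conn)
print(paths_with_keyword(conn, "person x"))
```

After the one-time parse, a search no longer touches the XML text at all, which is the point of shredding as opposed to re-parsing the document for every query.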

References (grouped):

* When would I use XML instead of SQL? - Stack Overflow

* database - Using XML as data storage - Software Engineering Stack Exchange

(Both discussions are "closed", but as we all know, this is nowhere near a negative quality indicator, quite the contrary, since the owner of these sites, Mr. Spolsky, doesn't like discussions except for his own, intra-man ones: "Joel on Software", and so he has closed them almost all.)

XML versus Relational Database Performance | Native XML Database and the author's Matthias Nicola site in general: Native XML Database (not too much IBM-specific but of real interest for anyone trying to store considerable amounts of data in XML), and especially:

* 5 Reasons for Storing XML in a Database | Native XML Database which is of particular interest since it also gives some hints why people would try to store non-XML-ready data within XML to begin with.

Anne Williams: Performance of relational databases versus native XML databases (2005): Abstract: Performance of relational databases versus native XML databases where you can download the pdf: http://hdl.handle.net/10523/1200 - Results and conclusions on pages 30 to 46, with mixed results but it's a student's work, and on page 46: "It was realised late in the research that the ability for eXist to provide text-indexing and support full text searches could be a confounding factor but due to insufficient time to re-run the experiments in a different form this must to be left as a problem for future research." and: "Since this research is concerned with document-centric data, the performance with text-searches is a important aspect that should be further investigated."

XML database - Wikipedia

Nicola again: Native XML versus CLOB and Shredding | Native XML Database

And dozens of other articles, but the three articles I've asterisked fully back up my assertion that unfortunately, Bridge has made a bad choice by choosing XML storage over an RDBMS in 2005. All the more important is my question: where are Bridge's strong points then, over its competitors?

Stephen Marsh

Community Expert

I don’t have the time to go through your various paragraphs now, however on the point of Collections (.filelist) and Smart Collections (.collection) – inspecting their source data one can see a URI path to the Collection file/s and the search criteria used in a Smart Collection:

Collection (Mac OS file path in red):

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>

<arbitrary_collection version='1'>

<file uri='bridge:fs:file:///Users/current-user/Desktop/myfile.png' />

</arbitrary_collection>

Smart Collection (Keyword Does Not Exist):

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>

brs27390596Author

Participating Frequently

It seems that we have 3 different categories: folders, Collections and Smart Collections; Smart Collections being rebuilt "live" from a preset search as described above, and Collections being preset lists, ie the user manually gathers the links, independently of any metadata, so it has been my fault above not to have thought of this conceptual difference. In fact we have 2 categories, folders and collections, with 2 sub-categories within the collections, made by choice (manual gathering) and by search (automatic, criteria-driven gathering, the criteria being preset).

Also, it seems that Bridge stores extensive lists within a c:\user... subfolder subtree, for Collections (lists of full paths), for Smart Collections (lists of the relevant search strings in order to rebuild the Smart Collections upon any call of those search strings), and probably also for folders and Collections (probably not for Smart Collections), in order to maintain lists for the ordering of the list entries in the respective main lists; or then, this additional info is maintained, for non-folders, by coding within the main lists themselves (e.g. a tab, and then the order number in question).

So the quite ubiquitous third-party info that Bridge comes without a db is misleading since instead of a db, it just uses textfiles within a parallel folder tree structure in order to store this metadata, and third-party info that it stores "all" metadata within the files itself, is blatantly erroneous. (The advantage of lists within the file system, over a db, may be the easier "exportability" of that metadata upon export-of-it-all or of data subsets into some other pic manager.)

Also, I have found the info that whenever the original pics/files are not available currently, for a Collection or Smart Collection, Bridge displays the info "file not available" or something, but that does not answer my question re the thumbs:

Since Bridge "centrally" stores SOME metadata at least indeed, does it also store ALL files' thumbs centrally, so that those can be displayed even when the original files are not available?

And what about textual metadata and so on? Any data replication here, for speed and/or for availability reasons? For example, whenever you have a Collection, if the original file is not available, its full-path entry within the textfile is, so Bridge at least can list the missing file, but for a Smart Collection, or for any other text-metadata search, it would not even know there ARE "hits" for the Smart Collection or for that search, since the search string would just leave out the missing files (and their metadata which is only in those original files, then?) from its search?
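The point about Collections can be sketched in a few lines of Python: since the list file carries full paths, a program can at least report which entries are currently unreachable. The sample below is modeled on the Collection file quoted in this thread; the self-closing `<file ... />` form, the second entry and the helper names are assumptions for illustration, and the URI handling is deliberately simplified (a Windows URI such as `file:///C:/...` would need extra treatment).

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Sample modeled on the Collection file quoted in this thread; the self-closing
# <file .../> form and the second entry are assumptions for illustration.
SAMPLE = """<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<arbitrary_collection version='1'>
<file uri='bridge:fs:file:///Users/current-user/Desktop/myfile.png' />
<file uri='bridge:fs:file:///Volumes/offline-drive/other.png' />
</arbitrary_collection>"""

PREFIX = "bridge:fs:file://"

def collection_paths(xml_text):
    """Return the file-system paths listed in a Collection file."""
    root = ET.fromstring(xml_text)
    paths = []
    for node in root.iter("file"):
        uri = node.get("uri", "")
        if uri.startswith(PREFIX):
            paths.append(unquote(uri[len(PREFIX):]))
    return paths

def split_by_availability(paths):
    """Separate paths that exist right now from those that do not."""
    available = [p for p in paths if os.path.exists(p)]
    missing = [p for p in paths if not os.path.exists(p)]
    return available, missing

paths = collection_paths(SAMPLE)
available, missing = split_by_availability(paths)
print("listed:", paths)
print("currently missing:", missing)
```

A stored search has no such list to fall back on: if the metadata lives only inside the files, an unreachable file simply never shows up as a candidate.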

Which clearly indicates, I think, the real need for ALL metadata being replicated centrally, in order to integrate files "non-connected-currently" at least within (Smart Collection and regular) searches, if not in bulk renames, bulk moves and the like; the same for the thumbs, the interest of them being available all the time being, of course, to facilitate the users' decision whether they want to connect the non-connected drive in order to access a relevant file then, or whether those "finds" are probably irrelevant in their current search context.

And what's called "indexing" would be the building-up of index-files (in text form, within the c:\user... file system subtree, or within a db?) with ALL, or just with a subset (which kind of metadata is indexed ie replicated, which one is left out?) of all the metadata stored within the original files?

And where are stored the thumbs? Centrally (ie available all the time) or together with the original files (ie not available whenever the "originals" are not)?

brs27390596Author

Participating Frequently

Also, I see a problem with the Collections. Let me briefly explain the difference between tagging (Smart Collections) and virtual folders (Collections).

For tagging (Smart Collections), the info "belongs to" is coded within the file itself or within some representation of the file, or, for speed reasons, it's replicated from the file to some representation of it, from which it's then fetched for the gathering. For example, you have 2 tags "Persons" and "Person x". If then you gather everything about Person x, you do a (live or stored) search "tag persons" AND "person x"; as you see from my example, we're into hierarchical tagging if a simple search "person x" will suffice but which internally is probably encoded as "px": tag "persons", then "person x" within that parent category/tag; this way, multi-thousand "person x" can easily be retrieved, without this needing much storage, and if you then just want to get the pics where "person x" is displayed together with "person y" or "person z" and in their common holiday in "locality z", you do a combined, "Boolean" search which internally will probably be coded "((px and py) or (px and pz)) and lz", simplified to "lz and px and (py or pz)" (the parentheses are needed here since, by the usual convention, "and" takes precedence over "or").
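The two Boolean forms can be checked with plain Python sets, where `&` plays the role of "and" and `|` the role of "or". The tag index and file names below are made up for illustration; this is a sketch of the logic only, not of how Bridge evaluates a Smart Collection. Note that in Python `&` binds more tightly than `|`, just as AND binds more tightly than OR in SQL and most query languages, so dropping the parentheses changes the result.

```python
# Hypothetical tag index: tag -> set of file names carrying that tag.
tags = {
    "px": {"a.jpg", "b.jpg", "c.jpg"},
    "py": {"a.jpg", "d.jpg"},
    "pz": {"c.jpg", "e.jpg"},
    "lz": {"a.jpg", "c.jpg", "d.jpg"},
}

def long_form(t):
    # ((px and py) or (px and pz)) and lz
    return ((t["px"] & t["py"]) | (t["px"] & t["pz"])) & t["lz"]

def short_form(t):
    # lz and px and (py or pz)
    return t["lz"] & t["px"] & (t["py"] | t["pz"])

def without_parentheses(t):
    # lz and px and py or pz: "and" (&) binds tighter than "or" (|),
    # so this is read as (lz and px and py) or pz
    return t["lz"] & t["px"] & t["py"] | t["pz"]

print(sorted(long_form(tags)))            # ['a.jpg', 'c.jpg']
print(sorted(short_form(tags)))           # ['a.jpg', 'c.jpg']
print(sorted(without_parentheses(tags)))  # ['a.jpg', 'c.jpg', 'e.jpg']
```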

For virtual folders (Collections), though, you have a file-system-textfile "px" which perhaps then lists 5,000 full-paths for person x, you have another textfile "py" which may list 3,000 full-paths for person y, and so on, and if you only use those Collections for "parent categories", like persons, you'll have to maintain a textfile "p" with probably 500,000 full-path entries, for any persons: You see that we have, at the very least, a quite incredible waste of storage needs.

Then, when you try to combine these categories, as in our tagging example above (since you didn't do all these "Collection" assignments as tags, too, or does Bridge do this internally then, automatically?), first: is this even possible? And second, if it's possible, it's quite incredible internal processing: First, thousands of entries/full-paths of the file px must be stored, then thousands of entries of the file py must be compared, one by one, to any entry in the list px; ditto then for all the entries of the file pz (compare to px), and the program must build a sub-list which only retains the px-py joins and the px-pz joins, and from this intermediary-target list, any entry then must be compared with any entry in the file lz.
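How costly such list combining is depends on the method. The sketch below, in Python with made-up path lists named after the example (px, py, lz), contrasts the entry-by-entry comparison described here with a hash-based lookup that reads each list only once. Whether Bridge combines Collections at all, and how, is not known from this thread; the code only illustrates the two techniques.

```python
def intersect_pairwise(list_a, list_b):
    """Compare every entry of list_a with every entry of list_b (quadratic)."""
    return [a for a in list_a if any(a == b for b in list_b)]

def intersect_hashed(list_a, list_b):
    """Build a hash set from list_b once, then do one lookup per entry of list_a."""
    lookup = set(list_b)
    return [a for a in list_a if a in lookup]

# Hypothetical Collection contents (full paths), named as in the example above.
px = [f"/photos/img_{i}.jpg" for i in range(0, 5000)]
py = [f"/photos/img_{i}.jpg" for i in range(4000, 7000)]
lz = [f"/photos/img_{i}.jpg" for i in range(4500, 4600)]

px_and_py = intersect_hashed(px, py)
result = intersect_hashed(px_and_py, lz)
print(len(px_and_py), len(result))  # 1000 100
```

With hashing, the work grows roughly with the combined length of the lists rather than with their product, so even path lists of several thousand entries are not in themselves a performance problem; the redundant storage of full paths remains, though.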

The program may maintain some sort of indexes, in order to speed this up, but in any case, this "file-based" virtual-folder listing would cause lots of unnecessary processing, and that's because for virtual folders, the "belongs to" info isn't stored with the pic-files, but all their paths, for just one single such "virtual folder" (Collection), are listed together, as redundant info and info which only with difficulty could be shortened (you could of course build additional accordance tables for translating shorter path codes into the full-paths).

From the above, it seems quite obvious that users should avoid Collections and just do hierarchical tagging instead, in order to get immediate results even for Boolean combinations of "Collection" criteria, and in order to hold their whole dataset (incl. all metadata internal to Bridge) as slim as possible, except for the fact that perhaps Bridge organizes it all quite differently from my description above? Then: How is the internal organization that Bridge has adopted?

From the above, it becomes obvious at least that in order to sensibly decide how they will best organize their "stuff" in and within Bridge, from a user-system interaction point of view, the users need to know about this internal data organization, so they can make knowledgeable decisions.

And: Did I overlook some aspect when saying virtual folders should be avoided but replaced by hierarchical tagging? Of course, there should be "virtual virtual folders", ie (adjustable) preset simili-virtual-folder structures (adjustable "defaults"), for display of the lists/trees, but which internally would be built up on-the-fly, in real time, by hierarchical tagging.

And it appears obvious to me - hence my question what I may overlook here - that a db is the technical solution of choice for all this, all the more so since not only the db's SQL power becomes available, but that then also any necessary index is built up and maintained/updated automatically by the db engine (which in LR is SQLite; embedded Postgres would probably have been a smarter choice there).

Could it be that Bridge's internal "list" management just derives from its origins, many years ago, and that a total code overhaul, needed for the implementation of a db, has been postponed up to this day, or does the avoidance of a db in Bridge come with real advantages I don't see, and which would outweigh some of the drawbacks of the missing db? (It goes without saying, I think, that a good db design would come with export functionality, at the very least into the ubiquitous .csv format, so that (meta)data exportability reasons would not apply here.)
