The Lightroom Catalog now has a sidecar file called 'lrcat-data', which I understand stores data calculated from AI operations such as masking and denoise.
Since the denoise feature was updated to store the resulting data in the catalog instead of a DNG file, the size of lrcat-data has ballooned to an unreasonable size for many users, myself included. Given that we can only expect this to grow, Adobe needs to give us options to manage it better, such as manually or automatically deleting this data from the catalog, similar to how previews are handled. If the data is needed again it can be recalculated or loaded from sidecar files (storing it in sidecar files is already supported). I expect this is already in the works.
But there is another issue which I don't think has been raised: the lrcat-data file is structured as some sort of blob archive and, at least for me, is very slow to copy. You can reproduce this by copying the file onto a fast external SSD and comparing the transfer rate to that of large media files, or even the lrcat file itself. The transfer rate is significantly slower for the lrcat-data file, compounded by the fact that it is very large. I believe this is because the file system sees it as a folder with many small files, which have much higher overhead when copying, but regardless of the reason it is very frustrating.
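The per-file overhead is easy to demonstrate outside Lightroom. A minimal Python sketch (the sizes are made up; it just creates the same total payload as one big file and as many small files, then times copying each - on most file systems the second case is noticeably slower):

```python
import os
import shutil
import tempfile
import time

def make_payload(root, n_files, file_size):
    """Create n_files files of file_size bytes each under root."""
    os.makedirs(root)
    for i in range(n_files):
        with open(os.path.join(root, f"blob{i:04d}.bin"), "wb") as f:
            f.write(os.urandom(file_size))

with tempfile.TemporaryDirectory() as tmp:
    total = 4 * 1024 * 1024  # same 4 MB payload in both layouts

    make_payload(os.path.join(tmp, "one"), 1, total)              # 1 big file
    make_payload(os.path.join(tmp, "many"), 1024, total // 1024)  # 1024 small files

    for name in ("one", "many"):
        src = os.path.join(tmp, name)
        t0 = time.perf_counter()
        shutil.copytree(src, src + "_copy")
        print(f"{name}: copied in {time.perf_counter() - t0:.3f}s")
```

The absolute numbers depend on the disk, but the gap between the two cases is the overhead being described here.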
Most importantly, this seems to affect Lightroom backups - currently backing up Lightroom takes an unreasonable amount of time for me, and I believe it is due to the slow copying of the lrcat-data file. This should be easy to fix by wrapping the file in a tarball, or with any number of other solutions suitable for backup purposes.
Please fix this - being able to back up our catalog files seamlessly is very important and should be made a priority, even if it takes longer to address the file size issue itself.
[Moved from ‘Bugs’ to ‘Discussions’ by moderator, according to forum rules. This is a real issue, but obviously not a bug.]
I'm in full agreement; this should be changed, or at least the user should be given the option of where they would like that information stored.
The explanation for the slow copy is simple. The .lrcat-data may look like a file, but it is not. It is a so-called 'package'. A package is a folder that macOS displays like a file and that can be double-clicked like a file. To open it as a folder you have to right-click on it and choose 'Show Package Contents'. This is not really a Lightroom issue. Packages are used on macOS for lots of things. Your Lightroom previews and smart previews 'files' are packages as well, but so are almost all macOS applications. Another example is the Apple Photos library. By the way: on Windows computers the .lrcat-data and previews are folders, because Windows has no packages.
The discussion that the .lrcat-data file balloons if you use AI Denoise (or Super Resolution) a lot is not new, and Adobe should definitely consider redesigning the catalog backup process. The main problem is not the copying, however, but zipping the backup. If you look at the backup dialog while the process takes place, you'll see that zipping the backup takes the most time by far. That means you could adopt a strategy where you let Lightroom occasionally make a catalog backup (say once a week), and use a separate backup utility to make daily backups, or a backup each time you have worked in the catalog. I can clone my entire catalog folder (including 200,000 smart previews and previews) in one third of the time it takes to make a catalog backup using Lightroom! I'll use this until Adobe comes up with a new backup system.
P.S. the advantage of a package is that backup utilities can copy only those things inside the package that have changed since the last backup, rather than having to copy the entire 'file'. That is why cloning a catalog folder can be pretty fast. Most backup utilities, including Apple Time Machine, work this way.
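The incremental idea those utilities use can be sketched in a few lines of Python: copy a blob only when its size or modification time differs from the copy made last time. This is a deliberate simplification of what rsync or Time Machine actually do, just to show why the second run over an unchanged package is nearly free:

```python
import os
import shutil

def incremental_copy(src, dst):
    """Copy only files that are new, or whose size/mtime changed since the
    last run - a toy version of how incremental backup tools treat the
    many small blobs inside a package."""
    copied = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        out_dir = os.path.join(dst, rel)
        os.makedirs(out_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(out_dir, name)
            s_stat = os.stat(s)
            if os.path.exists(d):
                d_stat = os.stat(d)
                if (s_stat.st_size == d_stat.st_size
                        and s_stat.st_mtime <= d_stat.st_mtime):
                    continue  # unchanged blob: skip it
            shutil.copy2(s, d)  # copy2 preserves the mtime for next time
            copied += 1
    return copied
```

The first run copies everything; subsequent runs only touch blobs that actually changed, which is why cloning a 500 GB catalog folder can take minutes instead of hours.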
There are plenty of solutions that would enable faster copying of lrcat-data; they simply need to implement one of them, or allow backing up the catalog without that data. It is absolutely a Lightroom issue: the backup feature is practically unusable at this point, which is completely unacceptable. This is a separate issue from the size of the file.
You can't rely on Time Machine to back up your Lightroom catalog, because if it creates a snapshot while Lightroom is open and the file is being edited there is a very good chance the backup will be in an inconsistent state, and possibly unusable. Even if it makes a snapshot while Lightroom is closed, there is no guarantee that will be the snapshot that is retained - it only keeps hourly backups for 24 hours, so if you need to restore after that you'll only have a daily backup, and that might be one that was made with Lightroom open and the catalog file in use.
Of course I can copy the catalog myself or use a manually triggered backup utility or whatever; I posted this in the hope Adobe would address the issues with the built-in backups. Any kind of database backup needs to be done properly, and users should be able to rely on the included backup feature to achieve that. DIYing it when you don't know what you're doing is a recipe for data corruption, and that is what will happen if users are forced to switch to other backup methods because the built-in backups aren't working.
You can't rely on Time Machine to back up your Lightroom catalog, because if it creates a snapshot when Lightroom is open and the file is being edited there is a very good chance the backup will be in an inconsistent state, and possibly unusable.
Of course you can rely on Time Machine too (I use a different utility, however), because you can manually start a Time Machine backup any time you want. So close Lightroom, and then start Time Machine. Because that will be the last backup of the Lightroom catalog for that day, it is the one Time Machine will keep.
Of course I can copy the catalog myself or use a manually triggered backup utility or whatever, I posted this in the hope Adobe would address the issues with the built in backups. Any kind of database backup needs to be done properly and users should be able to rely on the included backup feature to achieve that. DIYing it when you don't know what you're doing is a recipe for data corruption and this is what will happen if users are forced to switch to other backup methods because the built in backups aren't working.
And I already said that I agree. My answer was not to argue about it with you, but to help you and other people who might be reading this, with an idea how to deal with this issue for the time being.
@Sam31717367fpld: "You can't rely on Time Machine to back up your Lightroom catalog, because if it creates a snapshot when Lightroom is open and the file is being edited there is a very good chance the backup will be in an inconsistent state, and possibly unusable."
You don't need to exit LR before making a Time Machine backup.
Time Machine backs up an APFS disk using a point-in-time snapshot of the entire disk. So when you make a Time Machine backup of a LR catalog, all the backed-up files will be from the same instant of time and will be internally consistent.
Recovering the catalog database from such a point-in-time snapshot is exactly the same as if the computer's power was shut off at the time of the snapshot.
@Sam31717367fpld: "Adobe needs to give us options to better manage this, such as manually or automatically deleting this data from the catalog similar to how previews are handled."
Please add your constructive feedback and upvote to this feature request:
"I expect this is already in the works."
I'm not as confident. Unfortunately, Adobe rarely indicates ahead of time what their plans are.
Pruning the .lrcat-data file has a complication: it stores the computed results from all the AI commands, all of which except Generative AI Remove are deterministic. That is, you can remove the computed results and the Update AI Settings command will regenerate the exact same results.
Unfortunately, Generative AI Remove is not deterministic. That's because the Firefly generative algorithm has randomness built in, so even if you send the exact same image and selection to Firefly a second time, it will deliver a different replacement patch. The standard way of handling that is to include a random-number seed in the request to the generative algorithm (and record the seed in the catalog), but LR currently doesn't do that (perhaps because Firefly doesn't provide that capability).
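For anyone curious what "recording a seed" buys you, here is a toy Python sketch. The `fake_generative_fill` function is invented purely to stand in for a generative service (Firefly's real API may not accept a seed at all); the point is only that a seeded generator plus a recorded seed makes an otherwise random result reproducible:

```python
import random

def fake_generative_fill(image_bytes, seed):
    """Stand-in for a generative model: given the same seed it must produce
    the same 'replacement patch' (returned here as 8 pseudo-random bytes).
    A real model would also condition on image_bytes."""
    rng = random.Random(seed)  # private RNG, seeded per request
    return bytes(rng.randrange(256) for _ in range(8))

# Record the seed alongside the edit, and the result is reproducible:
seed = 12345
first = fake_generative_fill(b"raw-pixels", seed)
second = fake_generative_fill(b"raw-pixels", seed)
assert first == second  # same seed -> same patch

# Without a recorded seed, every run effectively draws a fresh one, so
# re-running the command produces a different patch - LR's situation today.
```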
Thus, any pruning of the .lrcat-data file would have to leave the Generative AI Remove results in place. That's not such a big deal, because they're a tiny fraction of the size of the Denoise and Super Resolution data.
It's going to break more and more users' workflows as the files grow larger, so hopefully they realise this and are working on a fix. Otherwise they'll eventually have no choice, because everyone's HDDs will fill up.
Thanks for that info on Generative AI Remove. I don't use that feature much, so I guess I'll just avoid using it going forward in case I have to delete the file. I assume that will make it hard to fix, since even if they start recording the seed, the seed wasn't stored previously and so can't be retrieved for previous edits. Even so, that shouldn't prevent the data being stored in sidecar files, which I think we all agree is the solution.
No, we don't all agree that sidecar files are the solution. I certainly don't. One reason is that I often work with smart previews and offline originals. Just like John, I can't tell you what Adobe's plans are, but I think it would make more sense not to prune the .lrcat-data, but to change the backup system. Yes, the lrcat-data can grow quite large, but it is still relatively small compared to some other things (previews, smart previews if you use those), at least for my catalog. I think the problem is not so much the size of lrcat-data, but the fact that a Lightroom catalog backup copies the entire lrcat-data and then has to zip the whole lot. That is what takes unnecessary time and backup disk space.
So what I would prefer to see is an incremental backup system, that does not save and compress the entire lrcat-data each time, even in cases where nothing has changed (because I only added some keywords, or only made some basic non-AI adjustments), but only saves what has changed. As mentioned earlier, I clone my entire catalog folder as an extra backup. That folder is more than 500 GB and yet it takes just a few minutes to clone it, because it is done incrementally.
As for backups, this is a solved problem. Much more robust databases deal with this: you apply pending changes, lock the db, back up the data, and you can even indicate which changes have been backed up (add a simple UI that flags changes made after the last backup - that way I can always see, in real time, what is backed up and what is not).
Of course it's a solved problem. Incremental backups have existed for decades. The only unsolved problem is that Lightroom Classic does not implement it. Yes, I could use a separate incremental backup utility and not use the Lightroom Classic backup system at all. But Lightroom Classic verifies and optimizes the catalog as part of the backup, and that is not done by a separate utility. Optimizing can be done from within Lightroom Classic as well, but verifying is only available in the backup system.
I could make a long list of things for Adobe to fix in Lightroom. Even things I don't use like Gen AI and online sync obviously need help. :sigh:
The original design sin was including results computed from parametric Develop settings into files included in the LR backup. The preview cache doesn't get backed up, the camera raw cache doesn't get backed up, and the results from all AI commands except Generative AI Remove (e.g. Denoise and Reflections) shouldn't be backed up either.
Different ways the developers could atone for their sin:
1. Separate the storage of the recomputable results (Denoise, Reflections, etc.) from the Generative AI Remove results, and only include the latter in LR backups. Ways of doing that:
a. Introduce a new data file, .lrcat-cache, for the recomputable results from Denoise, Reflections, etc., and don't include that file in LR backups. The .lrcat-data file would contain only Generative AI Remove results and still be included in backups.
b. Always generate .acr sidecars containing the computed results of the AI commands and never store them in the .lrcat-data file. (Many users probably wouldn't like this, since they like keeping their photo folders "clean" and unchanging.)
c. Go back to generating separate DNGs for Denoise, etc. As newer AI commands are introduced that affect the whole photo, would they too generate DNGs? (I'm highly skeptical Adobe would go for this.)
2. Allow the .lrcat-data file to be pruned of older results, similar to how the preview cache is pruned, but continue to include the file in LR backups (so that Generative AI Remove results are backed up). A disadvantage of this approach is that, unlike the preview cache, it's computationally much more expensive to recompute Denoise, Reflections, etc. on the fly than to regenerate previews.
3. Use an incremental-backup method for creating LR backups, so that only the blobs in the .lrcat-data storage that have changed get copied into the incremental backup.
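Option 2's pruning could look much like the age-based sweep used for preview caches. A hypothetical sketch, assuming the cached results are individual files and that the non-recomputable Generative AI Remove blobs can be identified somehow (the `.genai` suffix here is invented, not anything Lightroom actually uses):

```python
import os
import time

def prune_cache(cache_dir, max_age_days=30, keep_suffix=".genai"):
    """Delete recomputable blobs older than max_age_days, but always keep
    the non-recomputable ones (marked by a hypothetical suffix).
    Returns the names of the files that were removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if name.endswith(keep_suffix):
            continue  # Generative AI Remove results must never be pruned
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```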
* * *
I think option 1.a, putting recomputable results from Denoise etc. in a separate data file that doesn't get backed up, would be the most straightforward and simplest for users to manage.
Adobe has implemented option 1.b for Camera Raw, since there is no database there. This is arguably the biggest implementation difference: ACR always generates sidecars because development data can't go anywhere else.
I agree that 1a would be a good solution, and it would be much easier to implement than designing a new incremental backup system.
@johnrellis you should make this one an official feature request, so we can all vote for it.
I think that suggestion 1a from @johnrellis is a sensible solution and should be relatively easy to implement.
@johnrellis solution 1a is definitely easy.
At one point I suggested letting users decide to save AI data either
[1] in the .acr sidecar (at the time LrC was not using this format)
[2] in the lrcat-data
[3] both
Still, Adobe could have seeded the results of Generative AI on their servers, allowing them to be regenerated identically if lost locally.
There is a fundamental complication with your 1.a option.
Suppose the user does not save into XMP/acr.
Suppose the user has no Generative Remove or People Distractions BUT has Enhance, AI Masks and Remove Reflections.
All the AI Mask, Enhance and Remove Reflections blobs would then be in the lrcat-data... which would NOT be included in the backup.
Should the user lose the lrcat-data BUT have a backup, then his/her fate would be to miserably spend hours (maybe even days) regenerating those AI masks, Enhance and Reflections results, in the foreground, unable to work during the entire process.
A productivity nightmare.
• Until the LrC team gives users background processing of AI settings, keeping a backup of Enhance, AI Masks and Remove Reflections is very important for efficiency and productivity.
I dare to suggest a twist on your option 1.a
option 1.aaa (like the batteries)
Have 3 .lrcat-data databases
.masks-lrcat-data
.filters-lrcat-data (Enhance and Remove Reflections)
.firefly-lrcat-data (manual Generative Remove, People Distractions AND Generative Expand if it ever arrives in LrC)
Users could decide which to include in the backup.
I doubt that all users would know what to include and what the consequences are of not including something... If this leads to most people including everything just to be sure, then we're back at square one.
Fundamentally, a backup should ensure that nobody has data loss. Reloading or recalculating results is fine even if it's a PITA. Losing data is bad. Default settings should ensure that any data I have can be reloaded or recreated.
I used to work as a Mac Genius and I've literally seen people sitting in the Apple Store crying because their data wasn't backed up and now it's gone. I've personally lost data over the years and had some additional close calls. A recent one was check printing templates; I had changed systems and somehow didn't copy over all my files. Luckily I have offsite backups at my mom's house and was able to find what I needed on my backup hard drives.
I do always write to XMP, both for redundancy and so I can use Bridge/ACR if needed.
If reloading and recalculating is fine, then one might as well NOT care if the Generative Remove results are lost, since they also can be "recomputed".
At least to me, what is really precious is not the AI content but the time spent to get it.
Losing the AI data (in any form) for me means having to waste my time recomputing... time doesn't come back.
All my AI masks, if lost, can be recomputed and will come back exactly as they were before I lost them.
A colossal waste of time, a negligible loss of data.
Read the notes above. Apparently (I don't use it) Gen AI is not deterministic, which means that if you run it twice, you get different results. Therefore, those features cannot simply be recomputed.
Look, if your computer is stolen, you can buy a new computer, reload your apps, log back into accounts, etc. None of that is lost, even though it's a huge inconvenience and a waste of time. But any of YOUR data that isn't backed up is simply gone.
Nobody stops you from backing up anything you like in order not to waste time. I don't think that Adobe has to take that into account, however. What they do have to take into account is that anything that cannot be recreated and so could get lost forever, is backed up. Nothing more, nothing less.
I have ZERO Generative Remove edits and thousands of AI masks.
The solution proposed by @johnrellis (i.e. two lrcat-data databases, backing up only the generative-content one) would never work for me.
With 3 different blob databases we would have control over what to include in the backup.
And with 3 databases, users that want to back up everything would still be able to do so.