A client sent me a NotePad file to be used as a newspaper legal for importing into InDesign. It has 27,000 delinquent tax entries and there are NO tab breaks for each column (TAX, YEAR, NAME, AMOUNT, COST, TOTAL), only a single space. I have tried every trick I know to turn this into tabbed entries, but nothing I do can differentiate between the single space and what should be a tabbed space.
I timed myself – to enter six tabs manually into 27,000 entries is 162,000 insertions, which takes about one minute each to insert – you literally have to read the whole stupid line to see where the tab should go. That will literally take me 18.75 days to import and create this legal!
I've asked them to send me the original spreadsheet as I could probably manipulate that into a tab delimited file or table, but so far all they sent me is NotePad. 😕
Does anyone know of a trick or script that could help? Please, I'm desperate!
I've done a lot of such conversions from clumsily exported data. Sometimes you can find a "key" that will let you put tabs in the right places, etc.
At that scale, you will save a vast amount of time by going back to the client for ANY original version of the file or a new export using more standard separators. It sounds as if it was exported from Word, Excel or a database by someone who didn't really understand what needed to happen down stream.
Explain to them clearly that you need records and fields clearly separated, and how something like CSV or tab-delimited files should be formatted. I'd ask exactly what they are exporting/retrieving the data from; that might help guide things to a more useful work file.
Thanks. I have years of experience dealing with badly exported files but this is the first time there's absolutely nothing I can do to fix it. I have asked – twice – for the original file and they keep sending me NotePad. I finally told my editor how long it will take me to do it manually and how much extra they'll owe me; I'm supposed to get the original file next week. Fingers crossed it actually happens.
Getting a properly delimited file would be best of course but if YEAR, AMOUNT, COST, TOTAL are all numbers or numbers preceded by a currency symbol, something like this —
Tax Name 2019 Joe Blow $4500.00 $1200.00 $5700.00
Another Name 2022 Fred Snurge $2300.00 $900.00 $3200.00
would it not be possible to use a regular expression to insert the tabs?
Can you show a few example lines?
There are no currency symbols in this whatsoever. This is what the file looks like – a direct copy/paste from the txt file they sent:
Type Year Tax Penalty Cost Total
TID: 30955 1701 R C SARASOTA INV LLC ETAL
6-22-9 LOTS 2, 3, S/2 NE, LESS HWY
REAL 2021 312.25 23.42 14.00 349.67
" have asked – twice – for the original file" -- it's important to realise they may not have any idea what you mean, because there may not be an "original file". You are thinking in terms of "this is in an Excel spreadsheet and they exporting it weirdly". But it could getting to them in an entirely different way. For example
- it's in a database, so they run a report app to get the info (in which case they MIGHT be able to change settings)
- it's handled by OCR of a printed report (in which case there are no columns, no cells, no tabs! Only guesswork)
- the TXT file is sent to them by a third party and it's all they had.
Instead of asking for something they don't understand, you need to explain the problem (in simple terms - perhaps get them to think of it in terms of getting the stuff into Excel), and tell them the cost of you solving it, and also the chance of errors. Conceivably, they employed you to do this BECAUSE they couldn't solve the issues themselves.
This is a county treasurer's office and they are used to providing this information to newspapers to be printed on a yearly basis. This same person has had this job for years. It's in a spreadsheet. It's ALWAYS been in a spreadsheet. That's why I can't understand why they won't send me the original file, which is what has ALWAYS been sent. It's not a matter of me not doing my job – I go above and beyond for my clients. This is the client not providing what's always been provided in the past, simply sending a text file each time I ask my editor to get the original spreadsheet file from them. A county treasurer's office won't use a text file to calculate taxes; they'll use a database or a spreadsheet. I do know my job.
Have you specifically said, "Send me the Excel file or worksheet you are maintaining this information in. I can't format just a text version of it."?
Yes, repeatedly. It's since been solved as I went to their offices and looked at the software they're using. It's a proprietary database without any ability to export/save in anything other than txt. Thanks
>>>It's a proprietary database without any ability to export/save in anything other than txt.
Tab-, comma-, and space (fixed character)-delimited are technically, text.
Yes, but the app has to be able to assign these separators.
There are many of these industrial and public-office and similar apps whose roots go back to... well, if not abacus beads, then at least mini-mainframes. The need to export to formatted file structure wasn't needed.
You can fake it (as we did with the warehouse app) by inserting dummy fields with commas or whatever in the report structure, but it ain't easy.
Can you share about 10 (redacted or not) lines of this file?
Like others already mentioned - show us an example.
But - if you have access to WORD / Excel - it would be pretty easy to "fix" the data.
1a) import your file into WORD - convert into table with space as a delimiter,
1b) import your file into Excel - convert text into columns - space as delimiter,
2) cut first two columns - TAX & YEAR - and paste into new document,
3) cut last three columns - AMOUNT, COST, TOTAL - into same new document - as additional columns, 4) in the new document - insert new column - after 2nd column - between YEAR & AMOUNT, 5) join columns you've left with in the source WORD (1a) / Excel (1b) file and paste it into 3rd column in the new document, 6) either convert into TAB delimited file or copy&paste directly to InDesign.
Or you can do all those steps in the InDesign itself - just import as plain text and convert into raw table and then above steps - creating extra table(s).
See my next reply - sorry, should have gave it a bit more thought before posting - but still doable.
Ok, after a bit of thinking - it may not be as easy but perfectly doable 😉
You need Excel - to get last three columns, you'd have to use few "=if(...)" conditions to find and transpose data.
OP notes they're used to working with crummy data. If all fields were the same length (in characters or words), it would be trivial to use Word or NotePad++ macros to delimit them. However, from the description of the info, most of those fields could have one to several words or word fragments.
Were it not for the sheer scale, I'd say do the best with macros and then hand-edit, but (1) as the OP notes, this could take tens of man-hours and (2) as others note, all this info comes from SOMEWHERE and it's very unlikely that it's not organized in a manner that would permit a CSV or tab-delimited output.
I agree that maybe the return request hasn't been clear; it's obviously going to someone who has access to a database or equivalent and doesn't understand that a raw text dump isn't usable. Make a very clear request for a comma or tab delimited file... and someone will understand that.
Hi, everyone. I just wanted to thank you all for your help. I went to the treasurer's office and looked at their software. It's proprietary and unable to save or export in ANY format other than text. I even spoke with their IT consulting company and confirmed the inability to save in any other format. So we're stuck with what we have.
Thanks again for all your help!
Wow, haven't run into that since a client who used a DOS-era warehouse management app that was at the core of EVERYTHING. I had to write elaborate post-processing stuff to get from 128-char ASCII to any useful format at all... but despite the endless headache they wouldn't upgrade to a tool from, say, the 1990s.
It's been fun. Good luck!
How are you supposed to present the data?
1st line starts from "TID:"
2nd line - address?
3rd line - or last line of the record - REAL XX.YY......
Do you have to show everything or only some info?
Then there are names of regions/places - PETITTE?
What's more - there is constant number of empty spaces in front - that would be useful.
Is this what you are looking for ? 😉
A bit "dirty" - extra spaces - easy to remove 😉 just wanted to show you that it can be done.
And next time you should upload example file to your opening post 😉
That's very close to what I'd like it to look like. Don't know how you managed to get it done or how long it took. I have less than one week to format and get this into the paper.
I've converted your txt file to tab delimited "file" and converted into table - what format exactly do you need? Show me end result that you are looking for.
We can go on priv if you prefer.
Or GREP will do the trick.