Re: split function issue by Bob
Bob
Wed Feb 28 17:31:42 CST 2007
RdS wrote:
> I am using the split function but have run into an issue. A spreadsheet has
> been saved to csv, so my split function looks for a separator of ",". All works
> well except when there is a comma inside quotes. As you know, if there is
> a comma in a field, Excel places quotes around the value when it exports.
Not just commas.
Newlines (CR/LF pairs) can also occur in quoted strings, so splitting the file
into lines before parsing can cut a record in two.
>
> for example:
> 1,test,"user, test",5590, xyz
>
> How can I overcome this issue? Is there some other function I can use?
I have used two approaches, though there are probably many more.
One is to write an actual character-based parser that walks the data linearly,
tracking whether it is currently inside a quoted string.
The split() function is not of much use for that.
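A minimal sketch of such a character-based parser, in Python purely for illustration (the thread does not say which language is in use). The name parse_csv and its parameters are hypothetical, and it assumes a quote character inside a quoted string is written doubled (""), which is what Excel does on export; other notations would need different handling.

```python
def parse_csv(text, sep=",", quote='"'):
    """Parse CSV text one character at a time; return a list of rows."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # assumed rule: "" means a literal quote
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                field.append(ch)         # commas and newlines are plain data here
        elif ch == quote:
            in_quotes = True             # opening quote
        elif ch == sep:
            row.append("".join(field))   # end of field
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                   # treat a CR/LF pair as one line break
            row.append("".join(field))   # end of record
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or row:                     # last record without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows


print(parse_csv('1,test,"user, test",5590, xyz\r\n'))
```

Because the whole text is scanned in one pass, a CR/LF pair inside quotes stays part of its field instead of ending the record.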
The other is to adapt the approach to the pre-chosen tools, such as split(), by recognizing
the peculiarities of quoted strings. When I take this approach, the first split() I apply
is on quote characters, not commas. In the returned array, counting from zero, the odd-numbered
elements are the contents of quoted strings and are kept intact, while the even-numbered ones
are the unquoted text in between and can then be split on commas.
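The split-on-quotes idea can be sketched as follows, again in Python only as an illustration. The function name split_csv_line is hypothetical, and the sketch assumes a single complete record with no quote characters inside the quoted strings and no embedded newlines.

```python
def split_csv_line(line, sep=",", quote='"'):
    """Split one CSV record by splitting on quotes first, then on commas."""
    pieces = line.split(quote)
    fields = [""]
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Odd index (counting from zero): text that was inside quotes.
            # Keep it whole and attach it to the field being built.
            fields[-1] += piece
        else:
            # Even index: text outside quotes, where commas are separators.
            parts = piece.split(sep)
            fields[-1] += parts[0]    # continues the current field
            fields.extend(parts[1:])  # each further part starts a new field
    return fields


print(split_csv_line('1,test,"user, test",5590, xyz'))
```

On the example record from the question, the comma inside "user, test" survives because that piece sits at an odd index and is never split on commas.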
Bob
Note:
If it is possible for quote characters to appear within strings, then it is
critical to know ahead of time *precisely* which notation rules will be used
(doubled quotes, as Excel writes them, or backslash escapes, for example).