Last time, we learned how to convert strings to floats and ints. This time, we'll learn how to process a comma-separated value, or CSV, string. Why might we want to know how to process a CSV string? Well, it's a very common form of textual data. You can output CSV files from Excel or OpenOffice or Google Sheets, so lots of times people will save data that needs to be processed by a separate program as a CSV file. It's fairly uncommon to have people input data as comma-separated values, but the techniques we learn in this lecture will help us later in the course when we learn how to process CSV strings from a file. What ideas do we need to use to actually do this processing? Pretty much everything we've learned about strings so far.

So let's get to work. I've already written all the code we need to solve this problem, but we'll take some time walking through the code to make sure that we understand how it works. We're starting with a user-defined type, the vector that we've used off and on throughout the course so far, and of course I typedef it as well, and then I've declared a variable to be of that type. So the big idea is that we are going to read in a comma-separated value input from the user, and then we're going to populate this point variable with the x and y components of that user input.

We've seen this stuff before when we were getting string input. It's exactly the same, just the variable name has been changed. Now we're going to find the commaIndex in the code, and I've written some more robust input validation code than we've used so far when getting string input. I initialize commaIndex to negative one, which we know can't be a valid index, so we'll use commaIndex as a flag to tell us when we finally have a valid input from the user. We've seen this before too, because we're going to use the built-in function to find the comma in our pointString and put the result into a character pointer.
While we haven't gotten valid input yet, we'll find the comma in the input, and if the result isn't null, then we do what we saw before when we were searching a string using pointers. We get a pointer to the start of the pointString, the user input. So this is like the base address of the array. In fact, it is precisely the base address of the array. Then commaIndex is result, the address of the comma character, minus the base address of the array, and that gives us the index. Remember, this works because characters are one byte long. If we were doing something else, we'd have to figure out a different way. If our result is null, that means there was no comma, so we print an error message and we ask again.

Of course, this isn't totally robust, right? The user could enter Bob comma Joe and we would act like it was correct because at least it has a comma. But this is a preliminary cut at making sure that they've given us the appropriate format. You'd really have to do a lot more error checking, checking for spaces and making sure they'd actually entered numbers and everything, to really confirm you've got valid input from the user. But I wanted to show you an example of the general structure we would use to do that.

Now, we're going to extract x from the pointString. I've included this commented-out code because some compilers let us create an array that has a size that we've determined at run time. Of course, Visual Studio doesn't let us do that. We've learned in the array lectures that we can use pointers instead. So we'll declare a string, a pointer to a set of characters, and we will allocate sufficient memory for it: commaIndex plus one, times the size of a character. How did we know how much memory to allocate? We want to take commaIndex and add one to it to make sure that we have space for the null terminator for our string, and allocate that much memory. So let's think of it this way. I'm going to use a running example of three comma four.
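Here is a minimal sketch of the comma search and the validation loop just described. It assumes the built-in function mentioned in the lecture is `strchr`; the function names `find_comma_index` and `read_until_comma`, the prompt text, and the error message are hypothetical and are not taken from the lecture code.

```c
#include <stdio.h>
#include <string.h>

/* Returns the zero-based index of the first comma in input,
   or -1 if there is no comma. (Hypothetical helper name.) */
int find_comma_index(const char *input)
{
    /* Assumption: strchr is the built-in search function the lecture refers to */
    const char *result = strchr(input, ',');
    if (result == NULL) {
        return -1;
    }
    /* Address of the comma minus the base address of the array gives the index */
    return (int)(result - input);
}

/* Keeps asking until the input contains a comma.
   Returns the comma index, or -1 if input ends before a valid line is read. */
int read_until_comma(FILE *in, char *pointString, int size)
{
    int commaIndex = -1;   /* -1 is the "no valid input yet" flag */
    while (commaIndex == -1) {
        printf("Enter point (x,y): ");
        if (fgets(pointString, size, in) == NULL) {
            return -1;     /* no more input available */
        }
        commaIndex = find_comma_index(pointString);
        if (commaIndex == -1) {
            printf("Input must contain a comma\n");
        }
    }
    return commaIndex;
}
```

With the running example, `find_comma_index("3,4\n")` gives 1. As noted above, this check only confirms that a comma is present; it does not confirm that the pieces on either side are numbers.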
The commaIndex when we find it in three comma four is one. But we actually need two characters for our xString. We need a character for the three, and we need a character for the null terminator. So taking commaIndex and adding one gives us just the right size for a well-formed, null-terminated string.

Then I use something called string copy to copy from the user input string into xString. I'm going to use strncpy here because it writes exactly n bytes, or n characters. Remember, characters are bytes, and I know exactly how many I want to copy over. I want to copy over the characters up to, but not including, the comma. So let's go look at the documentation for that function. We see that since C99, I do the following thing. I pass in a destination, and that's just the xString I'm copying into. I pass in a source string, and that's the pointString. And I pass in how many characters I want to copy. It's important to notice this sentence here: "If the count is reached before the entire array source was copied." In other words, if we're not copying all of the pointString, and we know we're not because we're only copying up to and not including the comma, then the resulting character array is not null terminated. So I knew that I needed to manually null terminate my string because the documentation told me that that's what would happen.

As we saw in the documentation, xString is the destination, pointString is the source, and commaIndex is how many characters we want to copy. In this case, we only want to copy one character. We copy the three over, and then we add the null terminator to xString. Again, because we're using zero-based indexing in our array, commaIndex is one, xString zero holds three, so xString one should hold the null terminator, just like we expect. I've popped up this overlay so you can see what we've done so far. In this particular example, commaIndex is one and xString has two characters in it.
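The allocation, copy, and manual null termination described above might look like the following sketch. The helper name `copy_x_string` is hypothetical, and returning NULL when allocation fails is my own assumption rather than something the lecture shows. The caller is responsible for freeing the returned string.

```c
#include <stdlib.h>
#include <string.h>

/* Copies the characters before the comma into a newly allocated string.
   Returns NULL if memory could not be allocated (an assumption, not from the lecture). */
char *copy_x_string(const char *pointString, int commaIndex)
{
    /* commaIndex characters of data plus one for the null terminator */
    char *xString = malloc(((size_t)commaIndex + 1) * sizeof(char));
    if (xString == NULL) {
        return NULL;
    }
    /* Copy everything up to, but not including, the comma */
    strncpy(xString, pointString, (size_t)commaIndex);
    /* strncpy stopped before the end of the source, so terminate manually */
    xString[commaIndex] = '\0';
    return xString;
}
```

For the running example, `copy_x_string("3,4\n", 1)` allocates two characters: the three at index 0 and the null terminator at index 1.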
In xString, those two characters are the three and the null terminator. Then finally, we use atof, which we learned converts a string into a float, and we put the result into point.x, because that is the x field or member of our vector structure. Remember my discussion back when we were working with arrays: if I've allocated memory, then I'd better free up that memory as well. So this is about using pointers, but I am again showing good programming practice. Now that we're done with the xString, point.x holds our float, so we don't need xString hanging around anymore. We can free that memory and null the pointer.

Okay, this next chunk of code is debugging code. I will freely admit to you that when I got to processing the yString part of the user input, it wasn't working, and I did not understand why it wasn't working. So I wrote some debug code that I could use to print out the actual contents of the string, so I could understand what was happening. I certainly could have used the debugger and stepped through one character at a time to look at each thing, but I just wrote some debug code in this particular case. As you can see, I've added to the overlay to show that the string we actually get returned from fgets includes the newline character from when the user pressed the Enter key. I assumed that it didn't, and that's why the code that we haven't looked at yet wasn't working. It's not uncommon that when you write code it doesn't work exactly as you expect it to, and looking at what you're trying to process and making sure you understand it, either through printing stuff out or through using the debugger, is a big help.

Now it's time to extract y from the pointString. We don't want to include that newline character in the pointString length. So I'm going to calculate length in the usual way, except that after string length tells me the length, which includes the newline character, I subtract 1.
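As a rough sketch of that kind of debug code and of the length calculation, something like the following would work. Both function names are hypothetical. One deliberate difference from the lecture: this version only subtracts 1 when the last character really is a newline, since fgets can return a line without one (for example, when the buffer fills up or input ends first).

```c
#include <stdio.h>
#include <string.h>

/* Debug helper: prints each character's index and numeric code so that
   non-printing characters such as the newline become visible. */
void dump_string(const char *s)
{
    int i;
    for (i = 0; s[i] != '\0'; i++) {
        printf("[%d] code %d\n", i, (int)s[i]);
    }
}

/* Length of the real data in the string, not counting a trailing newline. */
int data_length(const char *s)
{
    int length = (int)strlen(s);
    /* Assumption beyond the lecture: check for the newline before subtracting */
    if (length > 0 && s[length - 1] == '\n') {
        length = length - 1;
    }
    return length;
}
```

Calling `dump_string` on the input read by fgets for the running example prints four entries, the last one being the newline, while `data_length` reports 3.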
So this length right here is the length of all the real data in the string, not including the newline. Next, I'm going to allocate space for my yString. I need to allocate length minus commaIndex, times sizeof char. Why length minus commaIndex? Let's look at our example again. Length is three: the three, the comma, and the four. So there are three characters. The comma is at index one. So in this particular example, length minus commaIndex, 3 minus 1, is 2. That makes sense. I need two characters for my yString. I need the four, and I want a well-formed string, so I need to allocate space for the null terminator as well.

This next line of code calculates an offset. Here's what we're going to use the offset for. yString starts at zero, but the place that we're copying from in pointString is not zero. It's commaIndex plus 1. We'll see in just a moment that this offset helps us index properly into the pointString as we're copying pieces into yString. I'm using a for loop to do it. I start i at zero, I loop while i is less than length minus commaIndex minus 1, because I don't want to grab that newline character, and I increment i. Then I set yString i equal to pointString i plus offset. So let's do the first one, which is the only one in this particular case. When i is zero, we're going to set yString 0 equal to pointString 0 plus 2. So pointString 2, which is the character 4. Our for loop only runs once in this particular case, but we could have a longer number as well.

Then finally, we put the null terminator at the end of yString, at index length minus commaIndex minus 1. So let's do that math. Length is three, commaIndex is one. So 3 minus 1 minus 1 is 1. So at yString 1, we put the null terminator. yString 0 holds 4, so yString 1 holds the null terminator. I've popped up more in the overlay to show that final result.
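The loop-and-offset copy walked through above can be sketched like this. The helper name `copy_y_string` is hypothetical, and returning NULL on allocation failure is my own assumption. Here `length` is the data length without the newline, and the caller frees the returned string.

```c
#include <stdlib.h>

/* Copies the characters after the comma (excluding the newline)
   into a newly allocated string using a for loop and an offset. */
char *copy_y_string(const char *pointString, int commaIndex, int length)
{
    /* (length - commaIndex - 1) data characters plus one null terminator */
    char *yString = malloc((size_t)(length - commaIndex) * sizeof(char));
    int offset = commaIndex + 1;   /* first character after the comma */
    int i;

    if (yString == NULL) {
        return NULL;               /* assumption: report failure to the caller */
    }
    for (i = 0; i < length - commaIndex - 1; i++) {
        yString[i] = pointString[i + offset];
    }
    /* For "3,4": 3 - 1 - 1 = 1, so the terminator goes at yString[1] */
    yString[length - commaIndex - 1] = '\0';
    return yString;
}
```

With the running example, `copy_y_string("3,4\n", 1, 3)` runs the loop once, copying the four from pointString index 2 into yString index 0.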
Finally, we convert the yString to a floating point number and populate the other field or member in our vector variable, freeing memory as we always do. So this was a manual way to copy from the character after the comma to the end of the string, not including the newline character, into yString and convert it to a float. This may feel much more familiar to you because this is just array processing, and we've learned how to work with strings. The offset is a new idea, but it's just math, so that shouldn't be too hard for you.

Because we haven't really covered pointers in a lot of detail, I didn't use string copy with pointers as my base solution, but I've included it down here as an example of how we could have done this with pointers, and it would have been much easier to do than all this. Let's go back and look at the string copy documentation one more time before we talk about the details. You can see here that it actually says that we're passing in a pointer to characters as our source. We've been passing in the name of a string, a character array, and it figures it all out just fine automatically for us.

So we declare a character pointer that points at the very first character in pointString that we want to copy over into yString. We start at the base address of the array, then we add commaIndex to move us to the comma, and then we add one to move us past the comma. So it's the memory address of the start of the user input, plus commaIndex to move us past the 3, plus 1 to move us past the comma. That gives us yStart, a pointer to the four. That's really what it is for this particular example. Then I can do a string copy into yString, my destination. I actually still need to do this piece: I still need to allocate memory for the yString. So I string copy into yString starting at the yStart character, and here's how many characters I copy. As we already discovered, that's one: we're copying over just the four. We still need to null terminate the string.
I'm going to fix this because this needs to be minus 1. Then finally, we convert yString to a floating point number like we did above. So this is a lot less work, but it requires that you understand pointers perhaps a little better than you would reasonably be expected to understand them at this point in the course, based on the minimal ways in which we've been using pointers. So the array-based approach is a perfectly reasonable one that might be more familiar to you.

Finally, after all that hard work, this is the easy part. We're just printing out the x and y fields in our vector for the point. Let's actually finally run the code. First, I'll do it with the running example of 3,4, and as you can see, it extracts 3 and 4. We'll run it one more time just in case you think it only works for one-character numbers for x and y. So 3.456, 7.890, and as you can see, it works fine for those numbers as well.

To recap, in this lecture, we learned how to process comma-separated value, or CSV, strings.
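For reference, the pointer-based alternative discussed earlier might look like the sketch below. The function name `extract_y_with_pointer` is hypothetical, and returning 0 when allocation fails is my own assumption. As before, `length` is the data length without the trailing newline.

```c
#include <stdlib.h>
#include <string.h>

/* Extracts y by pointing strncpy at the first character after the comma. */
float extract_y_with_pointer(const char *pointString, int commaIndex, int length)
{
    /* Base address, plus commaIndex to reach the comma, plus 1 to move past it */
    const char *yStart = pointString + commaIndex + 1;
    char *yString = malloc((size_t)(length - commaIndex) * sizeof(char));
    float y;

    if (yString == NULL) {
        return 0.0f;   /* assumption: the lecture does not show failure handling */
    }
    /* Copy only the data characters, stopping before the newline */
    strncpy(yString, yStart, (size_t)(length - commaIndex - 1));
    /* strncpy does not terminate here, so do it manually (note the minus 1) */
    yString[length - commaIndex - 1] = '\0';

    y = (float)atof(yString);
    free(yString);     /* free the memory we allocated */
    yString = NULL;    /* and null the pointer */
    return y;
}
```

For the running example, `extract_y_with_pointer("3,4\n", 1, 3)` copies one character, the four, and converts it to 4.0.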