So we talked a little bit about how the different kinds of commands do the CRUD operations and WHERE clauses and things like that. Now we're going to talk about what kind of data types that you can define in the schema of your tables. So we've got some text fields, we've got binary fields, sometimes those are called blobs, numeric fields, and then auto-increment fields. VARCHAR was the one that we did before and that's an efficient storage for characters of varying length from one character up to 128 or whatever. CHAR is a fixed space. Usually we use these for shorter strings. I mean, once your string gets above 64 characters you might have a character that has a blah blah blah blah blah and then it ends and you don't want to waste the time for that. But there are certain time places in databases like for GUIDs, Global Unique Identifiers, that you know they're going to always be like 64 characters and so CHAR is used for that. I tend to use CHAR when I know like I'm going to store hash, which is exactly 64 characters, or a GUID which is exactly 64 characters. When you can do that, database loves you, especially if you're going to fill it all up. If you're not filling it all up you're not doing yourself any favors. But if you have 64 and it's always 64 and it's never lower, then the database actually kind of loves you if you can say that. So use VARCHAR unless you really know that it's kind of a fixed thing. Sometimes you have fixed things that are only like two characters long, it likes that too. So you would never say VARCHAR 2. I don't kbnow, somewhere between 64 and 128. I would always use VARCHAR, say, above 64 just kind of my own, whatever. And the way it allocates it is the VARCHAR has a count and then the actual data, so you can pack it in a lot more tighter. Whereas the CHAR is just chunk, chunk, chunk, which means it's kind of uncompressed in a sense. But if you're going to fill it all up, there's no way to compress it. Okay. Text fields. This is what I talked about it before is if you want a text field and you don't want to have the database enforce a length on you, well, then just call it a text field. And these can be short, medium, or long. There's a count at the beginning, but the count, it's not that big of a deal. The key thing is that you generally don't use these for indexing and sorting. So you don't use these in a ORDER BY, you don't want to use these in a WHERE clause, you might use them in a LIKE clause knowing that it's going to be a full table scan. So if you're doing like a blog post and comments, or even just whole web pages that you're storing because you're building a spider to read web pages, text is great. And Postgres only has one, which I really like, lots of other databases have like a whole long list of different kinds of texts. But basically, you kind of have the things where you know the length and the things that you don't know the length and so that I think is a really simplifying notion that Postgres has. But it also has a character set. So what I mean when I say both the CHAR, VARCHAR, and TEXT have character sets, that means that they are not simple eight-bit characters and so the sort of traditional Western character set that we generally call Latin-1 or ASCII, that has a 127 characters and it fits into eight bits, and that's called a byte, which we're going to see in a second. And so you know that it's very efficient. Whereas when you have characters outside of the Latin character set, which might be like a cedilia or a Asian character set, those are longer than eight bits and so those are sometimes they can be eight bits, they can be 16, and they could even be 32 bits. And so if you have 100 characters that have a character set, that could be up to 400 bytes. But the nice thing is that we humans, when we're typing into forms or typing comments in a blog post, we're typing in those character sets and so you can't just say well, I'm going to write a web application that doesn't accept Asian characters or a web application that doesn't accept Spanish characters that there's only like the ASCII characters. So it's really important for databases to know the things that have character sets. And also in different character sets the sorting is different. So these are all the things that the database people have figured out for us. So character sets are very important, indexing, sorting, end-user input, and so CHAR, VARCHAR, and TEXT handle character sets. Now, why do I make such a big fuss about that? You also can store things that don't have character sets. Now, you could do this for like a GUID, a global unique user ID which you know is just numbers and letters A through F, which could fit in one of these things, and you know that. It's a string that has up to a 128 different characters in each one. So this BYTEA is the place that we store data that we're not letting the database know its character set and so you can store blobs of information, small images, etc. Integer numbers, there's a couple of different integer sizes. The normal integer that we use is a 32-bit integer that's 2 billion. That's used for most situations and then you can save space with a small integer, but then you're limited on the range. And big integers are much larger than 2 billion. But mostly we just make integer columns. You can have a number of different floating point columns. REAL is a 32-bit floating point that has seven digits of accuracy. And what we mean by accuracy as it has seven accurate digits, but you can put the decimal point anywhere in there. We've always had these 32-bit REAL numbers in all of computation from the beginning of time that 32-bit number with seven digits of accuracy is really only good for approximate computations. If you're taking like an average of the weather temperature over a long period of time, it's probably good enough. But if you're doing some real precise stellar calculation where you're calculating forces as like stars smash into each other, REAL is not good enough because then the errors creep in because it's only got seven digits of accuracy. So that's where DOUBLE PRECISION, we call it DOUBLE PRECISION because it's twice as big and usually for scientific computations where you're going to do lots of computations over and over and over again, like in a simulation, you always use DOUBLE PRECISION. Now it turns out that neither of those are good for money because the way fractions are represented in REAL and DOUBLE PRECISION, are there fractions with powers of two in the denominator? And so it turns out that like in America you have dollars and cents. Well, there's a 100 cents in the dollar so a cent is one-one hundreth of a dollar, but that one-one hundreth is not accurately represented in REAL or DOUBLE PRECISION. So we have this NUMERIC thing where you say okay, I would like two digits and I'd like 14 digits with two digits of decimal, and then it's perfect. So you can't represent money and there's lots of movies that like talk about like what happens when you can't represent money accurately where people like take the fractions of interest that are not represented in REAL and whatever, but if you're doing money, use NUMERIC. Dates, lots of things are important in dates. There is sort of a date and a time, but the thing we tend to use a lot is a thing called a TIMESTAMP. A TIMESTAMP is a 64-bit number and it represents minutes and seconds from 4713 BC to this big long AD. Years ago, this was a 32-bit number and we had this problem that the Unix time, which was the number of seconds since January 1 1970 and 32 bits, and that was going to run out of space in 2038. And so here's another xkcd. There's an xkcd comic for just about everything nerdy. And so clearly Postgres has switched to 64 bits for their timestamps. And if they hadn't, if they've stayed with 32-bit timestamps, in 2038 all the Linux systems and all the databases were going to blow up. But when they go to 64-bit, we can go to almost 300,000 AD without running out of space. And I'm not going to worry about that. Okay? Like 2038, we're getting to 2019, and 2020, and 2022. 2038 was coming, but they just went to 64 bit, and all these computers are 64 bit, and all these databases are 64 bit. So we don't have to worry about timestamps running out of time in 2038. So up next, we're going to talk about the things I've been talking about all along about how these things work with performance and how fast they go, etc., etc.