One of my missions in life is to make getting proficient on the mainframe as simple and painless as possible. However, there is such a thing as oversimplifying, and I get a little set off when I hear someone explain data sets as, "Well, a data set is just a file." It's a dangerous generalization to make, because for a few basic examples, yeah, a data set looks like the same thing as a file, and they do share a lot of the same characteristics. But once you dig just a little bit deeper, you'll see there are some very important differences to know about.

So first off, a data set can be either sequential or partitioned. A sequential data set is full of entries called records, and they go one right after the other, after the other, like a stack of pancakes. A partitioned data set has a directory and members. The directory keeps track of all the members, so they don't have to be in order, and it also makes it possible for each member to be accessed directly. This is what most closely resembles the files in a folder structure you're probably familiar with.

So what do we use these data sets for? We use data sets for storing data. If you're dealing with data that comes out of a program, it typically comes out line by line, and each line is stored as a record. In a sequential data set, you've just got record after record, output after output. It could be a series of passenger numbers, or a listing of company names. In a partitioned data set, that same data gets stored as records within the PDS members. If you're writing or reading a text file, each line in that file is a record. So in a sequential data set, records go into a data set, and in a partitioned data set, records go into members, which go into a data set.

There are two kinds of partitioned data sets: PDSs and PDSEs. PDSEs are an enhancement to the original PDS design. They have greater flexibility in sizing, faster searching, and they automatically reclaim space when members are deleted.
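To make the sequential versus partitioned distinction a bit more concrete, here is a rough Python analogy. It is only a mental model, not how z/OS actually stores anything, and the record contents, member names, and function names are all made up for illustration.

```python
# A sequential data set: records stored one after another, in order.
sequential_data_set = [
    "PASSENGER 0001",
    "PASSENGER 0002",
    "PASSENGER 0003",
]

# A partitioned data set: a directory that maps member names to records.
# (Hypothetical member names and contents.)
partitioned_data_set = {
    "AIRLINES": ["ACME AIR", "BLUE SKY"],
    "HOTELS": ["GRAND INN"],
}


def read_sequential(data_set):
    """Read every record from first to last; there is no directory to jump with."""
    return [record for record in data_set]


def read_member(pds, member_name):
    """Use the directory to go straight to one member's records."""
    return pds[member_name]


print(read_sequential(sequential_data_set))
print(read_member(partitioned_data_set, "HOTELS"))
```

The point of the analogy is the shape: records go straight into a sequential data set, while in a partitioned data set records go into members, and the directory is what lets you reach a member directly.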
There are only a few places in z/OS where you're required to use a PDS instead of a PDSE. Everywhere else, people just use a PDSE.

There's also a very powerful type of data set called a VSAM data set. That stands for Virtual Storage Access Method, which is another acronym I had to look up just to tell you what it stands for, because everyone just calls it VSAM. So just call it VSAM. A VSAM data set uses a series of keys to reference the actual pieces of data being stored. It quite closely resembles something like a NoSQL database, and it's used primarily by applications directly, since you can't display or edit its contents in something like ISPF. There's a lot to VSAM, and it has many uses. But because we'll be using ISPF for many of our labs, we'll be focusing on partitioned data sets.

A data set name can be anywhere from one name segment to a series of up to 22 joined name segments, called qualifiers. So JEFF.LOVES.PIZZA is a valid data set name. It's also true. JEFF is the high-level qualifier, LOVES is another qualifier, and PIZZA is the low-level qualifier. Qualifiers are separated by periods, and you can use a maximum of 44 characters in total, including the periods. Each qualifier must begin with an alphabetic character or one of the special characters @, #, or $. The characters after that can be alphabetic, numeric, or special, and each qualifier must be 1 to 8 characters in length.

$32.FOR.PIZZA is a valid data set name as long as the FOR is spelled out F-O-R and not written as the number 4, because a data set qualifier can't start with a number. $32 for a pizza, that's a little ridiculous, right? Anyway, SYS1.PARMLIB, well, that had better be a valid data set name. And I.SURE.DO.LOVE.THIS.AWESOME.SYSTEM is a valid data set name too, because every qualifier follows the rules and the whole name is under 44 characters in total. You should try to have some order though, and not use names that might confuse people. Just because you can do something doesn't mean you should.
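If it helps to see those naming rules written down as a check, here is a minimal Python sketch. It covers only the rules described here (the real system has a few more details), it assumes the name is already in uppercase, and the function name `is_valid_dsname` is made up for this example.

```python
import re

# One qualifier: first character is A-Z, @, # or $; the remaining
# characters (up to 7 more) may also be digits. Total length 1 to 8.
QUALIFIER = re.compile(r"[A-Z@#$][A-Z0-9@#$]{0,7}")

MAX_NAME_LENGTH = 44  # total characters, including the periods
MAX_QUALIFIERS = 22   # maximum number of joined name segments


def is_valid_dsname(name: str) -> bool:
    """Check a data set name against the rules described in this section only."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    qualifiers = name.split(".")
    if len(qualifiers) > MAX_QUALIFIERS:
        return False
    # Every qualifier must match the pattern in full.
    return all(QUALIFIER.fullmatch(q) is not None for q in qualifiers)


for candidate in [
    "JEFF.LOVES.PIZZA",
    "$32.FOR.PIZZA",
    "$32.4.PIZZA",  # the qualifier "4" starts with a number
    "SYS1.PARMLIB",
    "I.SURE.DO.LOVE.THIS.AWESOME.SYSTEM",
]:
    print(candidate, is_valid_dsname(candidate))
```

Of the five names in the loop, only `$32.4.PIZZA` fails, because its middle qualifier starts with a number.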
Now, to use a data set, it must first be allocated. Allocated just means it exists in a way that we can reference it as an object. We have a link to it. So before we can type anything into a data set, we need to allocate it, since the data set we are typing into is going to take up space. Makes sense, right?

One of the easiest ways of allocating a data set is through the ISPF panel made specifically for data sets. So in ISPF, if we go to option 3 for Utilities, and then option 2 for Data Set, we can use option A for allocate. We can take ISPF's suggestions for naming a data set using these three fields: the project, group, and type. That's a good start. Or we can just type out the name of the data set that we want down here. It's up to you. But if you want to have more than three qualifiers, you'll have to use the field down here labeled Other Partitioned, Sequential or VSAM Data Set.

On the next screen, you need to decide if you want a sequential or a partitioned data set. The steps are largely the same, except that for a sequential data set, you specify zero blocks for the directory, because a sequential data set has no directory. Everything is just in order. To create a PDSE, you specify the word LIBRARY in the data set name type field over here. If you did this right, you'll get this message at the top saying the data set was allocated. If you get anything else, just go back and retrace your steps. It's probably something simple. Either way, now you see why data sets are not the same thing as files.
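As a recap of the choices on that allocation screen, here is a small Python sketch of how the two fields mentioned above decide what you end up with. It is a simplification of the steps described here, not ISPF's actual logic, and the function and parameter names are made up for illustration.

```python
def describe_allocation(directory_blocks: int, data_set_name_type: str = "") -> str:
    """Say which kind of data set the panel choices described above would give you.

    Simplified assumption: LIBRARY in the name type field means a PDSE,
    zero directory blocks means sequential, anything else is a PDS.
    """
    if data_set_name_type.strip().upper() == "LIBRARY":
        return "PDSE"
    if directory_blocks == 0:
        return "sequential"
    return "PDS"


print(describe_allocation(directory_blocks=0))
print(describe_allocation(directory_blocks=5))
print(describe_allocation(directory_blocks=5, data_set_name_type="library"))
```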