I'm working on a project in a team of five for uni this semester. It's a middleware for managing files stored at multiple cloud storage providers, like Dropbox. In particular, we have to support three specific providers and implement RAID1/RAID5 logic. Users should be presented with a simple file management UI including selection of the RAID mode.
So, just before the Christmas holidays, one of the other team members volunteered to implement RAID 5. Two weeks ago, I noticed a commit indicating he was finished. Cool. While I'm of course aware of how RAID 5 works in general (I'm using it myself with my NAS), I didn't know how it works in detail: how exactly files are split up, how parity data is calculated, how files can be recovered and so on. So, out of curiosity, I had a look at his source code and realized that this guy had no clue either.
His code splits files into two halves. One half is uploaded to the first provider, the second half to the second one, and the complete file is uploaded to the third one. The information of which part is which is encoded in the file name: originalName_1, originalName_2, originalName_3. If the file at one of the first two providers is missing or corrupted, the code returns the complete file from the third one. On the other hand, if the file stored with the third provider is missing or corrupted, it concatenates the data from providers 1 and 2.
And all of this is hardcoded. He literally creates an array of size 3 and assigns the parts to data[0], data[1] and data[2]. I can't fucking believe it.
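For contrast, here's roughly what parity-based striping looks like as far as I understand it now. This is a minimal sketch, not our project's code: the function names (xorBlocks, splitWithParity, recoverBlock) are made up for illustration, it handles a single stripe only, and it leaves out the part of real RAID 5 where the parity block rotates between disks from stripe to stripe.

```javascript
// Sketch only: hypothetical helpers, one stripe, no parity rotation.

// XOR any number of equally sized blocks together, byte by byte.
function xorBlocks(blocks) {
  const length = Math.max(...blocks.map((b) => b.length));
  const out = new Uint8Array(length); // starts zero-filled
  for (const block of blocks) {
    for (let i = 0; i < block.length; i++) {
      out[i] ^= block[i];
    }
  }
  return out;
}

// Split a file into (providerCount - 1) data blocks plus one parity block.
function splitWithParity(file, providerCount) {
  if (providerCount < 3) {
    throw new Error("parity striping needs at least 3 providers");
  }
  const dataCount = providerCount - 1;
  const blockSize = Math.ceil(file.length / dataCount);
  const blocks = [];
  for (let i = 0; i < dataCount; i++) {
    const block = new Uint8Array(blockSize); // zero-padded if the file runs out
    block.set(file.subarray(i * blockSize, (i + 1) * blockSize));
    blocks.push(block);
  }
  // originalLength is needed to strip the padding again after reassembly.
  return { blocks, parity: xorBlocks(blocks), originalLength: file.length };
}

// Rebuild one missing data block from the surviving data blocks and the parity.
function recoverBlock(survivingBlocks, parity) {
  return xorBlocks([...survivingBlocks, parity]);
}
```

The point is that any single block, data or parity, can be rebuilt by XORing all the others, and it works for any number of providers from three upwards. Storing a complete copy on the third provider gives you none of that.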
The part where he uploads the parts is a simple loop, iterating through the provider list. Imagine something like providers[i].upload(fileName + "_" + i, data[i]). Which means if you set up more than three accounts, the data array is accessed beyond its upper bound. The language is JavaScript, so there's no exception or crash. The out-of-bounds access simply yields undefined.
So if you've got more than three accounts (in other words: if you've got more than three hard disks), only the files in the first three will contain actual data. All files in all accounts starting with the fourth will contain "undefined". Fucking great RAID logic.
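To show what I mean, here's a small reproduction of the problem. It's a sketch from memory, not his actual code: buildUploadsNaive, buildUploadsChecked and the provider names are made up, and I'm assuming the payload gets turned into a string somewhere on the way to the provider, which is what would put the literal text "undefined" into the file.

```javascript
// Sketch only: hypothetical function and provider names.

// Mirrors the loop described above: one upload per provider, indexed into data.
function buildUploadsNaive(fileName, data, providers) {
  return providers.map((provider, i) => ({
    provider,
    name: fileName + "_" + (i + 1),
    body: String(data[i]), // data[3] is undefined, so this becomes "undefined"
  }));
}

// The least one could do: refuse to run when the counts do not match.
function buildUploadsChecked(fileName, data, providers) {
  if (providers.length !== data.length) {
    throw new Error(
      "have " + data.length + " parts but " + providers.length + " providers"
    );
  }
  return buildUploadsNaive(fileName, data, providers);
}

const data = ["first half", "second half", "complete file"];
const providers = ["A", "B", "C", "D"];

const uploads = buildUploadsNaive("report.pdf", data, providers);
console.log(uploads[3]);
```

With four providers, the fourth entry gets the name report.pdf_4 and a body consisting of the string "undefined", with no error anywhere. The checked variant at least fails loudly, although the real fix would be to derive the number of parts from the number of providers.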
Funnily enough, the assignment description says the UI needs to show how parity information is distributed. I wonder what he's going to do when he sees that line. If he ever sees that line. I don't think he has ever read the assignment description.
For instance, the description clearly says we also need to support version management for both RAID modes. The way I see it, we should write a version manager class which is supplied with one of the two RAID controllers. Three layers: Version Manager --uses--> (one) Raid Controller --uses--> (many) Cloud Storage Connector.
But no, that fucking guy implemented version management directly in the RAID 5 controller. Only in the RAID 5 controller. What a goddamn mess.
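Here's a rough sketch of the three-layer split I have in mind. Everything in it is an assumption for illustration: the class names, the write/read and upload/download methods, and the ".v1" naming scheme are mine, not what's in our repo. I've also kept it synchronous and in-memory to keep it short, whereas real provider connectors would be asynchronous.

```javascript
// Sketch only: hypothetical classes, synchronous and in-memory.

// Bottom layer: stands in for one cloud storage account.
class InMemoryConnector {
  constructor() {
    this.files = new Map();
  }
  upload(name, data) {
    this.files.set(name, data);
  }
  download(name) {
    return this.files.get(name); // undefined if the file is missing
  }
}

// Middle layer: one RAID controller on top of many connectors.
// A RAID 5 controller would expose the same write/read methods.
class Raid1Controller {
  constructor(connectors) {
    this.connectors = connectors;
  }
  write(name, data) {
    for (const connector of this.connectors) {
      connector.upload(name, data); // mirror to every provider
    }
  }
  read(name) {
    for (const connector of this.connectors) {
      const data = connector.download(name);
      if (data !== undefined) return data; // first surviving copy wins
    }
    return undefined;
  }
}

// Top layer: versioning that works with any controller offering write/read.
class VersionManager {
  constructor(raidController) {
    this.raid = raidController;
    this.latest = new Map(); // file name -> latest version number
  }
  save(name, data) {
    const version = (this.latest.get(name) ?? 0) + 1;
    this.raid.write(name + ".v" + version, data);
    this.latest.set(name, version);
    return version;
  }
  load(name, version = this.latest.get(name)) {
    return this.raid.read(name + ".v" + version);
  }
}
```

Because the version manager only ever calls write and read, it doesn't care which RAID mode sits underneath, so both modes would get versioning without duplicating the logic.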