From kshelby at optonline.net  Tue Nov 18 00:50:07 2008
From: kshelby at optonline.net (KenShelby)
Date: Mon, 17 Nov 2008 16:50:07 -0800 (PST)
Subject: Journal config for SAN array
Message-ID: <20525748.post@talk.nabble.com>

The thread in this forum about using SSD for journals...

http://www.nabble.com/journal-on-an-ssd-to19411589.html

leads me to ask:

What is the best way to configure journals for about half a dozen servers
accessing some twenty file systems hosted on a RAID-5 SAN array? The array
holds relational databases which are used in an OLTP style by web-based
apps, so it sees lots of small-ish, nonsequential transactions. Presume the
array has enough physical disks and cache to provide normally acceptable
performance. None of the file systems on the array are shared: each file
system is mounted (and zoned and masked) to one and only one server, though
there may be several file systems on a server.

Thanks in advance!

-----
"You people from New Jersey are the Gypsies of the Galaxy!"
--
View this message in context: http://www.nabble.com/Journal-config-for-SAN-array-tp20525748p20525748.html
Sent from the Ext3 - User mailing list archive at Nabble.com.

From thorsten.henrici at gfd.de  Tue Nov 18 21:02:59 2008
From: thorsten.henrici at gfd.de (thorsten.henrici at gfd.de)
Date: Tue, 18 Nov 2008 22:02:59 +0100
Subject: Thorsten Henrici is out of the office.
Message-ID:

I will be out of the office from 17.11.2008. I will return on 21.11.2008.
I will reply to your message after my return. In urgent cases please
contact Mr. Stöver.

I'm out of office until the 22nd of September. In urgent cases please
contact Mr. Karl-Heinz Stöver.

--
IMPORTANT NOTICE:
This email is confidential, may be legally privileged, and is for the
intended recipient only. Access, disclosure, copying, distribution, or
reliance on any of it by anyone else is prohibited and may be a criminal
offence.
Please delete if obtained in error and email confirmation to the sender.

From jordi.prats at gmail.com  Sat Nov 22 23:23:12 2008
From: jordi.prats at gmail.com (Jordi Prats)
Date: Sun, 23 Nov 2008 00:23:12 +0100
Subject: Journal config for SAN array
In-Reply-To: <20525748.post@talk.nabble.com>
References: <20525748.post@talk.nabble.com>
Message-ID: <1908f30811221523s5f7c0298q7c8b25209b87a127@mail.gmail.com>

Hi,
There is no single answer to this. It depends on each configuration,
each SAN, each SAN topology... It's not as easy as it appears ;)

regards,
--
Jordi
http://systemadmin.es

From kshelby at optonline.net  Sun Nov 23 15:06:28 2008
From: kshelby at optonline.net (KenShelby)
Date: Sun, 23 Nov 2008 07:06:28 -0800 (PST)
Subject: Journal config for SAN array
In-Reply-To: <1908f30811221523s5f7c0298q7c8b25209b87a127@mail.gmail.com>
References: <20525748.post@talk.nabble.com>
	<1908f30811221523s5f7c0298q7c8b25209b87a127@mail.gmail.com>
Message-ID: <20647102.post@talk.nabble.com>

I'm not looking for a single "magic bullet" answer. Are there any
guidelines at all, any best practices that have been found over time to
be a good place to start? What I seek is either informed opinion or
information - if it exists - about how to make choices on EXT3 journal
configuration for a SAN installation.

- Ken

From pegasus at nerv.eu.org  Sun Nov 23 16:34:01 2008
From: pegasus at nerv.eu.org (Jure Pečar)
Date: Sun, 23 Nov 2008 17:34:01 +0100
Subject: Journal config for SAN array
In-Reply-To: <20647102.post@talk.nabble.com>
References: <20525748.post@talk.nabble.com>
	<1908f30811221523s5f7c0298q7c8b25209b87a127@mail.gmail.com>
	<20647102.post@talk.nabble.com>
Message-ID: <20081123173401.57ac1541.pegasus@nerv.eu.org>

On Sun, 23 Nov 2008 07:06:28 -0800 (PST)
KenShelby wrote:

> I'm not looking for a single "magic bullet" answer.
> Are there any
> guidelines at all, any best practices that have been found over time to
> be a good place to start? What I seek is either informed opinion or
> information - if it exists - about how to make choices on EXT3 journal
> configuration for a SAN installation.

Common sense and knowledge of your application are a good start.

You said that you have lots of random I/O, so I would be more concerned
about latency than bandwidth. If your SAN provides you with good enough
latency then I wouldn't bother much about journal configs and would go
with the defaults, but if your SAN sits on something like iSCSI with
noticeable latency then I'd test a setup with the journal on a local disk,
possibly even in full data journaling mode. If I had the money I'd look
into SSDs for that purpose, either in the form of a SATA/SAS SSD or in the
form of a PCIe card (Fusion-io, for example).

--
Jure Pečar
http://jure.pecar.org/

From adilger at sun.com  Mon Nov 24 03:22:28 2008
From: adilger at sun.com (Andreas Dilger)
Date: Sun, 23 Nov 2008 21:22:28 -0600
Subject: Journal config for SAN array
In-Reply-To: <20081123173401.57ac1541.pegasus@nerv.eu.org>
References: <20525748.post@talk.nabble.com>
	<1908f30811221523s5f7c0298q7c8b25209b87a127@mail.gmail.com>
	<20647102.post@talk.nabble.com>
	<20081123173401.57ac1541.pegasus@nerv.eu.org>
Message-ID: <20081124032228.GC3186@webber.adilger.int>

On Nov 23, 2008  17:34 +0100, Jure Pečar wrote:
> On Sun, 23 Nov 2008 07:06:28 -0800 (PST)
> KenShelby wrote:
> > I'm not looking for a single "magic bullet" answer. Are there any
> > guidelines at all, any best practices that have been found over time to
> > be a good place to start? What I seek is either informed opinion or
> > information - if it exists - about how to make choices on EXT3 journal
> > configuration for a SAN installation.
>
> Common sense and knowledge of your application are a good start.
>
> You said that you have lots of random I/O, so I would be more concerned
> about latency than bandwidth.
> If your SAN provides you with good enough latency
> then I wouldn't bother much about journal configs and would go with the
> defaults, but if your SAN sits on something like iSCSI with noticeable
> latency then I'd test a setup with the journal on a local disk, possibly
> even in full data journaling mode. If I had the money I'd look into SSDs
> for that purpose, either in the form of a SATA/SAS SSD or in the form of
> a PCIe card (Fusion-io, for example).

One important consideration: if you have hardware RAID-5/6, it is a bad
idea to put the journal on the same device as the filesystem. That causes
a lot of read-modify-write of RAID stripes, and bad performance. It is
much better to put the journal on a RAID-1 device.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

From lakshmipathi.g at gmail.com  Wed Nov 26 04:40:47 2008
From: lakshmipathi.g at gmail.com (lakshmi pathi)
Date: Wed, 26 Nov 2008 10:10:47 +0530
Subject: A Question on inode - ext3FS
Message-ID:

I have noticed that whenever you edit a file's content, a new inode
number is assigned to the file.
But sometimes the file's inode number changes, and sometimes it keeps the
older inode.

Can anyone please let me know the concept behind this: why does modifying
the content change the file's inode, but not all the time?

I'm using Fedora 7 with the ext3 file system (kernel 2.6.25.4).

Cheers,
Lakshmipathi.G

From darkonc at gmail.com  Wed Nov 26 05:02:20 2008
From: darkonc at gmail.com (Stephen Samuel)
Date: Tue, 25 Nov 2008 21:02:20 -0800
Subject: A Question on inode - ext3FS
In-Reply-To:
References:
Message-ID: <6cd50f9f0811252102w445f8258m4b6195279b067486@mail.gmail.com>

It's an application issue. It has to do with how the file is updated.
There are two ways for an editor to update a file:
write to the file in place (mostly done with simple appends), or
write a new version of the file (with a temporary name), then rename the
temporary file to replace the old one.

The first method only modifies the existing file, so the inode number
stays the same.

The second method creates a new file (with a different inode number), and
then the new file (with the new inode number) replaces the old file.

The main advantage of the second method is that it's atomic. Either the
file is replaced, or it isn't. Thus other users/programs which access the
file never see intermediate results. (This is also the case if the program
dies, or the system is reset.)

The first method has the advantage that the inode number stays the same,
and so any programs which had the old file open will see the updates. The
disadvantage is that if anything goes wrong in the middle of the update,
the file could end up in a partially updated state.

Remember, the file is nothing more than a collection of bytes. There's no
way to insert a character or two in the middle of the file; you have to
rewrite the entire file from that point on.

--
Stephen Samuel http://www.bcgreen.com
778-861-7641

From daniel at rimspace.net  Wed Nov 26 05:01:41 2008
From: daniel at rimspace.net (Daniel Pittman)
Date: Wed, 26 Nov 2008 16:01:41 +1100
Subject: A Question on inode - ext3FS
References:
Message-ID: <8763mbnkqy.fsf@rimspace.net>

"lakshmi pathi" writes:

> I have noticed that whenever you edit a file's content, a new inode
> number is assigned to the file.

No, it never is.

> But sometimes the file's inode number changes, and sometimes it keeps the
> older inode.

Sometimes your editor creates a *new* file, which has a new inode number,
and replaces the original file with it. Other times the content is edited
in place and no new file is created.

Regards,
        Daniel

From jordi.prats at gmail.com  Wed Nov 26 07:38:46 2008
From: jordi.prats at gmail.com (Jordi Prats)
Date: Wed, 26 Nov 2008 08:38:46 +0100
Subject: A Question on inode - ext3FS
In-Reply-To: <6cd50f9f0811252102w445f8258m4b6195279b067486@mail.gmail.com>
References: <6cd50f9f0811252102w445f8258m4b6195279b067486@mail.gmail.com>
Message-ID: <1908f30811252338n326052e7ofc7121254326f715@mail.gmail.com>

2008/11/26 Stephen Samuel :
> It's an application issue. It has to do with how the file is updated.
> There are two ways for an editor to update a file:
> write to the file in place (mostly done with simple appends), or
> write a new version of the file (with a temporary name), then rename the
> temporary file to replace the old one.
>
> The first method only modifies the existing file, so the inode number
> stays the same.
>
> The second method creates a new file (with a different inode number), and
> then the new file (with the new inode number) replaces the old file.
>
> The main advantage of the second method is that it's atomic. Either the
> file is replaced, or it isn't. Thus other users/programs which access the
> file never see intermediate results. (This is also the case if the program
> dies, or the system is reset.)

Just curious... How is that done? Is there a system call to do this
replacement?

Thanks!

--
Jordi

From tytso at mit.edu  Wed Nov 26 07:59:59 2008
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 26 Nov 2008 02:59:59 -0500
Subject: A Question on inode - ext3FS
In-Reply-To: <1908f30811252338n326052e7ofc7121254326f715@mail.gmail.com>
References: <6cd50f9f0811252102w445f8258m4b6195279b067486@mail.gmail.com>
	<1908f30811252338n326052e7ofc7121254326f715@mail.gmail.com>
Message-ID: <20081126075959.GA17991@mit.edu>

On Wed, Nov 26, 2008 at 08:38:46AM +0100, Jordi Prats wrote:
> > The main advantage of the second method is that it's atomic. Either the
> > file is replaced, or it isn't. Thus other users/programs which access the
> > file never see intermediate results. (This is also the case if the program
> > dies, or the system is reset.)
>
> Just curious... How is that done? Is there a system call to do this
> replacement?

Yes. It's called rename(). Run "man 2 rename" to see its man page.

                                        - Ted

From jnelson112390 at gmail.com  Fri Nov 21 03:27:05 2008
From: jnelson112390 at gmail.com (John Nelson)
Date: Thu, 20 Nov 2008 22:27:05 -0500
Subject: Commit Interval dosent work?
Message-ID: <49262A89.8000306@gmail.com>

I tried setting the commit interval on my hard disk to 30 seconds, but it
still mounts with a 5 second interval.

I have this in fstab:

commit=30

Is that the right way to write the fstab entry?

From lakshmipathi.g at gmail.com  Thu Nov 27 03:17:56 2008
From: lakshmipathi.g at gmail.com (lakshmi pathi)
Date: Thu, 27 Nov 2008 03:17:56 +0000
Subject: A Question on inode - ext3FS
In-Reply-To: <20081126075959.GA17991@mit.edu>
References: <6cd50f9f0811252102w445f8258m4b6195279b067486@mail.gmail.com>
	<1908f30811252338n326052e7ofc7121254326f715@mail.gmail.com>
	<20081126075959.GA17991@mit.edu>
Message-ID:

>It's an application issue.

Thank you all. Here is detailed info on what I did.

I created two files, file7 and file8, using vi:

[root at localhost ~]# vi file7
[root at localhost ~]# vi file8

and added the content "this ". Here is the ls output:

[root at localhost ~]# ls -il file7
6933339 -rw-r--r-- 1 root root 5 2008-11-26 20:56 file7
[root at localhost ~]# ls -il file8
6933340 -rw-r--r-- 1 root root 5 2008-11-26 20:56 file8

Now I changed the permissions for file8:

[root at localhost ~]# chmod 777 file8
[root at localhost ~]# ls -il file8
6933340 -rwxrwxrwx 1 root root 5 2008-11-26 20:56 file8

(After the chmod the inode remains the same, since file8's data blocks are
not modified.)

Now I appended a new character to both files using vi:

[root at localhost ~]# vi file7
[root at localhost ~]# ls -il file7
6933311 -rw-r--r-- 1 root root 7 2008-11-26 20:58 file7

As expected, the size and timestamp are updated.
And so is the inode number (earlier it was 6933339, now it is 6933311).

[root at localhost ~]# vi file8
[root at localhost ~]# ls -il file8
6933340 -rwxrwxrwx 1 root root 7 2008-11-26 20:58 file8
[root at localhost ~]#

But see here, the inode remains the same (6933340) even though the content
is modified.

Now I changed the file mode to 644:

[root at localhost ~]# chmod 644 file8
[root at localhost ~]# ls -il file8
6933340 -rw-r--r-- 1 root root 7 2008-11-26 20:58 file8

and appended a new character:

[root at localhost ~]# ls -il file8
6933451 -rw-r--r-- 1 root root 8 2008-11-26 21:07 file8

...now the inode has also changed (from 6933340 to 6933451).

My doubts are:
1) Do a file's permissions play any role in determining the inode number
of the file when it's edited?
2) How can an application decide between a new inode and the older inode?
So far I thought it depended on the functionality of the filesystem/kernel.
(I found that editing files with a different application, like gedit,
doesn't change the inode, but vi changes inodes.)

(I have written some small ext3 undelete tools; understanding these
concepts will help me a lot. If my assumptions are wrong, please correct
me.)

--
Cheers,
Lakshmipathi.G

From tytso at mit.edu  Thu Nov 27 04:26:56 2008
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 26 Nov 2008 23:26:56 -0500
Subject: A Question on inode - ext3FS
In-Reply-To:
References: <6cd50f9f0811252102w445f8258m4b6195279b067486@mail.gmail.com>
	<1908f30811252338n326052e7ofc7121254326f715@mail.gmail.com>
	<20081126075959.GA17991@mit.edu>
Message-ID: <20081127042656.GE14101@mit.edu>

On Thu, Nov 27, 2008 at 03:17:56AM +0000, lakshmi pathi wrote:
>
> My doubts are:
> 1) Do a file's permissions play any role in determining the inode number
> of the file when it's edited?

It depends on the application.

> 2) How can an application decide between a new inode and the older inode?
> So far I thought it depended on the functionality of the filesystem/kernel.

It depends on the application.

If the application does this when it writes the file:

	fd = open("filename", O_WRONLY|O_TRUNC);
	write(fd, buf, bufsize);
	close(fd);

Then it will have the same inode number. Unfortunately, if your
machine crashes (at exactly the wrong moment, i.e., right after the
open has truncated the original file) while it is writing out the
Ph.D. thesis which you have spent the last 2 years writing, and you
didn't keep any backups --- well, someone stupid enough not to do
backups of their thesis probably doesn't deserve a Ph.D.  :-)

If the application does this when it writes the file:

	fd = open("filename.new", O_WRONLY|O_CREAT|O_TRUNC, 0666);
	write(fd, buf, bufsize);
	close(fd);
	rename("filename.new", "filename");

Then if you crash in the middle, you might lose what you had written
in the last editing session, but at least the version of your file
from the previous editing session is still safe, since we first write
the new file as "filename.new" (and in the competently written version
of the editor, every single system call will have appropriate error
checking, which I've omitted here for clarity, but which is important
since you want to make sure you know the file was correctly written
and not truncated due to an NFS server failing, or quota issues, or the
disk filling, etc.)

Note that in the safe and sane way of doing things, you *will* get a new
inode number --- it's unavoidable, because the old and new versions of
the file co-exist for a brief period of time, so of course the new
version of the file will have a new inode number. (As opposed to the
insane way of doing things, where for a brief period of time, *no* copy
of the content will exist on disk, and if you crash then --- oh, well.
But hey! For people who care about keeping the same inode number, I
guess that can be your consolation....)

Some editors can be configured to do things either way.

Also, some editors might normally prefer to use the O_TRUNC method
(maybe out of some misguided desire to keep the inode number the
same, or because they don't want to bother with copying extended
attributes, or because they are worried about disk space, so they want
to blow away the original file contents with the open(O_TRUNC), and
then write the new file contents). However, for those applications, if
the file permissions make the file read-only, such that opening the
file for writing would fail, it's possible such an insane application
might then fall back to the safe and sane way of doing things. But
that's purely an application decision --- if it is determined to do
something as unsafe and risky as unprotected sex with someone they just
met at some skanky bar as a one-night stand, the application could just
forcibly change the permissions on the file, or just unlink the file
first.

The bottom line is that it has ****NOTHING**** to do with the
filesystem/kernel. It all has to do with whether the application
writer cares about risking the user's data or not.

                                        - Ted

From lakshmipathi.g at gmail.com  Thu Nov 27 10:03:28 2008
From: lakshmipathi.g at gmail.com (lakshmi pathi)
Date: Thu, 27 Nov 2008 15:33:28 +0530
Subject: A Question on inode - ext3FS
In-Reply-To: <20081127042656.GE14101@mit.edu>
References: <6cd50f9f0811252102w445f8258m4b6195279b067486@mail.gmail.com>
	<1908f30811252338n326052e7ofc7121254326f715@mail.gmail.com>
	<20081126075959.GA17991@mit.edu>
	<20081127042656.GE14101@mit.edu>
Message-ID:

Thanks a lot for your detailed reply with sample code :)

--
Cheers,
Lakshmipathi.G