From mita at miraclelinux.com  Fri Sep  9 08:42:14 2005
From: mita at miraclelinux.com (Akinobu Mita)
Date: Fri, 9 Sep 2005 17:42:14 +0900
Subject: [PATCH 0/6] jbd cleanup
Message-ID: <20050909084214.GB14205@miraclelinux.com>

The following 6 patches clean up the jbd code and remove about 200 lines.
The first 4 patches apply to 2.6.13-git8 and 2.6.13-mm2.
The rest apply to 2.6.13-mm2.

 fs/jbd/checkpoint.c          |  179 +++++++++++--------------------------------
 fs/jbd/commit.c              |  101 ++++++++++--------------
 fs/jbd/journal.c             |   11 +-
 fs/jbd/revoke.c              |  158 ++++++++++++++-----------------------
 fs/jbd/transaction.c         |  113 +++++----------------------
 include/linux/jbd.h          |   28 +++---
 include/linux/journal-head.h |    4 
 7 files changed, 201 insertions(+), 393 deletions(-)



From mita at miraclelinux.com  Fri Sep  9 08:43:42 2005
From: mita at miraclelinux.com (Akinobu Mita)
Date: Fri, 9 Sep 2005 17:43:42 +0900
Subject: [PATCH 1/6] jbd: remove duplicated debug print
In-Reply-To: <20050909084214.GB14205@miraclelinux.com>
References: <20050909084214.GB14205@miraclelinux.com>
Message-ID: <20050909084342.GC14205@miraclelinux.com>

Remove the duplicated "commit phase 2" debug print from commit.c.

Signed-off-by: Akinobu Mita <mita at miraclelinux.com>

---

 commit.c |    2 --
 1 files changed, 2 deletions(-)
--- 2.6-mm/fs/jbd/commit.c.orig	2005-09-02 00:53:49.000000000 +0900
+++ 2.6-mm/fs/jbd/commit.c	2005-09-02 00:54:11.000000000 +0900
@@ -425,8 +425,6 @@ write_out_data:
 
 	journal_write_revoke_records(journal, commit_transaction);
 
-	jbd_debug(3, "JBD: commit phase 2\n");
-
 	/*
 	 * If we found any dirty or locked buffers, then we should have
 	 * looped back up to the write_out_data label.  If there weren't



From mita at miraclelinux.com  Fri Sep  9 08:44:41 2005
From: mita at miraclelinux.com (Akinobu Mita)
Date: Fri, 9 Sep 2005 17:44:41 +0900
Subject: [PATCH 2/6] jbd: use hlist for the revoke tables
In-Reply-To: <20050909084214.GB14205@miraclelinux.com>
References: <20050909084214.GB14205@miraclelinux.com>
Message-ID: <20050909084441.GD14205@miraclelinux.com>

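Use struct hlist_head and struct hlist_node for the revoke hash tables.
An hlist_head is a single pointer, so each hash bucket takes half the
space of a list_head.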
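
For readers without the kernel headers at hand, the idea can be sketched
in userspace roughly as follows. This is only an illustration: the
hlist helpers are simplified re-implementations, and revoke_record,
HASH_SIZE, bucket(), insert_record(), find_record() and remove_record()
are hypothetical names, not the jbd code. The hash is a placeholder
mask, not jbd's hash().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel list types (illustrative only). */
struct list_head  { struct list_head *next, *prev; };
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hlist_del(struct hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/* Hypothetical record, loosely modelled on jbd_revoke_record_s. */
struct revoke_record {
	struct hlist_node hash;
	unsigned long blocknr;
};

#define HASH_SIZE 16	/* must be a power of two */

/* Zero-initialised, so every bucket starts out empty. */
static struct hlist_head table[HASH_SIZE];

static unsigned int bucket(unsigned long blocknr)
{
	return blocknr & (HASH_SIZE - 1);	/* placeholder hash */
}

static struct revoke_record *insert_record(unsigned long blocknr)
{
	struct revoke_record *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->blocknr = blocknr;
	hlist_add_head(&r->hash, &table[bucket(blocknr)]);
	return r;
}

static struct revoke_record *find_record(unsigned long blocknr)
{
	struct hlist_node *n;

	/* Walk the bucket; the chain ends at NULL, not back at the head. */
	for (n = table[bucket(blocknr)].first; n; n = n->next) {
		struct revoke_record *r = (struct revoke_record *)
			((char *)n - offsetof(struct revoke_record, hash));
		if (r->blocknr == blocknr)
			return r;
	}
	return NULL;
}

static void remove_record(struct revoke_record *r)
{
	hlist_del(&r->hash);
	free(r);
}
```

The sketch leaves out locking entirely; in the patch every table access
stays under j_revoke_lock.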

Signed-off-by: Akinobu Mita <mita at miraclelinux.com>

---

 revoke.c |   56 ++++++++++++++++++++++++++------------------------------
 1 files changed, 26 insertions(+), 30 deletions(-)

diff -Nurp 2.6.13-mm1.old/fs/jbd/revoke.c 2.6.13-mm1/fs/jbd/revoke.c
--- 2.6.13-mm1.old/fs/jbd/revoke.c	2005-09-04 21:46:35.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/revoke.c	2005-09-04 21:50:25.000000000 +0900
@@ -79,7 +79,7 @@ static kmem_cache_t *revoke_table_cache;
 
 struct jbd_revoke_record_s 
 {
-	struct list_head  hash;
+	struct hlist_node hash;
 	tid_t		  sequence;	/* Used for recovery only */
 	unsigned long	  blocknr;
 };
@@ -92,7 +92,7 @@ struct jbd_revoke_table_s
 	 * for recovery.  Must be a power of two. */
 	int		  hash_size; 
 	int		  hash_shift; 
-	struct list_head *hash_table;
+	struct hlist_head *hash_table;
 };
 
 
@@ -119,7 +119,6 @@ static inline int hash(journal_t *journa
 static int insert_revoke_hash(journal_t *journal, unsigned long blocknr,
 			      tid_t seq)
 {
-	struct list_head *hash_list;
 	struct jbd_revoke_record_s *record;
 
 repeat:
@@ -129,9 +128,9 @@ repeat:
 
 	record->sequence = seq;
 	record->blocknr = blocknr;
-	hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];
 	spin_lock(&journal->j_revoke_lock);
-	list_add(&record->hash, hash_list);
+	hlist_add_head(&record->hash,
+		       &journal->j_revoke->hash_table[hash(journal, blocknr)]);
 	spin_unlock(&journal->j_revoke_lock);
 	return 0;
 
@@ -148,19 +147,16 @@ oom:
 static struct jbd_revoke_record_s *find_revoke_record(journal_t *journal,
 						      unsigned long blocknr)
 {
-	struct list_head *hash_list;
+	struct hlist_node *node;
 	struct jbd_revoke_record_s *record;
 
-	hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];
-
 	spin_lock(&journal->j_revoke_lock);
-	record = (struct jbd_revoke_record_s *) hash_list->next;
-	while (&(record->hash) != hash_list) {
+	hlist_for_each_entry(record, node,
+		&journal->j_revoke->hash_table[hash(journal, blocknr)], hash) {
 		if (record->blocknr == blocknr) {
 			spin_unlock(&journal->j_revoke_lock);
 			return record;
 		}
-		record = (struct jbd_revoke_record_s *) record->hash.next;
 	}
 	spin_unlock(&journal->j_revoke_lock);
 	return NULL;
@@ -219,7 +215,7 @@ int journal_init_revoke(journal_t *journ
 	journal->j_revoke->hash_shift = shift;
 
 	journal->j_revoke->hash_table =
-		kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+		kmalloc(hash_size * sizeof(struct hlist_head), GFP_KERNEL);
 	if (!journal->j_revoke->hash_table) {
 		kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
 		journal->j_revoke = NULL;
@@ -227,7 +223,7 @@ int journal_init_revoke(journal_t *journ
 	}
 
 	for (tmp = 0; tmp < hash_size; tmp++)
-		INIT_LIST_HEAD(&journal->j_revoke->hash_table[tmp]);
+		INIT_HLIST_HEAD(&journal->j_revoke->hash_table[tmp]);
 
 	journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
 	if (!journal->j_revoke_table[1]) {
@@ -246,7 +242,7 @@ int journal_init_revoke(journal_t *journ
 	journal->j_revoke->hash_shift = shift;
 
 	journal->j_revoke->hash_table =
-		kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+		kmalloc(hash_size * sizeof(struct hlist_head), GFP_KERNEL);
 	if (!journal->j_revoke->hash_table) {
 		kfree(journal->j_revoke_table[0]->hash_table);
 		kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
@@ -256,7 +252,7 @@ int journal_init_revoke(journal_t *journ
 	}
 
 	for (tmp = 0; tmp < hash_size; tmp++)
-		INIT_LIST_HEAD(&journal->j_revoke->hash_table[tmp]);
+		INIT_HLIST_HEAD(&journal->j_revoke->hash_table[tmp]);
 
 	spin_lock_init(&journal->j_revoke_lock);
 
@@ -268,7 +264,7 @@ int journal_init_revoke(journal_t *journ
 void journal_destroy_revoke(journal_t *journal)
 {
 	struct jbd_revoke_table_s *table;
-	struct list_head *hash_list;
+	struct hlist_head *hash_list;
 	int i;
 
 	table = journal->j_revoke_table[0];
@@ -277,7 +273,7 @@ void journal_destroy_revoke(journal_t *j
 
 	for (i=0; i<table->hash_size; i++) {
 		hash_list = &table->hash_table[i];
-		J_ASSERT (list_empty(hash_list));
+		J_ASSERT (hlist_empty(hash_list));
 	}
 
 	kfree(table->hash_table);
@@ -290,7 +286,7 @@ void journal_destroy_revoke(journal_t *j
 
 	for (i=0; i<table->hash_size; i++) {
 		hash_list = &table->hash_table[i];
-		J_ASSERT (list_empty(hash_list));
+		J_ASSERT (hlist_empty(hash_list));
 	}
 
 	kfree(table->hash_table);
@@ -445,7 +441,7 @@ int journal_cancel_revoke(handle_t *hand
 			jbd_debug(4, "cancelled existing revoke on "
 				  "blocknr %llu\n", (unsigned long long)bh->b_blocknr);
 			spin_lock(&journal->j_revoke_lock);
-			list_del(&record->hash);
+			hlist_del(&record->hash);
 			spin_unlock(&journal->j_revoke_lock);
 			kmem_cache_free(revoke_record_cache, record);
 			did_revoke = 1;
@@ -488,7 +484,7 @@ void journal_switch_revoke_table(journal
 		journal->j_revoke = journal->j_revoke_table[0];
 
 	for (i = 0; i < journal->j_revoke->hash_size; i++) 
-		INIT_LIST_HEAD(&journal->j_revoke->hash_table[i]);
+		INIT_HLIST_HEAD(&journal->j_revoke->hash_table[i]);
 }
 
 /*
@@ -504,7 +500,6 @@ void journal_write_revoke_records(journa
 	struct journal_head *descriptor;
 	struct jbd_revoke_record_s *record;
 	struct jbd_revoke_table_s *revoke;
-	struct list_head *hash_list;
 	int i, offset, count;
 
 	descriptor = NULL; 
@@ -516,16 +511,16 @@ void journal_write_revoke_records(journa
 		journal->j_revoke_table[1] : journal->j_revoke_table[0];
 
 	for (i = 0; i < revoke->hash_size; i++) {
-		hash_list = &revoke->hash_table[i];
+		struct hlist_head *hash_list = &revoke->hash_table[i];
 
-		while (!list_empty(hash_list)) {
-			record = (struct jbd_revoke_record_s *) 
-				hash_list->next;
+		while (!hlist_empty(hash_list)) {
+			record = hlist_entry(hash_list->first,
+					struct jbd_revoke_record_s, hash);
 			write_one_revoke_record(journal, transaction,
 						&descriptor, &offset, 
 						record);
 			count++;
-			list_del(&record->hash);
+			hlist_del(&record->hash);
 			kmem_cache_free(revoke_record_cache, record);
 		}
 	}
@@ -686,7 +681,7 @@ int journal_test_revoke(journal_t *journ
 void journal_clear_revoke(journal_t *journal)
 {
 	int i;
-	struct list_head *hash_list;
+	struct hlist_head *hash_list;
 	struct jbd_revoke_record_s *record;
 	struct jbd_revoke_table_s *revoke;
 
@@ -694,9 +689,10 @@ void journal_clear_revoke(journal_t *jou
 
 	for (i = 0; i < revoke->hash_size; i++) {
 		hash_list = &revoke->hash_table[i];
-		while (!list_empty(hash_list)) {
-			record = (struct jbd_revoke_record_s*) hash_list->next;
-			list_del(&record->hash);
+		while (!hlist_empty(hash_list)) {
+			record = hlist_entry(hash_list->first,
+					struct jbd_revoke_record_s, hash);
+			hlist_del(&record->hash);
 			kmem_cache_free(revoke_record_cache, record);
 		}
 	}



From mita at miraclelinux.com  Fri Sep  9 08:46:00 2005
From: mita at miraclelinux.com (Akinobu Mita)
Date: Fri, 9 Sep 2005 17:46:00 +0900
Subject: [PATCH 3/6] jbd: cleanup for initializing/destroying the revoke
	tables
In-Reply-To: <20050909084214.GB14205@miraclelinux.com>
References: <20050909084214.GB14205@miraclelinux.com>
Message-ID: <20050909084600.GE14205@miraclelinux.com>

Use a loop to initialize and destroy the pair of revoke tables instead
of duplicating the code for each table.
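
The allocate-in-a-loop, unwind-with-"while (i--)" pattern can be tried
outside the kernel with something like the sketch below. All names here
(struct table, sketch_alloc(), sketch_free(), init_pair(),
destroy_pair(), and the failure-injection counter) are assumptions made
for the illustration; they are not part of jbd.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct jbd_revoke_table_s. */
struct table {
	int hash_size;
	int hash_shift;
	void **buckets;
};

/* Allocation hooks so that a failure can be injected in a test. */
static int allocs_until_failure = -1;	/* -1: never fail */
static int live_allocs;

static void *sketch_alloc(size_t n)
{
	void *p;

	if (allocs_until_failure == 0)
		return NULL;
	if (allocs_until_failure > 0)
		allocs_until_failure--;
	p = calloc(1, n);
	if (p)
		live_allocs++;
	return p;
}

static void sketch_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

static int init_pair(struct table *pair[2], int hash_size)
{
	int shift = 0;
	int tmp = hash_size;
	int i;

	/* hash_size must be a power of two */
	assert((hash_size & (hash_size - 1)) == 0);
	while ((tmp >>= 1) != 0)
		shift++;

	for (i = 0; i < 2; i++) {
		struct table *t = sketch_alloc(sizeof(*t));

		if (!t)
			goto nomem;
		t->hash_size = hash_size;
		t->hash_shift = shift;
		t->buckets = sketch_alloc(hash_size * sizeof(*t->buckets));
		if (!t->buckets) {
			sketch_free(t);
			goto nomem;
		}
		pair[i] = t;
	}
	return 0;

nomem:
	/* Exactly i tables were fully set up; release them in reverse. */
	while (i--) {
		sketch_free(pair[i]->buckets);
		sketch_free(pair[i]);
		pair[i] = NULL;
	}
	return -ENOMEM;
}

static void destroy_pair(struct table *pair[2])
{
	int j;

	for (j = 0; j < 2; j++) {
		if (!pair[j])
			continue;
		sketch_free(pair[j]->buckets);
		sketch_free(pair[j]);
		pair[j] = NULL;
	}
}
```

Unlike the patch, this sketch clears each slot after freeing it, which
makes a repeated destroy_pair() call harmless.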

Signed-off-by: Akinobu Mita <mita at miraclelinux.com>

---

 revoke.c |  116 ++++++++++++++++++++++-----------------------------------------
 1 files changed, 42 insertions(+), 74 deletions(-)

diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/revoke.c 2.6.13-mm1/fs/jbd/revoke.c
--- 2.6.13-mm1.old/fs/jbd/revoke.c	2005-09-05 03:21:00.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/revoke.c	2005-09-05 11:16:04.000000000 +0900
@@ -193,108 +193,76 @@ void journal_destroy_revoke_caches(void)
 
 int journal_init_revoke(journal_t *journal, int hash_size)
 {
-	int shift, tmp;
+	int shift = 0;
+	int tmp = hash_size;
+	int i;
 
+	/* Check that the hash_size is a power of two */
+	J_ASSERT ((hash_size & (hash_size-1)) == 0);
 	J_ASSERT (journal->j_revoke_table[0] == NULL);
 
-	shift = 0;
-	tmp = hash_size;
-	while((tmp >>= 1UL) != 0UL)
+	while ((tmp >>= 1UL) != 0UL)
 		shift++;
 
-	journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
-	if (!journal->j_revoke_table[0])
-		return -ENOMEM;
-	journal->j_revoke = journal->j_revoke_table[0];
-
-	/* Check that the hash_size is a power of two */
-	J_ASSERT ((hash_size & (hash_size-1)) == 0);
+	for (i = 0; i < 2; i++) {
+		struct jbd_revoke_table_s *table;
 
-	journal->j_revoke->hash_size = hash_size;
+		table = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
+		if (!table)
+			goto nomem;
+
+		table->hash_size = hash_size;
+		table->hash_shift = shift;
+		table->hash_table = kmalloc(hash_size * sizeof(struct hlist_head), GFP_KERNEL);
+		if (!table->hash_table) {
+			kmem_cache_free(revoke_table_cache, table);
+			goto nomem;
+		}
 
-	journal->j_revoke->hash_shift = shift;
+		for (tmp = 0; tmp < hash_size; tmp++)
+			INIT_HLIST_HEAD(&table->hash_table[tmp]);
 
-	journal->j_revoke->hash_table =
-		kmalloc(hash_size * sizeof(struct hlist_head), GFP_KERNEL);
-	if (!journal->j_revoke->hash_table) {
-		kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
-		journal->j_revoke = NULL;
-		return -ENOMEM;
-	}
-
-	for (tmp = 0; tmp < hash_size; tmp++)
-		INIT_HLIST_HEAD(&journal->j_revoke->hash_table[tmp]);
-
-	journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
-	if (!journal->j_revoke_table[1]) {
-		kfree(journal->j_revoke_table[0]->hash_table);
-		kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
-		return -ENOMEM;
+		journal->j_revoke_table[i] = table;
 	}
-
 	journal->j_revoke = journal->j_revoke_table[1];
+	spin_lock_init(&journal->j_revoke_lock);
 
-	/* Check that the hash_size is a power of two */
-	J_ASSERT ((hash_size & (hash_size-1)) == 0);
-
-	journal->j_revoke->hash_size = hash_size;
-
-	journal->j_revoke->hash_shift = shift;
+	return 0;
 
-	journal->j_revoke->hash_table =
-		kmalloc(hash_size * sizeof(struct hlist_head), GFP_KERNEL);
-	if (!journal->j_revoke->hash_table) {
-		kfree(journal->j_revoke_table[0]->hash_table);
-		kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
-		kmem_cache_free(revoke_table_cache, journal->j_revoke_table[1]);
-		journal->j_revoke = NULL;
-		return -ENOMEM;
+nomem:
+	while (i--) {
+		kfree(journal->j_revoke_table[i]->hash_table);
+		kmem_cache_free(revoke_table_cache, journal->j_revoke_table[i]);
 	}
 
-	for (tmp = 0; tmp < hash_size; tmp++)
-		INIT_HLIST_HEAD(&journal->j_revoke->hash_table[tmp]);
-
-	spin_lock_init(&journal->j_revoke_lock);
-
-	return 0;
+	return -ENOMEM;
 }
 
 /* Destoy a journal's revoke table.  The table must already be empty! */
 
 void journal_destroy_revoke(journal_t *journal)
 {
-	struct jbd_revoke_table_s *table;
-	struct hlist_head *hash_list;
-	int i;
+	int j;
 
-	table = journal->j_revoke_table[0];
-	if (!table)
-		return;
+	journal->j_revoke = NULL;
 
-	for (i=0; i<table->hash_size; i++) {
-		hash_list = &table->hash_table[i];
-		J_ASSERT (hlist_empty(hash_list));
-	}
+	for (j = 0; j < 2; j++) {
+		int i;
+		struct jbd_revoke_table_s *table = journal->j_revoke_table[j];
 
-	kfree(table->hash_table);
-	kmem_cache_free(revoke_table_cache, table);
-	journal->j_revoke = NULL;
+		if (!table)
+			return;
 
-	table = journal->j_revoke_table[1];
-	if (!table)
-		return;
+		for (i = 0; i < table->hash_size; i++) {
+			struct hlist_head *hash_list = &table->hash_table[i];
+			J_ASSERT (hlist_empty(hash_list));
+		}
 
-	for (i=0; i<table->hash_size; i++) {
-		hash_list = &table->hash_table[i];
-		J_ASSERT (hlist_empty(hash_list));
+		kfree(table->hash_table);
+		kmem_cache_free(revoke_table_cache, table);
 	}
-
-	kfree(table->hash_table);
-	kmem_cache_free(revoke_table_cache, table);
-	journal->j_revoke = NULL;
 }
 
-
 #ifdef __KERNEL__
 
 /* 



From mita at miraclelinux.com  Fri Sep  9 08:47:23 2005
From: mita at miraclelinux.com (Akinobu Mita)
Date: Fri, 9 Sep 2005 17:47:23 +0900
Subject: [PATCH 4/6] jbd: use list_head for the list of buffers on a
	transaction's data
In-Reply-To: <20050909084214.GB14205@miraclelinux.com>
References: <20050909084214.GB14205@miraclelinux.com>
Message-ID: <20050909084723.GF14205@miraclelinux.com>

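Use struct list_head for the doubly-linked lists of buffers on a
transaction's data, metadata or forget queues.  The open-coded
__blist_add_buffer() and __blist_del_buffer() helpers are removed, and
several transaction_t fields are renamed (for example, t_buffers becomes
t_metadata_list and t_sync_datalist becomes t_syncdata_list).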
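
As a rough userspace illustration of why a list head embedded in the
transaction simplifies things (no NULL special case on add or delete),
consider the sketch below. The list helpers are simplified
re-implementations of the kernel ones, and struct buf and drain() are
hypothetical names for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace versions of the kernel list primitives. */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert at the tail to preserve order; works on an empty list too. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* No need to know which list the node is on, or whether it is first. */
static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Hypothetical stand-in for struct journal_head. */
struct buf {
	int id;
	struct list_head b_list;
};

/*
 * Drain a queue from the front, in the style of the commit loops:
 * take the first entry, record it, unlink it, repeat until empty.
 * Returns the number of entries removed.
 */
static int drain(struct list_head *queue, int *order, int max)
{
	int n = 0;

	while (!list_empty(queue) && n < max) {
		struct buf *b = list_entry(queue->next, struct buf, b_list);

		order[n++] = b->id;
		list_del(&b->b_list);
	}
	return n;
}
```

Because the head is a real node, deleting an entry never has to update
a separate "first element" pointer, which is what __blist_del_buffer()
had to do by hand.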

Signed-off-by: Akinobu Mita <mita at miraclelinux.com>

---

 fs/jbd/checkpoint.c          |   12 ++--
 fs/jbd/commit.c              |   79 ++++++++++++++++--------------
 fs/jbd/journal.c             |    1 
 fs/jbd/transaction.c         |  110 ++++++++-----------------------------------
 include/linux/jbd.h          |   20 +++----
 include/linux/journal-head.h |    2 
 6 files changed, 80 insertions(+), 144 deletions(-)

diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/checkpoint.c 2.6.13-mm1/fs/jbd/checkpoint.c
--- 2.6.13-mm1.old/fs/jbd/checkpoint.c	2005-09-05 03:15:17.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/checkpoint.c	2005-09-05 03:15:35.000000000 +0900
@@ -684,12 +684,12 @@ void __journal_drop_transaction(journal_
 	}
 
 	J_ASSERT(transaction->t_state == T_FINISHED);
-	J_ASSERT(transaction->t_buffers == NULL);
-	J_ASSERT(transaction->t_sync_datalist == NULL);
-	J_ASSERT(transaction->t_forget == NULL);
-	J_ASSERT(transaction->t_iobuf_list == NULL);
-	J_ASSERT(transaction->t_shadow_list == NULL);
-	J_ASSERT(transaction->t_log_list == NULL);
+	J_ASSERT(list_empty(&transaction->t_metadata_list));
+	J_ASSERT(list_empty(&transaction->t_syncdata_list));
+	J_ASSERT(list_empty(&transaction->t_forget_list));
+	J_ASSERT(list_empty(&transaction->t_io_list));
+	J_ASSERT(list_empty(&transaction->t_shadow_list));
+	J_ASSERT(list_empty(&transaction->t_logctl_list));
 	J_ASSERT(transaction->t_checkpoint_list == NULL);
 	J_ASSERT(transaction->t_checkpoint_io_list == NULL);
 	J_ASSERT(transaction->t_updates == 0);
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/commit.c 2.6.13-mm1/fs/jbd/commit.c
--- 2.6.13-mm1.old/fs/jbd/commit.c	2005-09-05 03:16:12.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/commit.c	2005-09-05 03:15:35.000000000 +0900
@@ -250,8 +250,9 @@ void journal_commit_transaction(journal_
 	 * that multiple journal_get_write_access() calls to the same
 	 * buffer are perfectly permissable.
 	 */
-	while (commit_transaction->t_reserved_list) {
-		jh = commit_transaction->t_reserved_list;
+	while (!list_empty(&commit_transaction->t_reserved_list)) {
+		jh = list_entry(commit_transaction->t_reserved_list.next,
+				struct journal_head, b_list);
 		JBUFFER_TRACE(jh, "reserved, unused: refile");
 		/*
 		 * A journal_get_undo_access()+journal_release_buffer() may
@@ -300,14 +301,9 @@ void journal_commit_transaction(journal_
 	 * will be tracked for a new trasaction only -bzzz
 	 */
 	spin_lock(&journal->j_list_lock);
-	if (commit_transaction->t_buffers) {
-		new_jh = jh = commit_transaction->t_buffers->b_tnext;
-		do {
-			J_ASSERT_JH(new_jh, new_jh->b_modified == 1 ||
-					new_jh->b_modified == 0);
-			new_jh->b_modified = 0;
-			new_jh = new_jh->b_tnext;
-		} while (new_jh != jh);
+	list_for_each_entry(jh, &commit_transaction->t_metadata_list, b_list) {
+		J_ASSERT_JH(jh, jh->b_modified == 1 || jh->b_modified == 0);
+		jh->b_modified = 0;
 	}
 	spin_unlock(&journal->j_list_lock);
 
@@ -319,7 +315,7 @@ void journal_commit_transaction(journal_
 	err = 0;
 	/*
 	 * Whenever we unlock the journal and sleep, things can get added
-	 * onto ->t_sync_datalist, so we have to keep looping back to
+	 * onto ->t_syncdata_list, so we have to keep looping back to
 	 * write_out_data until we *know* that the list is empty.
 	 */
 	bufs = 0;
@@ -331,11 +327,12 @@ write_out_data:
 	cond_resched();
 	spin_lock(&journal->j_list_lock);
 
-	while (commit_transaction->t_sync_datalist) {
+	while (!list_empty(&commit_transaction->t_syncdata_list)) {
 		struct buffer_head *bh;
 
-		jh = commit_transaction->t_sync_datalist;
-		commit_transaction->t_sync_datalist = jh->b_tnext;
+		jh = list_entry(commit_transaction->t_syncdata_list.next,
+				struct journal_head, b_list);
+		list_move_tail(&jh->b_list, &commit_transaction->t_syncdata_list);
 		bh = jh2bh(jh);
 		if (buffer_locked(bh)) {
 			BUFFER_TRACE(bh, "locked");
@@ -389,10 +386,11 @@ write_out_data:
 	/*
 	 * Wait for all previously submitted IO to complete.
 	 */
-	while (commit_transaction->t_locked_list) {
+	while (!list_empty(&commit_transaction->t_locked_list)) {
 		struct buffer_head *bh;
 
-		jh = commit_transaction->t_locked_list->b_tprev;
+		jh = list_entry(commit_transaction->t_locked_list.prev,
+				struct journal_head, b_list);
 		bh = jh2bh(jh);
 		get_bh(bh);
 		if (buffer_locked(bh)) {
@@ -431,7 +429,7 @@ write_out_data:
 	 * any then journal_clean_data_list should have wiped the list
 	 * clean by now, so check that it is in fact empty.
 	 */
-	J_ASSERT (commit_transaction->t_sync_datalist == NULL);
+	J_ASSERT (list_empty(&commit_transaction->t_syncdata_list));
 
 	jbd_debug (3, "JBD: commit phase 3\n");
 
@@ -444,11 +442,12 @@ write_out_data:
 
 	descriptor = NULL;
 	bufs = 0;
-	while (commit_transaction->t_buffers) {
+	while (!list_empty(&commit_transaction->t_metadata_list)) {
 
 		/* Find the next buffer to be journaled... */
 
-		jh = commit_transaction->t_buffers;
+		jh = list_entry(commit_transaction->t_metadata_list.next,
+				struct journal_head, b_list);
 
 		/* If we're in abort mode, we just un-journal the buffer and
 		   release it for background writing. */
@@ -460,7 +459,7 @@ write_out_data:
 			 * any descriptor buffers which may have been
 			 * already allocated, even if we are now
 			 * aborting. */
-			if (!commit_transaction->t_buffers)
+			if (list_empty(&commit_transaction->t_metadata_list))
 				goto start_journal_io;
 			continue;
 		}
@@ -569,7 +568,7 @@ write_out_data:
 		   let the IO rip! */
 
 		if (bufs == journal->j_wbufsize ||
-		    commit_transaction->t_buffers == NULL ||
+		    list_empty(&commit_transaction->t_metadata_list) ||
 		    space_left < sizeof(journal_block_tag_t) + 16) {
 
 			jbd_debug(4, "JBD: Submit %d IOs\n", bufs);
@@ -601,8 +600,8 @@ start_journal_io:
 	/* Lo and behold: we have just managed to send a transaction to
            the log.  Before we can commit it, wait for the IO so far to
            complete.  Control buffers being written are on the
-           transaction's t_log_list queue, and metadata buffers are on
-           the t_iobuf_list queue.
+           transaction's t_logctl_list queue, and metadata buffers are on
+           the t_io_list queue.
 
 	   Wait for the buffers in reverse order.  That way we are
 	   less likely to be woken up until all IOs have completed, and
@@ -616,10 +615,11 @@ start_journal_io:
 	 * See __journal_try_to_free_buffer.
 	 */
 wait_for_iobuf:
-	while (commit_transaction->t_iobuf_list != NULL) {
+	while (!list_empty(&commit_transaction->t_io_list)) {
 		struct buffer_head *bh;
 
-		jh = commit_transaction->t_iobuf_list->b_tprev;
+		jh = list_entry(commit_transaction->t_io_list.prev,
+				struct journal_head, b_list);
 		bh = jh2bh(jh);
 		if (buffer_locked(bh)) {
 			wait_on_buffer(bh);
@@ -637,7 +637,7 @@ wait_for_iobuf:
 		journal_unfile_buffer(journal, jh);
 
 		/*
-		 * ->t_iobuf_list should contain only dummy buffer_heads
+		 * ->t_io_list should contain only dummy buffer_heads
 		 * which were created by journal_write_metadata_buffer().
 		 */
 		BUFFER_TRACE(bh, "dumping temporary bh");
@@ -648,7 +648,8 @@ wait_for_iobuf:
 
 		/* We also have to unlock and free the corresponding
                    shadowed buffer */
-		jh = commit_transaction->t_shadow_list->b_tprev;
+		jh = list_entry(commit_transaction->t_shadow_list.prev,
+				struct journal_head, b_list);
 		bh = jh2bh(jh);
 		clear_bit(BH_JWrite, &bh->b_state);
 		J_ASSERT_BH(bh, buffer_jbddirty(bh));
@@ -666,16 +667,17 @@ wait_for_iobuf:
 		__brelse(bh);
 	}
 
-	J_ASSERT (commit_transaction->t_shadow_list == NULL);
+	J_ASSERT (list_empty(&commit_transaction->t_shadow_list));
 
 	jbd_debug(3, "JBD: commit phase 5\n");
 
 	/* Here we wait for the revoke record and descriptor record buffers */
  wait_for_ctlbuf:
-	while (commit_transaction->t_log_list != NULL) {
+	while (!list_empty(&commit_transaction->t_logctl_list)) {
 		struct buffer_head *bh;
 
-		jh = commit_transaction->t_log_list->b_tprev;
+		jh = list_entry(commit_transaction->t_logctl_list.prev,
+				struct journal_head, b_list);
 		bh = jh2bh(jh);
 		if (buffer_locked(bh)) {
 			wait_on_buffer(bh);
@@ -710,12 +712,12 @@ wait_for_iobuf:
 
 	jbd_debug(3, "JBD: commit phase 7\n");
 
-	J_ASSERT(commit_transaction->t_sync_datalist == NULL);
-	J_ASSERT(commit_transaction->t_buffers == NULL);
+	J_ASSERT(list_empty(&commit_transaction->t_syncdata_list));
+	J_ASSERT(list_empty(&commit_transaction->t_metadata_list));
 	J_ASSERT(commit_transaction->t_checkpoint_list == NULL);
-	J_ASSERT(commit_transaction->t_iobuf_list == NULL);
-	J_ASSERT(commit_transaction->t_shadow_list == NULL);
-	J_ASSERT(commit_transaction->t_log_list == NULL);
+	J_ASSERT(list_empty(&commit_transaction->t_io_list));
+	J_ASSERT(list_empty(&commit_transaction->t_shadow_list));
+	J_ASSERT(list_empty(&commit_transaction->t_logctl_list));
 
 restart_loop:
 	/*
@@ -723,11 +725,12 @@ restart_loop:
 	 * to this list we have to be careful and hold the j_list_lock.
 	 */
 	spin_lock(&journal->j_list_lock);
-	while (commit_transaction->t_forget) {
+	while (!list_empty(&commit_transaction->t_forget_list)) {
 		transaction_t *cp_transaction;
 		struct buffer_head *bh;
 
-		jh = commit_transaction->t_forget;
+		jh = list_entry(commit_transaction->t_forget_list.next,
+				struct journal_head, b_list);
 		spin_unlock(&journal->j_list_lock);
 		bh = jh2bh(jh);
 		jbd_lock_bh_state(bh);
@@ -811,7 +814,7 @@ restart_loop:
 	 * Now recheck if some buffers did not get attached to the transaction
 	 * while the lock was dropped...
 	 */
-	if (commit_transaction->t_forget) {
+	if (!list_empty(&commit_transaction->t_forget_list)) {
 		spin_unlock(&journal->j_list_lock);
 		spin_unlock(&journal->j_state_lock);
 		goto restart_loop;
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/journal.c 2.6.13-mm1/fs/jbd/journal.c
--- 2.6.13-mm1.old/fs/jbd/journal.c	2005-09-05 03:15:17.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/journal.c	2005-09-05 03:15:39.000000000 +0900
@@ -1761,6 +1761,7 @@ repeat:
 		set_buffer_jbd(bh);
 		bh->b_private = jh;
 		jh->b_bh = bh;
+		INIT_LIST_HEAD(&jh->b_list);
 		get_bh(bh);
 		BUFFER_TRACE(bh, "added journal_head");
 	}
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/transaction.c 2.6.13-mm1/fs/jbd/transaction.c
--- 2.6.13-mm1.old/fs/jbd/transaction.c	2005-09-05 03:15:17.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/transaction.c	2005-09-05 03:15:35.000000000 +0900
@@ -51,6 +51,14 @@ get_transaction(journal_t *journal, tran
 	transaction->t_tid = journal->j_transaction_sequence++;
 	transaction->t_expires = jiffies + journal->j_commit_interval;
 	spin_lock_init(&transaction->t_handle_lock);
+	INIT_LIST_HEAD(&transaction->t_reserved_list);
+	INIT_LIST_HEAD(&transaction->t_locked_list);
+	INIT_LIST_HEAD(&transaction->t_metadata_list);
+	INIT_LIST_HEAD(&transaction->t_syncdata_list);
+	INIT_LIST_HEAD(&transaction->t_forget_list);
+	INIT_LIST_HEAD(&transaction->t_io_list);
+	INIT_LIST_HEAD(&transaction->t_shadow_list);
+	INIT_LIST_HEAD(&transaction->t_logctl_list);
 
 	/* Set up the commit timer for the new transaction. */
 	journal->j_commit_timer->expires = transaction->t_expires;
@@ -1414,64 +1422,12 @@ int journal_force_commit(journal_t *jour
 	return ret;
 }
 
-/*
- *
- * List management code snippets: various functions for manipulating the
- * transaction buffer lists.
- *
- */
-
-/*
- * Append a buffer to a transaction list, given the transaction's list head
- * pointer.
- *
- * j_list_lock is held.
- *
- * jbd_lock_bh_state(jh2bh(jh)) is held.
- */
-
-static inline void 
-__blist_add_buffer(struct journal_head **list, struct journal_head *jh)
-{
-	if (!*list) {
-		jh->b_tnext = jh->b_tprev = jh;
-		*list = jh;
-	} else {
-		/* Insert at the tail of the list to preserve order */
-		struct journal_head *first = *list, *last = first->b_tprev;
-		jh->b_tprev = last;
-		jh->b_tnext = first;
-		last->b_tnext = first->b_tprev = jh;
-	}
-}
-
-/* 
- * Remove a buffer from a transaction list, given the transaction's list
- * head pointer.
- *
- * Called with j_list_lock held, and the journal may not be locked.
- *
- * jbd_lock_bh_state(jh2bh(jh)) is held.
- */
-
-static inline void
-__blist_del_buffer(struct journal_head **list, struct journal_head *jh)
-{
-	if (*list == jh) {
-		*list = jh->b_tnext;
-		if (*list == jh)
-			*list = NULL;
-	}
-	jh->b_tprev->b_tnext = jh->b_tnext;
-	jh->b_tnext->b_tprev = jh->b_tprev;
-}
-
 /* 
  * Remove a buffer from the appropriate transaction list.
  *
  * Note that this function can *change* the value of
- * bh->b_transaction->t_sync_datalist, t_buffers, t_forget,
- * t_iobuf_list, t_shadow_list, t_log_list or t_reserved_list.  If the caller
+ * bh->b_transaction->t_syncdata_list, t_metadata_list, t_forget_list,
+ * t_io_list, t_shadow_list, t_logctl_list or t_reserved_list.  If the caller
  * is holding onto a copy of one of thee pointers, it could go bad.
  * Generally the caller needs to re-read the pointer from the transaction_t.
  *
@@ -1479,7 +1435,6 @@ __blist_del_buffer(struct journal_head *
  */
 void __journal_temp_unlink_buffer(struct journal_head *jh)
 {
-	struct journal_head **list = NULL;
 	transaction_t *transaction;
 	struct buffer_head *bh = jh2bh(jh);
 
@@ -1495,35 +1450,12 @@ void __journal_temp_unlink_buffer(struct
 	switch (jh->b_jlist) {
 	case BJ_None:
 		return;
-	case BJ_SyncData:
-		list = &transaction->t_sync_datalist;
-		break;
 	case BJ_Metadata:
-		transaction->t_nr_buffers--;
-		J_ASSERT_JH(jh, transaction->t_nr_buffers >= 0);
-		list = &transaction->t_buffers;
-		break;
-	case BJ_Forget:
-		list = &transaction->t_forget;
-		break;
-	case BJ_IO:
-		list = &transaction->t_iobuf_list;
-		break;
-	case BJ_Shadow:
-		list = &transaction->t_shadow_list;
-		break;
-	case BJ_LogCtl:
-		list = &transaction->t_log_list;
-		break;
-	case BJ_Reserved:
-		list = &transaction->t_reserved_list;
-		break;
-	case BJ_Locked:
-		list = &transaction->t_locked_list;
+		transaction->t_nr_metadata--;
+		J_ASSERT_JH(jh, transaction->t_nr_metadata >= 0);
 		break;
 	}
-
-	__blist_del_buffer(list, jh);
+	list_del(&jh->b_list);
 	jh->b_jlist = BJ_None;
 	if (test_clear_buffer_jbddirty(bh))
 		mark_buffer_dirty(bh);	/* Expose it to the VM */
@@ -1924,7 +1856,7 @@ int journal_invalidatepage(journal_t *jo
 void __journal_file_buffer(struct journal_head *jh,
 			transaction_t *transaction, int jlist)
 {
-	struct journal_head **list = NULL;
+	struct list_head *list = NULL;
 	int was_dirty = 0;
 	struct buffer_head *bh = jh2bh(jh);
 
@@ -1959,23 +1891,23 @@ void __journal_file_buffer(struct journa
 		J_ASSERT_JH(jh, !jh->b_frozen_data);
 		return;
 	case BJ_SyncData:
-		list = &transaction->t_sync_datalist;
+		list = &transaction->t_syncdata_list;
 		break;
 	case BJ_Metadata:
-		transaction->t_nr_buffers++;
-		list = &transaction->t_buffers;
+		transaction->t_nr_metadata++;
+		list = &transaction->t_metadata_list;
 		break;
 	case BJ_Forget:
-		list = &transaction->t_forget;
+		list = &transaction->t_forget_list;
 		break;
 	case BJ_IO:
-		list = &transaction->t_iobuf_list;
+		list = &transaction->t_io_list;
 		break;
 	case BJ_Shadow:
 		list = &transaction->t_shadow_list;
 		break;
 	case BJ_LogCtl:
-		list = &transaction->t_log_list;
+		list = &transaction->t_logctl_list;
 		break;
 	case BJ_Reserved:
 		list = &transaction->t_reserved_list;
@@ -1985,7 +1917,7 @@ void __journal_file_buffer(struct journa
 		break;
 	}
 
-	__blist_add_buffer(list, jh);
+	list_add_tail(&jh->b_list, list);
 	jh->b_jlist = jlist;
 
 	if (was_dirty)
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/include/linux/jbd.h 2.6.13-mm1/include/linux/jbd.h
--- 2.6.13-mm1.old/include/linux/jbd.h	2005-09-05 03:15:24.000000000 +0900
+++ 2.6.13-mm1/include/linux/jbd.h	2005-09-05 03:15:35.000000000 +0900
@@ -459,39 +459,39 @@ struct transaction_s 
 	 */
 	unsigned long		t_log_start;
 
-	/* Number of buffers on the t_buffers list [j_list_lock] */
-	int			t_nr_buffers;
+	/* Number of buffers on the t_metadata_list [j_list_lock] */
+	int			t_nr_metadata;
 
 	/*
 	 * Doubly-linked circular list of all buffers reserved but not yet
 	 * modified by this transaction [j_list_lock]
 	 */
-	struct journal_head	*t_reserved_list;
+	struct list_head	t_reserved_list;
 
 	/*
 	 * Doubly-linked circular list of all buffers under writeout during
 	 * commit [j_list_lock]
 	 */
-	struct journal_head	*t_locked_list;
+	struct list_head	t_locked_list;
 
 	/*
 	 * Doubly-linked circular list of all metadata buffers owned by this
 	 * transaction [j_list_lock]
 	 */
-	struct journal_head	*t_buffers;
+	struct list_head	t_metadata_list;
 
 	/*
 	 * Doubly-linked circular list of all data buffers still to be
 	 * flushed before this transaction can be committed [j_list_lock]
 	 */
-	struct journal_head	*t_sync_datalist;
+	struct list_head	t_syncdata_list;
 
 	/*
 	 * Doubly-linked circular list of all forget buffers (superseded
 	 * buffers which we can un-checkpoint once this transaction commits)
 	 * [j_list_lock]
 	 */
-	struct journal_head	*t_forget;
+	struct list_head	t_forget_list;
 
 	/*
 	 * Doubly-linked circular list of all buffers still to be flushed before
@@ -509,20 +509,20 @@ struct transaction_s 
 	 * Doubly-linked circular list of temporary buffers currently undergoing
 	 * IO in the log [j_list_lock]
 	 */
-	struct journal_head	*t_iobuf_list;
+	struct list_head	t_io_list;
 
 	/*
 	 * Doubly-linked circular list of metadata buffers being shadowed by log
 	 * IO.  The IO buffers on the iobuf list and the shadow buffers on this
 	 * list match each other one for one at all times. [j_list_lock]
 	 */
-	struct journal_head	*t_shadow_list;
+	struct list_head	t_shadow_list;
 
 	/*
 	 * Doubly-linked circular list of control buffers being written to the
 	 * log. [j_list_lock]
 	 */
-	struct journal_head	*t_log_list;
+	struct list_head	t_logctl_list;
 
 	/*
 	 * Protects info related to handles
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/include/linux/journal-head.h 2.6.13-mm1/include/linux/journal-head.h
--- 2.6.13-mm1.old/include/linux/journal-head.h	2005-09-05 03:15:24.000000000 +0900
+++ 2.6.13-mm1/include/linux/journal-head.h	2005-09-05 03:15:35.000000000 +0900
@@ -72,7 +72,7 @@ struct journal_head {
 	 * Doubly-linked list of buffers on a transaction's data, metadata or
 	 * forget queue. [t_list_lock] [jbd_lock_bh_state()]
 	 */
-	struct journal_head *b_tnext, *b_tprev;
+	struct list_head b_list;
 
 	/*
 	 * Pointer to the compound transaction against which this buffer



From mita at miraclelinux.com  Fri Sep  9 08:48:51 2005
From: mita at miraclelinux.com (Akinobu Mita)
Date: Fri, 9 Sep 2005 17:48:51 +0900
Subject: [-mm PATCH 5/6] jbd: use list_head for the list of all transactions
	waiting for
In-Reply-To: <20050909084214.GB14205@miraclelinux.com>
References: <20050909084214.GB14205@miraclelinux.com>
Message-ID: <20050909084851.GG14205@miraclelinux.com>

Use struct list_head for the circular list of all transactions
waiting for checkpointing in the journal control structure.

Signed-off-by: Akinobu Mita <mita at miraclelinux.com>

---

 fs/jbd/checkpoint.c  |   48 ++++++++++++++++++++----------------------------
 fs/jbd/commit.c      |   16 ++--------------
 fs/jbd/journal.c     |    9 +++++----
 fs/jbd/transaction.c |    1 +
 include/linux/jbd.h  |    4 ++--
 5 files changed, 30 insertions(+), 48 deletions(-)

diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/checkpoint.c 2.6.13-mm1/fs/jbd/checkpoint.c
--- 2.6.13-mm1.old/fs/jbd/checkpoint.c	2005-09-04 23:31:48.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/checkpoint.c	2005-09-05 00:23:28.000000000 +0900
@@ -180,8 +180,10 @@ static void __wait_cp_io(journal_t *jour
 	this_tid = transaction->t_tid;
 restart:
 	/* Didn't somebody clean up the transaction in the meanwhile */
-	if (journal->j_checkpoint_transactions != transaction ||
-		transaction->t_tid != this_tid)
+	if (list_empty(&journal->j_checkpoint_transactions) ||
+	    list_entry(journal->j_checkpoint_transactions.next, transaction_t,
+			t_cplist) != transaction ||
+	    transaction->t_tid != this_tid)
 		return;
 	while (!released && transaction->t_checkpoint_io_list) {
 		jh = transaction->t_checkpoint_io_list;
@@ -328,9 +330,10 @@ int log_do_checkpoint(journal_t *journal
 	 * and write it.
 	 */
 	spin_lock(&journal->j_list_lock);
-	if (!journal->j_checkpoint_transactions)
+	if (list_empty(&journal->j_checkpoint_transactions))
 		goto out;
-	transaction = journal->j_checkpoint_transactions;
+	transaction = list_entry(journal->j_checkpoint_transactions.next,
+				 transaction_t, t_cplist);
 	this_tid = transaction->t_tid;
 restart:
 	/*
@@ -338,8 +341,10 @@ restart:
 	 * done (maybe it's a new transaction, but it fell at the same
 	 * address).
 	 */
- 	if (journal->j_checkpoint_transactions == transaction ||
-			transaction->t_tid == this_tid) {
+ 	if ((!list_empty(&journal->j_checkpoint_transactions) &&
+	     list_entry(journal->j_checkpoint_transactions.next,
+			transaction_t, t_cplist) == transaction) ||
+	      transaction->t_tid == this_tid) {
 		int batch_count = 0;
 		struct buffer_head *bhs[NR_BATCH];
 		struct journal_head *jh;
@@ -410,7 +415,7 @@ out:
 
 int cleanup_journal_tail(journal_t *journal)
 {
-	transaction_t * transaction;
+	transaction_t * transaction = NULL;
 	tid_t		first_tid;
 	unsigned long	blocknr, freed;
 
@@ -423,7 +428,9 @@ int cleanup_journal_tail(journal_t *jour
 
 	spin_lock(&journal->j_state_lock);
 	spin_lock(&journal->j_list_lock);
-	transaction = journal->j_checkpoint_transactions;
+	if (!list_empty(&journal->j_checkpoint_transactions))
+		transaction = list_entry(journal->j_checkpoint_transactions.next,
+					 transaction_t, t_cplist);
 	if (transaction) {
 		first_tid = transaction->t_tid;
 		blocknr = transaction->t_log_start;
@@ -530,18 +537,11 @@ static int journal_clean_one_cp_list(str
 
 int __journal_clean_checkpoint_list(journal_t *journal)
 {
-	transaction_t *transaction, *last_transaction, *next_transaction;
+	transaction_t *transaction, *next_transaction;
 	int ret = 0, released;
 
-	transaction = journal->j_checkpoint_transactions;
-	if (!transaction)
-		goto out;
-
-	last_transaction = transaction->t_cpprev;
-	next_transaction = transaction;
-	do {
-		transaction = next_transaction;
-		next_transaction = transaction->t_cpnext;
+	list_for_each_entry_safe(transaction, next_transaction,
+				&journal->j_checkpoint_transactions, t_cplist) {
 		ret += journal_clean_one_cp_list(transaction->
 				t_checkpoint_list, &released);
 		if (need_resched())
@@ -557,7 +557,7 @@ int __journal_clean_checkpoint_list(jour
 				t_checkpoint_io_list, &released);
 		if (need_resched())
 			goto out;
-	} while (transaction != last_transaction);
+	}
 out:
 	return ret;
 }
@@ -673,15 +673,7 @@ void __journal_insert_checkpoint(struct 
 void __journal_drop_transaction(journal_t *journal, transaction_t *transaction)
 {
 	assert_spin_locked(&journal->j_list_lock);
-	if (transaction->t_cpnext) {
-		transaction->t_cpnext->t_cpprev = transaction->t_cpprev;
-		transaction->t_cpprev->t_cpnext = transaction->t_cpnext;
-		if (journal->j_checkpoint_transactions == transaction)
-			journal->j_checkpoint_transactions =
-				transaction->t_cpnext;
-		if (journal->j_checkpoint_transactions == transaction)
-			journal->j_checkpoint_transactions = NULL;
-	}
+	list_del(&transaction->t_cplist);
 
 	J_ASSERT(transaction->t_state == T_FINISHED);
 	J_ASSERT(list_empty(&transaction->t_metadata_list));
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/commit.c 2.6.13-mm1/fs/jbd/commit.c
--- 2.6.13-mm1.old/fs/jbd/commit.c	2005-09-04 23:31:48.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/commit.c	2005-09-04 23:41:01.000000000 +0900
@@ -835,20 +835,8 @@ restart_loop:
 	if (commit_transaction->t_checkpoint_list == NULL) {
 		__journal_drop_transaction(journal, commit_transaction);
 	} else {
-		if (journal->j_checkpoint_transactions == NULL) {
-			journal->j_checkpoint_transactions = commit_transaction;
-			commit_transaction->t_cpnext = commit_transaction;
-			commit_transaction->t_cpprev = commit_transaction;
-		} else {
-			commit_transaction->t_cpnext =
-				journal->j_checkpoint_transactions;
-			commit_transaction->t_cpprev =
-				commit_transaction->t_cpnext->t_cpprev;
-			commit_transaction->t_cpnext->t_cpprev =
-				commit_transaction;
-			commit_transaction->t_cpprev->t_cpnext =
-				commit_transaction;
-		}
+		list_add_tail(&commit_transaction->t_cplist,
+			 &journal->j_checkpoint_transactions);
 	}
 	spin_unlock(&journal->j_list_lock);
 
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/journal.c 2.6.13-mm1/fs/jbd/journal.c
--- 2.6.13-mm1.old/fs/jbd/journal.c	2005-09-04 23:31:48.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/journal.c	2005-09-04 23:33:19.000000000 +0900
@@ -653,6 +653,7 @@ static journal_t * journal_init_common (
 		goto fail;
 	memset(journal, 0, sizeof(*journal));
 
+	INIT_LIST_HEAD(&journal->j_checkpoint_transactions);
 	init_waitqueue_head(&journal->j_wait_transaction_locked);
 	init_waitqueue_head(&journal->j_wait_logspace);
 	init_waitqueue_head(&journal->j_wait_done_commit);
@@ -1130,7 +1131,7 @@ void journal_destroy(journal_t *journal)
 
 	/* Totally anal locking here... */
 	spin_lock(&journal->j_list_lock);
-	while (journal->j_checkpoint_transactions != NULL) {
+	while (!list_empty(&journal->j_checkpoint_transactions)) {
 		spin_unlock(&journal->j_list_lock);
 		log_do_checkpoint(journal);
 		spin_lock(&journal->j_list_lock);
@@ -1138,7 +1139,7 @@ void journal_destroy(journal_t *journal)
 
 	J_ASSERT(journal->j_running_transaction == NULL);
 	J_ASSERT(journal->j_committing_transaction == NULL);
-	J_ASSERT(journal->j_checkpoint_transactions == NULL);
+	J_ASSERT(list_empty(&journal->j_checkpoint_transactions));
 	spin_unlock(&journal->j_list_lock);
 
 	/* We can now mark the journal as empty. */
@@ -1352,7 +1353,7 @@ int journal_flush(journal_t *journal)
 
 	/* ...and flush everything in the log out to disk. */
 	spin_lock(&journal->j_list_lock);
-	while (!err && journal->j_checkpoint_transactions != NULL) {
+	while (!err && !list_empty(&journal->j_checkpoint_transactions)) {
 		spin_unlock(&journal->j_list_lock);
 		err = log_do_checkpoint(journal);
 		spin_lock(&journal->j_list_lock);
@@ -1375,7 +1376,7 @@ int journal_flush(journal_t *journal)
 
 	J_ASSERT(!journal->j_running_transaction);
 	J_ASSERT(!journal->j_committing_transaction);
-	J_ASSERT(!journal->j_checkpoint_transactions);
+	J_ASSERT(list_empty(&journal->j_checkpoint_transactions));
 	J_ASSERT(journal->j_head == journal->j_tail);
 	J_ASSERT(journal->j_tail_sequence == journal->j_transaction_sequence);
 	spin_unlock(&journal->j_state_lock);
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/transaction.c 2.6.13-mm1/fs/jbd/transaction.c
--- 2.6.13-mm1.old/fs/jbd/transaction.c	2005-09-04 23:31:47.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/transaction.c	2005-09-04 23:33:19.000000000 +0900
@@ -59,6 +59,7 @@ get_transaction(journal_t *journal, tran
 	INIT_LIST_HEAD(&transaction->t_io_list);
 	INIT_LIST_HEAD(&transaction->t_shadow_list);
 	INIT_LIST_HEAD(&transaction->t_logctl_list);
+	INIT_LIST_HEAD(&transaction->t_cplist);
 
 	/* Set up the commit timer for the new transaction. */
 	journal->j_commit_timer->expires = transaction->t_expires;
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/include/linux/jbd.h 2.6.13-mm1/include/linux/jbd.h
--- 2.6.13-mm1.old/include/linux/jbd.h	2005-09-04 23:32:35.000000000 +0900
+++ 2.6.13-mm1/include/linux/jbd.h	2005-09-04 23:33:15.000000000 +0900
@@ -545,7 +545,7 @@ struct transaction_s 
 	 * Forward and backward links for the circular list of all transactions
 	 * awaiting checkpoint. [j_list_lock]
 	 */
-	transaction_t		*t_cpnext, *t_cpprev;
+	struct list_head	t_cplist;
 
 	/*
 	 * When will the transaction expire (become due for commit), in jiffies?
@@ -667,7 +667,7 @@ struct journal_s
 	 * ... and a linked circular list of all transactions waiting for
 	 * checkpointing. [j_list_lock]
 	 */
-	transaction_t		*j_checkpoint_transactions;
+	struct list_head	j_checkpoint_transactions;
 
 	/*
 	 * Wait queue for waiting for a locked transaction to start committing,



From mita at miraclelinux.com  Fri Sep  9 08:50:07 2005
From: mita at miraclelinux.com (Akinobu Mita)
Date: Fri, 9 Sep 2005 17:50:07 +0900
Subject: [-mm PATCH 6/6] jbd: use list_head for a transaction checkpoint list
In-Reply-To: <20050909084214.GB14205@miraclelinux.com>
References: <20050909084214.GB14205@miraclelinux.com>
Message-ID: <20050909085007.GH14205@miraclelinux.com>

Use struct list_head for the doubly-linked lists of buffers still remaining
to be flushed before an old transaction can be checkpointed.

Signed-off-by: Akinobu Mita <mita at miraclelinux.com>

---

 fs/jbd/checkpoint.c          |  119 +++++++------------------------------------
 fs/jbd/commit.c              |    4 -
 fs/jbd/journal.c             |    1 
 fs/jbd/transaction.c         |    2 
 include/linux/jbd.h          |    4 -
 include/linux/journal-head.h |    2 
 6 files changed, 30 insertions(+), 102 deletions(-)

diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/checkpoint.c 2.6.13-mm1/fs/jbd/checkpoint.c
--- 2.6.13-mm1.old/fs/jbd/checkpoint.c	2005-09-05 03:21:20.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/checkpoint.c	2005-09-05 03:21:33.000000000 +0900
@@ -22,71 +22,7 @@
 #include <linux/jbd.h>
 #include <linux/errno.h>
 #include <linux/slab.h>
-
-/*
- * Unlink a buffer from a transaction checkpoint list.
- *
- * Called with j_list_lock held.
- */
-
-static void __buffer_unlink_first(struct journal_head *jh)
-{
-	transaction_t *transaction;
-
-	transaction = jh->b_cp_transaction;
-
-	jh->b_cpnext->b_cpprev = jh->b_cpprev;
-	jh->b_cpprev->b_cpnext = jh->b_cpnext;
-	if (transaction->t_checkpoint_list == jh) {
-		transaction->t_checkpoint_list = jh->b_cpnext;
-		if (transaction->t_checkpoint_list == jh)
-			transaction->t_checkpoint_list = NULL;
-	}
-}
-
-/*
- * Unlink a buffer from a transaction checkpoint(io) list.
- *
- * Called with j_list_lock held.
- */
-
-static inline void __buffer_unlink(struct journal_head *jh)
-{
-	transaction_t *transaction;
-
-	transaction = jh->b_cp_transaction;
-
-	__buffer_unlink_first(jh);
-	if (transaction->t_checkpoint_io_list == jh) {
-		transaction->t_checkpoint_io_list = jh->b_cpnext;
-		if (transaction->t_checkpoint_io_list == jh)
-			transaction->t_checkpoint_io_list = NULL;
-	}
-}
-
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
-	transaction_t *transaction;
-
-	transaction = jh->b_cp_transaction;
-	__buffer_unlink_first(jh);
-
-	if (!transaction->t_checkpoint_io_list) {
-		jh->b_cpnext = jh->b_cpprev = jh;
-	} else {
-		jh->b_cpnext = transaction->t_checkpoint_io_list;
-		jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
-		jh->b_cpprev->b_cpnext = jh;
-		jh->b_cpnext->b_cpprev = jh;
-	}
-	transaction->t_checkpoint_io_list = jh;
-}
+#include <linux/list.h>
 
 /*
  * Try to release a checkpointed buffer from its transaction.
@@ -185,8 +121,9 @@ restart:
 			t_cplist) != transaction ||
 	    transaction->t_tid != this_tid)
 		return;
-	while (!released && transaction->t_checkpoint_io_list) {
-		jh = transaction->t_checkpoint_io_list;
+	while (!released && !list_empty(&transaction->t_checkpoint_io_list)) {
+		jh = list_entry(transaction->t_checkpoint_io_list.next,
+				struct journal_head, b_cplist);
 		bh = jh2bh(jh);
 		if (!jbd_trylock_bh_state(bh)) {
 			jbd_sync_bh(journal, bh);
@@ -288,7 +225,9 @@ static int __process_buffer(journal_t *j
 		J_ASSERT_BH(bh, !buffer_jwrite(bh));
 		set_buffer_jwrite(bh);
 		bhs[*batch_count] = bh;
-		__buffer_relink_io(jh);
+		list_del(&jh->b_cplist);
+		list_add(&jh->b_cplist,
+			 &jh->b_cp_transaction->t_checkpoint_io_list);
 		jbd_unlock_bh_state(bh);
 		(*batch_count)++;
 		if (*batch_count == NR_BATCH) {
@@ -350,10 +289,11 @@ restart:
 		struct journal_head *jh;
 		int retry = 0;
 
-		while (!retry && transaction->t_checkpoint_list) {
+		while (!retry && !list_empty(&transaction->t_checkpoint_list)) {
 			struct buffer_head *bh;
 
-			jh = transaction->t_checkpoint_list;
+			jh = list_entry(transaction->t_checkpoint_list.next,
+					struct journal_head, b_cplist);
 			bh = jh2bh(jh);
 			if (!jbd_trylock_bh_state(bh)) {
 				jbd_sync_bh(journal, bh);
@@ -488,20 +428,14 @@ int cleanup_journal_tail(journal_t *jour
  * Returns number of bufers reaped (for debug)
  */
 
-static int journal_clean_one_cp_list(struct journal_head *jh, int *released)
+static int journal_clean_one_cp_list(struct list_head *head, int *released)
 {
-	struct journal_head *last_jh;
-	struct journal_head *next_jh = jh;
+	struct journal_head *jh, *next_jh;
 	int ret, freed = 0;
 
 	*released = 0;
-	if (!jh)
-		return 0;
 
- 	last_jh = jh->b_cpprev;
-	do {
-		jh = next_jh;
-		next_jh = jh->b_cpnext;
+	list_for_each_entry_safe(jh, next_jh, head, b_cplist) {
 		/* Use trylock because of the ranking */
 		if (jbd_trylock_bh_state(jh2bh(jh))) {
 			ret = __try_to_free_cp_buf(jh);
@@ -520,7 +454,7 @@ static int journal_clean_one_cp_list(str
 		 */
 		if (need_resched())
 			return freed;
-	} while (jh != last_jh);
+	}
 
 	return freed;
 }
@@ -542,7 +476,7 @@ int __journal_clean_checkpoint_list(jour
 
 	list_for_each_entry_safe(transaction, next_transaction,
 				&journal->j_checkpoint_transactions, t_cplist) {
-		ret += journal_clean_one_cp_list(transaction->
+		ret += journal_clean_one_cp_list(&transaction->
 				t_checkpoint_list, &released);
 		if (need_resched())
 			goto out;
@@ -553,7 +487,7 @@ int __journal_clean_checkpoint_list(jour
 		 * t_checkpoint_list with removing the buffer from the list as
 		 * we can possibly see not yet submitted buffers on io_list
 		 */
-		ret += journal_clean_one_cp_list(transaction->
+		ret += journal_clean_one_cp_list(&transaction->
 				t_checkpoint_io_list, &released);
 		if (need_resched())
 			goto out;
@@ -596,11 +530,11 @@ int __journal_remove_checkpoint(struct j
 	}
 	journal = transaction->t_journal;
 
-	__buffer_unlink(jh);
+	list_del(&jh->b_cplist);
 	jh->b_cp_transaction = NULL;
 
-	if (transaction->t_checkpoint_list != NULL ||
-	    transaction->t_checkpoint_io_list != NULL)
+	if (!list_empty(&transaction->t_checkpoint_list) ||
+	    !list_empty(&transaction->t_checkpoint_io_list))
 		goto out;
 	JBUFFER_TRACE(jh, "transaction has no more buffers");
 
@@ -648,16 +582,7 @@ void __journal_insert_checkpoint(struct 
 	J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
 
 	jh->b_cp_transaction = transaction;
-
-	if (!transaction->t_checkpoint_list) {
-		jh->b_cpnext = jh->b_cpprev = jh;
-	} else {
-		jh->b_cpnext = transaction->t_checkpoint_list;
-		jh->b_cpprev = transaction->t_checkpoint_list->b_cpprev;
-		jh->b_cpprev->b_cpnext = jh;
-		jh->b_cpnext->b_cpprev = jh;
-	}
-	transaction->t_checkpoint_list = jh;
+	list_add(&jh->b_cplist, &transaction->t_checkpoint_list);
 }
 
 /*
@@ -682,8 +607,8 @@ void __journal_drop_transaction(journal_
 	J_ASSERT(list_empty(&transaction->t_io_list));
 	J_ASSERT(list_empty(&transaction->t_shadow_list));
 	J_ASSERT(list_empty(&transaction->t_logctl_list));
-	J_ASSERT(transaction->t_checkpoint_list == NULL);
-	J_ASSERT(transaction->t_checkpoint_io_list == NULL);
+	J_ASSERT(list_empty(&transaction->t_checkpoint_list));
+	J_ASSERT(list_empty(&transaction->t_checkpoint_io_list));
 	J_ASSERT(transaction->t_updates == 0);
 	J_ASSERT(journal->j_committing_transaction != transaction);
 	J_ASSERT(journal->j_running_transaction != transaction);
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/commit.c 2.6.13-mm1/fs/jbd/commit.c
--- 2.6.13-mm1.old/fs/jbd/commit.c	2005-09-05 03:21:20.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/commit.c	2005-09-05 03:21:33.000000000 +0900
@@ -714,7 +714,7 @@ wait_for_iobuf:
 
 	J_ASSERT(list_empty(&commit_transaction->t_syncdata_list));
 	J_ASSERT(list_empty(&commit_transaction->t_metadata_list));
-	J_ASSERT(commit_transaction->t_checkpoint_list == NULL);
+	J_ASSERT(list_empty(&commit_transaction->t_checkpoint_list));
 	J_ASSERT(list_empty(&commit_transaction->t_io_list));
 	J_ASSERT(list_empty(&commit_transaction->t_shadow_list));
 	J_ASSERT(list_empty(&commit_transaction->t_logctl_list));
@@ -832,7 +832,7 @@ restart_loop:
 	journal->j_committing_transaction = NULL;
 	spin_unlock(&journal->j_state_lock);
 
-	if (commit_transaction->t_checkpoint_list == NULL) {
+	if (list_empty(&commit_transaction->t_checkpoint_list)) {
 		__journal_drop_transaction(journal, commit_transaction);
 	} else {
 		list_add_tail(&commit_transaction->t_cplist,
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/journal.c 2.6.13-mm1/fs/jbd/journal.c
--- 2.6.13-mm1.old/fs/jbd/journal.c	2005-09-05 03:21:20.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/journal.c	2005-09-05 03:21:36.000000000 +0900
@@ -1763,6 +1763,7 @@ repeat:
 		bh->b_private = jh;
 		jh->b_bh = bh;
 		INIT_LIST_HEAD(&jh->b_list);
+		INIT_LIST_HEAD(&jh->b_cplist);
 		get_bh(bh);
 		BUFFER_TRACE(bh, "added journal_head");
 	}
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/fs/jbd/transaction.c 2.6.13-mm1/fs/jbd/transaction.c
--- 2.6.13-mm1.old/fs/jbd/transaction.c	2005-09-05 03:21:20.000000000 +0900
+++ 2.6.13-mm1/fs/jbd/transaction.c	2005-09-05 03:21:36.000000000 +0900
@@ -60,6 +60,8 @@ get_transaction(journal_t *journal, tran
 	INIT_LIST_HEAD(&transaction->t_shadow_list);
 	INIT_LIST_HEAD(&transaction->t_logctl_list);
 	INIT_LIST_HEAD(&transaction->t_cplist);
+	INIT_LIST_HEAD(&transaction->t_checkpoint_list);
+	INIT_LIST_HEAD(&transaction->t_checkpoint_io_list);
 
 	/* Set up the commit timer for the new transaction. */
 	journal->j_commit_timer->expires = transaction->t_expires;
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/include/linux/jbd.h 2.6.13-mm1/include/linux/jbd.h
--- 2.6.13-mm1.old/include/linux/jbd.h	2005-09-05 03:21:20.000000000 +0900
+++ 2.6.13-mm1/include/linux/jbd.h	2005-09-05 03:21:33.000000000 +0900
@@ -497,13 +497,13 @@ struct transaction_s 
 	 * Doubly-linked circular list of all buffers still to be flushed before
 	 * this transaction can be checkpointed. [j_list_lock]
 	 */
-	struct journal_head	*t_checkpoint_list;
+	struct list_head	t_checkpoint_list;
 
 	/*
 	 * Doubly-linked circular list of all buffers submitted for IO while
 	 * checkpointing. [j_list_lock]
 	 */
-	struct journal_head	*t_checkpoint_io_list;
+	struct list_head	t_checkpoint_io_list;
 
 	/*
 	 * Doubly-linked circular list of temporary buffers currently undergoing
diff -X 2.6.13-mm1/Documentation/dontdiff -Nurp 2.6.13-mm1.old/include/linux/journal-head.h 2.6.13-mm1/include/linux/journal-head.h
--- 2.6.13-mm1.old/include/linux/journal-head.h	2005-09-05 03:20:41.000000000 +0900
+++ 2.6.13-mm1/include/linux/journal-head.h	2005-09-05 03:21:33.000000000 +0900
@@ -86,7 +86,7 @@ struct journal_head {
 	 * before an old transaction can be checkpointed.
 	 * [j_list_lock]
 	 */
-	struct journal_head *b_cpnext, *b_cpprev;
+	struct list_head b_cplist;
 };
 
 #endif		/* JOURNAL_HEAD_H_INCLUDED */



From akpm at osdl.org  Fri Sep  9 09:15:22 2005
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 9 Sep 2005 02:15:22 -0700
Subject: [PATCH 0/6] jbd cleanup
In-Reply-To: <20050909084214.GB14205@miraclelinux.com>
References: <20050909084214.GB14205@miraclelinux.com>
Message-ID: <20050909021522.1a271e4b.akpm@osdl.org>

Akinobu Mita <mita at miraclelinux.com> wrote:
>
> The following 6 patches cleanup the jbd code and kill about 200 lines. 
>

Thanks, but I'm not inclined to apply them.

a) Maybe 70-80% of the Linux world uses this filesystem.  We need to be
   very cautious in making changes to it.

b) A relatively large number of people are carrying quite large
   out-of-tree patches, some of which they're hoping to merge sometime. 
   Admittedly more against ext3 than JBD, but there is potential here to
   cause those people trouble.

Plus the switch to list_heads in journal_s has some impact on type safety
and debuggability - I considered doing it years ago but decided not to
because I found I _used_ those pointers fairly commonly in development. 
list_heads are a bit of a pain in gdb (kgdb and kernel core dumps), for
example.



From tytso at mit.edu  Fri Sep  9 18:16:49 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Fri, 9 Sep 2005 14:16:49 -0400
Subject: [PATCH 1/6] jbd: remove duplicated debug print
In-Reply-To: <20050909084342.GC14205@miraclelinux.com>
References: <20050909084214.GB14205@miraclelinux.com>
	<20050909084342.GC14205@miraclelinux.com>
Message-ID: <20050909181649.GC24228@thunk.org>

On Fri, Sep 09, 2005 at 05:43:42PM +0900, Akinobu Mita wrote:
> remove duplicated debug print

> -	jbd_debug(3, "JBD: commit phase 2\n");
> -

If you're going to do this, please renumber the rest of the "commit
phase n" messages.  Or the debugging messages will look very funny.

						- Ted



From mita at miraclelinux.com  Sat Sep 10 14:36:04 2005
From: mita at miraclelinux.com (Akinobu Mita)
Date: Sat, 10 Sep 2005 23:36:04 +0900
Subject: [PATCH 1/6] jbd: remove duplicated debug print
In-Reply-To: <20050909181649.GC24228@thunk.org>
References: <20050909084214.GB14205@miraclelinux.com>
	<20050909084342.GC14205@miraclelinux.com>
	<20050909181649.GC24228@thunk.org>
Message-ID: <20050910143604.GA7593@miraclelinux.com>

On Fri, Sep 09, 2005 at 02:16:49PM -0400, Theodore Ts'o wrote:
> On Fri, Sep 09, 2005 at 05:43:42PM +0900, Akinobu Mita wrote:
> > remove duplicated debug print
> 
> > -	jbd_debug(3, "JBD: commit phase 2\n");
> > -
> 
> If you're going to do this, please renumber the rest of the "commit
> phase n" messages.  Or the debugging messages will look very funny.

The code following the second, duplicated "commit phase 2" message only does:

 	J_ASSERT (commit_transaction->t_sync_datalist == NULL);

So I thought it might have been inserted accidentally.
Here is the same change as diff -U 9:

--- ./fs/jbd/commit.c.orig	2005-09-10 22:09:05.000000000 +0900
+++ ./fs/jbd/commit.c	2005-09-10 22:09:25.000000000 +0900
@@ -419,20 +419,18 @@ write_out_data:
 		cond_resched_lock(&journal->j_list_lock);
 	}
 	spin_unlock(&journal->j_list_lock);
 
 	if (err)
 		__journal_abort_hard(journal);
 
 	journal_write_revoke_records(journal, commit_transaction);
 
-	jbd_debug(3, "JBD: commit phase 2\n");
-
 	/*
 	 * If we found any dirty or locked buffers, then we should have
 	 * looped back up to the write_out_data label.  If there weren't
 	 * any then journal_clean_data_list should have wiped the list
 	 * clean by now, so check that it is in fact empty.
 	 */
 	J_ASSERT (commit_transaction->t_sync_datalist == NULL);
 
 	jbd_debug (3, "JBD: commit phase 3\n");



From mita at miraclelinux.com  Sat Sep 10 14:55:25 2005
From: mita at miraclelinux.com (Akinobu Mita)
Date: Sat, 10 Sep 2005 23:55:25 +0900
Subject: [PATCH 0/6] jbd cleanup
In-Reply-To: <20050909021522.1a271e4b.akpm@osdl.org>
References: <20050909084214.GB14205@miraclelinux.com>
	<20050909021522.1a271e4b.akpm@osdl.org>
Message-ID: <20050910145525.GB7593@miraclelinux.com>

On Fri, Sep 09, 2005 at 02:15:22AM -0700, Andrew Morton wrote:
> Akinobu Mita <mita at miraclelinux.com> wrote:
> >
> > The following 6 patches cleanup the jbd code and kill about 200 lines. 
> >
> 
> Thanks, but I'm not inclined to apply them.
> 
> a) Maybe 70-80% of the Linux world uses this filesystem.  We need to be
>    very cautious in making changes to it.

And we need many eyeballs.
(I've tried several times to understand how jbd works,
 but I have always failed.)

> b) A relatively large number of people are carrying quite large
>    out-of-tree patches, some of which they're hoping to merge sometime. 
>    Admittedly more against ext3 than JBD, but there is potential here to
>    cause those people trouble.
> 
> Plus the switch to list_heads in journal_s has some impact on type safety
> and debuggability - I considered doing it years ago but decided not to
> because I found I _used_ those pointers fairly commonly in development. 
> list_heads are a bit of a pain in gdb (kgdb and kernel core dumps), for
> example.

As for the debuggability of list_heads, how about adding gdb macros
like the following to .gdbinit?

---

define list_entry
	set $ptr=$arg0
	p ($arg1 *)((char *)$ptr - (size_t) &(($arg1 *)0)->$arg2)
end

define list_entry_s
	set $ptr=$arg0
	p (struct $arg1 *)((char *)$ptr - (size_t) &((struct $arg1 *)0)->$arg2)
end

define to_journal_head
	list_entry_s $arg0 journal_head b_list
end



From akpm at osdl.org  Sat Sep 10 21:58:48 2005
From: akpm at osdl.org (Andrew Morton)
Date: Sat, 10 Sep 2005 14:58:48 -0700
Subject: [PATCH 0/6] jbd cleanup
In-Reply-To: <20050910145525.GB7593@miraclelinux.com>
References: <20050909084214.GB14205@miraclelinux.com>
	<20050909021522.1a271e4b.akpm@osdl.org>
	<20050910145525.GB7593@miraclelinux.com>
Message-ID: <20050910145848.51881e61.akpm@osdl.org>

Akinobu Mita <mita at miraclelinux.com> wrote:
>
> On Fri, Sep 09, 2005 at 02:15:22AM -0700, Andrew Morton wrote:
> > Akinobu Mita <mita at miraclelinux.com> wrote:
> > >
> > > The following 6 patches cleanup the jbd code and kill about 200 lines. 
> > >
> > 
> > Thanks, but I'm not inclined to apply them.
> > 
> > a) Maybe 70-80% of the Linux world uses this filesystem.  We need to be
> >    very cautious in making changes to it.
> 
> And we need many eyeballs.

True.  And the only way to really learn code is to make changes to it.

> (I've tried to understand how the jbd works several times.
>  But I always failed.)

It's very hard to reverse engineer the high-level design concepts from the
implementation.  And the design concepts in JBD are really complex, which
is a problem for us.

When I first had to learn the thing 4-5 years back I sat down for a solid
week and wrote a 40-odd page how-it-works document for myself, just to
force it into my head.  It was probably about 50% accurate, but it was a
useful exercise.

> About the debuggability of list_heads, how about adding the kind of
> the following gdb macros in .gdbinit?
> 
> ---
> 
> define list_entry
> 	set $ptr=$arg0
> 	p ($arg1 *)((char *)$ptr - (size_t) &(($arg1 *)0)->$arg2)
> end
> 
> define list_entry_s
> 	set $ptr=$arg0
> 	p (struct $arg1 *)((char *)$ptr - (size_t) &((struct $arg1 *)0)->$arg2)
> end
> 
> define to_journal_head
> 	list_entry_s $arg0 journal_head b_list
> end

Here's mine ;)

# list_entry list type member
define list_entry
	set $off = (int)&(((struct $arg1 *)0)->$arg2)
	set $addr = (int)$arg0
	set $res = $addr - $off
	printf "0x%x\n", (struct $arg1 *)$res
end



From myLC at gmx.net  Sat Sep 17 15:27:18 2005
From: myLC at gmx.net (myLC at gmx.net)
Date: Sat, 17 Sep 2005 17:27:18 +0200
Subject: turning off journaling on the fly?
Message-ID: <432C35D6.9070402@gmx.net>

Dear penguin lovers, =)

I'm running Linux (2.6) on a satellite receiver with
a hard drive, which is formatted as ext3.
So far, everything works fine.

Now here's the small problem:
The receiver is in the same room I sleep in and when it
records at night time I can hear the journaling going on
(heads clicking - even though the HD is set to silent mode
via hdparm). This is surely due to the journaling as during
playback the heads remain quiet.

Is there a way to disable journaling on the fly (some option
in /sys)?
Or can I remount the harddisk from ext3 to ext2 on the fly -
and does this work when it's being written to?
And - last but not least - would the solution (if there is
any) be "riskless"?


Thank you very much for any help!
                                      myLC at gmx.net

PS.: I'm not subscribed to the mailing list,
      thus I can only read direct replies.



From evilninja at gmx.net  Mon Sep 19 13:35:15 2005
From: evilninja at gmx.net (evilninja)
Date: Mon, 19 Sep 2005 15:35:15 +0200
Subject: turning off journaling on the fly?
In-Reply-To: <432C35D6.9070402@gmx.net>
References: <432C35D6.9070402@gmx.net>
Message-ID: <432EBE93.1060509@gmx.net>

myLC at gmx.net wrote:
> Is there a way to disable journaling on the fly (some option
> in /sys)?

I'm not aware of such a switch, but two things come to mind:

1) there is a mount option "commit":

 "commit=nrsec
     Sync  all  data  and  metadata  every nrsec seconds. The default
     value is 5 seconds. Zero means default."

2) the laptop-mode module [1]

Both should reduce disk activity: 1) by enlarging the commit interval
and 2) by grouping write activity.

You can turn off the journal with tune2fs(8).

hth,
Christian.

[1] http://www.xs4all.nl/~bsamwel/laptop_mode/
-- 
BOFH excuse #78:

Yes, yes, its called a design limitation



From camilo at mesias.co.uk  Tue Sep 20 11:47:41 2005
From: camilo at mesias.co.uk (Cam)
Date: Tue, 20 Sep 2005 12:47:41 +0100
Subject: ext3 incompatability between linux 2.4/ppc and linux 2.6/x86
Message-ID: <432FF6DD.5000803@mesias.co.uk>

Hi,

I'm using ext3 filesystems in embedded devices (storage is on 512MB or 
1GB CF cards). A typical development cycle would see the filesystem 
created on the desktop PC running linux 2.4 (eg. RedHat 9). The CF card 
would be installed in the hardware and linux 2.4 (eg. Montavista Pro 
3.1, on PPC) would boot from the CF.

Recently I tried a linux 2.6 desktop (CentOS) for the same task and 
found problems. Specifically the embedded device won't boot from the CF 
anymore. Since we use several partitions it's possible to boot from an 
old partition. We can then mount the new partition, but attempts to write 
to it fail and the partition is remounted read-only. Here are the logs 
associated with those operations:

boot:

kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 212k init
                                        ?attempt to access beyond end of device
03:02: rw=0, want=841835629, limit=151200
attempt to access beyond end of device
03:02: rw=0, want=841835629, limit=151200
Kernel panic: No init found.  Try passing init= option to kernel.
  <0>Rebooting in 180 seconds..


mount/write:

e2fsck 1.35 (28-Feb-2004)
/dev/hda2 has gone 36663 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/hda2: 2297/37848 files (1.9% non-contiguous), 101563/151200 blocks
...
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
/dev/hda2 on /file-system/root2 type ext3 (rw,noatime,errors=remount-ro)
...
# rm -rf /file-system/root2/*
EXT3-fs error (device ide0(3,2)): ext3_free_blocks: Freeing blocks not in datazone - block = 1752392034, count = 1
Aborting journal on device ide0(3,2).
Remounting filesystem read-only
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_jdEXT3-fs error (device ide0(3,2)) in ext3_truncate: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_jdEXT3-fs error (device ide0(3,2)) in ext3_orphan_del: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_jdEXT3-fs error (device ide0(3,2)) in ext3_delete_inode: Journal has aborted
rm: cannot unlink `/file-system/root2/bin/chroot': Read-only file system
rm: cannot unlink `/file-system/root2/bin/run-parts': Read-only file system
rm: cannot unlink `/file-system/root2/bin/tempfile': Read-only file system


Looking at the versions, on the 2.4 desktop I have e2fsprogs-1.32-6, on 
embedded I have e2fsprogs-1.27-1. On the 2.6 desktop it's e2fsprogs-1.35-12.

I built e2fsprogs-1.38 for the desktop and the result was the same.

I used dumpe2fs on the working and non-working filesystems and found 
that the newer FS has different features:

< Filesystem features:      has_journal filetype sparse_super
 > Filesystem features:      has_journal resize_inode filetype sparse_super

After writing to a new FS on the desktop a further feature is added,

< Filesystem features:      has_journal resize_inode filetype sparse_super
 > Filesystem features:      has_journal ext_attr resize_inode filetype 
sparse_super
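The feature comparison above can also be done mechanically. The following is only a minimal sketch: it assumes the two "Filesystem features:" lines have been copied out of the dumpe2fs output by hand, and parse_features() is a hypothetical helper, not part of e2fsprogs.

```python
def parse_features(line):
    """Return the set of feature names from a 'Filesystem features:' line."""
    _, _, names = line.partition(":")
    return set(names.split())


# Lines copied from the dumpe2fs output quoted in this message.
old = "Filesystem features:      has_journal filetype sparse_super"
new = ("Filesystem features:      has_journal ext_attr resize_inode "
       "filetype sparse_super")

added = sorted(parse_features(new) - parse_features(old))
removed = sorted(parse_features(old) - parse_features(new))

print("added:  ", added)
print("removed:", removed)
```

For the two lines above this reports ext_attr and resize_inode as added and nothing as removed.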

I'm not convinced the features are relevant though because if I mkfs 
with -O to restrict the features, the result is the same. I wonder if it 
could be an endianness issue?

What should I do to investigate this further? Are there known 
incompatibilities with ext3 between different kernels? And are there any 
tricks I can use in 2.6 to make a 2.4-compatible filesystem?

Thanks in advance for any help,

-Cam

-- 
camilo at mesias.co.uk                                                 <--



From adilger at clusterfs.com  Tue Sep 20 13:26:18 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 20 Sep 2005 07:26:18 -0600
Subject: ext3 incompatability between linux 2.4/ppc and linux 2.6/x86
In-Reply-To: <432FF6DD.5000803@mesias.co.uk>
References: <432FF6DD.5000803@mesias.co.uk>
Message-ID: <20050920132618.GI12946@schatzie.adilger.int>

On Sep 20, 2005  12:47 +0100, Cam wrote:
> Looking at the versions, on the 2.4 desktop I have e2fsprogs-1.32-6, on 
> embedded I have e2fsprogs-1.27-1. On the 2.6 desktop it's e2fsprogs-1.35-12.
> 
> I built e2fsprogs-1.38 for the desktop and the result was the same.
> 
> I used dumpe2fs on the working and non-working filesystems and found 
> that the newer FS has different features:
> 
> < Filesystem features:      has_journal filetype sparse_super
> > Filesystem features:      has_journal resize_inode filetype sparse_super

The resize_inode feature is relatively new, but _should_ be harmless for
a kernel that doesn't understand it (it is just a file in the filesystem).
That said, it is quite unlikely that you will ever need this for embedded
systems, so you can turn it off at mke2fs time or afterward with tune2fs
with "-O ^resize_inode".

> After writing to a new FS on the desktop a further feature is added,
> 
> < Filesystem features:      has_journal resize_inode filetype sparse_super
> > Filesystem features:      has_journal ext_attr resize_inode filetype 
> sparse_super

The ext_attr feature is probably from selinux.  This can be a problem
for older kernels (quite sadly, as there is a "feature" which slipped
in under the radar).  The problem is that selinux added support for
EAs on symlinks, but this confuses older kernels into thinking that a
fast symlink (stored in the inode) has an external block and is (wrongly)
considered a slow symlink.  The older kernel then tries to decode the
EA as a symlink.  I don't know if this is causing your problem though.

I'm not sure if there is some way to prevent selinux from tagging all
of the files in the filesystem or not (e.g. mount option or other).

There is a trivial change to the ext3 code to fix this for your embedded
platform: add ext3_inode_is_fast_symlink() to check for i_file_acl, and
change ext3_read_inode() to use this instead of just checking i_blocks.
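To illustrate the idea, here is a small Python model of the two checks. It is not the kernel patch: the Inode class and both function names are made up for this sketch, and it assumes (as in ext3) that i_blocks is counted in 512-byte sectors and that an EA block occupies exactly one filesystem block.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # i_blocks is counted in 512-byte sectors


@dataclass
class Inode:
    """Hypothetical stand-in for the few inode fields that matter here."""
    is_symlink: bool
    i_blocks: int    # sectors charged to the inode
    i_file_acl: int  # block number of the EA block, 0 if there is none


def old_is_fast_symlink(inode):
    # Older logic: any allocated sector means "slow symlink".
    return inode.is_symlink and inode.i_blocks == 0


def is_fast_symlink(inode, block_size):
    # Discount the sectors used by the EA block before deciding.
    ea_sectors = block_size // SECTOR_SIZE if inode.i_file_acl else 0
    return inode.is_symlink and inode.i_blocks - ea_sectors == 0


# A fast symlink on a 1k-block filesystem that carries an EA block.
tagged = Inode(is_symlink=True, i_blocks=2, i_file_acl=12345)

print(old_is_fast_symlink(tagged))    # misclassified as a slow symlink
print(is_fast_symlink(tagged, 1024))  # recognised as a fast symlink
```

The only difference between the two checks is that the second one subtracts the sectors belonging to the EA block before comparing against zero.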

> I'm not convinced the features are relevant though because if I mkfs 
> with -O to restrict the features, the result is the same. I wonder if it 
> could be an endianness issue?

Note that in newer e2fsprogs you need to use "mke2fs -O none -O {features}"
to clear the default feature set.  Also, it isn't clear whether this will
prevent selinux from enabling the ext_attr feature.

I would initially suspect an endian issue, but none of the values printed
in the error messages appear to be byte-swapped values.  They instead look
like ASCII values (e.g. "md-2" and "bash").
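A quick way to check this is to pack each number as a 32-bit value and look at the bytes. The sketch below assumes little-endian byte order (the order ext2/ext3 use on disk); as_ascii() is just an illustrative helper name.

```python
import struct


def as_ascii(value):
    """Pack a 32-bit value as little-endian bytes and decode them as ASCII."""
    return struct.pack("<I", value).decode("ascii")


print(as_ascii(841835629))   # the "want=" value from the boot log
print(as_ascii(1752392034))  # the "block =" value from ext3_free_blocks
```

Both values decode to printable text, which points at file contents being interpreted as block numbers rather than at a byte-swapping problem.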

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From camilo at mesias.co.uk  Tue Sep 20 14:26:02 2005
From: camilo at mesias.co.uk (Cam)
Date: Tue, 20 Sep 2005 15:26:02 +0100
Subject: ext3 incompatability between linux 2.4/ppc and linux 2.6/x86
In-Reply-To: <20050920132618.GI12946@schatzie.adilger.int>
References: <432FF6DD.5000803@mesias.co.uk>
	<20050920132618.GI12946@schatzie.adilger.int>
Message-ID: <43301BFA.4030701@mesias.co.uk>

Andreas

Thanks for the prompt and informative reply. It looks like this is a 
'known fault'.

> The ext_attr feature is probably from selinux.
[...]
> I'm not sure if there is some way to prevent selinux from tagging all
> of the files in the filesystem or not (e.g. mount option or other).

Strangely, googling for "selinux mount ext_attr disable" gave a Bugzilla
entry as the first result:

  https://bugzilla.redhat.com/bugzilla/long_list.cgi?buglist=137068

> There is a trivial change to the ext3 code to fix this for your embedded
> platform - add ext3_inode_is_fast_symlink() to check for i_file_acl and
> change ext3_read_inode() to use this instead of just checking i_blocks).

Unfortunately a change to the embedded system in the field is
unattractive at the moment.

> Note that in newer e2fsprogs you need to use "mke2fs -O none -O {features}"
> to clear the default feature set.  Also, it isn't clear whether this will
> prevent selinux from enabling the ext_attr feature.

It doesn't, although disabling selinux is effective. Using your mke2fs 
and disabling selinux is a good workaround.

> none of the values printed
> in the error messages appear to be byte-swapped values.  They instead look
> like ASCII values (e.g. "md-2" and "bash").

I see your point. I missed that but will check in future!

Thanks again,

-Cam

-- 
camilo at mesias.co.uk                                                 <--



From tytso at mit.edu  Tue Sep 20 21:25:55 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Tue, 20 Sep 2005 17:25:55 -0400
Subject: turning off journaling on the fly?
In-Reply-To: <432C35D6.9070402@gmx.net>
References: <432C35D6.9070402@gmx.net>
Message-ID: <20050920212555.GC6179@thunk.org>

On Sat, Sep 17, 2005 at 05:27:18PM +0200, myLC at gmx.net wrote:
> Dear penguin lovers, =)
> 
> I'm running Linux (2.6) on a satellite receiver with
> harddrive. The latter is formatted in ext3.
> So far, everything works fine.
> 
> Now here's the small problem:
> The receiver is in the same room I sleep in and when it
> records at night time I can hear the journaling going on
> (heads clicking - even though the HD is set to silent mode
> via hdparm). This is surely due to the journaling as during
> playback the heads remain quiet.

Something on your system must be causing writes to the filesystem;
turning off journaling might lower the total number of writes, but it
won't make this problem go away altogether.

I'd check /var/log to see what might be causing log messages (the most
likely cause) and see if you can disable or lower the syslog threshold
so that they don't get written to disk.

						- Ted



From eric at lammerts.org  Thu Sep 22 04:53:44 2005
From: eric at lammerts.org (Eric Lammerts)
Date: Thu, 22 Sep 2005 00:53:44 -0400 (EDT)
Subject: repeated crashes
Message-ID: <Pine.LNX.4.61.0509220032310.21513@ally.lammerts.org>


Hello,
I've got a problem that is not solved after an e2fsck.
What happens is that the kernel (vanilla 2.6.12) does this:

journal_bmap: journal block not found at offset 1036 on hda6
Aborting journal on device hda6.
ext3_abort called.

The filesystem is mounted with errors=panic, so the system reboots. At 
boot-up an e2fsck is run on /dev/hda6. Sometimes it finds errors, 
sometimes not. Example:

e2fsck 1.35 (28-Feb-2004)
data: recovering journal
data contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #73 (26, counted=0).
Fix? yes
Free blocks count wrong for group #74 (5071, counted=667).
Fix? yes
Free blocks count wrong for group #75 (3585, counted=2844).
Fix? yes
Free blocks count wrong (1503376, counted=1498205).
Fix? yes
data: ***** FILE SYSTEM WAS MODIFIED *****
data: 1960/1343488 files (34.2% non-contiguous), 1186650/2684855 blocks

But soon after that, the same kernel message happens again.
I've also tried a newer e2fsck, from the e2fsck-static 1.38-2 Debian 
package, but that one didn't solve the problem either.

Dumpe2fs output:

# dumpe2fs -h /dev/hda6
dumpe2fs 1.35 (28-Feb-2004)
Filesystem volume name:   data
Last mounted on:          <not available>
Filesystem UUID:          beb02481-d5a9-40b3-8d25-ff412629b14b
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype needs_recovery sparse_super
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1343488
Block count:              2684855
Reserved block count:     134242
Free blocks:              1550359
Free inodes:              1341562
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Wed Jan  2 22:35:26 2002
Last mount time:          Thu Sep 22 00:16:41 2005
Last write time:          Thu Sep 22 00:16:41 2005
Mount count:              1
Maximum mount count:      -1
Last checked:             Thu Sep 22 00:16:40 2005
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      a1a3ccb8-023e-41ec-8af1-b2221c8da6b4
Journal backup:           inode blocks

Then when I look at the journal inode:

# debugfs /dev/hda6
debugfs 1.35 (28-Feb-2004)
debugfs:  stat <8>
Inode: 8   Type: regular    Mode:  0600   Flags: 0x0   Generation: 0
User:     0   Group:     0   Size: 33554432
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 8304
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x3c33d186 -- Wed Jan  2 22:35:34 2002
atime: 0x00000000 -- Wed Dec 31 19:00:00 1969
mtime: 0x3c33d186 -- Wed Jan  2 22:35:34 2002
BLOCKS:
(0-11):521-532, (IND):533, (12-1035):534-1557, (DIND):1558
TOTAL: 1038

debugfs:  bmap <8> 1035
1557
debugfs:  bmap <8> 1036
0

It seems a lot of blocks are not allocated! That is wrong, isn't it? 
Shouldn't e2fsck repair this then?
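The numbers from the dumpe2fs and debugfs output above can be cross-checked with a little arithmetic. This is only a sketch using the values quoted in this message; the variable names are made up for the example.

```python
BLOCK_SIZE = 4096        # "Block size" from dumpe2fs
JOURNAL_SIZE = 33554432  # "Size" of inode 8 from debugfs stat
SECTORS_PER_BLOCK = BLOCK_SIZE // 512

# Logical blocks the journal should span according to its size.
expected_blocks = JOURNAL_SIZE // BLOCK_SIZE

# Blocks that are actually mapped: (0-11) direct plus (12-1035) via IND.
mapped_data_blocks = 12 + 1024

# IND and DIND are allocated too, and are counted in "Blockcount".
allocated_blocks = mapped_data_blocks + 2

print("expected journal blocks:", expected_blocks)
print("mapped journal blocks:  ", mapped_data_blocks)
print("unmapped journal blocks:", expected_blocks - mapped_data_blocks)
print("sectors allocated:      ", allocated_blocks * SECTORS_PER_BLOCK)
```

The first unmapped logical block is 1036, which is the offset in the kernel's journal_bmap message, and the sector count matches the "Blockcount: 8304" shown by debugfs.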

Eric



From adilger at clusterfs.com  Thu Sep 22 05:26:54 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 21 Sep 2005 23:26:54 -0600
Subject: repeated crashes
In-Reply-To: <Pine.LNX.4.61.0509220032310.21513@ally.lammerts.org>
References: <Pine.LNX.4.61.0509220032310.21513@ally.lammerts.org>
Message-ID: <20050922052654.GJ6289@schatzie.adilger.int>

On Sep 22, 2005  00:53 -0400, Eric Lammerts wrote:
> journal_bmap: journal block not found at offset 1036 on hda6
> Aborting journal on device hda6.
> ext3_abort called.
> 
> The filesystem is mounted with errors=panic, so the system reboots. At 
> boot-up an e2fsck is run on /dev/hda6. Sometimes it finds errors, 
> sometimes not. Example:
> 
> e2fsck 1.35 (28-Feb-2004)
> data: recovering journal
> data contains a file system with errors, check forced.
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong for group #73 (26, counted=0).
> Fix? yes
> Free blocks count wrong for group #74 (5071, counted=667).
> Fix? yes
> Free blocks count wrong for group #75 (3585, counted=2844).
> Fix? yes
> Free blocks count wrong (1503376, counted=1498205).
> Fix? yes
> data: ***** FILE SYSTEM WAS MODIFIED *****
> data: 1960/1343488 files (34.2% non-contiguous), 1186650/2684855 
> blocks
> 
> But soon after that, the same kernel message happens again.
> I've also tried a newer e2fsck, from the e2fsck-static 1.38-2 Debian 
> package, but that one didn't solve the problem either.

This sounds a LOT like your disk is going bad.  Having e2fsck fix problems
like this, then immediately getting errors again is something I've seen
in the past and it turned out that the disk was flaky.  Try running
"badblocks" on the disk in non-destructive write mode and see what it finds.
I'd strongly recommend a backup at this point if you don't already have it.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From maillists at hosttuls.com  Fri Sep 23 21:43:29 2005
From: maillists at hosttuls.com (Brandon Evans)
Date: Fri, 23 Sep 2005 14:43:29 -0700
Subject: 17G File size limit?
Message-ID: <43347701.50603@hosttuls.com>

Hi everyone,
   This is a strange problem I have been having.  I'm not sure where the 
problem is, so I figured I'd start here.

I was having problems with Bacula stopping on 17Gig Volume sizes, so I 
decided to try to just dd a 50 gig file.  Sure enough, once the file hit 
17 gigs dd stopped and spit out an error:


(pandora bacula)# dd if=/dev/zero of=bigfile bs=1M count=50000
File size limit exceeded
(pandora bacula)#


(pandora bacula)# ll
total 20334813
-rw-r--r--  1 root root 17247252480 Sep 23 00:44 bigfile
-rw-r-----  1 root root   302323821 Sep 23 01:10 Default-0001
-rw-r-----  1 root root   156637059 Sep 18 01:08 Diff-wi0001
-rw-r-----  1 root root    46985831 Sep  6 19:38 Full-0001
-rw-r-----  1 root root    47126293 Sep  7 14:39 Full-0002
-rw-r-----  1 root root  2841621607 Sep 13 17:11 Full-wi0001
-rw-r-----  1 root root     1584252 Sep 18 01:05 Inc-0001
-rw-r-----  1 root root    97963834 Sep 14 01:05 Inc-wi0001

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda2             9.7G  5.8G  3.4G  64% /
/dev/hda1              99M   20M   75M  21% /boot
/dev/hda4             102G  2.2G   94G   3% /home
/dev/md2              221G   90G  120G  43% /mnt/storage
none                 1014M     0 1014M   0% /dev/shm
/dev/mapper/lvg01-coraid
                       812G  693G  114G  86% /mnt/coraid



There are a few layers on this partition, so I figured I'd start at the 
top with you guys and work my way down.  The partition this size limit 
is on looks like so...

/mnt/coraid
+--------+
| Ext3   |
+--------+
| LVM 2  |
+--------+
| Raid 5 |
+--------+


So any one of these layers could be the problem.  I was able to create a 
100 Gig file on the /home partition, so perhaps ext3 is not the problem, 
but I'm really not sure.


The system is CentOS 4.1 running 2.6.13.2 (also tried 2.6.12.2)

Any insight would be great

-- 

Thanks,
     Brandon Evans

  "I wouldn't recommend sex, drugs or insanity for everyone, but they've 
always worked for me."
-Hunter S. Thompson



From matts at ksu.edu  Fri Sep 23 21:50:29 2005
From: matts at ksu.edu (Matt Stegman)
Date: Fri, 23 Sep 2005 16:50:29 -0500 (CDT)
Subject: 17G File size limit?
In-Reply-To: <43347701.50603@hosttuls.com>
Message-ID: <Pine.GSO.4.44L.0509231648590.23379-100000@unix2.cc.ksu.edu>

What does "ulimit -a" report for your maximum allowed file size?  Could
you have limited yourself somehow?
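The same limit can be read from inside a process. The sketch below uses Python's standard resource module (Unix only); file_size_limit() is a hypothetical helper name. Note that getrlimit() reports bytes, while "ulimit -f" reports blocks.

```python
import resource


def file_size_limit():
    """Return the (soft, hard) RLIMIT_FSIZE values, in bytes or 'unlimited'."""
    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return show(soft), show(hard)


soft, hard = file_size_limit()
print("soft file size limit:", soft)
print("hard file size limit:", hard)
```

If both values come back as "unlimited", the "File size limit exceeded" error is not coming from the process resource limit and must have another cause, such as the filesystem's own maximum file size.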

-- 
Matt Stegman




From maillists at hosttuls.com  Fri Sep 23 22:05:53 2005
From: maillists at hosttuls.com (Brandon Evans)
Date: Fri, 23 Sep 2005 15:05:53 -0700
Subject: 17G File size limit?
In-Reply-To: <Pine.GSO.4.44L.0509231648590.23379-100000@unix2.cc.ksu.edu>
References: <Pine.GSO.4.44L.0509231648590.23379-100000@unix2.cc.ksu.edu>
Message-ID: <43347C41.7060605@hosttuls.com>

Matt Stegman wrote:
> What does "ulimit -a" report for your maximum allowed file size?  Could
> you have limited yourself somehow?
> 

ulimit -a shows

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 16383
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16383
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


-- 

Thanks,
     Brandon Evans

  "I wouldn't recommend sex, drugs or insanity for everyone, but they've 
always worked for me."
-Hunter S. Thompson



From Richard.Wolber at boeing.com  Fri Sep 23 23:11:27 2005
From: Richard.Wolber at boeing.com (EXT-Wolber, Richard)
Date: Fri, 23 Sep 2005 16:11:27 -0700
Subject: Unmounted File Handle
Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB992004EB@XCH-NW-5V2.nw.nos.boeing.com>

Is it practical to get a R/W file handle opened against an existing file
on an unmounted ext2 filesystem?

--
Chuck Wolber
Electronic Flight Bag
Crew Information Systems/ Linux Wonk
253.576.1154

"You can't connect the dots looking forward; 
you can only connect them looking backwards." 
		--Steve Jobs




From matts at ksu.edu  Sat Sep 24 17:23:09 2005
From: matts at ksu.edu (Matt Stegman)
Date: Sat, 24 Sep 2005 12:23:09 -0500 (CDT)
Subject: 17G File size limit?
In-Reply-To: <43347C41.7060605@hosttuls.com>
Message-ID: <Pine.GSO.4.44L.0509241206570.23379-100000@unix2.cc.ksu.edu>

Hmm, OK.  The only place I've ever seen that error was with a ulimit set.
Have you looked through the system logs for error messages?  Does "dmesg"
report anything that might be related?  I notice that you've got three
volumes with this much free space available.  Do you get the same results
on all three volumes?  Are they all ext3 filesystems?

-- 
Matt Stegman

On Fri, 23 Sep 2005, Brandon Evans wrote:

> Matt Stegman wrote:
> > What does "ulimit -a" report for your maximum allowed file size?  Could
> > you have limited yourself somehow?
> >
>
> ulimit -a shows
>
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> file size               (blocks, -f) unlimited



From tytso at mit.edu  Sat Sep 24 19:52:16 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Sat, 24 Sep 2005 15:52:16 -0400
Subject: 17G File size limit?
In-Reply-To: <43347701.50603@hosttuls.com>
References: <43347701.50603@hosttuls.com>
Message-ID: <20050924195216.GA6443@thunk.org>

On Fri, Sep 23, 2005 at 02:43:29PM -0700, Brandon Evans wrote:
> Hi everyone,
>   This is a strange problem I have been having.  I'm not sure where the 
> problem is, so I figured I'd start here.
> 
> I was having problems with Bacula stopping on 17Gig Volume sizes, so I 
> decided to try to just dd a 50 gig file.  Sure enough, once the file hit 
> 17 gigs dd stopped and spit out an error
> 
> (pandora bacula)# dd if=/dev/zero of=bigfile bs=1M count=50000
> File size limit exceeded
> (pandora bacula)#
> 
> (pandora bacula)# ll
> total 20334813
> -rw-r--r--  1 root root 17247252480 Sep 23 00:44 bigfile

If you are using a 1k filesystem, then a file can consist of twelve
direct blocks, plus 256 data blocks addressed via the indirect block,
plus 256*256 data blocks addressed from the double-indirect block, plus
256*256*256 data blocks from the triple-indirect block:

(12 + 256 + 256*256 + 256*256*256) * 1024 = 17247252480

Does that number look familiar?  So the problem is that you created
the file system using a 1k blocksize.  Filesystems with a 1k blocksize
are horribly inefficient for large files, and they max out at a little
over 16 gigabytes.  (Note that 16 gigs is 17179869184 bytes, unless you
are a disk drive company, in which case your marketing department calls
it 17 gigs.  :-)
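The block-map arithmetic can be written out as a short Python sketch. It assumes twelve direct block pointers and 4-byte block pointers, and it models only the direct/indirect block map; max_file_bytes() is a made-up name, and with larger block sizes other limits may cap the file size below what this formula gives.

```python
def max_file_bytes(block_size, direct_blocks=12, pointer_size=4):
    """Largest file addressable through the direct/indirect block map."""
    ptrs = block_size // pointer_size  # block pointers per indirect block
    blocks = direct_blocks + ptrs + ptrs ** 2 + ptrs ** 3
    return blocks * block_size


limit_1k = max_file_bytes(1024)
print(limit_1k)            # size of "bigfile" in the listing: 17247252480
print(limit_1k - 16 * 2 ** 30)  # bytes above 16 GiB
```

For a 1k block size each indirect block holds 256 pointers, which gives exactly the size at which dd stopped.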

							- Ted



From tytso at mit.edu  Sun Sep 25 02:19:59 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Sat, 24 Sep 2005 22:19:59 -0400
Subject: Unmounted File Handle
In-Reply-To: <8C7C41A176AC0B468BEFB2EFD9BDAB992004EB@XCH-NW-5V2.nw.nos.boeing.com>
References: <8C7C41A176AC0B468BEFB2EFD9BDAB992004EB@XCH-NW-5V2.nw.nos.boeing.com>
Message-ID: <20050925021959.GA19847@thunk.org>

On Fri, Sep 23, 2005 at 04:11:27PM -0700, EXT-Wolber, Richard wrote:
> Is it practical to get a R/W file handle opened against an existing file
> on an unmounted ext2 filesystem?

What do you mean by a "read/write file handle"?  

Do you mean opening a file descriptor using the open(2) system call?
Do you mean opening a stdio stream handle using the fopen(3) library
call?  In either case, no, you can only open() or fopen() a file
on a mounted filesystem, and it doesn't matter which filesystem you
are using.

There are a set of interfaces as part of the ext2fs library which
would allow you to manipulate a file on an unmounted filesystem.

						- Ted