Between the Lines

Larry Dignan, Andrew Nusca and Rachel King

Why writing a Windows compatible file server is (still) hard

By | August 4, 2009, 3:00am PDT

Summary: Sometimes I encounter a coding problem so intransigent that fixing it is a triumph worth sharing with the world. Have I mentioned how much I hate Microsoft Excel? Welcome to a day in the life of a network engineer.

[The opinions expressed here are mine alone, and not those of Google, Inc., my current employer.]

I don’t often write about my day-to-day work, but sometimes I run across a problem so intransigent that finally fixing it feels like a triumph. If you take an engineering job in the software industry, this is the kind of thing you might end up working on. If you find this column fun and interesting, then you might be a good candidate for a network engineer. Even if you don’t, I hope you’ll appreciate the insane level of detail network engineers have to know on your behalf, to make something as simple as “saving a file” work seamlessly across operating systems.

One of the remedies imposed on Microsoft after they lost the European Union workgroup-server antitrust case was the requirement to publish the full specifications for third-party software to interoperate with their operating systems. They are still in the process of doing this, but there are now thousands of pages of documentation out there, in theory fully specifying the Server Message Block/Common Internet File System (SMB/CIFS) protocol that Samba and Windows file servers implement. So surely anyone and their auntie (assuming your auntie is a network engineer :-) can now write their own SMB/CIFS server by just reading this copious documentation. After all, now that it’s all documented, how hard can it be?

A bug I fixed this week illustrates why I still think Samba is the leading choice for interoperability between Windows and Linux/UNIX systems. It concerns a strange tale of Microsoft Office and the “Offline Files” remote synchronization feature. “Offline Files” in Microsoft Windows allows a user to save a version of a file they’re working on from a remote file system on their local laptop, and have it re-synchronized to a server when they get back online.

A user of Samba reported a bug that showed conclusively that trying to synchronize a Microsoft Office file against a Samba server wasn’t working. The Windows client “Sync Center” application kept telling the user that the file on the remote Samba disk had been changed since it was saved, and he knew this wasn’t the case.

It got stranger. It only happened with Vista, not with XP or Windows 2003. It only happened with Microsoft Office 2003 (all other versions of Office worked fine). It only reliably happened with Microsoft Excel, and with no other Microsoft Office application. Have I mentioned how much I hate Microsoft Excel? I quake in fear whenever I see an Excel interoperability bug logged against Samba. That application is perverse in the things it will do to a remote file server.

I looked at my nice new shiny downloaded Microsoft documentation. There was nothing related to this problem in there. The document describing the precise behavior of an NTFS filesystem as seen over the wire from an SMB/CIFS server is yet to be finished. They’re still working on it. OK, so let me check what happens when you use this version of Excel to do the very same thing against Windows. Maybe it’s a real bug that fails against a Microsoft file server too; stranger things have been known. No, it worked fine against a Windows 2003 server, which to be honest did not surprise me. Microsoft tests the hell out of Microsoft Office before shipping any software that interacts with it in any way.

Time to get out the big guns: a debug log from Samba at our highest logging level, and a network packet capture trace (using the Open Source software “Wireshark”) of when the problem was happening. Looking at the log didn’t show any obvious errors, other than the fact that Excel does an insane number of operations over the network to do something as simple as a “Save File” (if you’ve ever wondered why Excel is slow, look at what it does over a network). A brief glance at the network capture trace didn’t help either; everything looked fine except that on the save operation to the Samba server, Excel strangely decided to abort halfway through.

This was getting more interesting. It seemed to be a generic failure of the “Save” operation, nothing to do with the “Sync” feature at all. So let’s test saving an Excel file against a Samba share without the “Sync” feature turned on in the client. Surely this must work, since we also never ship a version of Samba without testing against Microsoft Office. Yes indeed, a normal save worked fine. So it was something to do with the “Sync” feature. But what could it be?

The only thing to do was to do a second wireshark trace from the client to a Windows 2003 server, and then compare the two packet traces, the “bad” against the “good”, packet by packet.

Except of course it’s not that easy (nothing in Windows interoperability ever is :-). Due to the differences in response times between servers, slight differences in supported features, and of course the fact that the Samba architecture is completely different from that of the Windows CIFS server, the packet streams soon become very different. But after you’ve been doing this work for 17 years, you start to recognize the fingerprints of the broad actions that clients are trying to do, even with a protocol as chatty on the network as SMB/CIFS.

It took a couple of weeks of staring at the packet traces, on and off, but I eventually narrowed it down to a difference once Excel had written a temporary file out to the remote disk. Things started to be very different (and obviously wrong) at that exact point. So I started to look at the packets very closely.
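The comparison above was done by eye, but the idea of lining up a “good” and a “bad” trace and finding where they first part ways can be sketched in a few lines. This is only an illustration under stated assumptions: the operation names are hypothetical and heavily simplified, a real capture holds thousands of packets, and the sketch uses Python’s standard difflib module rather than anything Wireshark or Samba provides.

```python
from difflib import SequenceMatcher


def first_divergence(good, bad):
    """Return (index_in_good, index_in_bad, tag) for the first place the
    two operation sequences stop matching, or None if they are identical."""
    matcher = SequenceMatcher(a=good, b=bad, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            return (i1, j1, tag)
    return None


# Hypothetical, simplified summaries of the two traces (not real capture data).
good_trace = ["open", "write", "close", "set_create_time", "change_notify", "rename"]
bad_trace = ["open", "write", "close", "set_create_time", "rename", "change_notify"]

divergence = first_divergence(good_trace, bad_trace)
print(divergence)
```

Reducing each packet to a coarse operation name before comparing is the design choice that matters here: raw packets differ in timing and negotiated features from the first exchange onward, so a byte-level diff would be noise.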

The client was trying to set a “created” time stamp, to make the temporary file pretend to have been created at exactly the same time as the original file. Now one of the interesting things in writing Samba is that it has to run on top of POSIX. A POSIX system is very different from Windows, so one of the challenges we have is to be able to emulate the different Windows features on top of standard POSIX.

A POSIX file system doesn’t have a “create” time stamp, so when we’re reporting back to Windows when a file was created, we have to look at all the available time stamps from the system, and just pick the earliest. This has always worked in the past, but maybe we’d finally run into a situation where we need that exact create time stamp as set by the client.
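A minimal sketch of the “pick the earliest” rule, assuming that “all the available time stamps” means the access, modification and status-change times returned by stat(). The function name is hypothetical and this is not Samba’s actual code, which is written in C.

```python
import os
import tempfile


def emulated_create_time(path):
    """Approximate a Windows-style create time on a POSIX file system.

    Assumption: the candidates are atime, mtime and ctime from stat();
    the earliest of the three is reported as the create time.
    """
    st = os.stat(path)
    return min(st.st_atime, st.st_mtime, st.st_ctime)


# Small demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
os.utime(demo_path, (1000, 2000))  # set atime=1000s, mtime=2000s since the epoch
print(emulated_create_time(demo_path))
os.remove(demo_path)
```

The weakness is visible in the code: nothing here can honor a client request to set the create time to a specific value, which is exactly the situation the article runs into next.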

So I spent part of a day adding a temporary “created” time stamp into Samba, only held in memory. If this worked and fixed the bug I’d then find somewhere to store this on disk (probably in an “extended attribute”).
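The experiment can be pictured as an in-memory table of client-supplied create times that takes priority over the “earliest time stamp” fallback. The sketch below is an assumption-laden illustration: the class name, the choice of (device, inode) as the key and the fallback rule are all hypothetical, and none of it is taken from the Samba source.

```python
import os
import tempfile


class CreateTimeStore:
    """Remember client-supplied create times in memory only (lost on restart)."""

    def __init__(self):
        # Maps (st_dev, st_ino) -> create time in seconds since the epoch.
        self._overrides = {}

    def set_create_time(self, path, when):
        st = os.stat(path)
        self._overrides[(st.st_dev, st.st_ino)] = float(when)

    def get_create_time(self, path):
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key in self._overrides:
            return self._overrides[key]
        # Fallback: earliest of the POSIX time stamps, as described earlier.
        return min(st.st_atime, st.st_mtime, st.st_ctime)


# Small demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
store = CreateTimeStore()
store.set_create_time(demo_path, 12345)
print(store.get_create_time(demo_path))
os.remove(demo_path)
```

Keying on device and inode rather than on the path means the remembered value would follow the file across a rename, which seems relevant for a temporary file that replaces the original; a persistent version would need on-disk storage such as the extended attribute the article mentions.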

No, this still didn’t fix it. This was starting to make me very angry as it made no sense. I stared at the packet traces again. Even more closely. Then something jumped out at me.

The SMB/CIFS protocol has a feature where a client can be notified when a change is made on a remote file or directory. It’s called a “change notify”. Normally it’s used to allow a client to discover when another client is modifying the same file system (it’s the reason Windows “Explorer” windows spontaneously refresh with new files if a work colleague modifies the directory you’re looking at). But even if a client modifies the file itself, the server still must send “change notify” packets to let the client know a file it has just requested to be modified has actually been modified. At that point in the packet stream, just after the create time stamp change was requested, the Windows server was sending a “change notify” packet, but the Samba server was sending the “change notify” after the file was written to instead. It was exactly the same packet, so surely that couldn’t be the problem?

I looked at our code. As POSIX can’t store a created time stamp, if the client requests it to be changed (and no other time stamps) we simply return a success code. But we weren’t sending a “change notify” back after this request, as technically we weren’t changing the time at this point. Instead we were sending it back after the file write, when we were changing the file. So I added code to send the “change notify” back after the time stamp change.
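The difference between the two behaviors is easier to see as an ordered list of packets. What follows is a toy model, not Samba code: the class, the packet names and the exact order of “success” versus “change_notify” within one request are assumptions made for illustration.

```python
class ToyServer:
    """Toy model of when a server emits change notify packets."""

    def __init__(self, notify_on_create_time):
        self.notify_on_create_time = notify_on_create_time
        self.sent = []  # packets sent to the client, in order

    def set_create_time(self, name, when):
        # POSIX cannot store the value, so the request is simply acknowledged.
        self.sent.append(("success", name))
        if self.notify_on_create_time:
            # The change: notify here, as the Windows server was seen to do.
            self.sent.append(("change_notify", name))

    def write(self, name, data):
        # A write really changes the file, so a notify follows in both models.
        self.sent.append(("success", name))
        self.sent.append(("change_notify", name))


before_fix = ToyServer(notify_on_create_time=False)
after_fix = ToyServer(notify_on_create_time=True)
for server in (before_fix, after_fix):
    server.set_create_time("book1.tmp", 0)
    server.write("book1.tmp", b"data")

print(before_fix.sent)
print(after_fix.sent)
```

In the first list the only notify arrives after the write; in the second, one also arrives immediately after the time stamp request, which is the position where the Windows server’s packet appeared in the trace.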

And the bug disappeared!

I went into one of my colleagues’ offices and kicked the hell out of one of the much-loved Google beanbags, all the while screaming obscenities into the air for a good five minutes. He looked on with bemused amusement. I finally calmed down enough to explain the problem. One packet being returned at the wrong time. One single mis-timed packet caused a ripple effect in the Windows client file system software that was seen all the way up in the complex user interface of only that particular version of Excel, when interacting with the “Offline Files” feature, only on Windows Vista.

The remaining task was to add a regression test into our test suite, so that this specific bug is tested for before we release any new versions of Samba. The code isn’t done until it’s properly tested. But at least the user is now happy.

Interoperability with Windows is hard. But somebody has to do it. And if you’re going to do something, you might as well try to do it well (and try to have some fun at the same time :-).

Stop the press. As I go to publish this, the user still occasionally reports the failure even with the patch, just not as often. Looks like there may be a secondary timing effect in play as well. Oh well, no one can say this job is dull.

88 Comments


RE: Why writing a Windows compatible file server is (still) hard
dsfwrryd3401-24353672353182640509235579913603 6th Nov
fnulla,good post!
0 Votes
+ -
This article was more chatty...
bjbrock 4th Aug 2009
than SMB/CIFS. I was ready for the answer to the riddle about half way through. But other than that... interesting stuff. It makes me want to test the issue. Thanks.
0 Votes
+ -
Agreed
GuidingLight 4th Aug 2009
I had the feeling he could have gotten to the point in a few less paragraphs.
0 Votes
+ -
the point...
mojorison67@... 8th Aug 2009
The point was: that it took a long time to diagnose the problem and it still wasn't fixed. Writing it like that wouldn't really relate the frustration though would it?
What a waste of time artificially maintaining two different systems and then have to deal w/ the hassle of it.
0 Votes
+ -
easier still, don't use ms office
stevey_d 4th Aug 2009
switch to openoffice.
0 Votes
+ -
Give up a lot to gain a little?
John Zern 4th Aug 2009
Sorry, OO still falls short of MSO
0 Votes
+ -
how?
stevey_d 4th Aug 2009
saying "it's better" isn't much of an argument.
0 Votes
+ -
I use MS office all the time for work. OO (NeoOffice on my Mac) works EXACTLY the same! Down to the keystroke! Love it!

Open Office is the FUTURE! Everyday Janes and Joes WILL use Open Office because it's FREE! And guess what? They will know how to use Microsoft Office for that office job just by using Open Office! Free Training! No Books or high priced MS warez needed!
0 Votes
+ -
But our users would lose a ton of stuff - while an option
TheBottomLineIsAllThatMatters 4th Aug 2009
for simple offices, that doesn't work here in addition to all of the software that provides plug-ins for Excel to work with their tools - Cognos, SAP, Oracle, Oracle Express (yes we still use that too). Open office is just not an option...

0 Votes
+ -
Care to donate to a Windows Users fund?
No More Microsoft Software Ever! Updated - 4th Aug 2009
It may be easier, but it's not cheaper!

You only spell out claptrap. It would be EASIER to pretend to be straight. It would be EASIER to marry a white man/woman. It would be EASIER to vote Republican. It would be EASIER to be a Catholic.

Easy is not right. Right is each human beings right. Time for Microsoft to be right.

Microsoft - We're right if you give us your money!
0 Votes
+ -
The problem is not with Samba, but with layers and layers of legacy code in Microsoft Office that make it hard for Microsoft to keep it compatible with itself, let alone understand how to tell outside parties how to interoperate with it.

I can't say for sure, since I don't work at Microsoft, but I would not be surprised if there is, at some level inside the Excel source code, some file handling structures or routines descended from non-networked Windows or even DOS versions of the program. I doubt that Multiplan code still lurks in there, but it's a possibility.

If the code is similar to other MS code that I have worked with in the past, it was written so that it could do some level of pseudo-threading using messages and a state machine. If a specific message arrives out of order, or fails to arrive when it's expected, the state machine has no action to take, and it hangs or times out.

Ever notice that sometimes Explorer (the desktop, not IE) just stops responding even when apps keep running? Some years ago, I traced this to a missed message locking up the window proc. I reported this to MS, and they told me that I was wrong, and I never heard anything more about it from them. I have since dropped my MSDN membership and am no longer up on Windows programming, so things may have changed since then.

Using Windows as a file server vs. a Linux or Unix server has several serious drawbacks. The Windows network stack (via Winsock) is not as responsive or robust as that available on Linux and Unix systems. Typically, on a given piece of hardware, Linux can service more clients through Samba than Windows Server can, and with better performance. At the same time, the same Linux or Unix system can serve as an NFS server, and be available for other purposes as well.
0 Votes
+ -
Good job
CounterEthicsCommissioner-23034636492738337469105860790963 4th Aug 2009
nt
0 Votes
+ -
Lol
jdbukis@... 4th Aug 2009
So a Windows feature is not properly supported in Samba.
Looks like they need to get their fingers out.
0 Votes
+ -
ROTFLMAO!
No More Microsoft Software Ever! 4th Aug 2009
So Microsoft creates roadblocks to not use MS warez! Go figure!
0 Votes
+ -
Feature
mojorison67@... 8th Aug 2009
Yeah, this sounds EXACTLY like a MS "feature".
0 Votes
+ -
Blame MS!
winux apple picker 4th Aug 2009
Good Job! Nicely written - you have good career prospects in thriller story writing or something like that! But this isn't anything new in the software world - small or large issue, we won't stop until we find the root cause of the problem (even after quick-patching).
While reading this, I could also smell the MS-Haters' proximity. Shouldn't they be complaining that it's MS' fault that the particular offline-excel situation doesn't work properly?? Shouldn't you guys be demanding Ballmer's resignation on this single issue?
0 Votes
+ -
Easy solution
No_Ax_to_Grind 4th Aug 2009
Run Windows server.
0 Votes
+ -
@No_Ax_to_Grind
Axsimulate Updated - 4th Aug 2009
Running a Windows server can be VERY expensive. Much more than any other server available. All you need to do is ask one question: How many CALs do I need? This is the very essence of why there is a push to move away from MS.
0 Votes
+ -
I understand and even agree
No_Ax_to_Grind 4th Aug 2009
But, you can pay the cost up front, or you can have coders spending vast amounts of time like this. It comes down to making a choice.
0 Votes
+ -
Saves a lot of money when people collaborate on solutions like Samba. It's the Open Source way :-)

Jeremy.
0 Votes
+ -
re coders for free
midcapwarrior@... Updated - 5th Aug 2009
Aren't you paid by Google to do this collaboration? You may work cheap but..
Your company is subsidizing the cost of Samba support. That's not free; it means your work is dependent on their willingness to eat the cost. Maybe they will do it forever, maybe not; it still does not make it free. Problems like these are the least likely to be solved by non-corporate interests. They are not sexy or particularly interesting but necessary.
0 Votes
+ -
No I don't work cheap .
JeremyAllison 4th Aug 2009
But it remains, that if I'm not being paid by *you* to do this then I'm very cheap indeed (for your company). Collaboration is the key. Everyone who works on Samba (or Linux) pool their resources so everyone else gets the benefit.

And for generic file serving software like Samba then anyone who wants to use it benefits.

Jeremy.
0 Votes
+ -
I agree in full
No_Ax_to_Grind 4th Aug 2009
If I can get your work for nothing why not take it?

But back to the point, I would assume that there would not be anything easy in what you are trying to do.
0 Votes
+ -
doesn't make any sense
stevey_d 4th Aug 2009
when you open the hood of your car, and see that it's very badly
engineered, you base your whole fleet on the same product?
Er no,
go for something with open standards. Openoffice, far better.
Or even better still, lose the whole office paradigm. These days best
of class is video / CGI / wiki.
You can upload instruction videos direct from your cellphone to
youtube. Why write stuff?
If you do, write it in an easy to access format like a wiki.
How often do you solve problems by reading manuals - answer -
never. You always google for it.
0 Votes
+ -
I opened my hood
No_Ax_to_Grind 4th Aug 2009
on the old 32 Ford and decided I wanted to use a small block Chevy engine. There was nothing easy about it...
0 Votes
+ -
NT
"A bug I fixed this week illustrates why I still think Samba is the leading choice for interoperability between Windows and Linux/UNIX systems." Nothing in the rest of your tale supports this statement.
are fighting. They keep adding features and timing dependencies that are really obscure and not necessary. And, of course MS apps for no good reason are made to depend on these obscure details and ordering of events on the file server that they should never depend on. Same kinds of issues with Win32.

The good thing, is this causes a lot of MS engineers endless problems as well.

0 Votes
+ -
You do understand...
wolf_z 4th Aug 2009
...this was an obscure bug caused by an impedance mismatch between Windows and Posix, right? Probably Excel 2003 was checking at each step for a confirmation and when it didn't get it, it aborted--like it's supposed to.

It wasn't a "timing" issue, it was a notification issue. Samba wasn't notifying for understandable reasons--but it also wasn't following Windows protocol.

On top of which, Excel 2003 is a 6 year old program that is no longer being sold.

The real issue was Samba was *not* doing something Windows did, because of the underlying Posix structure. This sort of stuff will happen when you try to match disparate systems.

The good news is Samba is now more inline with Windows, *which is the point of the exercise*.

Good job finding a tough bug, Jeremy.
0 Votes
+ -
From what I read in the article, it seems to happen only with Vista and not with XP.
Does that mean that XP was not following this protocol ?

It seems to me, without having all info here, that there was a hiccup situation that happened ONLY with the combination of using Vista and Excel2003 together on the client and Samba at the server.

When using Excel2003 on XP, the problem did NOT turn up.

Could it be that Vista unintentionally did not follow Windows protocol ?
Was it Excel that did not follow ?
Could it be an undetected bug in Vista ?

So far there is no answer to these questions.
0 Votes
+ -
Wow, that is really
GuidingLight 4th Aug 2009
one stretch of the old imagination.

Blame it all on Microsoft, nothing in the Samba code that was at error. And that entire "timing issue" explanation was an amusing read!

Could it be that the problem they are encountering is that instead of attempting to drag Windows down to the Samba level, they should instead concentrate on having Samba rise to the Windows level?
0 Votes
+ -
Not so imaginary
colinnwn 4th Aug 2009
If this error were occurring on the most recent version of a software package, maybe you could reason it was a "feature" to support a new enhancement. But this is happening on only one OS, in one component of a legacy version of the software.

Regardless of whether this "feature" is a bug, or really does support an enhancement, it shows poor coding practices on Microsoft's part because it increases their testing time, cost, and potential to introduce future bugs inadvertently. Any new enhancement (if that is what it is for) could have been implemented in a more robust way.
0 Votes
+ -
you're not an engineer, are you?
stevey_d 4th Aug 2009
the extremely poor microsoft implementation has passed you by.
0 Votes
+ -
Have you seen the code?
Joeman57 4th Aug 2009
How do you know it's an extremely poor implementation?
0 Votes
+ -
Or even better
nkahindo 4th Aug 2009
Or even better, slap M$ with another anti-trust law suit, and make them document Windows faster!
0 Votes
+ -
Now there's a plan! [nt]
zkiwi 4th Aug 2009
0 Votes
+ -
I'd look forward to that
Wintel BSOD 4th Aug 2009
grin
0 Votes
+ -
Donniechild is good at that...nt
TheBottomLineIsAllThatMatters 4th Aug 2009
nt
0 Votes
+ -
We do not know this yet.
Did you miss the part that it happened ONLY with the Excel2003-Vista combination .
Excel2003 with XP or Windows 2003 did NOT have this problem.
0 Votes
+ -
No wonder...
gtvr 4th Aug 2009
Windows CIFS sucks over a WAN. The files aren't that big, it's just so much overhead. Maybe MS could fix that instead of spending so much R&D on widgets and toolbars.
0 Votes
+ -
SAMBA Rocks!
SpikeyMike Updated - 4th Aug 2009
"A bug I fixed this week illustrates why I still think Samba is the leading choice for interoperability between Windows and Linux/UNIX systems."

If faster equates to better, then SAMBA does SMB/CIFS better than Windows does.

Or, more accurately, Linux makes a better File/Print/Application server than Windows does.

We replaced our windows servers with Linux back in '05. Since then, our network has never been faster or more trouble-free. We could not have done that if it weren't for SAMBA.

-Mike

P.S. - As a developer, I have to really appreciate the fact that Jeremy and the rest of the SAMBA team go the extra mile. I think I would have submitted this as a bug for Microsoft to fix. My reasoning is that the same application (MS Office 2003 - Excel) works differently on Vista.
So if the user had just used a Microsoft Windows server none of this would ever have happened? Sounds like a pretty simple solution to me.

Interoperability with Windows is hard. But somebody has to do it.

No they don't. Nobody asked anyone to do it.
0 Votes
+ -
Why even bother?
Mike Cox 4th Aug 2009
Why write a "compatible" file server when you can just buy the latest and greatest from Microsoft? If anything, Samba and all of the other "wannabe" file servers out there just take away from the grandeur and stability of what Microsoft provides. My rep and I disdain open source and feel that the world owes Microsoft a living for all of the wonder that Redmond has brought to the PC world. We hired an admin here once who was a solid MCSE. When he recommended a Samba solution once I terminated him on the spot.
0 Votes
+ -
Terminated?
zkiwi 4th Aug 2009
I think LoveRock would make a better terminator.
0 Votes
+ -
Well, it sucks - it spams the whole network constantly with invalid packets (I know, I sniffed and checked them - don't EVER configure a firewall to drop invalid packets - even though some could cause a buffer overflow somewhere - on a Windows network), it doesn't ensure code page consistency (good luck storing UTF-16 encoded Asian named files on it!), it is still not fully documented and the only full current implementation is neither backward compatible nor is it open - and it costs a LOT.

Then, Excel; here, it abuses the CIFS protocol and doesn't react well (read the bug description: Excel reports that the sync was successful, while in fact changes haven't been committed to disk) to a timing difference.

Do you know what else has timing differences? Thread synchronization on multicore systems. A few years back, many a blockbuster game came out that suddenly reacted badly if you had a multicore system (never mind making use of both cores); so, you had to DISABLE a CPU core to run these softwares (nevermind knowing how to do this) and game software editors got HELL of a bashing because they did programming mistakes as stupid as that: make a thread dependent upon another thread's result in a specific time frame.

Here, the bug COULD have happened on a Windows server too; if the second sync broadcast was, for any reason, intercepted (badly behaved firewall, packet loss, network saturation while QoS rates it as 'low priority', enough that it times out), the Excel user would still have had a notification that his file had been sync'ed, while the server would have committed nothing - who can spell 'data loss'?

Now, to answer people that say "don't bother, why don't you run full Windows?", let's take the following case.

You run a university; you can have as little as 300 students on site, but some times (before exams, etc.) your number of users could be as high as 2500, all connecting simultaneously. Each and every connection to a Windows server must be paid for (Windows Server requires per-connection licenses), and you have 3 servers that can all be accessed simultaneously.

That means, during normal operation, you could do well with 900 connection licenses; your budget covers it. But, at some critical time, you may have 7500 simultaneous connections! What do you do?

- argue your case that you need the budget for the full 7500 connections, while 900 are usually enough?

- buy much less, and warn your students that there will be a round-robin for available connections?

- buy only 900, and find a pretext to crash the network whenever everybody's on premise?

Or, you do that: you run a couple Samba servers, that can serve as little as 300 users and scale up to a dozen thousands for the same price, while broadcasting valid packets (so you could configure your firewalls, bridges and DMZ to intercept and drop invalid packets, reducing buffer overflows in network stacks somewhere).

Or else, you install and run Windows' NFS client to access NFS shares (and configure your firewall... you get the idea)... But MS apps don't know how to deal with NFS shares.

Bummer. Looks like we do need Samba after all.
0 Votes
+ -
You would buy per seat licences, which are the same cost, which allow them to connect to as many servers as they want.
0 Votes
+ -
Oh. And then...?
Mitch 74 9th Aug 2009
You buy 7500 seat licenses, or only 300?
0 Votes
+ -
9.5 ! Mikey! We missed you! You got two fish today!
No More Microsoft Software Ever! 4th Aug 2009
Coat them in flour, sear and serve with raspberry sauce! YUM!
0 Votes
+ -
Re; I terminated him on the spot.
hkommedal 5th Aug 2009
R.I.P.
9.0 this time.
0 Votes
+ -
Saved me a ton of money
terry flores Updated - 4th Aug 2009
Sorry, this is a reply to Loverock's message above:

"No they don't. Nobody asked anyone to do it. "

Yes, we did, and it saved our company a TON of money. Using Linux as not only our LAMP servers for web applications but also for all of our file/print servers was the best decision we ever made.

A heartfelt Thank You to Jeremy and the entire Samba team for their support and diligence in fixing MANY bugs caused by undocumented behaviors of MS file systems.

Is it Microsoft's fault? No, but yes. No, they are certainly free to design their file systems and applications any way they want. But when they failed to comply with court-mandated docs, it did become their fault. And Microsoft can't plead poverty or lack of skills to complete the docs, it's simple mean-spirited and intentional non-cooperation.
0 Votes
+ -