Samba 4: A Case Study (Part 1)

I work in the IT department for the Computer Science Department at Taylor University. We do things a little differently here in terms of our lab machine setup. We dual boot Ubuntu Linux and Windows 7, but we centralize our file serving and domain controlling through Samba on Linux, as opposed to a Windows Active Directory solution. Currently, we use Samba 3.5.x as our Primary Domain Controller with an OpenLDAP backend to host user accounts. This has been working, but it has issues. Samba 3 was designed for the NT-style domain, which is really starting to show its age when compared to modern Active Directory. For instance, to edit a group policy setting, we currently have to manually edit our Windows OS image and push it out to every machine (alternatively, we have an updater script that allows us to do it without re-imaging, but it’s still a pain).

I started following Samba 4 development late last year. It had its first official release a couple of months ago, and I’m now in the process of building a production Active Directory cluster to replace the Samba 3 server. The official HOWTO is a little skimpy, so I thought I would add some of my experience setting this stuff up. As a disclaimer, I’m no expert, so this is just my understanding based on messing around with it. If you notice something that’s not correct, please leave a comment or email me.

System Architecture

Before I go into details, I wanted to give a bit of an overview of the server architecture I went with. Although the Samba team would love for you to use Samba 4 exclusively, my personal experience tells me that it really should have a Windows DC attached to it (two would be better). I guess it depends on your specific needs, but one big reason we’re opting for Samba is the unification of file space and permissions between Linux and Windows.

One key thing to know about Active Directory is that it’s deeply tied into DNS. Clients find domain controllers through DNS records, so you can’t query for anything in the domain without a working DNS configuration. For a Linux-only solution, there are two options: use a heavily modified BIND configuration integrated with Samba (which has issues), or just choose the new default internal DNS server built by the Samba team.
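To make the DNS dependency a bit more concrete, here is a small Python sketch that builds the SRV record names a client typically looks up to locate domain controllers. The domain `cs.example.edu` and the function name `ad_srv_records` are made up for illustration; the sketch only builds names and doesn’t perform any lookups.

```python
from __future__ import annotations


def ad_srv_records(domain: str) -> list[str]:
    """Return common SRV record names used to locate AD domain controllers.

    `domain` is the DNS name of the AD domain (hypothetical in this example).
    """
    domain = domain.strip(".").lower()
    return [
        f"_ldap._tcp.{domain}",           # LDAP service for the domain
        f"_kerberos._tcp.{domain}",       # Kerberos KDC over TCP
        f"_kerberos._udp.{domain}",       # Kerberos KDC over UDP
        f"_ldap._tcp.dc._msdcs.{domain}", # domain controllers specifically
    ]


if __name__ == "__main__":
    for name in ad_srv_records("cs.example.edu"):
        print(name)
```

If records like these don’t resolve on your DNS server, domain joins and logons will generally fail no matter how healthy the rest of the setup is.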

Messing with BIND isn’t my cup of tea, and if you’re looking for help there, I’m not your guy. I’m not really excited about using the internal DNS server either, for a couple of reasons. First, it’s brand new, so I consider it a security risk. Second, it lacks the stability of industry-standard DNS solutions. It seems like the Samba team just got fed up hacking BIND to match their needs, so they wrote their own. That’s all well and good, but I’d rather use the Windows DNS solution. It’s turnkey, stable, and still allows for the Linux integration I need.

I fiddled around with a couple of different server configurations. I first tried using S4 as the initial DC. I built a Windows Server 2008 R2 machine (sidenote: don’t bother with Windows Server 2012, it’s not supported yet), and tried joining it to the Samba domain. I kept running into strange errors in dcpromo.exe, so that didn’t work.

In my second attempt, I built a Windows server box and joined the S4 server to it, using the instructions listed here:

This worked pretty well. I was able to join successfully and get replication working, and I really like having the GUI tools for DNS and Group Policy settings accessible from Windows. Because I don’t quite trust S4 yet, I added a second Windows DC to the cluster. I’ve noticed that the S4 server screws up some features. For instance, I can’t demote any of the domain controllers (I get weird errors), so I have to manually remove the connections and DNS entries. This isn’t the end of the world for us because we have a small cluster. For larger clusters this would be a big showstopper. I’ve seen some chatter about this on the Samba news list, but no solutions. There are other minor issues I’ve run into, but I won’t list them here. I’ll probably make a separate post for that.

Those hiccups aside, I’m able to replicate the directory entries between all domain controllers. I can also create accounts on either the S4 or Windows machines successfully. I created a couple fresh Windows 7 builds and configured some group policy settings for them (including folder redirection and roaming profiles). It didn’t take too long to get that working.

There are a lot of details I’ve come across in setting this up, so I’m going to try to spread them out over a few posts. Stay tuned for updates! If you have any questions, feel free to comment or email me.

Gaussian Blur Effect

Note: This is taken from the MetaVoxel development blog.

I started a task last week to add a simple Gaussian blur effect to the background blocks in the scene. Before, we were just drawing them with no filtering, so they looked “pixelated.” I tried using bilinear filtering, but that looks pretty ugly too. The blur ended up taking way too much time, but I learned a lot about image compositing and running convolution kernels on the GPU. Here are the results:

[Images: blur, blur2]

The algorithm splits the scene up into layers, similar to how you might do it in Photoshop. The “blur” layer is saved to a separate render target and filtered using a two-pass Gaussian blur. This makes for a nice softening effect. The layers are then composited together in a final merge step and rendered to the screen.
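As a rough illustration of what “two-pass” means here, the sketch below blurs a small grayscale grid with a horizontal pass followed by a vertical pass, using the same 1D kernel for both. This is a CPU-side model in plain Python, not the actual shader code; the radius, sigma, clamp-to-edge sampling, and all function names are my own assumptions.

```python
import math


def gaussian_kernel(radius, sigma):
    """Build a normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping samples at the edges."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        blurred = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                sx = min(max(x + k - radius, 0), width - 1)  # clamp to edge
                acc += weight * row[sx]
            blurred.append(acc)
        out.append(blurred)
    return out


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, radius=2, sigma=1.0):
    """Two-pass blur: horizontal pass, then vertical pass via transposition."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))
```

The appeal of splitting the blur in two is cost: each pixel takes 2 * (2r + 1) samples instead of (2r + 1)^2 for the equivalent single-pass 2D kernel.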

For those who care about the details, I ran into some issues with compositing, and it turned out that I’d been thinking about alpha blending all wrong. Most resources I’ve seen out there use linear interpolation for blending, based on the source alpha. The equation looks like:

dest = src_alpha * src_color + (1 - src_alpha) * dst_color

In OpenGL, this is done through glBlendFunc( GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA ).

There are some issues with this blend mode. For one, how do you represent complete transparency? Is it (0, 0, 0, 0)? Or (255, 255, 255, 0)? You can pick any arbitrary “color” for your transparent surface, which doesn’t really make physical sense. Even worse, if you render a texture into a transparent render target, you end up with discolored fringes, because filtering mixes that arbitrary color into the neighboring visible pixels.


The solution is to use premultiplied alpha. We instead use the blend mode:

dest = src_color + (1 - src_alpha) * dst_color

This way, (0, 0, 0, 0) is the only true transparent color. It reacts properly to filtering and avoids the ugly fringes. Once I switched the alpha blending to use premultiplied alpha, I was able to set the layer render targets to fully transparent, render into them, and then blend the layers together correctly.
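Here is a tiny numeric sketch of the fringe problem, assuming colors in the 0 to 1 range and a “filter” that simply averages two neighboring texels. The function names are mine; the two blend functions follow the two equations above. An opaque red texel sits next to a fully transparent texel that happens to be stored as white.

```python
def average(texel_a, texel_b):
    """Stand-in for texture filtering: average two RGBA texels."""
    return tuple((a + b) / 2.0 for a, b in zip(texel_a, texel_b))


def premultiply(rgba):
    """Convert straight-alpha RGBA to premultiplied RGBA."""
    r, g, b, a = rgba
    return (r * a, g * a, b * a, a)


def blend_straight(src, dst_rgb):
    """dest = src_alpha * src_color + (1 - src_alpha) * dst_color"""
    alpha = src[3]
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src[:3], dst_rgb))


def blend_premultiplied(src, dst_rgb):
    """dest = src_color + (1 - src_alpha) * dst_color"""
    alpha = src[3]
    return tuple(s + (1.0 - alpha) * d for s, d in zip(src[:3], dst_rgb))


red = (1.0, 0.0, 0.0, 1.0)            # opaque red
transparent = (1.0, 1.0, 1.0, 0.0)    # "transparent white"
black = (0.0, 0.0, 0.0)               # background

# Straight alpha: the white leaks into the filtered edge texel.
straight = blend_straight(average(red, transparent), black)

# Premultiplied alpha: the transparent texel becomes (0, 0, 0, 0) first.
premult = blend_premultiplied(
    average(premultiply(red), premultiply(transparent)), black)

print(straight)  # (0.5, 0.25, 0.25), a pinkish fringe
print(premult)   # (0.5, 0.0, 0.0), half-covered pure red
```

The straight-alpha edge picks up green and blue that were never visible in the source, which is exactly the discoloration described above.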

This requires a bit of overhead for texture loading, because each texel’s color has to be multiplied by its alpha value before the texture is uploaded. The results look very good though.
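The load-time step might look something like the sketch below, assuming tightly packed 8-bit RGBA pixel data. The function name and the round-to-nearest integer math are my own choices, not necessarily what a particular image loader does.

```python
def premultiply_rgba8(pixels: bytes) -> bytes:
    """Premultiply tightly packed 8-bit RGBA data (4 bytes per pixel)."""
    if len(pixels) % 4 != 0:
        raise ValueError("expected 4 bytes per pixel")
    out = bytearray(len(pixels))
    for i in range(0, len(pixels), 4):
        alpha = pixels[i + 3]
        for channel in range(3):
            # Integer form of round(color * alpha / 255).
            out[i + channel] = (pixels[i + channel] * alpha + 127) // 255
        out[i + 3] = alpha  # alpha itself is unchanged
    return bytes(out)


if __name__ == "__main__":
    source = bytes([255, 255, 255, 0,    # transparent "white"
                    200, 100, 50, 255,   # opaque, stays the same
                    255, 0, 0, 128])     # half-transparent red
    print(list(premultiply_rgba8(source)))
```

This only needs to run once per texture, so the cost is paid at load time rather than per frame.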

Buffer Streaming in OpenGL

I spent a bit of time recently designing a sprite batching system using the (more) modern OpenGL 3.x core functionality rather than the old immediate mode stuff. I thought I would share some of my observations about working with it. My first approach to the problem consisted of allocating a single, large dynamic vertex buffer object, which I mapped via glMapBuffer() to copy in data for each batch of sprites. I used a static index buffer that I locked once and filled at initialization time, since I only ever needed to draw quads. This worked fine, except it was wicked slow.
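For reference, the contents of a quad-only index buffer can be generated once up front, since the pattern never changes. Here is a sketch in Python, assuming each quad owns four consecutive vertices and is split into the triangles (0, 1, 2) and (2, 3, 0); the vertex order and the function name are assumptions on my part.

```python
def build_quad_indices(quad_count):
    """Return triangle-list indices for quad_count quads (6 indices each)."""
    indices = []
    for quad in range(quad_count):
        base = quad * 4  # first of this quad's four vertices
        indices.extend([base, base + 1, base + 2,
                        base + 2, base + 3, base])
    return indices


if __name__ == "__main__":
    print(build_quad_indices(2))
```

Because the indices depend only on the quad count, the same buffer works for every batch; only the vertex data changes from frame to frame.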

For one, MetaVoxel uses alpha blending very heavily, requiring everything to be drawn back to front. As you might expect, this results in an output-sensitive batch size, because a batch can only grow while the next quad requires an identical rendering state. Sorting by texture isn’t possible, unfortunately, since it would break the depth order. Additionally, each time I mapped the buffer, I would map the entire thing all at once, even for small batches. Anyway, it didn’t take much to bring things to a crawl. Profiling revealed a vast majority of time spent in the driver, waiting on glMapBuffer.
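To show how the draw order drives the batch count, here is a sketch that splits a back-to-front list of quads into batches, treating the texture name as the only render state. The texture names and the function name are made up for the example.

```python
from itertools import groupby


def build_batches(textures_back_to_front):
    """Group consecutive quads that share a texture into (texture, count) batches."""
    return [(texture, len(list(run)))
            for texture, run in groupby(textures_back_to_front)]


if __name__ == "__main__":
    draw_order = ["grass", "grass", "water", "grass", "grass", "grass", "cloud"]
    print(build_batches(draw_order))
```

Seven quads using three textures still turn into four batches here, because the “water” quad sits between the “grass” quads in depth and can’t be moved.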

Clearly, I was doing something wrong. The key issue turned out to be buffer synchronization between the CPU and GPU. I found some great resources below that go into detail explaining how to optimize this.

I won’t rehash the details of how buffer orphaning works; see the above articles if you’re interested. The basic idea is that the driver is very conservative and will happily stall waiting for a buffer to flush to the GPU before letting you write to it again. You either need to coax the driver to allocate a new buffer for you, or implement a ring buffer scheme yourself. I ended up implementing the following techniques, which got things running smoothly.

  • I used glMapBufferRange to map only the required amount of data per batch, specifying the GL_MAP_INVALIDATE_RANGE_BIT.
  • I created a ring buffer of vertex buffer objects, mapping each one in sequence with each new batch. Rather than coaxing the driver to give me a fresh set of data each time, I implemented it myself.
  • I specified GL_STREAM_DRAW as the driver hint.
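The bookkeeping behind the ring of buffers is simple enough to sketch without a GL context. In the Python model below, bytearrays stand in for the vertex buffer objects and the class and method names are invented; the comments mark where the real OpenGL calls would go.

```python
class VertexBufferRing:
    """CPU-side model of a ring of vertex buffers, one buffer per batch."""

    def __init__(self, buffer_count, buffer_size):
        # Real code would create each VBO with
        # glBufferData(GL_ARRAY_BUFFER, buffer_size, NULL, GL_STREAM_DRAW).
        self.buffers = [bytearray(buffer_size) for _ in range(buffer_count)]
        self.buffer_size = buffer_size
        self.next_index = 0

    def write_batch(self, vertex_bytes):
        """Copy one batch into the next buffer; return (buffer index, byte count)."""
        length = len(vertex_bytes)
        if length > self.buffer_size:
            raise ValueError("batch does not fit in a single buffer")

        index = self.next_index
        self.next_index = (index + 1) % len(self.buffers)

        # Real code would bind buffer `index` and map only what it needs:
        #   glMapBufferRange(GL_ARRAY_BUFFER, 0, length,
        #                    GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT)
        # then copy the vertices, call glUnmapBuffer, and issue the draw call.
        self.buffers[index][:length] = vertex_bytes
        return index, length


if __name__ == "__main__":
    ring = VertexBufferRing(buffer_count=3, buffer_size=64)
    for batch in (b"batch-one", b"batch-two", b"batch-three", b"batch-four"):
        print(ring.write_batch(batch))
```

The ring needs enough buffers that the GPU has finished with a buffer by the time the cursor wraps back around to it; otherwise the driver is back to stalling on the map call.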

When I tested this on several machines, I saw vastly better performance, which suggests the CPU and GPU were spending much less time waiting on each other.