@alexjs
  1. Celery and a failing MySQL Server

    Celery is a distributed task queue for Python. It’s pretty useful, and a lot of apps I’m involved in deploying seem to be using it lately.

    Something it seems to struggle with is stability: if a database disappears, its hostname fails to resolve, or a single database connection fails, Celery just shuts down.

    I needed this not to happen. When running things in “the cloud” (sorry) you’re very much at the mercy of other people controlling your networking/tin/everything, so you need applications that can tolerate a little bit of failure (even if the application was originally written to shut down like this to avoid split brain or similar). To get around this, we used monit. I am definitely not a fan of apps automatically restarting, but it was the only trivial resolution in this situation. Just append this to your monit config and you should be sorted. My understanding is that there isn’t a better solution yet, but I would be interested to know if anyone has seen one.

    check process celeryd with pidfile /var/run/celeryd.pid
            start program = "/etc/init.d/celeryd start" with timeout 10 seconds
            stop program = "/etc/init.d/celeryd stop"
            if changed pid then restart
            if 5 restarts within 5 cycles then timeout
                    alert youremailaddresshere
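
    For anyone wondering what a check like this boils down to, here is a rough Python sketch of the two ideas in that config: testing whether the process named in a pidfile is alive, and giving up after too many restarts in a window of polls. This is not how monit is implemented; the function and class names are made up for illustration, and it assumes a POSIX system.

```python
import os
from collections import deque


def pid_is_running(pidfile):
    """Return True if the pidfile exists and names a live process (POSIX only)."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable or garbage pidfile
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


class RestartBudget:
    """Hypothetical helper mirroring 'if 5 restarts within 5 cycles then timeout'."""

    def __init__(self, max_restarts=5, cycles=5):
        self.max_restarts = max_restarts
        self.history = deque(maxlen=cycles)  # one entry per poll cycle

    def record(self, restarted):
        """Record one poll cycle; return True once the budget is exhausted."""
        self.history.append(bool(restarted))
        return sum(self.history) >= self.max_restarts
```

    A polling loop would call pid_is_running("/var/run/celeryd.pid") every cycle, run the init script when it returns False, and stop trying once RestartBudget.record() returns True.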

    (I appreciate this is especially tedious, but this is for my reference)

    11 months ago  /  1 note  /  Comments

  2. Making nginx ignore query string parameters

    When using nginx as a caching proxy, I found myself needing to ignore particular parameters for both the cache key and the values being passed to the backend. In this particular situation the value I wanted to ignore was ‘uid’. An example URI being:

    http://myapplication.fqdn/foo.ext?env=bar&uid=baz&node=qux

    or

    http://myapplication.fqdn/foo.ext?uid=bar

    To ignore this, in the top of my site configuration I put:

    proxy_cache_key         "$scheme$host$uri$is_args$args";

    in the server stanza:

              if ($args ~ (.*?)(?:^|(&))uid=[^&]*(?:(\2.*)|&(.*))?) {

                    set $args $1$3$4;

            }

            if ($args ~ (^\w)) {

                    set $args ?$args;

            }

    and the location stanza:

    proxy_pass              http://appservers$uri$args;

    So now my backend servers see:

    GET /foo.ext?env=bar&node=qux

    or

    GET /foo.ext

    and very few hits get through to the backend anyway, as the cache key is flattened in the same way.

    Easy.

    EDIT: The ‘easy’ bit is a lie, it seems. Thanks to @davidgl for pulling me out of regex hell. Several revisions here helped by him.
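
    Given how fiddly the regex turned out to be, it can help to have a reference for what the rewrite is supposed to produce. The following is a small Python sketch of the intended behaviour only (it is not the nginx regex, and the function names are made up for illustration): drop one parameter from a query string and rebuild the request line the backend should see.

```python
def strip_param(query, name="uid"):
    """Remove every 'name=value' pair from a raw query string, keeping the rest untouched."""
    kept = [
        pair
        for pair in query.split("&")
        if pair and pair.split("=", 1)[0] != name
    ]
    return "&".join(kept)


def backend_path(uri, query, name="uid"):
    """Return the path plus remaining query string, with '?' only when something is left."""
    remaining = strip_param(query, name)
    return uri + ("?" + remaining if remaining else "")


if __name__ == "__main__":
    print(backend_path("/foo.ext", "env=bar&uid=baz&node=qux"))
    print(backend_path("/foo.ext", "uid=bar"))
```

    Splitting on & rather than parsing and re-encoding keeps the surviving parameters byte-for-byte identical, which matters if the result is compared against what the backend logs.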

    1 year ago  /  0 notes  /  Comments

  3. fail2ban time offset issues

    While trying to set up fail2ban, I found that even though my regex and logs matched up, nothing was being banned or caught by fail2ban.

    After a bit of investigation it seems that the auth.log time was being written in GMT whereas fail2ban was expecting it in BST:

    ==> /var/log/auth.log <==

    Oct 11 20:52:21 ns2 sshd[18119]: Invalid user test from 1.2.3.4

    Oct 11 20:52:21 ns2 sshd[18119]: Failed none for invalid user test from 1.2.3.4 port 47862 ssh2

    Oct 11 20:52:28 ns2 sshd[18119]: Failed password for invalid user test from 1.2.3.4 port 47862 ssh2

    ==> /var/log/fail2ban.log <==

    2010-10-11 21:52:04,017 fail2ban.filter : DEBUG  /var/log/auth.log has been modified

    2010-10-11 21:52:04,029 fail2ban.filter.datedetector: DEBUG  Sorting the template list

    Fairly trivial fix of:

    rm /etc/localtime

    ln -s /usr/share/zoneinfo/Europe/London /etc/localtime

    and I am now successfully banning myself from accessing my server. Vunderbar.
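
    The offset is easy to miss by eye, so here is a quick Python sketch that computes it from one line of each log. The function name is made up, and the year has to be supplied by hand because syslog-style timestamps don’t carry one (2010 is taken from the fail2ban log above).

```python
from datetime import datetime


def hour_skew(syslog_stamp, fail2ban_stamp, year=2010):
    """Return the difference between the two log clocks, rounded to whole hours."""
    # auth.log style: "Oct 11 20:52:21" (no year, so one is prepended)
    syslog_time = datetime.strptime(f"{year} {syslog_stamp}", "%Y %b %d %H:%M:%S")
    # fail2ban.log style: "2010-10-11 21:52:04,017" (milliseconds dropped)
    fail2ban_time = datetime.strptime(fail2ban_stamp.split(",")[0], "%Y-%m-%d %H:%M:%S")
    return round((fail2ban_time - syslog_time).total_seconds() / 3600)


if __name__ == "__main__":
    print(hour_skew("Oct 11 20:52:21", "2010-10-11 21:52:04,017"))
```

    A non-zero result for entries written at roughly the same moment suggests the two programs disagree about the timezone.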

    1 year ago  /  2 notes  /  Comments

  4. MessageLabs Mail Filtering and Vague Errors

    450 Requested action aborted [7.2] 20412, please visit www.messagelabs.com/support for more details about this error message.

    It took a remarkably large amount of searching to find out what ‘[7.2]’ meant in this error message, and why we kept getting a mailserver’s IP blacklisted, but if this happens to you, hopefully this will help resolve it.

    When MessageLabs returns a [7.2], this seems to mean that they’ve checked the IP address of the host which is connecting to their MX against the CBL. Connections will be dropped immediately, rather than mail being rejected, as such:

    # telnet cluster8a.eu.messagelabs.com 25

    Trying 85.158.143.51…

    Connected to cluster8a.eu.messagelabs.com (85.158.143.51).

    Escape character is ‘^]’.

    450 Requested action aborted [7.2] 20412, please visit www.messagelabs.com/support for more details about this error message.

    Connection closed by foreign host. 

    The easiest way to get around this is to fix your mail server, then request delisting from the CBL.

    On a completely unrelated note (ahem), it seems that you may be added to the CBL if you send an email to a gmail address from a domain whose SPF records explicitly disallow the sending mail server (such as -all with no matching include). Google will automatically (?) submit the IP address to the CBL and your problems will begin (again).

    I highly recommend robtex as a lazy way to check your hosts against blacklists.
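
    If you would rather script the check, DNS blacklists are conventionally queried by reversing the IP’s octets and looking the result up under the list’s zone. A minimal Python sketch follows; the zone name cbl.abuseat.org is my assumption for the CBL and may not be current, and the function names are made up for illustration.

```python
import ipaddress
import socket

# Assumed zone for the CBL; substitute whichever DNSBL you actually care about.
DEFAULT_ZONE = "cbl.abuseat.org"


def dnsbl_query_name(ip, zone=DEFAULT_ZONE):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone=DEFAULT_ZONE):
    """Return True if the lookup name resolves, which conventionally means 'listed'.

    Any resolution failure (including a broken resolver) is reported as not listed,
    so treat a False result with some suspicion.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True
```

    For example, dnsbl_query_name("85.158.143.51") builds the name 51.143.158.85.cbl.abuseat.org, and is_listed() then performs the actual DNS query.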

    1 year ago  /  0 notes  /  Comments

  5. VMware ESX and a full SQL Server Database

    Hypothetical situation. You installed VMware ESX, possibly upgraded from 3.5 to 4, went with the embedded SQL Server, and Many Years Later the VirtualCenter server no longer starts. You look through the event logs and the best you can find is:

    Faulting application vpxd.exe, version 4.0.10021.0, faulting module kernel32.dll, version 5.2.3790.4480, fault address 0x0000bef7.

    So you decide to look at general application eventlog events rather than just for VMware:

    Could not allocate space for object ‘dbo.VPX_EVENT’.’PK_VPX_EVENT’ in database ‘VIM_VCDB’ because the ‘PRIMARY’ filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

    “Great”, you think, “I can just pass this over to a DBA to get them to increase the filegroup size.” Then you dig a bit deeper and look at the event log for SQL Server:

    CREATE DATABASE or ALTER DATABASE failed because the resulting cumulative database size would exceed your licensed limit of 4096 MB per database.

    “Oh no!” you sob. You really don’t want to try migrating to an enterprise database right now. Worry not, there’s a VMware solution. The process is easy:

    • Install Microsoft SQL Server Management Studio Express
    • Download and extract VCDB_PURGE_MSSQL.zip
    • Make sure all VMware VirtualCenter processes are stopped
    • Open Microsoft SQL Server Management Studio Express
    • File -> Open -> Choose the extracted sql script
    • Change the database from ‘master’ to ‘VIM_VCDB’ in the dropdown on the top bar
    • Press ‘Execute’
    • Evaluate the deleted rows, make sure it’s not more than you’d expect (ok, I didn’t do this)
    • Change

    SET @DELETE_DATA = 0

    to

    SET @DELETE_DATA = 1

    • Press ‘Execute’ again.
    • Wait. Get a coffee. Get eight. It will eventually finish:

    ****************** SUMMARY *******************
    Deleted 8400 rows from VPX_TASK table.
    Deleted 2585209 rows from VPX_EVENT_ARG table.
    Deleted 1662120 rows from VPX_EVENT table.
    Deleted 0 rows from VPX_HIST_STAT1 table.
    Deleted 0 rows from VPX_SAMPLE_TIME1 table.
    Deleted 0 rows from VPX_HIST_STAT2 table.
    Deleted 0 rows from VPX_SAMPLE_TIME2 table.
    Deleted 0 rows from VPX_HIST_STAT3 table.
    Deleted 0 rows from VPX_SAMPLE_TIME3 table.
    Deleted 105331 rows from VPX_HIST_STAT4 table.
    Deleted 373 rows from VPX_SAMPLE_TIME4 table.

    • Start vCenter Server. Wait. Try and connect. Hope. Pray.
    • Connect to vCenter Server
    • From the client, press Ctrl-Shift-I
    • Go to ‘Database Retention Policy’, and enable it.
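
    For the “evaluate the deleted rows” step that I skipped, the summary output is regular enough to total up with a few lines of Python. This is only a sketch: the function name is made up, and the sample text is the summary pasted from the run above.

```python
import re

SUMMARY = """\
Deleted 8400 rows from VPX_TASK table.
Deleted 2585209 rows from VPX_EVENT_ARG table.
Deleted 1662120 rows from VPX_EVENT table.
Deleted 0 rows from VPX_HIST_STAT1 table.
Deleted 0 rows from VPX_SAMPLE_TIME1 table.
Deleted 0 rows from VPX_HIST_STAT2 table.
Deleted 0 rows from VPX_SAMPLE_TIME2 table.
Deleted 0 rows from VPX_HIST_STAT3 table.
Deleted 0 rows from VPX_SAMPLE_TIME3 table.
Deleted 105331 rows from VPX_HIST_STAT4 table.
Deleted 373 rows from VPX_SAMPLE_TIME4 table.
"""


def deleted_rows(summary):
    """Map each table name in the purge summary to the number of rows reported deleted."""
    pattern = re.compile(r"Deleted (\d+) rows from (\w+) table\.")
    return {table: int(count) for count, table in pattern.findall(summary)}


if __name__ == "__main__":
    counts = deleted_rows(SUMMARY)
    for table, count in sorted(counts.items(), key=lambda item: -item[1]):
        print(f"{table:20} {count:>10}")
    print(f"{'TOTAL':20} {sum(counts.values()):>10}")
```

    Running it against the dry-run output (with @DELETE_DATA still at 0) gives a figure to sanity-check before deleting anything for real.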

    Hopefully this will save someone a bit of googling.

    1 year ago  /  4 notes  /  Comments

  6. 
    Well, it’s one louder, isn’t it? It’s not ten. You see, most blokes, you know, will be playing at ten. You’re on ten here, all the way up, all the way up, all the way up, you’re on ten on your guitar. Where can you go from there? Where?

    1 year ago  /  0 notes  /  Comments

  7. Checking SSH Private Keys for Passphrases

    Imposing ridiculously over the top security policies? Want to make sure any SSH private keys on your jump-off/administration server have a passphrase?

    Don’t waste time trying to get expect working…

    expect <<EOF

    spawn ssh-keygen -f file -y

    expect -timeout 1 "Enter passphrase:" {exit 1}

    EOF

    Just look at the damn file (thanks @ealexhudson and @Azquelt) and check if it’s got ‘Proc-Type: 4,ENCRYPTED’ in it.

    Without

    root@a-server ~ # find /home/*/.ssh/ -name "id_*sa" -exec grep -L ENCRYPTED {} \; | wc -l

    19

    With

    root@a-server ~ # find /home/*/.ssh/ -name "id_*sa" -exec grep -l ENCRYPTED {} \; | wc -l

    1

    Lovely. This of course doesn’t solve the issue of checking, from the SSH public keys, whether the private keys have passphrases or not.
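
    The same check as a Python sketch, in case it needs to feed a report rather than a count. The function names are made up, and the /home/*/.ssh/id_*sa layout is simply the one from the find commands above. As far as I know this only works for the older PEM-style private keys; newer OpenSSH-format keys don’t carry a Proc-Type header, so they would all show up as unprotected here.

```python
from pathlib import Path

# Header line present in passphrase-protected PEM-style private keys.
MARKER = "Proc-Type: 4,ENCRYPTED"


def key_has_passphrase(path):
    """Return True if the key file contains the PEM encryption marker."""
    return MARKER in Path(path).read_text(errors="replace")


def unprotected_keys(home_root="/home"):
    """List private keys under <home_root>/*/.ssh/ that lack the encryption marker."""
    candidates = Path(home_root).glob("*/.ssh/id_*sa")
    return sorted(
        str(key)
        for key in candidates
        if key.is_file() and not key_has_passphrase(key)
    )


if __name__ == "__main__":
    for key in unprotected_keys():
        print(key)
```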

    1 year ago  /  0 notes  /  Comments

  8. LVM Stale NFS File Handles (Part 1)

    So, here’s an interesting issue

    (initramfs) mount
    rootfs on / type rootfs (rw)
    none on /sys type sysfs (rw,nosuid,nodev,noexec)
    none on /proc type proc (rw,nosuid,nodev,noexec)
    udev on /dev type tmpfs (rw,size=10240k,mode=755)
    /dev/pudding/root on /mnt type ext3 (rw,errors=continue,data=ordered)

    So I’m using BusyBox, with an LVM volume mounted on /mnt. Happy?

    (initramfs) ls /mnt
    ls: /mnt/initrd.img.old: Stale NFS file handle
    ls: /mnt/vmlinuz: Stale NFS file handle
    ls: /mnt/vmlinuz.old: Stale NFS file handle

    Only one directory on this box was ever exported over NFS (a while ago), it isn’t one of those affected, and the box has never mounted anything over NFS. It seems the error can occur when a file is open and the disk falls out from underneath it: an ambiguous error code is sent back, which is interpreted as a stale filehandle. Either way, the superblock on this particular FS is corrupted, so the next step would be to attempt to recover using one of the backup superblocks. I’ll attempt this later and let you know how it goes. I’m sure you’ll be on the edge of your seats.

    1 year ago  /  1 note  /  Comments

  9. Bitlbee -> Facebook rename script update

    If you’ve recently noticed that your renamed users in bitlbee have changed back to UIDs, it’s probably because Facebook have changed their UID string from being ‘uNNNNNN’ to ‘-NNNNNN’. Not a huge problem, just change the script:

    % diff bitlbee_rename.pl.old bitlbee_rename.pl
    22c22
    <   if($channel == $bitlbeeChannel && $nick == $username && $nick =~ m/^u\d+/ && $host == "chat.facebook.com")
    ---
    >   if($channel == $bitlbeeChannel && $nick == $username && $nick =~ m/^-\d+/ && $host == "chat.facebook.com")
    25c25
    <     $server->command("whois $nick");
    ---
    >     $server->command("whois \"$nick\"");

    (My) updated version here. Hope this helps at least one person.

    Update

    @TheSamoth has pointed out that newer bitlbee (>= 1.2.5) doesn’t actually require the rename script, as the functionality is built in and can be enabled with the following. Thanks!

    account set facebook/nick_source full_name

    1 year ago  /  0 notes  /  Comments

  10. MegaRaid Lies

    Dell PowerEdge 1850. I’ve never seen it in the flesh, but believe it has a MegaRAID card.

    # lsscsi
    [0:0:6:0]    process PE/PV    1x2 SCSI BP      1.0   -
    [0:1:0:0]    disk    MegaRAID LD 0 RAID1   69G 516A  /dev/sda
    # grep -i raid /var/log/dmesg
    [   20.251664] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
    [   20.690899] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
    [   20.690929] megaraid: probe new device 0x1028:0x0013:0x1028:0x016c: bus 2:slot 14:func 0
    [   20.690964] megaraid 0000:02:0e.0: PCI INT A -> GSI 46 (level, low) -> IRQ 46
    [   21.324054] megaraid: fw version:[516A] bios version:[H418]
    [   21.332182] scsi0 : LSI Logic MegaRAID driver
    [   21.332598] scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
    [   24.821907] scsi 0:1:0:0: Direct-Access     MegaRAID LD 0 RAID1   69G 516A PQ: 0 ANSI: 2

    Seems fine, right?

    # ./MegaCli64 -adpCount
    Controller Count: 0.
    Exit Code: 0x00

    Hrm.

    /opt/MegaRAID/MegaCli # omreport storage controller
    No controllers found

    Starting to get tweaky.

    Update

    Thanks to jtopper for the help so far. Getting a bit further, but still:

    wget http://www.lsi.com/DistributionSystem/AssetDocument/files/support/rsa/utilities/megaconf/ut_linux_megarc_1.11.zip

    unzip ut_linux_megarc_1.11.zip

    sudo ./megarc.bin -AllAdpInfo

            Failed to get driver version

            No Adapters Found

    And…

    $ grep MAJOR megarc
    MAJOR=`grep megadev /proc/devices|awk '{print $1}'`
    $ grep -ci mega /proc/devices
    0

    Further Update


    I’ve finally managed to get to the bottom of this. Looks like any app which creates the /dev/megadev0 device does it with the wrong major. To fix this, based on some brilliant info, I used a major of 10 (now that 252 is used for usbmon), and a minor from /proc/misc.

    # mknod /dev/megadev0 c 10 59

    $ sudo ./megarc -dispCfg -a0


            **********************************************************************
                  MEGARC MegaRAID Configuration Utility(LINUX)-1.11(12-07-2004)
                  By LSI Logic Corp.,USA
            **********************************************************************
              [Note: For SATA-2, 4 and 6 channel controllers, please specify
              Ch=0 Id=0..15 for specifying physical drive(Ch=channel, Id=Target)]

            Type ? as command line arg for help


            Finding Devices On Each MegaRAID Adapter…
            Scanning Ha 0, Chnl 0 Target 15


            **********************************************************************
                  Existing Logical Drive Information
                  By LSI Logic Corp.,USA
            **********************************************************************
              [Note: For SATA-2, 4 and 6 channel controllers, please specify
              Ch=0 Id=0..15 for specifying physical drive(Ch=channel, Id=Target)]


              Logical Drive : 0( Adapter: 0 ):  Status: OPTIMAL
            --------------------------------------------------------------
            SpanDepth :01     RaidLevel: 1  RdAhead : Adaptive  Cache: DirectIo
            StripSz   :064KB   Stripes  : 2  WrPolicy: WriteBack

            Logical Drive 0 : SpanLevel_0 Disks
            Chnl  Target  StartBlock   Blocks      Physical Target Status
            ----  ------  ----------   ------      ----------------------
            0      00    0x00000000   0x0887c000   ONLINE
            0      01    0x00000000   0x0887c000   ONLINE
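
    The minor number lookup can be scripted rather than read off by eye. Here is a short Python sketch that pulls the minor out of /proc/misc-style text and builds the matching mknod command. The function names are made up, the sample text in the usage is illustrative rather than copied from the box, and the entry name megadev0 is my assumption about how the driver registers itself.

```python
def misc_minor(proc_misc_text, name="megadev0"):
    """Return the minor number registered for `name` in /proc/misc text, or None."""
    for line in proc_misc_text.splitlines():
        fields = line.split()  # each line looks like "<minor> <name>"
        if len(fields) == 2 and fields[1] == name:
            return int(fields[0])
    return None


def mknod_command(minor, name="megadev0", major=10):
    """Build the mknod command; major 10 is the Linux misc character device major."""
    return f"mknod /dev/{name} c {major} {minor}"


if __name__ == "__main__":
    with open("/proc/misc") as fh:
        minor = misc_minor(fh.read())
    if minor is None:
        print("megadev0 not found in /proc/misc (is the megaraid module loaded?)")
    else:
        print(mknod_command(minor))
```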

    Hope this helps someone, and many thanks to these guys.

    1 year ago  /  1 note  /  Comments
