Conserved domains on  [gi|2462559783|ref|XP_054174265|]
activity-dependent neuroprotector homeobox protein 2 isoform X1 [Homo sapiens]

List of domain hits

Name Accession Description Interval E-value
ADNP_N super family cl45031
Activity-dependent neuroprotector homeobox protein N-terminal; This entry represents the ...
1-962 1.55e-63

Activity-dependent neuroprotector homeobox protein N-terminal; This entry represents the N-terminal domain of activity-dependent neuroprotector homeobox protein (ADNP, also known as activity-dependent neuroprotective protein), which contains zinc finger motifs. ADNP is involved in transcriptional regulation and is vital for mammalian brain formation. In humans, de novo mutations cause ADNP syndrome, a syndromic form of autism spectrum disorder (ASD) that includes cognitive and motor deficits. The protein has also been linked to autophagy and to the pathophysiology of schizophrenia.


The actual alignment was detected with superfamily member pfam19627:

Pssm-ID: 466132 [Multi-domain]  Cd Length: 744  Bit Score: 230.51  E-value: 1.55e-63
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783    1 MFQIPVENLDNIRK----------------------DLKGFDPGEKYFHNTSWGDVSLWEPSGKKVR-YRTKPYCCGLCK 57
Cdd:pfam19627    1 MFQLPVNNLGSLRKarknvkkilsdigleyckehieDFKDFEPNDFYIKNTSWDDVCLWDPSLTKNQdYRTKPFCCSGCP 80
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783   58 YSTKVLTSFKNHLHRYHEDEIDQELVIPCPNCVFASQPKVVGRHFRMFHAPVRKVQNYTVNILGETKSSRSDVIS----- 132
Cdd:pfam19627   81 FSSKFFSAYKSHFRNVHSEDFENRILLNCPYCTYNGNKKTLETHIKLFHMPNNVVRQPSGGPVGFKDKSKQDSLKpkqgd 160
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  133 ------FTCLKCNFSNTLYYSMKKHVLVAHFHYLINSYFGlRTEEmgeqpKT-NDTVSIEKIPPPDKYYCKKCNANASSQ 205
Cdd:pfam19627  161 sveqavYYCKKCTYRDPLYNVVRKHIYREHFQHVAAPYVA-KPGE-----KSvNGAVASSNTRDDGSIHCKRCLFMPRTY 234
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  206 DALMYHiltsdihrdlenklrsVISEHIKrtgllkqthiapkpaahlaapangsapsapaqppcfhlalpqnspspaAGQ 285
Cdd:pfam19627  235 EALVQH----------------VIEDHER------------------------------------------------IGY 250
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  286 PVTVAQGapgslthsppaagqshmtlvssplpvgqnsltlqppapqpvflshgvplHQSVnppVLPLSQPvgpvnksvgt 365
Cdd:pfam19627  251 QVTAMIG-------------------------------------------------HTNV---VVPRSKP---------- 268
                          410       420       430       440       450       460       470       480
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  366 svlpinqtvrpgvLPLTQPVGPINRPVGpgvLPVSPSVTPGVLQAvspgvlsvsravpsgvlpagqmtpagqmtpagvip 445
Cdd:pfam19627  269 -------------LMLIAPKPQDKKSLG---VTQKGGLVTGNVRS----------------------------------- 297
                          490       500       510       520       530       540       550       560
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  446 gqtatsgvLPTGQMvqsgvlpvgqtapSRVLPPGqtaplrvisagqvvpSGLLSPnqtvsssavVPVNQGvnsgvlQLSQ 525
Cdd:pfam19627  298 --------LSSQQM-------------NRLSIPK---------------ANLLSN---------VHLKQG------SYGL 326
                          570       580       590       600       610       620       630       640
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  526 PVVSGVLPVGQPVRPGvlqlnqtvgtniLPVNQPVR-PGASQNttfltsgsiLRQLIPTGkqvNGIPTYTLAPVSVTLP- 603
Cdd:pfam19627  327 KSMPSFYVLGQQVRLS------------LPGNAQVSvPQQSQT---------VKQLLPGG---NGRPSTVGSSQSGQQPa 382
                          650       660       670       680       690       700       710       720
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  604 ---VPPGGLATVAPPQM-PIQLLPSGAAAPMAGSMPGMPSppvlvnaaqsvfvqASSSAADTNQVLKqakqWKTCPVCNE 679
Cdd:pfam19627  383 rfsVQSGNSASSSSSQLkSPPLSSSVAATRALGQGPSKSS--------------ASAAGLNTSYTQK----WKICTICNE 444
                          730       740       750       760       770       780       790       800
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  680 LFPSNVYQVHMEVAHKhsesksgeklePEKLAACAPFLKWMREKTVRCLSCKCLVSEEELIHHLLMHGLGCLFCPCTFHD 759
Cdd:pfam19627  445 LFPENVYSAHFEKEHK-----------AEKVPAVANYIMKIHNFTSKCLYCNRYLPSDTLLNHMLIHGLSCPYCRSTFND 513
                          810       820       830       840       850       860       870       880
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  760 IKGLSEHSRNRHLGKKKLPMDYSNRGFQLDVDaNGNLLFPHLDFITILPKEKLGEREVYLA---------ILAGIHSKSL 830
Cdd:pfam19627  514 VEKMVAHMRMVHPDEEVGPRTDSPLTFDLTLQ-QGNPKNIQLLVTTYNMRDAPEESVAFHAqnnspqpkkPKPKVQEKSD 592
                          890       900       910       920       930       940       950       960
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  831 VPvyVKVRPQaegTPGSTGKRV--STCPFCF----GPFvtTEAYELHLKERHHIMPTVHTVLKSPAFKCIHCCGVYTGNM 904
Cdd:pfam19627  593 VP--VKSSPQ---AAVPYKKDVgkTLCPLCFsilkGPI--SDALAHHLRERHQVIQTVHPVEKKLTYKCIHCLGVYTSNM 665
                          970       980       990      1000      1010      1020
                   ....*....|....*....|....*....|....*....|....*....|....*....|.
gi 2462559783  905 TLAAIAVHLVRCRSAPK--DSSSDLQAQPGFIHNSELLLVSGEVMH-DSSFSVKRKLPDGH 962
Cdd:pfam19627  666 TASTITLHLVHCRGVGKtqNGQDKSAPSPRVTQSPGAAPLKRELEHvDPALPKKRKLDDEE 726
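
Each alignment on this page is preceded by a one-line summary carrying the Pssm-ID, the model length, the bit score, and the E-value. A minimal sketch of pulling those fields out of such a line follows; the regular expression and the function name `parse_summary` are assumptions made for illustration and are not part of any NCBI tool.

```python
import re

# Field order copied from the summary lines on this page; the pattern is an assumption.
SUMMARY_RE = re.compile(
    r"Pssm-ID:\s*(?P<pssm_id>\d+)"
    r"(?:\s*\[(?P<flag>[^\]]+)\])?"
    r"\s*Cd Length:\s*(?P<cd_length>\d+)"
    r"\s*Bit Score:\s*(?P<bit_score>[\d.]+)"
    r"\s*E-value:\s*(?P<e_value>[\d.eE+-]+)"
)


def parse_summary(line):
    """Return the summary fields of one 'Pssm-ID: ...' line as a dict."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    return {
        "pssm_id": int(match.group("pssm_id")),
        "flag": match.group("flag"),  # e.g. "Multi-domain", or None if absent
        "cd_length": int(match.group("cd_length")),
        "bit_score": float(match.group("bit_score")),
        "e_value": float(match.group("e_value")),
    }


if __name__ == "__main__":
    line = "Pssm-ID: 466132 [Multi-domain]  Cd Length: 744  Bit Score: 230.51  E-value: 1.55e-63"
    print(parse_summary(line))
```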
PHA03247 super family cl33720
large tegument protein UL36; Provisional
246-492 2.16e-09

large tegument protein UL36; Provisional


The actual alignment was detected with superfamily member PHA03247:

Pssm-ID: 223021 [Multi-domain]  Cd Length: 3151  Bit Score: 62.26  E-value: 2.16e-09
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  246 PKPAAhLAAPANGSAPSAPAQPPCFHLALPQNSP-SPAAGQPVTVAQGAPGSLTHSPPAAGQSHMTLVSSPLPVGQNSLT 324
Cdd:PHA03247  2758 ARPPT-TAGPPAPAPPAAPAAGPPRRLTRPAVASlSESRESLPSPWDPADPPAAVLAPAAALPPAASPAGPLPPPTSAQP 2836
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  325 LQPPAPqPVFLSHGVPLHQSVNP--PVLPLSQPVGPVNKSVGTSVLPINQTVRPGVLPLTQPVGpiNRPVGPGVLPVSPS 402
Cdd:PHA03247  2837 TAPPPP-PGPPPPSLPLGGSVAPggDVRRRPPSRSPAAKPAAPARPPVRRLARPAVSRSTESFA--LPPDQPERPPQPQA 2913
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  403 VTPGVLQAVSPGVLSVSRAVPSGVLPAGQMTP----AGQMTPAGVIPgqTATSGVLPTGQmVQSGVLPVGQTAPSRVLPP 478
Cdd:PHA03247  2914 PPPPQPQPQPPPPPQPQPPPPPPPRPQPPLAPttdpAGAGEPSGAVP--QPWLGALVPGR-VAVPRFRVPQPAPSREAPA 2990
                          250
                   ....*....|....
gi 2462559783  479 GQTAPLRVISAGQV 492
Cdd:PHA03247  2991 SSTPPLTGHSLSRV 3004
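
Each alignment row carries a label, a start coordinate, the aligned residues, and an end coordinate. In the rows shown here, gap characters (`-`) do not advance the numbering, while lowercase residues (which appear to mark positions not aligned to the model) still do. The sketch below checks that bookkeeping for one row; `parse_row` and `residues_consistent` are hypothetical helper names, and the whitespace-splitting approach is an assumption about the row layout.

```python
def parse_row(line):
    """Split one alignment row into (label, start, sequence, end)."""
    parts = line.split()
    if len(parts) < 4:
        raise ValueError(f"not an alignment row: {line!r}")
    # The last three tokens are start, sequence, end; the rest is the label.
    start, seq, end = int(parts[-3]), parts[-2], int(parts[-1])
    label = " ".join(parts[:-3])
    return label, start, seq, end


def residues_consistent(start, seq, end):
    """True if the non-gap residue count matches the start/end coordinates."""
    residues = sum(1 for ch in seq if ch != "-")
    return start + residues - 1 == end


if __name__ == "__main__":
    for row in (
        "gi 2462559783  479 GQTAPLRVISAGQV 492",
        "Cdd:PHA03247  2991 SSTPPLTGHSLSRV 3004",
    ):
        label, start, seq, end = parse_row(row)
        print(label, start, end, residues_consistent(start, seq, end))
```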
HOX smart00389
Homeodomain; DNA-binding factors that are involved in the transcriptional regulation of key ...
1021-1074 1.58e-06

Homeodomain; DNA-binding factors that are involved in the transcriptional regulation of key developmental processes


Pssm-ID: 197696 [Multi-domain]  Cd Length: 57  Bit Score: 46.09  E-value: 1.58e-06
                            10        20        30        40        50
                    ....*....|....*....|....*....|....*....|....*....|....
gi 2462559783  1021 PKKYEGRSYEEKKQFLKDYFHKKPYPSKKEIELLSSLFWVWKIDVASFFGKRRY 1074
Cdd:smart00389    1 KRRKRTSFTPEQLEELEKEFQKNPYPSREEREELAKKLGLSERQVKVWFQNRRA 54
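
The E-value alone does not show how similar the query is to the homeodomain model at the residue level. One simple way to quantify that is percent identity over the aligned columns. The sketch below is a minimal illustration, assuming identity is counted only over columns where neither row has a gap; the function name `percent_identity` is hypothetical, and this is not the score that CD-Search itself reports.

```python
def percent_identity(query, subject):
    """Percent of gap-free aligned columns whose residues match (case-insensitive)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-":
            continue  # skip columns where either row has a gap
        compared += 1
        if q == s:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Rows copied from the HOX (smart00389) alignment above.
    query = "PKKYEGRSYEEKKQFLKDYFHKKPYPSKKEIELLSSLFWVWKIDVASFFGKRRY"
    subject = "KRRKRTSFTPEQLEELEKEFQKNPYPSREEREELAKKLGLSERQVKVWFQNRRA"
    print(f"{percent_identity(query, subject):.1f}% identity over {len(query)} columns")
```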
 
Name Accession Description Interval E-value
ADNP_N pfam19627
Activity-dependent neuroprotector homeobox protein N-terminal; This entry represents the ...
1-962 1.55e-63

Activity-dependent neuroprotector homeobox protein N-terminal; This entry represents the N-terminal domain of activity-dependent neuroprotector homeobox protein (ADNP, also known as activity-dependent neuroprotective protein), which contains zinc finger motifs. ADNP is involved in transcriptional regulation and is vital for mammalian brain formation. In humans, de novo mutations cause ADNP syndrome, a syndromic form of autism spectrum disorder (ASD) that includes cognitive and motor deficits. The protein has also been linked to autophagy and to the pathophysiology of schizophrenia.


PHA03247 PHA03247
large tegument protein UL36; Provisional
248-641 5.40e-15

large tegument protein UL36; Provisional


Pssm-ID: 223021 [Multi-domain]  Cd Length: 3151  Bit Score: 80.37  E-value: 5.40e-15
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  248 PAAHLAAPANGSAPSAPAQPPcfhlalpqnsPSPAAGQPVTVAQGAPGSLTHSPPAA--GQSHMTLVSSPLPVGQNSLTL 325
Cdd:PHA03247  2557 PAAPPAAPDRSVPPPRPAPRP----------SEPAVTSRARRPDAPPQSARPRAPVDdrGDPRGPAPPSPLPPDTHAPDP 2626
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  326 QPPAPQPVFLSHGVPLHQSVNPPVLPLSQPVGPV------NKSVGTSVLPINQTVRPGVLPLTQPVGPINRPVGPGVLPV 399
Cdd:PHA03247  2627 PPPSPSPAANEPDPHPPPTVPPPERPRDDPAPGRvsrprrARRLGRAAQASSPPQRPRRRAARPTVGSLTSLADPPPPPP 2706
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  400 SPSVTPGVLQAVSPgvLSVSRAVPSGVLPAGQMTPAGQMTPAG-VIPGQTATSGVLPTGQMVQSGVLPVGQ-TAPSRVLP 477
Cdd:PHA03247  2707 TPEPAPHALVSATP--LPPGPAAARQASPALPAAPAPPAVPAGpATPGGPARPARPPTTAGPPAPAPPAAPaAGPPRRLT 2784
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  478 PGQTAPLRVISAGQVVPSGLLSPNQTVSS-SAVVPVNQGVNSGVlqlsqPVVSGVLPVGQPVRPGVLQLNQTVGTNILPV 556
Cdd:PHA03247  2785 RPAVASLSESRESLPSPWDPADPPAAVLApAAALPPAASPAGPL-----PPPTSAQPTAPPPPPGPPPPSLPLGGSVAPG 2859
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  557 NQPVRPGASQNTTFLTSGsilrqliPTGKQVNGIPTYTLAPVSVTLPVPPGGLATVAPPQMPIQLLPSGAAAPMAGSMPG 636
Cdd:PHA03247  2860 GDVRRRPPSRSPAAKPAA-------PARPPVRRLARPAVSRSTESFALPPDQPERPPQPQAPPPPQPQPQPPPPPQPQPP 2932

                   ....*
gi 2462559783  637 MPSPP 641
Cdd:PHA03247  2933 PPPPP 2937
PHA03247 PHA03247
large tegument protein UL36; Provisional
246-492 2.16e-09

large tegument protein UL36; Provisional


SP1-4_arthropods_N cd22553
N-terminal domain of transcription factor Specificity Protein (SP) 1-4 from arthropods; ...
362-679 1.58e-07

N-terminal domain of transcription factor Specificity Protein (SP) 1-4 from arthropods; Specificity Proteins (SPs) are transcription factors that are involved in many cellular processes, including cell differentiation, cell growth, apoptosis, immune responses, response to DNA damage, and chromatin remodeling. There are many SPs in vertebrates (9 SPs in humans and mice, 7 SPs in the chicken, and 11 SPs in teleost fish), but arthropods only have 3 SPs. One of these is clade SP1-4, which is expressed ubiquitously throughout development. SP1-4 belongs to a family of proteins, called the SP/Kruppel or Krueppel-like Factor (KLF) family, characterized by a C-terminal DNA-binding domain of 81 amino acids consisting of three Kruppel-like C2H2 zinc fingers. These factors bind to a loose consensus motif, namely NNRCRCCYY (where N is any nucleotide, R is A/G, and Y is C/T), such as the recurring motifs in GC and GT boxes (5'-GGGGCGGGG-3' and 5'-GGTGTGGGG-3') that are present in promoters and more distal regulatory elements of mammalian genes. SP factors preferentially bind GC boxes, while KLFs bind CACCC boxes. Another characteristic hallmark of SP factors is the presence of the Buttonhead (BTD) box CXCPXC, just N-terminal to the zinc fingers. The function of the BTD box is unknown, but it is thought to play an important physiological role. Another feature of most SP factors is the presence of a conserved amino acid stretch, the so-called SP box, located close to the N-terminus. This model represents the N-terminal domain of SP1-4 from arthropods.


Pssm-ID: 411778 [Multi-domain]  Cd Length: 384  Bit Score: 55.03  E-value: 1.58e-07
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  362 SVGTSVLPINQTVRPGVLPLTQPVGPINRPVGPGVLPVSPSVTPGVLQAVSpgVLSVSRAVPSGVLPAGQMTPAG-QMTP 440
Cdd:cd22553     35 ETHDPLILSPPLSQPQQIITAQSSGSAAGGVAYSVSPAVQTVTVDGHEAIF--IPANSGLLQTNNQQAIQLAPGGtQAIL 112
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  441 AGvipGQTATSGVLPTGQMVQSGVLPV-GQTAPSRV---LPP---GQTAPLRV-ISA--GQVVPSGLLSPNQTVSSSAVV 510
Cdd:cd22553    113 AN---QQTLIRPNTVQGQANASNVLQNiAQIASGGNavqLPLnnmTQTIPVQVpVSTanGQTVYQTIQVPIQAIQSGNAG 189
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  511 PVNQGVNSGVL-QLSQPvvsgvlpvgqpvrpGVLQLNQTVGTNILPVNQPVRPGASQNTTFL------TSGSILRQLIPT 583
Cdd:cd22553    190 GGNQALQAQVIpQLAQA--------------AQLQPQQLAQVSSQGYIQQIPANASQQQPQMvqqgpnQSGQIIGQVASA 255
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  584 -GKQVNGIPTYTLApvSVTLPVPPGGLATVAP-PQMPIQLLPSGAAAPMAGSMPGMPSPPVLVNAAQSVFVQASSSAADT 661
Cdd:cd22553    256 sSIQAAAIPLTVYT--GALAGQNGSNQQQVGQiVTSPIQGMTQGLTAPASSSIPTVVQQQAIQGNPLPPGTQIIAAGQQL 333
                          330       340       350
                   ....*....|....*....|....*....|....*.
gi 2462559783  662 NQVLKQAKQWK------------------TCPVCNE 679
Cdd:cd22553    334 QQDPNDPTKWQvvadgtpgskkrlrrvacTCPNCRD 369
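
The SP1-4 description above quotes the consensus motif NNRCRCCYY using IUPAC ambiguity codes (N is any nucleotide, R is A/G, Y is C/T). A minimal sketch of expanding such a motif into a regular expression and scanning a DNA string for it follows. The test sequence is made up for illustration, only the codes named in the description are handled, and the scan covers the given strand only.

```python
import re

# Only the ambiguity codes mentioned in the description, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Expand an IUPAC motif such as 'NNRCRCCYY' into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def find_sites(motif, dna):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = motif_to_regex(motif).pattern
    # A lookahead lets finditer report overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", dna.upper())]


if __name__ == "__main__":
    hypothetical_dna = "TTAAGCACCCTGG"  # invented sequence, not from this page
    print(motif_to_regex("NNRCRCCYY").pattern)
    print(find_sites("NNRCRCCYY", hypothetical_dna))
```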
HOX smart00389
Homeodomain; DNA-binding factors that are involved in the transcriptional regulation of key ...
1021-1074 1.58e-06

Homeodomain; DNA-binding factors that are involved in the transcriptional regulation of key developmental processes


Soli_cterm TIGR03437
Solibacter uncharacterized C-terminal domain; This model describes a protein domain found in ...
443-645 4.34e-06

Solibacter uncharacterized C-terminal domain; This model describes a protein domain found in 90 proteins of Solibacter usitatus Ellin6076, nearly always as the C-terminal domain of a much larger protein. No homologs to this domain are detected outside of S. usitatus, a member of the Acidobacteria.


Pssm-ID: 274578 [Multi-domain]  Cd Length: 215  Bit Score: 48.81  E-value: 4.34e-06
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  443 VIPGQTAT---SGVLPTGQMVQSGVLPVgQTAPSRVLPPGQTAPLRVISAGQV---VPSGLLSPNQTVsssaVVpVNQGV 516
Cdd:TIGR03437    2 VAPGSIVSifgTNLAPATLTAAGGPLPT-SLGGVSVTVNGVAAPLLYVSPGQInaqVPYEVAPGAATV----TV-TYNGG 75
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  517 NSGVLQLS-QPVVSGVLPVGQ-PVRPGVLQLNQtvGTNILPVNQPVRPGaSQNTTFLTSGSILRQLIPTGKQVNGIPTY- 593
Cdd:TIGR03437   76 ASAAVTVTvAAAAPGIFTLDGsGTGQAAALNNQ--DGSVNSAANPAAPG-DVVVLYATGLGPTSPAVADGAPAPSSPLAp 152
                          170       180       190       200       210       220
                   ....*....|....*....|....*....|....*....|....*....|....*....|.
gi 2462559783  594 TLAPVSVTL-----PVPPGGLATVAPPQMPIQL-LPSGAAA---PMAGSMPGMPSPPVLVN 645
Cdd:TIGR03437  153 ALAPVTVTIggvpaTVLYAGLAPGFVGLYQVNVrVPAGLATgavPVVITVGGVTSNAVTIA 213
homeodomain cd00086
Homeodomain; DNA binding domains involved in the transcriptional regulation of key eukaryotic ...
1022-1078 7.71e-06

Homeodomain; DNA binding domains involved in the transcriptional regulation of key eukaryotic developmental processes; may bind to DNA as monomers or as homo- and/or heterodimers, in a sequence-specific manner.


Pssm-ID: 238039 [Multi-domain]  Cd Length: 59  Bit Score: 44.16  E-value: 7.71e-06
                           10        20        30        40        50
                   ....*....|....*....|....*....|....*....|....*....|....*..
gi 2462559783 1022 KKYEGRSYEEKKQFLKDYFHKKPYPSKKEIELLSSLFWVWKIDVASFFGKRRYICMK 1078
Cdd:cd00086      1 RRKRTRFTPEQLEELEKEFEKNPYPSREEREELAKELGLTERQVKIWFQNRRAKLKR 57
PPE COG5651
PPE-repeat protein [Function unknown];
340-542 3.17e-05

PPE-repeat protein [Function unknown];


Pssm-ID: 444372 [Multi-domain]  Cd Length: 385  Bit Score: 47.58  E-value: 3.17e-05
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  340 PLHQSVNPPVLPLSQPVGPVNKSVGTSV--LPINQTVRPGVLPLTQPVGPINRPVGPGVLPVSPSVTPGVLQAVSPGVLS 417
Cdd:COG5651    170 PPPTITNPGGLLGAQNAGSGNTSSNPGFanLGLTGLNQVGIGGLNSGSGPIGLNSGPGNTGFAGTGAAAGAAAAAAAAAA 249
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  418 VSRAVPSGVLPAGQMTPAGQMTPAGVIPGQTATSGVLPTGQMVQSGVLPVGQTAPSRVLPPGQTAPLRVISAGqVVPSGL 497
Cdd:COG5651    250 AAGAGASAALASLAATLLNASSLGLAATAASSAATNLGLAGSPLGLAGGGAGAAAATGLGLGAGGAAGAAGAT-GAGAAL 328
                          170       180       190       200
                   ....*....|....*....|....*....|....*....|....*
gi 2462559783  498 LSPNQTVSSSAVVPVNQGVNSGVLQLSQPVVSGVLPVGQPVRPGV 542
Cdd:COG5651    329 GAGAAAAAAGAAAGAGAAAAAAAGGAGGGGGGALGAGGGGGSAGA 373
Atrophin-1 pfam03154
Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian ...
242-412 2.64e-04

Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian atrophy (DRPLA) gene. DRPLA (OMIM:125370) is a progressive neurodegenerative disorder. It is caused by the expansion of a CAG repeat in the DRPLA gene on chromosome 12p. This results in an extended polyglutamine region in atrophin-1, which is thought to confer toxicity to the protein, possibly through altering its interactions with other proteins. The expansion of a CAG repeat is also the underlying defect in six other neurodegenerative disorders, including Huntington's disease. One interaction of expanded polyglutamine repeats that is thought to be pathogenic is that with the short glutamine repeat in the transcriptional coactivator CREB binding protein, CBP. This interaction draws CBP away from its usual nuclear location to the expanded polyglutamine repeat protein aggregates that are characteristic of the polyglutamine neurodegenerative disorders. This interferes with CBP-mediated transcription and causes cytotoxicity.


Pssm-ID: 460830 [Multi-domain]  Cd Length: 991  Bit Score: 45.14  E-value: 2.64e-04
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  242 THIAPKPAAH-LAAPANGSAPSApaQPPCFHLaLPQNS--PSPAAgQPVTVAQgapgSLTHSPPAAgqshmtlvSSPLPV 318
Cdd:pfam03154  388 SNLPPPPALKpLSSLSTHHPPSA--HPPPLQL-MPQSQqlPPPPA-QPPVLTQ----SQSLPPPAA--------SHPPTS 451
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  319 GQNSLTLQPPAPQPVFLSHGVPL------HQSVNPPVLPLSQPVGPVNKSVGTSVLPINQTVRPGVLPLTQPVGPINRPV 392
Cdd:pfam03154  452 GLHQVPSQSPFPQHPFVPGGPPPitppsgPPTSTSSAMPGIQPPSSASVSSSGPVPAAVSCPLPPVQIKEEALDEAEEPE 531
                          170       180
                   ....*....|....*....|
gi 2462559783  393 GPGVLPVSPSVTPGVLQAVS 412
Cdd:pfam03154  532 SPPPPPRSPSPEPTVVNTPS 551
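
The hit list above gives an interval on the query and an E-value for each domain model; a lower E-value means the match is less likely to have arisen by chance. A short sketch of filtering those hits by an E-value cutoff and by overlap with a region of interest follows. The intervals and E-values are copied from the rows above; the `Hit` type, the helper names, and the 1e-6 cutoff are assumptions chosen for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    accession: str
    start: int
    end: int
    e_value: float


# Values transcribed from the domain-hit rows on this page.
HITS = [
    Hit("ADNP_N", "pfam19627", 1, 962, 1.55e-63),
    Hit("PHA03247", "PHA03247", 248, 641, 5.40e-15),
    Hit("PHA03247", "PHA03247", 246, 492, 2.16e-09),
    Hit("SP1-4_arthropods_N", "cd22553", 362, 679, 1.58e-07),
    Hit("HOX", "smart00389", 1021, 1074, 1.58e-06),
    Hit("Soli_cterm", "TIGR03437", 443, 645, 4.34e-06),
    Hit("homeodomain", "cd00086", 1022, 1078, 7.71e-06),
    Hit("PPE", "COG5651", 340, 542, 3.17e-05),
    Hit("Atrophin-1", "pfam03154", 242, 412, 2.64e-04),
]


def significant(hits, max_e_value):
    """Keep hits whose E-value is at or below the cutoff."""
    return [h for h in hits if h.e_value <= max_e_value]


def overlapping(hits, start, end):
    """Keep hits whose interval shares at least one residue with [start, end]."""
    return [h for h in hits if h.start <= end and start <= h.end]


if __name__ == "__main__":
    for hit in significant(HITS, 1e-6):  # the cutoff is an arbitrary example
        print(hit.name, hit.accession, f"{hit.start}-{hit.end}", hit.e_value)
    print([h.accession for h in overlapping(HITS, 1021, 1074)])
```

Most of the low-complexity hits (PHA03247, SP1-4_arthropods_N, Soli_cterm, PPE, Atrophin-1) fall inside the 1-962 interval already covered by ADNP_N, which the overlap helper makes easy to confirm.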
 
Name Accession Description Interval E-value
ADNP_N pfam19627
Activity-dependent neuroprotector homeobox protein N-terminal; This entry represents the ...
1-962 1.55e-63

Activity-dependent neuroprotector homeobox protein N-terminal; This entry represents the N-terminal domain of activity-dependent neuroprotector homeobox protein (ADNP, also known as activity-dependent neuroprotective protein), which contains zinc finger motifs. ADNP is involved in transcriptional regulation and is vital for mammalian brain formation. In humans, de novo mutations cause ADNP syndrome, a syndromic form of autism spectrum disorder (ASD) that includes cognitive and motor deficits. The protein has also been linked to autophagy and to the pathophysiology of schizophrenia.


Pssm-ID: 466132 [Multi-domain]  Cd Length: 744  Bit Score: 230.51  E-value: 1.55e-63
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783    1 MFQIPVENLDNIRK----------------------DLKGFDPGEKYFHNTSWGDVSLWEPSGKKVR-YRTKPYCCGLCK 57
Cdd:pfam19627    1 MFQLPVNNLGSLRKarknvkkilsdigleyckehieDFKDFEPNDFYIKNTSWDDVCLWDPSLTKNQdYRTKPFCCSGCP 80
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783   58 YSTKVLTSFKNHLHRYHEDEIDQELVIPCPNCVFASQPKVVGRHFRMFHAPVRKVQNYTVNILGETKSSRSDVIS----- 132
Cdd:pfam19627   81 FSSKFFSAYKSHFRNVHSEDFENRILLNCPYCTYNGNKKTLETHIKLFHMPNNVVRQPSGGPVGFKDKSKQDSLKpkqgd 160
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  133 ------FTCLKCNFSNTLYYSMKKHVLVAHFHYLINSYFGlRTEEmgeqpKT-NDTVSIEKIPPPDKYYCKKCNANASSQ 205
Cdd:pfam19627  161 sveqavYYCKKCTYRDPLYNVVRKHIYREHFQHVAAPYVA-KPGE-----KSvNGAVASSNTRDDGSIHCKRCLFMPRTY 234
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  206 DALMYHiltsdihrdlenklrsVISEHIKrtgllkqthiapkpaahlaapangsapsapaqppcfhlalpqnspspaAGQ 285
Cdd:pfam19627  235 EALVQH----------------VIEDHER------------------------------------------------IGY 250
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  286 PVTVAQGapgslthsppaagqshmtlvssplpvgqnsltlqppapqpvflshgvplHQSVnppVLPLSQPvgpvnksvgt 365
Cdd:pfam19627  251 QVTAMIG-------------------------------------------------HTNV---VVPRSKP---------- 268
                          410       420       430       440       450       460       470       480
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  366 svlpinqtvrpgvLPLTQPVGPINRPVGpgvLPVSPSVTPGVLQAvspgvlsvsravpsgvlpagqmtpagqmtpagvip 445
Cdd:pfam19627  269 -------------LMLIAPKPQDKKSLG---VTQKGGLVTGNVRS----------------------------------- 297
                          490       500       510       520       530       540       550       560
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  446 gqtatsgvLPTGQMvqsgvlpvgqtapSRVLPPGqtaplrvisagqvvpSGLLSPnqtvsssavVPVNQGvnsgvlQLSQ 525
Cdd:pfam19627  298 --------LSSQQM-------------NRLSIPK---------------ANLLSN---------VHLKQG------SYGL 326
                          570       580       590       600       610       620       630       640
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  526 PVVSGVLPVGQPVRPGvlqlnqtvgtniLPVNQPVR-PGASQNttfltsgsiLRQLIPTGkqvNGIPTYTLAPVSVTLP- 603
Cdd:pfam19627  327 KSMPSFYVLGQQVRLS------------LPGNAQVSvPQQSQT---------VKQLLPGG---NGRPSTVGSSQSGQQPa 382
                          650       660       670       680       690       700       710       720
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  604 ---VPPGGLATVAPPQM-PIQLLPSGAAAPMAGSMPGMPSppvlvnaaqsvfvqASSSAADTNQVLKqakqWKTCPVCNE 679
Cdd:pfam19627  383 rfsVQSGNSASSSSSQLkSPPLSSSVAATRALGQGPSKSS--------------ASAAGLNTSYTQK----WKICTICNE 444
                          730       740       750       760       770       780       790       800
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  680 LFPSNVYQVHMEVAHKhsesksgeklePEKLAACAPFLKWMREKTVRCLSCKCLVSEEELIHHLLMHGLGCLFCPCTFHD 759
Cdd:pfam19627  445 LFPENVYSAHFEKEHK-----------AEKVPAVANYIMKIHNFTSKCLYCNRYLPSDTLLNHMLIHGLSCPYCRSTFND 513
                          810       820       830       840       850       860       870       880
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  760 IKGLSEHSRNRHLGKKKLPMDYSNRGFQLDVDaNGNLLFPHLDFITILPKEKLGEREVYLA---------ILAGIHSKSL 830
Cdd:pfam19627  514 VEKMVAHMRMVHPDEEVGPRTDSPLTFDLTLQ-QGNPKNIQLLVTTYNMRDAPEESVAFHAqnnspqpkkPKPKVQEKSD 592
                          890       900       910       920       930       940       950       960
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  831 VPvyVKVRPQaegTPGSTGKRV--STCPFCF----GPFvtTEAYELHLKERHHIMPTVHTVLKSPAFKCIHCCGVYTGNM 904
Cdd:pfam19627  593 VP--VKSSPQ---AAVPYKKDVgkTLCPLCFsilkGPI--SDALAHHLRERHQVIQTVHPVEKKLTYKCIHCLGVYTSNM 665
                          970       980       990      1000      1010      1020
                   ....*....|....*....|....*....|....*....|....*....|....*....|.
gi 2462559783  905 TLAAIAVHLVRCRSAPK--DSSSDLQAQPGFIHNSELLLVSGEVMH-DSSFSVKRKLPDGH 962
Cdd:pfam19627  666 TASTITLHLVHCRGVGKtqNGQDKSAPSPRVTQSPGAAPLKRELEHvDPALPKKRKLDDEE 726
PHA03247 PHA03247
large tegument protein UL36; Provisional
248-641 5.40e-15

large tegument protein UL36; Provisional


Pssm-ID: 223021 [Multi-domain]  Cd Length: 3151  Bit Score: 80.37  E-value: 5.40e-15
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  248 PAAHLAAPANGSAPSAPAQPPcfhlalpqnsPSPAAGQPVTVAQGAPGSLTHSPPAA--GQSHMTLVSSPLPVGQNSLTL 325
Cdd:PHA03247  2557 PAAPPAAPDRSVPPPRPAPRP----------SEPAVTSRARRPDAPPQSARPRAPVDdrGDPRGPAPPSPLPPDTHAPDP 2626
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  326 QPPAPQPVFLSHGVPLHQSVNPPVLPLSQPVGPV------NKSVGTSVLPINQTVRPGVLPLTQPVGPINRPVGPGVLPV 399
Cdd:PHA03247  2627 PPPSPSPAANEPDPHPPPTVPPPERPRDDPAPGRvsrprrARRLGRAAQASSPPQRPRRRAARPTVGSLTSLADPPPPPP 2706
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  400 SPSVTPGVLQAVSPgvLSVSRAVPSGVLPAGQMTPAGQMTPAG-VIPGQTATSGVLPTGQMVQSGVLPVGQ-TAPSRVLP 477
Cdd:PHA03247  2707 TPEPAPHALVSATP--LPPGPAAARQASPALPAAPAPPAVPAGpATPGGPARPARPPTTAGPPAPAPPAAPaAGPPRRLT 2784
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  478 PGQTAPLRVISAGQVVPSGLLSPNQTVSS-SAVVPVNQGVNSGVlqlsqPVVSGVLPVGQPVRPGVLQLNQTVGTNILPV 556
Cdd:PHA03247  2785 RPAVASLSESRESLPSPWDPADPPAAVLApAAALPPAASPAGPL-----PPPTSAQPTAPPPPPGPPPPSLPLGGSVAPG 2859
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  557 NQPVRPGASQNTTFLTSGsilrqliPTGKQVNGIPTYTLAPVSVTLPVPPGGLATVAPPQMPIQLLPSGAAAPMAGSMPG 636
Cdd:PHA03247  2860 GDVRRRPPSRSPAAKPAA-------PARPPVRRLARPAVSRSTESFALPPDQPERPPQPQAPPPPQPQPQPPPPPQPQPP 2932

                   ....*
gi 2462559783  637 MPSPP 641
Cdd:PHA03247  2933 PPPPP 2937
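
The alignment rows above can be compared column by column. The following is a minimal sketch, not part of the CDD output: the helper name `count_identities` is hypothetical, and it assumes that only columns where both rows carry an uppercase residue are aligned (lowercase residues and `-` gaps are skipped).

```python
def count_identities(query_row: str, subject_row: str) -> tuple[int, int]:
    """Return (identical, aligned) column counts for one pair of alignment rows.

    Assumption: a column counts as aligned only when both characters are
    uppercase letters; lowercase residues and '-' gaps are ignored.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have equal length")
    identical = 0
    aligned = 0
    for q, s in zip(query_row, subject_row):
        if q.isupper() and s.isupper():
            aligned += 1
            if q == s:
                identical += 1
    return identical, aligned


# Final block of the 248-641 hit: query residues 637-641 vs. PHA03247 2933-2937.
print(count_identities("MPSPP", "PPPPP"))  # prints (3, 5)
```

For a full hit, the residue strings of every block would be concatenated before calling the helper.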
PHA03247 PHA03247
large tegument protein UL36; Provisional
246-640 7.02e-14

large tegument protein UL36; Provisional


Pssm-ID: 223021 [Multi-domain]  Cd Length: 3151  Bit Score: 76.90  E-value: 7.02e-14
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  246 PKPAAHLAAPANGSA---PSAPAQP--PCFHLALPQNSPSPAAGQPVTVAQGAPgsltHSPPAAGQSHmtlvSSPLPVGQ 320
Cdd:PHA03247  2571 PRPAPRPSEPAVTSRarrPDAPPQSarPRAPVDDRGDPRGPAPPSPLPPDTHAP----DPPPPSPSPA----ANEPDPHP 2642
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  321 NSLTLQPPAPQPVFLSHGVPLHQSVNP---PVLPLSQPVGPVNKSVGTSVLPINQTVRPGvlPLTQPVGPINRPVGPGVl 397
Cdd:PHA03247  2643 PPTVPPPERPRDDPAPGRVSRPRRARRlgrAAQASSPPQRPRRRAARPTVGSLTSLADPP--PPPPTPEPAPHALVSAT- 2719
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  398 PVSPSVTPGVLQAVSPGVLSVSRAVPSG-VLPAGQMTPAGQMTPAGviPGQTATSGVLPTGQMVQSGVLPVGQTAPSRVL 476
Cdd:PHA03247  2720 PLPPGPAAARQASPALPAAPAPPAVPAGpATPGGPARPARPPTTAG--PPAPAPPAAPAAGPPRRLTRPAVASLSESRES 2797
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  477 PPGQTAPLRViSAGQVVPSGLLSPNQTVSS-----SAVVPVNQGVNSGVLQLSQPVVSGVLPVGQPVRPGVLQlnQTVGT 551
Cdd:PHA03247  2798 LPSPWDPADP-PAAVLAPAAALPPAASPAGplpppTSAQPTAPPPPPGPPPPSLPLGGSVAPGGDVRRRPPSR--SPAAK 2874
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  552 NILPVNQPVR----PGASQNTTFLTSGSILRQLIPTGK-QVNGIPTYTLAPVSVTLPVPPgglatvAPPQMPIQLLPSGA 626
Cdd:PHA03247  2875 PAAPARPPVRrlarPAVSRSTESFALPPDQPERPPQPQaPPPPQPQPQPPPPPQPQPPPP------PPPRPQPPLAPTTD 2948
                          410
                   ....*....|....
gi 2462559783  627 AAPMAGSMPGMPSP 640
Cdd:PHA03247  2949 PAGAGEPSGAVPQP 2962
PHA03247 PHA03247
large tegument protein UL36; Provisional
246-492 2.16e-09

large tegument protein UL36; Provisional


Pssm-ID: 223021 [Multi-domain]  Cd Length: 3151  Bit Score: 62.26  E-value: 2.16e-09
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  246 PKPAAhLAAPANGSAPSAPAQPPCFHLALPQNSP-SPAAGQPVTVAQGAPGSLTHSPPAAGQSHMTLVSSPLPVGQNSLT 324
Cdd:PHA03247  2758 ARPPT-TAGPPAPAPPAAPAAGPPRRLTRPAVASlSESRESLPSPWDPADPPAAVLAPAAALPPAASPAGPLPPPTSAQP 2836
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  325 LQPPAPqPVFLSHGVPLHQSVNP--PVLPLSQPVGPVNKSVGTSVLPINQTVRPGVLPLTQPVGpiNRPVGPGVLPVSPS 402
Cdd:PHA03247  2837 TAPPPP-PGPPPPSLPLGGSVAPggDVRRRPPSRSPAAKPAAPARPPVRRLARPAVSRSTESFA--LPPDQPERPPQPQA 2913
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  403 VTPGVLQAVSPGVLSVSRAVPSGVLPAGQMTP----AGQMTPAGVIPgqTATSGVLPTGQmVQSGVLPVGQTAPSRVLPP 478
Cdd:PHA03247  2914 PPPPQPQPQPPPPPQPQPPPPPPPRPQPPLAPttdpAGAGEPSGAVP--QPWLGALVPGR-VAVPRFRVPQPAPSREAPA 2990
                          250
                   ....*....|....
gi 2462559783  479 GQTAPLRVISAGQV 492
Cdd:PHA03247  2991 SSTPPLTGHSLSRV 3004
PHA03247 PHA03247
large tegument protein UL36; Provisional
245-578 9.00e-09

large tegument protein UL36; Provisional


Pssm-ID: 223021 [Multi-domain]  Cd Length: 3151  Bit Score: 59.95  E-value: 9.00e-09
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  245 APKPAAHLAAPANGSAPSAPAQPPcfhlALPQNSPSPAAGQPVTVAQGAPGS-LTHSPPAAGQSHMTLVSsPLPVGQNSL 323
Cdd:PHA03247  2688 ARPTVGSLTSLADPPPPPPTPEPA----PHALVSATPLPPGPAAARQASPALpAAPAPPAVPAGPATPGG-PARPARPPT 2762
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  324 TLQPPAPQPVFLSHGVPLHQSVNPPVLPLSQPVGPVNKSVGTSVLPINQTVRPGVLPLTQ-PVGPINRPvgPGVLPVSPS 402
Cdd:PHA03247  2763 TAGPPAPAPPAAPAAGPPRRLTRPAVASLSESRESLPSPWDPADPPAAVLAPAAALPPAAsPAGPLPPP--TSAQPTAPP 2840
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  403 VTPGVLQAvspgvlsvSRAVPSGVLPAGqmtPAGQMTPAGVIPGQTATSGVLPTGQMVQSgvlPVGQTAPSRVLPPGQTA 482
Cdd:PHA03247  2841 PPPGPPPP--------SLPLGGSVAPGG---DVRRRPPSRSPAAKPAAPARPPVRRLARP---AVSRSTESFALPPDQPE 2906
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  483 PLRVISAGQvvPSGLLSPNQTVSSSAVVPVNQGVNSGVLQ-----LSQPVVSGVLPVGQ--PVRPGVLQLNQTvgtnILP 555
Cdd:PHA03247  2907 RPPQPQAPP--PPQPQPQPPPPPQPQPPPPPPPRPQPPLApttdpAGAGEPSGAVPQPWlgALVPGRVAVPRF----RVP 2980
                          330       340
                   ....*....|....*....|...
gi 2462559783  556 VNQPVRPGASQNTTFLTSGSILR 578
Cdd:PHA03247  2981 QPAPSREAPASSTPPLTGHSLSR 3003
PHA03379 PHA03379
EBNA-3A; Provisional
233-564 1.05e-07

EBNA-3A; Provisional


Pssm-ID: 223066 [Multi-domain]  Cd Length: 935  Bit Score: 56.22  E-value: 1.05e-07
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  233 IKRTGllKQTHIAPKPAAHLAAPANGSaPSAPAQPPcfHLALPQNSPSPAAGQPVTVAQGAPGSLTHSPPAAGQSHMtlv 312
Cdd:PHA03379   391 LMRAG--KLTERAREALEKASEPTYGT-PRPPVEKP--RPEVPQSLETATSHGSAQVPEPPPVHDLEPGPLHDQHSM--- 462
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  313 sSPLPVGQNsltlqPPAP----QPVFLSHGVPlhQSVNPPVLPLSQPVGPVNKSVGTSVLPINQTVRPGVLPLTQPVGPI 388
Cdd:PHA03379   463 -APCPVAQL-----PPGPlqdlEPGDQLPGVV--QDGRPACAPVPAPAGPIVRPWEASLSQVPGVAFAPVMPQPMPVEPV 534
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  389 NRPVGPGVLPVSPSVTPGVLQAvsPGVLSVSRAV----------PSGVLPAGQMT---------PAGQMTPAGV------ 443
Cdd:PHA03379   535 PVPTVALERPVCPAPPLIAMQG--PGETSGIVRVrerwrpapwtPNPPRSPSQMSvrdrlarlrAEAQPYQASVevqppq 612
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  444 ---IPGQTATSGVL-PTGQM------------VQSGVLPVGQtAPSRVLPPGQ-------TAPLRViSAGQVVPSGLLSP 500
Cdd:PHA03379   613 ltqVSPQQPMEYPLePEQQMfpgspfsqvadvMRAGGVPAMQ-PQYFDLPLQQpisqgapLAPLRA-SMGPVPPVPATQP 690
                          330       340       350       360       370       380       390
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|...
gi 2462559783  501 nQTVSSSAVVPVNQGVNSGVLQLSQPVVSGVLP---------VGQPVRPGVLQlNQTVGtniLPVNQPVRPGA 564
Cdd:PHA03379   691 -QYFDIPLTEPINQGASAAHFLPQQPMEGPLVPerwmfqgatLSQSVRPGVAQ-SQYFD---LPLTQPINHGA 758
SP1-4_arthropods_N cd22553
N-terminal domain of transcription factor Specificity Protein (SP) 1-4 from arthropods; ...
362-679 1.58e-07

N-terminal domain of transcription factor Specificity Protein (SP) 1-4 from arthropods; Specificity Proteins (SPs) are transcription factors that are involved in many cellular processes, including cell differentiation, cell growth, apoptosis, immune responses, response to DNA damage, and chromatin remodeling. There are many SPs in vertebrates (9 SPs in humans and mice, 7 SPs in the chicken, and 11 SPs in teleost fish), but arthropods only have 3 SPs. One SP is clade SP1-4, which is expressed ubiquitously throughout development. SP1-4 belongs to a family of proteins, called the SP/Kruppel or Krueppel-like Factor (KLF) family, characterized by a C-terminal DNA-binding domain of 81 amino acids consisting of three Kruppel-like C2H2 zinc fingers. These factors bind to a loose consensus motif, namely NNRCRCCYY (where N is any nucleotide; R is A/G, and Y is C/T), such as the recurring motifs in GC and GT boxes (5'-GGGGCGGGG-3' and 5'-GGTGTGGGG-3') that are present in promoters and more distal regulatory elements of mammalian genes. SP factors preferentially bind GC boxes, while KLFs bind CACCC boxes. Another characteristic hallmark of SP factors is the presence of the Buttonhead (BTD) box CXCPXC, just N-terminal to the zinc fingers. The function of the BTD box is unknown, but it is thought to play an important physiological role. Another feature of most SP factors is the presence of a conserved amino acid stretch, the so-called SP box, located close to the N-terminus. This model represents the N-terminal domain of SP1-4 from arthropods.


Pssm-ID: 411778 [Multi-domain]  Cd Length: 384  Bit Score: 55.03  E-value: 1.58e-07
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  362 SVGTSVLPINQTVRPGVLPLTQPVGPINRPVGPGVLPVSPSVTPGVLQAVSpgVLSVSRAVPSGVLPAGQMTPAG-QMTP 440
Cdd:cd22553     35 ETHDPLILSPPLSQPQQIITAQSSGSAAGGVAYSVSPAVQTVTVDGHEAIF--IPANSGLLQTNNQQAIQLAPGGtQAIL 112
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  441 AGvipGQTATSGVLPTGQMVQSGVLPV-GQTAPSRV---LPP---GQTAPLRV-ISA--GQVVPSGLLSPNQTVSSSAVV 510
Cdd:cd22553    113 AN---QQTLIRPNTVQGQANASNVLQNiAQIASGGNavqLPLnnmTQTIPVQVpVSTanGQTVYQTIQVPIQAIQSGNAG 189
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  511 PVNQGVNSGVL-QLSQPvvsgvlpvgqpvrpGVLQLNQTVGTNILPVNQPVRPGASQNTTFL------TSGSILRQLIPT 583
Cdd:cd22553    190 GGNQALQAQVIpQLAQA--------------AQLQPQQLAQVSSQGYIQQIPANASQQQPQMvqqgpnQSGQIIGQVASA 255
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  584 -GKQVNGIPTYTLApvSVTLPVPPGGLATVAP-PQMPIQLLPSGAAAPMAGSMPGMPSPPVLVNAAQSVFVQASSSAADT 661
Cdd:cd22553    256 sSIQAAAIPLTVYT--GALAGQNGSNQQQVGQiVTSPIQGMTQGLTAPASSSIPTVVQQQAIQGNPLPPGTQIIAAGQQL 333
                          330       340       350
                   ....*....|....*....|....*....|....*.
gi 2462559783  662 NQVLKQAKQWK------------------TCPVCNE 679
Cdd:cd22553    334 QQDPNDPTKWQvvadgtpgskkrlrrvacTCPNCRD 369
Atrophin-1 pfam03154
Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian ...
279-642 1.91e-07

Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian atrophy (DRPLA) gene. DRPLA OMIM:125370 is a progressive neurodegenerative disorder. It is caused by the expansion of a CAG repeat in the DRPLA gene on chromosome 12p. This results in an extended polyglutamine region in atrophin-1 that is thought to confer toxicity to the protein, possibly through altering its interactions with other proteins. The expansion of a CAG repeat is also the underlying defect in six other neurodegenerative disorders, including Huntington's disease. One interaction of expanded polyglutamine repeats that is thought to be pathogenic is that with the short glutamine repeat in the transcriptional coactivator CREB binding protein, CBP. This interaction draws CBP away from its usual nuclear location to the expanded polyglutamine repeat protein aggregates that are characteristic of the polyglutamine neurodegenerative disorders. This interferes with CBP-mediated transcription and causes cytotoxicity.


Pssm-ID: 460830 [Multi-domain]  Cd Length: 991  Bit Score: 55.54  E-value: 1.91e-07
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  279 PSPAAGQPVTVAQGAPGSLTHSPPAAGQSHMTLVSSPLPVGQNSlTLQPPAPQPVFLSHGVPLHQSVNPPVLPLSQPVGP 358
Cdd:pfam03154  149 PSPQDNESDSDSSAQQQILQTQPPVLQAQSGAASPPSPPPPGTT-QAATAGPTPSAPSVPPQGSPATSQPPNQTQSTAAP 227
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  359 VNKSVGTSVLPINQ--TVRPGVLPLTQPVGPINRPVGPGVLPVSPSVTPGVLQAVSPGVLSVSRAVPSGVLPAGQMTPAG 436
Cdd:pfam03154  228 HTLIQQTPTLHPQRlpSPHPPLQPMTQPPPPSQVSPQPLPQPSLHGQMPPMPHSLQTGPSHMQHPVPPQPFPLTPQSSQS 307
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  437 QM--TPAGVIPGQTATSGVLPTGQ-MVQSGVLPVGQTAPSRVLP-----PGQTAPLRVISAGQ--------VVPSGLLSP 500
Cdd:pfam03154  308 QVppGPSPAAPGQSQQRIHTPPSQsQLQSQQPPREQPLPPAPLSmphikPPPTTPIPQLPNPQshkhpphlSGPSPFQMN 387
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  501 NQTVSSSAVVPVNQGVNSGVLQLSQPVVSgVLPVGQPVRPGVLQlnqtvgTNILPVNQPVRPGASQNTTFLTSGSILRQl 580
Cdd:pfam03154  388 SNLPPPPALKPLSSLSTHHPPSAHPPPLQ-LMPQSQQLPPPPAQ------PPVLTQSQSLPPPAASHPPTSGLHQVPSQ- 459
                          330       340       350       360       370       380
                   ....*....|....*....|....*....|....*....|....*....|....*....|..
gi 2462559783  581 iptgkqvNGIPTYTLAPVSVTLPVPPGGLATVAPPQMPIQLLPSGAAAPMAGSMPGMPSPPV 642
Cdd:pfam03154  460 -------SPFPQHPFVPGGPPPITPPSGPPTSTSSAMPGIQPPSSASVSSSGPVPAAVSCPL 514
HOX smart00389
Homeodomain; DNA-binding factors that are involved in the transcriptional regulation of key ...
1021-1074 1.58e-06

Homeodomain; DNA-binding factors that are involved in the transcriptional regulation of key developmental processes


Pssm-ID: 197696 [Multi-domain]  Cd Length: 57  Bit Score: 46.09  E-value: 1.58e-06
                            10        20        30        40        50
                    ....*....|....*....|....*....|....*....|....*....|....
gi 2462559783  1021 PKKYEGRSYEEKKQFLKDYFHKKPYPSKKEIELLSSLFWVWKIDVASFFGKRRY 1074
Cdd:smart00389    1 KRRKRTSFTPEQLEELEKEFQKNPYPSREEREELAKKLGLSERQVKVWFQNRRA 54
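
Each hit carries a one-line summary (Pssm-ID, Cd Length, Bit Score, E-value) ahead of its alignment. The sketch below shows one way to pull those fields out of such a line. It is an illustration only: the function name `parse_summary` and the regular expression are assumptions based on the line layout seen in this listing, not an NCBI-provided parser.

```python
import re

# Pattern inferred from the summary lines in this listing (an assumption,
# not an official format specification).
SUMMARY_RE = re.compile(
    r"Pssm-ID:\s*(?P<pssm_id>\d+)\s*"
    r"(?:\[(?P<note>[^\]]*)\])?\s*"
    r"Cd Length:\s*(?P<cd_length>\d+)\s*"
    r"Bit Score:\s*(?P<bit_score>[\d.]+)\s*"
    r"E-value:\s*(?P<e_value>[\d.eE+-]+)"
)


def parse_summary(line: str):
    """Return the summary fields as a dict, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {
        "pssm_id": int(match.group("pssm_id")),
        "note": match.group("note"),  # e.g. "Multi-domain"; None if absent
        "cd_length": int(match.group("cd_length")),
        "bit_score": float(match.group("bit_score")),
        "e_value": float(match.group("e_value")),
    }


hit = parse_summary(
    "Pssm-ID: 197696 [Multi-domain]  Cd Length: 57  Bit Score: 46.09  E-value: 1.58e-06"
)
print(hit["pssm_id"], hit["cd_length"], hit["bit_score"], hit["e_value"])
```

Converting the E-value to a float makes it straightforward to filter or sort hits by significance.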
Atrophin-1 pfam03154
Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian ...
241-658 1.65e-06

Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian atrophy (DRPLA) gene. DRPLA OMIM:125370 is a progressive neurodegenerative disorder. It is caused by the expansion of a CAG repeat in the DRPLA gene on chromosome 12p. This results in an extended polyglutamine region in atrophin-1 that is thought to confer toxicity to the protein, possibly through altering its interactions with other proteins. The expansion of a CAG repeat is also the underlying defect in six other neurodegenerative disorders, including Huntington's disease. One interaction of expanded polyglutamine repeats that is thought to be pathogenic is that with the short glutamine repeat in the transcriptional coactivator CREB binding protein, CBP. This interaction draws CBP away from its usual nuclear location to the expanded polyglutamine repeat protein aggregates that are characteristic of the polyglutamine neurodegenerative disorders. This interferes with CBP-mediated transcription and causes cytotoxicity.


Pssm-ID: 460830 [Multi-domain]  Cd Length: 991  Bit Score: 52.46  E-value: 1.65e-06
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  241 QTHIAPKPAAHLAAPANGSAPSaPAQPPCFHLALPQNSPSPAAGQPVTVAQGAPGSLTHSPPaagQSHMTLVSSPLPVGQ 320
Cdd:pfam03154  175 QAQSGAASPPSPPPPGTTQAAT-AGPTPSAPSVPPQGSPATSQPPNQTQSTAAPHTLIQQTP---TLHPQRLPSPHPPLQ 250
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  321 nSLTLQPPAPQPVFLSHGVPLHQSVNPPvLPLSQPVGP--VNKSVGTSVLPI-NQTVRPGVLPLTQPVGPI---NRPVGP 394
Cdd:pfam03154  251 -PMTQPPPPSQVSPQPLPQPSLHGQMPP-MPHSLQTGPshMQHPVPPQPFPLtPQSSQSQVPPGPSPAAPGqsqQRIHTP 328
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  395 GVLPVSPSVTPGVLQAVSPGVLSVSRAVPSGVLPAGQM-TPAGQMTPAGVipgqtatSGVLPTgQMvqsgvlpvgqtaPS 473
Cdd:pfam03154  329 PSQSQLQSQQPPREQPLPPAPLSMPHIKPPPTTPIPQLpNPQSHKHPPHL-------SGPSPF-QM------------NS 388
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  474 RVLPPGQTAPLRVISAGQVvPSGLLSPnqtvsssavvpvnqgvnsgvLQLsqpvvsgvLPVGQPVRPgvlqlnqtvgtni 553
Cdd:pfam03154  389 NLPPPPALKPLSSLSTHHP-PSAHPPP--------------------LQL--------MPQSQQLPP------------- 426
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  554 lPVNQPvrPGASQNTTFLTSGSilrqliptgkqvNGIPTYTLAPVSVTLPVPPGGLATVAPPQMpiqLLPSGAAAPMAGS 633
Cdd:pfam03154  427 -PPAQP--PVLTQSQSLPPPAA------------SHPPTSGLHQVPSQSPFPQHPFVPGGPPPI---TPPSGPPTSTSSA 488
                          410       420
                   ....*....|....*....|....*
gi 2462559783  634 MPGMpSPPVLVNAAQSVFVQASSSA 658
Cdd:pfam03154  489 MPGI-QPPSSASVSSSGPVPAAVSC 512
PHA03247 PHA03247
large tegument protein UL36; Provisional
255-659 2.87e-06

large tegument protein UL36; Provisional


Pssm-ID: 223021 [Multi-domain]  Cd Length: 3151  Bit Score: 51.86  E-value: 2.87e-06
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  255 PANGSAPSAPAQPPCfhlalPQNSPSPAAGQPVTVAQGAPGSLTHSPPA-AGQSHMTLVSSPLPVGQNSLTLQPPAPQPV 333
Cdd:PHA03247  2483 PAEARFPFAAGAAPD-----PGGGGPPDPDAPPAPSRLAPAILPDEPVGePVHPRMLTWIRGLEELASDDAGDPPPPLPP 2557
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  334 FLSHGVPlHQSVnPPVLPLSQPVGPVNKSvgtsvlpinQTVRPGVLPltQPvgpiNRPVGPGVLPVSPsvtpgvlqavsP 413
Cdd:PHA03247  2558 AAPPAAP-DRSV-PPPRPAPRPSEPAVTS---------RARRPDAPP--QS----ARPRAPVDDRGDP-----------R 2609
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  414 GVLSVSRAVPSGVLPAgqmTPAGQMTPAGVIPGQTATSGVLPTGQmvqsgvlPVGQTAPSRVLPPgqtapLRVISAGQvv 493
Cdd:PHA03247  2610 GPAPPSPLPPDTHAPD---PPPPSPSPAANEPDPHPPPTVPPPER-------PRDDPAPGRVSRP-----RRARRLGR-- 2672
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  494 PSGLLSPNQTVSSSAVVPVNQGVNSgvlqLSQPVVSGVLPVGQPvRPGVLQLNQTVGTNILPVNQPVRPGASqnttfLTS 573
Cdd:PHA03247  2673 AAQASSPPQRPRRRAARPTVGSLTS----LADPPPPPPTPEPAP-HALVSATPLPPGPAAARQASPALPAAP-----APP 2742
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  574 GSILRQLIPTGKQVNGIPTYTLAPVSvtlPVPPGGLATVAPPQMPIQLLPSGAAAPMAGSMPGMPSPPVLVNAAQSVFVQ 653
Cdd:PHA03247  2743 AVPAGPATPGGPARPARPPTTAGPPA---PAPPAAPAAGPPRRLTRPAVASLSESRESLPSPWDPADPPAAVLAPAAALP 2819

                   ....*.
gi 2462559783  654 ASSSAA 659
Cdd:PHA03247  2820 PAASPA 2825
Soli_cterm TIGR03437
Solibacter uncharacterized C-terminal domain; This model describes a protein domain found in ...
443-645 4.34e-06

Solibacter uncharacterized C-terminal domain; This model describes a protein domain found in 90 proteins of Solibacter usitatus Ellin6076, nearly always as the C-terminal domain of a much larger protein. No homologs to this domain are detected outside of S. usitatus, a member of the Acidobacteria.


Pssm-ID: 274578 [Multi-domain]  Cd Length: 215  Bit Score: 48.81  E-value: 4.34e-06
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  443 VIPGQTAT---SGVLPTGQMVQSGVLPVgQTAPSRVLPPGQTAPLRVISAGQV---VPSGLLSPNQTVsssaVVpVNQGV 516
Cdd:TIGR03437    2 VAPGSIVSifgTNLAPATLTAAGGPLPT-SLGGVSVTVNGVAAPLLYVSPGQInaqVPYEVAPGAATV----TV-TYNGG 75
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  517 NSGVLQLS-QPVVSGVLPVGQ-PVRPGVLQLNQtvGTNILPVNQPVRPGaSQNTTFLTSGSILRQLIPTGKQVNGIPTY- 593
Cdd:TIGR03437   76 ASAAVTVTvAAAAPGIFTLDGsGTGQAAALNNQ--DGSVNSAANPAAPG-DVVVLYATGLGPTSPAVADGAPAPSSPLAp 152
                          170       180       190       200       210       220
                   ....*....|....*....|....*....|....*....|....*....|....*....|.
gi 2462559783  594 TLAPVSVTL-----PVPPGGLATVAPPQMPIQL-LPSGAAA---PMAGSMPGMPSPPVLVN 645
Cdd:TIGR03437  153 ALAPVTVTIggvpaTVLYAGLAPGFVGLYQVNVrVPAGLATgavPVVITVGGVTSNAVTIA 213
homeodomain cd00086
Homeodomain; DNA binding domains involved in the transcriptional regulation of key eukaryotic ...
1022-1078 7.71e-06

Homeodomain; DNA binding domains involved in the transcriptional regulation of key eukaryotic developmental processes; may bind to DNA as monomers or as homo- and/or heterodimers, in a sequence-specific manner.


Pssm-ID: 238039 [Multi-domain]  Cd Length: 59  Bit Score: 44.16  E-value: 7.71e-06
                           10        20        30        40        50
                   ....*....|....*....|....*....|....*....|....*....|....*..
gi 2462559783 1022 KKYEGRSYEEKKQFLKDYFHKKPYPSKKEIELLSSLFWVWKIDVASFFGKRRYICMK 1078
Cdd:cd00086      1 RRKRTRFTPEQLEELEKEFEKNPYPSREEREELAKELGLTERQVKIWFQNRRAKLKR 57
DUF4813 pfam16072
Domain of unknown function (DUF4813); This family of proteins is functionally uncharacterized. ...
423-659 1.13e-05

Domain of unknown function (DUF4813); This family of proteins is functionally uncharacterized. This family of proteins is found in eukaryotes. Proteins in this family are typically between 345 and 672 amino acids in length.


Pssm-ID: 435117 [Multi-domain]  Cd Length: 288  Bit Score: 48.60  E-value: 1.13e-05
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  423 PSGVLPAGqmtpaGQMTPAGVIPGqtaTSGVLPTGqmvqsGVlPVGQT----APSRVLPPGQTaplrVISAGQVVPSGLL 498
Cdd:pfam16072   13 PGGYAPAG-----ATYHPAGQVPA---GATYYPSG-----GV-PHGATyypqAPVAAVPAGAT----YLPAGAAIPAGAT 74
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  499 SPNQTVSSSAVVPVNQGVNSG---------VLQLSQPVVSGVLPVGQPVRPGVLQLNQTVGTNILPVNQPvrPGASQNTT 569
Cdd:pfam16072   75 YYPQAPKSSSGLGLGTGLIAGalggailghALTPTQTRVVEHAPSSGGGGGGGGYSNGNNEDKIIIINNG--PPGSVTTT 152
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  570 FLTSGSilrQLIPTGKQVNGiptytlAPVSVTLPVPPGGLATVAPPQMPIQlLPSGAAAPMAGSMPGMPSPPVLVNAAQS 649
Cdd:pfam16072  153 SAGSGT---TVINAGGQQPA------APAAPAYPVAPAAYPAQAPAAAPAP-APGAPQTPLAPLNPVAAAPAAAAGAAAA 222
                          250
                   ....*....|
gi 2462559783  650 VFVQASSSAA 659
Cdd:pfam16072  223 PVVAAAAPAA 232
PHA03247 PHA03247
large tegument protein UL36; Provisional
244-412 1.44e-05

large tegument protein UL36; Provisional


Pssm-ID: 223021 [Multi-domain]  Cd Length: 3151  Bit Score: 49.55  E-value: 1.44e-05
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  244 IAPKPAAHLAAPANGSAPSAPAQPPCFHLA--------LPQNSP--SPAAGQPVTVAQGAPGSLTHSPPAAGQSHMTLVS 313
Cdd:PHA03247  2828 LPPPTSAQPTAPPPPPGPPPPSLPLGGSVApggdvrrrPPSRSPaaKPAAPARPPVRRLARPAVSRSTESFALPPDQPER 2907
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  314 SPLPVGQNSLTLQPPAPQPVFLSHGVPLHQSVNPPVLPLSQPvGPVNKSVGTSVLPINQTVRPGVLPLTQPVGPINRPVG 393
Cdd:PHA03247  2908 PPQPQAPPPPQPQPQPPPPPQPQPPPPPPPRPQPPLAPTTDP-AGAGEPSGAVPQPWLGALVPGRVAVPRFRVPQPAPSR 2986
                          170
                   ....*....|....*....
gi 2462559783  394 PGVLPVSPSVTPGVLQAVS 412
Cdd:PHA03247  2987 EAPASSTPPLTGHSLSRVS 3005
SP1-4_arthropods_N cd22553
N-terminal domain of transcription factor Specificity Protein (SP) 1-4 from arthropods; ...
238-579 1.57e-05

N-terminal domain of transcription factor Specificity Protein (SP) 1-4 from arthropods; Specificity Proteins (SPs) are transcription factors that are involved in many cellular processes, including cell differentiation, cell growth, apoptosis, immune responses, response to DNA damage, and chromatin remodeling. There are many SPs in vertebrates (9 SPs in humans and mice, 7 SPs in the chicken, and 11 SPs in teleost fish), but arthropods only have 3 SPs. One SP is clade SP1-4, which is expressed ubiquitously throughout development. SP1-4 belongs to a family of proteins, called the SP/Kruppel or Krueppel-like Factor (KLF) family, characterized by a C-terminal DNA-binding domain of 81 amino acids consisting of three Kruppel-like C2H2 zinc fingers. These factors bind to a loose consensus motif, namely NNRCRCCYY (where N is any nucleotide; R is A/G, and Y is C/T), such as the recurring motifs in GC and GT boxes (5'-GGGGCGGGG-3' and 5'-GGTGTGGGG-3') that are present in promoters and more distal regulatory elements of mammalian genes. SP factors preferentially bind GC boxes, while KLFs bind CACCC boxes. Another characteristic hallmark of SP factors is the presence of the Buttonhead (BTD) box CXCPXC, just N-terminal to the zinc fingers. The function of the BTD box is unknown, but it is thought to play an important physiological role. Another feature of most SP factors is the presence of a conserved amino acid stretch, the so-called SP box, located close to the N-terminus. This model represents the N-terminal domain of SP1-4 from arthropods.


Pssm-ID: 411778 [Multi-domain]  Cd Length: 384  Bit Score: 48.48  E-value: 1.57e-05
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  238 LLKQTHIAPKPAAHLAAPANGSAPSAPAQPPCFHLALPQNSPSPAAGQP--VTVAQ---GAPGSLTHSPPAAGQShMTL- 311
Cdd:cd22553      1 FNQSQQVAPSELAQVATTASNIGGQQKQAQSDSSETHDPLILSPPLSQPqqIITAQssgSAAGGVAYSVSPAVQT-VTVd 79
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  312 ----VSSPLPVGQNSLTLQPPAPQPVFLSHGVPLHQSVnppvlpLSQPVGPVNKSVGTSVLPINQTVRPGVLPLTQPVGP 387
Cdd:cd22553     80 gheaIFIPANSGLLQTNNQQAIQLAPGGTQAILANQQT------LIRPNTVQGQANASNVLQNIAQIASGGNAVQLPLNN 153
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  388 INRPVgPGVLPVSPSVTPGVLQAVSPGVLSVSRAVPSGVLPAGQMTPAGQMTPAGVI-PGQTATsgvlPTGQMVQSGVLP 466
Cdd:cd22553    154 MTQTI-PVQVPVSTANGQTVYQTIQVPIQAIQSGNAGGGNQALQAQVIPQLAQAAQLqPQQLAQ----VSSQGYIQQIPA 228
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  467 VGQTAPSRVLPPGQTaplrviSAGQVVPSGL-LSPNQTVSSSAVVPVNQGVNSGVLQLSQPVVSGVLPVGQPVRPGVLQL 545
Cdd:cd22553    229 NASQQQPQMVQQGPN------QSGQIIGQVAsASSIQAAAIPLTVYTGALAGQNGSNQQQVGQIVTSPIQGMTQGLTAPA 302
                          330       340       350
                   ....*....|....*....|....*....|....
gi 2462559783  546 NQTVGTNILPvNQPVRPGASQNTTFLTSGSILRQ 579
Cdd:cd22553    303 SSSIPTVVQQ-QAIQGNPLPPGTQIIAAGQQLQQ 335
PPE COG5651
PPE-repeat protein [Function unknown];
340-542 3.17e-05

PPE-repeat protein [Function unknown];


Pssm-ID: 444372 [Multi-domain]  Cd Length: 385  Bit Score: 47.58  E-value: 3.17e-05
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  340 PLHQSVNPPVLPLSQPVGPVNKSVGTSV--LPINQTVRPGVLPLTQPVGPINRPVGPGVLPVSPSVTPGVLQAVSPGVLS 417
Cdd:COG5651    170 PPPTITNPGGLLGAQNAGSGNTSSNPGFanLGLTGLNQVGIGGLNSGSGPIGLNSGPGNTGFAGTGAAAGAAAAAAAAAA 249
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  418 VSRAVPSGVLPAGQMTPAGQMTPAGVIPGQTATSGVLPTGQMVQSGVLPVGQTAPSRVLPPGQTAPLRVISAGqVVPSGL 497
Cdd:COG5651    250 AAGAGASAALASLAATLLNASSLGLAATAASSAATNLGLAGSPLGLAGGGAGAAAATGLGLGAGGAAGAAGAT-GAGAAL 328
                          170       180       190       200
                   ....*....|....*....|....*....|....*....|....*
gi 2462559783  498 LSPNQTVSSSAVVPVNQGVNSGVLQLSQPVVSGVLPVGQPVRPGV 542
Cdd:COG5651    329 GAGAAAAAAGAAAGAGAAAAAAAGGAGGGGGGALGAGGGGGSAGA 373
PHA02682 PHA02682
ORF080 virion core protein; Provisional
246-353 4.72e-05

ORF080 virion core protein; Provisional


Pssm-ID: 177464 [Multi-domain]  Cd Length: 280  Bit Score: 46.39  E-value: 4.72e-05
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  246 PKPAAHLAAPANGSAPSAPAQPPCFHLALPQNSPSPAAGQPvtvAQGAPGSLTHSPPAagqshmtlvsSPLPvgqnslTL 325
Cdd:PHA02682    96 PACAPAAPAPAVTCPAPAPACPPATAPTCPPPAVCPAPARP---APACPPSTRQCPPA----------PPLP------TP 156
                           90       100
                   ....*....|....*....|....*....
gi 2462559783  326 QP-PAPQPVFlshgvpLHQSVNPPVLPLS 353
Cdd:PHA02682   157 KPaPAAKPIF------LHNQLPPPDYPAA 179
PRK10263 PRK10263
DNA translocase FtsK; Provisional
246-478 9.91e-05

DNA translocase FtsK; Provisional


Pssm-ID: 236669 [Multi-domain]  Cd Length: 1355  Bit Score: 46.62  E-value: 9.91e-05
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  246 PKPAAHLAAPAngSAPSAPAQPPCFHLALPQNSPSPAAGQPVTVAQGAPGSLTHSPPAAGQ---SHMTLVSSPLPVGQNS 322
Cdd:PRK10263   362 PVPGPQTGEPV--IAPAPEGYPQQSQYAQPAVQYNEPLQQPVQPQQPYYAPAAEQPAQQPYyapAPEQPAQQPYYAPAPE 439
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  323 LTL-----QPPAPQPVFLSHgvPLHQSVNPPVLPLSQPVG-----PVNKSVGTSVLPINQTVRPGVLPL----------- 381
Cdd:PRK10263   440 QPVagnawQAEEQQSTFAPQ--STYQTEQTYQQPAAQEPLyqqpqPVEQQPVVEPEPVVEETKPARPPLyyfeeveekra 517
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  382 ---------TQPV-GPI--NRPVGPGVLPVSPSVTPGVLQAvsPGVLSVSRAVPSGVLPAGqmTPAGQMTPAGvipgqTA 449
Cdd:PRK10263   518 rereqlaawYQPIpEPVkePEPIKSSLKAPSVAAVPPVEAA--AAVSPLASGVKKATLATG--AAATVAAPVF-----SL 588
                          250       260
                   ....*....|....*....|....*....
gi 2462559783  450 TSGVLPTGQmVQSGVLPvGQTAPSRVLPP 478
Cdd:PRK10263   589 ANSGGPRPQ-VKEGIGP-QLPRPKRIRVP 615
Atrophin-1 pfam03154
Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian ...
242-412 2.64e-04

Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian atrophy (DRPLA) gene. DRPLA (OMIM:125370) is a progressive neurodegenerative disorder. It is caused by the expansion of a CAG repeat in the DRPLA gene on chromosome 12p. This results in an extended polyglutamine region in atrophin-1, which is thought to confer toxicity to the protein, possibly through altering its interactions with other proteins. The expansion of a CAG repeat is also the underlying defect in six other neurodegenerative disorders, including Huntington's disease. One interaction of expanded polyglutamine repeats that is thought to be pathogenic is that with the short glutamine repeat in the transcriptional coactivator CREB binding protein, CBP. This interaction draws CBP away from its usual nuclear location to the expanded polyglutamine repeat protein aggregates that are characteristic of the polyglutamine neurodegenerative disorders. This interferes with CBP-mediated transcription and causes cytotoxicity.


Pssm-ID: 460830 [Multi-domain]  Cd Length: 991  Bit Score: 45.14  E-value: 2.64e-04
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  242 THIAPKPAAH-LAAPANGSAPSApaQPPCFHLaLPQNS--PSPAAgQPVTVAQgapgSLTHSPPAAgqshmtlvSSPLPV 318
Cdd:pfam03154  388 SNLPPPPALKpLSSLSTHHPPSA--HPPPLQL-MPQSQqlPPPPA-QPPVLTQ----SQSLPPPAA--------SHPPTS 451
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  319 GQNSLTLQPPAPQPVFLSHGVPL------HQSVNPPVLPLSQPVGPVNKSVGTSVLPINQTVRPGVLPLTQPVGPINRPV 392
Cdd:pfam03154  452 GLHQVPSQSPFPQHPFVPGGPPPitppsgPPTSTSSAMPGIQPPSSASVSSSGPVPAAVSCPLPPVQIKEEALDEAEEPE 531
                          170       180
                   ....*....|....*....|
gi 2462559783  393 GPGVLPVSPSVTPGVLQAVS 412
Cdd:pfam03154  532 SPPPPPRSPSPEPTVVNTPS 551
PAT1 pfam09770
Topoisomerase II-associated protein PAT1; Members of this family are necessary for accurate ...
245-391 2.66e-04

Topoisomerase II-associated protein PAT1; Members of this family are necessary for accurate chromosome transmission during cell division.


Pssm-ID: 401645 [Multi-domain]  Cd Length: 846  Bit Score: 45.03  E-value: 2.66e-04
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  245 APKPAAhlaaPANGSAPSAPAQPPCFHLALPQNSPSPAAGQ--PVTVAQGAPGSLTHSPPAA----GQSHMTLVSSPLPV 318
Cdd:pfam09770  207 AKKPAQ----QPAPAPAQPPAAPPAQQAQQQQQFPPQIQQQqqPQQQPQQPQQHPGQGHPVTilqrPQSPQPDPAQPSIQ 282
                           90       100       110       120       130       140       150
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....
gi 2462559783  319 GQNSLTLQPPAPQPVflshgVPLHQSVNPPVLPLSQPVGPVNKSVGTSVLPINQTVR-PGVLPltQPVGPINRP 391
Cdd:pfam09770  283 PQAQQFHQQPPPVPV-----QPTQILQNPNRLSAARVGYPQNPQPGVQPAPAHQAHRqQGSFG--RQAPIITHP 349
PHA03247 PHA03247
large tegument protein UL36; Provisional
248-643 3.94e-04

large tegument protein UL36; Provisional


Pssm-ID: 223021 [Multi-domain]  Cd Length: 3151  Bit Score: 44.93  E-value: 3.94e-04
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  248 PAAHLAAP-ANGSAPSAPAQPPCFHLALPQNSPSPAAGQPVTV----------------AQGAPGSLTHS--PPAAGQSH 308
Cdd:PHA03247  2489 PFAAGAAPdPGGGGPPDPDAPPAPSRLAPAILPDEPVGEPVHPrmltwirgleelasddAGDPPPPLPPAapPAAPDRSV 2568
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  309 MTLVSSPLPVGqnsltlqpPAPQPVFLSHGVPLHQsvNPPVLPLSQPVGPVNKSVGTSVLPINQTVRPgvlPLTQPVGPI 388
Cdd:PHA03247  2569 PPPRPAPRPSE--------PAVTSRARRPDAPPQS--ARPRAPVDDRGDPRGPAPPSPLPPDTHAPDP---PPPSPSPAA 2635
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  389 NRPVGPGVLPVSPSVTPGvlQAVSPGVLSVSRAVPSGVLPAGQMTPAGQMTPAGVIPGqtatsgVLPTGQMVQSGVLPVG 468
Cdd:PHA03247  2636 NEPDPHPPPTVPPPERPR--DDPAPGRVSRPRRARRLGRAAQASSPPQRPRRRAARPT------VGSLTSLADPPPPPPT 2707
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  469 QTAPSRVLPPGQTAPLRVISAGQVVPSGLLSPnqtvsssavvpvnqgvnsgvlqLSQPVVSGVLPVGQPVRPGVLQLNQT 548
Cdd:PHA03247  2708 PEPAPHALVSATPLPPGPAAARQASPALPAAP----------------------APPAVPAGPATPGGPARPARPPTTAG 2765
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  549 VGTNILPVNQPVRPGASQNTTFLTSGSILRQLIPTGKQVngiptytlAPVSVTLPVPPGGLATVAPPQMPiqLLPSGAAA 628
Cdd:PHA03247  2766 PPAPAPPAAPAAGPPRRLTRPAVASLSESRESLPSPWDP--------ADPPAAVLAPAAALPPAASPAGP--LPPPTSAQ 2835
                          410
                   ....*....|....*
gi 2462559783  629 PMAGSMPGMPSPPVL 643
Cdd:PHA03247  2836 PTAPPPPPGPPPPSL 2850
PRK12323 PRK12323
DNA polymerase III subunit gamma/tau;
241-419 7.34e-04

DNA polymerase III subunit gamma/tau;


Pssm-ID: 237057 [Multi-domain]  Cd Length: 700  Bit Score: 43.71  E-value: 7.34e-04
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  241 QTHIAPKPAAHLAAPANGSAPSAPAQPPCFHLALPqnSPSPAAGQPVTVAQGAPGSLTHSPPAAgqshmtlvSSPLPVGQ 320
Cdd:PRK12323   392 PAAAAPAPAAPPAAPAAAPAAAAAARAVAAAPARR--SPAPEALAAARQASARGPGGAPAPAPA--------PAAAPAAA 461
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  321 NSLTLQPPAPQPVFLSHGVPLHQSVNPPV-----------LPLSQPV-GPVNKSVGTSVLPINQTVRPGVLPL-----TQ 383
Cdd:PRK12323   462 ARPAAAGPRPVAAAAAAAPARAAPAAAPApadddpppweeLPPEFASpAPAQPDAAPAGWVAESIPDPATADPddafeTL 541
                          170       180       190
                   ....*....|....*....|....*....|....*.
gi 2462559783  384 PVGPINRPVGPGVLPVSPSVTPGVLQAVSPGVLSVS 419
Cdd:PRK12323   542 APAPAAAPAPRAAAATEPVVAPRPPRASASGLPDMF 577
Med15 pfam09606
ARC105 or Med15 subunit of Mediator complex non-fungal; The approx. 70 residue Med15 domain of ...
291-689 8.68e-04

ARC105 or Med15 subunit of Mediator complex non-fungal; The approx. 70 residue Med15 domain of the ARC-Mediator co-activator is a three-helix bundle with marked similarity to the KIX domain. The sterol regulatory element binding protein (SREBP) family of transcription activators use the ARC105 subunit to activate target genes in the regulation of cholesterol and fatty acid homeostasis. In addition, Med15 is a critical transducer of gene activation signals that control early metazoan development.


Pssm-ID: 312941 [Multi-domain]  Cd Length: 732  Bit Score: 43.46  E-value: 8.68e-04
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  291 QGAPGSLTHSPPAAGQSHMtlVSSPLPVGQN--SLTLQPPAPQPVFLSHGVPLHQSVNPPVL--PLSQPVGPVNKSVGTS 366
Cdd:pfam09606   60 QQQPQGGQGNGGMGGGQQG--MPDPINALQNlaGQGTRPQMMGPMGPGPGGPMGQQMGGPGTasNLLASLGRPQMPMGGA 137
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  367 VLPINQTvrpGVLPLTQPVGpinrpVGPGVLPVSPSVTPGVLQAvspgvlsvsravpsgvlPAGQMTPAGQMTPaGVIPG 446
Cdd:pfam09606  138 GFPSQMS---RVGRMQPGGQ-----AGGMMQPSSGQPGSGTPNQ-----------------MGPNGGPGQGQAG-GMNGG 191
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  447 QTATSGVLPTGQMVQSGVL-------PVGQTAPSRVLPP---GQTAPLRVISAGQVVPSGllsPNQTVSSSAVVPVNQgV 516
Cdd:pfam09606  192 QQGPMGGQMPPQMGVPGMPgpadagaQMGQQAQANGGMNpqqMGGAPNQVAMQQQQPQQQ---GQQSQLGMGINQMQQ-M 267
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  517 NSGVLQLSQPVVSGVLPVGQPVRPGVLQLNQTVGTNILPVNQPVRPgasqnttfltsgsilRQlipTGKQVNGIPTytlA 596
Cdd:pfam09606  268 PQGVGGGAGQGGPGQPMGPPGQQPGAMPNVMSIGDQNNYQQQQTRQ---------------QQ---QQQGGNHPAA---H 326
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  597 PVSVTLPVPPGGLATVAPPQMPIQLLPSGA-----AAPMAGSMPGMPSPPVLVNAAQSVFVQasssaadTNQVLKQAKQw 671
Cdd:pfam09606  327 QQQMNQSVGQGGQVVALGGLNHLETWNPGNfgglgANPMQRGQPGMMSSPSPVPGQQVRQVT-------PNQFMRQSPQ- 398
                          410
                   ....*....|....*...
gi 2462559783  672 ktcpvcnelfPSNVYQVH 689
Cdd:pfam09606  399 ----------PSVPSPQG 406
SP4_N cd22536
N-terminal domain of transcription factor Specificity Protein (SP) 4; Specificity Proteins ...
255-648 1.13e-03

N-terminal domain of transcription factor Specificity Protein (SP) 4; Specificity Proteins (SPs) are transcription factors that are involved in many cellular processes, including cell differentiation, cell growth, apoptosis, immune responses, response to DNA damage, and chromatin remodeling. Human SP4 is a risk gene of multiple psychiatric disorders including schizophrenia, bipolar disorder, and major depression. SP4 belongs to a family of proteins, called the SP/Kruppel or Krueppel-like Factor (KLF) family, characterized by a C-terminal DNA-binding domain of 81 amino acids consisting of three Kruppel-like C2H2 zinc fingers. These factors bind to a loose consensus motif, namely NNRCRCCYY (where N is any nucleotide; R is A/G, and Y is C/T), such as the recurring motifs in GC and GT boxes (5'-GGGGCGGGG-3' and 5'-GGTGTGGGG-3') that are present in promoters and more distal regulatory elements of mammalian genes. SP factors preferentially bind GC boxes, while KLFs bind CACCC boxes. Another characteristic hallmark of SP factors is the presence of the Buttonhead (BTD) box CXCPXC, just N-terminal to the zinc fingers. The function of the BTD box is unknown, but it is thought to play an important physiological role. Another feature of most SP factors is the presence of a conserved amino acid stretch, the so-called SP box, located close to the N-terminus. SP factors may be separated into three groups based on their domain architecture and the similarity of their N-terminal transactivation domains: SP1-4, SP5, and SP6-9. The transactivation domains between the three groups are not homologous to one another. SP1-4 have similar N-terminal transactivation domains characterized by glutamine-rich regions, which, in most cases, have adjacent serine/threonine-rich regions. This model represents the N-terminal domain of SP4.


Pssm-ID: 411773 [Multi-domain]  Cd Length: 623  Bit Score: 42.98  E-value: 1.13e-03
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  255 PANGSAPSAPAQppcFHLALPQNSPSPAAGQPVTVA---QGAPGSLTHSPPAAGQSHMTLVSSP--LPVGQNSLTLQPPA 329
Cdd:cd22536    115 KAGNSNASAPGQ---FQVIQVQNMQNPSGSVQYQVIpqiQTVEGQQIQISPANATALQDLQGQIqlIPAGNNQAILTTPN 191
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  330 PQP-------VFLSHGVPLHqsVNPPV-LPLSQPVGPVNKSVGTSVLPINQtvrpGVLPLTQPVgpINRPVGPG-----V 396
Cdd:cd22536    192 RTAsgniiaqNLANQTVPVQ--IRPGVsIPLQLQTIPGAQAQVVTTLPINI----GGVTLALPV--INNVAAGGgsgqlV 263
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  397 LPVSPSVTPGVLQAVSPGVLSVSRAVPSgvlpagqmtpagqmTPAGVIPGQTATSGVLPTGQMVQSGVLPVGQ--TAPSR 474
Cdd:cd22536    264 QPSDGGVSNGNQLVSTPITTASVSTMPE--------------SPSSSTTCTTTASTSLTSSDTLVSSAETGQYasTAASS 329
                          250       260       270       280       290       300       310       320
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  475 VL----PPGQTAPLRVISAGQVVPSGLLS-PNQTVSSSAVVPVNQGVNSgVLQLSQPVVSgVLPVGQPVRPgVLQLNQTV 549
Cdd:cd22536    330 ERteeePQTSAAESEAQSSSQLQSNGLQNvQDQSNSLQQVQIVGQPILQ-QIQIQQPQQQ-IIQAIQPQSF-QLQSGQTI 406
                          330       340       350       360       370       380       390       400
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  550 GTNILPVNQPVRPGASQNTT-------FLT-SGSI---------LRQLIPTGKQVNGIPTY-TLAPVSVTlpvppGGLAT 611
Cdd:cd22536    407 QTIQQQPLQNVQLQAVQSPTqvlirapTLTpSGQIswqtvqvqnIQSLSNLQVQNAGLPQQlTLTPVSSS-----AGGTT 481
                          410       420       430
                   ....*....|....*....|....*....|....*..
gi 2462559783  612 VAppqmpiQLLPsgaaAPMAGSmpgmpspPVLVNAAQ 648
Cdd:cd22536    482 IA------QIAP----VAVAGT-------PITLNAAQ 501
Atrophin-1 pfam03154
Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian ...
225-480 1.50e-03

Atrophin-1 family; Atrophin-1 is the protein product of the dentatorubral-pallidoluysian atrophy (DRPLA) gene. DRPLA (OMIM:125370) is a progressive neurodegenerative disorder. It is caused by the expansion of a CAG repeat in the DRPLA gene on chromosome 12p. This results in an extended polyglutamine region in atrophin-1, which is thought to confer toxicity to the protein, possibly through altering its interactions with other proteins. The expansion of a CAG repeat is also the underlying defect in six other neurodegenerative disorders, including Huntington's disease. One interaction of expanded polyglutamine repeats that is thought to be pathogenic is that with the short glutamine repeat in the transcriptional coactivator CREB binding protein, CBP. This interaction draws CBP away from its usual nuclear location to the expanded polyglutamine repeat protein aggregates that are characteristic of the polyglutamine neurodegenerative disorders. This interferes with CBP-mediated transcription and causes cytotoxicity.


Pssm-ID: 460830 [Multi-domain]  Cd Length: 991  Bit Score: 42.83  E-value: 1.50e-03
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  225 LRSVISEHIKRtglLKQTHIAPKPAAHLAAPANgsAPSAPAQPPCFHLALP------QNSPS----PAAGQPVTV----- 289
Cdd:pfam03154  231 IQQTPTLHPQR---LPSPHPPLQPMTQPPPPSQ--VSPQPLPQPSLHGQMPpmphslQTGPShmqhPVPPQPFPLtpqss 305
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  290 -AQGAPGSLTHSPPAAGQSHMTLVSSPLPVGQNSLTLQPPAPQPVFL------------------SHGVPLHQSVNPPV- 349
Cdd:pfam03154  306 qSQVPPGPSPAAPGQSQQRIHTPPSQSQLQSQQPPREQPLPPAPLSMphikpppttpipqlpnpqSHKHPPHLSGPSPFq 385
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  350 LPLSQPVGPVNK---SVGTSVLPINQTVRPGVLPLTQPVGPinRPVGPGVLPVSPSVTPGVLQAVSPGvlSVSRAVPSGV 426
Cdd:pfam03154  386 MNSNLPPPPALKplsSLSTHHPPSAHPPPLQLMPQSQQLPP--PPAQPPVLTQSQSLPPPAASHPPTS--GLHQVPSQSP 461
                          250       260       270       280       290
                   ....*....|....*....|....*....|....*....|....*....|....*....
gi 2462559783  427 LPAGQMTPAG--QMTPAGVIPgqTATSGVLPTGQMVQSGVLPVGQTAP---SRVLPPGQ 480
Cdd:pfam03154  462 FPQHPFVPGGppPITPPSGPP--TSTSSAMPGIQPPSSASVSSSGPVPaavSCPLPPVQ 518
PRK07994 PRK07994
DNA polymerase III subunits gamma and tau; Validated
247-411 1.78e-03

DNA polymerase III subunits gamma and tau; Validated


Pssm-ID: 236138 [Multi-domain]  Cd Length: 647  Bit Score: 42.55  E-value: 1.78e-03
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  247 KPAAHLAAPANGSAPSAPAqppcfhlALPQNSPSPAAGQPVTVAQGAPGSLTHSPPAAGQSHMTLVSSPLPVGQNSLTLQ 326
Cdd:PRK07994   360 HPAAPLPEPEVPPQSAAPA-------ASAQATAAPTAAVAPPQAPAVPPPPASAPQQAPAVPLPETTSQLLAARQQLQRA 432
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  327 PPAPQPvflshgvplhqsvnppvlPLSQPVGPVNKSVGTSVLPINQTVRPgvLPLTQPVGPIN------RPVGPGVLPVS 400
Cdd:PRK07994   433 QGATKA------------------KKSEPAAASRARPVNSALERLASVRP--APSALEKAPAKkeayrwKATNPVEVKKE 492
                          170
                   ....*....|.
gi 2462559783  401 PSVTPGVLQAV 411
Cdd:PRK07994   493 PVATPKALKKA 503
PPE COG5651
PPE-repeat protein [Function unknown];
409-662 1.84e-03

PPE-repeat protein [Function unknown];


Pssm-ID: 444372 [Multi-domain]  Cd Length: 385  Bit Score: 41.80  E-value: 1.84e-03
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  409 QAVSpgvLSVSRAVPSGVLPAGQMTPAGQMTPAGVIPGQTATS---GVLPTGQMVQSGVLPVGQTAPSRVLPPGQTAPLR 485
Cdd:COG5651    155 AAAS---AAAVALTPFTQPPPTITNPGGLLGAQNAGSGNTSSNpgfANLGLTGLNQVGIGGLNSGSGPIGLNSGPGNTGF 231
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  486 VISAGQVVPSGLLSPNQTVSSSAVVPVNQGVNSGVLQLSQPVVSGVLPvgqpvrpgvlqlNQTVGTNILPVNQPVRPGAS 565
Cdd:COG5651    232 AGTGAAAGAAAAAAAAAAAAGAGASAALASLAATLLNASSLGLAATAA------------SSAATNLGLAGSPLGLAGGG 299
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  566 QNTTFLTSGSilrqliptgkqvNGIPTYTLAPVSVTLPVPPGGLATVAPPQMPIQLLPSGAAAPMAGSMPGMPSPPVLVN 645
Cdd:COG5651    300 AGAAAATGLG------------LGAGGAAGAAGATGAGAALGAGAAAAAAGAAAGAGAAAAAAAGGAGGGGGGALGAGGG 367
                          250
                   ....*....|....*..
gi 2462559783  646 AAQSVFVQASSSAADTN 662
Cdd:COG5651    368 GGSAGAAAGAASGGGAA 384
half-pint TIGR01645
poly-U binding splicing factor, half-pint family; The proteins represented by this model ...
384-500 7.47e-03

poly-U binding splicing factor, half-pint family; The proteins represented by this model contain three RNA recognition motifs (rrm: pfam00076) and have been characterized as poly-pyrimidine tract binding proteins associated with RNA splicing factors. PUF60 (GP|6176532), in complex with p54 and in the presence of U2AF, facilitates association of U2 snRNP with pre-mRNA.


Pssm-ID: 130706 [Multi-domain]  Cd Length: 612  Bit Score: 40.44  E-value: 7.47e-03
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  384 PVGPINRPVGPGVLPVSPSVTPGvlqAVSPGVLSVSRAVPSGVLPAGQMTPAgqmTPAGVIPGQTATSGVLPTGQMVQSG 463
Cdd:TIGR01645  284 PPDALLQPATVSAIPAAAAVAAA---AATAKIMAAEAVAGAAVLGPRAQSPA---TPSSSLPTDIGNKAVVSSAKKEAEE 357
                           90       100       110
                   ....*....|....*....|....*....|....*..
gi 2462559783  464 VLPVGQTAPSRVLPPGQTAPLRVISAGQVVPSGLLSP 500
Cdd:TIGR01645  358 VPPLPQAAPAVVKPGPMEIPTPVPPPGLAIPSLVAPP 394
PHA03378 PHA03378
EBNA-3B; Provisional
245-466 8.76e-03

EBNA-3B; Provisional


Pssm-ID: 223065 [Multi-domain]  Cd Length: 991  Bit Score: 40.44  E-value: 8.76e-03
                           10        20        30        40        50        60        70        80
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  245 APKPAAHLAA-------PANGSAPSAPAQPPCFHLALPQNSPSPA---AGQPVTVAQGAPGSLTHSPPAAGQSHMTlvSS 314
Cdd:PHA03378   700 APTPMRPPAAppgraqrPAAATGRARPPAAAPGRARPPAAAPGRArppAAAPGRARPPAAAPGRARPPAAAPGAPT--PQ 777
                           90       100       110       120       130       140       150       160
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  315 PLPVGQNSLTLQP---PAPQPVflSHGVPLHQSVNPPVLPLSQpvGPVNKSVGTSVLPINQTVRPGVlpLTQPVGPINRP 391
Cdd:PHA03378   778 PPPQAPPAPQQRPrgaPTPQPP--PQAGPTSMQLMPRAAPGQQ--GPTKQILRQLLTGGVKRGRPSL--KKPAALERQAA 851
                          170       180       190       200       210       220       230       240
                   ....*....|....*....|....*....|....*....|....*....|....*....|....*....|....*....|
gi 2462559783  392 VGPGVLPVS---------PSVTPGVLQAVS-PGVLSVSRAVPSGVLP------AGQMTPAGQMTPAGVIPGQTATSGVLP 455
Cdd:PHA03378   852 AGPTPSPGSgtsdkivqaPVFYPPVLQPIQvMRQLGSVRAAAASTVTqapteyTGERRGVGPMHPTDIPPSKRAKTDAYV 931
                          250
                   ....*....|.
gi 2462559783  456 TGQMVQSGVLP 466
Cdd:PHA03378   932 ESQPPHGGQSH 942
 
Blast search parameters
Data Source: Precalculated data, version = cdd.v.3.21
Preset Options: Database: CDSEARCH/cdd; Low complexity filter: no; Composition Based Adjustment: yes; E-value threshold: 0.01

References:

  • Wang J et al. (2023), "The conserved domain database in 2023", Nucleic Acids Res. 51(D1):D384-D388.
  • Lu S et al. (2020), "The conserved domain database in 2020", Nucleic Acids Res. 48(D1):D265-D268.
  • Marchler-Bauer A et al. (2017), "CDD/SPARCLE: functional classification of proteins via subfamily domain architectures", Nucleic Acids Res. 45(D1):D200-D203.